Carlos E Lourenco (Caê) (@caerib)

Claude skills for journalism | Joe Amditis A curated collection of Claude Code skills for journalists, researchers, academics, and media professionals.

Correction! @jamditis.bsky.social 's skills collection is at skills.amditis.tech (Not presented at #NICAR26 as far as I know but a worthwhile resource for journalists interested in skills!)

07.03.2026 17:25 👍 3 🔁 1 💬 2 📌 0

GitHub - amkessler/nicar2026_skills_in_codex_claude: Materials for NICAR 2026 session on using "skills" to aid in more reliable, reproducible analysis tasks when using Codex and Claude Code

Interested in using AI skills for Claude/Codex in journalism? Check out @akesslerdc.bsky.social 's #NICAR26 session repo: github.com/amkessler/ni...

Another great resource: @jamditis.bsky.social 's skills collection for journalists/researchers/academics github.com/amkessler/ni...

07.03.2026 17:20 👍 9 🔁 4 💬 1 📌 0

GitHub - cribbie/negligible: This package contains many functions for conducting negligible effect statistical testing (also called equivalence testing).

This #rstats #package #negligible examines negligible effect / #equivalence testing in #SEM models after #lavaan. A few of the functions include:
1) #neg.semfit (CFI, RMSEA, SRMR)
2) #neg.normal

github.com/cribbie/negl...

01.03.2026 02:46 👍 9 🔁 6 💬 0 📌 0
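To make the idea of negligible effect (equivalence) testing concrete, here is a minimal sketch of the generic two one-sided tests (TOST) logic with a normal approximation. It is not the `negligible` package's implementation and does not reproduce `neg.semfit` or `neg.normal`; the function name `tost_z`, the bounds, and the numbers are all hypothetical and chosen only for illustration.

```python
from statistics import NormalDist


def tost_z(estimate, std_error, lower, upper, alpha=0.05):
    """Two one-sided z-tests for whether an effect lies inside (lower, upper).

    Returns the TOST p-value (the larger of the two one-sided p-values)
    and whether the effect is declared negligible at level alpha.
    """
    z = NormalDist()
    # H0: effect <= lower. Small p-value when the estimate sits well above the lower bound.
    p_lower = 1 - z.cdf((estimate - lower) / std_error)
    # H0: effect >= upper. Small p-value when the estimate sits well below the upper bound.
    p_upper = z.cdf((estimate - upper) / std_error)
    p_value = max(p_lower, p_upper)
    return p_value, p_value < alpha


# Hypothetical inputs: estimate 0.02, standard error 0.04, negligible interval (-0.1, 0.1).
p_small, negligible_small = tost_z(0.02, 0.04, -0.1, 0.1)
# The same interval with an estimate close to the upper bound.
p_large, negligible_large = tost_z(0.09, 0.04, -0.1, 0.1)
```

Both one-sided tests must reject before the effect is called negligible, which is why the larger p-value is the one reported.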

Lots to say on this, but one thing recently came to my mind is the isomorphism between “wide format” & “long format” for panel datasets. The latter treats time as a variable, which confuses the issue, at least conceptually. DiD with wide format is clear and here: journals.sagepub.com/doi/10.1177/...

25.02.2026 00:55 👍 5 🔁 5 💬 1 📌 0
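As a rough illustration of the wide/long point, the sketch below reshapes a tiny long-format panel into wide format and computes a two-period difference-in-differences estimate as the difference in mean outcome changes. The data, the field names (`unit`, `treated`, `period`, `y`), and the helper names are hypothetical; this is not the method from the linked paper.

```python
from statistics import mean

# Hypothetical long-format panel: one row per unit and period.
long_rows = [
    {"unit": "a", "treated": 1, "period": 0, "y": 10.0},
    {"unit": "a", "treated": 1, "period": 1, "y": 15.0},
    {"unit": "b", "treated": 1, "period": 0, "y": 12.0},
    {"unit": "b", "treated": 1, "period": 1, "y": 16.0},
    {"unit": "c", "treated": 0, "period": 0, "y": 8.0},
    {"unit": "c", "treated": 0, "period": 1, "y": 10.0},
    {"unit": "d", "treated": 0, "period": 0, "y": 9.0},
    {"unit": "d", "treated": 0, "period": 1, "y": 10.0},
]


def to_wide(rows):
    """Collapse long rows into one record per unit with y_pre and y_post columns."""
    wide = {}
    for r in rows:
        rec = wide.setdefault(r["unit"], {"unit": r["unit"], "treated": r["treated"]})
        rec["y_pre" if r["period"] == 0 else "y_post"] = r["y"]
    return list(wide.values())


def did_wide(wide_rows):
    """Difference in mean (y_post - y_pre) between treated and untreated units."""
    def mean_change(group):
        return mean(r["y_post"] - r["y_pre"] for r in wide_rows if r["treated"] == group)
    return mean_change(1) - mean_change(0)


wide_rows = to_wide(long_rows)
did_estimate = did_wide(wide_rows)  # treated change 4.5 minus untreated change 1.5
```

In wide format time is no longer a variable: each unit has a pre outcome and a post outcome, and the estimate is a simple comparison of changes.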

How can different modes of survey data collection introduce bias? | LSHTM Survey data are self-reported data collected directly from respondents by a questionnaire or an interview, and are commonly used in health research. Such data are traditionally collected via a single

And if you want to hear more about this, you can attend my seminar at @lshtm-dash.bsky.social on 26 February - both IN PERSON and ONLINE!

29.01.2026 10:52 👍 11 🔁 4 💬 1 📌 0

A screenshot of the mode effects database with example survey item of "Did not always wear a seat belt (%)"

What to do if you don't know what the size of the mode effect is likely to be?

We've got you covered!

We have a database of mode effect estimates that can be used to inform this decision: cls-data.github.io/mode-effects...

👉🏼 Stay tuned for a step-by-step tutorial on how to apply QBA.

28.01.2026 12:12 👍 9 🔁 3 💬 2 📌 1

What to do instead?

With some information on the likely size of the mode effect, we can do quantitative bias analysis:

• calibrate ('correct') measures in one of the modes, as if obtained by the other

or

• estimate the likely impact mode effects would have on our substantive conclusions.

28.01.2026 12:12 👍 3 🔁 1 💬 1 📌 0
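A minimal sketch of the first option, under strong simplifying assumptions: a binary survey item, a mode effect that acts as a constant additive shift in reported prevalence, and no mode selection. All numbers are made up, and the function `calibrated_prevalence` is hypothetical; it is not taken from the paper or from the mode effects database.

```python
def calibrated_prevalence(n_a, p_a, n_b, p_b, mode_effect):
    """Pooled prevalence after shifting mode-B responses onto the mode-A scale.

    mode_effect is the assumed (mode B minus mode A) difference in reported
    prevalence, on the probability scale.
    """
    # Remove the assumed mode effect from mode B, keeping the result in [0, 1].
    p_b_adjusted = min(1.0, max(0.0, p_b - mode_effect))
    return (n_a * p_a + n_b * p_b_adjusted) / (n_a + n_b)


# Hypothetical survey: 600 interviews in mode A (20% report the behaviour)
# and 400 in mode B (30% report it).
assumed_effects = (0.00, 0.02, 0.05)
sensitivity = {
    delta: calibrated_prevalence(600, 0.20, 400, 0.30, delta)
    for delta in assumed_effects
}

for delta, prevalence in sensitivity.items():
    print(f"assumed mode effect {delta:.2f} -> pooled prevalence {prevalence:.3f}")
```

Running the calculation over a range of plausible mode effects, rather than a single value, shows how sensitive the pooled estimate is to the assumption.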

DAGs depicting an example of collider bias introduced by conditioning on mode due to it being a common consequence of the latent exposure and another unobserved variable.

As is often the case, collider bias can occur in multiple ways. For example, via an unobserved common cause of the outcome and mode.

We can therefore only use conditioning where we can plausibly assume no mode selection exists, or where we can condition on all such common causes.

28.01.2026 12:12 👍 3 🔁 1 💬 1 📌 0

DAGs depicting an example of confounding introduced by mode, due to the presence of mode effects on the measures of both the exposure and outcome.

The exact consequences will depend on the scenario. Generally:

Mode effects on the outcome -> uncertainty in the estimate
Mode effects on the exposure -> regression dilution bias
Mode effects on both -> confounding

In all of these, however, simply conditioning on mode can resolve the problem.

28.01.2026 12:12 👍 2 🔁 1 💬 1 📌 0
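The "mode effects on both" case can be checked with a small simulation. The sketch below assumes mode is assigned at random (no mode selection), that mode shifts both measured variables by a constant, and that the true effect of the exposure on the outcome is 0.5. Every parameter value is an assumption made for illustration, and averaging the two within-mode slopes is a simple stand-in for conditioning on mode.

```python
import random


def slope(xs, ys):
    """Ordinary least squares slope of ys on xs."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


rng = random.Random(1)
true_effect = 0.5
mode, x_star, y_star = [], [], []
for _ in range(20000):
    m = rng.random() < 0.5                    # mode assigned at random
    x = rng.gauss(0, 1)                       # latent exposure
    y = true_effect * x + rng.gauss(0, 1)     # latent outcome
    mode.append(m)
    x_star.append(x + 1.0 * m)                # mode shifts the measured exposure
    y_star.append(y + 1.0 * m)                # mode shifts the measured outcome

# Ignoring mode: the shared shift acts like a confounder and inflates the slope.
naive = slope(x_star, y_star)

# Conditioning on mode: estimate the slope within each mode, then average.
within = []
for g in (False, True):
    xs = [x for x, m in zip(x_star, mode) if m == g]
    ys = [y for y, m in zip(y_star, mode) if m == g]
    within.append(slope(xs, ys))
adjusted = sum(within) / len(within)
```

With these settings the naive slope should land near 0.6 while the mode-adjusted slope should land near the true 0.5.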

A measured version of a variable (X*), which is caused by its latent version (X) and Mode

Variables can be measured differently based on the mode used. These "mode effects" can be conceptualised as a type of systematic measurement error.

For example, sensitive information may be under-reported when provided to a human interviewer, compared to in a self-completion questionnaire.

28.01.2026 12:12 👍 3 🔁 2 💬 1 📌 0

The way data are collected (e.g. in person, online, telephone) is referred to as a "mode". More and more surveys are transitioning to mixed-mode designs.

This creates two interesting challenges: "mode effects" and "mode selection". As usual, DAGs are very helpful for understanding such phenomena.

28.01.2026 12:12 👍 4 🔁 1 💬 1 📌 0

How can the use of different modes of survey data collection introduce bias? An introduction to mode effects using directed acyclic graphs (DAGs) Abstract. Survey data are self-reported data collected directly from respondents by a questionnaire or an interview and are commonly used in epidemiology.

Users of survey data, lovers of DAGs, and general methodological enthusiasts, gather round!

I'm so excited to share this new paper, joint work with my brilliant colleagues @rjsilverwood.bsky.social, @pwgtennant.bsky.social, and Liam Wright.

🧵

28.01.2026 12:12 👍 73 🔁 30 💬 4 📌 7

Somewhat fittingly, here's a super recent paper discussing survey mode effects with causal graphs: bsky.app/profile/geor...
>

29.01.2026 14:57 👍 9 🔁 3 💬 1 📌 0

Two scenarios discussed with causal graphs: Survey mode causally affects the gender gap in life satisfaction and Survey mode is confounded with the gender gap in life satisfaction

This could either be a gender-specific survey mode effect, or just a reflection of selection effects, or a mix of the two. What we consider more likely determines how we should analyze the data though.>

29.01.2026 14:57 👍 12 🔁 2 💬 1 📌 0

Mean life satisfaction in the survey years 2010, 2015 and 2023, separately for female and male gender. In general, girls have lower scores but in 2023, they are drifting even further down

The city of Leipzig in Germany conducts large-scale school surveys of adolescents in secondary education schools. Following the regular surveys in 2010 and 2015, the 2020 survey had to be rescheduled to 2023 due to the COVID-19 pandemic. In this latest survey wave, the gender gap in general life satisfaction has significantly grown. While in 2010 and 2015 girls were somewhat less satisfied than boys (0.26 to 0.33 SD), in 2023 this gender gap had doubled (with girls 0.57 SD less satisfied). Why? Here, we probe various explanations, aiming to provide a template for researchers who are asking reverse causal questions (“What caused this?”). First, we find that the widening of the gender gap is much more pronounced among students with a migration background. This could plausibly be explained by a shift in the composition of the underlying population, with a strong increase of Syrian students, and a relative decrease of Vietnamese ones. Second, among students without a migration background, part of the increasing gender gap could potentially be attributed to survey mode: In 2023, for the first time, the survey was conducted on tablets—and unexpectedly, girls (but not boys) reported significantly lower satisfaction when surveyed on tablet rather than on paper. Third, beyond these two patterns, we still find significantly widening gender gaps in satisfaction with leisure time activities and relationships to friends. Thus, there may be a substantive increase in the gender gap in satisfaction in those two domains that is not readily attributable to changes in population and survey mode.

New preprint 🥳
The city of Leipzig conducts large-scale surveys of adolescents. In 2023, the gender gap in life satisfaction has significantly widened, with girls declining more steeply than boys. What's up with that?>
(work with @rmcelreath.bsky.social and @gregork.bsky.social)

29.01.2026 14:57 👍 136 🔁 37 💬 7 📌 5

Why a Hospital Is the Most Dangerous Place on Earth Statistically, you are safer as a soldier fighting in a war zone than you are in a modern American hospital. During the deadliest year of the Iraq war, in the midst of the “Surge” in 2007,...

You mean like this 🙈? link.springer.com/chapter/10.1...

28.02.2026 19:40 👍 9 🔁 1 💬 3 📌 0

There's possible reverse causality, there's potential reverse causality, and then there's the fear that young people living with their parents will hurt their job prospects.

28.02.2026 18:06 👍 89 🔁 26 💬 3 📌 1

Using AI agents (like Claude code) for research and analysis. - YouTube In this series, I will show how I use AI agents like Claude Code, Gemini CLI and OpenAI Codex for an end to end pipeline from analyzing data, building fronte...

Very nice agentic uses in academia! By @gvrkiran.bsky.social

m.youtube.com/playlist?lis...

01.03.2026 00:29 👍 0 🔁 0 💬 0 📌 0

DAG representing the causal structure of a standard difference-in-differences design with two locations and two time periods—units in one location in the post-period receive treatment. $L$ = group or location indicator (treated vs. untreated location); $T$ = time indicator (pre vs. post period); $U$ = unobserved time-invariant confounders (e.g., GDP per capita, general health status, public health infrastructure). $X \leftarrow T \rightarrow Y$ represents a common time trend affecting both locations equally. The causal effect of $X$ on $Y$ is identified by conditioning on $\{L, T\}$, which corresponds to using location and time indicator variables in a regression like `y ~ location * period`.

DAG representing the causal structure of a standard difference-in-differences design, but with explicit pre- and post-treatment outcomes. $L$ = group or location indicator (treated vs. untreated location); $T_\text{post}$ = post-period measurement (indicator that the observation occurs after the intervention); $X_\text{post}$ = treatment (which only occurs for treated locations in the post period); $Y_\text{pre}$ and $Y_\text{post}$ = outcome measured before and after the intervention. $U$ = unobserved time-invariant confounders (e.g., GDP per capita, general health status, public health infrastructure). $Y_\text{pre} \rightarrow Y_\text{post}$ represents outcome persistence (e.g. autocorrelation or slow-moving changes); $X_\text{post} \leftarrow T_\text{post} \rightarrow Y_\text{post}$ represents a common time trend affecting both locations equally. The causal effect of $X_\text{post}$ on $Y_\text{post}$ is identified by conditioning on $\{L, T_\text{post}\}$, which corresponds to using location and time indicator variables in a regression like `y ~ location * period`.

spending my sunday evening once again attempting to draw a DAG for diff-in-diff

23.02.2026 04:08 👍 85 🔁 12 💬 10 📌 4
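For the regression mentioned in the alt text, `y ~ location * period`, a fully saturated two-by-two model has coefficients that are simple functions of the four cell means. The sketch below computes them directly; the observations are made up and the names `cell_mean` and `coefs` are hypothetical.

```python
from statistics import mean

# Hypothetical observations: (location, period, y), with location 1 = treated
# location and period 1 = post period.
observations = [
    (0, 0, 8.0), (0, 0, 10.0),
    (0, 1, 10.0), (0, 1, 12.0),
    (1, 0, 11.0), (1, 0, 13.0),
    (1, 1, 17.0), (1, 1, 19.0),
]


def cell_mean(location, period):
    """Mean outcome in one location-by-period cell."""
    return mean(y for loc, per, y in observations if loc == location and per == period)


# Coefficients of the saturated model y ~ location * period, written as cell means.
coefs = {
    "intercept": cell_mean(0, 0),
    "location": cell_mean(1, 0) - cell_mean(0, 0),
    "period": cell_mean(0, 1) - cell_mean(0, 0),
    "location:period": (cell_mean(1, 1) - cell_mean(1, 0))
    - (cell_mean(0, 1) - cell_mean(0, 0)),
}
```

The interaction term is the difference-in-differences estimate: the change in the treated location minus the change in the untreated location.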

This thread is interesting, but just wanna propose that academic philosophers are also thoroughly in the "think it's normal to reason with/about counterfactuals, and especially think they are vital for causal inference" crowd. Easily doubling the number of voters who appreciate that to the low 100s.

17.02.2026 08:42 👍 24 🔁 2 💬 3 📌 0

In a way, counterfactual thinking *is* arcane. Most people - including policy-makers and many who took a causal inference course in grad school and should know better - just work backwards from the conclusion they want to reach.

17.02.2026 08:32 👍 6 🔁 1 💬 1 📌 0

What the BAT chief scientist had to say about causality half a century ago E-cigarette advocates, global warming deniers and social media companies continue to sidestep the implications of research showing dangers of their products by, among other things, retreating behind the high walls of claims that the evidence is not "causal." To put these claims in context, it is worth re-reading the last couple paragraphs from our book The Cigarette Papers (University of California Press, 1996, pages 441-442) on the original set of Brown and Williamson documents that were sent to me from "Mr.

What the BAT chief scientist had to say about causality half a century ago

E-cigarette advocates, global warming deniers and social media companies continue to sidestep the implications of research showing dangers of their products by, among other things, retreating behind the high walls of claims…

17.02.2026 17:00 👍 3 🔁 1 💬 1 📌 0

Causal AI in Clinical Trials: Vin Singh of BullFrog AI How does causal AI improve clinical trial outcomes by turning messy real-world datasets into actionable biomarker insights?

Causal AI in Clinical Trials: Vin Singh of BullFrog AI
open.substack.com/pub/afurther...

19.02.2026 07:49 👍 2 🔁 1 💬 0 📌 0

A big difference between the 21st century evolution of two social sciences, economics and psychology, is that economics got very serious about causal inference, and psychology... didn't.
Partly because economists are so focused on observational data, partly because they're better at math.

12.02.2026 20:13 👍 6 🔁 1 💬 1 📌 0

On Causality A History of How Economics Learned to Think About Cause and Effect

On Causality
A History of How Economics Learned to Think About Cause and Effect carloschavezp29.substack.com/p/on-causali...

12.02.2026 22:41 👍 1 🔁 1 💬 0 📌 0

One approach to the age-period-cohort problem: Just don’t. Just to cause yourself more problems, you seek for something. But there is no need for you to seek anything. You have plenty, and you have just enough problems. Shunryū Suzuki in a 1971 talk A ...

New blog post about the age-period-cohort identification problem!

In which, for the first time ever, I ask "What's the mechanism?" and also suggest that sometimes you may actually *not* be interested in causal inference.

www.the100.ci/2026/02/13/o...

13.02.2026 14:33 👍 160 🔁 42 💬 21 📌 8

#statstab #485 Bayesian ANCOVA and the ATE

Thoughts: Still grappling with the implications of using the causal inference approach to randomized experiments. But it's interesting.

#ATE #causalinference #ancova #ANOVA #rstats #estimand #counterfactuals

solomonkurz.netlify.app/blog/2025-07...

13.02.2026 16:58 👍 14 🔁 4 💬 2 📌 1

A while back, I wrote a thing. If you like experiments and causal inference, you should read it:

13.02.2026 17:38 👍 9 🔁 3 💬 0 📌 0

On this page What’s the difference between statistical significance and substantial significance? Can we measure substantial significance with statistics? What are all the different ways we can look at model coefficients? Print the object name Use summary() Use tidy() from the {broom} package Use model_parameters() and model_details() from the {parameters} and {performance} packages Make nice polished side-by-side regression tables with {modelsummary} Make automatic coefficient plots with modelplot() from {modelsummary} Plot model predictions and marginal effects Automatic interpretation with {report}

Posted a helpful little set of FAQs about regression for my causal inference class, including illustrations of statistical vs. substantive significance and all the different things you can do with #rstats model objects

evalsp26.classes.andrewheiss.com/news/2026-02...

03.02.2026 19:49 👍 68 🔁 10 💬 3 📌 1

Happy to share our recent article on causal inference in science studies. It aims to introduce causal thinking to the science of science community with an example from Open Science.

27.01.2026 10:36 👍 8 🔁 2 💬 1 📌 0
