
Aleksander Molak

@alxndrmlk

"The Causal Guy" http://causalpython.io Author || Advisor || Educator Host at http://CausalBanditsPodcast.com Causal ML Tutor @ Uni of Oxford CausalSky: https://bsky.app/profile/did:plc:imz3rf35poonl7yxt7bogui4/feed/aaamrclcu3tfa

1,326 Followers · 282 Following · 1,513 Posts · Joined 12.10.2023

Latest posts by Aleksander Molak @alxndrmlk

Today, it's the last chance to get the course with 10% off:

Use the code LAUNCHWEEK: courses.decisionacademy.io/courses/intr...

Expires today

6/6

09.03.2026 09:39 👍 1 🔁 0 💬 0 📌 0

In the "Intro to Biostatistics" course we released last week with Justin Bélair, we share our perspective that data visualization already involves modeling decisions that can impact how we and our audience perceive the problems we address and the solutions we craft.

5/

09.03.2026 09:39 👍 1 🔁 0 💬 1 📌 0

If we're primarily interested in the central part of the distribution, and only care about approximate values, perhaps not.

But the "tail behavior" of this distribution will be dramatically different from that of a normal distribution.

4/

09.03.2026 09:38 👍 1 🔁 0 💬 1 📌 0
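
To make the tail point concrete, here is a minimal sketch using only the Python standard library. It assumes a hypothetical right-skewed sample drawn from a lognormal distribution (not the dataset from the thread), fits a normal with the same mean and SD, and compares the share of observations beyond ±k SDs with what that matched normal implies.

```python
import random
from statistics import NormalDist, fmean, pstdev

random.seed(1)
# Hypothetical right-skewed sample (lognormal); NOT the data behind the thread's figure
data = [random.lognormvariate(0.0, 0.35) for _ in range(5_000)]
n = len(data)

mu, sd = fmean(data), pstdev(data)
normal = NormalDist(mu, sd)  # normal distribution with matched mean and SD

def tail_shares(k):
    """Empirical vs. normal-implied share of points below mu - k*sd and above mu + k*sd."""
    lo, hi = mu - k * sd, mu + k * sd
    empirical = (sum(v < lo for v in data) / n, sum(v > hi for v in data) / n)
    implied = (normal.cdf(lo), 1 - normal.cdf(hi))
    return empirical, implied

for k in (2, 3):
    (emp_lo, emp_hi), (norm_lo, norm_hi) = tail_shares(k)
    print(f"k={k}: lower {emp_lo:.4f} vs {norm_lo:.4f} | upper {emp_hi:.4f} vs {norm_hi:.4f}")
```

Under these assumptions, the sample's lower tail comes out much thinner and its upper tail much heavier than the matched normal predicts, even though both share the same mean and SD.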

It's neat and conveys a clear message about the distribution: it's rather symmetrical and fits nicely with the overlaid theoretical normal PDF.

But when we increase the number of bins (the bottom figure), we see clearly that the distribution is not symmetric.

Does it matter?

It depends.

3/

09.03.2026 09:38 👍 1 🔁 0 💬 1 📌 0
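
The idea that the bin count is itself a modeling choice can be illustrated with a short standard-library Python sketch. It assumes a hypothetical lognormal sample (not the thread's dataset), bins the same data with 6 and with 60 equal-width bins, and adds a bin-free summary, the moment-based sample skewness.

```python
import random

def histogram(data, bins):
    """Count observations in `bins` equal-width bins spanning [min(data), max(data)]."""
    lo, hi = min(data), max(data)
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in data:
        # Clamp so the maximum value falls into the last bin instead of overflowing
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    return counts

def skewness(data):
    """Moment-based sample skewness m3 / m2**1.5 (about 0 for symmetric data)."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((v - mean) ** 2 for v in data) / n
    m3 = sum((v - mean) ** 3 for v in data) / n
    return m3 / m2 ** 1.5

random.seed(0)
# Hypothetical right-skewed sample (lognormal); NOT the dataset behind the thread's figure
data = [random.lognormvariate(0.0, 0.35) for _ in range(2_000)]

coarse = histogram(data, bins=6)   # few bins: a coarse summary of the shape
fine = histogram(data, bins=60)    # many bins: finer detail, but noisier counts
print("6 bins: ", coarse)
print("60 bins:", fine)
print("skewness:", round(skewness(data), 2))
```

The skewness does not depend on the bin count at all, so it is one bin-free way to double-check what a histogram seems to suggest.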

What you see in the figure below are two histograms representing one dataset.

The only difference?

The number of bins.

I designed the top histogram to resemble some of the histograms I encountered while reading scientific publications.

2/

09.03.2026 09:37 👍 1 🔁 0 💬 1 📌 0
Post image

"Just Plot the Data"

Which of the two datasets is normally distributed?

Plotting the data might seem like the most assumption-free way to examine its properties.

But is that really the case?

1/

#StatSky #Biostatistics #EconSky #EpiSky

09.03.2026 09:36 👍 1 🔁 1 💬 2 📌 0
Preview
Causal Python || Your go-to resource for learning about Causality in Python A page where you can learn about causal inference in Python, causal discovery in Python and causal structure learning in Python. How to causal inference in Python?

- New Interrupted Time Series module in CausalPy + a tutorial

--------

We'll start sending today's issue at 9am PT / 12pm ET / 6pm CET (Sunday)

Join us at: causalpython.io (it's free!)

2/2

08.03.2026 12:22 👍 2 🔁 1 💬 0 📌 0
Post image

What a Week!

We literally had too many topics to fit into one newsletter this week.

Here's what we picked:

- Alberto D. Horner reviews the brand-new book by Quentin Gallea, PhD

- David Rohde on why policies are stochastic in reinforcement learning

1/

#CausalSky #StatSky #EconSky #EpiSky #MLSky

08.03.2026 12:22 👍 4 🔁 1 💬 1 📌 0

The correct link is here: courses.decisionacademy.io/courses/intr...

:)

6/5

06.03.2026 16:08 👍 0 🔁 0 💬 0 📌 0

Apparently, Popper liked the idea, although he did not fully understand it, because -- paraphrasing his own words -- he was "not very good at statistics"

We're celebrating the Launch Week with 10% off for our Bsky friends!

Join us here: courses.decisionacademy.io/courses/intr...

5/5

06.03.2026 09:42 👍 0 🔁 0 💬 1 📌 0

That's why we teach it in our new "Intro to Biostatistics" course with Justin Bélair

PS: I've heard many claims that Popper either didn't know about the idea of hypothesis testing, or knew about it but didn't like it. I learned from Deborah Mayo that this is not true.

4/

06.03.2026 09:40 👍 0 🔁 0 💬 1 📌 0

Our measurements are often imperfect (or "noisy")

So are our samples: they typically don't describe the population they come from perfectly.

Without understanding the idea of falsificationism, it's very difficult to make sense of many modern statistical frameworks.

3/

06.03.2026 09:40 👍 0 🔁 0 💬 1 📌 0

...one black swan disproves this claim.

In science, we're often in a more challenging situation than this.

Why?

Because we're not operating in the space of pure logical statements, but rather probabilistic ones.

We need probability to quantify the uncertainty:

2/

06.03.2026 09:39 👍 0 🔁 0 💬 1 📌 0
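
A toy contrast between the two situations, sketched in Python under our own assumptions (the swan lists and the coin-flip numbers are hypothetical illustrations, not from the thread): a universal logical claim is refuted by a single counterexample, while a probabilistic claim such as "this coin is fair" is not strictly refuted by any single outcome, so we quantify how surprising the data would be under it, here with an exact two-sided binomial test.

```python
from math import comb

def refuted(observations):
    """A universal claim ('all swans are white') is refuted by any single counterexample."""
    return any(color != "white" for color in observations)

def binom_two_sided_p(k, n, p=0.5):
    """Exact two-sided p-value for k successes in n trials under H0: success prob = p.

    Sums the probabilities of all outcomes that are no more likely than the observed one.
    """
    pmf = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    observed = pmf[k]
    # Small relative tolerance so outcomes with tied probabilities are counted
    return sum(q for q in pmf if q <= observed * (1 + 1e-9))

print(refuted(["white"] * 10_000))              # False: no counterexample observed
print(refuted(["white"] * 10_000 + ["black"]))  # True: one black swan is enough
print(round(binom_two_sided_p(60, 100), 4))     # 60 heads in 100 flips under "fair coin"
```

No number of heads makes "the coin is fair" logically impossible; the p-value only tells us how unusual the observed result would be if the claim were true.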
Post image

Modern Science Relies on an Idea That Disproving Is Easier Than Proving.

The idea was proposed by Karl Popper nearly a century ago.

We call it "falsificationism"

A million white swans don't prove all swans are white.

Neither does a trillion.

But...

1/

#StatSky #EpiSky #Biostatistics #EconSky

06.03.2026 09:39 👍 4 🔁 0 💬 1 📌 0

To celebrate Launch Week, we have a 10% discount for anyone reading this post (and for your friends and family as well -- share it with them, it's on us) with the code LAUNCHWEEK

Join us here: courses.decisionacademy.io/courses/intr...

It expires on Monday, March 9.

5/5

05.03.2026 08:44 👍 0 🔁 0 💬 0 📌 0

So we built it.

It's a self-paced version of our best-selling live cohort course.

And all coding is done in R, because...

No, just kidding: all coding is done in R *and* Python, because you should pick which language to use, not us.

4/

05.03.2026 08:44 👍 0 🔁 0 💬 1 📌 0

- Second, we understand the importance of the causal perspective. We believe it should be discussed explicitly from Day 1 in any course on statistics

We haven't found a course that does both.

3/

05.03.2026 08:43 👍 0 🔁 0 💬 1 📌 0

- First, we believe that statistics shouldn't be taught as a set of unrelated procedures and simplified decision rules, but rather as a way of thinking

2/

05.03.2026 08:43 👍 0 🔁 0 💬 1 📌 0
Post image

Last Friday, We Launched "Intro to Biostatistics" with Justin Bélair

Two ideas inspired us to build it:

1/

#StatSky #Biostatistics #EpiSky

05.03.2026 08:41 👍 3 🔁 1 💬 1 📌 0
Preview
causality/38 - Optimal A-B Test Split.ipynb at main · AlxndrMlk/causality Notes, exercises and other materials related to causal inference, causal discovery and causal ML. - AlxndrMlk/causality

So if p_control = 0.10 and p_treatment = 0.12, the SDs are 0.300 vs 0.325, and the Neyman-optimal split would be ~52:48 (treatment:control).

Not that different from 50:50.

Have you ever considered that a 50:50 split might not be optimal for your setting?

7/7

Notebook: github.com/AlxndrMlk/ca...

02.03.2026 09:37 👍 1 🔁 0 💬 0 📌 0
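
The arithmetic in the post above can be checked with a few lines of Python. The helper names are our own (hypothetical), not taken from the linked notebook.

```python
from math import sqrt

def bernoulli_sd(p):
    """Standard deviation of a Bernoulli(p) outcome: sqrt(p * (1 - p))."""
    return sqrt(p * (1 - p))

def neyman_treatment_share(sd_t, sd_c):
    """Neyman allocation: share of units assigned to treatment = SD_t / (SD_t + SD_c)."""
    return sd_t / (sd_t + sd_c)

p_control, p_treatment = 0.10, 0.12
sd_c, sd_t = bernoulli_sd(p_control), bernoulli_sd(p_treatment)
share_t = neyman_treatment_share(sd_t, sd_c)
print(f"SD_c = {sd_c:.3f}, SD_t = {sd_t:.3f}, treatment share = {share_t:.1%}")
# prints: SD_c = 0.300, SD_t = 0.325, treatment share = 52.0%
```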

Heuristically, if we're only interested in binary conversions and the expected uplift is small, sticking to a 50:50 split can be a reasonable default.

Why?

Variance of a Bernoulli = p(1-p), which is a smooth function that changes slowly across realistic conversion rates.

6/

02.03.2026 09:37 👍 0 🔁 0 💬 1 📌 0
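
To see how slowly the Neyman split drifts away from 50:50 for binary outcomes, the sketch below fixes a hypothetical baseline conversion rate of 10% and varies the absolute uplift. All rates here are illustrative assumptions, and the function name is our own.

```python
from math import sqrt

def binary_neyman_share(p_c, p_t):
    """Neyman share for treatment with binary outcomes, using Bernoulli SDs sqrt(p(1-p))."""
    sd_c, sd_t = sqrt(p_c * (1 - p_c)), sqrt(p_t * (1 - p_t))
    return sd_t / (sd_t + sd_c)

baseline = 0.10                              # hypothetical control conversion rate
uplifts = [0.005, 0.01, 0.02, 0.05, 0.10]    # hypothetical absolute uplifts
shares = [binary_neyman_share(baseline, baseline + u) for u in uplifts]
for u, s in zip(uplifts, shares):
    print(f"p_t = {baseline + u:.3f} -> treatment share {s:.1%}")
```

Under these assumptions, even doubling the conversion rate (10% to 20%) only moves the split to roughly 57:43.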

If we observe a variance mismatch, we can use the Neyman allocation rule to find the approximately optimal split:

share of units in treatment = SD_t / (SD_t + SD_c)

As you can see in the plot below, it does a pretty good job of approximating the optimum.

5/

02.03.2026 09:36 👍 0 🔁 0 💬 1 📌 0
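
A minimal way to check the Neyman rule against the optimum, assuming the goal is to minimize the standard error of the difference in means (equivalently, to maximize the power of a two-sample z-test for a fixed effect size): compare the Neyman share with a brute-force search over every integer split. The SDs and sample size are hypothetical, and this is not the code from the linked notebook.

```python
from math import sqrt

def se_diff(n_t, n_c, sd_t, sd_c):
    """Standard error of the difference in means for two independent groups."""
    return sqrt(sd_t**2 / n_t + sd_c**2 / n_c)

def best_split_brute_force(total, sd_t, sd_c):
    """Treatment size that minimizes the SE of the difference, checking every integer split."""
    return min(range(1, total), key=lambda n_t: se_diff(n_t, total - n_t, sd_t, sd_c))

total = 1_000
sd_t, sd_c = 3.0, 1.0  # hypothetical: treatment SD three times the control SD
neyman_n_t = total * sd_t / (sd_t + sd_c)
brute_n_t = best_split_brute_force(total, sd_t, sd_c)
print(f"Neyman: {neyman_n_t:.0f} treated | brute force: {brute_n_t} treated")
print(f"SE at 50:50: {se_diff(500, 500, sd_t, sd_c):.4f} | "
      f"SE at optimum: {se_diff(brute_n_t, total - brute_n_t, sd_t, sd_c):.4f}")
```

With known SDs the two agree up to integer rounding; in practice the SDs are estimated, so the rule can only be as good as those estimates.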

The intuition behind this is simple: the higher the variance, the more observations we need to estimate that group's mean with a given precision.

In practice, before we decide on the exact split, it might be good to run a pilot study to get variance estimates for both treatment and control conditions.

4/

02.03.2026 09:36 👍 2 🔁 0 💬 2 📌 0
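
For readers who want the step behind this intuition, here is a short derivation under the usual textbook assumptions (independent groups, known outcome SDs σ_t and σ_c, fixed total sample size N). Minimizing the variance of the estimated difference in means with a Lagrange multiplier λ gives group sizes proportional to the SDs, which is the SD_t / (SD_t + SD_c) rule.

```latex
\operatorname{Var}(\bar{Y}_t - \bar{Y}_c) = \frac{\sigma_t^2}{n_t} + \frac{\sigma_c^2}{n_c},
\qquad \text{subject to } n_t + n_c = N

\frac{\partial}{\partial n_t}\left[\frac{\sigma_t^2}{n_t} + \frac{\sigma_c^2}{n_c} + \lambda (n_t + n_c - N)\right]
  = -\frac{\sigma_t^2}{n_t^2} + \lambda = 0
\;\Rightarrow\; n_t = \frac{\sigma_t}{\sqrt{\lambda}},
\qquad \text{likewise } n_c = \frac{\sigma_c}{\sqrt{\lambda}}

\Rightarrow\; \frac{n_t}{n_c} = \frac{\sigma_t}{\sigma_c},
\qquad \frac{n_t}{N} = \frac{\sigma_t}{\sigma_t + \sigma_c}
```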

This pattern results in variance inflation in the treatment group compared to control

The 50:50 split becomes suboptimal

What does this mean in practice?

Statistical power could be increased by allocating more than 50% of units to the higher variance group (in our example - the treatment group)

3/

02.03.2026 09:35 👍 0 🔁 0 💬 1 📌 0
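
A quick way to see the power gain, sketched with a normal-approximation two-sample z-test and hypothetical numbers (treatment SD three times the control SD, 1,000 units in total, a true difference in means of 0.2): compare power at 50:50 with power at the Neyman split, which sends more units to the higher-variance treatment group.

```python
from statistics import NormalDist

def approx_power(delta, sd_t, sd_c, n_t, n_c, alpha=0.05):
    """Approximate power of a two-sided two-sample z-test (ignores the far rejection tail)."""
    se = (sd_t**2 / n_t + sd_c**2 / n_c) ** 0.5
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return NormalDist().cdf(abs(delta) / se - z_crit)

total, delta = 1_000, 0.2   # hypothetical total sample size and true difference in means
sd_t, sd_c = 3.0, 1.0       # hypothetical: treatment variance inflated by heterogeneous effects

n_t_neyman = round(total * sd_t / (sd_t + sd_c))
power_5050 = approx_power(delta, sd_t, sd_c, total // 2, total // 2)
power_neyman = approx_power(delta, sd_t, sd_c, n_t_neyman, total - n_t_neyman)
print(f"power at 50:50: {power_5050:.2f} | "
      f"power at {n_t_neyman}:{total - n_t_neyman}: {power_neyman:.2f}")
```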

That the outcome variances are equal between the treatment and control groups

Imagine you're testing a new AI assistant in your online store

You're measuring revenue per visitor

Some people love it and their order value goes up significantly

Others don't like it and spend much less than before

2/

02.03.2026 09:34 👍 0 🔁 0 💬 1 📌 0
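
A small simulation of the AI-assistant scenario shows how heterogeneous reactions inflate the treatment-group variance even when the average effect is modest. Every number here is a hypothetical assumption: baseline revenue around 50 per visitor, half of treated visitors spending 50% more and the other half 40% less.

```python
import random
from statistics import fmean, stdev

random.seed(7)
N = 10_000  # hypothetical number of visitors per arm

def baseline_revenue():
    """Hypothetical revenue per visitor without the assistant (clamped at zero)."""
    return max(0.0, random.gauss(50.0, 10.0))

control = [baseline_revenue() for _ in range(N)]

treatment = []
for _ in range(N):
    base = baseline_revenue()
    if random.random() < 0.5:
        treatment.append(base * 1.5)  # hypothetical: visitors who love it spend 50% more
    else:
        treatment.append(base * 0.6)  # hypothetical: visitors who dislike it spend 40% less

print(f"control:   mean {fmean(control):.1f}, SD {stdev(control):.1f}")
print(f"treatment: mean {fmean(treatment):.1f}, SD {stdev(treatment):.1f}")
```

Under these assumptions, the average shifts only slightly while the treatment SD ends up well above twice the control SD.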
Post image

The Optimal Split for an A/B Test Is 50:50

Unless...

I recently saw a post explaining why the optimal split between treatment and control groups should be 50:50.

The optimal split is indeed 50:50, but only under one assumption:

1/

#CausalSky #StatSky #ABTest #EpiSky #MLSky #EconSky

02.03.2026 09:33 👍 7 🔁 2 💬 2 📌 1

We'll start sending today's issue at 9am PT / 12pm ET / 6pm CET (Sunday)
Subscribe here (it's free): causalpython.io

3/3

01.03.2026 10:18 👍 1 🔁 0 💬 0 📌 0

...a new paper by Matteo Ceriscioli and Karthika Mohan

- 5 upcoming causal events you don't want to miss (online & in-person)

- Justin Bélair's causally-aware "Intro to Biostatistics" is now live on Decision Academy

2/

01.03.2026 10:18 👍 1 🔁 0 💬 1 📌 0
Post image

You Train Your Robot in August, It Trashes Your Garden in September.

In today's issue of Causal Python Weekly:

- Causal POMDPs (Partially Observed Markov Decision Processes): Planning when the world changes - a review of...

1/

#CausalSky #MLSky #AISky #EconSky #StatSky

01.03.2026 10:17 👍 5 🔁 2 💬 1 📌 0
Preview
Decision Academy Decision Academy is an upcoming online course platform focused on data-based decision-making. Sign up for our upcoming courses and free statistical challenges!

Day 3: From Struggle to Structure

- Debrief: common pitfalls and blind spots
- A step-by-step reasoning framework
- Where statistical tests actually enter the process

Register here: decisionacademy.io#the-challenge

We start today at 9am PT / 12pm ET / 6pm CET

5/5

25.02.2026 10:17 👍 0 🔁 0 💬 0 📌 0