Check out my talk at the Online Causal Inference Seminar last week on a practical way to deal with positivity violations using bounds 👇 #causalsky
New joint work published with @adrianraftery.bsky.social on methods for Bayesian probabilistic projections of migration
They also have a very neat way of deriving the efficient influence function for their infinite-dimensional parameter of interest based on Luedtke's autodiff work
Figure S1: Illustration of the basic notions of semiparametric theory
The "basic" notions of semiparametric theory, from today's arxiv.org/abs/2510.18843 from Morzywolek, Gilbert, & Luedtke
great great plenty of time to procrastinate on this
Ideally letters wouldn't be required at all, but I'd settle for them only being required at a much later stage of the process after the first stage of review
trying to find a way to compare against previous years; unfortunately the archive.org snapshots of the job board are spotty
State of the stats job market:
here's the cumulative number of stats tenure-track jobs posted on the UF Statistics Job Board so far, since August
#statsky
my interest in putting bounds on things now
Tricks you can use:
- Identification fails: try finding bounds that hold under weaker assumptions.
- Non-smooth parameters: try defining a smooth approximation.
- Uniform inference: try a multiplier bootstrap.
Having clever collaborators helps a lot!
some of the tricks we found useful -- the last bullet especially, I learned a lot from working closely with @alecmcclean.bsky.social on this
what's neat about our approach is that you can vary the propensity score threshold that defines the overlap and non-overlap population, and then choose the threshold that yields the smallest bounds -- with frequentist guarantees
Proposition 1 (non-overlap bounds)
The idea is very simple: we divide the population into a part in which overlap is satisfied, and a part in which overlap is violated. The non-overlap part is the one that poses problems, so we just apply worst-case bounds on the ATE in that subpopulation.
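The split-and-bound idea can be illustrated with a toy plug-in calculation (this is just a sketch of the decomposition, not the paper's actual estimator, which comes with frequentist guarantees): partition the sample by a propensity threshold, point-identify the ATE where overlap holds, and apply worst-case bounds on the rest.

```python
# Toy sketch of the non-overlap bounds decomposition. Assumes the true
# propensity score is known and uses plain IPW on the overlap region;
# the real method uses efficient influence-function-based estimation.
import numpy as np

rng = np.random.default_rng(0)
n = 50_000
x = rng.normal(size=n)
e = 1 / (1 + np.exp(-3 * x))                      # true propensity score
a = rng.binomial(1, e)                            # treatment
y = rng.binomial(1, 1 / (1 + np.exp(-(x + a))))   # bounded outcome in {0, 1}

def ate_bounds(y, a, e, t, y_min=0.0, y_max=1.0):
    # Overlap subpopulation: propensity score bounded away from 0 and 1.
    overlap = (e >= t) & (e <= 1 - t)
    p_overlap = overlap.mean()
    # IPW estimate of the ATE within the overlap subpopulation.
    ate_overlap = np.mean(overlap * (a * y / e - (1 - a) * y / (1 - e))) / p_overlap
    # Worst case for the non-overlap subpopulation: the effect there can
    # be anywhere in [-(y_max - y_min), y_max - y_min].
    worst = y_max - y_min
    center = p_overlap * ate_overlap
    return center - (1 - p_overlap) * worst, center + (1 - p_overlap) * worst

# Varying the threshold trades a wider non-overlap region (wider bounds)
# against a harder-to-estimate overlap region.
for t in [0.01, 0.05, 0.10]:
    lo, hi = ate_bounds(y, a, e, t)
    print(f"t = {t:.2f}: [{lo:.3f}, {hi:.3f}]")
```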
Non-overlap Average Treatment Effect Bounds by Herbert P. Susmann, Alec McClean, and Iván Díaz
New preprint out on a way to handle structural and practical violations of the overlap (also known as positivity) assumption in causal inference -- as long as the outcome is bounded, we derive simple partial identification bounds on the ATE. With @alecmcclean.bsky.social and @idiaz.bsky.social
a related tip i've heard for talks is to use author + year + journal abbreviation for references on the slides (e.g. Robins 1995 JASA), makes it easier for people to find what you're talking about
The paper includes a friendly (I hope) introduction to causal inference and TMLE, and has sample R code you can use to run this type of analysis
Diagram illustrating the bounds on the true average treatment effect
The insight is that while you can't point identify a treatment effect when the outcome is left-censored, it's possible to derive bounds on the true average treatment effect. It turns out you can estimate these bounds using standard causal inference methods like TMLE
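In a simple randomized setting the bounding idea can be sketched as follows (illustrative only; the paper estimates the bounds with TMLE): when a measurement falls below the limit of detection L, the true value lies somewhere in [y_min, L], so imputing the two extremes brackets the ATE.

```python
# Toy sketch of partial identification under left-censoring (NOT the
# paper's TMLE-based estimator): a randomized treatment, an outcome
# censored below a limit of detection, and extreme-imputation bounds.
import numpy as np

rng = np.random.default_rng(1)
n = 100_000
a = rng.binomial(1, 0.5, size=n)             # randomized treatment
y_true = rng.gamma(shape=2 + a, scale=1.0)   # true exposure; true ATE = 1

L_det = 1.0                                  # limit of detection
y_min = 0.0                                  # known lower bound on the outcome
censored = y_true < L_det                    # we only observe "below LOD"
y_obs = np.where(censored, np.nan, y_true)

def diff_means(y, a):
    return y[a == 1].mean() - y[a == 0].mean()

# Lower bound on the ATE: push censored treated values down to y_min
# and censored control values up to L_det.
y_low = np.where(censored, np.where(a == 1, y_min, L_det), y_obs)
# Upper bound: the reverse imputation.
y_high = np.where(censored, np.where(a == 1, L_det, y_min), y_obs)

lower, upper = diff_means(y_low, a), diff_means(y_high, a)
print(f"ATE bounds: [{lower:.3f}, {upper:.3f}]")
```

With large n the interval contains the true ATE of 1, and it shrinks as the limit of detection decreases.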
I have a new paper out on a simple way to do causal inference with left-censored outcomes. This comes up with environmental data because measurements often have a lower limit of detection -- e.g. a chemical is undetectable below a certain level
www.tandfonline.com/doi/full/10....
the setup in this template uses slurm job arrays to spin up a bunch of workers, each of which then simulates some data, runs your estimators, saves the results in a cache directory, and then helps you collect all the results and generate tables/figures
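The job-array pattern looks roughly like this (a hypothetical sbatch script; the template's actual scripts and file names may differ):

```shell
#!/bin/bash
# One array task per simulation replicate; each worker writes its
# result to a shared cache directory for later collection.
#SBATCH --job-name=sim-study
#SBATCH --array=1-500
#SBATCH --time=01:00:00
#SBATCH --mem=4G
#SBATCH --output=logs/%A_%a.out

mkdir -p cache logs
# run_replicate.R is a hypothetical entry point: simulate data, run the
# estimators, and save the results as an .rds file in the cache.
Rscript run_replicate.R --replicate "$SLURM_ARRAY_TASK_ID" --out cache
```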
if you are also in the niche position of needing to run a lot of simulation studies in R on slurm clusters, I have just the thing for you: github.com/herbps10/sim...
Just published: Antoine Chambaz and I did the formal work to prove you can use Super Learner (also known as model stacking) for estimating quantiles, both in i.i.d. and streaming data settings
www.sciencedirect.com/science/arti...
The DHS Program is officially done. As I tell my statistics students, good data is ESSENTIAL to improve the world. We can’t make things better if we don’t know the current state of things. No new DHS data collection is an incalculable loss.
www.nytimes.com/2025/02/26/h...
i offer a delightful array of asymptotically valid schemes and elixirs
leading off my working group talk with the traveling quack to remind everyone the healthy level of skepticism they should be bringing to the table
Looking forward to digging into this, new on arXiv today: arxiv.org/pdf/2501.06024