fuck, I’m sorry, I can’t compete with this
We have developed and tested a spatial scan statistic for categorical, functional data (CFSS) - a data structure within which current approaches cannot identify spatial clusters. Our methodology combines an encoding scheme for categorical, functional observations with a nonparametric scan statistic. In a simulation study with three distinct scenarios, the CFSS accurately recovered the simulated spatial clusters and gave very low false positive rates, high true positive rates, and high positive predictive values. We have also used the CFSS to identify and characterize spatial clusters in French air pollution data from the winter of 2024.
arXiv📈🤖
A spatial scan statistic for categorical, functional data
By Frévent, Sarr, Dabo-Niang
I find this article cacm.acm.org/opinion/the-... a far more compelling and interesting discussion of the role of agents in the broader economy.
Written by very many smart economists at Microsoft research
Timely contribution! Focus on time-dynamic boundaries is the frontier in segregation/neighbourhood studies.
Example of the two-stage unsupervised machine learning algorithm using point data as input. The background map depicts Hamburg. The map shows neighborhoods of varying sizes and shapes, sometimes following administrative borders (black lines) and sometimes not. Three differently colored neighborhood types are displayed, each representing a different social group of residents.
Example of the two-stage unsupervised machine learning algorithm using 500x500m grid cells as input. The background map depicts Hamburg. The map shows large neighborhoods of varying sizes and shapes, sometimes following administrative borders (black lines) and sometimes not. Three differently colored neighborhood types are displayed, each representing a different social group of residents.
Looking for a measure of #neighborhoods, micro or macro #segregation?
I've got something for you!
My newly published paper in Sociological Methods & Research presents a machine-learning-based algorithm to delineate neighborhoods with grid-cell or point data:
journals.sagepub.com/doi/10.1177/...
Wouldn’t using an S7 class object be more efficient and less error-prone?
Compositional data (proportions that sum to 1) behave in ways standard models aren’t built for
I walk through why Dirichlet regression is often the right tool & what extra insight it gives, using a real eye-tracking example
#Dirichlet #r #brms #guide #eyetracking
open.substack.com/pub/mzlotean...
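The post uses brms in R; the same idea can be sketched in a few lines of Python. A minimal, illustrative sketch of Dirichlet regression fit by maximum likelihood on simulated proportions data, with a softmax mean link and a single precision parameter — all names, parameter values, and the simulated "eye-tracking-style" data here are assumptions for illustration, not the post's actual analysis:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import gammaln, softmax

rng = np.random.default_rng(0)

# Simulate hypothetical eye-tracking-style data: one covariate x and
# three proportion outcomes that sum to 1 for each observation.
n, K = 400, 3
x = rng.normal(size=n)
true_B = np.array([[0.0, 0.0], [0.5, 1.0], [-0.3, -1.0]])  # K x 2 (intercept, slope)
phi = 30.0                                                  # Dirichlet precision
mu = softmax(true_B @ np.vstack([np.ones(n), x]), axis=0)   # mean compositions (K x n)
Y = np.array([rng.dirichlet(phi * mu[:, i]) for i in range(n)])  # n x K proportions

def nll(params):
    """Negative Dirichlet log-likelihood with a softmax mean link."""
    B = params[:K * 2].reshape(K, 2)
    a = np.exp(params[-1]) * softmax(B @ np.vstack([np.ones(n), x]), axis=0).T
    return -np.sum(gammaln(a.sum(axis=1)) - gammaln(a).sum(axis=1)
                   + ((a - 1) * np.log(Y)).sum(axis=1))

fit = minimize(nll, x0=np.zeros(K * 2 + 1), method="L-BFGS-B")
phi_hat = np.exp(fit.x[-1])
print("estimated precision:", round(phi_hat, 1))
```

Note the softmax leaves the coefficient matrix identified only up to a per-column shift; the fitted mean compositions and the precision are still identified, which is what brms-style Dirichlet models report as well.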
New blog post about the age-period-cohort identification problem!
In which, for the first time ever, I ask "What's the mechanism?" and also suggest that sometimes you may actually *not* be interested in causal inference.
www.the100.ci/2026/02/13/o...
Results for CEM and neighborhood fixed effects regressions. We find robust and significant positive effects (blue coefficients) of childhood exposure to different ethnicities on the likelihood of interethnic marriage.
Left: georeferenced households in 1880. Right: Ethnic organic neighborhoods in Manhattan in 1880. Six ethnic groups are prevalent: first/second generation Americans, Asians, Germans, Irish and Others (residual category). We use these neighborhoods to account for segregation by applying organic neighborhood fixed effects.
New preprint out with @wendering.bsky.social & Nan Zhang!
We show that childhood exposure to ethnic outgroups increases the prob. of #interethnic marriage decades later, using historical linked US census data (1880–1910) and next-door neighbor comparisons 🏠🌃
Read more here:
shorturl.at/U0IHR
The joke of the week in my causal inference methods course this week 😂
We ran a massive, uncontrolled social experiment on kids with social media. Outcomes: anxiety, comparison, fractured attention. Jon Haidt called the direction of travel. We’re doing it again with AI, outsourcing thinking and social skills, only to act surprised later. www.nytimes.com/2026/01/30/o...
Learn Linux, that's it...
Bayesian model comparison implements Occam's razor through its sensitivity to the prior. However, prior-dependence makes it important to assess the influence of plausible alternative priors. Such prior sensitivity analyses for the Bayesian evidence are expensive, either requiring repeated, costly model re-fits or specialised sampling schemes. By exploiting the learned harmonic mean estimator (LHME) for evidence calculation we decouple sampling and evidence calculation, allowing resampled posterior draws to be used directly to calculate the evidence without further likelihood evaluations. This provides an alternative approach to prior sensitivity analysis for Bayesian model comparison that dramatically alleviates the computational cost and is agnostic to the method used to generate posterior samples. We validate our method on toy problems and a cosmological case study, reproducing estimates obtained by full Markov chain Monte Carlo (MCMC) sampling and nested sampling re-fits. For the cosmological example considered our approach achieves up to $6000\times$ lower computational cost.
arXiv📈🤖
Efficient prior sensitivity analysis for Bayesian model comparison
By Hu, McEwen
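The decoupling idea has a simple special case worth writing down: given posterior samples drawn under one prior, the evidence ratio against an alternative prior is the posterior expectation of the prior ratio, Z2/Z1 = E_post[π2(θ)/π1(θ)], with no new likelihood evaluations. A toy Python sketch of that plain reweighting check on a conjugate Gaussian model (not the paper's learned harmonic mean estimator; the model and priors are illustrative assumptions):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)

# Toy model: y_i ~ N(theta, 1); compare prior1 = N(0, 2^2) vs prior2 = N(0, 1^2).
y = rng.normal(0.8, 1.0, size=50)
n, ybar = len(y), y.mean()
tau1, tau2 = 2.0, 1.0

# Exact posterior under prior1 (conjugate normal-normal), and draws from it.
s2 = 1.0 / (n + 1.0 / tau1**2)
theta = rng.normal(n * ybar * s2, np.sqrt(s2), size=100_000)

# Reweighting estimate of the evidence ratio: Z2/Z1 = E_post1[prior2 / prior1].
ratio_hat = np.mean(norm.pdf(theta, 0, tau2) / norm.pdf(theta, 0, tau1))

# Analytic truth for this conjugate model (marginal of ybar under each prior).
ratio_true = (norm.pdf(ybar, 0, np.sqrt(tau2**2 + 1 / n))
              / norm.pdf(ybar, 0, np.sqrt(tau1**2 + 1 / n)))
print(ratio_hat, ratio_true)
```

The reweighted estimate matches the analytic evidence ratio without refitting, which is the spirit of the post: reuse posterior draws, avoid new likelihood calls.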
Finally got around to removing broom::tidy(), broom::glance(), and broom::augment() from my class examples in favor of parameters::model_parameters(), performance::model_performance() and marginaleffects::predictions() because they're *so nice* for teaching! #rstats #easystats
Great, cogent post on the issues of agentic coding (e.g., our buddy Claude Code) and the spillover costs on everyone else
lucumr.pocoo.org/2026/1/18/ag...
@gmcd.bsky.social hits on the things you were complaining about
The way I understand Trumpism is similar to the rise of fascism in the 30s:
There was a wave of globalization and liberalization of trade and migration which had clear winners and losers. When the working class lost faith in the system, capitalists aligned with ethno-nationalists to preserve it.
I am thrilled that my book, "Claiming the Right to the City: Rethinking Urban Transformations in Brazil," is now published at @ubcpress.bsky.social. It explores profound divisions between the right to the city on paper and the reality in practice. See bit.ly/49v96gQ for more info, or get in touch.
New agentic AI research assistant. Claims to provide an accelerated scientific discovery pipeline via:
- 🧠 Research Co-Pilot Intelligence
- ⚙️ Autonomous Algorithm Innovation
- 📊 Intelligent Data Orchestration
- 🔬 Scientific Reproducibility Engine
- 📚 AI-Powered Deep Survey
See for yourself: novix.science
My interview in La Razón (Spain):
Social housing is indispensable throughout the city: to generate social mix, fight gentrification, provide access to services and quality of life, and promote happy proximity.
Having a roof is not the same as living in the city!
www.larazon.es/medio-ambien...
Contextual Distraction: RAG isn't a Seatbelt. A laboratory safety benchmark finds retrieval augmented generation (RAG) can make strong models worse. doi.org/10.59350/8v4... #biosecurity 🧬🖥️🧪
The beauty of academia? Total freedom to pick which 7 days you work each week.
Count-compositional data arise in many different fields, including high-throughput microbiome sequencing and palynology experiments, where a common, important goal is to understand how covariates relate to the observed compositions. Existing methods often fail to simultaneously address key challenges inherent in such data, namely: overdispersion, an excess of zeros, cross-sample heterogeneity, and nonlinear covariate effects. To address these concerns, we propose novel Bayesian models based on ensembles of regression trees. Specifically, we leverage the recently introduced zero-and-$N$-inflated multinomial distribution and assign independent nonparametric Bayesian additive regression tree (BART) priors to both the compositional and structural zero probability components of our model, to flexibly capture covariate effects. We further extend this by adding latent random effects to capture overdispersion and more general dependence structures among the categories. We develop an efficient inferential algorithm combining recent data augmentation schemes with established BART sampling routines. We evaluate our proposed models in simulation studies and illustrate their applicability with two case studies in microbiome and palaeoclimate modelling.
arXiv📈🤖
Bayesian nonparametric models for zero-inflated count-compositional data using ensembles of regression trees
By Menezes, Parnell, Murphy
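To see what zero-inflated count-compositional data look like, here is a minimal Python sketch that simulates counts with structural zeros layered on a multinomial. The generating mechanism and all parameter values are illustrative assumptions, not the paper's zero-and-N-inflated multinomial model or its BART priors:

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical sketch: count-compositional samples with structural zeros,
# mimicking microbiome-style sparsity. Each row sums to a fixed total N.
n, K, N = 300, 6, 1000               # samples, categories, total count per sample
base_p = rng.dirichlet(np.ones(K))   # baseline composition across categories
pi_zero = 0.3                        # structural-zero probability per category

Y = np.zeros((n, K), dtype=int)
for i in range(n):
    present = rng.random(K) >= pi_zero   # which categories are structurally present
    if not present.any():                # keep at least one category present
        present[rng.integers(K)] = True
    p = np.where(present, base_p, 0.0)
    Y[i] = rng.multinomial(N, p / p.sum())  # multinomial on renormalized composition

zero_frac = (Y == 0).mean()
print("rows sum to N:", (Y.sum(axis=1) == N).all(), "| zero fraction:", round(zero_frac, 2))
```

The zero fraction exceeds what a plain multinomial would produce, which is exactly the excess-zeros challenge the abstract lists alongside overdispersion and heterogeneity.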
💯
For the "data scientists" out there who are throwing things at machines (e.g., LLMs) and trying to extract "insights," I recommend (re)reading this piece:
"Data is therefore only as useful as its quality and the skills of the person wielding it."
qz.com/1664575/is-d...
Output of the wordcount plug-in, showing separate word counts
My #rstats / #python tip of the day is @andrew.heiss.phd's wordcount Quarto plug-in.
Just change one line in your YAML header and your Quarto doc will produce separate word counts for body, notes, references, etc.
See link here: github.com/andrewheiss/...
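For reference, the change looks roughly like this (a sketch from memory of the extension's README — the `wordcount-html` format name is an assumption, so check the linked repo before relying on it):

```yaml
format:
  wordcount-html: default
```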
🚨New year, new paper with the fab Anette Fasang, Ignacio Cabib, Adam Cooper &
@robgruijters.bsky.social!
We introduce our SI on 'Young Adult Life Courses in the Global South' & develop a new framework and concepts to advance the field. www.sciencedirect.com/science/arti...
@sriucl.bsky.social
arXiv:2307.09404v3: The starting point for much of multivariate analysis (MVA) is an $n\times p$ data matrix whose $n$ rows represent observations and whose $p$ columns represent variables. Some multivariate data sets, however, may be best conceptualized not as $n$ discrete $p$-variate observations, but as $p$ curves or functions defined on a common time interval. Here we introduce a framework for extending techniques of multivariate analysis to such settings. The proposed continuous-time multivariate analysis (CTMVA) framework rests on the assumption that the curves can be represented as linear combinations of basis functions such as $B$-splines, as in the Ramsay-Silverman representation of functional data; but whereas functional data analysis extends MVA to the case of observations that are curves rather than vectors -- heuristically, $n\times p$ data with $p$ infinite -- we are instead concerned with what happens when $n$ is infinite. We present continuous-time extensions of the classical MVA methods of covariance and correlation estimation, principal component analysis, Fisher's linear discriminant analysis, and $k$-means clustering. We show that CTMVA can improve on the performance of classical MVA, in particular for correlation estimation and clustering, and can be applied in some settings where classical MVA cannot, including variables observed at disparate time points. CTMVA is illustrated with a novel perspective on a well-known Canadian weather data set, and with applications to data sets involving international development, brain signals, and air quality. The proposed methods are implemented in the publicly available R package \texttt{ctmva}.
arXiv📈🤖 New Paper
Continuous-time multivariate analysis
By
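The "$n$ infinite" heuristic can be sketched directly: replace sums over observations with integrals over time. A minimal Python sketch of a continuous-time correlation matrix via plain trapezoidal quadrature on toy curves (simple numerical integration, not the ctmva package's B-spline representation; the curves are made up for illustration):

```python
import numpy as np
from scipy.integrate import trapezoid

# p = 3 toy "variables", each a curve on a common time interval [0, 1].
t = np.linspace(0.0, 1.0, 2001)
X = np.vstack([np.sin(2 * np.pi * t),
               np.cos(2 * np.pi * t),
               np.sin(2 * np.pi * t) + 0.1 * t])  # third curve = first plus a drift

def ct_corr(X, t):
    """Continuous-time correlation: integrals over t replace sums over n."""
    T = t[-1] - t[0]
    mean = trapezoid(X, t, axis=1) / T                       # continuous-time means
    Xc = X - mean[:, None]
    cov = trapezoid(Xc[:, None, :] * Xc[None, :, :], t, axis=2) / T  # p x p covariance
    sd = np.sqrt(np.diag(cov))
    return cov / np.outer(sd, sd)

R = ct_corr(X, t)
print(np.round(R, 2))
```

As expected, the sine and cosine curves are nearly uncorrelated over a full period, while the drifted sine is almost perfectly correlated with the original — the kind of structure the continuous-time correlation estimator is meant to capture.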
Mamdani has only been Mayor for a day & criminals are already fleeing New York
Temperatures have struggled to climb above freezing in places today, with frost returning already
Significantly colder Arctic air is gathering just to our N now though, behind the cold front you can see on the pressure chart and on satellite imagery
This sweeps S next 24hrs
Happy New Year, everyone! 🎉
Because my work focuses on 𝘩𝘰𝘶𝘴𝘪𝘯𝘨 𝘢𝘧𝘧𝘰𝘳𝘥𝘢𝘣𝘪𝘭𝘪𝘵𝘺 and 𝘶𝘳𝘣𝘢𝘯 𝘴𝘶𝘴𝘵𝘢𝘪𝘯𝘢𝘣𝘪𝘭𝘪𝘵𝘺, I put together a page that collects and periodically updates 𝗿𝗲𝗰𝗲𝗻𝘁 𝗲𝗺𝗽𝗶𝗿𝗶𝗰𝗮𝗹 𝘀𝘁𝘂𝗱𝗶𝗲𝘀 𝗼𝗻 𝗿𝗲𝗻𝘁 𝗰𝗼𝗻𝘁𝗿𝗼𝗹 𝗮𝗻𝗱 𝗶𝘁𝘀 𝗲𝗳𝗳𝗲𝗰𝘁𝘀 𝘄𝗼𝗿𝗹𝗱𝘄𝗶𝗱𝗲. 📚🌍
👉 sites.google.com/view/hjiang/...