fuck, I’m sorry, I can’t compete with this
We have developed and tested a spatial scan statistic for categorical, functional data (CFSS) - a data structure within which current approaches cannot identify spatial clusters. Our methodology combines an encoding scheme for categorical, functional observations with a nonparametric scan statistic. In a simulation study with three distinct scenarios, the CFSS accurately recovered the simulated spatial clusters and gave very low false positive rates, high true positive rates, and high positive predictive values. We have also used the CFSS to identify and characterize spatial clusters in French air pollution data from the winter of 2024.
arXiv📈🤖
A spatial scan statistic for categorical, functional data
By Frévent, Sarr, Dabo-Niang
I find this article cacm.acm.org/opinion/the-... a far more compelling and interesting discussion of the role of agents in the broader economy.
Written by very many smart economists at Microsoft research
Timely contribution! Focus on time-dynamic boundaries is the frontier in segregation/neighbourhood studies.
Example of the two-stage unsupervised machine learning algorithm using point data as input. The background map depicts Hamburg. The map shows neighborhoods of varying sizes and shapes, sometimes following administrative borders (black lines) and sometimes not. Three differently colored neighborhood types are displayed, each representing a different social group of residents.
Example of the two-stage unsupervised machine learning algorithm using 500x500m grid cells as input. The background map depicts Hamburg. The map shows large neighborhoods of varying sizes and shapes, sometimes following administrative borders (black lines) and sometimes not. Three differently colored neighborhood types are displayed, each representing a different social group of residents.
Looking for a measure of #neighborhoods, micro or macro #segregation?
I've got something for you!
My newly published paper in Sociological Methods & Research presents a machine-learning-based algorithm to delineate neighborhoods with grid-cell or point data:
journals.sagepub.com/doi/10.1177/...
Wouldn’t using an S7 class object be more efficient and less error-prone?
Compositional data (proportions that sum to 1) behave in ways standard models aren’t built for
I walk through why Dirichlet regression is often the right tool & what extra insight it gives, using a real eye-tracking example
#Dirichlet #r #brms #guide #eyetracking
open.substack.com/pub/mzlotean...
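The post uses brms in R; the same idea can be sketched in a few lines of Python. A minimal, illustrative sketch of Dirichlet regression fit by maximum likelihood on simulated proportions data, with a softmax mean link and a single precision parameter — all names, parameter values, and the simulated "eye-tracking-style" data here are assumptions for illustration, not the post's actual analysis:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import gammaln, softmax

rng = np.random.default_rng(0)

# Simulate hypothetical eye-tracking-style data: one covariate x and
# three proportion outcomes that sum to 1 for each observation.
n, K = 400, 3
x = rng.normal(size=n)
true_B = np.array([[0.0, 0.0], [0.5, 1.0], [-0.3, -1.0]])  # K x 2 (intercept, slope)
phi = 30.0                                                  # Dirichlet precision
mu = softmax(true_B @ np.vstack([np.ones(n), x]), axis=0)   # mean compositions (K x n)
Y = np.array([rng.dirichlet(phi * mu[:, i]) for i in range(n)])  # n x K proportions

def nll(params):
    """Negative Dirichlet log-likelihood with a softmax mean link."""
    B = params[:K * 2].reshape(K, 2)
    a = np.exp(params[-1]) * softmax(B @ np.vstack([np.ones(n), x]), axis=0).T
    return -np.sum(gammaln(a.sum(axis=1)) - gammaln(a).sum(axis=1)
                   + ((a - 1) * np.log(Y)).sum(axis=1))

fit = minimize(nll, x0=np.zeros(K * 2 + 1), method="L-BFGS-B")
phi_hat = np.exp(fit.x[-1])
print("estimated precision:", round(phi_hat, 1))
```

Note the softmax leaves the coefficient matrix identified only up to a per-column shift; the fitted mean compositions and the precision are still identified, which is what brms-style Dirichlet models report as well.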
New blog post about the age-period-cohort identification problem!
In which, for the first time ever, I ask "What's the mechanism?" and also suggest that sometimes you may actually *not* be interested in causal inference.
www.the100.ci/2026/02/13/o...
Results for CEM and neighborhood fixed effects regressions. We find robust and significant positive effects (blue coefficients) of childhood exposure to different ethnicities on the likelihood of interethnic marriage.
Left: georeferenced households in 1880. Right: Ethnic organic neighborhoods in Manhattan in 1880. Six ethnic groups are prevalent: first/second generation Americans, Asians, Germans, Irish and Others (residual category). We use these neighborhoods to account for segregation by applying organic neighborhood fixed effects.
New preprint out with @wendering.bsky.social & Nan Zhang!
We show that childhood exposure to ethnic outgroups increases the prob. of #interethnic marriage decades later, using historical linked US census data (1880–1910) and next-door neighbor comparisons 🏠🌃
Read more here:
shorturl.at/U0IHR
The joke of the week in my causal inference methods course this week 😂
We ran a massive, uncontrolled social experiment on kids with social media. Outcomes: anxiety, comparison, fractured attention. Jon Haidt called the direction of travel. We’re doing it again with AI, outsourcing thinking and social skills, only to act surprised later. www.nytimes.com/2026/01/30/o...
Learn Linux, that's it...
Bayesian model comparison implements Occam's razor through its sensitivity to the prior. However, prior-dependence makes it important to assess the influence of plausible alternative priors. Such prior sensitivity analyses for the Bayesian evidence are expensive, either requiring repeated, costly model re-fits or specialised sampling schemes. By exploiting the learned harmonic mean estimator (LHME) for evidence calculation we decouple sampling and evidence calculation, allowing resampled posterior draws to be used directly to calculate the evidence without further likelihood evaluations. This provides an alternative approach to prior sensitivity analysis for Bayesian model comparison that dramatically alleviates the computational cost and is agnostic to the method used to generate posterior samples. We validate our method on toy problems and a cosmological case study, reproducing estimates obtained by full Markov chain Monte Carlo (MCMC) sampling and nested sampling re-fits. For the cosmological example considered our approach achieves up to $6000\times$ lower computational cost.
arXiv📈🤖
Efficient prior sensitivity analysis for Bayesian model comparison
By Hu, McEwen
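The decoupling idea has a simple special case worth writing down: given posterior samples drawn under one prior, the evidence ratio against an alternative prior is the posterior expectation of the prior ratio, Z2/Z1 = E_post[π2(θ)/π1(θ)], with no new likelihood evaluations. A toy Python sketch of that plain reweighting check on a conjugate Gaussian model (not the paper's learned harmonic mean estimator; the model and priors are illustrative assumptions):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)

# Toy model: y_i ~ N(theta, 1); compare prior1 = N(0, 2^2) vs prior2 = N(0, 1^2).
y = rng.normal(0.8, 1.0, size=50)
n, ybar = len(y), y.mean()
tau1, tau2 = 2.0, 1.0

# Exact posterior under prior1 (conjugate normal-normal), and draws from it.
s2 = 1.0 / (n + 1.0 / tau1**2)
theta = rng.normal(n * ybar * s2, np.sqrt(s2), size=100_000)

# Reweighting estimate of the evidence ratio: Z2/Z1 = E_post1[prior2 / prior1].
ratio_hat = np.mean(norm.pdf(theta, 0, tau2) / norm.pdf(theta, 0, tau1))

# Analytic truth for this conjugate model (marginal of ybar under each prior).
ratio_true = (norm.pdf(ybar, 0, np.sqrt(tau2**2 + 1 / n))
              / norm.pdf(ybar, 0, np.sqrt(tau1**2 + 1 / n)))
print(ratio_hat, ratio_true)
```

The reweighted estimate matches the analytic evidence ratio without refitting, which is the spirit of the post: reuse posterior draws, avoid new likelihood calls.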
Finally got around to removing broom::tidy(), broom::glance(), and broom::augment() from my class examples in favor of parameters::model_parameters(), performance::model_performance() and marginaleffects::predictions() because they're *so nice* for teaching! #rstats #easystats
Great, cogent post on the issues of agentic coding (e.g., our buddy Claude Code) and the spillover costs on everyone else
lucumr.pocoo.org/2026/1/18/ag...
@gmcd.bsky.social hits on the things you were complaining about
The way I understand Trumpism is similar to the rise of fascism in the 30s:
There was a wave of globalization and liberalization of trade and migration which had clear winners and losers. When the working class lost faith in the system, capitalists aligned with ethno-nationalists to preserve it.
I am thrilled that my book, "Claiming the Right to the City: Rethinking Urban Transformations in Brazil," is now published at @ubcpress.bsky.social. It explores profound divisions between the right to the city on paper and the reality in practice. See bit.ly/49v96gQ for more info, or get in touch.
New agentic AI research assistant. Claims to provide an accelerated scientific discovery pipeline via:
- 🧠 Research Co-Pilot Intelligence
- ⚙️ Autonomous Algorithm Innovation
- 📊 Intelligent Data Orchestration
- 🔬 Scientific Reproducibility Engine
- 📚 AI-Powered Deep Survey
See for yourself: novix.science
My interview in La Razón (Spain):
Social housing is indispensable throughout the city: to generate social mix, fight gentrification, provide access to services and quality of life, and promote happy proximity.
Having a roof is not the same as living in the city!
www.larazon.es/medio-ambien...
Contextual Distraction: RAG isn't a Seatbelt. A laboratory safety benchmark finds retrieval augmented generation (RAG) can make strong models worse. doi.org/10.59350/8v4... #biosecurity 🧬🖥️🧪
The beauty of academia? Total freedom to pick which 7 days you work each week.
Count-compositional data arise in many different fields, including high-throughput microbiome sequencing and palynology experiments, where a common, important goal is to understand how covariates relate to the observed compositions. Existing methods often fail to simultaneously address key challenges inherent in such data, namely: overdispersion, an excess of zeros, cross-sample heterogeneity, and nonlinear covariate effects. To address these concerns, we propose novel Bayesian models based on ensembles of regression trees. Specifically, we leverage the recently introduced zero-and-$N$-inflated multinomial distribution and assign independent nonparametric Bayesian additive regression tree (BART) priors to both the compositional and structural zero probability components of our model, to flexibly capture covariate effects. We further extend this by adding latent random effects to capture overdispersion and more general dependence structures among the categories. We develop an efficient inferential algorithm combining recent data augmentation schemes with established BART sampling routines. We evaluate our proposed models in simulation studies and illustrate their applicability with two case studies in microbiome and palaeoclimate modelling.
arXiv📈🤖
Bayesian nonparametric models for zero-inflated count-compositional data using ensembles of regression trees
By Menezes, Parnell, Murphy
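To see what zero-inflated count-compositional data look like, here is a minimal Python sketch that simulates counts with structural zeros layered on a multinomial. The generating mechanism and all parameter values are illustrative assumptions, not the paper's zero-and-N-inflated multinomial model or its BART priors:

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical sketch: count-compositional samples with structural zeros,
# mimicking microbiome-style sparsity. Each row sums to a fixed total N.
n, K, N = 300, 6, 1000               # samples, categories, total count per sample
base_p = rng.dirichlet(np.ones(K))   # baseline composition across categories
pi_zero = 0.3                        # structural-zero probability per category

Y = np.zeros((n, K), dtype=int)
for i in range(n):
    present = rng.random(K) >= pi_zero   # which categories are structurally present
    if not present.any():                # keep at least one category present
        present[rng.integers(K)] = True
    p = np.where(present, base_p, 0.0)
    Y[i] = rng.multinomial(N, p / p.sum())  # multinomial on renormalized composition

zero_frac = (Y == 0).mean()
print("rows sum to N:", (Y.sum(axis=1) == N).all(), "| zero fraction:", round(zero_frac, 2))
```

The zero fraction exceeds what a plain multinomial would produce, which is exactly the excess-zeros challenge the abstract lists alongside overdispersion and heterogeneity.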
💯
For the "data scientists" out there who are throwing things at machines (e.g., LLMs) and trying to extract "insights," I recommend (re)reading this piece:
"Data is therefore only as useful as its quality and the skills of the person wielding it."
qz.com/1664575/is-d...
Output of the wordcount plug-in, showing separate word counts
My #rstats / #python tip of the day is @andrew.heiss.phd's wordcount Quarto plug-in.
Just change one line in your YAML header and your Quarto doc will produce separate word counts for body, notes, references, etc.
See link here: github.com/andrewheiss/...
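For reference, the change looks roughly like this (a sketch from memory of the extension's README — the `wordcount-html` format name is an assumption, so check the linked repo before relying on it):

```yaml
format:
  wordcount-html: default
```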
🚨New year, new paper with the fab Anette Fasang, Ignacio Cabib, Adam Cooper &
@robgruijters.bsky.social!
We introduce our SI on 'Young Adult Life Courses in the Global South' & develop a new framework and concepts to advance the field. www.sciencedirect.com/science/arti...
@sriucl.bsky.social
arXiv:2307.09404v3: The starting point for much of multivariate analysis (MVA) is an $n\times p$ data matrix whose $n$ rows represent observations and whose $p$ columns represent variables. Some multivariate data sets, however, may be best conceptualized not as $n$ discrete $p$-variate observations, but as $p$ curves or functions defined on a common time interval. Here we introduce a framework for extending techniques of multivariate analysis to such settings. The proposed continuous-time multivariate analysis (CTMVA) framework rests on the assumption that the curves can be represented as linear combinations of basis functions such as $B$-splines, as in the Ramsay-Silverman representation of functional data; but whereas functional data analysis extends MVA to the case of observations that are curves rather than vectors -- heuristically, $n\times p$ data with $p$ infinite -- we are instead concerned with what happens when $n$ is infinite. We present continuous-time extensions of the classical MVA methods of covariance and correlation estimation, principal component analysis, Fisher's linear discriminant analysis, and $k$-means clustering. We show that CTMVA can improve on the performance of classical MVA, in particular for correlation estimation and clustering, and can be applied in some settings where classical MVA cannot, including variables observed at disparate time points. CTMVA is illustrated with a novel perspective on a well-known Canadian weather data set, and with applications to data sets involving international development, brain signals, and air quality. The proposed methods are implemented in the publicly available R package \texttt{ctmva}.
arXiv📈🤖 New Paper
Continuous-time multivariate analysis
By
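The "$n$ infinite" heuristic can be sketched directly: replace sums over observations with integrals over time. A minimal Python sketch of a continuous-time correlation matrix via plain trapezoidal quadrature on toy curves (simple numerical integration, not the ctmva package's B-spline representation; the curves are made up for illustration):

```python
import numpy as np
from scipy.integrate import trapezoid

# p = 3 toy "variables", each a curve on a common time interval [0, 1].
t = np.linspace(0.0, 1.0, 2001)
X = np.vstack([np.sin(2 * np.pi * t),
               np.cos(2 * np.pi * t),
               np.sin(2 * np.pi * t) + 0.1 * t])  # third curve = first plus a drift

def ct_corr(X, t):
    """Continuous-time correlation: integrals over t replace sums over n."""
    T = t[-1] - t[0]
    mean = trapezoid(X, t, axis=1) / T                       # continuous-time means
    Xc = X - mean[:, None]
    cov = trapezoid(Xc[:, None, :] * Xc[None, :, :], t, axis=2) / T  # p x p covariance
    sd = np.sqrt(np.diag(cov))
    return cov / np.outer(sd, sd)

R = ct_corr(X, t)
print(np.round(R, 2))
```

As expected, the sine and cosine curves are nearly uncorrelated over a full period, while the drifted sine is almost perfectly correlated with the original — the kind of structure the continuous-time correlation estimator is meant to capture.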
Mamdani has only been Mayor for a day & criminals are already fleeing New York
Temperatures have struggled to climb above freezing in places today, with frost returning already
Significantly colder Arctic air is gathering just to our N now though, behind the cold front you can see on the pressure chart and on satellite imagery
This sweeps S next 24hrs
Happy New Year, everyone! 🎉
Because my work focuses on 𝘩𝘰𝘶𝘴𝘪𝘯𝘨 𝘢𝘧𝘧𝘰𝘳𝘥𝘢𝘣𝘪𝘭𝘪𝘵𝘺 and 𝘶𝘳𝘣𝘢𝘯 𝘴𝘶𝘴𝘵𝘢𝘪𝘯𝘢𝘣𝘪𝘭𝘪𝘵𝘺, I put together a page that collects and periodically updates 𝗿𝗲𝗰𝗲𝗻𝘁 𝗲𝗺𝗽𝗶𝗿𝗶𝗰𝗮𝗹 𝘀𝘁𝘂𝗱𝗶𝗲𝘀 𝗼𝗻 𝗿𝗲𝗻𝘁 𝗰𝗼𝗻𝘁𝗿𝗼𝗹 𝗮𝗻𝗱 𝗶𝘁𝘀 𝗲𝗳𝗳𝗲𝗰𝘁𝘀 𝘄𝗼𝗿𝗹𝗱𝘄𝗶𝗱𝗲. 📚🌍
👉 sites.google.com/view/hjiang/...