Justin Silverman (@inschool4life)

Uncertainty Modeling Outperforms Machine Learning for Microbiome Data Analysis Microbiome sequencing measures relative rather than absolute abundances, providing no direct information about total microbial load. Normalization methods attempt to compensate, but rely on strong, of...

@cellpress.bsky.social
We submitted a presubmission inquiry on 9/12 and followed up on 9/24, but have not received a response. Is this typical? Could you please help us? We are trying to confirm whether we should submit as a Matters Arising or as a research article.
www.biorxiv.org/content/10.1...

30.09.2025 13:03 👍 4 🔁 0 💬 0 📌 1

Our analysis is the largest to date. We used our newly created MUTT database, which consists of over 15,000 samples from over 30 studies, each with paired sequence counts and microbial load measurements.

Core takeaway: it's important to accurately model uncertainty and error.

@ggloor.bsky.social

17.09.2025 17:41 👍 0 🔁 1 💬 0 📌 0

Uncertainty Modeling Outperforms Machine Learning for Microbiome Data Analysis Microbiome sequencing measures relative rather than absolute abundances, providing no direct information about total microbial load. Normalization methods attempt to compensate, but rely on strong, of...

New Paper!

Machine learning models that attempt to predict microbial load collapse outside of their training context, with R² < 0!

In contrast, our Bayesian Partially Identified Models embrace uncertainty in unmeasured microbial load and consistently outperform.

www.biorxiv.org/content/10.1...

17.09.2025 17:41 👍 7 🔁 3 💬 1 📌 0
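
A note on what R² < 0 means in the post above: R² is 1 minus the ratio of the model's squared error to the squared error of simply predicting the mean of the observed values, so it goes negative whenever the model does worse than that constant baseline. The sketch below illustrates this with made-up numbers; the loads and predictions are hypothetical and are not taken from the paper or the MUTT database.

```python
import numpy as np

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)          # model error
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # error of predicting the mean
    return 1.0 - ss_res / ss_tot

# Hypothetical log10 microbial loads observed in a new study (made-up values).
observed = np.array([9.0, 9.5, 10.0, 10.5, 11.0])

# Hypothetical predictions from a model trained in a different context:
# the ordering is right, but everything is shifted by about 2 log units.
predicted = np.array([11.2, 11.4, 11.9, 12.6, 13.1])

r2_model = r_squared(observed, predicted)

# Baseline: always predict the observed mean. By construction this gives R^2 = 0.
r2_mean = r_squared(observed, np.full_like(observed, observed.mean()))

print(f"R^2 of shifted model: {r2_model:.2f}")
print(f"R^2 of mean baseline: {r2_mean:.2f}")
```

The shifted model ranks the samples correctly yet still scores below zero, because R² penalizes the systematic offset.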

Excited to summarize our most recent paper, "Explicit Scale Simulation for analysis of RNA-sequencing count data with ALDEx2" on controlling the false discovery rate (FDR) when analyzing high throughput sequencing (HTS) data. This has been an open problem since the dawn of HTS.

21.08.2025 20:59 👍 6 🔁 3 💬 1 📌 0

PCR Bias Impacts Microbiome Ecological Analyses Polymerase Chain Reaction (PCR) is a critical step in amplicon-based microbial community profiling, allowing the selective amplification of marker genes such as 16S rRNA from environmental or host-ass...

New preprint!

PCR bias doesn’t just distort relative abundances—it reshapes microbiome ecological analyses.

We show that commonly used diversity metrics (e.g., UniFrac or Shannon) are not robust to amplification bias, while perturbation-invariant alternatives are.

www.biorxiv.org/content/10.1...

01.08.2025 13:55 👍 2 🔁 0 💬 0 📌 0
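
To make "perturbation-invariant" concrete, here is a minimal sketch under two assumptions that are not stated in the post: PCR bias is modeled as a fixed multiplicative efficiency per taxon, and the Aitchison distance is used as one well-known example of a perturbation-invariant metric (the preprint's own alternatives may differ). All compositions and efficiencies are hypothetical.

```python
import numpy as np

def closure(x):
    """Rescale a positive vector so it sums to 1."""
    x = np.asarray(x, dtype=float)
    return x / x.sum()

def shannon(p):
    """Shannon diversity (natural log) of a composition."""
    p = closure(p)
    return -np.sum(p * np.log(p))

def clr(p):
    """Centered log-ratio transform."""
    logp = np.log(closure(p))
    return logp - logp.mean()

def aitchison_distance(p, q):
    """Euclidean distance between CLR-transformed compositions."""
    return np.linalg.norm(clr(p) - clr(q))

# Hypothetical true compositions of two samples (four taxa).
sample_a = closure([0.50, 0.30, 0.15, 0.05])
sample_b = closure([0.10, 0.20, 0.30, 0.40])

# Hypothetical per-taxon amplification efficiencies, shared by both samples.
bias = np.array([1.0, 2.0, 0.5, 4.0])

biased_a = closure(sample_a * bias)
biased_b = closure(sample_b * bias)

# Shannon diversity of a sample changes once bias is applied.
print("Shannon, true vs. biased:", shannon(sample_a), shannon(biased_a))

# The Aitchison distance between the two samples is unchanged, because the
# same CLR-transformed bias vector is added to both samples and cancels.
print("Aitchison, true vs. biased:",
      aitchison_distance(sample_a, sample_b),
      aitchison_distance(biased_a, biased_b))
```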

Thanks! We think so. I think this will help enhance the cost-effectiveness and efficiency of biomarker discovery. Our methods greatly enhance the positive predictive value of analyses, reducing false signals that cost money to validate and detecting true signals that would otherwise be missed.

01.08.2025 13:53 👍 1 🔁 0 💬 0 📌 0

Replacing normalizations with interval assumptions enhances differential expression and differential abundance analyses - BMC Bioinformatics Background Methods for differential expression and differential abundance analysis often rely on normalization to address sample-to-sample variation in sequencing depth. However, normalizations imply ...

New Paper:

We relax normalizations to produce statistical methods for bioinformatics that are much more robust and powerful. We see FDR drop from 45% to 5% with increases in power!

This adds to our ongoing work on Scale Reliant Inference.

link.springer.com/article/10.1...

01.07.2025 16:46 👍 3 🔁 0 💬 1 📌 0
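
A rough illustration of the interval idea, not the paper's actual test: since absolute abundance equals proportion times total scale, the log fold change (LFC) in absolute abundance equals the LFC in proportion plus the LFC in scale. If the scale LFC is only assumed to lie in an interval, each taxon's absolute LFC lies in a shifted interval. The function name, the numbers, and the decision rule below are hypothetical, and sampling noise is ignored.

```python
import numpy as np

def absolute_lfc_bounds(rel_lfc, scale_lo, scale_hi):
    """Bounds on absolute LFC given relative LFC and an interval for the scale LFC.

    Uses the identity: LFC(absolute) = LFC(proportion) + LFC(scale).
    """
    rel_lfc = np.asarray(rel_lfc, dtype=float)
    return rel_lfc + scale_lo, rel_lfc + scale_hi

# Hypothetical log2 fold changes in proportions for four taxa.
rel_lfc = np.array([-2.0, -0.5, 0.3, 1.5])

# Hypothetical interval assumption: total load fell by at most 2-fold
# and did not increase (log2 scale change between -1 and 0).
lower, upper = absolute_lfc_bounds(rel_lfc, scale_lo=-1.0, scale_hi=0.0)

# Call a change only when the whole interval lies on one side of zero.
decreased = upper < 0
increased = lower > 0

for i in range(len(rel_lfc)):
    print(f"taxon {i}: [{lower[i]:+.1f}, {upper[i]:+.1f}] "
          f"decreased={decreased[i]} increased={increased[i]}")
```

The third taxon is the instructive case: its proportion went up, but under this assumption its absolute abundance may have gone down, so no call is made.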

Our paper explaining why Gihawi et al. failed to prove an error in the normalization used by the 2020 cancer #microbiome analysis is now out as a Matters Arising in @asm.org #mSystems (w/ @george-austin.bsky.social) 🖥️ 🧬

Thread explaining the key points below.

journals.asm.org/doi/10.1128/...

02.05.2025 13:59 👍 8 🔁 3 💬 0 📌 0

@ggloor.bsky.social

22.05.2025 16:44 👍 0 🔁 0 💬 0 📌 0

Scale Reliant Inference Many scientific fields, including human gut microbiome science, collect multivariate count data where the sum of the counts is unrelated to the scale of the underlying system being measured (e.g., tot...

Scale models are not just heuristics but have a rich theoretical foundation based on Bayesian Partially Identified Models. That theory is presented here:

arxiv.org/abs/2201.03616

22.05.2025 16:43 👍 2 🔁 0 💬 1 📌 0

GitHub - jsilve24/ALDEx3 Contribute to jsilve24/ALDEx3 development by creating an account on GitHub.

We are also developing a new ALDEx3 library that is about 1000 times faster than ALDEx2, with a streamlined user interface (although it's still in beta, I am using it regularly).
github.com/jsilve24/ALD...

22.05.2025 16:43 👍 1 🔁 0 💬 1 📌 0

GitHub - jsilve24/ALDEx3 Contribute to jsilve24/ALDEx3 development by creating an account on GitHub.

To facilitate adoption, we've updated the popular ALDEx2 software package on Bioconductor to support scale model analysis.

22.05.2025 16:43 👍 1 🔁 0 💬 1 📌 0

In real data analyses and simulation studies, we find our methods often lead to dramatic decreases in false positives (FDR can drop from >75% to a nominal 5%) while simultaneously maintaining or improving statistical power.

22.05.2025 16:43 👍 1 🔁 0 💬 1 📌 0

We present scale models, which extend normalization by modeling potential errors in these assumptions (reducing false positives), or by allowing researchers to make more biologically plausible assumptions (reducing false negatives).

22.05.2025 16:43 👍 1 🔁 0 💬 1 📌 0
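
One way to picture "modeling potential errors in these assumptions" is a small Monte Carlo sketch. It treats the scale change a normalization would imply as the center of a distribution rather than a fixed value, and propagates that uncertainty into the log fold change. This is an illustration only, not the ALDEx2 implementation; the function name, the normal distribution, and every number are assumptions.

```python
import numpy as np

def abs_lfc_interval(rel_lfc, scale_mean, scale_sd, n_draws=10_000, seed=0):
    """95% interval for absolute LFC when the scale LFC is uncertain.

    rel_lfc    : log2 fold change in proportion (hypothetical input)
    scale_mean : scale LFC implied by a chosen normalization (hypothetical)
    scale_sd   : how much we distrust that implied value (0 = full trust)
    """
    rng = np.random.default_rng(seed)
    scale_draws = rng.normal(scale_mean, scale_sd, size=n_draws)
    abs_draws = rel_lfc + scale_draws  # LFC(absolute) = LFC(proportion) + LFC(scale)
    lo, hi = np.quantile(abs_draws, [0.025, 0.975])
    return float(lo), float(hi)

# With no scale uncertainty this reduces to the normalization's point answer.
narrow = abs_lfc_interval(rel_lfc=0.6, scale_mean=0.0, scale_sd=0.0)

# Allowing for error in the scale assumption widens the interval, and a
# modest relative change may no longer be distinguishable from zero.
wide = abs_lfc_interval(rel_lfc=0.6, scale_mean=0.0, scale_sd=0.5)

print("no scale uncertainty:  ", narrow)
print("with scale uncertainty:", wide)
```

Shifting `scale_mean` instead of widening `scale_sd` corresponds to the second use described in the post: encoding a more biologically plausible assumption about scale.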

Traditional normalization methods often make implicit assumptions about the biological system's scale, such as microbial load or total RNA content. These assumptions can lead to false positives and negatives.

22.05.2025 16:43 👍 2 🔁 0 💬 1 📌 0
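
A toy example of why an assumption about scale is unavoidable: two different biological scenarios can produce exactly the same relative abundances, so sequencing proportions alone cannot tell them apart. The abundances below are hypothetical.

```python
import numpy as np

def proportions(x):
    """Relative abundances: what sequencing can inform us about."""
    x = np.asarray(x, dtype=float)
    return x / x.sum()

# Hypothetical absolute abundances (cells per gram) for three taxa.
before = np.array([4e9, 4e9, 2e9])

# Scenario 1: taxon 0 doubles, the others are unchanged (total load rises).
after_1 = np.array([8e9, 4e9, 2e9])

# Scenario 2: taxon 0 is unchanged, the others halve (total load falls).
after_2 = np.array([4e9, 2e9, 1e9])

# Both scenarios give identical proportions after treatment.
same_proportions = np.allclose(proportions(after_1), proportions(after_2))

# Yet the true log2 fold change of taxon 0 differs between them.
true_lfc_1 = np.log2(after_1[0] / before[0])
true_lfc_2 = np.log2(after_2[0] / before[0])

# Comparing proportions directly amounts to assuming total load is unchanged,
# which gives a third answer that matches neither scenario.
rel_lfc = np.log2(proportions(after_1)[0] / proportions(before)[0])

print("identical proportions:", same_proportions)
print("true LFC, scenario 1:", true_lfc_1)
print("true LFC, scenario 2:", true_lfc_2)
print("LFC of proportions:  ", rel_lfc)
```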

Incorporating scale uncertainty in microbiome and gene expression analysis as an extension of normalization - Genome Biology Statistical normalizations are used in differential analyses to address sample-to-sample variation in sequencing depth. Yet normalizations make strong, implicit assumptions about the scale of biologic...

New paper in Genome Biology!

genomebiology.biomedcentral.com/articles/10....

We introduce scale models, a generalization of normalizations that explicitly accounts for uncertainty in biological system scale (e.g., microbial load).

22.05.2025 16:43 👍 8 🔁 3 💬 2 📌 0

Microsoft Forms

🚨PA colleagues:

"Senator Fetterman wants to hear from you about how the federal funding freeze is affecting Pennsylvania."

"If your project has been impacted, please fill out our constituent impact form:" forms.office.com/g/mFv2JAPxpC

Get out your Other Support and share that info!

22.02.2025 20:12 👍 122 🔁 131 💬 4 📌 6

NIH funding freeze stalls applications on $1.5 billion in medical research funds The National Institutes of Health had to stop considering new grant applications, delaying funding for research into diseases ranging from heart disease and cancer to Alzheimer's and allergies.

22.02.2025 18:39 👍 3569 🔁 1480 💬 160 📌 131

Our whole point is that there is information missing from the data -- overcoming that requires additional thought and careful consideration of what assumptions are biologically plausible in a particular study. E.g., when studying antibiotics, microbial load likely decreases post-treatment.

19.02.2025 15:56 👍 1 🔁 0 💬 0 📌 0

An important point if you look to benchmark our methods: normalizations are kinda "point and click", with no additional thought needed by the user. We can generalize normalizations, and that helps reduce false positives. But the real advances -- when we see the massive FN/FP decreases -- come when care is taken.

19.02.2025 15:56 👍 0 🔁 0 💬 1 📌 0

Love it! Will definitely check that out, as it would be super helpful for us. And yes, our methods are not yet common (though they are available in ALDEx2 now!). Reviewers have been resistant, as they love normalizations and our methods seem foreign.

19.02.2025 15:56 👍 2 🔁 0 💬 1 📌 0

NeurIPS Efficient Bayesian Additive Regression Models For Microbiome and Gene Expression Studies NeurIPS 2024

Non-linear additive regression (using scalable Bayesian Multinomial Logistic Normal models) is now available in fido (on CRAN)!
neurips.cc/virtual/2024...

Also includes extremely fast marginal likelihood estimation for hyperparameter tuning.
cran.r-project.org/web/packages...

19.02.2025 14:52 👍 3 🔁 0 💬 0 📌 0

This builds on our prior work
jmlr.org/papers/v23/1...
where we introduced the CU Sampler for Bayesian MLN models. This is even 1-2 orders of magnitude faster than those methods while still being extremely accurate.

19.02.2025 14:46 👍 0 🔁 0 💬 0 📌 0

Scalable Inference for Bayesian Multinomial Logistic-Normal Dynamic Linear Models Many scientific fields collect longitudinal count compositional data. Each observation is a multivariate count vector, where the total counts are arbitrary, and the information lies in the relative fr...

A new paper was recently accepted to AISTATS:

arxiv.org/abs/2410.05548

Flexible Multinomial Logistic-Normal time series models (state space models) that scale to extremely large datasets. Inference is 5-6 orders of magnitude faster than alternatives. An R package will be released soon.

19.02.2025 14:46 👍 2 🔁 0 💬 1 📌 0

Scalable Inference for Bayesian Multinomial Logistic-Normal Dynamic Linear Models Many scientific fields collect longitudinal count compositional data. Each observation is a multivariate count vector, where the total counts are arbitrary, and the information lies in the relative fr...

Here is a better link to the new paper:

arxiv.org/abs/2410.05548

19.02.2025 14:41 👍 0 🔁 0 💬 0 📌 0

In short, we have already made public a fair number of benchmarking studies against real data. Your manuscript just didn't cite any of them.

19.02.2025 14:29 👍 0 🔁 0 💬 1 📌 0

New results soon to be released:

We have developed specialized PIMs that account for uncertainty in sparsity assumptions. 6 datasets with ground truth, comparing against 8 methods. When our assumptions hold (first 4 datasets) our methods do well. When violated (last two) they fail gracefully.

19.02.2025 14:27 👍 1 🔁 0 💬 1 📌 0

Compositional data analysis enables statistical rigor in comparative glycomics - PubMed Comparative glycomics data are compositional data, where measured glycans are parts of a whole, indicated by relative abundances. Applying traditional statistical analyses to these data often results ...

www.nature.com/articles/s41...

Here is a completely independent group validating our methods for glycomics. Again, main conclusion -- the problem lies in normalizations and scale uncertainty is critical.

19.02.2025 14:23 👍 0 🔁 0 💬 1 📌 0

Vaginal metatranscriptome meta-analysis reveals functional BV subgroups and novel colonisation strategies - PubMed Our findings highlight a need to focus on functional rather than taxonomic differences when considering the role of microbiomes in disease and identify pathways for further research as potential BV tr...

pubmed.ncbi.nlm.nih.gov/39709449/

Another real data validation. @ggloor.bsky.social said he sat on this data for almost 10 years because existing methods were giving nonsensical answers. Only when uncertainty in scale was considered did things start to make sense.

19.02.2025 14:23 👍 2 🔁 2 💬 1 📌 0

Explicit Scale Simulation for analysis of RNA-sequencing with ALDEx2 In high-throughput sequencing (HTS) studies, sample-to-sample variation in sequencing depth is driven by technical factors, and not by variation in the scale (e.g., total size, microbial load, or tota...

Here, @ggloor.bsky.social found that our methods drastically improve metatranscriptomic analyses as well. Again, these are real data analyses, but with less of a focus on benchmarking.

www.biorxiv.org/content/10.1...

19.02.2025 14:23 👍 6 🔁 2 💬 1 📌 0
