it is that time of the year! #aoc2025 adventofcode.com/2025
In October 2024, I twote that "something is deeply wrong" with what we now call virtual cell models. A lot has happened since then. How am I updating? New blog post: ekernf01.github.io/virtual-cell...
Beeswarm plot of the prediction error of different methods on double perturbations, showing that all methods (scGPT, scFoundation, UCE, scBERT, Geneformer, GEARS, and CPA) perform worse than the additive baseline.
Line plot of the true positive rate against the false discovery proportion, showing that none of the methods is better at finding non-additive interactions than simply predicting no change.
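The figure descriptions above compare every model against an additive baseline. Below is a minimal sketch of that baseline, assuming each perturbation effect is stored as per-gene log fold changes versus control in a plain dict. The function names and toy gene names are hypothetical, and this is not the benchmark's own code. The `interaction` term is the non-additive part that the true positive rate plot asks models to find.

```python
from typing import Dict

# Assumed representation: per-gene log fold change versus unperturbed control.
Profile = Dict[str, float]


def additive_prediction(single_a: Profile, single_b: Profile) -> Profile:
    """Predict the A+B double perturbation as the sum of the two single effects."""
    genes = set(single_a) | set(single_b)
    return {g: single_a.get(g, 0.0) + single_b.get(g, 0.0) for g in genes}


def interaction(observed_double: Profile, single_a: Profile, single_b: Profile) -> Profile:
    """Non-additive part: observed double effect minus the additive expectation."""
    expected = additive_prediction(single_a, single_b)
    genes = set(expected) | set(observed_double)
    return {g: observed_double.get(g, 0.0) - expected.get(g, 0.0) for g in genes}


def l2_error(pred: Profile, truth: Profile) -> float:
    """Euclidean distance between two profiles; missing genes count as 0."""
    genes = set(pred) | set(truth)
    return sum((pred.get(g, 0.0) - truth.get(g, 0.0)) ** 2 for g in genes) ** 0.5


if __name__ == "__main__":
    a = {"GENE1": 1.0, "GENE2": -0.5}
    b = {"GENE1": 0.5, "GENE3": 2.0}
    observed = {"GENE1": 1.5, "GENE2": -0.5, "GENE3": 3.0}
    print(sorted(additive_prediction(a, b).items()))
    print(sorted(interaction(observed, a, b).items()))
    print(l2_error(additive_prediction(a, b), observed))
```

In the toy data only GENE3 deviates from the sum of the single effects, so it is the only gene with a non-zero interaction.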
Our paper benchmarking foundation models for perturbation effect prediction is finally published 🥳
www.nature.com/articles/s41...
We show that none of the available* models outperform simple linear baselines. Since the original preprint, we added more methods, metrics, and prettier figures!
🧵
Thanks to all the coauthors, Eszter Varga, Daniel Dimitrov,
@juliosaezrod.bsky.social, László Hunyady and especially to first author Szilvia Barsi who led this project from start to finish! 7/7
RIDDEN is available as an easy-to-use command-line tool at github.com/basvaat/RIDD.... 6/n
Using an IO dataset of #nivolumab-treated renal cancer patients, we found that:
- #PD1 or #PDL1 expression didn't predict survival
- PD1 activity (from RIDDEN) did associate with survival.
This shows that receptor activity could be a more effective biomarker than expression. 5/n
RIDDEN’s receptor-specific signatures align with regulons of transcription factors downstream of receptors. 4/n
We benchmarked RIDDEN on various receptor perturbation datasets, including the recent in vivo Immune Dictionary data www.nature.com/articles/s41..., and found good predictive performance. 3/n
We used prior knowledge from omnipathdb.org & perturbation data from clue.io to build models for receptor activity inference. Instead of focusing on receptor expression, we predicted activity from the expression of receptor-regulated genes. 2/n
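The core idea of this post, reading a receptor's activity from the genes it regulates rather than from its own transcript, can be illustrated with a simple weighted-sum score. This is a hypothetical sketch, not RIDDEN's model: RIDDEN builds its signatures from clue.io perturbation data and OmniPath prior knowledge, whereas here the signature weights, gene names, and helper functions are made up for illustration.

```python
from statistics import mean, pstdev
from typing import Dict, List

# Hypothetical inputs: gene -> expression values across samples (same sample order),
# and a receptor signature mapping target genes -> weights (sign = direction of regulation).
Expression = Dict[str, List[float]]
Signature = Dict[str, float]


def zscore_genes(expr: Expression) -> Expression:
    """Standardize each gene across samples; genes with zero variance become all zeros."""
    out: Expression = {}
    for gene, values in expr.items():
        mu, sd = mean(values), pstdev(values)
        out[gene] = [(v - mu) / sd if sd > 0 else 0.0 for v in values]
    return out


def receptor_activity(expr: Expression, signature: Signature) -> List[float]:
    """Score each sample as a weighted sum of its z-scored target genes.

    The receptor's own expression is never read unless it is listed in the signature.
    Assumes expr is non-empty and every gene has the same number of samples.
    """
    z = zscore_genes(expr)
    n_samples = len(next(iter(expr.values())))
    shared = [g for g in signature if g in z]  # skip signature genes absent from the data
    return [sum(signature[g] * z[g][i] for g in shared) for i in range(n_samples)]


if __name__ == "__main__":
    expr = {
        "RECEPTOR": [5.0, 5.0],     # identical receptor expression in both samples
        "TARGET_UP": [1.0, 3.0],    # induced when the receptor is active
        "TARGET_DOWN": [4.0, 2.0],  # repressed when the receptor is active
    }
    signature = {"TARGET_UP": 1.0, "TARGET_DOWN": -1.0}
    print(receptor_activity(expr, signature))
```

In the toy example the receptor transcript is identical in both samples, yet the activity scores separate them, which is the distinction between activity and expression that the thread draws.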
Our tool RIDDEN (Receptor actIvity Data Driven inferENce), for predicting #receptor activity from #transcriptomics data, is published in
PLOS Computational Biology journals.plos.org/ploscompbiol... 1/n
Thanks to all the co-authors, Gema Sanz, @kristofszalay.com and especially Gerold Csendes, who led this project! 5/5
Several popular Perturb-seq based benchmark datasets lack heterogeneity, making it difficult to distinguish between strong and weak models. 4/5
Gene embeddings from foundation models align more closely with gene regulatory networks than with signaling networks, which may underlie their weaker performance in perturbation tasks. 3/5
Even the most trivial baseline (mean of train samples) outperformed recent foundation models, while basic ML models using biologically meaningful features won by a large margin. 2/5
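The "mean of train samples" baseline needs no model at all. It predicts the same profile, the gene-wise average of the training profiles, for every held-out perturbation. Below is a minimal sketch, assuming profiles are equal-length lists of post-perturbation expression values in a shared gene order and that error is scored with mean squared error. Both the representation and the metric are assumptions, and the paper may use different ones.

```python
from statistics import fmean
from typing import Dict, List

Profile = List[float]  # assumed: expression values in a fixed gene order


def train_mean_baseline(train_profiles: List[Profile]) -> Profile:
    """Gene-wise mean over all training perturbation profiles."""
    n_genes = len(train_profiles[0])
    return [fmean(p[j] for p in train_profiles) for j in range(n_genes)]


def predict_all(baseline: Profile, test_perturbations: List[str]) -> Dict[str, Profile]:
    """Return the same baseline profile for every held-out perturbation."""
    return {name: list(baseline) for name in test_perturbations}


def mse(pred: Profile, truth: Profile) -> float:
    """Mean squared error across genes."""
    return fmean((p - t) ** 2 for p, t in zip(pred, truth))


if __name__ == "__main__":
    train = [[1.0, 2.0], [3.0, 4.0]]
    baseline = train_mean_baseline(train)
    preds = predict_all(baseline, ["KO_A", "KO_B"])
    print(preds, mse(preds["KO_A"], [2.0, 5.0]))
```

Because this prediction ignores which gene was perturbed, a model that fails to beat it is likely capturing little perturbation-specific signal.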
Single-cell foundation models, trained on large-scale scRNA-seq datasets, are increasingly used for post-perturbation RNA-seq prediction.
But how do they actually perform?
Our new paper from @turbine-ai.bsky.social is now out in BMC Genomics. bmcgenomics.biomedcentral.com/articles/10....
1/5
Hello #AACR25 !
We have an opening for a PhD/Postdoc in Spatial Omics for Cardiology in Heidelberg #job #academicjob #phd #postdoc #cardiology
jobrxiv.org/job/heidelbe...
Turns out the way we usually pre-train foundational cell models adds very little information to the system - definitely not enough to make drug effect predictions work. Not what I expected.
#virtualcells #foundationalmodels #compbio
blog.turbine.ai/p/pretrainin...
Easter holiday is coming: 2 revised versions submitted today
You’ve probably heard about how AI/LLMs can solve Math Olympiad problems (deepmind.google/discover/blo...).
So naturally, some people put it to the test, hours after the 2025 US Math Olympiad problems were released.
The result: They all sucked!
Does @iscb.bsky.social have a statement on this? We need to be there to support bioinformatics researchers.
based on some recent papers / presentations, the hill(s) I'll die on:
1) Raw IC50 values != good metrics for comparing cancer drug sensitivity
2) Raw TPMs / counts are meaningless for comparing gene expression similarity
3) Diverging color palettes need a meaningful midpoint.
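For point 3, one common way to give a diverging palette a meaningful midpoint is to normalize values with two separate linear scales that meet at the chosen center (e.g. 0 for log fold changes), so the neutral color always lands on that value even when the data range is asymmetric. Below is a minimal pure-Python sketch of that mapping; the function name is hypothetical, and matplotlib ships the same idea as `matplotlib.colors.TwoSlopeNorm`.

```python
def two_slope_norm(value: float, vmin: float, vcenter: float, vmax: float) -> float:
    """Map value to [0, 1] so that vcenter -> 0.5, vmin -> 0.0 and vmax -> 1.0.

    Each side of the center gets its own linear scale; values outside
    [vmin, vmax] are clipped to the ends of the colormap.
    """
    if not vmin < vcenter < vmax:
        raise ValueError("need vmin < vcenter < vmax")
    if value <= vcenter:
        return 0.5 * (max(value, vmin) - vmin) / (vcenter - vmin)
    return 0.5 + 0.5 * (min(value, vmax) - vcenter) / (vmax - vcenter)


if __name__ == "__main__":
    # Asymmetric log fold changes: strong up-regulation, mild down-regulation.
    for lfc in (-2.0, -1.0, 0.0, 4.0, 8.0):
        print(lfc, two_slope_norm(lfc, vmin=-2.0, vcenter=0.0, vmax=8.0))
```

A plain min-max scaling over [-2, 8] would instead put the palette's neutral color at 3, so unchanged genes would be drawn as up-regulated.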
Post-doc opening to join our lab @ebi.embl.org to develop & apply #bioinformatics and #machine-learning methods to study intra-/extracellular networks and extract disease mechanisms from #single-cell and #spatial multiomic data. Please share: embl.wd103.myworkdayjobs.com/EMBL/job/Hin...
Flier for the ICSB2025 conference in Dublin 5-9 October 2025
We will be hosting the International Conference on Systems Biology (ICSB) in Dublin this year, 5-9 October: icsb2025.com/registration/ #ICSB2025
It's been over 24 hours since #AdventOfCode 2024 ended!
This AoC season:
- 273,313 users collected at least one star
- 3,654,949 stars were collected
Since 2015:
- 779 users have all 500 stars
- 1,077,226 users collected at least one star
- 23,170,305 stars were collected
holiday season is coming: 2 papers submitted today
Thanks to all the coauthors, Eszter Varga, Daniel Dimitrov, @juliosaezrod.bsky.social, László Hunyady and especially to first author Szilvia Barsi who led this project from start to finish! 6/6
Using an IO dataset of nivolumab-treated renal cancer patients, we found that:
- PD1 or PDL1 expression didn't predict survival
- PD1 activity (from RIDDEN) did associate with survival.
This shows that receptor activity could be a more effective biomarker than expression. 5/n
RIDDEN’s receptor-specific signatures align with regulons of transcription factors downstream of receptors. 4/n
We benchmarked RIDDEN on various receptor perturbation datasets, including the recent in vivo Immune Dictionary data nature.com/articles/s41..., and found good predictive performance. 3/n