Our collaborators running CellBender on 20 TB of data and having it cost them thousands of $ of GPU time + months in the queue is literally what motivated the CellSweep project!
CellSweep is available at github.com/pachterlab/c...
5/5
CellSweep was motivated by analysis of the 8^3 dataset from the Mortazavi lab, where they noticed contamination in Parse single-cell RNA-seq by assessing marker genes for tissues from multiplexed plates. CellSweep can sweep away these problems as well. 4/
We validate CellSweep in many ways. Here is the comparison to the CellBender validation, although CellSweep is much faster. 3/
CellSweep is also very useful for spatial assays, where we find it can greatly reduce contamination. Interestingly, the lowest quality cells inferred by CellSweep in this VisiumHD dataset are on the border of the image. 2/
Ambient RNA & barcode swapping are serious issues in single-cell genomics. Existing tools include CellBender, scAR, DecontX & SoupX. We have developed CellSweep, which is faster (in some cases by a lot) and much more accurate. Extensively tested and benchmarked. www.biorxiv.org/content/10.6... 1/
"In the end, I started wondering if all of BioConductor could be ported to Python - or perhaps Julia or Rust for pure language implementations that also cover the optimized parts written in C, C++, or Fortran."
Excited to share this preprint that describes my latest work on using GPUs to accelerate processing of RNA-seq data.
The title says it all: "RNA-seq analysis in seconds using GPUs" now on biorxiv www.biorxiv.org/content/10.6... and github github.com/pachterlab/k...
Figure 1 shows the key result
kallisto sped up RNA-seq quantification by 50x. Now another 50x speedup... quantify hundreds of millions of reads in a few seconds.
This seems too good to be true. But it's true!
Incredible accomplishment by @pmelsted.bsky.social.
After years of work, the centerpiece of my PhD is published in @natmethods.nature.com! Read it to learn about the biophysical insights we can get from single-cell data!
But first, I would like to talk a bit about RNA velocity and normalization. 1/
Ok, but: deduction for no gloves
Where are the fume hood vents on the roof?
Introducing a Clytia planula cell atlas, and demonstrating broad-level relations with medusa cells via another updated atlas. By @annaferraioli.bsky.social with @juliarmateu.bsky.social and collaborators in a project led by @rcply.bsky.social @biodev-vlfr.bsky.social www.biorxiv.org/content/10.6...
years ago my student and I translated LD score regression from Python to R, it took months, and LDscore at the time was like 3/4 reasonably functions! Translating edgeR to Python in a week is absolutely crazy…
got curious, codex translated the core features of LD score regression from python to R in under an hour... Obv not a whole package but enough to see you'd soon be able to flick things between Python and R as it suits you...
On April 24, 2026 a new rule kicks in whereby state and local governments will require WCAG 2.1 Level AA for web content.
The PDF for the preprint describing this project is UA-2 which is a "gold standard" for PDF accessibility. I built it using luatex in combination with verapdf and LLM assistance.
This MCP server for edgePython should be useful for anyone building (comp)bio agents: github.com/pachterlab/e...
About $500. But I was learning as I was doing. I am now doing related work and the cost is lower.
inMoose did an excellent job but the port covers only about 25% of edgeR.
Some thoughts on AI based on working on this project: liorpachter.wordpress.com/2026/02/19/t...
The port is also complete (save for a handful of functions that I determined were best not to port). For example, I did not port processAmplicons since I think handling of fastq should be separate for modularity. This can be easily done in Python.
Results are near identical. See the tutorial / example github.com/pachterlab/e...
I used Claude Opus 4.5/4.6 (and a bit of Codex GPT-5.3) to port edgeR to Python. See edgePython github.com/pachterlab/e...
This allowed me to develop a single-cell DE method that extends NEBULA with edgeR Empirical Bayes. All in one week. Details in doi.org/10.64898/202...
While my edgeR port was primarily executed with Claude I found that in some cases Codex was able to handle tricky problems that Claude struggled with. Specifically, Codex was helpful for resolving subtle discrepancies in tagwise dispersion estimates. github.com/pachterlab/e...
Cool new paper on how to think about aggregation for spatial transcriptomics by Lambda Moses et al. www.biorxiv.org/content/10.6...
I unfortunately have a different story. I served on a hiring committee which initially failed. A late candidate then emerged with a superb publication record. There were reports of sexual misbehavior as a PhD student circulating but they were now married with kids. I voted yes 1/n
Interesting article on LLM text extraction and the connection of that problem to (computational biology) sequence alignment. www.biorxiv.org/content/10.6... by @sina.bio and Aaron Streets.
Yes, in our setup (trying to identify related claims / results via neighbors in a kNN in a latent space) it's similar words / terminology. There are undoubtedly many other causes in other settings right now.
There is a reason for a different word when machines do it. Machines are definitely making an error, and it is a type of error that maybe can be fixed.
At least one can hope.
4/4