
Lior Pachter

@lpachter

Bren Professor of Computational Biology @Caltech.edu. Blog at http://liorpachter.wordpress.com. Posts represent my views, not my employer's. #methodsmatter

10,076 Followers
836 Following
153 Posts
Joined 20.08.2023

Latest posts by Lior Pachter @lpachter

Our collaborators running CellBender on 20Tb of data and having it cost them thousands of dollars of GPU time plus months in the queue is literally what motivated the CellSweep project!

10.03.2026 01:44 👍 3 🔁 0 💬 1 📌 0

GitHub - pachterlab/cellsweep Contribute to pachterlab/cellsweep development by creating an account on GitHub.

CellSweep is available at github.com/pachterlab/c...
5/5

09.03.2026 22:40 👍 3 🔁 0 💬 0 📌 0

CellSweep was motivated by analysis of the 8^3 dataset from the Mortazavi lab, where they noticed contamination in Parse single-cell RNA-seq by assessing marker genes for tissues from multiplexed plates. CellSweep can sweep away these problems as well. 4/

09.03.2026 22:40 👍 0 🔁 0 💬 1 📌 0

We validate CellSweep in many ways. Here is the comparison to the CellBender validation, although CellSweep is much faster. 3/

09.03.2026 22:40 👍 0 🔁 0 💬 1 📌 0

CellSweep is also very useful for spatial assays, where we find it can greatly reduce contamination. Interestingly, the lowest quality cells inferred by CellSweep in this VisiumHD dataset are on the border of the image. 2/

09.03.2026 22:40 👍 0 🔁 0 💬 1 📌 0

Ambient RNA & barcode swapping are serious issues in single-cell genomics, addressed by tools such as CellBender, scAR, DecontX & SoupX. We have developed CellSweep, which is faster (in some cases by a lot) and much more accurate. Extensively tested and benchmarked. www.biorxiv.org/content/10.6... 1/

09.03.2026 22:40 👍 53 🔁 13 💬 2 📌 0

"In the end, I started wondering if all of BioConductor could be ported to Python - or perhaps Julia or Rust for pure language implementations that also cover the optimized parts written in C, C++, or Fortran."

07.03.2026 21:15 👍 8 🔁 0 💬 0 📌 0

Excited to share this preprint that describes my latest work on using GPUs to accelerate processing of RNA-seq data.

The title says it all: "RNA-seq analysis in seconds using GPUs" now on biorxiv www.biorxiv.org/content/10.6... and github github.com/pachterlab/k...

Figure 1 shows the key result

06.03.2026 19:32 👍 181 🔁 86 💬 6 📌 8

kallisto sped up RNA-seq quantification by 50x. Now another 50x speedup... quantify hundreds of millions of reads in a few seconds.

This seems too good to be true. But it's true!
Incredible accomplishment by @pmelsted.bsky.social.

06.03.2026 20:28 👍 58 🔁 9 💬 0 📌 0

After years of work, the centerpiece of my PhD is published in @natmethods.nature.com! Read it to learn about the biophysical insights we can get from single-cell data!

But first, I would like to talk a bit about RNA velocity and normalization. 1/

03.03.2026 22:30 👍 38 🔁 15 💬 1 📌 0

Ok, but: deduction for no gloves

02.03.2026 04:10 👍 2 🔁 0 💬 1 📌 0

Where are the fume hood vents on the roof?

02.03.2026 04:06 👍 2 🔁 0 💬 1 📌 0

Introducing a Clytia planula cell atlas, and demonstrating broad-level relations with medusa cells via another updated atlas. By @annaferraioli.bsky.social with @juliarmateu.bsky.social and collaborators in a project led by @rcply.bsky.social @biodev-vlfr.bsky.social www.biorxiv.org/content/10.6...

18.02.2026 06:59 👍 35 🔁 14 💬 0 📌 1

years ago my student and I translated LD score regression from Python to R, it took months, and LDscore at the time was like 3/4 reasonably functions! Translating edgeR to Python in a week is absolutely crazy…

19.02.2026 20:32 👍 16 🔁 6 💬 1 📌 0

GitHub - MichelNivard/ldsc-r-codex-translation Contribute to MichelNivard/ldsc-r-codex-translation development by creating an account on GitHub.

got curious, codex translated the core features of LD score regression from python to R in under an hour... Obv not a whole package but enough to see you'd soon be able to flick things between Python and R as it suits you...

19.02.2026 21:53 👍 4 🔁 1 💬 0 📌 1

On April 24, 2026 a new rule kicks in requiring state and local government web content to meet WCAG 2.1 Level AA.
The PDF for the preprint describing this project is PDF/UA-2 compliant, which is a "gold standard" for PDF accessibility. I built it using LuaTeX in combination with veraPDF and LLM assistance.

19.02.2026 19:54 👍 3 🔁 1 💬 0 📌 0

GitHub - pachterlab/edgePython: edgePython is a Python implementation of the Bioconductor edgeR package for differential analysis of genomics count data. It also includes a new single-cell differential expression method that exte...

This MCP server for edgePython should be useful for anyone building (comp)bio agents: github.com/pachterlab/e...

19.02.2026 17:19 👍 19 🔁 2 💬 1 📌 0

About $500. But I was learning as I was doing. I am now doing related work and the cost is lower.

19.02.2026 17:16 👍 5 🔁 0 💬 1 📌 0

inMoose did an excellent job but the port covers only about 25% of edgeR.

19.02.2026 17:08 👍 1 🔁 0 💬 1 📌 0

The Quickening In a paper titled “THEOREMS FOR A PRICE: Tomorrow’s Semi-Rigorous Mathematics Culture” published in 1993, mathematician Doron Zeilberger wrote: There are writings on the wall that…

Some thoughts on AI based on working on this project: liorpachter.wordpress.com/2026/02/19/t...

19.02.2026 17:02 👍 23 🔁 12 💬 5 📌 1

The port is also complete (save for a handful of functions that I determined were best not to port). For example, I did not port processAmplicons since I think handling of fastq should be separate for modularity. This can be easily done in Python.

19.02.2026 16:50 👍 7 🔁 0 💬 1 📌 0

edgePython/examples/mammary/mouse_mammary_R_vs_Python.ipynb at main · pachterlab/edgePython edgePython is a Python implementation of the Bioconductor edgeR package for differential analysis of genomics count data. It also includes a new single-cell differential expression method that exte...

Results are near identical. See the tutorial / example github.com/pachterlab/e...

19.02.2026 16:48 👍 4 🔁 0 💬 1 📌 0

I used Claude Opus 4.5/4.6 (and a bit of Codex GPT-5.3) to port edgeR to Python. See edgePython github.com/pachterlab/e...
This allowed me to develop a single-cell DE method that extends NEBULA with edgeR Empirical Bayes. All in one week. Details in doi.org/10.64898/202...

19.02.2026 16:46 👍 68 🔁 25 💬 3 📌 3

While my edgeR port was primarily executed with Claude, I found that in some cases Codex was able to handle tricky problems that Claude struggled with. Specifically, Codex was helpful for resolving subtle discrepancies in tagwise dispersion estimates. github.com/pachterlab/e...

19.02.2026 15:57 👍 0 🔁 0 💬 2 📌 0

Cool new paper on how to think about aggregation for spatial transcriptomics by Lambda Moses et al. www.biorxiv.org/content/10.6...

19.02.2026 09:00 👍 7 🔁 1 💬 0 📌 0
18.02.2026 05:15 👍 9 🔁 1 💬 0 📌 0

I unfortunately have a different story. I served on a hiring committee which initially failed. A late candidate then emerged with a superb publication record. There were reports of sexual misbehavior as a PhD student circulating but they were now married with kids. I voted yes 1/n

15.02.2026 16:24 👍 47 🔁 13 💬 5 📌 2

Interesting article on LLM text extraction and the connection of that problem to (computational biology) sequence alignment. www.biorxiv.org/content/10.6... by @sina.bio and Aaron Streets.

11.02.2026 17:53 👍 8 🔁 1 💬 0 📌 0

Yes- in our setup (trying to identify related claims / results via neighbors in a kNN in a latent space) it's similar words / terminology. There are undoubtedly many other causes in other settings right now.

04.02.2026 09:22 👍 1 🔁 0 💬 0 📌 0

There is a reason for a different word when machines do it. Machines are definitely making an error, and it is a type of error that maybe can be fixed.

At least one can hope.

4/4

04.02.2026 06:28 👍 1 🔁 0 💬 1 📌 0