Splicer: Phylogenetic Placement in Sub-Linear Time https://www.biorxiv.org/content/10.64898/2026.02.10.705130v1
Huge congratulations @martibartfast.bsky.social and @zaminiqbal.bsky.social on the publication of this fantastic and massive paper. A huge achievement!
Embarrassingly_FASTA: Enabling Recomputable, Population-Scale Pangenomics by Reducing Commercial Genome Processing Costs from $100 to less than $1 https://www.biorxiv.org/content/10.64898/2026.02.02.703356v1
So anyway:
BiRank & QuadRank: single-cache-miss rank queries that are double the throughput of other Rust crates and fully saturate the memory bandwidth.
Side effect: QuadFm is smaller and 2-4x faster than the next-best FM-index.
github.com/RagnarGrootK...
raw.githubusercontent.com/RagnarGrootK...
Very proud to have played a small part in this important work!
EDEN: a family of genomic language models trained on up to 9.7 trillion nucleotides from @basecamp-research.bsky.social's BaseData can design large serine recombinases, bridge recombinases, and antimicrobial peptides.
www.biorxiv.org/content/10.6...
Happy to have played a small part in this!
Rapid and Consistent Genome Clustering for Navigating Bacterial Diversity with Millions of MAGs and Isolates https://www.biorxiv.org/content/10.64898/2025.12.30.695181v1
Rewriting protein alphabets with language models https://www.biorxiv.org/content/10.1101/2025.11.27.690975v1
Deciphering enzymatic potential in metagenomic reads through DNA language model https://www.biorxiv.org/content/10.1101/2024.12.10.627786v1
A General Transformer-Based Multi-Task Learning Framework for Predicting Interaction Types between Enzyme and Small Molecule https://www.biorxiv.org/content/10.1101/2025.10.09.681419v1
RemoteFoldSet: Benchmarking Structural Awareness of Protein Language Models
Zinnia Ma, Neville P. Bethel
bioRxiv 2025.09.23.678152; doi: doi.org/10.1101/2025...
Precisely calling mutations across hundreds of bacterial isolates has been hard, requiring manual filtering and expertise.
Until now, using AccuSNV.
Herui Liao trained an ML model based on our previous meticulously called SNVs.
www.biorxiv.org/content/10.1...
Now published in @natcomms.nature.com
www.nature.com/articles/s41...
With Gillian Rodger, @nstoesser.bsky.social, @samlipworth.bsky.social, @stat-sarah.bsky.social, and many others!
Machine learning for biosecurity: A probabilistic framework for invasive species management. Journal of Applied Ecology, 00, 1–13. doi.org/10.1111/1365...
Our preprint on our new metagenomic HiFi assembler Alice is out 🥳 Based on a *new sketching method* (🧵1/6)
Preprint www.biorxiv.org/content/10.1...
Github github.com/rolandfaure/...
There are millions of openly available microbial genomes, but searching them can be slow.
Until now.
Introducing LexicMap, a new alignment tool that lets scientists search these data in minutes, helping track antibiotic resistance, trace outbreaks, and more.
www.ebi.ac.uk/about/news/r...
"We show that, despite this compression factor, SSEs can be used as a highly effective tertiary structure comparison tool, with accuracy that approaches that of Foldseek, while offering a 200-fold speedup."
www.biorxiv.org/content/10.1...
Sometimes you meet absolutely incredible bioinfo-magicians.
It was a huge privilege when @shenwei356.bsky.social joined our group for a year on an @embl.org sabbatical.
While here, he developed a new way of aligning to millions of bacteria, called LexicMap 1/n
www.nature.com/articles/s41...
Couldn’t have said it better myself!