I am pleased to share that our paper is now published in Cell!
www.cell.com/cell/fulltex...
I am deeply grateful to all co-authors for making this possible.
This work was made possible through the guidance of Dr. Peer Bork. I share this in grateful memory and with deep respect for his mentorship.
09.02.2026 21:07
Multiple protein structure alignment at scale with FoldMason
Protein structure is conserved beyond sequence, making multiple structural alignment (MSTA) essential for analyzing distantly related proteins. Computational prediction methods have vastly extended ou...
FoldMason is out now in @science.org. It generates accurate multiple structure alignments for thousands of protein structures in seconds. Great work by Cameron L. M. Gilchrist and @milot.bsky.social.
www.science.org/doi/10.1126/...
search.foldseek.com/foldmason
💾 github.com/steineggerla...
30.01.2026 06:11
We used a feature of the MAFFT software suite that adds sequences while keeping the column structure of the original MSA intact, implemented via the --add and --addfragments options. We refer to MSAs generated with this feature as amplified MSAs. Comparisons of normalized TCS gains between enriched and amplified MSAs showed comparable improvements in phylogenetic inference, regardless of the methodology. This suggests that we can preserve linear scalability for both sequence alignment and phylogenetic inference, while retaining the improvement in inference quality provided by the additional sequences.
This effect is maintained when MAFFT adds homologs onto an existing MSA without disrupting column structure. Based on these findings, we developed AmpliPhy, a Nextflow pipeline that automates database-driven homolog enrichment for improved gene tree inference at scale. 🧵4/n
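The sequence-addition step described above can be sketched as a thin Python wrapper around the MAFFT command line. This is only an illustration, not the AmpliPhy implementation: the function names and file names are hypothetical, and the optional `--keeplength` flag (a MAFFT option that keeps the alignment length unchanged) is an assumption, since the post only mentions `--add` and `--addfragments`.

```python
import shutil
import subprocess


def build_mafft_add_command(existing_msa, new_sequences,
                            fragments=False, keep_length=False):
    """Build the argv list for adding sequences to an existing MSA with MAFFT."""
    # --add targets full-length sequences, --addfragments targets partial ones.
    add_flag = "--addfragments" if fragments else "--add"
    cmd = ["mafft", add_flag, new_sequences]
    if keep_length:
        # ASSUMPTION: not mentioned in the source; keeps the column count fixed.
        cmd.append("--keeplength")
    cmd.append(existing_msa)
    return cmd


def amplify_msa(existing_msa, new_sequences, output_path, **options):
    """Run MAFFT and write the resulting alignment to output_path."""
    if shutil.which("mafft") is None:
        raise RuntimeError("mafft executable not found on PATH")
    cmd = build_mafft_add_command(existing_msa, new_sequences, **options)
    # MAFFT writes the alignment to standard output.
    with open(output_path, "w") as out:
        subprocess.run(cmd, stdout=out, check=True)
```

With MAFFT installed, a call such as `amplify_msa("hog.aln.fa", "homologs.fa", "hog.amplified.aln.fa")` would write the alignment with the added homologs to the given output file; all three file names are placeholders.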
28.01.2026 06:10
We hypothesized that the gain in inference quality could be driven by more precise root placement, leveraging the additional information provided by the enriched taxa. To test this, we modified the processing of the enriched tree by switching the order of rooting and pruning. In our original workflow, we processed trees inferred from enriched MSAs by rooting first and then pruning leaves that were not present in the original MSA (post-pruning). By reversing the order, i.e. pruning first and rooting afterwards, we prevented the additional taxa from contributing to the rooting step (pre-pruning). We then compared the congruence gain of pre-pruned and post-pruned phylogenetic trees. To quantify the effect of pre-pruning, we computed the loss of congruence by pre-pruning the tree. The effect of pre-pruning was observed as a notable decrease in inference quality for Amniota HOGs.
At lower taxonomic levels (e.g., Amniotes), this improvement was associated with more precise root placement. This provides empirical evidence that denser taxon sampling can improve gene tree inference for closely related species by adding information for accurate rooting. 🧵3/n
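To make the post-pruning order concrete (root first, then prune), here is a minimal Python sketch. It assumes a rooted tree stored as nested tuples with string leaves; this representation, the taxon labels, and the function names are hypothetical and not taken from the preprint. It covers only the pruning step: the pre-pruning variant would additionally need a rooting method applied after pruning, which is not shown.

```python
def leaves(tree):
    """Return the set of leaf labels below a node."""
    if isinstance(tree, str):
        return {tree}
    found = set()
    for child in tree:
        found |= leaves(child)
    return found


def prune(tree, keep):
    """Prune a rooted tree to the leaves in `keep`, collapsing unary nodes."""
    if isinstance(tree, str):
        return tree if tree in keep else None
    kept = [c for c in (prune(child, keep) for child in tree) if c is not None]
    if not kept:
        return None
    if len(kept) == 1:
        # A node left with a single child is replaced by that child.
        return kept[0]
    return tuple(kept)


def root_split(tree):
    """Return the leaf sets on each side of the root."""
    return {frozenset(leaves(child)) for child in tree}


# Toy enriched tree, already rooted; X1 and X2 stand for added homologs.
enriched = (("A", "X1"), (("B", "C"), ("X2", "D")))
post_pruned = prune(enriched, keep={"A", "B", "C", "D"})
```

Here `post_pruned` keeps the root position that was chosen while the added taxa were still present, which is the information the pre-pruning order discards.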
28.01.2026 06:10
For each orthologous gene family, we constructed three MSAs: original, computed by applying sequence aligners directly to the input sequences; enriched, computed by aligning the combined set of orthologs and homologs identified by database search; and impoverished, obtained by removing the added homologs from the enriched MSA. We then used TCS to quantify congruence of the resulting trees against the known taxonomy. The normalized difference in congruence between the original and enriched trees captures the joint impact of sequence addition on alignment and tree inference. The normalized difference between the original and impoverished trees reflects the effect on alignment quality alone. The effect on tree inference can then be estimated by subtraction. We observed a positive impact of homolog enrichment on the phylogenetic tree inference step, regardless of the sequence aligner used to build the alignments. Notably, the impact of sequence addition on alignments was marginal.
We devised a benchmark method to quantify the impact of homolog enrichment on phylogenetic inference, decomposing the effects on MSA quality, tree inference quality, and rooting. We show homolog enrichment improves tree inference, while effects on alignments remain marginal. 🧵2/n
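The decomposition by subtraction can be written down in a few lines of Python. This is a sketch under a loud assumption: the posts do not state how the congruence differences are normalized, so the relative-to-original normalization below is hypothetical, as are the function names and the example scores.

```python
def normalized_gain(score, baseline):
    """Relative change of a congruence score with respect to a baseline.

    ASSUMPTION: this normalization is illustrative only; the source does not
    specify the formula used.
    """
    if baseline == 0:
        raise ValueError("baseline congruence must be non-zero")
    return (score - baseline) / baseline


def decompose_enrichment_effect(original, enriched, impoverished):
    """Split the effect of homolog enrichment into its two components."""
    # Original vs. enriched: joint effect on alignment and tree inference.
    joint = normalized_gain(enriched, original)
    # Original vs. impoverished: effect on the alignment alone.
    alignment = normalized_gain(impoverished, original)
    # The tree-inference effect is estimated by subtraction.
    return {
        "joint": joint,
        "alignment": alignment,
        "tree_inference": joint - alignment,
    }


# Made-up congruence scores for a single gene family.
effects = decompose_enrichment_effect(original=0.5, enriched=0.75, impoverished=0.5)
```

In this toy case the impoverished tree is as congruent as the original one, so the whole gain is attributed to the tree inference step.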
28.01.2026 06:10
AmpliPhy improves gene trees by adding homologs without affecting alignments
In phylogenomics, gene tree reconstruction depends on multiple sequence alignment (MSA) and tree inference, and ongoing work continues to improve inference quality. Denser taxon sampling has been associated with improved gene tree inference, suggesting that adding homologs could be a practical route to higher accuracy as sequence databases continue to expand. However, adding sequences can influence multiple steps of typical inference pipelines, and little is known about its specific effect on the multiple sequence alignment, tree reconstruction, and rooting steps. We performed a large-scale empirical benchmark to quantify how homolog enrichment affects alignment and phylogenetic inference. Using an enrichment-impoverishment design and a measure of tree accuracy based on taxonomic congruence, we found that enrichment consistently improves tree inference quality, while effects on alignment quality are marginal. We show that this improvement is associated with accurate root placement on enriched trees when enrichment is accompanied by a sensitive homolog search. Notably, much of the benefit can be retained with relatively compact alignments produced by sequence addition. Building on these observations, we provide a tool, AmpliPhy, which efficiently improves phylogenetic reconstruction of protein families through homolog enrichment. The AmpliPhy open-source pipeline software is available at https://github.com/DessimozLab/ampliphy. Competing Interest Statement: The authors have declared no competing interest. Swiss National Science Foundation, https://ror.org/00yjd3n13, 216623, 10005715
Can ever-increasing sequence databases improve phylogenetic reconstruction of a gene family? Our new preprint introduces AmpliPhy, a pipeline that automates homolog enrichment to improve gene tree inference, built on a robust phylogenomic benchmark scheme. 🧵1/n
doi.org/10.64898/2026.01.26.701724
28.01.2026 06:10
Mirdita Lab - Laboratory for Computational Biology & Molecular Machine Learning
Mirdita Lab builds scalable bioinformatics methods.
My time in @martinsteinegger.bsky.social's group is ending, but I'm staying in Korea to build a lab at Sungkyunkwan University School of Medicine. If you or someone you know is interested in molecular machine learning and open-source bioinformatics, please reach out. I am hiring!
mirdita.org
20.01.2026 11:07
OrthoFinder just dropped a major update
It's faster, more accurate, and ready for thousands of genomes
Let's break it down (1/10)
github.com/OrthoFinder/...
www.biorxiv.org/content/10.1...
16.07.2025 17:51
Folddisco finds similar (dis)continuous 3D motifs in large protein structure databases. Its efficient index enables fast annotation of uncharacterized active sites, protein conformational state analysis and PPI interface comparison. 1/9 🧶🧬
www.biorxiv.org/content/10.1...
search.foldseek.com/folddisco
07.07.2025 08:21
A general substitution matrix for structural phylogenetics.
Abstract. Sequence-based maximum likelihood (ML) phylogenetics is a widely used method for inferring evolutionary relationships, which has illuminated the
New paper from the lab, by Sriram Garg in my group. We introduce a general substitution matrix for structural phylogenetics. I think this is a big deal, so read on below if you think deep history is important. academic.oup.com/mbe/advance-...
11.06.2025 14:01
This work was done by the talented @sukhwanpark.bsky.social and me, supervised by the amazing @martinsteinegger.bsky.social!
Try Unicore now: conda install -c bioconda unicore
Code and tutorial: github.com/steineggerlab/unicore
Manuscript: doi.org/10.1093/gbe/evaf109
03.06.2025 06:54
Unicore is fast, accurate, and universal. It reconstructed consistent phylogenies of bacterial and fungal species while scaling linearly in time with input size. Unicore also works with any given set of taxa, providing a scalable and universal method for structure-based phylogeny. 🧵3/n
03.06.2025 06:54
With Unicore, we identified 13 structural core genes from 166 species across the Tree of Life, 8 of which could only be defined using structures. Projected onto the Tree of Life reconstructed with Unicore, the universally conserved structure of one of these structural core genes is visible. 🧵2/n
03.06.2025 06:54
Unicore is now published in GBE
Unicore rapidly identifies structural single-copy core genes from input species proteomes for phylogenetic analysis. Powered by Foldseek and ProstT5, Unicore enables linear-scale structure-based phylogeny of any given set of taxa. 🧵1/n
doi.org/10.1093/gbe/evaf109
03.06.2025 06:54
AFESM: a metagenomic guide through the protein structure universe! We clustered 821M structures (AFDB & ESMatlas) into 5.12M groups, revealing biome-specific groups, only 1 new fold even after AlphaFold2 re-prediction & many novel domain combos. 🧵
afesm.foldseek.com
www.biorxiv.org/content/10.1...
27.04.2025 00:13
Visit our posters at #RECOMB2025 for:
Structural: MSAs, Virus DB, Core Genes, Motif Discovery, Multimer Clustering & Search, pLM Foldseek, Environmental analysis
Metagenomics: Classification & Metabuli App
GPU-based & RNA search, Proteome clustering, Novel Ribozyme discovery
& get Marv stickers!
25.04.2025 07:45
IQ-TREE 3: Phylogenomic Inference Software using Complex Evolutionary Models
Not really my announcement to make--I am but a lesser co-author--but IQ-TREE 3 has just been released!
(Most credit to Minh Bui and @roblanfear.bsky.social and their labs)
ecoevorxiv.org/repository/v...
10.04.2025 14:13
#AlphaFold Database update
AlphaFold DB now integrates The Encyclopedia of Domains (TED), a resource designed to systematically identify & classify structural domains within AlphaFold-predicted protein structures.
www.ebi.ac.uk/about/news/u...
@pdbeurope.bsky.social
03.03.2025 16:33
The PAN-GO paper is a remarkable milestone. It not only provides the most comprehensive picture of human gene function to date, but also carefully maps this knowledge across the tree of life! Congratulations @marcfeuermann.bsky.social, Pascale Gaudet & collaborators!
www.sib.swiss/news/sib-hel...
26.02.2025 22:37
In our latest review, we explore 12 deep-learning tools for metagenomic analysis, covering their strengths, limitations, and key applications. We hope it serves as both a resource and inspiration for new ways to analyze metagenomic data. Great work by Eli Levy Karin!
doi.org/10.1093/nsr/...
22.02.2025 05:47
FastOMA retains OMA's high precision and even improves upon it in terms of recall, positioning it on the Pareto frontier of orthology inference methods.
FastOMA is not only fast but also accurate. a) QfO benchmark: agreement with the SwissTree reference phylogeny covering manually curated gene trees. The error bars indicate 95% confidence intervals comparing FastOMA with EnsemblCompara, Domainoid, OrthoMCL, OrthoInspector, SonicParanoid, PANTHER, OrthoFinder, Hieranoid and the OMA family including OMA pairs, OMA groups and OMA GETHOGs (graph-based efficient technique for HOGs).
c) A computation time comparison of FastOMA and state-of-the-art alternatives.
https://www.nature.com/articles/s41592-024-02552-8
FastOMA is out now in Nature Methods: nature.com/articles/s41592-024-02552-8 A new orthology inference algorithm that scales linearly and is highly accurate. FastOMA can process all >2000 eukaryotic UniProt reference proteomes in <24 hours. Try it out: github.com/DessimozLab/fastoma @dessimoz.bsky.social
03.01.2025 14:14
Unicore identifies single-copy protein structures across genomes using Foldseek, bypassing slow structure predictions by utilizing 3Di predictions from ProstT5, enabling rapid phylogenetic inference at the tree-of-life scale. 1/n
www.biorxiv.org/content/10.1...
💾 github.com/steineggerla...
23.12.2024 16:39
Unicore enables scalable and accurate phylogenetic reconstruction with structural core genes https://www.biorxiv.org/content/10.1101/2024.12.22.629535v1
23.12.2024 03:51
Scientists, academics, researchers: We're excited to share that @altmetric.com is now tracking mentions of your research on Bluesky! 🧪
03.12.2024 14:10
South Korean citizens helped lawmakers scale the National Assembly walls so they could bypass military barricades and vote against martial law.
03.12.2024 17:15
Interested in bioinformatics method development for proteins, structures or metagenomic analysis? Please check out my lab's starter pack!
go.bsky.app/VJhXcSs
28.11.2024 12:36