6/ Huge thanks to my mentor @mikeschatz.bsky.social and all our collaborators for their support!
5/ Across simulated datasets and real microbiomes, Perseus:
- substantially increases lineage-consistent precision
- reduces false assignments
- maintains strong classification coverage
The goal is more reliable taxonomic predictions when reference databases are incomplete.
4/ To evaluate this, we built controlled taxonomic exclusion experiments.
We systematically removed species, genera, and families from the Kraken2 reference database and classified reads from those missing taxa to simulate novel organisms.
3/ Implemented as a post-processing step for Kraken2, Perseus models the spatial consistency and taxonomic hierarchy of k-mer evidence along each sequence.
It estimates confidence across ranks (domain → species) and allows predictions to back off to higher ranks when species-level evidence is weak.
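To make the back-off idea concrete, here is a minimal sketch. It assumes we already have a per-rank confidence score for a read. The `back_off` function, the rank list, the 0.5 threshold, and the example scores are all hypothetical and for illustration only; this is not Perseus's actual code or API.

```python
# Illustrative sketch only: names, threshold, and scores are assumptions,
# not taken from the Perseus codebase.
RANKS = ["domain", "phylum", "class", "order", "family", "genus", "species"]


def back_off(lineage, confidence, threshold=0.5):
    """Return (rank, taxon) for the most specific rank whose confidence
    meets the threshold, or None if no rank qualifies."""
    # Walk from species up toward domain and stop at the first confident rank.
    for rank in reversed(RANKS):
        if rank in lineage and confidence.get(rank, 0.0) >= threshold:
            return rank, lineage[rank]
    return None


# Hypothetical read: strong genus-level evidence, weak species-level evidence.
lineage = {
    "domain": "Bacteria",
    "family": "Enterobacteriaceae",
    "genus": "Escherichia",
    "species": "Escherichia coli",
}
confidence = {"domain": 0.99, "family": 0.90, "genus": 0.80, "species": 0.30}

print(back_off(lineage, confidence))
```

With these made-up scores the species call is dropped and the read is reported at genus level instead.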
2/ Tools like Kraken2 are widely used because they enable ultrafast taxonomic classification.
But when reference databases are incomplete, a common problem for environmental microbiomes, they can produce overly specific species-level assignments that are actually false positives.
1/ Excited to share my first first-author preprint from my PhD!
We introduce Perseus, a lineage-aware confidence estimation framework for taxonomic classification in long-read metagenomics.
Preprint: www.biorxiv.org/content/10.6...
Code: github.com/matnguyen/Pe...
Johns Hopkins University geneticists and researchers from across the country, including students, are helping map the U.S. soil microbiome, uncovering thousands of previously unknown microbes to better understand the planet's most biodiverse habitats. https://bit.ly/3YBMmqD
Finally on BlueSky, and happy to have had the opportunity to give a talk at Genome Informatics!