Josipa Lipovac (@jlipovac)

GTDB - skani calculator
An interface to compute pairwise ANI of NCBI genomes using the GTDB taxonomy.

The GTDB website now has an ANI calculator based on skani that supports uploading of user genomes. Try it at gtdb.ecogenomic.org/tools/skani.

Find more information about @jimshaw.bsky.social's fantastic tool at www.nature.com/articles/s41....

11.12.2025 14:59 👍 42 🔁 24 💬 0 📌 0

I’m recruiting a postdoc to work on algorithms for cancer genome reconstruction. We have access to a rich set of tumour samples sequenced across multiple technologies. If interested, feel free to DM. Please share.

11.12.2025 03:04 👍 13 🔁 12 💬 0 📌 1

Preprint out! Check out our new long-read metagenomic SNP-caller, SNooPy 😀. Work with Chris Quince. Thread 🧵
👉 www.biorxiv.org/content/10.6...

04.12.2025 13:18 👍 13 🔁 8 💬 1 📌 0

Optimized k-mer search across millions of bacterial genomes on laptops https://www.biorxiv.org/content/10.1101/2025.11.23.690050v1

26.11.2025 16:47 👍 26 🔁 13 💬 0 📌 1

Congrats Martin! 🥳🥳

07.11.2025 09:19 👍 1 🔁 0 💬 1 📌 0

Choose your human genome reference wisely - Nature Methods
Scientists can choose between multiple human genome references, and a pangenome reference is coming. Deciding what to use when is not quite straightforward.

As new human assemblies become available on beta.ensembl.org - which human reference genome will you choose? This article explores the question with insights from Ensembl’s own Fergal Martin - www.nature.com/articles/s41...

#HumanGenomics #Pangenomes #ReferenceGenomes

04.11.2025 12:28 👍 20 🔁 14 💬 0 📌 0

🚀 Looking for talented PhD students!
Join us in 🇸🇬 Singapore for 1-2 years to push the frontiers of AI for Genomics.
Work on:
🧬 Cancer genome reconstruction
🧫 Cancer genome & cell foundation models
💊 RNA drug & mRNA therapeutic design

#AI #Genomics #PhD
1/5

04.11.2025 07:32 👍 10 🔁 8 💬 1 📌 0

Alice: fast and haplotype-aware assembly of high-fidelity reads based on MSR sketching
We introduce Mapping-friendly Sequence Reduction (MSR) sketches, a sketching method for high-fidelity (HiFi) long reads, and Alice, an assembler that operates directly on these sketches. MSR produces ...

Our preprint on our new metagenomic HiFi assembler Alice is out 🥳 Based on a *new sketching method* (🧵1/6)
👉 Preprint www.biorxiv.org/content/10.1...
👉 Github github.com/rolandfaure/...

03.10.2025 14:51 👍 25 🔁 21 💬 2 📌 0

A complete diploid human genome benchmark for personalized genomics
Human genome resequencing typically involves mapping reads to a reference genome to call variants; however, this approach suffers from both technical and reference biases, leaving many duplicated and ...

Delighted to finally announce a preprint describing the Q100 project! “A complete diploid human genome benchmark for personalized genomics” For which we finished HG002 to near-perfect accuracy: www.biorxiv.org/content/10.1... 🧵[1/14]

22.09.2025 17:01 👍 97 🔁 57 💬 4 📌 4

Figure 1. (A) ANI quantifies the similarity between two genomes. ANI can be defined as the number of aligned positions where the two aligned bases are identical, divided by the total number of aligned bases. Historically, ANI was calculated using a single gene family for multiple sequence alignment. Another approach finds orthologous genes between two genomes and reports the average similarity between their CDSs. This method was later extended to whole-genome alignment by identifying local alignments and excluding supplementary alignments with lower similarity.
(B) Different ANI tools employ various approaches in calculating ANI values. ANIm, OrthoANI, and FastANI use aligners to identify homologous regions, whereas Mash uses k-mer hashing to estimate similarities. Only alignments with higher similarity, represented by green arrows, are included in ANI calculations, while red arrows, corresponding to paralogs, are excluded.
(C) The proposed benchmarking method evaluates the performance of different tools using both real and simulated data. It assumes that more distantly related species on the phylogenetic tree should have lower ANI. This is measured with the Spearman rank correlation: we expect a negative correlation between ANI and tree distance (scatter plot on the right).
https://academic.oup.com/bib/article/doi/10.1093/bib/bbaf267/8160681

Excited to share our EvANI benchmarking workflow, published in Briefings in Bioinformatics doi.org/10.1093/bib/...
Computing average nucleotide identity (ANI) is neither conceptually nor computationally trivial. Its definition has evolved over years, with different meanings and assumptions (1/5)

21.09.2025 15:26 👍 30 🔁 12 💬 1 📌 1
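
The ANI definition and the Spearman check described in the Figure 1 caption above can be sketched in a few lines. This is only an illustration, not the EvANI workflow: the function names `ani` and `spearman` are hypothetical, the input is assumed to be two already-aligned sequences of equal length, and skipping gap columns is an assumption about how "aligned bases" are counted.

```python
def ani(aligned_a: str, aligned_b: str) -> float:
    """Identical aligned bases divided by total aligned bases.

    Assumption: columns where either sequence has a gap ('-') are
    not counted as aligned bases.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    aligned = 0
    identical = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == "-" or y == "-":
            continue  # gap column: skip
        aligned += 1
        if x == y:
            identical += 1
    if aligned == 0:
        raise ValueError("no aligned bases")
    return identical / aligned


def _ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(xs, ys) -> float:
    """Spearman rank correlation: Pearson correlation of the ranks."""
    rx, ry = _ranks(xs), _ranks(ys)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5


# Made-up values: ANI (%) falls as tree distance grows.
ani_values = [99.0, 95.0, 90.0, 80.0]
tree_distances = [0.01, 0.05, 0.10, 0.30]
rho = spearman(ani_values, tree_distances)
```

A tool whose ANI estimates track the phylogeny well would give a `rho` close to -1 under this setup.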

Efficient sequence alignment against millions of prokaryotic genomes with LexicMap - Nature Biotechnology
LexicMap uses a fixed set of probes to efficiently query gene sequences for fast and low-memory alignment.

Sometimes you meet absolutely incredible bioinfo-magicians.
It was a huge privilege when @shenwei356.bsky.social
joined our group for a year on an @embl.org sabbatical.
While here, he developed a new way of aligning to
millions of bacteria, called LexicMap 1/n
www.nature.com/articles/s41...

10.09.2025 09:12 👍 190 🔁 99 💬 5 📌 4

Preprint out for myloasm, our new nanopore / HiFi metagenome assembler!

Nanopore's getting accurate, but

1. Can this lead to better metagenome assemblies?
2. How, algorithmically, to leverage them?

with co-author Max Marin @mgmarin.bsky.social, supervised by Heng Li @lh3lh3.bsky.social

1 / N

07.09.2025 23:34 👍 114 🔁 80 💬 5 📌 5

🌎👩‍🔬 For 15+ years biology has accumulated petabytes (million gigabytes) of🧬DNA sequencing data🧬 from the far reaches of our planet.🦠🍄🌵

Logan now democratizes efficient access to the world’s most comprehensive genetics dataset. Free and open.

doi.org/10.1101/2024...

03.09.2025 08:39 👍 218 🔁 118 💬 3 📌 16

Our high-precision metagenomic strain caller, PHLAME, is now published in Cell Reports!! www.cell.com/cell-reports...

PHLAME works on tough sample types -- including those with coexisting strains of a species and low depth.

15.08.2025 16:33 👍 47 🔁 26 💬 2 📌 0

Proud to share our work on the first complete genome of an Indian individual - now on bioRxiv! 😄

20.07.2025 09:19 👍 3 🔁 0 💬 0 📌 0

Campolina: A Deep Neural Framework for Accurate Segmentation of Nanopore Signals https://www.biorxiv.org/content/10.1101/2025.07.08.663658v1

11.07.2025 14:47 👍 2 🔁 2 💬 0 📌 0

📜 Excited to share insights from our recent paper: "Kaminari: a resource-frugal index for approximate colored k-mer queries". The study aims to efficiently identify documents containing a query string, focusing on DNA strings. www.biorxiv.org/content/10.1... 🧬 🖥️ 1/8

27.05.2025 12:06 👍 25 🔁 16 💬 1 📌 1

Thanks Roland! It’s good to mention here that Hairsplitter is also part of it 😄

16.05.2025 18:01 👍 1 🔁 0 💬 0 📌 0

Overview of pipeline. Map reads to NCBI database of references, use EM to choose a minimal set, then use mapping info to do read classification

This looks cool from @jlipovac.bsky.social! Strain-level metag assignment; first use EM + mapping to shrink your ref db, then do read classification
www.biorxiv.org/content/10.1...

16.05.2025 07:16 👍 12 🔁 2 💬 1 📌 0

Thanks! 😊

16.05.2025 08:37 👍 0 🔁 0 💬 1 📌 0

GitHub - lbcb-sci/MADRe: Strain-level metagenomic classification with Metagenome Assembly driven Database Reduction approach

Work with @msikic.bsky.social, @rvicedomini.bsky.social, Kresimir Krizanovic
MADRe is open-source, modular, and ready to use.
Check it out:
🔗 github.com/lbcb-sci/MADRe
9/9

16.05.2025 08:36 👍 2 🔁 1 💬 0 📌 0

A key feature of MADRe is its focus on organisms with sufficient abundance to be assembled.
While low-abundance strains may be underrepresented, this trade-off significantly reduces false-positive identifications, a common issue in strain-level metagenomics.
8/9

16.05.2025 08:36 👍 0 🔁 0 💬 1 📌 0

We evaluated MADRe on both real and simulated datasets and observed:
✅ Comparable or improved accuracy over existing tools
✅ Clearer and more realistic abundance profiles
✅ Substantial reductions in runtime and memory usage
7/9

16.05.2025 08:36 👍 0 🔁 0 💬 1 📌 0

While assembly is often considered computationally expensive, we demonstrate that MADRe, by combining assembly with contig-level mapping, is more efficient than directly mapping large volumes of reads to a full reference database. 6/9

16.05.2025 08:36 👍 0 🔁 0 💬 1 📌 0

To complete the pipeline, MADRe maps reads to the reduced reference database and applies a second round of probabilistic reassignment.
This enhances classification sensitivity and filters false-positive identifications, enabling precise strain-level profiling. 5/9

16.05.2025 08:36 👍 0 🔁 0 💬 1 📌 0

This reduction identifies candidate genomes present in the sample.
However, this step alone does not eliminate all false positives and does not provide accurate abundance estimates. 4/9

16.05.2025 08:36 👍 0 🔁 0 💬 1 📌 0

MADRe begins by assembling the metagenomic sample and mapping the resulting contigs - often representing collapsed strains - to a (large) reference database.
Using EM-based read reassignment and info about strain collapses, we construct a reduced database. 3/9

16.05.2025 08:36 👍 0 🔁 0 💬 1 📌 0
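
For readers unfamiliar with EM-based reassignment, the general idea behind the step in 3/9 can be sketched as below. This is a generic textbook-style EM over ambiguous mappings, not MADRe's actual implementation: the function name `em_reassign`, the input layout (one dict of candidate reference to mapping weight per read or contig), and the `min_abundance` cutoff are all assumptions made for illustration.

```python
def em_reassign(candidates, n_iter=50, min_abundance=0.01):
    """Generic EM over ambiguous mappings (illustrative only).

    candidates: list of dicts, one per read/contig, mapping a reference
    name to a positive mapping weight (hypothetical input format).
    Returns (abundance, assignments, reduced_refs).
    """
    refs = sorted({r for c in candidates for r in c})
    abundance = {r: 1.0 / len(refs) for r in refs}  # uniform start

    for _ in range(n_iter):
        # E-step: split each item across its candidate references in
        # proportion to abundance * mapping weight.
        totals = {r: 0.0 for r in refs}
        for c in candidates:
            z = sum(abundance[r] * w for r, w in c.items())
            if z == 0:
                continue
            for r, w in c.items():
                totals[r] += abundance[r] * w / z
        # M-step: re-estimate abundances from the fractional counts.
        n = sum(totals.values())
        if n == 0:
            break
        abundance = {r: t / n for r, t in totals.items()}

    # Assign each item to its most probable reference.
    assignments = [max(c, key=lambda r: abundance[r] * c[r]) for c in candidates]
    # Keep references above a (hypothetical) abundance cutoff.
    reduced_refs = {r for r, a in abundance.items() if a >= min_abundance}
    return abundance, assignments, reduced_refs


# Toy input: three items map only to strain_A, one maps equally to A and B.
toy = [
    {"strain_A": 1.0},
    {"strain_A": 1.0},
    {"strain_A": 1.0},
    {"strain_A": 1.0, "strain_B": 1.0},
]
abundance, assignments, reduced_refs = em_reassign(toy)
```

In this toy case the ambiguous item is pulled toward `strain_A`, and `strain_B` drops out of the reduced set because nothing supports it uniquely.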

Strain-level classification requires large reference databases, especially when there is no prior knowledge about sample composition.
However, mapping reads to such large databases is computationally expensive and often impractical at scale. 2/9

16.05.2025 08:36 👍 0 🔁 0 💬 1 📌 0

I am happy to share our new preprint introducing MADRe - a pipeline for Metagenomic Assembly-Driven Database Reduction, enabling accurate and computationally efficient strain-level metagenomic classification.

🔗https://www.biorxiv.org/content/10.1101/2025.05.12.653324v1
1/9

16.05.2025 08:36 👍 15 🔁 8 💬 2 📌 0

High-quality metagenome assembly from nanopore reads with nanoMDBG https://www.biorxiv.org/content/10.1101/2025.04.22.649928v1

25.04.2025 00:46 👍 15 🔁 17 💬 0 📌 1
