
Gaëtan Benoit

@gaetanbenoit

Postdoctoral researcher in bioinformatics at the Pasteur Institute. Scalable methods and software for metagenomics. https://github.com/GaetanBenoitDev

116 Followers · 59 Following · 7 Posts · Joined 06.01.2024

Latest posts by Gaëtan Benoit @gaetanbenoit

Water mass specific genes dominate the Southern Ocean microbiome Nature Communications - Southern Ocean microbial communities are less well studied. Here, the authors generate a circumpolar-scale gene catalog from 218 metagenomics samples revealing broadscale...

Very happy to share the latest paper of our group (et al.)! rdcu.be/e7zyX . This one has a special place… 1/n

10.03.2026 19:45 👍 12 🔁 6 💬 1 📌 1
Multi-context seeds enable fast and high-accuracy read mapping - Genome Biology A key step in sequence similarity search is to identify shared seeds between a query and a reference sequence. A well-known tradeoff is that longer seeds offer fast searches but reduce sensitivity in ...

1/ Our paper on Multi-Context Seeds is now out, with @tolyan.bsky.social spearheading the work and contributions from Nicolas and @marcelm.net. We introduce a new seeding concept that improves read alignment accuracy while maintaining speed.
link.springer.com/article/10.1...

09.03.2026 12:22 👍 19 🔁 12 💬 1 📌 0
Release SeqKit v2.13.0 (10-year-old birthday version) · shenwei356/seqkit Changelog SeqKit is 10 years old! SeqKit v2.13.0 - 2026-02-28 seqkit: add support for reading and writing LZ4 compression format. new command: seqkit sample2: improved seqkit sample by @stahiga....

Can't wait to release a 10-year-old birthday version for SeqKit!

- 10 years
- 2 papers, 3500 citations
- 20 contributors
- 40 subcommands
- 880 commits
- 500 issues
- 685.5K Bioconda total downloads

Thank you all, dear contributors and users!
I'll keep maintaining it.

github.com/shenwei356/s...

27.02.2026 13:25 👍 124 🔁 35 💬 6 📌 1

How would you design a *multithreaded*, *concurrent* & *dynamic* hash table if you are focused specifically on common k-mer workloads, where streaming query & insertion are common? Jamshed, Prashant and I explore this in kache-hash, a cache-friendly k-mer hash table!
www.biorxiv.org/content/10.6...

17.02.2026 18:49 👍 20 🔁 13 💬 0 📌 0
EBAME workshop EBAME - Computational Microbial Ecogenomics Workshop

Interested in developing your skills in microbial 'omics? Consider joining us in Brest 🇫🇷, Oct. 10-24, for two weeks of intensive lectures and tutorials from top faculty and TAs! maignienlab.gitlab.io/ebame/
Bonus: beautiful seascape and friendly spirit!

16.02.2026 10:03 👍 9 🔁 7 💬 0 📌 0
ZOR filters: fast and smaller than fuse filters Probabilistic membership filters support fast approximate membership queries with a controlled false-positive probability $\varepsilon$ and are widely used across storage, analytics, networking, and b...

Preprint alert!
arxiv.org/abs/2602.03525
TLDR:
ZOR filters are STATIC filters with false positives.
- Almost memory optimal: <1% overhead over the theoretical lower bound (!!!)
- Fast queries: ~100 ns
- Construction cannot fail

A thread:

04.02.2026 12:28 👍 31 🔁 12 💬 1 📌 1
Multiple protein structure alignment at scale with FoldMason Protein structure is conserved beyond sequence, making multiple structural alignment (MSTA) essential for analyzing distantly related proteins. Computational prediction methods have vastly extended ou...

FoldMason is out now in @science.org. It generates accurate multiple structure alignments for thousands of protein structures in seconds. Great work by Cameron L. M. Gilchrist and @milot.bsky.social.
📄 www.science.org/doi/10.1126/...
🌐 search.foldseek.com/foldmason
💾 github.com/steineggerla...

30.01.2026 06:11 👍 300 🔁 147 💬 4 📌 3
GitHub - bluenote-1577/savont: Amplicon sequencing variants from 16s ONT R10.4 / HiFi long reads

Announcing a new tool for "denoising" long-read amplicon sequences: savont.

Savont enables amplicon sequence variants (ASVs) directly from nanopore (or HiFi) long reads. Tested on 16S nanopore amplicons -- seems to work okay.

1/4

github.com/bluenote-157...

28.01.2026 18:45 👍 51 🔁 28 💬 1 📌 2

Very excited about this latest work led by @jermp.bsky.social! Since its initial release, SSHash has served as the basis for several other tools (Fulgor, piscem, etc.). It was already very fast. It is now *substantially* faster!

www.biorxiv.org/content/10.6...

22.01.2026 21:16 👍 24 🔁 10 💬 1 📌 1
GitHub - ebiggers/libdeflate: Heavily optimized library for DEFLATE/zlib/gzip compression and decompression

🗜️⚡ If you use gzip/gunzip a lot in your pipelines, switch to the faster "libdeflate" versions instead! They use modern CPU capabilities to achieve a 2-3x speedup.

libdeflate is in conda, and "libdeflate-gzip" and "libdeflate-gunzip" are drop-in replacements. #unix

github.com/ebiggers/lib...

20.01.2026 01:37 👍 71 🔁 23 💬 1 📌 0

We just released #anvio v9, "eunice" 🎉

This version represents over 2,000 changes in the codebase since v8, increasing the total number of programs in the anvi'o ecosystem to 176.

Read the release notes:

github.com/merenlab/anv...

Visit our up-to-date web page:

anvio.org

20.01.2026 11:48 👍 71 🔁 34 💬 2 📌 3

🫣

17.01.2026 13:37 👍 3 🔁 0 💬 1 📌 0

Now published in Algorithms for Molecular Biology: link.springer.com/article/10.1.... Key message: a tiny CNN model with 7k parameters can capture the main splice signals across vertebrates and insects, and it halves the minimap2 & miniprot junction error rate. I always use this new feature now.

06.01.2026 23:02 👍 58 🔁 20 💬 1 📌 0

Now published in Nature Biotechnology:
go.nature.com/44P7nSm
If you missed it, the TL;DR is in my April thread below

06.01.2026 09:38 👍 59 🔁 35 💬 1 📌 0
Release Heading into the sunset · tseemann/prokka The future This is probably the last release of Prokka. I won't be making any code changes except bug fixes. I will update the databases occasionally. I strongly recommend you use Bakta by @oschwen...

💾 Prokka 1.15.6 is released!

This is the last major release of Prokka. But don't be sad, because @oschwengers.bsky.social already has an excellent replacement called Bakta you can migrate to.
#bioinformatics #microbiology #genomics

github.com/tseemann/pro...

15.12.2025 21:09 👍 117 🔁 60 💬 3 📌 2

Preprint Alert!
With @tmthrz.bsky.social and @rayanchikhi.bsky.social, we aim to tackle practical unitig compression!
A thread:

15.12.2025 15:18 👍 18 🔁 12 💬 2 📌 0

1/9 Just out:

k-mer indexes are the backbone of fast search in genomic data, but many degrade under small k, subsampling, or high diversity.

With Ondřej Sladký and @pavelvesely.bsky.social we asked: can we build one that works efficiently for any k-mer set?

05.12.2025 17:42 👍 27 🔁 13 💬 1 📌 1

Preprint out! Check out our new long-read metagenomic SNP-caller, SNooPy 😀. Work with Chris Quince. Thread 🧵
👉 www.biorxiv.org/content/10.6...

04.12.2025 13:18 👍 13 🔁 8 💬 1 📌 0

Preprint alert!

We introduce new ideas to revisit the notion of sampling with window guarantees, also known as minimizers.

A thread:

02.12.2025 11:11 👍 15 🔁 7 💬 1 📌 2
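
For readers new to the term in the post above: a minimal sketch of classic window minimizers, not the new scheme from the preprint. It assumes plain lexicographic order on k-mers (real tools usually order by a hash instead), and the function name `minimizer_positions` is made up for illustration. The "window guarantee" is that every run of w consecutive k-mers contains at least one sampled k-mer.

```python
def minimizer_positions(seq, k, w):
    """Return sorted start positions of the k-mers sampled as window minimizers.

    Assumption for illustration: k-mers are compared lexicographically,
    and ties are broken by taking the leftmost k-mer in the window.
    """
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    selected = set()
    # Slide a window over every run of w consecutive k-mers.
    for start in range(len(kmers) - w + 1):
        window = kmers[start:start + w]
        # min() returns the first minimal index, i.e. the leftmost smallest k-mer.
        offset = min(range(w), key=lambda j: window[j])
        selected.add(start + offset)
    return sorted(selected)


if __name__ == "__main__":
    # Hypothetical toy sequence, chosen only to show the call.
    print(minimizer_positions("ACGTTGCATGACCA", k=3, w=4))
```

Because one position is picked from every window, two sequences that share a long enough substring are guaranteed to share at least one sampled k-mer, which is what makes the sample usable as an index.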
Intro to Bedder – The Quinlan Lab

We are thrilled to announce the first official release (v0.1.8) of #𝗯𝗲𝗱𝗱𝗲𝗿, the successor to one of our flagship tools, #𝗯𝗲𝗱𝘁𝗼𝗼𝗹𝘀! Based on ideas we conceived of long ago (!), this was achieved thanks to the dedication of Brent Pedersen.

1/n

02.12.2025 02:28 👍 298 🔁 152 💬 5 📌 11

Preprint Alert!
We present new strategies to accelerate large-scale document comparison using MinHash-like sketches.

A thread:

01.12.2025 14:57 👍 12 🔁 8 💬 1 📌 0
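
As background for the post above: a minimal sketch of a standard bottom-s MinHash sketch, not the accelerated strategies from the preprint. The helper names (`kmers`, `hash64`, `bottom_sketch`, `estimate_jaccard`) and the choice of BLAKE2b as the hash are assumptions made for illustration.

```python
import hashlib


def kmers(seq, k):
    """Set of all k-length substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def hash64(item):
    """Deterministic 64-bit hash of a string (BLAKE2b truncated to 8 bytes)."""
    digest = hashlib.blake2b(item.encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def bottom_sketch(items, size):
    """Keep the `size` smallest distinct hash values of the item set."""
    return sorted({hash64(x) for x in items})[:size]


def estimate_jaccard(sketch_a, sketch_b, size):
    """Estimate Jaccard similarity from two bottom-s sketches.

    Take the `size` smallest hashes of the union of both sketches and
    count how many of them appear in both sketches.
    """
    merged = sorted(set(sketch_a) | set(sketch_b))[:size]
    if not merged:
        return 0.0
    shared = set(sketch_a) & set(sketch_b)
    return sum(1 for h in merged if h in shared) / len(merged)


if __name__ == "__main__":
    # Hypothetical toy documents, used only to show the call pattern.
    a = bottom_sketch(kmers("ACGTACGTGACCTGA", 4), size=8)
    b = bottom_sketch(kmers("ACGTACGTGACGTGA", 4), size=8)
    print(estimate_jaccard(a, b, size=8))
```

Comparing two documents then costs time proportional to the sketch size rather than the document size, which is the property that large-scale comparison methods build on.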
GitHub - COMBINE-lab/mim: A small, auxiliary index to massively improve parallel fastq parsing

OK, the mim (github.com/COMBINE-lab/...) preprint is submitted! Excited for folks to see it and share thoughts. The key takeaway: mim allows the quick, one-time building of a small auxiliary index that then lets gzipped FASTQ parsing scale linearly with the number of threads. 1/2

25.11.2025 14:13 👍 27 🔁 13 💬 1 📌 3

Yohan Hernandez–Courbevoie presenting REINDEER2 at Seqbim!

For those who missed it, the introduction thread of REINDEER2

bsky.app/profile/npma...

24.11.2025 12:41 👍 6 🔁 2 💬 0 📌 1
GitHub - COMBINE-lab/mim: A small, auxiliary index to massively improve parallel fastq parsing

@wytamma.bsky.social : so, it took a little bit of extra time (not the flight back from the CZI meeting), but I decided to just f#&$ing do it, and the basic code to build and parse with the auxiliary fastq index is working (github.com/COMBINE-lab/...). 1/2

19.11.2025 03:01 👍 25 🔁 15 💬 3 📌 0
Logo of Bin Chicken (Australian white ibis) on a rubbish bin, pulling out a strand of DNA


“Bin Chicken” is now published in Nature Methods! It substantially improves genome recovery through rational coassembly 🧬🖥️. Applied to public 🌍 metagenomes, we recovered 24,000 novel species 🦠, including 6 new phyla.
doi.org/10.1038/s415...
@benjwoodcroft.bsky.social @rhysnewell.bsky.social
🧵1/6

13.11.2025 10:08 👍 73 🔁 38 💬 2 📌 4

Metagenomics colleagues!

I'm looking for studies where both Illumina and ONT sequencing were performed on the same samples from soil, human, ruminant, and other sample types for comparison. Bonus if those studies include PacBio data.

Please help and share!

11.11.2025 20:21 👍 15 🔁 21 💬 6 📌 1
Genome size estimation from long read overlaps. Abstract. Motivation: Accurate genome size estimation is an important component of genomic analyses such as assembly and coverage calculation, though existin

Our method for genome size estimation from long-read overlaps is now published 🥳
academic.oup.com/bioinformati...

07.11.2025 03:18 👍 37 🔁 16 💬 1 📌 1

I was here 😺

26.10.2025 15:00 👍 1 🔁 0 💬 1 📌 0
GitHub - mohsenzakeri/Movi: Fast, Cache-Efficient, and Scalable Queries on Pangenomes

1/6 Movi 2 is here: faster and more space-efficient for pangenome queries. Its fastest mode uses half the memory of Movi 1 while running ~30% faster. github.com/mohsenzakeri...

21.10.2025 20:00 👍 44 🔁 24 💬 1 📌 2
‘Google for DNA’ brings order to biology’s big data MetaGraph compresses vast data archives into a search engine for scientists, opening up new frontiers of biological discovery.

It doesn't happen that often: an article published in Nature puts my community (sequence bioinformatics) in the spotlight. Shall I tell you about it?
www.nature.com/articles/d41...

09.10.2025 15:00 👍 28 🔁 14 💬 1 📌 1