Very happy to share the latest paper of our group (et al.)! rdcu.be/e7zyX . This one has a special place… 1/n
1/ Our paper on Multi-Context Seeds is now out, with @tolyan.bsky.social spearheading the work and contributions from Nicolas and @marcelm.net. We introduce a new seeding concept that improves read alignment accuracy while maintaining speed.
link.springer.com/article/10.1...
Can't wait to release a 10th-birthday version of SeqKit!
- 10 years
- 2 papers, 3500 citations
- 20 contributors
- 40 subcommands
- 880 commits
- 500 issues
- 685.5K Bioconda total downloads
Thank you all, dear contributors and users!
I'll keep maintaining it.
github.com/shenwei356/s...
How would you design a *multithreaded*, *concurrent* & *dynamic* hash table if you are focused specifically on common k-mer workloads, where streaming query & insertion are common? Jamshed, Prashant and I explore this in kache-hash, a cache-friendly k-mer hash table!
www.biorxiv.org/content/10.6...
Interested in developing your skills in microbial 'omics? Consider joining us in Brest 🇫🇷, Oct. 10-24 for two weeks of intensive lectures and tutorials from top faculty and TAs! maignienlab.gitlab.io/ebame/
Bonus: beautiful seascape and friendly spirit!
Preprint alert!
arxiv.org/abs/2602.03525
TLDR:
ZOR filters are STATIC filters with false positives.
- Almost memory optimal: <1% overhead over the theoretical lower bound (!!!)
- Fast queries: ~100 ns
- Construction cannot fail
A thread:
FoldMason is out now in @science.org. It generates accurate multiple structure alignments for thousands of protein structures in seconds. Great work by Cameron L. M. Gilchrist and @milot.bsky.social.
📄 www.science.org/doi/10.1126/...
🌐 search.foldseek.com/foldmason
💾 github.com/steineggerla...
Announcing a new tool for "denoising" long-read amplicon sequences: savont.
Savont enables inference of amplicon sequence variants (ASVs) directly from nanopore (or HiFi) long reads. Tested on 16S nanopore amplicons -- seems to work okay.
1/4
github.com/bluenote-157...
Very excited about this latest work led by @jermp.bsky.social! Since its initial release, SSHash has served as the basis for several other tools (Fulgor, piscem, etc.). It was already very fast. It is now *substantially* faster!
www.biorxiv.org/content/10.6...
🗜️⚡ If you use gzip/gunzip a lot in your pipelines, switch to the faster "libdeflate" versions instead! They use modern CPU capabilities to achieve a 2-3x speedup.
libdeflate is in conda, and "libdeflate-gzip" and "libdeflate-gunzip" are drop-in replacements. #unix
github.com/ebiggers/lib...
We just released #anvio v9, "eunice" 🎉
This version represents over 2,000 changes in the codebase since v8, increasing the total number of programs in the anvi'o ecosystem to 176.
Read the release notes:
github.com/merenlab/anv...
Visit our up-to-date web page:
anvio.org
🫣
Now published in Algorithms for Molecular Biology: link.springer.com/article/10.1.... Key message: a tiny CNN model with 7k parameters can capture main splice signals across vertebrates+insect and halves the minimap2 & miniprot junction error rate. I always use this new feature now.
Now published in Nature Biotechnology:
go.nature.com/44P7nSm
If you missed it, the TL;DR is in my April thread below
💾 Prokka 1.15.6 is released!
This is the last major release of Prokka. But don't be sad, because @oschwengers.bsky.social already has an excellent replacement called Bakta you can migrate to.
#bioinformatics #microbiology #genomics
github.com/tseemann/pro...
Preprint Alert!
With @tmthrz.bsky.social and @rayanchikhi.bsky.social we aim to tackle practical unitigs compression!
A thread:
1/9 Just out:
k-mer indexes are the backbone of fast search in genomic data, but many degrade under small k, subsampling, or high diversity.
With Ondřej Sladký and @pavelvesely.bsky.social we asked: can we build one that works efficiently for any k-mer set?
Preprint out! Check out our new long-read metagenomic SNP-caller, SNooPy 😀. Work with Chris Quince. Thread 🧵
👉 www.biorxiv.org/content/10.6...
Preprint alert!
We introduce new ideas to revisit the notion of sampling with window guarantees, also known as minimizers.
A thread:
We are thrilled to announce the first official release (v0.1.8) of #𝗯𝗲𝗱𝗱𝗲𝗿, the successor to one of our flagship tools, #𝗯𝗲𝗱𝘁𝗼𝗼𝗹𝘀! Based on ideas we conceived of long ago (!), this was achieved thanks to the dedication of Brent Pedersen.
1/n
Preprint Alert!
We present new strategies to accelerate large-scale document comparison using MinHash-like sketches.
A thread:
OK, mim (github.com/COMBINE-lab/...) preprint submitted! Excited for folks to see it and share thoughts. The key takeaway: mim allows the quick, one-time building of a small auxiliary index that then lets gzipped FASTQ parsing scale linearly with the number of threads. 1/2
Yohan Hernandez–Courbevoie presenting REINDEER2 at Seqbim!
For those who missed it, the introduction thread of REINDEER2
bsky.app/profile/npma...
@wytamma.bsky.social : so, it took a little bit of extra time (not the flight back from the CZI meeting), but I decided to just f#&$ing do it, and the basic code to build and parse with the auxiliary fastq index is working (github.com/COMBINE-lab/...). 1/2
Logo of Bin Chicken (Australian white ibis) on a rubbish bin, pulling out a strand of DNA
“Bin Chicken” is now published in Nature Methods! It substantially improves genome recovery through rational coassembly 🧬🖥️. Applied to public 🌍 metagenomes, we recovered 24,000 novel species 🦠, including 6 new phyla.
doi.org/10.1038/s415...
@benjwoodcroft.bsky.social @rhysnewell.bsky.social
🧵1/6
Metagenomics colleagues!
I'm looking for studies where both Illumina and ONT sequencing were performed on the same samples from soil, human, ruminant, and other sample types for comparison. Bonus if those studies include PacBio data.
Please help and share!
Our method for genome size estimation from long-read overlaps is now published 🥳
academic.oup.com/bioinformati...
I was here 😺
1/6 Movi 2 is here: faster and more space-efficient for pangenome queries. Its fastest mode uses half the memory of Movi 1 while running ~30% faster. github.com/mohsenzakeri...
It doesn't happen that often: an article published in Nature puts my community (sequence bioinformatics) in the spotlight. Shall I tell you about it?
www.nature.com/articles/d41...