Heng Li

@lh3lh3

Associate Professor DFCI & HMS

1,109 Followers · 114 Following · 53 Posts · Joined 09.09.2023
Latest posts by Heng Li @lh3lh3

🖥️ 🧬 ⚛️ 🔬 Plz spread the word. 2-week computational biology workshop in Singapore. No registration fee or expenses for US citizens. Protein annotation, function, structural modeling, and simulation. ~35 attendees. Lectures and hands-on tutorials with intensive instructor interaction. compbioasia.net

02.03.2026 21:53 👍 4 🔁 5 💬 0 📌 0
Detecting foldback artifacts in long-reads - BMC Genomics Long-read sequencing data is useful for detecting large and complex structural variations; however, technical artifacts can lead to false structural variant calls. In our analyses, we became aware of ...

Our paper on foldback artifacts in long-read sequencing is now published in BMC Genomics!

We introduce Breakinator to flag foldback and chimeric artifacts across library types, sequencers, and chemistries.

Paper: link.springer.com/article/10.1...

With Matthew Meyerson and @lh3lh3.bsky.social

24.02.2026 16:02 👍 17 🔁 6 💬 1 📌 0
PacBio Completes Sale of Short-Read Sequencing Assets - PacBio MENLO PARK, Calif., Feb. 02, 2026 (GLOBE NEWSWIRE) -- PacBio (NASDAQ: PACB), a leading developer of high-quality, highly accurate sequencing solutions, today announced the completion of the sale of sel...

Pacific Biosciences Sells Short-Read Sequencing Assets to Illumina for $48.1M
www.pacb.com/press_releas...

02.02.2026 16:49 👍 7 🔁 1 💬 1 📌 0
GitHub - bluenote-1577/savont: Amplicon sequencing variants from 16s ONT R10.4 / HiFi long reads Amplicon sequencing variants from 16s ONT R10.4 / HiFi long reads - bluenote-1577/savont

Announcing a new tool for "denoising" long-read amplicon sequences: savont.

Savont enables amplicon sequence variants (ASVs) directly from nanopore (or HiFi) long reads. Tested on 16S nanopore amplicons -- seems to work okay.

1/4

github.com/bluenote-157...

28.01.2026 18:45 👍 51 🔁 28 💬 1 📌 2
HLi Lab - Vacancies Openings

I am looking for a postdoc to develop high-performance algorithms in computational genomics. Email or DM me if interested. For more information, see hlilab.github.io/vacancies. RTs appreciated!

14.01.2026 15:44 👍 43 🔁 64 💬 1 📌 0

Probably very minor

07.01.2026 00:20 👍 1 🔁 0 💬 0 📌 0

Now published in Algorithms for Molecular Biology: link.springer.com/article/10.1.... Key message: a tiny CNN model with 7k parameters can capture the main splice signals across vertebrates and insects, and halves the junction error rate of minimap2 and miniprot. I always use this new feature now.

06.01.2026 23:02 👍 58 🔁 20 💬 1 📌 0

Now published in GigaScience: academic.oup.com/gigascience/.... Key messages: SVs are highly enriched in low-complexity/tandem-repeat regions, where they are harder to call. They behave differently from transposon insertions. Always stratify if you study SVs.

06.01.2026 22:55 👍 33 🔁 9 💬 0 📌 0
S3 Bucket Browser

To those who have open data on the AWS cloud: you can use the S3 Bucket Browser to list files in your buckets. You can either 1) enter the bucket name at lh3.github.io/s3bb/, or 2) copy this index.html github.com/lh3/s3bb/blo... to the root of a bucket. Generated by Gemini.

22.12.2025 17:41 👍 3 🔁 0 💬 1 📌 0

Thanks to the AWS Open Data program, this dataset, along with some derived data, is also openly accessible via @AWSCloud at openhgl.s3.us-east-1.amazonaws.com/index.html

22.12.2025 17:01 👍 22 🔁 6 💬 1 📌 0

Both SMEM and PML only use the rank operation, so yes, it should be easy to implement them in both tools. This just hasn't been done.

21.12.2025 14:11 👍 2 🔁 0 💬 0 📌 0

This plot compares SMEM from ropebwt3 and PML from movi. Ropebwt3 and movi don't support the same query type, so it is hard to do an apples-to-apples comparison, but movi is indeed a lot faster.

21.12.2025 04:14 👍 1 🔁 0 💬 1 📌 0
Home - ProbGen 2026

The registration deadline is fast approaching for ProbGen 2026! Abstracts are due by January 15, registration by January 31.

probgen2026.github.io

18.12.2025 17:09 👍 16 🔁 18 💬 1 📌 0

I'm recruiting a postdoc to work on algorithms for cancer genome reconstruction. We have access to a rich set of tumour samples sequenced across multiple technologies. If interested, feel free to DM. Please share.

11.12.2025 03:04 👍 13 🔁 12 💬 0 📌 1

I guess there will be a version but probably not soon

03.12.2025 16:02 👍 0 🔁 0 💬 0 📌 0
GitHub - lh3/human-asm: A collection of high-quality human genomes A collection of high-quality human genomes. Contribute to lh3/human-asm development by creating an account on GitHub.

579 high-quality human genomes from @humanpangenome.bsky.social, Arab Pangenome and individual papers (CHM13, CN1, KSA001, I002C, YAO and KOREF1). Sequences available in the AGC format (3.7GB) and FM-index in the ropebwt3 format (20.3GB). For details, see github.com/lh3/human-asm

03.12.2025 03:44 👍 57 🔁 23 💬 1 📌 1
https://ds.dfci.harvard.edu/postdocs/

📢 We are taking applications for our Postdoctoral Fellows Program at Harvard/DFCI!

🔹 Join a research group in our department
🔹 Co-mentoring opportunities with 2+ faculty
🔹 Collaborate with investigators beyond our department
🔹 Salary starts at $75K

Apply here: t.co/B7SLZzQFKu

01.12.2025 18:59 👍 9 🔁 15 💬 0 📌 0

Back online. Not sure if it is a bug in my code or a hiccup at the hosting service.

01.10.2025 14:29 👍 1 🔁 0 💬 0 📌 0

Did you know that ~60% of human SVs fall in ~1% of GRCh38? See our new preprint: arxiv.org/abs/2509.23057 and the companion blog post on how we started this project and longdust: lh3.github.io/2025/09/29/o.... Work with Alvin Qin.

30.09.2025 02:19 👍 84 🔁 30 💬 0 📌 1

arXiv accepted our assembly review two years ago. That was written in MS Word, so it was PDF-only. As I remember, they didn't require TeX source at that time. Something might have changed internally.

29.09.2025 01:26 👍 2 🔁 0 💬 0 📌 0

And learn what fully AI-generated websites look like. Avoid them, as they are more likely to be scams.

15.09.2025 12:18 👍 9 🔁 3 💬 1 📌 0

Heads up: ignore samtools dot org, similarly minimap2 dot com and likely others. It's owned by a known phishing site and while the binaries they offer look valid currently (but note they may be serving us different binaries to others), that could change.

Ie: it's not us (Samtools team)! Be warned

15.09.2025 08:40 👍 146 🔁 127 💬 2 📌 5

New blog post – A quick look at Roche's SBX
lh3.github.io/2025/09/11/a...

12.09.2025 03:26 👍 57 🔁 30 💬 2 📌 3

Now preprinted at arxiv.org/abs/2509.07357

10.09.2025 02:10 👍 22 🔁 7 💬 0 📌 0
Phishing site : minimap2.com · Issue #1316 · lh3/minimap2 Not sure how to label this one, but I have come across a website minimap2.com which appears to be AI generated but is serving it's own copy of the Github repository. If you search the address or em...

minimap2.com is potentially a phishing site. Please don't use anything from that website.
github.com/lh3/minimap2...

09.09.2025 15:39 👍 27 🔁 27 💬 1 📌 2

Preprint out for myloasm, our new nanopore / HiFi metagenome assembler!

Nanopore's getting accurate, but

1. Can this lead to better metagenome assemblies?
2. How, algorithmically, to leverage them?

with co-author Max Marin @mgmarin.bsky.social, supervised by Heng Li @lh3lh3.bsky.social

1 / N

07.09.2025 23:34 👍 114 🔁 80 💬 5 📌 5

High-resolution metagenome assembly for modern long reads with myloasm https://www.biorxiv.org/content/10.1101/2025.09.05.674543v1

07.09.2025 04:47 👍 18 🔁 9 💬 0 📌 1

Of course, also thank Andrea Guarracino and Andrew Carroll for their quick and careful review!

04.09.2025 17:09 👍 3 🔁 0 💬 0 📌 0

"Received: July 4, 2025. Revised: August 7, 2025. Accepted: August 15, 2025" and published on September 4. This is a simple and straightforward paper, but the speedy editorial process is still impressive. It could have been even faster if I had responded to the initial editorial request more promptly.

04.09.2025 16:55 👍 10 🔁 1 💬 2 📌 1

Now published in GigaScience with minor improvements: academic.oup.com/gigascience/...

* Download: zenodo.org/records/1490...
* More info: github.com/lh3/panmask

04.09.2025 16:44 👍 31 🔁 10 💬 1 📌 1