James Hemker (@jahemker)

Proud that this work is now published!

We hope it helps orient future #Nanopore long-read sequencing studies hunting for structural variants

Also really happy to share eight new high-coverage (>100x) long-read read pools and assemblies with the #Drosophila community!

doi.org/10.1093/g3jo...

11.03.2026 03:44 👍 39 🔁 21 💬 0 📌 2

Time-calibrated phylogeny of the family Drosophilidae showing major species groups, surrounded by representative adult flies that highlight the remarkable morphological diversity of this model insect group. The composite image was created using Microsoft PowerPoint. Fly photographs were taken by Darren J. Obbard and are published under the Creative Commons Attribution License with permission.

The #fly community aims to achieve a comprehensive genomic study of the Drosophilidae family. @pankajd.bsky.social @bernardkim.bsky.social @petrovadmitri.bsky.social @darrenobbard.bsky.social present a comparative gene annotation for 301 #Drosophilidae species @plosbiology.org 🧪 plos.io/4c3pyrI

19.02.2026 13:55 👍 55 🔁 29 💬 0 📌 0

High-resolution mapping of a rapidly evolving complex trait reveals genotype-phenotype stability and an unpredictable genetic architecture of adaptation The extent to which adaptation can be predicted, particularly for traits with complex genetic bases, is unknown. Here, we leveraged a model complex trait, model species, and high-powered longitudinal ...

Thrilled to finally share the magnum opus of my PhD that focuses on the genetic basis of evolutionary change! Specifically, we know we can map the genetic basis of a trait, but can we tell which genes will underlie the trait shift when it evolves? doi.org/10.1101/2025...

18.11.2025 00:14 👍 65 🔁 30 💬 2 📌 3

Super excited that the bulk of my PhD work is now preprinted! Here we used whole-community competition, or coalescence, experiments to quantify selection acting on genetically diverged strains within larger communities. (1/n)
www.biorxiv.org/content/10.1...

11.11.2025 17:14 👍 102 🔁 48 💬 3 📌 2

Massively parallel interrogation of the fitness of natural variants in ancient signaling pathways reveals pervasive local adaptation The nature of standing genetic variation remains a central debate in population genetics, with differing perspectives on whether common variants are almost always neutral as suggested by neutral and n...

One of the most exciting works of my career, years in the making. We used high-throughput precision genome editing to test the fitness effects of thousands of natural variants. Our findings challenge the long-held assumption that common variants are inconsequential.

www.biorxiv.org/content/10.1...

22.10.2025 17:45 👍 165 🔁 85 💬 5 📌 6

Junior, Assistant, or Associate Specialist – Xue Lab University of California, Irvine is hiring. Apply now!

The Xue lab at UC Irvine is looking for a staff scientist to support our work investigating how microbes interact and evolve in the gut microbiome! Open to a wide range of previous experience levels, see ad for more.
recruit.ap.uci.edu/JPF09601

17.07.2025 20:32 👍 117 🔁 112 💬 0 📌 3

Unfortunately we haven’t had the chance to look at other organisms. I think repeat % could matter more if it starts creating large blocks of repetitive sequence (which would then need longer reads). More # of repeat elements should be ok, provided they are surrounded by unique flanking sequence.

21.06.2025 21:05 👍 1 🔁 0 💬 0 📌 0

Grateful to be talking at #Evol2025! Will be presenting on how long Nanopore reads need to be in order to accurately call structural variants in Drosophila at the population level. Talk is at 3pm on Saturday in the Genomics III section. If you can’t make it, get in touch!

20.06.2025 22:32 👍 13 🔁 3 💬 0 📌 0

Increased rates of hybridization in swordtail fish are associated with water pollution Biodiversity loss can occur when disturbance compromises the reproductive barriers between species, causing them to collapse into a single population through hybridization. Recent research has documen...

1/9 Excited to share the preprint for the second half of my PhD with @mollyschumer.bsky.social: Connecting human environmental impacts to hybridization in swordtail fish through genomics, GIS, water chemistry, and histology. Thread below!

#SciSky

www.biorxiv.org/content/10.1...

01.05.2025 13:15 👍 59 🔁 25 💬 2 📌 2

Many thanks to my co-authors, @hgellert.bsky.social, Jess Smiley-Rhodes, @bernardkim.bsky.social, and @petrovadmitri.bsky.social, without whom this work would not have been possible!

25.04.2025 20:03 👍 1 🔁 0 💬 0 📌 0

Our results suggest that reads at least 3x longer than the largest repetitive elements are required to avoid SV-calling errors from these elements. They also highlight that SV-calling is a species-specific problem, as the repeat landscape varies greatly across taxa.

25.04.2025 20:03 👍 1 🔁 0 💬 2 📌 0
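The 3x rule of thumb in the post above can be turned into a quick check on a read set. This is only a sketch: the function names, the example read lengths, and the 10 kb repeat size are hypothetical illustrations, not values from the study.

```python
def min_read_length(largest_repeat_bp, factor=3):
    """Shortest read length suggested by the rule of thumb:
    reads at least `factor` times the largest repetitive element."""
    return factor * largest_repeat_bp


def fraction_of_bases_in_long_reads(read_lengths, largest_repeat_bp, factor=3):
    """Fraction of sequenced bases that sit in reads meeting the threshold."""
    threshold = min_read_length(largest_repeat_bp, factor)
    total = sum(read_lengths)
    if total == 0:
        return 0.0
    return sum(n for n in read_lengths if n >= threshold) / total


# Hypothetical example: a 10 kb largest repeat and three reads.
reads = [10_000, 30_000, 60_000]
print(min_read_length(10_000))
print(fraction_of_bases_in_long_reads(reads, 10_000))
```

Because the repeat landscape differs across taxa, the largest-repeat value has to be supplied per species.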

Finally, we short-read sequenced our inbred lines as the vast majority of genomic data is from NGS. Unsurprisingly, short-read data had the poorest accuracy, as well as the most significant biases against insertions and the largest number of spurious inversion calls.

25.04.2025 20:03 👍 0 🔁 0 💬 1 📌 0

We additionally downsampled our 30x-coverage ultra-long reads to 20x- and 10x- coverage. We found that accuracy decreased even at 20x-coverage, and neither low-coverage distribution could recover all three of the cosmopolitan inversions.

25.04.2025 20:03 👍 0 🔁 0 💬 1 📌 0

A second significant source of error came from incorrectly merging strain-level SVs into the population call sets. While joint genotyping is well-understood for SNPs, merging SVs is still a highly complex problem that needs to rely on multiple lines of evidence to establish that two SVs are the same.

25.04.2025 20:03 👍 0 🔁 0 💬 1 📌 0
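To make "multiple lines of evidence" in the post above concrete, here is a minimal sketch of a pairwise check on type, position, and size. It is not how Jasmine or any other merging tool works; the `SV` class, the function name, and both thresholds are assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SV:
    chrom: str
    start: int    # breakpoint position in bp
    length: int   # SV length in bp
    svtype: str   # e.g. "INS", "DEL", "INV"


def likely_same_sv(a, b, max_start_dist=500, min_size_ratio=0.8):
    """Return True only if every line of evidence agrees.

    The thresholds are hypothetical defaults, not values from the study.
    """
    # Evidence 1: same chromosome and same SV type.
    if a.chrom != b.chrom or a.svtype != b.svtype:
        return False
    # Evidence 2: breakpoints lie close together.
    if abs(a.start - b.start) > max_start_dist:
        return False
    # Evidence 3: sizes are similar (smaller / larger ratio).
    smaller, larger = sorted((a.length, b.length))
    if larger <= 0:
        return False
    return smaller / larger >= min_size_ratio


x = SV("2L", 1_000, 300, "DEL")
y = SV("2L", 1_100, 280, "DEL")
print(likely_same_sv(x, y))
```

A position-and-size check like this would still be fooled by repetitive insertions of similar length, which is one reason real merging may also need sequence-level comparison.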

We were able to determine the cause of false positive SV calls from manual validation. More than half of all errors came from misalignments due to transposable elements (TEs) or complex genomic loci — except for the ultra-long distribution, which had no major issues with TEs or complex regions.

25.04.2025 20:03 👍 0 🔁 0 💬 1 📌 0

When we focused on SVs > 10kb, we found that the ultra-long data was very clearly the most accurate. Only the ultra-long data had 100% accuracy when calling large inversions, finding three cosmopolitan inversions in our dataset.

25.04.2025 20:03 👍 0 🔁 0 💬 1 📌 0

We report significant shifts in SV-calling accuracy at the population level when systematically varying read length within D. melanogaster. Our ultra-long read distribution (as defined by ONT: read N50 > 50kb) called more SVs, and at significantly higher accuracy, than any other distribution.

25.04.2025 20:03 👍 0 🔁 0 💬 1 📌 0

As no definitive benchmark SV call sets exist for D. melanogaster, we then manually validated more than 2,300 SVs at over 18,000 genomic loci across the read-length distributions to assess variant-calling accuracy. Validation was done by visualizing read alignments in JBrowse 2.

25.04.2025 20:03 👍 0 🔁 0 💬 1 📌 0

Using a combination of read-based (Sniffles2, cuteSV, DeBreak) and assembly-based callers (svim-asm, PAV), we called structural variants in every line and read-length distribution. Each strain's calls were merged together (Jasmine) for each distribution, creating a pop. level variant call set.

25.04.2025 20:03 👍 0 🔁 0 💬 1 📌 0

To investigate this, we Nanopore sequenced eight D. melanogaster inbred lines to extremely high coverage (mean 238x) and then downsampled the reads to create 30x-coverage pools of distinct read-length distributions (as quantified by read N50). We additionally assembled genomes for each pool.

25.04.2025 20:03 👍 0 🔁 0 💬 1 📌 0
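For readers unfamiliar with the two quantities in the post above, read N50 and coverage can be computed from read lengths alone. The sketch below also shows a plain random downsample to a target coverage. It does not reproduce the study's procedure for building pools with distinct read-length distributions, and the function names and toy genome size are assumptions.

```python
import random


def read_n50(lengths):
    """Length L such that reads of length >= L hold at least half of all bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty input


def mean_coverage(lengths, genome_size):
    """Total sequenced bases divided by genome size."""
    return sum(lengths) / genome_size


def downsample_to_coverage(lengths, genome_size, target_coverage, seed=0):
    """Keep randomly chosen reads until the target coverage is reached.

    Uniform random sampling only; it does not shape the length distribution.
    """
    rng = random.Random(seed)
    shuffled = list(lengths)
    rng.shuffle(shuffled)
    target_bases = target_coverage * genome_size
    kept, bases = [], 0
    for length in shuffled:
        if bases >= target_bases:
            break
        kept.append(length)
        bases += length
    return kept


# Toy example: 1,000 reads of 100 bp over a hypothetical 1,000 bp genome.
reads = [100] * 1000
pool = downsample_to_coverage(reads, genome_size=1000, target_coverage=30)
print(read_n50(pool), mean_coverage(pool, 1000))
```

The same downsampling function could be called with a target of 20 or 10 to mimic lower-coverage pools.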

Just as a drastic impact on structural variant calling was seen moving from short reads to long reads, it is likely that the accuracy of structural variant calling significantly varies along the spectrum of read lengths produced by long-read sequencing methods.

25.04.2025 20:03 👍 0 🔁 0 💬 1 📌 0

@nanoporetech.com long reads can range 1000-fold in length (100s bp - 1Mb) in a single sequencing run, but the consequences of significantly different long-read lengths for the accuracy of genome-wide structural variant calling are not well understood, especially for non-human species.

25.04.2025 20:03 👍 0 🔁 0 💬 1 📌 0

The increasing availability of long-read data is leading to a boom in population-level SV datasets. These datasets are critical for uncovering SV polymorphisms, which better capture the dynamic evolutionary processes that shape SV diversity.

25.04.2025 20:03 👍 0 🔁 0 💬 1 📌 0

Manual validation finds only ultra-long long-read sequencing enables faithful, population-level structural variant calling in Drosophila melanogaster euchromatin The increasing accessibility of long-read sequencing and the rapid development of automated variant callers are promoting the generation of population-level structural variation data. However, the eff...

Excited to share the first manuscript from my PhD in which we leveraged ultra-long Nanopore sequencing, D. melanogaster inbred lines, and a ton of manual validation to investigate the effects of long-read length on population-level structural variant (SV) calling accuracy! doi.org/10.1101/2025...

25.04.2025 20:03 👍 56 🔁 24 💬 1 📌 2

Thrilled to see this work, led by @lisacouper.bsky.social, now out!
We measured variation in thermal tolerance in the mosquito Aedes sierrensis to quantify how adaptation may alter disease vector distributions under warming. 🧵
www.pnas.org/doi/10.1073/...

21.01.2025 15:56 👍 18 🔁 7 💬 1 📌 0
