Vikram Shivakumar

@vikramshivakumar

PhD Student @ JHU Langmead Lab

158 Followers · 131 Following · 26 Posts · Joined 06.12.2023

Latest posts by Vikram Shivakumar @vikramshivakumar

1/ Excited to share my first first-author preprint from my PhD!

We introduce Perseus, a lineage-aware confidence estimation framework for taxonomic classification in long-read metagenomics.

Preprint: www.biorxiv.org/content/10.6...
Code: github.com/matnguyen/Pe...

09.03.2026 15:25 👍 14 🔁 8 💬 1 📌 0

I wish I had this a few months back while searching for a postdoc 😅

02.03.2026 15:31 👍 0 🔁 0 💬 0 📌 0

Awesome! Thanks for sharing the code too!

02.03.2026 15:30 👍 0 🔁 0 💬 0 📌 0

Does this take into account how many pubs the first author had at publication time or what they have currently, when applying the filter? This is a really nifty measure and tool!

02.03.2026 15:14 👍 0 🔁 0 💬 1 📌 0

As an alternative to the h-index, I made the Mentorship Index (M-index) to proxy a scientist's contribution to mentoring junior scientists. Ex. M10-index = # last-author publications where the first author had < 10 pubs.

Calculate yours: jef.works/Mentorship-I...
Blog: jef.works/blog/2026/03...

🧵👇

02.03.2026 13:25 👍 10 🔁 3 💬 4 📌 2
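
A minimal sketch of the M-index definition in the post above, assuming each paper is recorded with its last author and a publication count for its first author. The `Paper` class, its field names, and `m_index` are hypothetical and not taken from the jef.works tool; whether the count is measured at publication time or today is left to whoever supplies the data.

```python
from dataclasses import dataclass


@dataclass
class Paper:
    last_author: str
    first_author_pubs: int  # hypothetical field: first author's publication count


def m_index(papers, mentor, threshold=10):
    """Count papers where `mentor` is last author and the first author had < threshold pubs."""
    return sum(
        1
        for p in papers
        if p.last_author == mentor and p.first_author_pubs < threshold
    )


# Toy data, invented purely for illustration.
papers = [Paper("A", 3), Paper("A", 12), Paper("B", 1), Paper("A", 9)]
print(m_index(papers, "A"))                # M10-index for author "A"
print(m_index(papers, "A", threshold=13))  # a looser M13-index
```

Changing `threshold` gives the other variants of the index (M5, M20, and so on) without touching the counting logic.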

New tool from @alexsweeten.bsky.social to find and classify all your satellites: "AniAnn's: alignment-free annotation of tandem repeat arrays using fast average nucleotide identity estimates"
📄 www.biorxiv.org/content/10.6...
📦 github.com/marbl/anianns

29.01.2026 13:05 👍 44 🔁 19 💬 4 📌 1
Jobs | University of Utah Founded in 1850, The University of Utah is the flagship institution of higher learning in Utah, and offers over 100 undergraduate and more than 90 graduate degree programs to over 30,000 students. Uni...

I am hiring a staff bioinformatician for my new lab at the University of Utah! Please consider applying if you are on the hunt:
employment.utah.edu/salt-lake-ci...

12.01.2026 21:04 👍 31 🔁 32 💬 1 📌 1
HLi Lab - Vacancies Openings

I am looking for a postdoc to develop high-performance algorithms in computational genomics. Email or DM me if interested. For more information, see hlilab.github.io/vacancies. RTs appreciated!

14.01.2026 15:44 👍 43 🔁 64 💬 1 📌 0

(t)rust the process?

23.11.2025 06:35 👍 1 🔁 0 💬 0 📌 0

Really excited to see our new work in scaling Mumemto to any size pangenome published in Genome Research this morning. And right on cue with the great opportunity to present this work at #GI2025 this week.

07.11.2025 21:29 👍 16 🔁 5 💬 1 📌 0

Nicole Brown gave a fantastic talk on Identifying introgressions across pangenomes with Panagram

It uses k-mer conservation to annotate genomic variation across hundreds of genomes, followed by normalization of k-mer profiles to identify introgression events
github.com/kjenike/pana... #GI2025

06.11.2025 02:51 👍 9 🔁 5 💬 1 📌 0
Figure 1: (A) Anchor-based merging requires a common sequence (red) present in each partition. Multi-MUMs are merged by identifying overlaps between partition-specific matches in the anchor coordinate space, and a uniqueness threshold determines if a MUM is still unique in each partition after truncation. (B) String-based merging enables computation of multi-MUMs between partitions without a common sequence. An example tree (left) is shown, highlighting the use case where partial multi-MUMs specific to internal nodes (starred) can be computed by merging subclade-based partitions up a tree. (right) MUM overlaps are computed by running Mumemto on the MUM sequences, and the uniqueness threshold array ensures overlaps remain unique across the merged dataset. (C) An example Burrows-Wheeler Transform (BWT), matrix (BWM), and Longest Common Prefix (LCP) array, with sequence IDs for each suffix shown (ID). A non-maximal unique match (UM) is shown, and the uniqueness threshold for this match is found using the flanking LCP values. (D) A partial multi-MUM (in blue) is found in all-but-one sequence (excluded in red). Using two anchor sequences (red and orange), all-but-one partial MUMs can be computed using an augmented anchor-based merging method (section 2.6).

Fantastic talk by @vikramshivakumar.bsky.social on Mumemto: scalable multi-MUM finding for pangenomes
Papers biorxiv.org/content/10.1101/2025.05.20.654611 & doi.org/10.1186/s13059-025-03644-0
Code: github.com/vikshiv/mume...
Very efficient pangenome visualization tool, revealing synteny and variations!

06.11.2025 01:13 👍 23 🔁 12 💬 1 📌 1
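
Panel (C) of the figure caption above mentions the BWT, the LCP array, and a uniqueness threshold read off the flanking LCP values. The following is a rough single-string sketch of those three pieces, assuming a naive suffix array and reading the threshold as "the larger of the two flanking LCP values, plus one". It is an illustration only, not Mumemto's implementation, and every function name here is hypothetical.

```python
def suffix_array(text):
    """Naive suffix array: start positions sorted by suffix (fine for toy inputs)."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def bwt(text, sa):
    """BWT: the character preceding each sorted suffix (wraps around for suffix 0)."""
    return "".join(text[i - 1] for i in sa)


def lcp_array(text, sa):
    """lcp[r] = longest common prefix of the suffixes at ranks r-1 and r; lcp[0] = 0."""
    def common(a, b):
        n = 0
        while a + n < len(text) and b + n < len(text) and text[a + n] == text[b + n]:
            n += 1
        return n
    return [0] + [common(sa[r - 1], sa[r]) for r in range(1, len(sa))]


def uniqueness_threshold(lcp, rank):
    """Shortest prefix length at which the suffix at `rank` occurs only once."""
    after = lcp[rank + 1] if rank + 1 < len(lcp) else 0
    return max(lcp[rank], after) + 1


text = "banana$"  # "$" is a unique terminator that sorts before the letters
sa = suffix_array(text)
lcp = lcp_array(text, sa)
print(bwt(text, sa))
print(lcp)
print(uniqueness_threshold(lcp, sa.index(1)))  # suffix "anana$"
```

For the suffix `anana$`, the prefix `ana` also occurs at another position, so it only becomes unique at length 4 (`anan`), which is what the flanking LCP values report.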

Looking forward to lots of great talks from JHU folks at CSHL this week!

04.11.2025 18:20 👍 4 🔁 0 💬 0 📌 0

Now this, undergrads, is how you cold email a professor.

03.11.2025 20:32 👍 2 🔁 0 💬 0 📌 0

If that's not enough, we threw in a complete, T2T giraffe genome! Giraffe genomes are pretty cool. Almost all of their chromosomes are Robertsonian fusions of the typically telocentric ruminant chromosomes. 🐄 vs. 🦒...

10.10.2025 15:26 👍 9 🔁 4 💬 2 📌 0

Last week we were in the Washington Post for our characterization of Robertsonian chromosomes. This week we are entering our 10th day of being shut down and all of our research is on hold. To help me feel not-so-bad, here is a thread of some studies we released right before the shutdown 🧵 [1/n]...

10.10.2025 15:24 👍 48 🔁 17 💬 1 📌 0
Figure 1(A) ANI quantifies the similarity between two genomes. ANI can be defined as the number of aligned positions where the two aligned bases are identical, divided by the total number of aligned bases. Historically, ANI was calculated using a single gene family for multiple sequence alignment. Another approach finds orthologous genes between two genomes and reports the average similarity between their CDSs. This method was later extended to whole-genome alignment by identifying local alignments and excluding supplementary alignments with lower similarity. (B) Different ANI tools employ various approaches in calculating ANI values. ANIm, OrthoANI, and FastANI use aligners to identify homologous regions, whereas Mash uses k-mer hashing to estimate similarities. Only alignments with higher similarity represented by green arrows are included in ANI calculations, while red arrows, corresponding to paralogs, are excluded. (C) The proposed benchmarking method evaluates the performance of different tools using both real and simulated data. It assumes that more distantly related species on the phylogenetic tree should have lower ANI similarities. This is measured by calculating the statistics of Spearman rank correlation. We expect a negative correlation between ANI and the tree distance (scatter plot on the right).
https://academic.oup.com/bib/article/doi/10.1093/bib/bbaf267/8160681

Excited to share our EvANI benchmarking workflow, published in Briefings in Bioinformatics doi.org/10.1093/bib/...
Computing average nucleotide identity (ANI) is neither conceptually nor computationally trivial. Its definition has evolved over years, with different meanings and assumptions (1/5)

21.09.2025 15:26 👍 30 🔁 12 💬 1 📌 1
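
The Figure 1(A) caption above gives one definition of ANI: identical aligned positions divided by the total number of aligned bases. A small sketch of that ratio, assuming the input is a pair of already-aligned, equal-length strings with `-` for gaps and that gap columns do not count as aligned bases. The function name `ani_from_alignment` and the gap convention are assumptions, and none of the tools named in the caption necessarily compute it this way.

```python
def ani_from_alignment(aligned_a, aligned_b, gap="-"):
    """Fraction of aligned (non-gap) columns in which the two bases are identical."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    # Keep only columns where both sequences have a base (assumption: gaps are skipped).
    columns = [(x, y) for x, y in zip(aligned_a, aligned_b) if x != gap and y != gap]
    if not columns:
        raise ValueError("no aligned bases to compare")
    identical = sum(1 for x, y in columns if x == y)
    return identical / len(columns)


# Toy alignment, invented for illustration: 5 aligned columns, 4 identical.
print(ani_from_alignment("ACGT-A", "ACCTGA"))
```

Different choices for what counts as an "aligned base" (for example, whether low-identity local alignments are dropped first) change the denominator, which is one reason the post calls the definition non-trivial.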

10/10 tool name 👌

22.08.2025 13:43 👍 3 🔁 0 💬 1 📌 0
(A) Anchor-based merging requires a common sequence (red) present in each partition. Multi-MUMs are merged by identifying overlaps between partition-specific matches in the anchor coordinate space, and a uniqueness threshold determines if a MUM is still unique in each partition after truncation. (B) String-based merging enables computation of multi-MUMs between partitions without a common sequence. An example tree (left) is shown, highlighting the use case where partial multi-MUMs specific to internal nodes (starred) can be computed by merging subclade-based partitions up a tree. (right) MUM overlaps are computed by running Mumemto on the MUM sequences, and the uniqueness threshold array ensures overlaps remain unique across the merged dataset. (C) An example Burrows-Wheeler Transform (BWT), matrix (BWM), and Longest Common Prefix (LCP) array, with sequence IDs for each suffix shown (ID). A non-maximal unique match (UM) is shown, and the uniqueness threshold for this match is found using the flanking LCP values. (D) A partial multi-MUM (in blue) is found in all-but-one sequence (excluded in red). Using two anchor sequences (red and orange), all-but-one partial MUMs can be computed using an augmented anchor-based merging method.

(A) Phylogeny of geographically diverse A. thaliana accessions (Lian et al. 2024), with broad geographical regions colored. Internal nodes are labeled with the coverage of partial multi-MUMs across the leaves of each node. Internal node partial MUMs are computed by merging subtree-based partitions progressively up the phylogeny. (B) Global multi-MUM synteny across the full dataset shown in blue (with inversions in green). Global MUMs are computed by merging all partitions together (representing the root node). Additionally, three geographically distinct subgroups are highlighted and partition-specific multi-MUMs (in purple, with inversions in pink) reveal local structural variation in centromeric regions.

Great talk by Vikram @vikramshivakumar.bsky.social on studying pangenomes and synteny visualization in #WABI25
Github: github.com/vikshiv/mume...
First paper: genomebiology.biomedcentral.com/articles/10....
Second: www.biorxiv.org/content/10.1... #WABI2025

20.08.2025 15:03 👍 22 🔁 8 💬 0 📌 0

Vikram Shivakumar telling us about "Partitioned Multi-MUM finding for scalable pangenomics" #WABI25! So many kinds of matches!

20.08.2025 14:33 👍 8 🔁 3 💬 0 📌 0
Protein language models reveal evolutionary constraints on synonymous codon choice Evolution has shaped the genetic code, with subtle pressures leading to preferences for some synonymous codons over others. Codons are translated at different speeds by the ribosome, imposing constrai...

This preprint from Helen Sakharova is one of the coolest things to come out of my lab: “Protein language models reveal evolutionary constraints on synonymous codon choice.” Codon choice is a big puzzle in how information is encoded in genomes, and we have a new angle. www.biorxiv.org/content/10.1...

07.08.2025 08:29 👍 215 🔁 83 💬 6 📌 4

Not saying I agree either way, but one pro of text-based file formats is that fewer dependencies are needed to view the files

06.08.2025 22:42 👍 0 🔁 0 💬 1 📌 0

This is so amazing, thank you!

31.07.2025 13:00 👍 1 🔁 0 💬 0 📌 0
comic doodle of Vikram Shivakumar in a sweater and checkered shirt on a pink gradient background, with various elements of the talk to the left: two old moms pointing at MUMs, below an explanation of what those are (large chunks of the same DNA sequence through the genome), at the bottom a few of the organisms worked on: a tomato, a potato, an arabidopsis weed.

#SciArt doodle of @vikramshivakumar.bsky.social's talk yesterday at the @sangerinstitute.bsky.social on MUMs*

*maximal unique matches in pangenomes, now if you did that on sequenced moms you could do mummoms

31.07.2025 07:51 👍 18 🔁 2 💬 2 📌 0

Excited to share our new preprint on detecting foldback artifacts in long reads with my advisors Matthew Meyerson and @lh3lh3.bsky.social ! Stop by poster C-180 on Wednesday at ISMB/ECCB2025 to learn more and chat!

21.07.2025 14:26 👍 4 🔁 2 💬 0 📌 0

And of course, the poster itself:

21.07.2025 17:00 👍 3 🔁 0 💬 0 📌 0

If you're in Liverpool, stop by my poster A217 at ISMB/ECCB 2025, and chat about all things pangenomes, MUMs, and alignment (and the Beatles or Oasis-mania)

21.07.2025 15:11 👍 16 🔁 6 💬 2 📌 0

Really excited to see this published! To more mum-finding 🍻

17.06.2025 15:01 👍 21 🔁 7 💬 2 📌 0
We are what we index; a primer for the Wheeler Graph era Talk by Ben Langmead - WABI 2025

🖥️🧬 We're thrilled to announce that one of our keynote speakers at #WABI2025 will be the inimitable @benlangmead.bsky.social! wabiconf.github.io/2025/talks/t... Ben's keynote is titled "We are what we index; a primer for the Wheeler Graph era", & it's sure to be a whirlwind tour of full-text indexing!

16.06.2025 12:46 👍 20 🔁 5 💬 1 📌 0

1/5 We introduce Movi Color, led by Steven Tan (a brilliant undergrad member of Langmead lab) for taxonomic and multi-class classification. It uses a full-text index based on the move structure and does not rely on predefined values (like k-mer length) for index building.
github.com/mohsenzakeri...

29.05.2025 14:36 👍 15 🔁 6 💬 1 📌 2