Welcome to all of our Systems Biology: Global Regulation of Gene Expression attendees! #cshlsysbio
#cshlsysbio 🧬
Upcoming Cold Spring Harbor meeting on Regulatory & Non-coding RNAs (April 7-11, 2026). Abstract deadline is fast approaching! @cshlmeetings.bsky.social meetings.cshl.edu/meetings.asp...
I will be attending "Gene Expression & Signaling in the Immune System" at @cshlmeetings.bsky.social next week - this is their first meeting of the year and will be an exciting one.
First one of 2026! Gene Expression & Signaling in the Immune System starts tonight! #cshlimmune
Great meeting!!! Amazing talks and posters today also.
Equally important, our meeting's simple rule is holding throughout the sessions, with graduate students and postdoctoral scholars reliably asking the first 2 questions after each presentation.
A privilege to organize this with Sally Temple and Christine Mummery!
Next edition in December 2027!
Remarkable energy at the inaugural @cshlnews.bsky.social conference on Assembloids and complex cell–cell interactions across tissues and systems
A wonderful series of talks so far highlighting advances & discoveries being made with these self-organizing systems and a few common themes are emerging
That's a wrap on 2025! Thanks to everyone who came to #cshlassembloid
The inaugural conference on #assembloids and complex cell-cell interactions starting tonight at @cshlnews.bsky.social ! Amazing energy and inspiring opening keynote by Ruslan Medzhitov.
Thank you to everyone who attended last week's Plant Genomes, Systems Biology, and Engineering meeting! #cshlplant 🌱
Great piece on #interoception and brain-body interactions by @carlzimmer.com in the @nytimes.com www.nytimes.com/2025/11/25/s... What an exciting time to work on this. Join us @cshlmeetings.bsky.social to learn more about this exciting field meetings.cshl.edu/meetings.asp...
Today is the last day of Zebrafish Neurobiology! #cshlzebrafish
Last day of Single Cell Analyses! #cshlsca
Really excited to see our new work in scaling Mumemto to any size pangenome published in Genome Research this morning. And right on cue with the great opportunity to present this work at #GI2025 this week.
#GI2025 Vikram Shivakumar from Ben Langmead's lab (@benlangmead.bsky.social) presents "MumemtoM - partitioned Multi-MUM finding for scalable pangenomics". Now published in Genome Research @genomeresearch.bsky.social. Read full text here ➡️ tinyurl.com/Genome-Res-2...
OCR Ortholog Open Chromatin Status Prediction Framework Overview. a We trained a convolutional neural network (CNN) for predicting brain open chromatin using sequences underlying brain open chromatin region (OCR) orthologs in a small number of species and used the CNN to predict brain OCR ortholog open chromatin status across the species in the Zoonomia Consortium. Specifically, we used the sequences underlying the orthologs for which we have brain open chromatin data to train a CNN for predicting open chromatin. Then, we used the CNN to predict the probability of brain open chromatin for all brain OCR orthologs; predictions are illustrated on the right. Animals for which we do not have open chromatin data are in dark gray instead of black to indicate that their brain open chromatin is imputed. While we cannot evaluate the accuracy of most of our predictions, obtaining open chromatin data from most tissues in most species is infeasible, so predictions might be the best OCR annotations that we can obtain. b To demonstrate that our models can accurately predict whether sequence differences between species are associated with open chromatin differences, in addition to the evaluations described in previous work [57], we evaluated our performance on species-specific open chromatin for a species not used in model training and clade-specific open and closed chromatin for clades not used in model training. Since such regions often comprise a minority of OCR orthologs, models could obtain good overall performance while obtaining poor performance on such regions. We also evaluated our performance on tissue-specific open and closed chromatin for a tissue not used in model training, where we expect models to predict 0 if model learns sequence signatures related to the tissue used in training. c Full mouse test set and lineage-specific OCR accuracy evaluations for mouse sequence-only brain model, illustrating that, even for the best of these models,
Third day of Genome Informatics #GI2025 began with an exciting session on "AI, ML and Integrative Genomics" chaired by Irene Kaplow & Thomas Pierrot.
The first talk, by Irene Kaplow, focused on Challenges in Predicting Enhancer Activity Differences Between Species
doi.org/10.1186/s12864-022-08450-7
https://arxiv.org/abs/2503.17547 Learning Multi-Level Features with Matryoshka Sparse Autoencoders
The second day concluded with a fantastic talk by Cristina Martin Linares on "Minimal reconstruction of SpliceAI using distilled matryoshka sparse autoencoders".
They showed that matryoshka SAEs arxiv.org/abs/2503.17547 improve upon openSpliceAI elifesciences.org/reviewed-preprints/107454. #GI2025
a, Left: Fasta representation of an individual SARS-CoV-2 genome consists of sample name followed by the entire ~30 kbp genome sequence. Right: MAPLE format records only the differences between the genome under consideration and a reference; columns represent the variant character observed, the position along the genome and (when necessary) the number of consecutive positions for which the character is observed. b, Left: an example likelihood vector at an internal node of a phylogenetic tree (shown by the narrow blue arrow; only a small portion of the tree is shown); for simplicity, we show only ten genome positions. At each position (rows 1–10), each column contains the likelihood for a specific nucleotide. For rows 1–9, the likelihood is concentrated at only one nucleotide (highlighted in green), while for position 10, we show an example with more uncertainty. Right: MAPLE representation of these node likelihoods. Assuming that the reference sequence at the first nine positions matches the most likely nucleotides in the vector (ATTAAAGGT), then for positions 1–9, the likelihood of nonreference nucleotides is negligible and we represent the likelihoods with a single symbol (R). At position 10, due to non-negligible uncertainty, we explicitly calculate and store the four relative likelihoods. c, Examples of likelihood calculation steps in MAPLE. Red arrows represent the flow of information from the tips to the root of the tree. Left: if two child nodes are in reference state R for a region of the genome (here, positions 1–9), then MAPLE assumes that their parent is also in state R. Right: if at a genome position (here, position 10), two child nodes have likelihoods concentrated at different nucleotides, then for their parent, we explicitly calculate the relative likelihoods of all four nucleotides.
Nicola De Maio presented "Maximum likelihood phylogenetics at pandemic scales" and discussed the importance of scalable phylogenetics in genomic epidemiology. #GenomeInformatics #GI2025
MAPLE: nature.com/articles/s41588-023-01368-0
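To make the "differences from a reference" idea in the figure description concrete, here is a minimal Python sketch. It is not the MAPLE file parser or the authors' code: the function names `encode_diffs` and `decode_diffs`, the tuple layout, and the choice to give a run length only to `n` and `-` characters are all assumptions made for illustration.

```python
from typing import List, Tuple

# Assumption for this sketch: only ambiguous ("n") and gap ("-") characters
# are stored as runs with a length; substitutions are stored one per position.
RUN_CHARS = {"n", "-"}


def encode_diffs(reference: str, genome: str) -> List[Tuple]:
    """Return (character, 1-based position[, run length]) entries for every
    place where `genome` differs from `reference` or holds a run character."""
    if len(reference) != len(genome):
        raise ValueError("sketch assumes pre-aligned, equal-length sequences")
    ref, seq = reference.lower(), genome.lower()
    entries: List[Tuple] = []
    i = 0
    while i < len(seq):
        char = seq[i]
        if char in RUN_CHARS:
            # Extend the run over consecutive identical characters.
            j = i
            while j < len(seq) and seq[j] == char:
                j += 1
            entries.append((char, i + 1, j - i))
            i = j
        elif char != ref[i]:
            entries.append((char, i + 1))
            i += 1
        else:
            i += 1
    return entries


def decode_diffs(reference: str, entries: List[Tuple]) -> str:
    """Rebuild the (lower-cased) genome from the reference and its entries."""
    seq = list(reference.lower())
    for entry in entries:
        char, pos = entry[0], entry[1]
        length = entry[2] if len(entry) == 3 else 1
        for k in range(pos - 1, pos - 1 + length):
            seq[k] = char
    return "".join(seq)


reference = "attaaaggtt"
genome = "attaacggnn"
diffs = encode_diffs(reference, genome)
print(diffs)
print(decode_diffs(reference, diffs) == genome)
```

For genomes that are nearly identical to the reference, the entry list stays short no matter how long the genome is, which is the point the figure makes about storing differences instead of full sequences.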
#GI2025 Ilias Georgakopoulos-Soares presents "Quadrupia - a comprehensive catalog of G-quadruplexes across genomes from the tree of life". Now published in Genome Research @genomeresearch.bsky.social Read full text here ➡️ tinyurl.com/Genome-Res-2...
#GI2025 Chirag Jain presents "Pangenome-based genome inference using integer programming". Now published in Genome Research @genomeresearch.bsky.social Read the full text here ➡️ tinyurl.com/Genome-Res-2...
#GI2025 Mile Sikic @msikic.bsky.social presents "Geometric deep learning framework for de novo genome assembly". Now published in Genome Research @genomeresearch.bsky.social Full text here ➡️ tinyurl.com/Genome-Res-2...
Abstract: Seed-chain-extend with k-mer seeds is a powerful heuristic technique for sequence alignment used by modern sequence aligners. Although effective in practice for both runtime and accuracy, theoretical guarantees on the resulting alignment do not exist for seed-chain-extend. In this work, we give the first rigorous bounds for the efficacy of seed-chain-extend with k-mers in expectation. Assume we are given a random nucleotide sequence of length ∼n that is indexed (or seeded) and a mutated substring of length ∼m ≤ n with mutation rate θ < 0.206. We prove that we can find a k = Θ(log n) for the k-mer size such that the expected runtime of seed-chain-extend under optimal linear-gap cost chaining and quadratic time gap extension is O(mn^{f(θ)} log n), where f(θ) < 2.43 · θ holds as a loose bound. The alignment also turns out to be good; we prove that more than 1-o(sqrt(1/m)) fraction of the homologous bases is recoverable under an optimal chain. We also show that our bounds work when k-mers are sketched, that is, only a subset of all k-mers is selected, and that sketching reduces chaining time without increasing alignment time or decreasing accuracy too much, justifying the effectiveness of sketching as a practical speedup in sequence alignment. We verify our results in simulation and on real noisy long-read data and show that our theoretical runtimes can predict real runtimes accurately. We conjecture that our bounds can be improved further, and in particular, f(θ) can be further reduced.
Second day of Genome Informatics #GI2025 began with the session "Genome Assembly and Sequence Algorithms". Yun William Yu presented "Average-case Analysis of Seed-Chain-Extend under Random Mutations"
genome.cshlp.org/content/33/7/1175
providing theoretical guarantees for the popular seed-chain-extend
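For readers new to seed-chain-extend, the seeding and chaining steps can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the method analyzed in the paper or any aligner's implementation: the helper names (`kmer_index`, `find_anchors`, `chain`), the quadratic-time dynamic program, and the particular linear gap penalty are all hypothetical choices, and the final "extend" step (aligning the gaps between chained anchors) is omitted.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Anchor = Tuple[int, int]  # (reference position, query position), 0-based


def kmer_index(ref: str, k: int) -> Dict[str, List[int]]:
    """Seed step, part 1: map every k-mer of the reference to its positions."""
    index: Dict[str, List[int]] = defaultdict(list)
    for i in range(len(ref) - k + 1):
        index[ref[i:i + k]].append(i)
    return index


def find_anchors(ref: str, query: str, k: int) -> List[Anchor]:
    """Seed step, part 2: exact k-mer matches between query and reference."""
    index = kmer_index(ref, k)
    anchors: List[Anchor] = []
    for q in range(len(query) - k + 1):
        for r in index.get(query[q:q + k], ()):
            anchors.append((r, q))
    anchors.sort()
    return anchors


def chain(anchors: List[Anchor], k: int, gap_cost: float = 1.0) -> List[Anchor]:
    """Chain step: pick a colinear, non-overlapping subset of anchors that
    maximizes (matched bases) - gap_cost * (skipped bases), via O(N^2) DP.
    The gap penalty used here is an assumption made for this sketch."""
    if not anchors:
        return []
    n = len(anchors)
    score = [float(k)] * n
    prev = [-1] * n
    for i in range(n):
        ri, qi = anchors[i]
        for j in range(i):
            rj, qj = anchors[j]
            if rj + k <= ri and qj + k <= qi:
                gap = (ri - rj - k) + (qi - qj - k)
                candidate = score[j] + k - gap_cost * gap
                if candidate > score[i]:
                    score[i] = candidate
                    prev[i] = j
    best = max(range(n), key=score.__getitem__)
    result: List[Anchor] = []
    while best != -1:
        result.append(anchors[best])
        best = prev[best]
    return result[::-1]


ref = "AAAACCCCGGGGTTTT"
query = "CCCCGGGG"
anchors = find_anchors(ref, query, k=4)
print(anchors)
print(chain(anchors, k=4))
```

Production aligners replace the quadratic loop with faster chaining and usually sketch the k-mers (for example with minimizers) before indexing, which is the speedup the abstract refers to.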
A thread on #GI2025's first session
The first session is PANGENOMES #GI2025. Alexander Schönhuth is delivering the first talk on "Generating synthetic genotypes using diffusion models"
Paper: academic.oup.com/bioinformati...
Code: github.com/TheMody/Gene...
@benlangmead.bsky.social kicking off the start of Genome Informatics! #gi2025 @cshlnews.bsky.social
Ben Langmead @benlangmead.bsky.social delivers the official opening for this year's Genome Informatics Conference #GI2025 at Cold Spring Harbor Laboratory.
List of talks and posters: meetings.cshl.edu/abstracts.as...
#GI2025 is about to start. I hope you'll enjoy this edition!
Excited to be at the Genome Informatics conference #GI2025 in Cold Spring Harbor Laboratory this week! I'll be sharing my current work on using machine learning to improve the reliability of metagenomics classification for analyzing gut microbiota and soil samples. Let's connect!
Genome Informatics starts tonight 🧬 #gi2025