Thanks for the idea! We briefly checked this: for the E. coli test set predictions, ~80% of the high-confidence interactions are more than 5 genes away from each other, so a large fraction is non-syntenic!
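A check like this can be sketched in a few lines. This is only an illustration under assumptions, not the analysis we ran: the gene names, the gene-order index, and the optional circular-chromosome handling are all hypothetical toy choices.

```python
def fraction_non_adjacent(pairs, gene_index, min_gap=5, genome_size=None):
    """Fraction of predicted pairs whose genes are more than `min_gap` genes apart.

    pairs       : list of (gene_a, gene_b) predicted interactions
    gene_index  : dict mapping gene name -> position in gene order (0, 1, 2, ...)
    genome_size : if given, treat the chromosome as circular with this many genes
    """
    if not pairs:
        return 0.0
    far = 0
    for a, b in pairs:
        gap = abs(gene_index[a] - gene_index[b])
        if genome_size is not None:
            # On a circular chromosome the shorter way around is the real distance.
            gap = min(gap, genome_size - gap)
        if gap > min_gap:
            far += 1
    return far / len(pairs)


# Toy data (hypothetical gene names and positions, not real predictions).
gene_index = {"geneA": 0, "geneB": 1, "geneC": 40, "geneD": 99}
pairs = [("geneA", "geneB"), ("geneA", "geneC"),
         ("geneB", "geneD"), ("geneA", "geneD")]

linear_fraction = fraction_non_adjacent(pairs, gene_index)
circular_fraction = fraction_non_adjacent(pairs, gene_index, genome_size=100)
```

Whether to treat the chromosome as circular changes the answer for genes near the ends of the gene order, so it is worth stating which convention a reported number uses.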
For virus-microbe: yes (we have examples in the paper)! For microbe-host, we haven't fully evaluated how this would work for eukaryotic proteomes.
We thought a lot about how to deploy FlashPPI, and we are very proud of this implementation that integrates annotation+context+CoSearch+agent with FlashPPI on SeqHub!
Thanks for pointing this out! We will add an option to download the network!
Step-by-step how to run FlashPPI on your favorite genomes!
Predicting protein-protein interactions (PPIs) at proteome scale can take months with co-folding models due to the massive all-vs-all comparisons required.
We are excited to announce FlashPPI, a contrastive learning framework that predicts proteome-wide physical interfaces in minutes. 1/🧵
Preprint: www.biorxiv.org/content/10.6...
For a typical microbial genome, all-vs-all PPI prediction with AF3 would take hundreds of GPU-years. With FlashPPI, we can scale molecular interaction prediction across diverse, non-model microbial genomes, unlocking truly scalable discovery. We deployed FlashPPI on Seqhub.org; give it a spin!
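To see why all-vs-all gets expensive so quickly, here is the back-of-the-envelope arithmetic as a small sketch. Both inputs are assumptions chosen for illustration only: the proteome size (4,000 proteins) and the per-pair cost (10 GPU-minutes) are placeholders, not measured AF3 numbers.

```python
def all_vs_all_pairs(n_proteins):
    """Number of unordered protein pairs in an all-vs-all comparison."""
    return n_proteins * (n_proteins - 1) // 2


def gpu_years(n_pairs, gpu_minutes_per_pair):
    """Total cost in GPU-years if every pair is scored independently."""
    minutes_per_year = 60 * 24 * 365
    return n_pairs * gpu_minutes_per_pair / minutes_per_year


n_proteins = 4000             # ASSUMPTION: illustrative microbial proteome size
gpu_minutes_per_pair = 10     # ASSUMPTION: placeholder per-pair co-folding cost

n_pairs = all_vs_all_pairs(n_proteins)
cost = gpu_years(n_pairs, gpu_minutes_per_pair)
print(f"{n_pairs:,} pairs -> about {cost:.0f} GPU-years")
```

The key point is the quadratic growth in pairs: doubling the proteome roughly quadruples the cost, whatever the true per-pair time is.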
3. Online hard negative mining improves sensitivity.
We use joint optimization to let the model propose hard negatives for contact prediction during training. This results in even more sensitive and robust performance.
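For readers new to the idea, here is a minimal sketch of generic online hard negative selection, assuming a simple in-batch setup with a margin loss. It is not the FlashPPI training code and does not cover the joint optimization with contact prediction; the score matrix, the margin value, and the function names are hypothetical.

```python
def hardest_negatives(sim):
    """For each row i, return the column j != i with the highest score.

    sim[i][j] is the current model's score for query i against candidate j;
    the true partner of query i is assumed to sit on the diagonal (j == i).
    """
    picks = []
    for i, row in enumerate(sim):
        candidates = [(score, j) for j, score in enumerate(row) if j != i]
        picks.append(max(candidates)[1])
    return picks


def hard_triplet_loss(sim, margin=0.2):
    """Margin loss using only the hardest negative for each query."""
    neg_idx = hardest_negatives(sim)
    losses = []
    for i, j in enumerate(neg_idx):
        positive, negative = sim[i][i], sim[i][j]
        losses.append(max(0.0, margin + negative - positive))
    return sum(losses) / len(losses)


# Toy score matrix (hypothetical values), recomputed from the model at each step
# in a real training loop, which is what makes the mining "online".
sim = [
    [0.90, 0.80, 0.10],
    [0.20, 0.70, 0.30],
    [0.00, 0.95, 0.90],
]
neg_idx = hardest_negatives(sim)
loss = hard_triplet_loss(sim)
```

Because the negatives are chosen from the model's own current scores, they track whatever the model finds most confusable as training progresses.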
2. Learning how proteins interact matters
It's not enough to learn that two proteins interact; learning *how* they interact at the residue level is critical for performance.
Some fun highlights on what we learned along the way:
1. Reframing PPI prediction as retrieval
Instead of asking “Do A and B interact?”, we ask: Which proteins does A interact with in this genome? This shift in framing enables linear-time scaling and ultrafast performance.
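One way to picture the retrieval framing is the sketch below. It assumes each protein is embedded once by some encoder (not shown, and hypothetical here) and that candidate partners are ranked by cosine similarity; the toy 2-D vectors and the function names are illustrative, not the FlashPPI model.

```python
import math


def normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def retrieve_partners(query_id, embeddings, k=2):
    """Rank every other protein in the genome against the query; return the top k."""
    query = normalize(embeddings[query_id])
    scored = []
    for protein_id, vec in embeddings.items():
        if protein_id == query_id:
            continue
        score = sum(a * b for a, b in zip(query, normalize(vec)))
        scored.append((score, protein_id))
    scored.sort(reverse=True)
    return [protein_id for _, protein_id in scored[:k]]


# Toy embeddings (hypothetical); in practice these would come from one
# encoder pass per protein, so the expensive step grows with the number
# of proteins rather than the number of pairs.
embeddings = {
    "protA": [1.0, 0.0],
    "protB": [0.9, 0.1],
    "protC": [0.0, 1.0],
    "protD": [0.7, 0.7],
}
top_partners = retrieve_partners("protA", embeddings, k=2)
```

The pairwise step that remains is just a dot product per candidate, which is cheap compared with running a structure model on every pair.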
For technical details, check out @ancornman1’s excellent breakdown of the model. bsky.app/profile/anco...
Protein-protein interactions (PPIs) are key to discovering and interpreting new biological functions.
We’re excited to introduce FlashPPI: a new application of gLM2 that uses genomic language modeling to predict proteome-wide PPIs in microbial genomes in minutes.
We’d love to join your lab meeting!
We’ve been meeting with research groups to share how scientists are using SeqHub for sequence and genome analysis, and the conversations have been highly interactive and grounded in real workflows.
Booking info below.
We’re excited to welcome Daniela Bourges-Waldegg to the SeqHub Advisory Board!
Daniela is EVP + Chief Digital & Technology Officer at @addgene.bsky.social. She will help shape our approach to building researcher-centered digital infrastructure with an eye toward long-term scientific impact.
First, @tattabio.bsky.social is now on Bluesky! And second, we launched multi-sequence CoSearch on SeqHub!
This. Is. So. Cool. 🤯
Hi Roland, our servers are in the US. We explicitly state in our docs that we do not train models on private data, and your data is private to you only, unless intentionally made public (for publication/data-sharing purposes)!
Thanks for the feedback! We are working on making more of the platform exportable as figures.
Thank you for the shoutout!
Released today from Tatta Bio: SeqHub! A place to explore, annotate, and share sequence data with functional insights.
Over 1,000 scientists worldwide have already used SeqHub to annotate more than 550,000 proteins, uncovering new insights and accelerating discovery.
Annotations are mapped using embedding-based search, making it faster than most alignment-based search. HMM prediction speed-up comes from some optimization and parallelization :)
Thank you! And the PaperBLAST team deserves a shoutout for the sequence-paper linkages.
@ancornman1.bsky.social @sokrypton.org @pgirguis.bsky.social @alexbateman1.bsky.social @simrouxvirus.bsky.social @apcamargo.bsky.social
Currently, SeqHub is optimized for microbial protein and genome analysis. As we expand beyond microbial data, we'd love your feedback to help shape what comes next. I'm deeply grateful to our team at Tatta Bio, and to our collaborators and funders, for making this vision a reality. seqhub.org
We're thrilled to announce SeqHub, an AI-enabled platform for biological sequence analysis. SeqHub brings together sequence search, genome annotation, and data sharing in one place.
Ready to explore New Lineages of Life with @jgi.doe.gov? 🧬🦠
Registration for our 2025 NeLLi Symposium is now open, for the first time in collaboration with @unlv.edu.
Mark the date: November 6-7 in Las Vegas, NV
We are building this infrastructure for the scientific community, and we invite feedback and collaboration from researchers at every stage. We are grateful to the Moore Foundation for their generous support in making this project possible. Stay tuned for more updates!
www.tatta.bio/gaia
At Tatta Bio, we have been thinking deeply about the sequence-to-function problem. We believe that before AI can power functional prediction, we first need to rethink how we curate, manage, and share sequence data. Here, we share our initial ideas on what we are building next:
I am very happy (and anxious) to share with you our most recent work, in which we evaluated four of the most popular long-read assemblers, and to tell you just a little bit about it in the following 🧵
www.biorxiv.org/content/10.1...