> paper claims astonishing progress on protein folding problem
> ask if it's interesting proteins or villin headpiece denatured with urea
> they say results are robust to protein sequence properties
> open the pdf
> villin headpiece in urea
I wrote a blog post about the future of structural bioinformatics.
Where to go after AlphaFold? How do we avoid the field becoming a load of half-baked LLMs?
Let me know what you think.
jgreener64.github.io/posts/struct...
One of the most interesting parts of this workflow is the "sunk cost fallacy" estimator that predicts how promising a particular mutational line of inquiry is, and whether it is worth abandoning in favor of others
Most benchmarks for drug discovery AI don't effectively evaluate generative models; instead, because the data are incomplete, they rely on surrogate functions like forward folding, property prediction, or ranking of previously characterized designs. No idea what the solution is here
New paper from former PhD student @tkschulze.bsky.social on supervised learning of protein variant effects across large-scale mutagenesis datasets
MAVE/DMS experiments provide large amounts of data for benchmarking variant effect predictors, but may be difficult to use in supervised learning. 1/5
you and me both claude
I regret to inform you that if you are job hunting on LI and this is your profile pic then you're ngmi
Somewhere a student is wondering why their stats lecture on Gaussian Processes has so many pictures of the Strait of Hormuz
It's tough out there! I just feel bad because I'm not the hiring manager, just RTing openings to get visibility
Seconding this question
RIP my LinkedIn after mentioning a job opening in my group ⚠️
The two co-first authors of this research paper, Yang & Yang, have decided to sort their names alphabetically
Protenix trained an identical model with way more training data (2025 cutoff instead of 2021), demonstrating that antibody-antigen modeling, but not protein-ligand modeling, is currently data-limited with this architecture (DQ SR % = percentage of predictions with DockQ ≥ 0.23)
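For anyone unfamiliar with the success-rate metric in the post above, here is a minimal sketch of how it is computed (the function name is mine, not from the paper):

```python
def dockq_success_rate(dockq_scores, threshold=0.23):
    """Percent of predicted interfaces whose DockQ score meets or
    exceeds the 'acceptable quality' cutoff of 0.23."""
    hits = sum(1 for s in dockq_scores if s >= threshold)
    return 100.0 * hits / len(dockq_scores)
```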
Extremely important point from the authors! The entire pipeline *is* the method. Not the NN per se! My bad for misjudging
Ah! I understand now. Let me add a note to what I wrote on the other site
Thanks, I appreciate your feedback, and I see how this decision makes some sense. But don't you think it is misleading to say that it is a problem of the method, rather than how it is applied? They might improve w/ different masking (e.g., using pseudoperplexity instead of marginal likelihoods)
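To illustrate the pseudoperplexity alternative mentioned above, a toy sketch of pseudo-log-likelihood scoring; `score_fn` and the masking interface are hypothetical stand-ins for a real masked LM, not any library's actual API:

```python
def pseudo_log_likelihood(seq, score_fn, mask_token="#"):
    # Mask each position of the *mutant* sequence in turn and sum the
    # log-probability of the true residue at that position. Because the
    # unmasked context includes the other mutations, multi-mutant scores
    # are not forced to be additive in the singles, unlike scoring every
    # mutation against a fixed wild-type background.
    total = 0.0
    for pos, aa in enumerate(seq):
        masked = seq[:pos] + mask_token + seq[pos + 1:]
        total += score_fn(masked, pos)[aa]  # log P(aa | masked context)
    return total
```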
Scribbled on at least one whiteboard in every office with "do not erase" written below
First fig below is the relevant code for zero-shot fitness prediction w/ ESM in the ProteinGym repo for example. It would explain why the results in the preprint, second fig, have many masked sequence-only models near the bottom of the rankings (3/3)
As far as I can tell they are using precomputed ProteinGym predictions. For masked LMs, ProteinGym uses a marginal perplexity which adds together the individual logits for all mutations against a WT or partially masked background, which definitionally can't predict epistasis (2/3)
I have a concern with this paper and I want someone who knows more than me to confirm if it is founded or not. The title makes a pretty specific claim about epistasis predictions, but the method does not seem sound for masked LMs (1/3)
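For readers unfamiliar with the scoring scheme the thread criticizes, here is a minimal sketch of WT-marginal (additive) scoring; the names are mine and this is not the actual ProteinGym code:

```python
def wt_marginal_score(logprobs_wt, mutations):
    # logprobs_wt: per-position log-probabilities from a masked LM run
    # once on the wild-type (or partially masked) background.
    # mutations: list of (pos, wt_aa, mut_aa) tuples.
    # Every site is scored against the same fixed background, so a
    # double mutant's score is exactly the sum of its singles: the
    # scheme cannot express epistasis by construction.
    return sum(
        logprobs_wt[pos][mut_aa] - logprobs_wt[pos][wt_aa]
        for pos, wt_aa, mut_aa in mutations
    )
```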
Hot take: this "chain X[auth Y]" notation on FASTA files pulled from the PDB is needlessly confusing, adds nothing, and needed to be changed yesterday
This plot is quite the indictment of fine-tuned PLMs, showing how performance is entirely data-dependent and, at the upper end of performance, equally achievable with randomized model weights
We're hiring - looking for folks with experience designing and training foundation models from scratch, particularly biomolecular language models, GNNs, or vision models. Lots of room for creativity in this role. Contact me if you have any interest
sir, those are my emotional support unclosed tabs
What are you talking about? It's literally in the app
Does anyone know if the Boltz team has made any effort to get their papers peer-reviewed and published "officially"? Because if not, I have tons of respect for them for choosing not to play that game
LinkedIn is a major contributor to my career being the way it is (in a good way lol), but I agree with what you're implying which is that the discourse on there can be insufferable
Another LinkedIn influencer confidently declaring that cryo-EM and X-ray crystallography are no longer necessary because we have AI now
The solution offered by EmbedOpt is to apply the "push" to the conditioning inputs (pair and sequence data), which is reminiscent of the pair representation scaling strategy for conformational sampling (although that doesn't provide guidance). Should be tried here