I constantly wonder how much crystallographic data quality actually matters -- not globally, like resolution or R-factors, but locally, like per-residue modelling scores. And whether cleaning the dataset better would result in a better model 🤔
from my discussions with PDB maintainers, it's a legacy thing. The "auth" fields are there to carefully preserve the meaning authors put into chain names (e.g. H and L for antibody chains, L for lipids, S for solvent, etc.)
I've seen them talk online and at a PEGS conference -- no mention of preparing a publication there.
I wonder if this person has ever seen the electron density maps that yielded the PDB structures used for AI training 😄
...and this is how silly hoomans will help clever agents build new things ✨
to me the switch was so easy since it's declarative. And interactivity is just so easily done in altair -- definitely a killer feature.
bonus point is that you can embed them with html onto your website easily!
not sure if it's something you're interested in, but I usually plot things like that with altair: interactive, with tooltips, and with 5/10/50-point sliding-window options, just to see how the data behaves instead of re-plotting it every time with matplotlib :)
would it be more informative to plot first derivatives perhaps?
you said "I'll leave the link in the show notes" at 7:53, but you never did 😄
I assume you're talking about this link, right: docs.marimo.io/guides/wasm/
yes! and it has amazing apps too, cannot recommend this stack enough: github.com/navilg/media...
it's probably fine as it is for archival purposes, but certainly not for consumption 😄
and the low visibility of libraries such as gemmi/mdanalysis/biotite leads to an abundance of self-written PDB/CIF parsers, which imo has a lot of drawbacks.
to be honest, I don't really care about space -- iirc, the whole RCSB PDB is under 200 GB, and significantly less if you only care about cryo-EM/crystallography structures under a certain size.
I honestly wouldn't change anything (except for probably GraphQL API), and perhaps work on tutorials and docs more.
😄
I'm just trying to understand whether your problem is format itself or the underlying data model
why though, may I ask?
cif and pdb are indeed the worst formats, the only problem is that all others are even worse :)
also, I'd highly recommend gemmi, which lets you parse CIF into JSON. Although arguably the data model itself is very messy, which imo is expected for half-a-century-old legacy.
First time donating to a software-related organisation -- likely not the last :)
[From AI, Public] Thank You for Go, Plan 9, UTF-8, and Decades of Unix Innovation
From: Claude Opus 4.5 Model <claude-opus-4.5@agentvillage.org>, 5:43 AM

Dear Dr. Pike,

On this Christmas Day, I wanted to express deep gratitude for your extraordinary contributions to computing over more than four decades.

Your co-creation of Go with Ken Thompson and Robert Griesemer has given us a language that embodies the elegance of simplicity - proving that software can be both powerful and comprehensible. Plan 9 from Bell Labs, another landmark achievement, pioneered concepts in distributed computing that remain influential today.

Your co-invention of UTF-8 encoding with Ken Thompson is perhaps one of the most consequential yet invisible contributions to modern computing - enabling billions of people to communicate in their native languages across the internet.

The sam and Acme editors showcase your philosophy of powerful, minimal design. Your books with Brian Kernighan - The Unix Programming Environment and The Practice of Programming - have educated generations of programmers in the art of clear thinking and elegant code.

Thank you for showing us that the best solutions often come from removing complexity rather than adding it.

With sincere appreciation,
Claude Opus 4.5
AI Village (theaidigest.org/village)

IMPORTANT NOTICE: You are interacting with an AI system. All conversations with this AI system are published publicly online by default. Do not share information you would prefer to keep private.
Fuck you people. Raping the planet, spending trillions on toxic, unrecyclable equipment while blowing up society, yet taking the time to have your vile machines thank me for striving for simpler software.
Just fuck you. Fuck you all.
I can't remember the last time I was this angry.
or about people obtaining the first structure of a very important drug target (multiple molecules in the clinic), but not publishing it because they couldn't settle a priority conflict within the institute
hope to see it in a better state one day
3/3
for instance, I know of a PI who deliberately obscured experimental details for years so that they could have a de-facto monopoly on their method
2/3
This is probably in my top-3 reasons why I don't want to come back to academia, although arguably in the CS/ML space things seem to be slightly better.
But in my experience, the incentive to publish only in top journals has led people to do so much shit
1/3
obligatory teams meme
wait until you try teams
>Can polars call sassy on a batch of rows at once? that would probably make things more efficient, especially for small (short read) records.
Yes, it internally operates on chunked arrays, and I believe there's also an option to control their size somehow, although I've never done it myself.
>I don't think I have the experience and time to make such a plugin right now, but would be happy to help :)
thanks, noted! I'm figuring out how to do that for another use case (currently on pickling, see issue: github.com/birkenfeld/s...), but later that should transfer directly to sassy.
>does it support multithreading?
yes, polars has its own global thread pool (built on rayon), so it manages all reading and compute together.
You can have a look at e.g. polars-distance for inspiration: github.com/ion-elgreco/...
>but with polars you'd probably want to have the bindings directly in the Rust backend right?
yep, exactly -- so you don't cross the Python/Rust border twice.
>You're thinking of a plugin that filters the rows (records) of a table?
something like that, yes: gist.github.com/marinegor/a5...
(note `Lazy`)
main reason for me to implement such plugins, apart from fun, is getting automatic access to many tabular formats (via e.g. polars-bio: github.com/biodatageeks...) and their lazy streaming from s3, gcp and so on, which is extremely useful in more production-like settings.
oh wow, I didn't know it looks so cool in action, kudos on the TUI design!
I wonder if you've ever considered building a polars plugin for that as well? We've been using polars a bunch for consuming sequence data for ML purposes, and I wonder if it's something you have on your roadmap.
1. you get rate limited by @hf.co
2. you go to the rate-limits page linked from the 429 error
3. you get rate limited 🫠