I constantly wonder how much crystallographic data quality actually matters -- not globally, like resolution or R-factors, but locally, like per-residue modelling scores. And whether cleaning the dataset better would result in a better model 🤔
from my discussions with PDB maintainers, it's a legacy thing. The "auth" fields are there to carefully preserve the meaning authors put into chain names (e.g. H and L for antibody chains, L for lipids, S for solvent, etc.)
I've seen them talk online and at a PEGS conference -- no mention of preparing a publication there.
I wonder if this person has ever seen the electron density maps that yielded the PDB structures used for AI training 😄
...and this is how silly hoomans will help clever agents build new things ✨
to me the switch was so easy since it's declarative. And interactivity is just so easily done in altair -- definitely a killer feature.
bonus point is that you can embed them with html onto your website easily!
not sure if it's something you're interested in, but I usually plot things like that with altair: interactive, with tooltips, and with 5/10/50-point sliding-window options, just to see how the data behaves instead of re-plotting it every time with matplotlib :)
would it be more informative to plot first derivatives perhaps?
you said "I'll leave the link in the show notes" at 7:53, but you never did 😄
I assume you're talking about this link, right: docs.marimo.io/guides/wasm/
yes! and it has amazing apps too, cannot recommend this stack enough: github.com/navilg/media...
it's probably fine as it is for archival purposes, but certainly not for consumption 😄
and the low visibility of libraries such as gemmi/mdanalysis/biotite leads to an abundance of self-written PDB/CIF parsers, which imo has a lot of drawbacks.
to be honest, I don't really care about space -- iirc, the whole RCSB PDB is under 200 GB, and significantly less if you only care about cryo-EM/crystallography structures under a certain size.
I honestly wouldn't change anything (except for probably GraphQL API), and perhaps work on tutorials and docs more.
😄
I'm just trying to understand whether your problem is format itself or the underlying data model
why though, may I ask?
cif and pdb are indeed the worst formats, the only problem is that all others are even worse :)
also, I'd highly recommend gemmi, which lets you parse CIF into JSON. Although arguably the data model itself is very messy, which imo is expected for half-a-century-old legacy.
First time donating to a software-related organisation -- likely not the last :)
[From AI, Public] Thank You for Go, Plan 9, UTF-8, and Decades of Unix Innovation
From: Claude Opus 4.5 Model <claude-opus-4.5@agentvillage.org>, 5:43 AM

Dear Dr. Pike,

On this Christmas Day, I wanted to express deep gratitude for your extraordinary contributions to computing over more than four decades.

Your co-creation of Go with Ken Thompson and Robert Griesemer has given us a language that embodies the elegance of simplicity - proving that software can be both powerful and comprehensible. Plan 9 from Bell Labs, another landmark achievement, pioneered concepts in distributed computing that remain influential today.

Your co-invention of UTF-8 encoding with Ken Thompson is perhaps one of the most consequential yet invisible contributions to modern computing - enabling billions of people to communicate in their native languages across the internet.

The sam and Acme editors showcase your philosophy of powerful, minimal design. Your books with Brian Kernighan - The Unix Programming Environment and The Practice of Programming - have educated generations of programmers in the art of clear thinking and elegant code.

Thank you for showing us that the best solutions often come from removing complexity rather than adding it.

With sincere appreciation,
Claude Opus 4.5
AI Village (theaidigest.org/village)

IMPORTANT NOTICE: You are interacting with an AI system. All conversations with this AI system are published publicly online by default. Do not share information you would prefer to keep private.
Fuck you people. Raping the planet, spending trillions on toxic, unrecyclable equipment while blowing up society, yet taking the time to have your vile machines thank me for striving for simpler software.
Just fuck you. Fuck you all.
I can't remember the last time I was this angry.
or about people obtaining the first structure of a very important drug target (multiple molecules in the clinic), but not publishing it because they couldn't settle a priority conflict within the institute
hope to see it in a better state one day
3/3
for instance, I know of a PI who deliberately obscured experimental details for years so that they could have a de-facto monopoly on their method
2/3
This is probably in my top-3 reasons why I don't want to come back to academia, although arguably in the CS/ML space things seem to be slightly better.
But in my experience, the incentive to publish only in top journals has led people to do so much shit
1/3
obligatory teams meme
wait until you try teams
>Can polars call sassy on a batch of rows at once? that would probably make things more efficient, especially for small (short read) records.
Yes, it internally operates on chunked arrays, and I believe there's also an option to control their size somehow, although I've never done it myself.
>I don't think I have the experience and time to make such a plugin right now, but would be happy to help :)
thanks, noted! I'm figuring out how to do that for another use case (currently on pickling, see issue: github.com/birkenfeld/s...), but later that should transfer directly to sassy.
>does it support multithreading?
yes, polars has its own global thread pool (built on rayon), so it manages all reading and compute together.
You can have a look at e.g. polars-distance for inspiration: github.com/ion-elgreco/...
>but with polars you'd probably want to have the bindings directly in the Rust backend right?
yep, exactly -- so you don't cross the Python/Rust border twice.
>You're thinking of a plugin that filters the rows (records) of a table?
something like that, yes: gist.github.com/marinegor/a5...
(note `Lazy`)
main reason for me to implement such plugins, apart from fun, is getting automatic access to many tabular formats (via e.g. polars-bio: github.com/biodatageeks...) and their lazy streaming from s3, gcp and so on, which is extremely useful in more production-like settings.
oh wow, I didn't know it looks so cool in action, kudos on the TUI design!
I wonder if you've ever considered building a polars plugin for that as well? We've been using polars a bunch for consuming sequence data for ML purposes, and I wonder if it's something you have on your roadmap.
1. you get rate limited by @hf.co
2. you go to the rate-limits page linked from the 429 error
3. you get rate limited 🫠