Andrew White 🐦‍⬛ (@andrew.diffuse.one)

Making "AI Scientists" has become a hot topic lately, but the term has been in use for almost 20 years: the first reference I could find was from 2008. For example, "Adam," an AI Scientist robot for studying yeast, was published in 2009. I wrote a short post about the term and what it means now.

diffuse.one/p/w1-001

29.10.2025 18:26 👍 8 🔁 2 💬 0 📌 0

diffuse.one andrew white's blog.

So we probably won't be getting a direct simulation of a whole virtual cell at meaningful timescales any time soon. Oh, and it would require 20x the Earth's current power generation. 3/3

Read the analysis/blog post here: diffuse.one/p/d1-009

26.09.2025 15:19 👍 3 🔁 0 💬 0 📌 0

It sounds insane, but remember there are 10^14 atoms in a human cell and 10^20 femtoseconds in a day. And across multiple simulation engines, it takes about 10^4 FLOPs per atom per femtosecond. 2/3

26.09.2025 15:19 👍 1 🔁 0 💬 1 📌 0
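
The arithmetic in this thread can be checked with a short sketch. It assumes only the three figures quoted in the posts (10^14 atoms, one day of simulated time, 10^4 FLOPs per atom per femtosecond); the function and constant names are illustrative and do not come from the linked analysis.

```python
import math

# Figures quoted in the thread (order-of-magnitude estimates)
ATOMS_PER_CELL = 1e14        # atoms in a human cell
FLOPS_PER_ATOM_FS = 1e4      # FLOPs per atom per femtosecond

# Unit conversions
SECONDS_PER_DAY = 24 * 60 * 60
FS_PER_SECOND = 1e15


def total_flops(atoms, days, flops_per_atom_fs):
    """FLOPs needed to simulate `atoms` atoms for `days` of simulated time."""
    femtoseconds = days * SECONDS_PER_DAY * FS_PER_SECOND
    return atoms * femtoseconds * flops_per_atom_fs


estimate = total_flops(ATOMS_PER_CELL, 1, FLOPS_PER_ATOM_FS)
exponent = round(math.log10(estimate))
print(f"{estimate:.2e} FLOPs, about 10^{exponent}")
```

One day is 8.64 x 10^19 femtoseconds, which the thread rounds to 10^20, so the product lands just under the quoted 10^38.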

I finished my estimate on required compute to make an atomic-resolution virtual cell: 10^38 FLOPs to simulate a human cell for 1 day. We should be able to do this simulation in 2074 using 200 TW of power. 1/3

26.09.2025 15:19 👍 12 🔁 3 💬 4 📌 0

yea, those are the model thoughts. It has a lot of mistakes in its thoughts. But you've got a very good eye! We'll make sure the final paper has a pristine example of its thoughts.

19.09.2025 22:50 👍 1 🔁 0 💬 0 📌 0

Our ether0 paper was accepted at NeurIPS 2025! Very proud of the FutureHouse team!

19.09.2025 15:42 👍 6 🔁 0 💬 1 📌 0

Very good point - I can re-run without that phrase.

16.09.2025 17:21 👍 1 🔁 0 💬 0 📌 0

If you don't put a phrase in quotes, the terms are combined with OR. So the query was

"α" equation

which is equivalent to "α" OR equation

16.09.2025 17:20 👍 1 🔁 0 💬 0 📌 0
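
A minimal sketch of the quoting rule described in this reply, taking the stated search behavior at face value. The helper name `exact_query` is hypothetical; it only builds a query string and does not call any search service.

```python
def exact_query(*phrases):
    """Wrap every phrase in double quotes so each one is matched as written."""
    return " ".join(f'"{phrase}"' for phrase in phrases)


# Per the reply, the unquoted word in `"α" equation` acts like an OR term;
# quoting it as well makes both terms exact-match phrases.
query = exact_query("α", "equation")
print(query)
```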

You can also look at it over time. Here's the relative popularity of different animal models in research over time.

Anyway, found this to be interesting. More details about it here: diffuse.one/p/d2-003 3/3

14.09.2025 16:52 👍 3 🔁 0 💬 1 📌 0

Here's one measuring the frequency of sample sizes, such as how often people report research results with 8 samples vs. 12 samples. N=2 is apparently the most popular. 2/3

14.09.2025 16:52 👍 3 🔁 0 💬 2 📌 0

Google Scholar has a full-text index of nearly all research papers. You can use it to get counts for arbitrary phrases. I've been using this to measure the popularity of things in science. For example, here's the popularity of Greek letters used in equations. 1/3

14.09.2025 16:52 👍 13 🔁 1 💬 3 📌 0


read it here: diffuse.one/p/d2-002

15.08.2025 18:10 👍 1 🔁 0 💬 0 📌 0

I've written up some thoughts on publishing for machines. 10M research papers are published per year and there are 227M total - machines will be the primary producers and readers of publications going forward. Humans simply cannot keep up. It's time to think about revising the scientific paper.

15.08.2025 18:10 👍 1 🔁 0 💬 1 📌 0

We make evals at FutureHouse. It’s hard and it sucks. It’s also now the bottleneck, as we scratch the boundary of human ability. HLE was a huge effort that produced many good questions, and we hope this analysis stimulates review of the other HLE categories and further improvements. 7/7

23.07.2025 16:28 👍 2 🔁 0 💬 1 📌 0

futurehouse/hle-gold-bio-chem · Datasets at Hugging Face We’re on a journey to advance and democratize artificial intelligence through open source and open science.

We have written up our analysis: www.futurehouse.org/research-ann...
And made a gold subset on @huggingface that passed our review: huggingface.co/datasets/fut... 6/7

23.07.2025 16:28 👍 2 🔁 0 💬 1 📌 0

We reviewed 150 of the questions in the chem and bio categories and found that about 30% have peer-reviewed papers contradicting their ground-truth answers. Issues include confusion of species with orders, misreading of FDA guidelines, etc. All our notes are public. 5/7

23.07.2025 16:28 👍 0 🔁 0 💬 1 📌 0

The HLE rubric wanted questions to have “objectively correct, univocal” ground-truth answers. You can find multiple peer-reviewed papers that contradict the statement "Oganesson was the rarest noble gas in 2002 as a percentage of terrestrial matter" 4/7

23.07.2025 16:28 👍 0 🔁 0 💬 1 📌 0

It’s a clever question. But it’s not really about frontier science. Multiple papers have shown that Oganesson is not a gas (it’s predicted to be a semiconducting solid), it’s not noble (it’s reactive), and it isn’t included in any "terrestrial matter" tables of noble gases. 3/7

23.07.2025 16:28 👍 0 🔁 0 💬 1 📌 0

The design process of HLE required the questions to be unanswerable by contemporary LLMs. That led to many gotcha-style questions like the one below. It’s a trick question – in 2002, a few atoms of the group 18 element Oganesson were made for a few milliseconds. 2/7

23.07.2025 16:28 👍 1 🔁 0 💬 1 📌 0

HLE has recently become the benchmark to beat for frontier agents. We at FutureHouse took a closer look at the chem and bio questions and found about 30% of them are likely invalid based on our analysis and third-party PhD evaluations. 1/7

23.07.2025 16:28 👍 6 🔁 2 💬 1 📌 1

I just noticed it has sound lol. It's amazing

12.07.2025 04:17 👍 0 🔁 0 💬 0 📌 0

1/4
🚀 Announcing the 2025 Protein Engineering Tournament.

This year’s challenge: design PETase enzymes, which degrade the type of plastic in bottles. Can AI-guided protein design help solve the climate crisis? Let’s find out! ⬇️

#AIforBiology #ClimateTech #ProteinEngineering #OpenScience

08.07.2025 16:26 👍 23 🔁 20 💬 1 📌 4

ether0/src/ether0/rewards.py at c8cc676354e926b50ad206a606e04489bc9c95e3 · Future-House/ether0 A scientific reasoning model, dataset, and reward functions for chemistry. - Future-House/ether0

It may take a bit to extract the function, but here it is: github.com/Future-House...

22.06.2025 19:12 👍 2 🔁 0 💬 0 📌 0

I have written up a 3.5k word/10 figure essay on how to write a reward function while avoiding reward hacking for chemistry. It covers all the ridiculous ways we had to avoid reward hacking for training ether0, our scientific reasoning model.

diffuse.one/p/m1-000

22.06.2025 15:21 👍 23 🔁 1 💬 2 📌 0

Demonstrating end-to-end scientific discovery with Robin: a multi-agent system | FutureHouse

Although the discovery here is exciting, we are not claiming that we have cured dry AMD. Fully validating this hypothesis as a treatment for dry AMD will take human trials, which will take much longer.

Blog: www.futurehouse.org/research-ann...
Paper: arxiv.org/abs/2505.13400

20.05.2025 15:35 👍 4 🔁 0 💬 0 📌 0

The code for this is really minimal - similar to Google Co-Scientist we used multiple agents (from our platform in this case) and tournament-style rankings to select ideas. We're open sourcing it next week, along with all the trajectories.

20.05.2025 15:35 👍 2 🔁 0 💬 1 📌 0
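
To illustrate what a tournament-style ranking of ideas can look like, here is a minimal sketch using round-robin pairwise comparisons with Elo rating updates. This is an assumption for illustration, not the Robin code: the post does not say which ranking scheme was used, and `judge` is a hypothetical stand-in for whatever compares two candidate ideas (for example, an agent call).

```python
import itertools


def elo_tournament(ideas, judge, k=32.0, start=1000.0):
    """Rank unique, hashable ideas by round-robin matches with Elo updates.

    judge(a, b) is a hypothetical callable that returns True if `a` beats `b`.
    """
    ratings = {idea: start for idea in ideas}
    for a, b in itertools.combinations(ideas, 2):
        # Probability that `a` wins, given the current rating gap
        expected_a = 1.0 / (1.0 + 10 ** ((ratings[b] - ratings[a]) / 400.0))
        score_a = 1.0 if judge(a, b) else 0.0
        # Winner gains what the loser gives up
        ratings[a] += k * (score_a - expected_a)
        ratings[b] += k * (expected_a - score_a)
    return sorted(ideas, key=lambda idea: ratings[idea], reverse=True)


# Toy judge for demonstration only: the longer string wins.
ideas = ["idea A", "idea BB", "idea CCC"]
ranking = elo_tournament(ideas, judge=lambda a, b: len(a) > len(b))
print(ranking)
```

A round-robin needs a number of comparisons that grows quadratically with the number of ideas, so a real system would likely sample pairs or use brackets instead.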

The figures, hypotheses, and original and follow-up experiments were all generated by our agents. Interestingly, only the lab work and the paper writing were not automated (which is the opposite of what I would have predicted 2 years ago).

20.05.2025 15:35 👍 1 🔁 0 💬 1 📌 0

FutureHouse's goal has been to automate scientific discovery. Now we have used our agents to make a genuine discovery – a potential new treatment for one kind of blindness (dAMD). We had multiple cycles of hypotheses, experiments, and data analysis – including identifying the mechanism.

20.05.2025 15:35 👍 24 🔁 5 💬 1 📌 0

We shipped multi-agents today! Our chemistry design agent can now call Crow, our scholarly research agent, to bring in data from literature/clinical trials/Open Targets while designing molecules.

platform.futurehouse.org

13.05.2025 15:45 👍 11 🔁 2 💬 0 📌 0

Integrating @opentargets.org is so helpful for providing evidence for disease mechanisms independent of the literature. Here's a demo of synthesizing 78 papers and Open Targets data to propose two novel targets for triple-negative breast cancer.

See the answer: platform.futurehouse.org/trajectories...

11.05.2025 01:59 👍 6 🔁 0 💬 0 📌 0
