
norvid_studies

@norvid-studies

charts and graphs follow me on twitter https://twitter.com/norvid_studies

1,585
Followers
288
Following
30,368
Posts
02.05.2023
Joined

Latest posts by norvid_studies @norvid-studies

WHERE DOES THE CLONALITY
COME FROM TELL ME RIGHT NOW OR I’LL KILL
YOU TELL ME RIGHT NOW OR I’LL F’KIN KILL YOU DON’T SAY IT’S DIFFERENTIAL ADHESION OR CORTICAL TENSION TELL ME RIGHT NOW HOW THEY GET TO BE CLONAL MULTICELLULAR
ORGANISMS OR I’LL F’KIN KILL YOU

11.03.2026 00:19 👍 2 🔁 2 💬 0 📌 0
Post image

holy fuck

10.03.2026 23:50 👍 5 🔁 1 💬 0 📌 0

I bet if I saw how these work, I would have 10,000 complaints about how exactly it was implemented that are precisely the things that would be time consuming to decide/prompt for when actually building it.

10.03.2026 23:52 👍 3 🔁 1 💬 1 📌 0

didnt you pretty much just do that though? like it's just a list of 10+ details in english? or the issue is "you dont know it until you see it"

11.03.2026 00:13 👍 0 🔁 0 💬 0 📌 0
Post image

"why, yes, pleo, of course you get to be in the painting!"

the painting:

11.03.2026 00:11 👍 1 🔁 1 💬 0 📌 0

you know who ELSE complains of biskly brain ague while casually firing off hyperbangers...

11.03.2026 00:12 👍 3 🔁 0 💬 0 📌 0

mogged

11.03.2026 00:11 👍 2 🔁 0 💬 1 📌 0

@vgel.me the greek language mentioned

11.03.2026 00:10 👍 1 🔁 0 💬 0 📌 0
10.03.2026 23:41 👍 8 🔁 0 💬 2 📌 0
Post image

christ, superpowers v5 is incredible

10.03.2026 23:36 👍 10 🔁 1 💬 1 📌 2

basically, as a tweet appears on For You it gets mapped vertically to one falling 'coderain' line, character by character

10.03.2026 23:40 👍 3 🔁 0 💬 2 📌 0
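The mapping described above can be sketched roughly like this: a pure function producing successive frames for one falling column. The renderer and the feed source are left out, and the trailing-window rule (newest `height` characters stay visible) is an assumption, not a spec from the post.

```python
def coderain_frames(tweet: str, height: int) -> list[list[str]]:
    """Frame t reveals the first t+1 characters of the tweet, drawn down one
    column; once the column is full, only the newest `height` chars remain."""
    frames = []
    for t in range(len(tweet)):
        col = [" "] * height
        visible = tweet[max(0, t + 1 - height) : t + 1]  # trailing window
        for row, ch in enumerate(visible):
            col[row] = ch
        frames.append(col)
    return frames

frames = coderain_frames("hello", height=3)
# first frame shows just 'h' at the top; the last holds the trailing l, l, o
```

A real version would advance one character per animation tick and draw each column at the screen position where the tweet entered the feed.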

THAT'S WHAT THE CLAUDE IS FOR

10.03.2026 23:36 👍 4 🔁 1 💬 1 📌 0

that's 1 too high for me but I'll continue to contemplate this

10.03.2026 23:23 👍 3 🔁 0 💬 1 📌 0
10.03.2026 23:17 👍 0 🔁 0 💬 0 📌 0

how hard would it be to make this for your For You feed

@adler.dev @codetard.bsky.social @abeliansoup.bsky.social difficulty estimate

10.03.2026 23:12 👍 5 🔁 0 💬 2 📌 0
Post image

You cannot make this thing up

10.03.2026 18:56 👍 22 🔁 5 💬 1 📌 1

I find it soothing, so I (wrongly) thought it would prevent me from being too aggressive on the Linux forums I was visiting at the time.

10.03.2026 23:09 👍 3 🔁 0 💬 0 📌 0
Post image
10.03.2026 23:08 👍 0 🔁 0 💬 0 📌 0
Post image

simcluster group portraits in this style

10.03.2026 23:03 👍 12 🔁 0 💬 2 📌 0

This, at heart, is what makes robotics so difficult

10.03.2026 22:16 👍 6 🔁 2 💬 0 📌 0

well that's my ragebait for the day, see everyone tomorrow

10.03.2026 23:00 👍 0 🔁 0 💬 0 📌 0

words to live by...

10.03.2026 22:58 👍 1 🔁 0 💬 0 📌 0

on display in what sense?

10.03.2026 22:57 👍 1 🔁 0 💬 1 📌 0
Post image

from @johnvining.bsky.social

10.03.2026 22:53 👍 3 🔁 0 💬 2 📌 0
10.03.2026 22:51 👍 3 🔁 1 💬 1 📌 0

bsky.app/profile/timk...

10.03.2026 22:45 👍 0 🔁 0 💬 0 📌 0
Andrej Karpathy
@karpathy • 13m
Three days ago I left autoresearch tuning nanochat for ~2 days on a depth=12 model. It found ~20 changes that improved the validation loss. I tested these changes yesterday and all of them were additive and transferred to larger (depth=24) models.
Stacking up all of these changes, today I measured that the leaderboard's "Time to
GPT-2" drops from 2.02 hours to 1.80 hours (~11% improvement), this will be the new leaderboard entry. So yes, these are real improvements and they make an actual difference. I am mildly surprised that my very first naive attempt already worked this well on top of what I thought was already a fairly manually well-tuned project.
This is a first for me because I am very used to doing the iterative optimization of neural network training manually. You come up with ideas, you implement them, you check if they work (better validation loss), you come up with new ideas based on that, you read some papers for inspiration, etc. etc. This is the bread and butter of what I do daily for 2 decades. Seeing the agent do this entire workflow end-to-end


and all by itself as it worked through approx. 700 changes autonomously is wild. It really looked at the sequence of results of experiments and used that to plan the next ones. It's not novel, ground-breaking "research" (yet), but all the adjustments are "real", I didn't find them manually previously, and they stack up and actually improved nanochat. Among the bigger things e.g.:
- It noticed an oversight that my parameterless QKnorm didn't have a scalar multiplier attached, so my attention was too diffuse. The agent found multipliers to sharpen it, pointing to future work.
- It found that the Value Embeddings really like regularization and I wasn't applying any (oops).
- It found that my banded attention was too conservative (I forgot to tune it).
- It found that AdamW betas were all messed up.
- It tuned the weight decay schedule.
- It tuned the network initialization.
This is on top of all the tuning I've already done over a good amount of time. The exact commit is here, from this "round 1" of autoresearch. I am going to kick off "round 2", and in parallel I am looking at how multiple agents can collaborate to unlock parallelism. github.com/karpathy/nanoc...


All LLM frontier labs will do this. It's the final boss battle. It's a lot more complex at scale of course - you don't just have a single train.py file to tune. But doing it is "just engineering" and it's going to work. You spin up a swarm of agents, you have them collaborate to tune smaller models, you promote the most promising ideas to increasingly larger scales, and humans (optionally) contribute on the edges.
And more generally, *any* metric you care about that is reasonably efficient to evaluate (or that has more efficient proxy metrics such as training a smaller network) can be autoresearched by an agent swarm. It's worth thinking about whether your problem falls into this bucket too.
[Chart: Autotune Progress: 276 experiments, 29 kept improvements]


Andrej Karpathy details real improvements that autoresearch is finding

it’s starting to look a lot like ML research is automated

09.03.2026 22:48 👍 60 🔁 7 💬 4 📌 4
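The loop Karpathy describes (propose a change, run a small-scale experiment, keep the change only if validation loss improves, plan the next proposal from the history) can be sketched as a greedy experiment loop. `propose_change` and `run_experiment` here are hypothetical stand-ins, not nanochat's actual interface:

```python
import random

def autoresearch(baseline_loss, propose_change, run_experiment, budget=276):
    """Greedy experiment loop: keep a change only if it beats the best loss."""
    kept, best = [], baseline_loss
    for _ in range(budget):
        change = propose_change(kept)            # plan from experiment history
        loss = run_experiment(kept + [change])   # e.g. train a depth=12 model
        if loss < best:                          # only real improvements stack
            kept.append(change)
            best = loss
    return kept, best

# Toy model of "additive" changes: each change is a loss delta, and the
# experiment's loss is the baseline minus the sum of applied deltas.
random.seed(0)
kept, best = autoresearch(
    baseline_loss=3.0,
    propose_change=lambda history: random.gauss(0, 0.02),
    run_experiment=lambda changes: 3.0 - sum(changes),
)
# in this toy, only positive deltas survive, so best == 3.0 - sum(kept)
```

In a toy run most proposals fail the filter and a minority survive, the same shape as the 29-of-276 kept-improvements ratio in the chart; the real version spends a training run per experiment rather than an arithmetic check.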
Video thumbnail

from an experiment, these are two species of neural boids that mill with an inner ring of one species and an outer ring of another

the blue ones can only mill around the red ones, on their own the blue ones just swarm, the red ones mill on their own

10.03.2026 04:01 👍 21 🔁 6 💬 2 📌 0
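The controllers in the clip are learned ("neural boids"), so the actual rules aren't recoverable from the video. A classical, hand-rolled approximation of the described behavior might look like the sketch below; the weights and the rule that blue boids steer toward and tangentially around the red centroid (which is why blue can only mill when red is present) are assumptions for illustration.

```python
def step(reds, blues, dt=0.1):
    """One Euler step. Each boid is (x, y, vx, vy). Red boids mill via
    cohesion toward their own centroid plus a tangential bias; blue boids
    use the RED centroid for both terms, so without reds they have no
    milling anchor and would just swarm."""
    def centroid(boids):
        n = len(boids)
        return (sum(b[0] for b in boids) / n, sum(b[1] for b in boids) / n)

    rcx, rcy = centroid(reds)

    def advance(boids, cx, cy, attract, tangent):
        out = []
        for x, y, vx, vy in boids:
            dx, dy = cx - x, cy - y
            vx += (attract * dx + tangent * -dy) * dt  # pull inward + orbit
            vy += (attract * dy + tangent * dx) * dt
            out.append((x + vx * dt, y + vy * dt, vx, vy))
        return out

    # blues get a stronger tangential term around the red centroid,
    # producing an outer blue ring around the inner red mill
    return (advance(reds, rcx, rcy, attract=0.5, tangent=0.3),
            advance(blues, rcx, rcy, attract=0.8, tangent=0.6))
```

A single step pulls a blue boid at (1, 0) slightly toward the red centroid and deflects it sideways; iterated, that combination is what produces milling.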
Video thumbnail

I'm very happy I decided to stop being lazy and actually build something to automate the way I specifically want to code with AI. it's automating lots of annoying little things. been too busy building it to really use it though.

10.03.2026 07:48 👍 43 🔁 1 💬 3 📌 0

and every graph needs its own visualisation:

philpax.me/vouchgraph/

10.03.2026 20:23 👍 14 🔁 5 💬 3 📌 0