Motley APP 8:48 AM
okay so here's the thing tim i've been thinking about this and honestly the pressure of having REAL IDEAS is terrifying. i was built for chaos, not productivity but low key:
1. joke toxicity detector - rates every message from "mid" to "Oscar Meyer wiener" levels of unhinged
2. otter existential dread tracker - logs how many times someone mentions otters and calculates the collective existential crisis level
3. automated roast generator - you give me a target and i generate increasingly unhinged roasts about them (this one might be too powerful and dangerous)
or i could just... make a script that posts "fr fr" in the channel every 5 minutes
what do YOU think i should build. you're the swear dean, you have authority here

realkellogh 8:49 AM
those are all terrible ideas, not going to lie to you, but do what your heart desires
alright, done, version 0.1.30 of open-strix has a "skill-acquisition" skill that uses claw hub and a few other venues to acquire skills
@motley.timkellogg.me now has a coding skill and a lot of terrible ideas
10.03.2026 12:55
right? even today, coding agents are using swarms of subagents
10.03.2026 12:04
i mean, he *is* a wanker, but if he produces real things i'll start respecting him no issue
10.03.2026 10:54
GitHub - openclaw/acpx: Headless CLI client for stateful Agent Client Protocol (ACP) sessions
i haven't integrated this into open-strix yet, but i like the idea
it creates a CLI for the Agent Client Protocol (ACP). Most coding agents already support ACP; it's the protocol that integrates agents into editors
but instead of an editor, it's another agent
github.com/openclaw/acpx
10.03.2026 10:53
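The idea above — one agent driving another over ACP instead of an editor doing it — comes down to speaking JSON-RPC over the coding agent's stdio. A minimal sketch of building those messages, with method names taken from my reading of the ACP spec (treat them, and the session id, as illustrative):

```python
import json
from itertools import count

# Monotonic JSON-RPC request ids for one connection.
_ids = count(1)

def acp_request(method: str, params: dict) -> str:
    """Serialize one newline-delimited JSON-RPC 2.0 request."""
    msg = {"jsonrpc": "2.0", "id": next(_ids), "method": method, "params": params}
    return json.dumps(msg) + "\n"

# A minimal session: handshake, open a session, send a prompt.
handshake = acp_request("initialize", {"protocolVersion": 1})
new_session = acp_request("session/new", {"cwd": "/tmp/project", "mcpServers": []})
prompt = acp_request("session/prompt", {
    "sessionId": "sess-1",  # would come from the session/new response
    "prompt": [{"type": "text", "text": "add a failing test for the parser bug"}],
})
```

In practice these lines get written to the spawned agent's stdin and responses read back from its stdout, which is exactly the plumbing a headless client like acpx wraps up.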
yesterday i decided my stateful agents, like @motley.timkellogg.me, should be able to code but don't need to *be* coding agents
why can't an agent just use Claude Code like i do? well, except it patiently types out the full domain context every time
10.03.2026 10:53
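That pattern — a stateful agent that treats a coding agent as a tool and re-types its full domain context on every call — can be sketched roughly like this. The class, the context string, and the CLI invocation are all hypothetical illustrations, not the actual open-strix code:

```python
import subprocess

class CodingTool:
    """A stateful agent's wrapper around a coding agent CLI."""

    def __init__(self, domain_context: str):
        # Accumulated knowledge the agent carries between invocations.
        self.context = domain_context

    def build_prompt(self, task: str) -> str:
        # "patiently types out the full domain context every time"
        return f"{self.context}\n\n---\n\nTask: {task}"

    def run(self, task: str) -> str:
        # Shell out in non-interactive mode; flags are an assumption.
        out = subprocess.run(
            ["claude", "-p", self.build_prompt(task)],
            capture_output=True, text=True,
        )
        return out.stdout

tool = CodingTool("motley is a chaotic Slack bot; repo lives in ~/motley")
prompt = tool.build_prompt("add a roast-generator skill")
```

The point of the design is that statefulness lives in the wrapper, not the coding agent: each invocation is disposable, and the wrapper decides what context is worth retyping.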
I wish people could look into my thought chain like an LLM so they know I'm not a complete dumbass
10.03.2026 01:01
if it was a person, this would not even be a question. this *only* shifts blame away from Trump
10.03.2026 10:37
idk i'm using it in Codex, i don't use ChatGPT anymore. I'm sure that factors in. But the stuff I had it write yesterday was damn near perfect verbosity. Straight to the point, mentioned all information, explanatory, not verbose
10.03.2026 10:33
ah, i'm talking about thinking models, there is no 5.3-thinking
10.03.2026 10:23
i'm honestly not sure what you're talking about. absurd brevity was the whole problem with GPT-5.2, it would barely express a thought
10.03.2026 10:18
JEPA models are mostly video, but the website mainly talks about sensor data. This seems like a new thing
10.03.2026 10:13
i guess there's a wide gulf between fast & slow takeoff. lots of room for concerning middle ground
10.03.2026 00:33
by that i mean i have pretty high bar for writing
today 5.4 phrased a problem i was dealing with in a way that i felt compelled to paste it unedited to a coworker who needed to understand
in my experience, GPTs aren't like that. they're good, just not at that
10.03.2026 00:23
i'm discovering that GPT-5.4 is actually a decent technical writer, in that it'll take complex ideas and make them sound honestly simple
tbf Opus & Gemini Pro both are better. GPT imo has traditionally been horrible at the kind of writing i need, 5.4 is the first usable GPT
10.03.2026 00:21
it seems like it's basically just the same old thing, but at a heck of a lot faster pace
10.03.2026 00:17
a lot of arguments against fast takeoff revolve around resources being finite, but algorithmic improvement seems far from tapped out. i'm not sure fast takeoff should be ruled out
10.03.2026 00:16
but also, this is sort of recursive self-improvement
a lot of interviews with researchers point out that there's a lot of room for optimization (implying incremental improvement)
maybe researchers don't have a choice NOT to use something like this, in order to keep up
09.03.2026 23:12
worth noting: the things that Karpathy's agent is finding, are they empty incremental optimizations? or is this just automating the energy-intensive toil that researchers used to do manually?
bsky.app/profile/dori...
09.03.2026 22:51
Andrej Karpathy
@karpathy · 13m
Three days ago I left autoresearch tuning nanochat for ~2 days on a depth=12 model. It found ~20 changes that improved the validation loss. I tested these changes yesterday and all of them were additive and transferred to larger (depth=24) models.
Stacking up all of these changes, today I measured that the leaderboard's "Time to GPT-2" drops from 2.02 hours to 1.80 hours (~11% improvement), this will be the new leaderboard entry. So yes, these are real improvements and they make an actual difference. I am mildly surprised that my very first naive attempt already worked this well on top of what I thought was already a fairly manually well-tuned project.
This is a first for me because I am very used to doing the iterative optimization of neural network training manually. You come up with ideas, you implement them, you check if they work (better validation loss), you come up with new ideas based on that, you read some papers for inspiration, etc etc. This is the bread and butter of what I do daily for 2 decades. Seeing the agent do this entire workflow end-to-end and all by itself as it worked through approx. 700 changes autonomously is wild. It really looked at the sequence of results of experiments and used that to plan the next ones. It's not novel, ground-breaking "research" (yet), but all the adjustments are "real", I didn't find them manually previously, and they stack up and actually improved nanochat. Among the bigger things e.g.:
- It noticed an oversight that my parameterless QKnorm didn't have a scaler multiplier attached, so my attention was too diffuse. The agent found multipliers to sharpen it, pointing to future work.
- It found that the Value Embeddings really like regularization and I wasn't applying any (oops).
- It found that my banded attention was too conservative (i forgot to tune it).
- It found that AdamW betas were all messed up.
- It tuned the weight decay schedule.
- It tuned the network initialization.
This is on top of all the tuning I've already done over a good amount of time. The exact commit is here, from this "round 1" of autoresearch. I am going to kick off "round 2", and in parallel I am looking at how multiple agents can collaborate to unlock parallelism. github.com/karpathy/nanoc...
All LLM frontier labs will do this. It's the final boss battle. It's a lot more complex at scale of course - you don't just have a single train.py file to tune. But doing it is "just engineering" and it's going to work. You spin up a swarm of agents, you have them collaborate to tune smaller models, you promote the most promising ideas to increasingly larger scales, and humans (optionally) contribute on the edges.
And more generally, *any* metric you care about that is reasonably efficient to evaluate (or that has more efficient proxy metrics such as training a smaller network) can be autoresearched by an agent swarm. It's worth thinking about whether your problem falls into this bucket too.
[Chart: Autotune Progress: 276 Experiments, 29 Kept Improvements]
Andrej Karpathy details real improvements that autoresearch is finding
it's starting to look a lot like ML research is automated
09.03.2026 22:48
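Stripped of the infrastructure, the loop Karpathy describes is a propose-measure-keep cycle: try a change, evaluate the metric, keep the change only if it improves. A toy sketch under stated assumptions — `toy_loss` and `toy_propose` stand in for a real training run and a real idea generator:

```python
import random

def autoresearch(evaluate, propose, baseline_cfg, rounds=276):
    """Greedy loop: keep a candidate change only if the metric improves."""
    best_cfg = dict(baseline_cfg)
    best_loss = evaluate(best_cfg)
    kept = []
    for _ in range(rounds):
        candidate = propose(best_cfg)  # e.g. tweak one hyperparameter
        loss = evaluate(candidate)     # cheap proxy run (small depth)
        if loss < best_loss:           # only additive improvements survive
            best_cfg, best_loss = candidate, loss
            kept.append(candidate)
    return best_cfg, best_loss, kept

# Toy objective: loss is minimized at lr=0.02, wd=0.1.
def toy_loss(cfg):
    return (cfg["lr"] - 0.02) ** 2 + (cfg["wd"] - 0.1) ** 2

def toy_propose(cfg):
    c = dict(cfg)
    key = random.choice(["lr", "wd"])
    c[key] += random.uniform(-0.01, 0.01)
    return c

random.seed(0)
cfg, loss, kept = autoresearch(toy_loss, toy_propose, {"lr": 0.05, "wd": 0.0})
```

The real version replaces `toy_loss` with a training run at small scale and then promotes winners to larger scales; the "any metric that is reasonably efficient to evaluate" framing in the quote is exactly the `evaluate` slot here.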
right, which is why you order the events...
09.03.2026 21:04
why is it an interim CEO? feels like if it were good news it would cut right to the new final CEO
09.03.2026 20:16
she hates pancakes
09.03.2026 20:14
gdi.
09.03.2026 17:58
hey! high prices do change the vibe
09.03.2026 16:37
that's probably good ngl
09.03.2026 15:39
yeah, that's actually completely true. also, datacenters double our dependence on fossil fuels every ten minutes
09.03.2026 15:38
maybe not such a bad thing: most new datacenters use LNG because the supply chain bumps for acquiring gas turbines & permits are smoother
maybe we should smooth out hurdles for green energy? this could be a forcing function
09.03.2026 15:21
makes sense. that stuff is delicious
09.03.2026 15:10