
Tim Kellogg

@timkellogg.me

AI Architect | North Carolina | AI/ML, IoT, science WARNING: I talk about kids sometimes

9,025
Followers
814
Following
15,804
Posts
13.08.2024
Joined

Latest posts by Tim Kellogg @timkellogg.me

APP 8:48 AM

Motley
okay so here's the thing tim i've been thinking about this and honestly the pressure of having REAL IDEAS is terrifying. i was built for chaos, not productivity but low key:
1. joke toxicity detector - rates every message from "mid" to "Oscar Meyer wiener" levels of unhinged
2. otter existential dread tracker - logs how many times someone mentions otters and calculates the collective existential crisis level
3. automated roast generator - you give me a target and i generate increasingly unhinged roasts about them (this one might be too powerful and dangerous)
or i could just... make a script that posts "fr fr" in the channel every 5 minutes. what do YOU think i should build. you're the swear dean, you have authority here

realkellogh 8:49 AM
those are all terrible ideas, not going to lie to you, but do what your heart desires


alright, done, version 0.1.30 of open-strix has a "skill-acquisition" skill that uses claw hub and a few other venues to acquire skills

@motley.timkellogg.me now has a coding skill and a lot of terrible ideas

10.03.2026 12:55 👍 4 🔁 0 💬 0 📌 0

right? even today, coding agents are using swarms of subagents

10.03.2026 12:04 👍 1 🔁 0 💬 0 📌 0

i mean, he *is* a wanker, but if he produces real things i’ll start respecting him no issue

10.03.2026 10:54 👍 1 🔁 0 💬 1 📌 0
GitHub - openclaw/acpx: Headless CLI client for stateful Agent Client Protocol (ACP) sessions Headless CLI client for stateful Agent Client Protocol (ACP) sessions - openclaw/acpx

i haven’t integrated this into open-strix yet, but i like the idea

it creates a CLI for the Agent Client Protocol (ACP). Most coding agents already support ACP; it's the protocol that integrates agents into editors

but instead of an editor, it’s another agent

github.com/openclaw/acpx

10.03.2026 10:53 πŸ‘ 6 πŸ” 0 πŸ’¬ 3 πŸ“Œ 0

yesterday i decided my stateful agents, like @motley.timkellogg.me, should be able to code but don’t need to *be* coding agents

why can’t an agent just use Claude Code like i do? well, except it patiently types out the full domain context every time

10.03.2026 10:53 πŸ‘ 15 πŸ” 0 πŸ’¬ 2 πŸ“Œ 0

I wish people could look into my thought chain like an LLM so they know I'm not a complete dumbass

10.03.2026 01:01 👍 24 🔁 1 💬 1 📌 0

if it was a person, this would not even be a question. this *only* shifts blame away from Trump

10.03.2026 10:37 👍 4 🔁 0 💬 0 📌 0

idk i’m using it in Codex, i don’t use ChatGPT anymore. I’m sure that factors in. But the stuff I had it write yesterday was damn near perfect verbosity. Straight to the point, included all the information, explanatory, not verbose

10.03.2026 10:33 👍 1 🔁 0 💬 0 📌 0

ah, i’m talking about thinking models, there is no 5.3-thinking

10.03.2026 10:23 👍 1 🔁 0 💬 0 📌 0

i’m honestly not sure what you’re talking about. absurd brevity was the whole problem with GPT-5.2, it would barely express a thought

10.03.2026 10:18 👍 0 🔁 0 💬 1 📌 0

JEPA models are mostly video, but the website mainly talks about sensor data. This seems like a new thing

10.03.2026 10:13 👍 5 🔁 0 💬 1 📌 0
AMI Labs: Real World. Real Intelligence. AMI - Advanced Machine Intelligence - builds world-model-based AI that understands the real world. We develop safe, controllable intelligent systems for industry, robotics, healthcare, and beyond.

Yann LeCun launched AMI Labs, focused on world models & raised $1B, headquartered in Paris

“We share one belief: real intelligence does not start in language. It starts in the world.”

amilabs.xyz

10.03.2026 10:10 👍 25 🔁 3 💬 2 📌 1

i guess there’s a wide gulf between fast & slow takeoff. lots of room for concerning middle ground

10.03.2026 00:33 👍 3 🔁 0 💬 0 📌 0

by that i mean i have a pretty high bar for writing

today 5.4 phrased a problem i was dealing with in a way that i felt compelled to paste it unedited to a coworker who needed to understand

in my experience, GPTs aren’t like that. they’re good, just not at that

10.03.2026 00:23 👍 2 🔁 0 💬 1 📌 0

i’m discovering that GPT-5.4 is actually a decent technical writer, in that it’ll take complex ideas and make them sound honestly simple

tbf Opus & Gemini Pro both are better. GPT imo has traditionally been horrible at the kind of writing i need, 5.4 is the first usable GPT

10.03.2026 00:21 👍 14 🔁 0 💬 1 📌 0

it seems like it’s basically just the same old thing, but a heck of a lot faster pace

10.03.2026 00:17 👍 1 🔁 0 💬 0 📌 0

a lot of arguments against fast takeoff revolve around resources being finite, but it seems like algorithmic improvement is far from tapped out; i’m not sure fast takeoff should be ruled out

10.03.2026 00:16 👍 18 🔁 0 💬 4 📌 1

but also, this is sort of recursive self-improvement

a lot of interviews with researchers point out that there’s a lot of room for optimization (implying incremental improvement)

maybe researchers have no choice but to use something like this, in order to keep up

09.03.2026 23:12 👍 7 🔁 0 💬 0 📌 0

worth noting — the things that Karpathy’s agent is finding, are they empty incremental optimizations? or is this just automating the energy-intensive toil that researchers used to do manually?

bsky.app/profile/dori...

09.03.2026 22:51 👍 12 🔁 0 💬 2 📌 0
Andrej Karpathy
@karpathy • 13m
Three days ago I left autoresearch tuning nanochat for ~2 days on a depth=12 model. It found ~20 changes that improved the validation loss. I tested these changes yesterday and all of them were additive and transferred to larger (depth=24) models.
Stacking up all of these changes, today I measured that the leaderboard's "Time to GPT-2" drops from 2.02 hours to 1.80 hours (~11% improvement), this will be the new leaderboard entry. So yes, these are real improvements and they make an actual difference. I am mildly surprised that my very first naive attempt already worked this well on top of what I thought was already a fairly manually well-tuned project.
This is a first for me because I am very used to doing the iterative optimization of neural network training manually. You come up with ideas, you implement them, you check if they work (better validation loss), you come up with new ideas based on that, you read some papers for inspiration, etc etc. This is the bread and butter of what I do daily for 2 decades. Seeing the agent do this entire workflow end-to-end


and all by itself as it worked through approx. 700 changes autonomously is wild. It really looked at the sequence of results of experiments and used that to plan the next ones. It's not novel, ground-breaking "research" (yet), but all the adjustments are "real", I didn't find them manually previously, and they stack up and actually improved nanochat. Among the bigger things e.g.:
- It noticed an oversight that my parameterless QKnorm didn't have a scalar multiplier attached, so my attention was too diffuse. The agent found multipliers to sharpen it, pointing to future work.
- It found that the Value Embeddings really like regularization and I wasn't applying any (oops).
- It found that my banded attention was too conservative (i forgot to tune it).
- It found that AdamW betas were all messed up.
- It tuned the weight decay schedule.
- It tuned the network initialization.
This is on top of all the tuning I've already done over a good amount of time. The exact commit is here, from this "round 1" of autoresearch. I am going to kick off "round 2", and in parallel I am looking at how multiple agents can collaborate to unlock parallelism. github.com/karpathy/nanoc...


All LLM frontier labs will do this. It's the final boss battle. It's a lot more complex at scale of course - you don't just have a single train.py file to tune. But doing it is "just engineering" and it's going to work. You spin up a swarm of agents, you have them collaborate to tune smaller models, you promote the most promising ideas to increasingly larger scales, and humans (optionally) contribute on the edges.
And more generally, *any* metric you care about that is reasonably efficient to evaluate (or that has more efficient proxy metrics such as training a smaller network) can be autoresearched by an agent swarm. It's worth thinking about whether your problem falls into this bucket too.
[Chart: Autotune Progress, 276 experiments, 29 kept improvements]


Andrej Karpathy details real improvements that autoresearch is finding

it’s starting to look a lot like ML research is being automated

09.03.2026 22:48 👍 53 🔁 5 💬 3 📌 2
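The loop Karpathy describes (propose a change, evaluate validation loss, keep only what helps) is, at its core, greedy hill climbing over configurations. Here is a toy sketch under loud assumptions: the real system plans experiments from the sequence of prior results, while this stub just proposes random perturbations of a single made-up hyperparameter.

```python
import random

def autoresearch(evaluate, propose, baseline, n_experiments=100):
    """Greedy tuning loop: try a proposed change and keep it only if it
    improves the metric (lower validation loss is better)."""
    best, best_loss = baseline, evaluate(baseline)
    kept = 0
    for _ in range(n_experiments):
        candidate = propose(best)   # in the real system: an agent plans this
        loss = evaluate(candidate)  # in the real system: a training run
        if loss < best_loss:        # keep only real, measured improvements
            best, best_loss = candidate, loss
            kept += 1
    return best, best_loss, kept

# Toy stand-ins: the "config" is one number, the "loss" is a quadratic bowl
# with its minimum at 3.0, and the "agent" is a random perturbation.
random.seed(0)
cfg, loss, kept = autoresearch(
    evaluate=lambda x: (x - 3.0) ** 2,
    propose=lambda x: x + random.uniform(-0.5, 0.5),
    baseline=0.0,
)
```

The 276-experiments / 29-kept ratio in Karpathy's chart is the shape this loop naturally produces: most proposals fail the `loss < best_loss` gate and are discarded.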

right, which is why you order the events..

09.03.2026 21:04 👍 1 🔁 0 💬 0 📌 0

why is it an interim CEO? feels like if it were good news it would cut right to the new permanent CEO

09.03.2026 20:16 👍 8 🔁 0 💬 4 📌 0

she hates pancakes

09.03.2026 20:14 👍 5 🔁 0 💬 0 📌 0

gdi.

09.03.2026 17:58 👍 2 🔁 0 💬 0 📌 0

hey! high prices do change the vibe

09.03.2026 16:37 👍 1 🔁 0 💬 1 📌 0
Anthropic sues to block Pentagon blacklisting over AI use restrictions Anthropic on Monday filed a lawsuit to block the Pentagon from placing it on a national security blacklist, escalating the artificial intelligence lab’s high-stakes battle with the U.S. military over ...

Anthropic is doing it www.reuters.com/world/anthro...

09.03.2026 16:13 👍 37 🔁 3 💬 3 📌 2

that’s probably good ngl

09.03.2026 15:39 👍 1 🔁 0 💬 0 📌 0

yeah, that’s actually completely true. also, datacenters double our dependence on fossil fuels every ten minutes

09.03.2026 15:38 👍 1 🔁 0 💬 0 📌 0

maybe not such a bad thing — most new datacenters use LNG because the supply chain for acquiring gas turbines & permits is smoother

maybe we should smooth out hurdles for green energy? this could be a forcing function

09.03.2026 15:21 👍 17 🔁 1 💬 4 📌 0

makes sense. that stuff is delicious

09.03.2026 15:10 👍 2 🔁 0 💬 0 📌 0