austin (@aparker.io)

Position: Stop Anthropomorphizing Intermediate Tokens as Reasoning/Thinking Traces!
Intermediate token generation (ITG), where a model produces output before the solution, has become a standard method to improve the performance of language models on reasoning tasks. These intermediat...

"We collated emerging evidence to support our position that intermediate tokens are not guaranteed to have any end user semantics, and that their interpretability and solution accuracy are often at loggerheads..."

arxiv.org/abs/2504.09762

09.03.2026 23:27 👍 19 🔁 1 💬 2 📌 0

i really gotta hydrate more

10.03.2026 03:18 👍 13 🔁 0 💬 3 📌 0

assume 250k all-in for a person, works out to $60/hr give or take…

I dunno! You get some really strange decision trees here.

10.03.2026 03:15 👍 2 🔁 0 💬 1 📌 0

the easy reaction is “wow, $30 to find nothing, what a waste!” but… idk, it was a 1k diff, it’d probably take a real person the same amount of time or more to do a good review, they’d also presumably find nothing.

10.03.2026 03:15 👍 6 🔁 0 💬 3 📌 0

it’s an interesting play because I think it probably is priced well, especially if you play it against headcount. we tried it out today; ~$27 and ~30m on each PR. it found actual bugs that got missed otherwise, and it found nothing in a few.

10.03.2026 03:15 👍 5 🔁 0 💬 1 📌 0

replace the org chart with a tier list

10.03.2026 01:31 👍 12 🔁 1 💬 1 📌 0

i think @shreyanjain.net has been astrally communicating with me because i have become extremely normal about alysa liu

10.03.2026 01:28 👍 13 🔁 0 💬 1 📌 1

fediverse be normal about anything challenge - rating: impossible

10.03.2026 01:27 👍 13 🔁 0 💬 0 📌 0

it's CLAUDE.md/AGENTS.md bullshit all over again

10.03.2026 01:24 👍 1 🔁 0 💬 1 📌 0

what i want is to not support 20 different fucking coding agents that can't all agree on 'what should the folder structure of a plugin be' and at least two of them are different just to piss me off

10.03.2026 01:24 👍 1 🔁 0 💬 1 📌 0

nice transparency

09.03.2026 23:40 👍 2 🔁 0 💬 0 📌 0

it's all markdown but everyone names the markdown something different

09.03.2026 23:34 👍 6 🔁 0 💬 1 📌 0

begging all of the ai agent people to get on the same fuckin' page about what to call plugins and how to install them

09.03.2026 23:32 👍 18 🔁 0 💬 6 📌 0

I'm thrilled to announce that I'll be joining Bluesky as interim CEO. I deeply believe in what this team has built and the open social web they're fighting for. More here: toni.org/2026/03/09/c...

09.03.2026 19:09 👍 1884 🔁 290 💬 379 📌 201

i do agree with the overall point tho

09.03.2026 11:39 👍 0 🔁 0 💬 0 📌 0

hm, agree and disagree. what I’ve been experimenting with is not making plans, but making decision records - using plan mode/no-edit to walk thru the domain, give constraints, explain stuff and focus on what’s important, then resetting and using those for implementation

09.03.2026 11:39 👍 8 🔁 0 💬 2 📌 0

crabs?

09.03.2026 01:34 👍 1 🔁 0 💬 1 📌 0

i have several questions but I will note that we do tend to bathe children daily, even the ones who wear diapers, and that involves soap

09.03.2026 01:26 👍 4 🔁 0 💬 1 📌 0

pls clap I was about to be a dick to a stranger on the internet but deleted the post

09.03.2026 01:25 👍 48 🔁 0 💬 2 📌 0

pretty sure that’s a crime in most jurisdictions

09.03.2026 01:21 👍 1 🔁 0 💬 0 📌 0

that’s beautiful

09.03.2026 01:21 👍 1 🔁 0 💬 0 📌 0

is the plural of google meet:

a) google meet
b) googles meet
c) google meets
d) googles meets
e) other (explain)

08.03.2026 23:25 👍 8 🔁 0 💬 12 📌 0

thank you for your service

08.03.2026 23:11 👍 2 🔁 0 💬 0 📌 0

baked, breaded chicken draped in mozz and vodka sauce

hm, not bad

08.03.2026 22:41 👍 3 🔁 0 💬 1 📌 0

real spring needs to get here fucking asap this child is going to gnaw a hole through the drywall

08.03.2026 22:37 👍 9 🔁 0 💬 0 📌 0

sometimes you get people who think they want that but it turns out they do not!

the worst part is when they don’t realize that they do not.

08.03.2026 22:25 👍 5 🔁 0 💬 0 📌 0

oh is there a clocksball today?

08.03.2026 20:53 👍 3 🔁 0 💬 0 📌 0

The Obscure Relation of Appropriateness
The principles that govern human and machine behavior derive from a simple observation: human behavior is uncaused yet appropriate; machine behavior is caused and functionally appropriate.

Well, it happened again. I started writing a short piece and it became a long piece.

People have questioned why I insist that human linguistic behavior is not part of a causal structure, whereas LLM behavior is. This piece provides justification. 🧵

vincentcarchidi.substack.com/p/the-obscur...

08.03.2026 14:35 👍 17 🔁 3 💬 4 📌 4

unfortunately for most of us, ~99% of actual problems bsky has are human problems

08.03.2026 19:07 👍 3 🔁 0 💬 0 📌 0

i would hazard a guess that 100% of the “problems” bsky has in the mind of users have 0% to do with the attitude of the bsky core devs towards AI

08.03.2026 19:06 👍 5 🔁 0 💬 1 📌 0
