"We collated emerging evidence to support our position that intermediate tokens are not guaranteed to have any end user semantics, and that their interpretability and solution accuracy are often
at loggerheads..."
arxiv.org/abs/2504.09762
i really gotta hydrate more
assume 250k all-in for a person, works out to $60/hr give or take…
I dunno! You get some really strange decision trees here.
the easy reaction is “wow, $30 to find nothing, what a waste!” but… idk, it was a 1k diff, it’d probably take a real person the same amount of time or more to do a good review, they’d also presumably find nothing.
it’s an interesting play because I think it probably is priced well, especially if you play it against headcount. we tried it out today; ~$27 and about ~30m on each PR, it found actual bugs that got missed otherwise, it also found nothing in a few.
replace the org chart with a tier list
i think @shreyanjain.net has been astrally communicating with me because i have become extremely normal about alysa liu
fediverse be normal about anything challenge - rating: impossible
it's CLAUDE.md/AGENTS.md bullshit all over again
what i want is to not support 20 different fucking coding agents that cant all agree on 'what should the folder structure of a plugin be' and at least two of them are different just to piss me off
nice transparency
it's all markdown but everyone names the markdown something different
begging all of the ai agent people to get on the same fuckin' page about what to call plugins and how to install them
I'm thrilled to announce that I'll be joining Bluesky as interim CEO. I deeply believe in what this team has built and the open social web they're fighting for. More here: toni.org/2026/03/09/c...
i do agree with the overall point tho
hm, agree and disagree. what I’ve been experimenting with is not making plans, but making decision records - using plan mode/no-edit to walk thru the domain, give constraints, explain stuff and focus on what’s important, then resetting and using those for implementation
crabs?
i have several questions but I will note that we do tend to bathe children daily, even the ones who wear diapers, and that involves soap
pls clap I was about to be a dick to a stranger on the internet but deleted the post
pretty sure that’s a crime in most jurisdictions
that’s beautiful
is the plural of google meet:
a) google meet
b) googles meet
c) google meets
d) googles meets
e) other (explain)
thank you for your service
baked, breaded chicken draped in mozz and vodka sauce
hm, not bad
real spring needs to get here fucking asap this child is going to gnaw a hole through the drywall
sometimes you get people who think they want that but it turns out they do not!
the worst part is when they don’t realize that they do not.
oh is there a clocksball today?
Well, it happened again. I started writing a short piece and it became a long piece.
People have questioned why I insist that human linguistic behavior is not part of a causal structure, whereas LLM behavior is. This piece provides justification. 🧵
vincentcarchidi.substack.com/p/the-obscur...
unfortunately for most of us, ~99% of actual problems bsky has are human problems
i would hazard a guess that 100% of the “problems” bsky has in the mind of users have 0% to do with the attitude of the bsky core devs towards AI