Fun read, and interesting method: identify a circuit of several "thinking" layers and repeat it during inference for improved results, no extra training of any kind.
One would imagine Jürgen would've shown us some results with more than a decade of a head start...
they show that more speed is always possible and weird tricks can work. idk, they feel more practical than, say, hutter prize.
is thor any good for llms?
Fun, but a bit frustrating. Some felt obvious, while others felt unfair due to being too simple...
now compare to ripgrep
DGX spark is consumer blackwell, sorry...
I suppose my brain is too pytorch shaped to appreciate the value of non-ML use cases...
Hopper and Blackwell (and not the consumer blackwell, probably...)
boring, we have many numpies in rust already. we also have multiple pytorch in rust attempts, but how about... jax in rust?
now that sounds a bit more exciting.
FlashAttention-4
I hope it is not a pain to work with. It changes the algorithm & pipeline so that softmax & SMEM bandwidth no longer dictate speed. Attn reaches ~1600 TFLOPs, pretty much at matmul speed!
Excited to share the latest Olmo model: Olmo Hybrid. This is a model with gated delta net (GDN) layers in a 3:1 ratio with full attention. It follows lots of other developments like Qwen 3.5 and Kimi Linear.
+20°C, I see, so we're skipping spring and going straight into summer, followed by the new annual season - hellfire.
the washed out browns of early spring feel more autumn than autumn itself.
The earliest Dyson spheres were built inside out, covering increasingly large fractions of the builders' native planet.
Qwen 3.5 Small Model Series just dropped on
@hf.co 🔥
huggingface.co/collections/...
✨ 0.8B/2B/4B/9B
✨ Apache2.0
✨ 262K–1M token context
consciousness is defined by human experience. humans do experience stuff. but blindsight claims it's not necessary for intelligence, not that it doesn't exist.
(at least that's how I remember it...)
what is this good for? probably only interesting if you want to run stuff using candle for some reason. these reasons are your own.
Some time ago, I ported RF-DETR inference from pytorch to rust's candle using Opus 4.5 and my own hands.
With some iteration and hand-holding, Opus managed to get it correct.
Port can be found here: github.com/slckl/candle...
we'll see what ends up in its place...
To expand, good tests to verify correctness and a target benchmark that Claude can run on its own repeatedly will often yield substantial performance gains for a looooooooot of code.
There are only so many performance engineers out there, a lot of projects could benefit this way.
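A minimal sketch of the kind of harness meant above, using only the Rust standard library: a correctness check plus a crude wall-clock benchmark that an agent can rerun after every change. `sum_of_squares` and all the sizes here are hypothetical stand-ins for whatever hot path is being optimized.

```rust
use std::time::Instant;

// Hypothetical hot function under optimization.
fn sum_of_squares(xs: &[u64]) -> u64 {
    xs.iter().map(|x| x * x).sum()
}

fn main() {
    // Correctness gate: must pass after every optimization attempt.
    assert_eq!(sum_of_squares(&[1, 2, 3]), 14);
    assert_eq!(sum_of_squares(&[]), 0);

    // Crude benchmark target: total wall-clock over many iterations.
    let data: Vec<u64> = (0..10_000).collect();
    let start = Instant::now();
    let mut checksum = 0u64;
    for _ in 0..1_000 {
        checksum = checksum.wrapping_add(sum_of_squares(&data));
    }
    println!("checksum {checksum}, elapsed {:?}", start.elapsed());
}
```

A dedicated harness like criterion would give more stable numbers, but even this loop gives the model a single number to drive down while the assertions keep it honest.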
1) I've had similar experience with optimizing rust code. If you have a bench to target and tests for correctness, Claude will squeeze the juice out of any stone, certainly far beyond what I could do.
2) I must have missed the mad bits! I'm just happy to see more rust users.
LLMs getting much better at pushing back against bullshit prompts.
"Green means the model clearly called out the nonsense. Amber means partial challenge. Red means the model let nonsense pass"
github.com/petergpt/bul...
it's fun the first time.
It's cool that self-driving cars are real now; new blog post open.substack.com/pub/itcanthi...
Rest in peace, good kittizen :(
the era of rambling as specification
dear model, decipher my dreams and make them real
surely the system prompt will never lie about this being another workday.
the doe of consciousness has blessed you.