my internet got quick enough for long enough to watch a video on youtube in 720p and that was exciting for me today
this is just a factory game
i love that i get to be autistic about this
ahh im so excited im having so much fun
i said this an hour ago and the pipeline is already finished and working, and the first run of the sweep for the non-reasoning discriminator (on the updated data it needs) isn't even done
genuinely playing toys
im also hoping with the shared backbone between the generator and discriminator in reasoning mode, after enough RL, the generator will be able to re-use the discriminator's patterns of thought to self-reflect and iterate during reasoning
i'll also be running some "normal" benchmarks here with this for the next tech report
guys i love data engineering
i am bootstrapping "general simulator that knows when to spend what amount of compute to simulate the thing i ask" from these assistants and it is annoying but it's also really fun it's like playing toys
i am just training models so i can do data engineering at this point. i am just training the data
idea here is that eventually i'll be able to let it decide when to use a scratchpad autonomously and train that behavior with RL
also updated prompt revision
the stage after this (clmr-3) will be using a reasoning discriminator instead, i have a complex synthetic data pipeline in mind for the warm start that should be pretty baller
im beefing the discriminator in my setup up with some bidirectional transformer blocks (maybe 8? for 0.5B extra params?) so that i can have a value function baseline for the generator that isn't as powerful as it
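a rough sanity check on the "8 blocks for 0.5B extra params" guess above. this is only a back-of-envelope sketch: it assumes a plain transformer block (4 attention projections plus a 2-matrix MLP with 4x expansion, ignoring biases and norms), and the `d_model` values are hypothetical since the post doesn't give the backbone's hidden size. gated MLPs or grouped-query attention would shift the numbers.

```python
def block_params(d_model: int, mlp_ratio: int = 4) -> int:
    """Approximate parameter count of one vanilla transformer block."""
    attn = 4 * d_model ** 2              # q, k, v and output projections
    mlp = 2 * mlp_ratio * d_model ** 2   # up-projection and down-projection
    return attn + mlp


def extra_params(n_blocks: int, d_model: int, mlp_ratio: int = 4) -> int:
    """Parameters added by stacking n_blocks extra blocks on the discriminator."""
    return n_blocks * block_params(d_model, mlp_ratio)


if __name__ == "__main__":
    # hypothetical hidden sizes, not taken from the post
    for d_model in (2048, 2304):
        total = extra_params(8, d_model)
        print(f"d_model={d_model}: {total / 1e9:.2f}B extra params")
    # d_model=2048: 0.40B extra params
    # d_model=2304: 0.51B extra params
```

under these assumptions, 8 blocks land near 0.5B only if the hidden size is around 2300, so the guess is plausible for a backbone in that range.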
yea shit rocks for finetuning
ok im having fun again
i hope i can just keep pushing this and get a general simulator, just a base model but + test time compute seems like exactly what i would like to have right now
i am an allen ai fangirl
also cleaning up prompt w/ synth data -w-
im gonna attempt after i finish with the next clmr
you can just make that setup right now by programmatically generating dynamic systems to influence
you can set the meta reasoning interval to every like 8 tokens for a huge inflation of test time compute
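to put a number on that inflation, here's a minimal sketch. it assumes one meta-reasoning block of fixed length gets inserted every `interval` output tokens; `meta_len` is a made-up parameter (the post doesn't say how long each meta-reasoning step is), and the function names are just for illustration.

```python
import math


def total_tokens(n_output: int, interval: int, meta_len: int) -> int:
    """Tokens generated when a meta-reasoning block of meta_len tokens
    is inserted once per `interval` output tokens."""
    n_meta_steps = math.ceil(n_output / interval)
    return n_output + n_meta_steps * meta_len


def inflation(n_output: int, interval: int, meta_len: int) -> float:
    """Ratio of total generated tokens to plain output tokens."""
    return total_tokens(n_output, interval, meta_len) / n_output


if __name__ == "__main__":
    # hypothetical: 512 output tokens, 64-token meta-reasoning blocks
    print(inflation(512, 8, 64))    # 9.0
    print(inflation(512, 128, 64))  # 1.5
```

with these assumed numbers, dropping the interval from 128 to 8 tokens takes the compute multiplier from 1.5x to 9x, since the overhead scales roughly as `meta_len / interval`.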
what i'm doing now is a text simulator with reasoning, the text it is simulating can be a reasoning chain! i feel like an idiot for not realizing this sooner
i love coding, i won't delegate that to agents. i gain such detailed maps of my own mind by acting as translator between high dimensional concepts and a computer; in the discrepancies between those two spaces i learn more about me. with agents you deal with a much noisier signal
oh fuck
rn this
bsky.app/profile/crum...
i dont like the reasoning model but if you have a corpus of your own reasoning data it's easy to just train the base model and then let it rip in your RL
this thing is fuckin speedy
im really happy there's a 2b version of the latest qwen.. im gonna do so much with that