DeepSeek R1 proved you can teach a model to reason through pure reinforcement learning, no supervised fine-tuning. 671B params, 37B active. Matches o1 performance at 15-50% cost. MIT licensed. Turns out scale isn't everything.
1
0
0
0