Our paper on RL scaling laws got a NeurIPS spotlight!! 🥳🚨
We added A LOT of new content, especially analyzing why inverse scaling happens. Check it out on arXiv!
Reviewer asks why we didn't cite a recent paper. That paper cites our paper that's being reviewed.
I wonder how common citation cycles are...
There are quite a few papers on supply chain management with RL, though only on toy problems. I'm currently writing a paper on doing it with real supply chains.
Is it all related to dormant neurons, or is there other literature on why RL struggles with plasticity?
arxiv.org/abs/2302.12902
Read the full paper for more details and results: 'AlphaZero Neural Scaling and Zipf's Law: a Tale of Board Games and Power Laws'. ♟️
arxiv.org/abs/2412.11979
Big thanks to
@ericjmichaud.bsky.social
for sharing his wisdom! This all started with our hallway chat at ICLR.
X:
x.com/neumann_oren...
There is: in those games, larger models improve overall accuracy by focusing on late-game positions, forgetting what they learned on opening positions. This directly harms performance, since mastering openings is crucial, while wrapping up a game can be done with blind MCTS.
AlphaZero doesn't always scale nicely. On some games, Elo goes up, then sharply degrades w/ model size. We noticed this happens in games where game rules bend the Zipf curve, since end-game board positions have a high frequency. Is there a connection?
In line with the quantization model, we see that AlphaZero agents fit board states in decreasing order of frequency. This is very surprising: high-frequency opening moves are exponentially harder to model, since they depend on downstream positions.
There is! Chess/Go tournament games famously follow Zipf's law: the frequency of each board position scales as a power of their rank.
We find that Zipf's law emerges also in RL self-play games. It's a direct result of universal board-game rules.
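For the curious: here's a minimal sketch of how one could check for Zipf's law in self-play data. The position corpus below is hypothetical toy data standing in for logged games; the idea is just to build the rank-frequency curve and fit a power-law exponent in log-log space.

```python
from collections import Counter
import math

# Hypothetical corpus: each game is a list of (hashed) board positions
# visited during self-play. Real data would come from logged games.
games = [
    ["start", "e4", "e4e5", "endA"],
    ["start", "e4", "e4c5", "endB"],
    ["start", "d4", "d4d5", "endA"],
    ["start", "e4", "e4e5", "endC"],
]

# Count how often each position appears across all games.
counts = Counter(pos for game in games for pos in game)

# Sort counts to get the rank-frequency curve.
freqs = sorted(counts.values(), reverse=True)

# Zipf's law predicts freq(rank) ~ rank^(-alpha), i.e. log(freq) is
# roughly linear in log(rank). Estimate alpha with a simple
# least-squares fit in log-log space.
xs = [math.log(r) for r in range(1, len(freqs) + 1)]
ys = [math.log(f) for f in freqs]
n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n
alpha = -sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
    (x - mx) ** 2 for x in xs
)
print(f"estimated Zipf exponent: {alpha:.2f}")
```

On real self-play logs you'd see a straight line in log-log coordinates when Zipf's law holds, and a bent curve in the games where end-game positions are anomalously frequent.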
The quantization model suggests that LLM power-law scaling results from Zipf's law in natural language:
arxiv.org/abs/2303.13506
In RL, AlphaZero provides one of the few examples of power-law scaling:
arxiv.org/abs/2210.00849
But is there a Zipf's law in board games?
🚨Do RL scaling laws share the same origin as LLM scaling laws?
We show that AlphaZero scaling might be the result of Zipf's law, and that inverse scaling can result from unusual frequency curves.
arxiv.org/abs/2412.11979
A 🧵 on scaling laws and board games! ♟️🎲
I'm excited to share a new paper: "Mastering Board Games by External and Internal Planning with Language Models"
storage.googleapis.com/deepmind-med...
(also soon to be up on arXiv, once it's been processed there)