The Synergy Dilemma of Long-CoT SFT and RL: Investigating Post-Training Techniques for Reasoning ...
Jierun Chen, Tiezheng YU, Haoli Bai et al.
Action editor: Sylvain Le Corff
https://openreview.net/forum?id=XPML8UGI04
#reasoning #multimodal #verbosity