Evaluation Pipeline Connects Model Merging Behavior and Internals
Researchers merged Qwen2.5 models, then tested them on the MMLU benchmark and probing of morphology and syntax, finding stronger linguistic knowledge despite mid scores. Read more: getnews.me/evaluation-pipeline-conn... #modelmerging #mmlu #probing