#MMLU

@getnews-me.bsky.social

5 months ago

Evaluation Pipeline Connects Model Merging Behavior and Internals

Researchers merged Qwen2.5 models, then evaluated them on the MMLU benchmark and with probes of morphology and syntax, finding stronger linguistic knowledge despite middling benchmark scores. Read more: getnews.me/evaluation-pipeline-conn... #modelmerging #mmlu #probing

Winbuzzer

@winbuzzer.com

11 months ago

Tencent Releases its Hunyuan T1 AI Reasoning Model, Beating DeepSeek R1, GPT-4.5, o1 Across Multiple Benchmarks - WinBuzzer

Tencent has positioned Hunyuan T1 as a reasoning-optimized model, with benchmark results confirming its strengths in structured logic and math accuracy.

#AI #GenAI #TencentAI #HunyuanT1 #AIReasoning #EnterpriseAI #LLMbenchmarks #ChinaAI #MMLU #MathAI #AIModels #AIInference

The Pickool

@pickool.bsky.social

1 year ago

NAVER's updated HyperCLOVA X achieves 79.6% #MMLU accuracy with 40% fewer parameters and cuts operational costs by 50%. Enterprise rollout in March. #AI #Efficiency #TechNews

Link: www.thepickool.com/naver-upgrad...

@hamcapital.bsky.social

1 year ago

In #AI, #MeasuringMassiveMultitaskLanguageUnderstanding is a benchmark for evaluating #LLMs. #MMLU consists of ~16,000 multiple-choice questions spanning 57 academic subjects, including math, philosophy, law, and medicine. It is one of the most commonly used benchmarks for LLMs (Morgan Stanley).
