#mathreasoning

Latest posts tagged with #mathreasoning on Bluesky

Posts tagged #mathreasoning

Falcon H1R 7B just crushed AIME 2025 with an 83.1% score—out‑reasoning models up to 7× its size. Can open‑source finally beat the big labs? Dive into the details. #FalconH1R7B #AIME2025 #MathReasoning

🔗 aidailypost.com/news/falcon-...

AdaR Framework Enhances Adaptive Math Reasoning in LLMs

Researchers introduced AdaR, a framework that trains LLMs on logically equivalent math prompts to boost robustness, with a paper submitted in October 2025. The code is open on GitHub. getnews.me/adar-framework-enhances-... #adar #llm #mathreasoning

VCSearch Boosts Detection of Ill-Defined Math Problems for LLMs

VCSearch boosts detection of unsolvable math problems by at least 12% and was released on 28 September 2025. The PMC benchmark holds over 5,000 ill‑defined questions. Read more: getnews.me/vcsearch-boosts-detectio... #vcsearch #mathreasoning #emnlp

Random Policy Valuation Boosts LLM Math Reasoning

Researchers introduced Random Policy Valuation for Diverse Reasoning (ROVER), which improves LLM math reasoning by 8.2 percentage points on pass@1 and 16.8 percentage points on pass@256, while boosting solution diversity. Read more: getnews.me/random-policy-valuation-... #llm #mathreasoning
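
For readers unfamiliar with the metric: pass@k counts a problem as solved if at least one of k sampled solutions is correct. The post does not say how the ROVER results were computed, so the following is only a minimal sketch, assuming the commonly used unbiased estimator 1 - C(n-c, k) / C(n, k), where n solutions are sampled and c of them are correct. The function name `pass_at_k` and the sample numbers are illustrative, not taken from the paper.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k for one problem from n samples, c of them correct.

    Assumption: the standard unbiased estimator 1 - C(n-c, k) / C(n, k),
    i.e. one minus the chance that k samples drawn without replacement
    from the n generated solutions are all incorrect.
    """
    if not 0 <= c <= n:
        raise ValueError("c must satisfy 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    # comb(n - c, k) is 0 when fewer than k incorrect samples exist,
    # so the estimate is exactly 1.0 in that case.
    return 1.0 - comb(n - c, k) / comb(n, k)


# Illustrative numbers only: 256 samples for one problem, 8 correct.
print(pass_at_k(256, 8, 1))    # 0.03125, the plain fraction of correct samples
print(pass_at_k(256, 8, 256))  # 1.0, since at least one of the 256 samples is correct
```

A benchmark score is the mean of this estimate over all problems. With k = 1 it reduces to plain accuracy, while a large k rewards a model whose samples are diverse enough that some attempt eventually lands on the right answer.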

Problem‑Aware Strategy Routing Boosts LLM Mathematical Reasoning

PRISM, a new framework for LLM math reasoning, adapts its strategy per problem and boosts benchmark accuracy by up to 7%. The code and MathStrat dataset are open‑source on GitHub. getnews.me/problem-aware-strategy-r... #prism #llm #mathreasoning

Future Policy Aware Preference Learning Boosts LLM Math Reasoning

Future Policy Aware (FPA) preference learning boosts LLM math performance, with SimPER plus FPA gaining up to 5.75% on MATH and GSM8K benchmarks, while adding minimal overhead. getnews.me/future-policy-aware-pref... #llm #mathreasoning

LLMs Learn Better from Incorrect Answers Without Explanations

An EMNLP 2025 paper reports LLMs achieve better math‑reasoning accuracy when given only wrong answers, surpassing chain‑of‑thought prompts; the gap widens with larger models. Read more: getnews.me/llms-learn-better-from-i... #llm #mathreasoning

Cross-Lingual Reward Modeling Boosts Multilingual LLM Math Reasoning

A cross‑lingual reward model scores multilingual math answers and beats same‑language baselines on a benchmark, even with few sampled candidates. getnews.me/cross-lingual-reward-mod... #multilingualllm #mathreasoning

False Positive Solutions Persist in Scaled Math Reasoning Models

A September 2025 study shows false-positive math solutions stay common across open-source models; scaling or sampling doesn't cut their rate, and pass@N often inflates performance. Read more: getnews.me/false-positive-solutions... #mathreasoning #ai

🚨 Microsoft just dropped Phi-4—a small yet mighty LLM excelling in advanced math reasoning! 🧮

🔹 High-quality results at a compact size
🔹 Open-source under the MIT license

The frontier for efficient AI is here. Ready to explore?

#AI #MathReasoning #Phi4

Apple Researchers Challenge Large Language Models' Math Reasoning Capabilities with New Benchmark

Apple researchers introduced GSM-Symbolic, a new benchmark to reveal the weaknesses in large language models' mathematical reasoning, showing that they rely heavily on pattern-matching rather than gen...

🔢🧠📊 Apple Researchers Challenge Large Language Models' Math Reasoning Capabilities with New Benchmark www.azoai.com/news/2024102... #AI #MachineLearning #LLM #Mathematics #ArtificialIntelligence #Research #Benchmarking #MathReasoning #SymbolicLogic #TechInnovation @arxiv-stat-ml.bsky.social
