Syeda Nahida Akter (@reasyaay)

Huge thanks to our incredible collaborators: @shrimai.bsky.social, Matvei Novikov, Seungju Han, Ying Lin, Evelina Bakhturina, Eric Nyberg, @yejinchoinka.bsky.social, Mostofa Patwary, Mohammad Shoeybi, Bryan Catanzaro 🙌

We’d love to hear your thoughts—feedback and ideas are always welcome! 💬

01.05.2025 17:41 👍 1 🔁 0 💬 0 📌 0

nvidia/Nemotron-CrossThink · Datasets at Hugging Face

Find out more about our data and paper 👇
📂 Dataset on HuggingFace:
huggingface.co/datasets/nvi...
📝 Blog:
research.nvidia.com/labs/adlr/Ne...
🔗Paper: arxiv.org/abs/2504.13941

01.05.2025 17:41 👍 1 🔁 0 💬 1 📌 0

🧠 Selective difficulty > data volume
✅ Filtering out easy samples (i.e., those already solved by a 7B model) leads to a +2.15% accuracy gain when training a 32B model.
✅ Harder questions push the model to learn deeper reasoning patterns.
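
A minimal sketch of what this kind of difficulty filter could look like. The sample schema, the exact-match comparison, and the `toy_small_model` stand-in for the 7B model are all assumptions for illustration, not the authors' pipeline:

```python
from typing import Callable, Dict, List

Sample = Dict[str, str]  # hypothetical schema: {"question": ..., "answer": ...}


def filter_easy_samples(
    samples: List[Sample],
    small_model_answer: Callable[[str], str],
) -> List[Sample]:
    """Keep only the samples that the smaller model answers incorrectly."""
    hard = []
    for sample in samples:
        predicted = small_model_answer(sample["question"]).strip().lower()
        expected = sample["answer"].strip().lower()
        if predicted != expected:  # small model failed, so treat the sample as hard
            hard.append(sample)
    return hard


def toy_small_model(question: str) -> str:
    """Toy stand-in for a 7B model: it only knows one answer."""
    return "4" if question == "What is 2 + 2?" else "unknown"


samples = [
    {"question": "What is 2 + 2?", "answer": "4"},
    {"question": "Which doctrine bars relitigating a decided claim?",
     "answer": "res judicata"},
]
hard_samples = filter_easy_samples(samples, toy_small_model)
```

Here only the second question survives, because the toy model already solves the first one.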

01.05.2025 17:41 👍 1 🔁 0 💬 1 📌 0

💡 Better formatting → Stronger reasoning

➣ Open-ended questions boost accuracy (+1.21%) by forcing models to reason, not guess!
➣ Short-form answers reduce ambiguity & avoid noisy rewards, boosting accuracy by +1.20%!

👉 Thoughtful templates = clearer supervision, better RL
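
As a rough illustration of both ideas, the sketch below turns a multiple-choice item into an open-ended one and keeps only samples with short reference answers. The function names, the sample schema, and the 10-word cutoff are assumptions made for this example, not values taken from the thread:

```python
from typing import Dict, List, Optional

MAX_ANSWER_WORDS = 10  # assumed cutoff for "short-form"; not from the thread


def to_open_ended(question: str, options: List[str], correct_index: int) -> Dict[str, str]:
    """Drop the options so the model must produce the answer, not pick a letter."""
    return {"question": question, "answer": options[correct_index]}


def keep_if_short(sample: Dict[str, str]) -> Optional[Dict[str, str]]:
    """Return the sample only when its reference answer is short enough to check."""
    if len(sample["answer"].split()) <= MAX_ANSWER_WORDS:
        return sample
    return None


open_sample = to_open_ended(
    "Which planet has the largest mass in the Solar System?",
    ["Earth", "Jupiter", "Saturn", "Mars"],
    correct_index=1,
)
long_sample = {
    "question": "Explain the causes of the French Revolution.",
    "answer": " ".join(["word"] * 40),  # placeholder for a long free-form answer
}
kept = [s for s in (keep_if_short(open_sample), keep_if_short(long_sample)) if s]
```

Long free-form answers are dropped in this sketch because a simple string comparison cannot score them reliably, which is one way a reward signal becomes noisy.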

01.05.2025 17:41 👍 1 🔁 0 💬 1 📌 0

🔥 Nemotron-CrossThink uses 28% fewer tokens per correct answer by adapting its response length to the task:

➣ concise on general reasoning (229 tokens on MMLU) and
➣ detailed on math (+62% token increase)

Math-only models, by contrast, barely adapt (12–14% token increase).

01.05.2025 17:41 👍 1 🔁 0 💬 1 📌 0

🎯 Why it matters:
Nemotron-CrossThink achieves:
📈 +30.1% on MATH-500, +15.1% on AGIEval, +12.8% on MMLU-Pro compared to the base LLM
📉 28% fewer tokens per correct answer
🏆 Outperforms math-only blends by training on broader, more diverse reasoning data

01.05.2025 17:41 👍 2 🔁 1 💬 1 📌 0

How does Nemotron-CrossThink work?
➣ Curate QA pairs from Common Crawl + open datasets
➣ Apply structured templates: multiple-choice + open-ended
➣ Filter out unverifiable / ambiguous samples
➣ Train the LLM with GRPO, a scalable RL algorithm
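
For readers unfamiliar with GRPO (Group Relative Policy Optimization): its core idea is to score each sampled response relative to the other responses drawn for the same prompt, instead of using a learned value function. The sketch below shows only that normalization step. The function name, the `eps` term, and the use of the population standard deviation are assumptions here, and this is not the authors' training code:

```python
from statistics import mean, pstdev
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """Normalize rewards within one group of responses to the same prompt."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; implementations differ on this choice
    # eps avoids division by zero when every response gets the same reward
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four sampled responses to one prompt, scored 1.0 if correct and 0.0 otherwise.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
```

Correct responses end up with positive advantages and incorrect ones with negative advantages. A group where all responses score the same yields all zeros, so it contributes no learning signal.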

01.05.2025 17:41 👍 2 🔁 1 💬 1 📌 0

Most RL methods stick to math because rewards are easy to define.
But general-purpose reasoning?
❌ No clean answers
❌ No fixed rules
Nemotron-CrossThink addresses these by:
✅ Designing verifiable rewards for diverse tasks
✅ Blending structured data from STEM, law, humanities, & more
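
One simple way to picture a verifiable reward outside math is a rule-based check of the model's final answer against a short reference. The sketch below assumes a hypothetical "Final answer:" marker in the response and a basic text normalization. Both are illustrative choices, not the reward the authors describe:

```python
import re


def normalize(text: str) -> str:
    """Lowercase, strip punctuation, and collapse whitespace."""
    text = text.strip().lower()
    text = re.sub(r"[^\w\s]", "", text)
    return re.sub(r"\s+", " ", text).strip()


def verifiable_reward(response: str, reference: str) -> float:
    """Return 1.0 if the extracted final answer matches the reference, else 0.0."""
    # Hypothetical output format: the response ends with "Final answer: <answer>"
    match = re.search(r"final answer:\s*(.+)", response, flags=re.IGNORECASE)
    if match is None:
        return 0.0  # no parseable answer means no reward
    return 1.0 if normalize(match.group(1)) == normalize(reference) else 0.0


reward_hit = verifiable_reward(
    "The claim was already decided. Final answer: Res Judicata.", "res judicata"
)
reward_miss = verifiable_reward("I think it is estoppel.", "res judicata")
```

The same check works for a law question or a physics question, as long as the reference answer is short and unambiguous.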

01.05.2025 17:41 👍 1 🔁 1 💬 1 📌 0

RL boosts LLM reasoning—but why stop at math & code? 🤔
Meet Nemotron-CrossThink—a method to scale RL-based self-learning across law, physics, social science & more.

🔥Resulting in a model that reasons broadly, adapts dynamically, & uses 28% fewer tokens for correct answers!
🧵↓

01.05.2025 17:41 👍 5 🔁 3 💬 1 📌 0
