Joint work with Xingxing Zhang, @vamvas.bsky.social, @ricosennrich.bsky.social, and Furu Wei.
Overall, QueST opens new possibilities:
Scalable reasoning data generation
Training specialized generators for hard problems
Reducing dependence on human-labeled data
Future: Real-time difficulty estimation for RL
See more details in our paper.
Thanks for reading!
🧵 5/5
RESULTS: State-of-the-art performance at the 8B scale. Qwen3-8B-Base trained on our 212K synthetic problems matches DeepSeek-R1-671B on LiveCodeBench (LCB)!
🧵 4/5
🎯 OUR SOLUTION: QueST
Two key innovations:
1. Difficulty-aware graph sampling: selects concept combinations that lead to harder problems.
2. Rejection fine-tuning: trains generators to produce increasingly difficult problems.
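As a rough illustration, the two ideas above could be sketched like this (the function names, difficulty scorer, and threshold are hypothetical stand-ins, not the paper's implementation):

```python
import random

def sample_hard_combo(concepts, difficulty, k=3, n_candidates=50, rng=random):
    """Difficulty-aware sampling (toy version): draw candidate concept
    combinations and keep the one with the highest estimated difficulty."""
    candidates = [rng.sample(list(concepts), k) for _ in range(n_candidates)]
    return max(candidates, key=difficulty)

def rejection_finetune_data(generator, combos, difficulty, threshold):
    """Rejection fine-tuning (toy version): keep only generated problems
    judged hard enough; these become training data for the next round."""
    kept = []
    for combo in combos:
        problem = generator(combo)
        if difficulty(problem) >= threshold:
            kept.append((combo, problem))
    return kept
```

In the real system the difficulty estimate would come from a learned or proxy scorer rather than a toy heuristic; the loop structure (sample hard combos, generate, reject easy outputs, retrain) is the point.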
🧵 3/5
THE PROBLEM
Current reasoning data hits a wall:
- Competitive coding datasets: only 10-30K problems
- Creating hard problems requires PhD-level experts
- Existing synthetic methods don't specialize in difficulty
🧵 2/5
🔥 Introducing our new paper, QueST: training specialized generators to create challenging coding problems. arxiv.org/pdf/2510.17715
From Qwen3-8B-Base:
✅ 100K synthetic problems: better than Qwen3-8B
✅ Combined with human-written problems: matches DeepSeek-R1-671B
🧵 1/5
📢 Announcing the First Workshop on Multilingual and Multicultural Evaluation (MME) at #EACL2026 🇲🇦
MME focuses on resources, metrics & methodologies for evaluating multilingual systems! multilingual-multicultural-evaluation.github.io
📅 Workshop: Mar 24–29, 2026
🗓️ Submission deadline: Dec 19, 2025
We further propose a source-primed multi-turn variant, which lets the LLM first read the entire source document and then translate it through multi-turn chat. It achieves the best performance of all settings with GPT-4o-mini, Qwen-2.5-Instruct, and Llama-3.1-Instruct.
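For illustration, here is a minimal sketch of how a source-primed multi-turn setup could be wired; the message format, prompts, and `translate_fn` hook are my assumptions, not the paper's exact implementation:

```python
def source_primed_multiturn(segments, translate_fn):
    """Show the model the full source document first, then translate
    segment by segment in a growing multi-turn chat. Earlier turns stay
    in context, so at inference time the model can reuse its KV cache."""
    full_source = "\n".join(segments)
    messages = [
        {"role": "system", "content": "You are a document translator."},
        {"role": "user", "content": f"Here is the full source document:\n{full_source}"},
        {"role": "assistant", "content": "Understood. Send segments to translate."},
    ]
    translations = []
    for seg in segments:
        messages.append({"role": "user", "content": f"Translate: {seg}"})
        out = translate_fn(messages)  # e.g. a chat-completion API call
        messages.append({"role": "assistant", "content": out})
        translations.append(out)
    return translations
```

The design point is that each segment is translated with both the whole source (primed up front) and all previous translations visible, while the shared prefix is computed only once thanks to KV caching.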
We found that multi-turn translation achieves clearly better performance: it can access all previous context, while KV caching keeps the extra inference cost small.
We started with a comparison of previous baseline settings: inputting the whole source document at once (single-turn), segment-level translation, and multi-turn translation, where segments are translated progressively with previous turns cached.
I'm thrilled to share my first PhD project, joint work with
@vamvas.bsky.social and @ricosennrich.bsky.social
Paper link:
arxiv.org/pdf/2503.10494
Long context LLMs have paved the way for document translation, but is simply inputting the whole content the optimal way?
Here's the thread 🧵 [1/n]