
Hanxu Hu

@hanxuhu

Researching Post-Training of LLMs

20 Followers · 26 Following · 10 Posts
Joined 21.02.2025

Latest posts by Hanxu Hu @hanxuhu

Joint work with Xingxing Zhang, @vamvas.bsky.social, @ricosennrich.bsky.social, and Furu Wei.

21.10.2025 14:01 👍 0 🔁 0 💬 0 📌 0

Overall, QueST opens new possibilities:
- Scalable reasoning data generation
- Training specialized generators for hard problems
- Reducing dependence on human-labeled data
- Future: real-time difficulty estimation for RL
See more details in our paper.
Thanks for reading!
🧡5/5

21.10.2025 14:01 👍 0 🔁 0 💬 1 📌 0

📊 RESULTS: State-of-the-art performance at the 8B scale. Qwen3-8B-Base trained on our 212K synthetic problems matches DeepSeek-R1-671B on LiveCodeBench (LCB)!
🧡4/5

21.10.2025 14:01 👍 0 🔁 0 💬 1 📌 0

🎯 OUR SOLUTION: QueST
Two key innovations:
1. Difficulty-aware graph sampling: selects concept combinations that lead to harder problems.
2. Rejection fine-tuning: trains generators to produce increasingly difficult problems.
🧡3/5

21.10.2025 14:01 👍 1 🔁 0 💬 1 📌 0
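The two steps above can be sketched in a few lines. This is a toy illustration, not the paper's actual pipeline: the concept graph, the difficulty proxy, and the threshold are all made-up stand-ins, and the real system scores difficulty with a trained estimator rather than a lookup table.

```python
import random

random.seed(0)

# Toy stand-in for a concept graph: concept -> assumed difficulty weight.
CONCEPT_DIFFICULTY = {
    "two pointers": 1, "greedy": 1, "graph": 2,
    "dp": 3, "segment tree": 4, "flows": 5,
}

def estimate_difficulty(combo):
    # Proxy: combinations of harder concepts yield harder problems.
    return sum(CONCEPT_DIFFICULTY[c] for c in combo)

def sample_concept_combo(k=2, n_candidates=20):
    """Difficulty-aware sampling: draw candidate concept combinations
    from the graph and keep the one estimated to be hardest."""
    concepts = list(CONCEPT_DIFFICULTY)
    candidates = [random.sample(concepts, k) for _ in range(n_candidates)]
    return max(candidates, key=estimate_difficulty)

def build_rft_data(generate, n=5, threshold=6):
    """Rejection fine-tuning data: keep only generated problems whose
    estimated difficulty clears the threshold; fine-tune on the kept set."""
    kept = []
    while len(kept) < n:
        combo = sample_concept_combo()
        if estimate_difficulty(combo) >= threshold:
            kept.append(generate(combo))
    return kept

# Stand-in for the problem generator LLM.
data = build_rft_data(lambda c: f"Write a problem combining {c[0]} and {c[1]}.")
print(len(data))  # prints 5
```

Iterating this loop (generate, filter by difficulty, fine-tune, repeat) is what lets the generator specialize toward harder problems over time.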

📊 THE PROBLEM
Current reasoning data hits a wall:
- Competitive coding datasets: only 10-30K problems
- Creating hard problems requires PhD-level experts
- Existing synthetic methods don't specifically target difficulty
🧡2/5

21.10.2025 14:01 👍 0 🔁 0 💬 1 📌 0

💥 Introducing our new paper: arxiv.org/pdf/2510.17715. QueST: training specialized generators to create challenging coding problems.
Starting from Qwen3-8B-Base:
✅ 100K synthetic problems: outperforms Qwen3-8B
✅ Combined with human-written problems: matches DeepSeek-R1-671B
🧡(1/5)

21.10.2025 14:01 👍 4 🔁 3 💬 1 📌 0

📢 Announcing the First Workshop on Multilingual and Multicultural Evaluation (MME) at #EACL2026 🇲🇦

MME focuses on resources, metrics & methodologies for evaluating multilingual systems! multilingual-multicultural-evaluation.github.io

📅 Workshop: Mar 24–29, 2026
🗓️ Submit by Dec 19, 2025

20.10.2025 10:37 👍 34 🔁 15 💬 1 📌 0

We further propose a source-primed multi-turn variant, which first gives the LLM the entire source document and then translates it in a multi-turn chat. It achieves the best performance of all settings when using GPT-4-mini, Qwen-2.5-Instruct, and Llama-3.1-Instruct.

14.03.2025 14:58 👍 0 🔁 0 💬 0 📌 0
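The contrast between the single-turn and source-primed multi-turn settings can be sketched as message construction. This is a minimal illustration assuming the usual chat-message format (a list of role/content dicts); the exact prompt wording and the `src`/`tgt` parameters are placeholders, not the paper's actual prompts.

```python
def single_turn(segments, src="English", tgt="German"):
    """Single-turn baseline: the whole document is translated in one request."""
    doc = "\n".join(segments)
    return [{"role": "user",
             "content": f"Translate this {src} document into {tgt}:\n{doc}"}]

def source_primed_multi_turn(segments, translations, src="English", tgt="German"):
    """Source-primed multi-turn: prime the model with the full source
    document, then translate one segment per turn. Earlier turns stay in
    context, and their KV cache is reused across turns at inference time."""
    doc = "\n".join(segments)
    messages = [{"role": "user",
                 "content": f"Here is the full {src} document:\n{doc}\n"
                            f"We will translate it into {tgt} segment by segment."}]
    for seg, trans in zip(segments, translations):
        messages.append({"role": "user", "content": f"Translate: {seg}"})
        messages.append({"role": "assistant", "content": trans})
    return messages

# One priming turn plus a (user, assistant) pair per translated segment.
messages = source_primed_multi_turn(["Hello.", "Goodbye."], ["Hallo.", "Tschüss."])
print(len(messages))  # prints 5
```

Because each new segment only appends to the existing conversation, the model sees all prior source and target context without re-encoding it from scratch.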

We found that multi-turn translation achieves clearly better performance: it can access all previous context while adding little extra computation at inference time thanks to the KV cache.

14.03.2025 14:58 👍 0 🔁 0 💬 1 📌 0

We started with a comparison of previous baseline settings: inputting the whole source document at once (single-turn), segment-level translation, and multi-turn translation, where segments are translated progressively and previous turns are cached.

14.03.2025 14:58 👍 0 🔁 0 💬 1 📌 0

I'm thrilled to share my first PhD project, joint work with
@vamvas.bsky.social and @ricosennrich.bsky.social.
Paper link:
arxiv.org/pdf/2503.10494
Long-context LLMs have paved the way for document translation, but is simply inputting the whole document at once the optimal approach?
Here's the thread 🧡 [1/n]

14.03.2025 14:58 👍 8 🔁 3 💬 1 📌 0