TUMIX: Multi-Agent Test-Time Scaling with Tool-Use Mixture
Yongchao Chen,
Jiefeng Chen,
Rui Meng,
Ji Yin,
Na Li,
Chuchu Fan,
Chi Wang,
Tomas Pfister,
Jinsung Yoon,
The Fourteenth International Conference on Learning Representations (ICLR'2026). TUMIX lifts Gemini-2.5-Pro performance from 21.6 to 34.1 on Humanity's Last Exam (HLE).
Paper
/
Twitter post
We propose Tool-Use Mixture (TUMIX), which leverages diverse tool-use strategies to improve reasoning. This work shows how to get better reasoning from LLMs by running a set of diverse agents (text-only, code, search, etc.) in parallel and letting them share notes across a few refinement rounds. Instead of brute-forcing more samples, TUMIX mixes strategies and stops early once the agents agree, ending up both more accurate and cheaper.
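The parallel-agents-with-shared-notes loop can be sketched roughly as below. This is a minimal illustration, not the paper's implementation: the agent functions are stand-in mocks, and the agreement-based stopping rule is a simplified stand-in for the paper's confidence-based termination.

```python
from collections import Counter

def text_agent(question, notes):
    # Mock text-only reasoner: defers to the majority of shared notes, if any.
    return Counter(notes).most_common(1)[0][0] if notes else "A"

def code_agent(question, notes):
    # Mock code-executing agent (here it just returns a fixed answer first).
    return Counter(notes).most_common(1)[0][0] if notes else "B"

def search_agent(question, notes):
    # Mock search-augmented agent.
    return "B"

def tumix_loop(question, agents, max_rounds=3, agree_threshold=1.0):
    """Run diverse agents in parallel rounds, sharing answers as notes,
    and stop early once they agree instead of sampling more."""
    notes = []  # answers shared from the previous round
    for _ in range(max_rounds):
        answers = [agent(question, notes) for agent in agents]
        top, count = Counter(answers).most_common(1)[0]
        if count / len(answers) >= agree_threshold:
            return top  # confident: terminate early, saving compute
        notes = answers  # share notes with the next round
    return top  # fall back to the final majority answer

print(tumix_loop("q", [text_agent, code_agent, search_agent]))  # prints "B"
```

Here the agents disagree in round one (A, B, B), share their answers, then converge on "B" in round two, so the loop stops after two rounds rather than exhausting the budget.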