Human-level sample efficiency? LIMO: Less is More for Reasoning https://arxiv.org/abs/2502.03387
- LIMO achieves unprecedented performance in mathematical reasoning with only 1% of the training data used by previous approaches, showcasing remarkable data efficiency.
- LIMO exhibits exceptional out-of-distribution generalization, outperforming models trained on 100x more data by a significant 40.5% absolute improvement across diverse benchmarks.
LIMO Hypothesis: In foundation models with comprehensively encoded domain knowledge (achieved through extensive pre-training), sophisticated reasoning can emerge through minimal, precisely orchestrated demonstrations of cognitive processes.
- The core of LIMO's success lies in the meticulous curation of a small, high-quality dataset. The resulting dataset of 817 examples was carefully selected from millions of candidates.
- LIMO fundamentally challenges the assumption that massive datasets are necessary for complex reasoning in LLMs. The quality of the examples, rather than their quantity, is the key factor.
- LIMO suggests that modern, well-pretrained models like Qwen already possess latent, rich reasoning capabilities. LIMO demonstrates that these capabilities can be unlocked and activated effectively with the right "cognitive templates" provided by curated examples.
- LIMO indicates that sophisticated reasoning, however complex, could potentially be elicited with minimal samples, given sufficient pre-trained domain knowledge and well-designed cognitive reasoning chains to activate it.
Further research is needed to validate the LIMO hypothesis across different model architectures and reasoning domains beyond mathematics.
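The curation step above (selecting ~817 high-quality examples from millions of candidates) can be sketched as a score-and-filter pipeline. This is a toy illustration, not the paper's actual method: the `quality_score` heuristic below (step count, visible intermediate work) is a made-up proxy for the paper's much more careful quality criteria.

```python
# Hypothetical sketch of LIMO-style curation: score a large candidate
# pool and keep only the top-k examples. The scoring heuristic is an
# illustrative assumption, not the paper's pipeline.
from dataclasses import dataclass


@dataclass
class Candidate:
    problem: str
    solution: str


def quality_score(c: Candidate) -> float:
    """Toy proxy for reasoning quality: reward multi-step solutions
    that show explicit intermediate work."""
    steps = c.solution.count("\n") + 1      # rough count of solution steps
    shows_work = "=" in c.solution          # any visible intermediate computation
    return steps + (1.0 if shows_work else 0.0)


def curate(candidates: list[Candidate], k: int) -> list[Candidate]:
    """Keep the top-k candidates by quality score."""
    return sorted(candidates, key=quality_score, reverse=True)[:k]


pool = [
    Candidate("2+2?", "4"),                                    # terse, low quality
    Candidate("Solve x+3=5", "x + 3 = 5\nx = 5 - 3\nx = 2"),   # stepwise work
    Candidate("3*4?", "3 * 4 = 12"),                           # shows one step
]
selected = curate(pool, k=2)  # the stepwise solutions win
```

The point of the sketch is the ratio: the filter is aggressive (keep k out of a huge pool), and selection is driven by the reasoning shown in the solution, not by volume.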