We finally have an answer to the debate over whether LLMs generalize to new math problems or merely memorize the answers.
We evaluated them on the AIME 2025 I competition from *yesterday*, and the results are good!
Source: https://x.com/mbalunovic/status/1887962694659060204