STP: Self-play LLM Theorem Provers with Iterative Conjecturing and Proving https://arxiv.org/abs/2502.00212
Inspired by how mathematicians continue advancing the field, the authors train a single LLM to both propose conjectures and attempt to prove them. They then iteratively retrain it on the generated conjectures that are correct, elegant, novel, and approachable, together with the proofs it generated correctly.
STP has two main components: a conjecturer and a prover. The conjecturer generates increasingly challenging conjectures that are barely provable by the current prover. The prover attempts to prove these conjectures and receives training signals based on its success.
STP significantly improves the performance of LLMs in formal theorem proving.
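The conjecturer/prover loop described above can be sketched as a toy simulation. This is only an illustration under loud assumptions, not the authors' implementation: the prover is reduced to a scalar `skill`, a conjecture to a scalar `difficulty`, and "barely provable" is read as a low but nonzero empirical pass rate over several proof attempts. All function names, thresholds, and update rules here are hypothetical.

```python
import random


def success_probability(skill: float, difficulty: float) -> float:
    """Toy stand-in for the prover: chance that one proof attempt succeeds."""
    # Assumption: success falls off linearly as difficulty exceeds skill.
    return max(0.0, min(1.0, 1.0 - (difficulty - skill)))


def empirical_pass_rate(skill, difficulty, attempts, rng):
    """Fraction of sampled proof attempts that succeed."""
    p = success_probability(skill, difficulty)
    successes = sum(rng.random() < p for _ in range(attempts))
    return successes / attempts


def is_barely_provable(pass_rate: float, max_rate: float = 0.5) -> bool:
    """Keep conjectures proved at least once, but not too often."""
    # max_rate is a hypothetical threshold chosen for this sketch.
    return 0.0 < pass_rate <= max_rate


def self_play(rounds=5, conjectures_per_round=20, attempts=8, seed=0):
    rng = random.Random(seed)
    skill = 0.0    # hypothetical scalar for prover ability
    target = 0.5   # difficulty the toy conjecturer currently aims for
    history = []
    for _ in range(rounds):
        kept = []
        for _ in range(conjectures_per_round):
            # Conjecturer: sample a conjecture near the current target.
            difficulty = max(0.0, rng.gauss(target, 0.3))
            # Prover: try several proofs and record the pass rate.
            rate = empirical_pass_rate(skill, difficulty, attempts, rng)
            if is_barely_provable(rate):
                kept.append(difficulty)
        if kept:
            # Conjecturer update: move toward what was barely provable,
            # plus a small push toward harder conjectures.
            target = sum(kept) / len(kept) + 0.1
            # Prover update: each kept conjecture had at least one
            # successful proof, which nudges skill upward.
            skill += 0.1 * len(kept) / conjectures_per_round
        history.append((skill, target, len(kept)))
    return history


if __name__ == "__main__":
    for skill, target, kept in self_play():
        print(f"skill={skill:.2f} target={target:.2f} kept={kept}")
```

In this sketch the same filter drives both updates: conjectures the prover solves every time or never solves are discarded, so the training signal concentrates at the edge of the prover's current ability.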