
INSAIT releases Open Proof Corpus — the largest public collection of expert-annotated AI-generated mathematical proofs

INSAIT and ETH Zurich announces the release of the Open Proof Corpus (OPC) — the largest public dataset of mathematical proofs generated by large language models (LLMs) and meticulously annotated for correctness by human experts. Alongside the dataset, INSAIT has published a large-scale study analyzing the performance of today’s AI models in mathematical proof generation. Both the dataset and study are available at proofcorpus.ai.
The Open Proof Corpus builds on INSAIT’s earlier platform MathArena.ai, whose results have been adopted by major technology companies to benchmark and develop their own “thinking” AI models. The OPC represents a significant step forward in advancing AI systems beyond producing correct answers toward generating valid and rigorous mathematical proofs.
The corpus includes over 5,000 proofs produced by leading AI models such as Gemini 2.5 pro, OpenAI O3, OpenAI O4-mini, Qwen, and DeepSeek R1. The problems are sourced from 20 elite mathematical competitions, including the International Mathematical Olympiad (IMO) and the USA Mathematical Olympiad (USAMO). All proofs have been carefully reviewed and annotated by experienced expert judges.
The study reveals several key findings: large language models are approaching the level of human experts in judging the correctness of mathematical proofs, but generating formal proofs (e.g., in Lean) remains a major challenge for current AI systems.
The OPC dataset, the accompanying research paper detailing these results, and INSAIT’s fine-tuned AI models are available at proofcorpus.ai.
Authors: Jasper Dekoninck, Ivo Petrov, Kristian Minchev, Mislav Balunović, Martin Vechev. Dataset contributors (Bulgarian mathematics competitors and national team coaches): Miroslav Marinov, Maria Drencheva, Lyuba Konova, Milen Shumanov, Kaloyan Tsvetkov, Nikolay Drenchev, Lazar Todorov, Kalina Nikolova, Nikolay Georgiev, Vanesa Kalinkova, Margulan Ismoldayev