Google DeepMind’s “Superhuman Reasoning Team” has announced several research projects:
- AlphaGeometry: automatically solves geometry proof problems
- AlphaGeometry2: reaches International Mathematical Olympiad (IMO) silver-medal level
- IMO Bench: a mathematics evaluation benchmark released after DeepMind’s IMO gold-medal result in 2025, containing more than 400 problems for testing AI systems
- Aletheia: an AI agent that verifies and refines mathematical problem-solving processes
All projects are released under the Apache 2.0 and CC-BY open-source licenses.
There is something many people don’t realize: these recently released DeepMind projects are not about “making AI that solves harder problems”; they are quietly rewriting something more fundamental: how machines reason.
When people first saw AlphaGeometry, it was easy to dismiss it as a “mathematical problem solver.” It can indeed complete geometric proofs, and on some problems it approaches or even surpasses human contestants. But stopping there misses its significance. This system did not simply learn a problem-solving routine. It is better understood as an attempt to translate the geometric reasoning that humans developed over centuries into a language that machines can execute reliably. Not imitation, but reconstruction.
This reconstruction becomes more evident in AlphaGeometry 2. When it is said to have reached IMO silver-medal level, what matters is not the score itself, but the system’s ability to handle longer, more complex chains of reasoning that come closer to the limits of human thinking. The change here is not a quantitative accumulation but a qualitative leap: reasoning no longer unfolds mechanically step by step, but begins to take on a structure resembling strategic choice: when to construct an auxiliary line, when to transform the problem, when to abandon a path and start over. Behaviors once thought to be purely intuitive are beginning to be captured systematically by machines.
But a system that only “knows how to solve problems” is actually dangerous, because complex reasoning that cannot be verified cannot be trusted. This is why DeepMind also launched IMO Bench. It is not a simple question bank but a stress-testing environment. The significance of its 400-plus problems lies not in the count but in the coverage: different difficulty levels, different structures, different modes of reasoning. The question it asks is not “Can you solve certain problems?” but “Is your reasoning ability stable?”
Then Aletheia enters the picture. This is the part of the whole system closest to the “future.” It does not solve problems; it does something else, something more critical: it checks whether the reasoning holds, and corrects it if necessary. That may sound like an auxiliary tool, but what it actually changes is the closed-loop structure of reasoning. In the past, most AI stopped at “generating answers”; now the system begins to run a generate-verify-correct cycle. Once that cycle is stable, the machine can not only reason, but also be accountable for its own reasoning.
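The closed-loop structure described above can be sketched in a few lines. This is a minimal, hypothetical illustration of a generate-verify-correct cycle; the function names (`generate`, `verify`, `correct`, `solve`) and their trivial bodies are placeholders invented for this sketch, not Aletheia’s actual API or algorithm.

```python
# Hypothetical sketch of a generate-verify-correct loop.
# The stand-in functions below are illustrative only.

def generate(problem: str) -> str:
    # Stand-in for a model producing a candidate proof.
    return f"proof attempt for {problem}"

def verify(proof: str) -> bool:
    # Stand-in for an independent checker; here we "accept"
    # any proof marked with the token "[verified]".
    return "[verified]" in proof

def correct(proof: str) -> str:
    # Stand-in for a repair step that revises a failed attempt.
    return proof + " [verified]"

def solve(problem: str, max_rounds: int = 3):
    """Run the cycle: generate once, then verify/correct up to max_rounds."""
    proof = generate(problem)
    for _ in range(max_rounds):
        if verify(proof):
            return proof       # verified: the reasoning is accountable
        proof = correct(proof)  # otherwise, revise and try again
    return None                # give up if no round produces a verified proof

print(solve("angle bisector lemma"))
```

The point of the structure is that the verifier sits outside the generator: an answer only leaves the loop after passing an independent check, which is what distinguishes this from a system that merely “generates answers.”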
Put these pieces together and a clearer outline emerges: AlphaGeometry builds “problem-solving capability,” AlphaGeometry 2 approaches “human limits,” IMO Bench defines the “metrics,” and Aletheia builds the “trust mechanism.” They are not discrete projects, but a system gradually closing its loop.
This also explains why DeepMind uses the word “superhuman” for this direction. The point is not that machines are faster or more accurate than humans, but that machines are beginning to acquire a reasoning ability that is verifiable, repeatable, and extensible. Once that ability is stable, it no longer depends on individual talent, nor is it limited by fatigue, emotion, or experience.
Here is where the question becomes genuinely interesting: once reasoning itself can be produced and verified at industrial scale, will fields that have long depended on a few geniuses undergo structural change? Mathematics is just the starting point. Physics, engineering, even complex system design may all follow the same trajectory.
From this perspective, the significance of these projects is not that “AI can do geometry problems,” but something more direct:
Reasoning was treated as an engineerable ability for the first time.
GitHub: https://github.com/google-deepmind/superhuman
YouTube: