Mathematics under AI: why framework fields automate first while “chaotic” fields endure

TLDR. AI will keep eating execution-heavy mathematics that sits on stable formal scaffolding – think large chunks of algebraic geometry or homological algebra once the objects are defined in Lean. The parts that resist are the ones where progress depends on inventing the invariant, the weight, the parametrix, or the gluing scheme – contest combinatorics and frontier nonlinear dispersive or hyperbolic PDE in the Klainerman tradition. Recent IMO results and the current state of formal libraries make this split visible. (Google DeepMind)


1) What the public data already shows

Olympiads: the geometry boom, the combinatorics pothole.
In 2024, DeepMind reached silver-medal level at the IMO by solving the algebra, number theory, and geometry problems, but left both combinatorics problems unsolved. In 2025, an advanced Gemini with Deep Think hit gold-level scoring – 35 out of 42, solving five of six problems within the 4.5-hour window – yet independent evaluations document that the miss was the hard combinatorics Problem 6. The official statistics confirm that P6 was by far the steepest problem, with an average score of 0.184 out of 7. This is the pattern: strong where stable playbooks exist, brittle when the invariant must be invented. (Google DeepMind)

Geometry is the outlier in AI’s favor.
AlphaGeometry 2 reports an 84 percent solve rate on two and a half decades of IMO geometry, surpassing average gold-medalist performance. A synthetic language plus a symbolic back end makes the search space smooth enough to scale. (Nature)

Formal libraries: where Lean is already deep.
Perfectoid spaces were defined in Lean in 2019–20. The Liquid Tensor Experiment was completed in 2022, machine-checking a core theorem in condensed mathematics. Derived categories have since been formalized in mathlib. The current mathlib stats sit well into six figures for theorems and definitions – a massive surface area for AI copilots and auto-formalization pipelines to leverage. (arXiv)

Analysis tooling is catching up but still thin.
Even a day-one PDE workhorse – the Gagliardo–Nirenberg–Sobolev inequality – was only formalized in 2024, alongside new infrastructure for iterated integration. That is progress, but it also exposes how much analytic plumbing still needs to be built before “automate the PDE proof” becomes a routine button. (DROPS)
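
For orientation, the Sobolev endpoint of that family reads, in a standard textbook form (paraphrased here, not quoted from the Lean statement):

  \| u \|_{L^{p^*}(\mathbb{R}^n)} \;\le\; C(n,p)\, \| \nabla u \|_{L^{p}(\mathbb{R}^n)}, \qquad 1 \le p < n, \quad p^* = \frac{np}{n-p},

for compactly supported smooth u. This is day-one material in any graduate analysis course, and the fact that it entered mathlib only in 2024, together with the iterated-integration infrastructure needed to prove it, is a fair measure of how much analytic plumbing remains.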


2) The mechanism: template density, global glue, and scaffolding externalities

Template density.
AI thrives where there is a stable language of moves and where search receives dense, checkable reward. Euclidean geometry gives you exactly that: expressive synthetic primitives, algebraic fallbacks, and fast symbolic engines. Combinatorics and many analytic olympiad problems often hinge on inventing the right potential, charging scheme, or structure theorem. Without the invariant, naive search wanders. The last two IMOs are a clean demonstration. (arXiv)

Global vs local.
Local steps are uniform; global gluing is not. That is Klainerman’s point. Small changes in an equation can flip qualitative behavior, and the passage from local control to global regularity typically forces equation-specific weights, monotonicity identities, commutators, and profile decompositions. Templates crack at the global step. (Princeton Math)
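
A standard illustration of how small the change can be (a textbook example, not taken from the cited source): the cubic nonlinear Schrödinger equations on \mathbb{R}^3,

  i\,\partial_t u + \Delta u = +|u|^2 u \quad (\text{defocusing}), \qquad i\,\partial_t u + \Delta u = -|u|^2 u \quad (\text{focusing}),

differ only by a sign. The defocusing equation is globally well-posed for finite-energy data; the focusing one admits finite-time blow-up. Local existence theory cannot tell them apart – only the global argument does, and it is different for each.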

Scaffolding externalities.
Mathlib’s breadth compounds. Once an object exists – perfectoids, derived categories, condensed tools – diagram chases and long exact sequences are cheap to formalize and therefore cheap to automate. In analysis and PDE, basic infrastructure has only recently landed, so the automation payoff is lower and patchier. (arXiv)


3) Why “chaotic” areas of math are durable

Klainerman’s diagnosis.
There is no single uniform theory of PDE that takes you from local to global. The subject resists unification because tiny modifications in structure yield different global behaviors, and because PDE is entangled with geometry and physics in ways that frustrate clean axiomatization. If you must invent the norms, the weights, and the gluing strategy per equation, you have low template density and sparse rewards for search – bad terrain for current LLM-plus-tool hybrids. (JHU Mathematics)

Concrete PDE novelties are bespoke by design.

  • Concentration-compactness plus rigidity: a now-classic global classification strategy that still needs equation-specific profiles, monotonicity, and “channels of energy” to kill the critical element. (Annals of Mathematics)
  • Interaction Morawetz and the I-method: engineered functionals and energy-increment tricks that must be retuned for each geometry and nonlinearity; a standard instance is displayed after this list. (Oxford Academic)
  • Space–time resonance: a decay and modified scattering framework whose decisive cuts depend on the exact dispersion relation and resonant set geometry. (arXiv)
  • Parametrices and decay on rough or curved backgrounds: from variable-coefficient Strichartz to black hole “red-shift” vector fields, the estimates are geometry-specific. (Department of Mathematics)
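
To make the second bullet concrete, the interaction Morawetz estimate for the defocusing cubic NLS on \mathbb{R}^3 takes, in its standard form (paraphrased from the literature, not quoted):

  \int_0^T \!\!\int_{\mathbb{R}^3} |u(t,x)|^4 \, dx\, dt \;\lesssim\; \| u \|_{L^\infty_t L^2_x}^2 \, \| u \|_{L^\infty_t \dot H^{1/2}_x}^2 .

The two-particle Morawetz action behind it was engineered for this dimension and nonlinearity; porting it elsewhere means redesigning the multiplier, which is exactly the bespoke character the list is pointing at.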

Contest combinatorics shows the same failure mode.
The IMO 2025 Problem 6 was a grid-tiling combinatorics problem. Official stats show it as a cliff – average 0.184, only six perfect scores. At least one independent Gemini 2.5 Pro evaluation reports that P6 was the lone miss even when the rest were solved. The invariant had to be invented; it did not pop out of a stock playbook. (Art of Problem Solving)


4) Why framework-heavy areas automate first

AG and homological algebra ride the library.
Once the category is specified and the functors are in view, large parts of the work reduce to exactness, base change, spectral sequences, and diagram chasing. Because mathlib already carries perfectoids, condensed mathematics artifacts from the Liquid Tensor Experiment, and derived categories with a triangulated structure, AI formalizers can keep pushing. The novelty that remains human-hard is concept creation – new categories or cohomology theories – not the bookkeeping. (arXiv)
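
A cartoon of the bookkeeping in question – a completely standard fact, shown only to exhibit its mechanical shape: a short exact sequence of sheaves on a space X,

  0 \to \mathcal{F}' \to \mathcal{F} \to \mathcal{F}'' \to 0,

induces a long exact sequence in cohomology,

  \cdots \to H^i(X,\mathcal{F}') \to H^i(X,\mathcal{F}) \to H^i(X,\mathcal{F}'') \to H^{i+1}(X,\mathcal{F}') \to \cdots

Every arrow and every exactness claim is a finite chase over a fixed axiom set – exactly the kind of derivation that becomes cheap once the ambient category lives in the library.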

Auto-formalization pipelines are converging on these areas.
LeanDojo and follow-ons expose mathlib as a gym for retrieval-augmented proving, while recent datasets translate or align Lean statements with natural language at scale. As the natural-language-to-Lean channel strengthens, fields that already sit in mathlib’s center of mass benefit first. (arXiv)
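
Mechanically, most of these pipelines are a best-first search loop wrapped around a tactic suggester and a proof checker. The sketch below is illustrative only: suggest_tactics and apply_tactic are hypothetical stand-ins, not the API of LeanDojo or any other real tool.

  import heapq

  # Hypothetical stand-ins: a real pipeline would call a retrieval-augmented
  # model and a Lean process here; these toy versions just make the loop runnable.
  def suggest_tactics(state: str) -> list[tuple[str, float]]:
      """Return candidate tactics with model scores in (0, 1]."""
      return [("simp", 0.9), ("ring_nf", 0.6), ("omega", 0.4)]

  def apply_tactic(state: str, tactic: str) -> str | None:
      """Return the new proof state, '' if the goal closes, None if the tactic fails."""
      return "" if tactic == "simp" else None  # toy behaviour for the demo

  def best_first_search(goal: str, budget: int = 100) -> list[str] | None:
      # Min-heap keyed on negated cumulative score: most promising state pops first.
      queue = [(-1.0, goal, [])]
      seen = {goal}
      while queue and budget > 0:
          budget -= 1
          neg_score, state, path = heapq.heappop(queue)
          for tactic, score in suggest_tactics(state):
              new_state = apply_tactic(state, tactic)
              if new_state == "":
                  return path + [tactic]  # proof found
              if new_state is not None and new_state not in seen:
                  seen.add(new_state)
                  heapq.heappush(queue, (neg_score * score, new_state, path + [tactic]))
      return None  # search budget exhausted

  print(best_first_search("a + b = b + a"))  # -> ['simp'] in this toy setup

The point of the sketch is the shape of the loop, not the details: the denser the library of prior statements the retriever can draw on, the better the suggester's scores – which is why fields already sitting in mathlib's center of mass benefit first.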


5) Addressing the obvious counterpoints

“But aren’t AI systems already ‘gold’ on Olympiad math?”
Yes, and that is a genuine milestone. But the performance is jagged – very strong in niches with robust playbooks and still brittle on problems where the invariant or global strategy must be invented. DeepMind’s own leadership has been explicit about this inconsistency problem. (Google DeepMind)

“Won’t formal PDE libraries close the gap quickly?”
They will reduce friction on local steps: integration by parts and commutator bookkeeping, standard Sobolev and interpolation inequalities, TT* for linear dispersive estimates in fixed geometries. That is great. It does not replace the step where you design the bespoke multiplier, Lyapunov functional, resonant decomposition, or parametrix that makes the global argument go. The recent GNS formalization shows how much groundwork is still being laid. (DROPS)
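
For concreteness, the kind of local step meant here is the homogeneous Strichartz estimate for the free Schrödinger flow on \mathbb{R}^n, in its standard form (stated from the literature, not from any formalization):

  \| e^{it\Delta} u_0 \|_{L^q_t L^r_x(\mathbb{R}\times\mathbb{R}^n)} \;\lesssim\; \| u_0 \|_{L^2(\mathbb{R}^n)}, \qquad \tfrac{2}{q} + \tfrac{n}{r} = \tfrac{n}{2}, \quad q, r \ge 2, \quad (q,r,n) \ne (2,\infty,2).

The TT* argument that proves it is mechanical once dispersion, duality, and interpolation are in the library – a plausible automation target. Deciding which weighted or frequency-localized variant unlocks a given global argument is not.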


6) A short forecast for the next 2 to 5 years

  • Algebraic geometry – execution accelerates. Expect faster automation of derived-category and spectral-sequence work as mathlib deepens. Humans push the frontier by inventing new frameworks; machines grind the derivations. (Annals of Formalized Mathematics)
  • Combinatorics – uneven gains. Subclasses that admit encodings – flows, matchings, SAT or ILP reductions – will move. P6-style invariant-invention problems remain spiky. (Art of Problem Solving)
  • Nonlinear dispersive or hyperbolic PDE – durable at the frontier. As more Strichartz or Carleman machinery lands, AI will eat routine subroutines. Framework creation and global gluing stay human-led. Klainerman’s “no uniform theory” remains the right mental model. (JHU Mathematics)

7) Practical playbook for researchers

Lean into framework creation.

  • In PDE: invent the functional architecture – weights, null structures, resonance decompositions, parametrices on rough or curved backgrounds – then let tools audit the grind. (Department of Mathematics)
  • In AG: invent the next category or cohomology theory, then publish a clean formal package that mathlib can absorb. The Liquid Tensor Experiment is the precedent for how powerful that can be. (Lean Community)

Exploit AI where it is strong.

  • For combinatorics: use AI for counterexample search and small‑n experiments, or to test candidate invariants – not to conjure the invariant from scratch; a minimal brute-force sketch follows this list.
  • For PDE: offload integration by parts, coercivity checks under hypotheses, and standard embeddings or interpolations as the analytic library fills in. The 2024 GNS work is the canary here. (DROPS)
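
A minimal sketch of what small-n experimentation looks like in practice – the game and the candidate invariant below are invented for illustration, not drawn from any IMO problem: exhaustively check that a conjectured parity invariant survives every legal move on tiny instances before investing in a proof.

  from itertools import product

  # Toy setup (invented for illustration): a row of n coins, heads = 1, tails = 0.
  # Allowed move: flip two adjacent coins simultaneously.
  # Conjectured invariant: the parity of the number of heads never changes.

  def moves(state: tuple[int, ...]):
      """Yield every state reachable in one legal move."""
      for i in range(len(state) - 1):
          s = list(state)
          s[i] ^= 1
          s[i + 1] ^= 1
          yield tuple(s)

  def invariant(state: tuple[int, ...]) -> int:
      return sum(state) % 2  # parity of heads

  def check_invariant(n: int) -> bool:
      """Exhaustively verify the invariant over all states and all single moves."""
      for state in product((0, 1), repeat=n):
          for nxt in moves(state):
              if invariant(nxt) != invariant(state):
                  print("counterexample:", state, "->", nxt)
                  return False
      return True

  # Cheap to run for small n; a single failure kills the conjecture instantly.
  print(all(check_invariant(n) for n in range(2, 12)))  # -> True

The search is trivial, but that is the division of labor being proposed: the machine sweeps small cases and stress-tests candidates, while choosing which invariant to test remains the human move.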

Design “formalizable kits,” not just single proofs.
When you find a successful method – a ghost weight, an interaction Morawetz variant, a resonance cut – publish a minimal kit that can be ported. Turning inventions into re-usable modules is how human novelty becomes tomorrow’s template. (Oxford Academic)


8) Context: forecasts vs ground truth

High-profile predictions – for example, Eric Schmidt’s claim that most programmers will be replaced within a year and that graduate-level AI mathematicians are imminent – make headlines. The contest and library evidence above provides a more granular picture: rapid progress where playbooks and scaffolding exist, jagged performance elsewhere. Treat the claims as directional pressure, not as a calendar. (Quote Investigator)


Closing claim

Automation follows scaffolding. Where the grammar of the field is already encoded – algebraic geometry’s categories and functors, geometry’s synthetic calculus – AI scales quickly. Where progress depends on inventing the right invariant or functional setting and sewing local truths into a global one – combinatorics at its hardest and Klainerman-style nonlinear PDE – current AI lacks the handles. The playbooks are thin, the rewards sparse, and the decisive moves are still human-made. That is why these areas are the near-term refuges – and opportunities – for deep mathematical creativity. (arXiv)


Sources and further reading

  • DeepMind on IMO 2024 and 2025 performance; Reuters roundup of 2025 gold claims. (Google DeepMind)
  • Official IMO 2025 problem statement and statistics. (Art of Problem Solving)
  • AlphaGeometry 2 geometry results – Nature news and arXiv. (Nature)
  • Mathlib depth, derived categories, and library statistics. (Annals of Formalized Mathematics)
  • Gagliardo–Nirenberg–Sobolev in Lean and iterated integration framework. (DROPS)
  • Perfectoid spaces in Lean; Liquid Tensor Experiment completion. (arXiv)
  • Klainerman on why PDE resists uniformization; “small changes” caution. (JHU Mathematics)
  • Space–time resonance, interaction Morawetz, and I‑method exemplars. (arXiv)
  • Dispersive and decay estimates on rough or curved backgrounds. (Department of Mathematics)
  • Auto-formalization pipelines and datasets: LeanDojo, Herald, FormL4, FormalAlign, and scaling proofs in Lean. (arXiv)

