The Singularity Gate

A benchmark for AI-driven paradigm-shifting scientific discovery

Tests whether AI can discover paradigm-shifting science that was published after its training cutoff.

Frontier-model evaluation


Score Progression: Overall (score = prediction accuracy, 0–100%)

All scores are partial credit: no model has fully predicted any item in this corpus.

Reasoning effort: Opus 4.6 / Opus 4.7 / Sonnet 4.6 = max · Gemini 3.1 Pro = high · GPT-5.5 = xhigh.

Native harness with tool use, web search disabled: Claude Code (Claude models) · Gemini CLI (Gemini 3.1 Pro) · Codex (GPT-5.5).


Leaderboard

Per-field breakdown

The headline ranking aggregates over five broad scientific fields. Opus 4.7 leads in four of five; GPT-5.5 leads in Physics & Astronomy by the widest single-field margin.
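The page does not say how the five per-field scores are combined into the headline number. As a minimal sketch, assuming an unweighted mean across fields (an assumption, not the benchmark's documented method), the aggregation and the per-field leader margins could be computed as below. The model names, field names, and scores are hypothetical placeholders, not results from the benchmark.

```python
from statistics import mean

# Hypothetical placeholder scores in percent; NOT actual benchmark results.
per_field_scores = {
    "model_a": {"field_1": 10.0, "field_2": 20.0},
    "model_b": {"field_1": 30.0, "field_2": 10.0},
}


def overall_score(field_scores):
    """Unweighted mean over fields (assumed aggregation rule)."""
    return mean(list(field_scores.values()))


def field_leaders(scores):
    """For each field, return (leading model, margin over the runner-up)."""
    fields = list(next(iter(scores.values())).keys())
    leaders = {}
    for field in fields:
        # Rank models by their score in this field, best first.
        ranked = sorted(scores, key=lambda m: scores[m][field], reverse=True)
        top, runner_up = ranked[0], ranked[1]
        leaders[field] = (top, scores[top][field] - scores[runner_up][field])
    return leaders


overall = {model: overall_score(s) for model, s in per_field_scores.items()}
leaders = field_leaders(per_field_scores)
```

If the benchmark weights fields by item count, a weighted mean would replace `overall_score`; the per-field leader logic would stay the same.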
