LIVE — Updated June 13, 2026
DATA: llm-stats.com API
◈ Scientific Instrument v0.1.0

AGI.CLOCK

Tracking humanity's last invention — in real time.

COMPOSITE INDEX
100%
of expert-human baseline
[Gauge: AGI index as a percentage of the expert-human baseline, scale 0 to 100]
✓ ALL THRESHOLDS MET
AGI INDEX: 100%
All 6 benchmarks meet or exceed the expert-human baseline
AI systems have reached or surpassed expert-human performance on every tracked benchmark simultaneously. This is our operational definition of the AGI capability threshold.
BENCHMARKS TRACKED
6
MET OR EXCEEDED
6 / 6
MONTHS TO THRESHOLD
0 (threshold reached)
STATUS
REACHED
all benchmarks ✓
BENCHMARK BREAKDOWN
REASONING MET ✓

ARC-AGI-2

General fluid intelligence measured through abstract pattern-completion tasks. Intended as a close proxy for general reasoning.

AI BEST
85%
EXPERT HUMAN
85%
100.0% (THRESHOLD MET)
SOURCE: llm-stats.com API
ENGINEERING EXCEEDED ✓

SWE-bench

Real-world GitHub issues from open-source Python repositories that require understanding the full codebase; a patch counts as resolved only if the repository's tests pass.

AI BEST
95%
EXPERT HUMAN
92%
100.0% — THRESHOLD EXCEEDED
SOURCE: llm-stats.com API
MATHEMATICS EXCEEDED ✓

FrontierMath

Research-level mathematics problems crafted and vetted by expert mathematicians, well beyond textbook exercises. AI already exceeds expert humans.

AI BEST
47.6%
EXPERT HUMAN
5%
100.0% — THRESHOLD EXCEEDED
SOURCE: llm-stats.com API
SCIENCE EXCEEDED ✓

GPQA Diamond

PhD-level questions in biology, chemistry, and physics written and validated by domain experts.

AI BEST
94.6%
EXPERT HUMAN
69.3%
100.0% — THRESHOLD EXCEEDED
SOURCE: llm-stats.com API
KNOWLEDGE EXCEEDED ✓

MMLU-Pro

Professional-level knowledge and reasoning across 14 broad domains, with 10 answer options per question. Harder variant of the original MMLU (57 subjects, 4 options).

AI BEST
89.6%
EXPERT HUMAN
85%
100.0% — THRESHOLD EXCEEDED
SOURCE: llm-stats.com API
CODING EXCEEDED ✓

LiveCodeBench

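Competitive programming problems continuously collected from Codeforces, LeetCode, and AtCoder; models can be scored on problems released after their training cutoff to limit contamination.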

AI BEST
91.6%
EXPERT HUMAN
78%
100.0% — THRESHOLD EXCEEDED
SOURCE: llm-stats.com API
HISTORICAL TRAJECTORY

[Chart: AGI index over time, actual values from Jan 2023 and projection to 2028]
METHODOLOGY
01 / BENCHMARK SELECTION

Six benchmarks covering distinct, though partly overlapping, dimensions of intelligence: abstract reasoning (ARC-AGI), real-world engineering (SWE-bench), mathematics (FrontierMath), scientific knowledge (GPQA), broad knowledge (MMLU-Pro), and coding (LiveCodeBench).

02 / EXPERT HUMAN THRESHOLD

Each benchmark has a published "expert human" baseline, drawn from groups such as PhD researchers, senior engineers, and competitive programmers. AGI is defined as reaching or surpassing this baseline on all tracked benchmarks simultaneously.
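The all-benchmarks criterion reduces to a single check. Below is a minimal Python sketch, assuming the AI-best and expert-human scores displayed on this page; the `Benchmark` class and the `meets_threshold` / `agi_threshold_reached` functions are hypothetical names for illustration, not the project's actual code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Benchmark:
    name: str
    ai_best: float       # best AI score shown on the dashboard, in percent
    expert_human: float  # expert-human baseline shown on the dashboard, in percent


# Values as displayed on this page (June 13, 2026 snapshot).
BENCHMARKS = [
    Benchmark("ARC-AGI-2", 85.0, 85.0),
    Benchmark("SWE-bench", 95.0, 92.0),
    Benchmark("FrontierMath", 47.6, 5.0),
    Benchmark("GPQA Diamond", 94.6, 69.3),
    Benchmark("MMLU-Pro", 89.6, 85.0),
    Benchmark("LiveCodeBench", 91.6, 78.0),
]


def meets_threshold(b: Benchmark) -> bool:
    """Reaching the baseline counts: a tie is treated as a pass."""
    return b.ai_best >= b.expert_human


def agi_threshold_reached(benchmarks: list[Benchmark]) -> bool:
    """True only if every tracked benchmark meets its baseline at the same time."""
    return all(meets_threshold(b) for b in benchmarks)


passed = [b.name for b in BENCHMARKS if meets_threshold(b)]
print(f"{len(passed)} / {len(BENCHMARKS)} benchmarks at or above baseline")  # 6 / 6 ...
print("AGI threshold reached:", agi_threshold_reached(BENCHMARKS))  # True
```

The `>=` comparison matters here: ARC-AGI-2 ties at 85%, so a strict `>` would report 5 / 6 and the threshold would not be reached.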

03 / COMPOSITE INDEX

Weighted average of per-benchmark progress, with each benchmark capped at 100%. Weights reflect how unique each benchmark's signal is and how hard it would be to replace. Data is fetched daily from the llm-stats.com API.
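The composite can be sketched as follows, assuming per-benchmark progress is the AI-best score divided by the expert-human baseline, capped at 1.0. The page does not publish its weights, so `WEIGHTS` holds hypothetical placeholder values, and `progress` / `composite_index` are illustrative names rather than the project's API.

```python
# Hypothetical weights: the dashboard does not publish its real values.
WEIGHTS = {
    "ARC-AGI-2": 0.25,
    "SWE-bench": 0.20,
    "FrontierMath": 0.15,
    "GPQA Diamond": 0.15,
    "MMLU-Pro": 0.10,
    "LiveCodeBench": 0.15,
}

# (AI best, expert-human baseline) in percent, as displayed on this page.
SCORES = {
    "ARC-AGI-2": (85.0, 85.0),
    "SWE-bench": (95.0, 92.0),
    "FrontierMath": (47.6, 5.0),
    "GPQA Diamond": (94.6, 69.3),
    "MMLU-Pro": (89.6, 85.0),
    "LiveCodeBench": (91.6, 78.0),
}


def progress(ai_best: float, expert_human: float) -> float:
    """Assumed progress metric: AI score as a fraction of the baseline, capped at 1.0."""
    if expert_human <= 0:
        raise ValueError("expert-human baseline must be positive")
    return min(ai_best / expert_human, 1.0)


def composite_index(scores: dict[str, tuple[float, float]], weights: dict[str, float]) -> float:
    """Weighted average of capped progress, as a percentage (weights need not sum to 1)."""
    total_weight = sum(weights[name] for name in scores)
    weighted = sum(weights[name] * progress(ai, human) for name, (ai, human) in scores.items())
    return 100.0 * weighted / total_weight


print(f"{composite_index(SCORES, WEIGHTS):.1f}%")  # 100.0%

# What-if: ARC-AGI-2 falls to 60% while its baseline stays at 85%.
what_if = {**SCORES, "ARC-AGI-2": (60.0, 85.0)}
print(f"{composite_index(what_if, WEIGHTS):.1f}%")  # 92.6%
```

When every benchmark is at or above its baseline, each progress term caps at 1.0 and the weights cancel, so the index is 100% whatever weights are chosen; the weights only matter once some benchmark falls below its baseline, as in the what-if case.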

OPEN SOURCE — MIT LICENSE

Contributions, corrections, and new benchmark proposals welcome.