AGI Clock — Tracking Humanity's Last Invention

SYS://AGI_MONITOR_v2


✓ ALL THRESHOLDS REACHED

AGI INDEX: 100%

All 6 benchmarks at or above the expert-human baseline

AI systems have reached or surpassed expert-human performance on every tracked benchmark simultaneously. This meets our operational definition of the AGI capability threshold.

BENCHMARKS TRACKED

6

REACHED OR EXCEEDED

6 / 6

MONTHS FROM NOW

—

threshold reached

STATUS

REACHED

all benchmarks ✓

REASONING REACHED ✓

ARC-AGI v2

General fluid intelligence measured through abstract pattern-completion tasks. Often cited as one of the closest available proxies for general reasoning.

AI BEST

85%

EXPERT HUMAN

85%

100.0% · THRESHOLD REACHED

SOURCE: llm-stats.com API

ENGINEERING EXCEEDED ✓

SWE-Bench

Real-world GitHub bug fixes requiring full codebase understanding and autonomous execution.

AI BEST

95%

EXPERT HUMAN

92%

100.0% — THRESHOLD EXCEEDED

SOURCE: llm-stats.com API

MATHEMATICS EXCEEDED ✓

FrontierMath

Expert-crafted, research-level mathematics problems that go well beyond standard textbook material.

AI BEST

47.6%

EXPERT HUMAN

5%

100.0% — THRESHOLD EXCEEDED

SOURCE: llm-stats.com API

SCIENCE EXCEEDED ✓

GPQA Diamond

PhD-level questions in biology, chemistry, and physics written and validated by domain experts.

AI BEST

94.6%

EXPERT HUMAN

69.3%

100.0% — THRESHOLD EXCEEDED

SOURCE: llm-stats.com API

KNOWLEDGE EXCEEDED ✓

MMLU-Pro

Professional-level questions across 14 academic disciplines, with 10 answer options per question. A harder, more reasoning-focused variant of the original MMLU (57 subjects, 4 options).

AI BEST

89.6%

EXPERT HUMAN

85%

100.0% — THRESHOLD EXCEEDED

SOURCE: llm-stats.com API

CODING EXCEEDED ✓

LiveCodeBench

Competitive programming problems from Codeforces, LeetCode, and AtCoder, collected continuously over time to limit training-data contamination.

AI BEST

91.6%

EXPERT HUMAN

78%

100.0% — THRESHOLD EXCEEDED

SOURCE: llm-stats.com API

AGI INDEX OVER TIME

[Chart: actual and projected AGI index, Jan 2023 to projected 2028]

01 / BENCHMARK SELECTION

Six benchmarks covering distinct dimensions of capability: abstract reasoning (ARC-AGI), real-world software engineering (SWE-Bench), mathematics (FrontierMath), scientific knowledge (GPQA), broad knowledge (MMLU-Pro), and coding (LiveCodeBench).

02 / EXPERT HUMAN THRESHOLD

Each benchmark has a published "expert human" baseline set by PhD researchers, senior engineers, or competitive programmers. AGI is defined as reaching or surpassing this baseline on every tracked benchmark simultaneously.
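The all-benchmarks rule reduces to a single check. This is a minimal Python sketch, not the project's actual code: the `Benchmark` class and `threshold_reached` function are hypothetical names, and the scores are the percentages shown on the cards above. It compares with `>=` so that a tie (ARC-AGI v2, 85% vs. 85%) counts as reaching the baseline, matching the "reached or surpassed" wording in the summary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Benchmark:
    """Hypothetical record for one dashboard card; scores are percentages."""
    name: str
    ai_best: float       # best reported AI score
    expert_human: float  # published expert-human baseline


def threshold_reached(benchmarks: list[Benchmark]) -> bool:
    """True only if AI meets or beats the expert baseline on every benchmark.

    Uses >= so that a tie counts as "reached".
    """
    return all(b.ai_best >= b.expert_human for b in benchmarks)


# Values copied from the benchmark cards on this page.
BENCHMARKS = [
    Benchmark("ARC-AGI v2", 85.0, 85.0),
    Benchmark("SWE-Bench", 95.0, 92.0),
    Benchmark("FrontierMath", 47.6, 5.0),
    Benchmark("GPQA Diamond", 94.6, 69.3),
    Benchmark("MMLU-Pro", 89.6, 85.0),
    Benchmark("LiveCodeBench", 91.6, 78.0),
]

print(threshold_reached(BENCHMARKS))  # prints: True
```

A single benchmark below its baseline is enough to flip the result to `False`, which is what "simultaneously" means here.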

03 / COMPOSITE INDEX

Weighted average of per-benchmark progress, with each benchmark's progress capped at 100%. Weights reflect each benchmark's uniqueness and replacement difficulty. Data is fetched daily from the llm-stats.com API.
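A minimal sketch of the composite index in Python, under stated assumptions: per-benchmark progress is taken to be the AI best score divided by the expert-human baseline, capped at 1.0, and the weights are equal placeholders because the real weights are not listed on this page. `progress`, `agi_index`, `SCORES`, and `WEIGHTS` are hypothetical names.

```python
def progress(ai_best: float, expert_human: float) -> float:
    """Assumed per-benchmark progress: share of the expert baseline reached, capped at 1.0."""
    return min(ai_best / expert_human, 1.0)


def agi_index(scores: dict[str, tuple[float, float]], weights: dict[str, float]) -> float:
    """Weighted average of capped progress, returned as a percentage (0-100)."""
    total_weight = sum(weights[name] for name in scores)
    weighted_sum = sum(weights[name] * progress(*scores[name]) for name in scores)
    return 100.0 * weighted_sum / total_weight


# (AI best, expert human) in percent, copied from the cards on this page.
SCORES = {
    "ARC-AGI v2": (85.0, 85.0),
    "SWE-Bench": (95.0, 92.0),
    "FrontierMath": (47.6, 5.0),
    "GPQA Diamond": (94.6, 69.3),
    "MMLU-Pro": (89.6, 85.0),
    "LiveCodeBench": (91.6, 78.0),
}

# Placeholder weights: assumed equal, not the dashboard's actual weights.
WEIGHTS = {name: 1.0 for name in SCORES}

print(f"AGI INDEX: {agi_index(SCORES, WEIGHTS):.0f}%")  # prints: AGI INDEX: 100%
```

With the card values above, every ratio is at least 1.0, so any positive weights give 100%. The cap matters when a benchmark falls short: a score of 50% against a 100% baseline (weight 1) next to a capped benchmark (weight 3) yields (1 × 0.5 + 3 × 1.0) / 4 = 87.5%, and a large surplus on one benchmark, such as FrontierMath, cannot offset a deficit elsewhere.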

OPEN SOURCE — MIT LICENSE

Contributions, corrections, and new benchmark proposals welcome.
