| # | Model | Provider | MMLU-Pro | HumanEval | MATH | GPQA | Arena ELO | Overall | Change |
|---|---|---|---|---|---|---|---|---|---|
| 1 | Claude Opus 4.7 | Anthropic | 89.2 | 95.1 | 93.7 | 71.4 | 1328 | 90.6 | NEW |
| 2 | GPT-4o (2026-04) | OpenAI | 88.7 | 94.3 | 92.1 | 69.8 | 1314 | 89.3 | +1.2 |
| 3 | Gemini 2.5 Pro | Google DeepMind | 87.9 | 93.0 | 90.4 | 68.3 | 1302 | 87.8 | |
| 4 | DeepSeek V3.1 | DeepSeek | 85.2 | 91.7 | 89.3 | 64.9 | 1287 | 85.6 | +0.8 |
| 5 | Qwen 3-Max | Alibaba | 83.8 | 89.5 | 87.9 | 63.1 | 1268 | 83.5 | |
| 6 | Llama 4 Maverick | Meta AI | 81.9 | 88.2 | 84.8 | 60.7 | 1244 | 81.3 | |
| 7 | Grok 3 | xAI | 80.6 | 87.3 | 83.5 | 59.4 | 1228 | 79.8 | |
| 8 | Mistral Large 3 | Mistral AI | 78.4 | 85.9 | 81.2 | 57.8 | 1206 | 77.6 | |

| # | Company | Papers (12mo) | Flagship Models | Score |
|---|---|---|---|---|
| 1 | Google DeepMind | 184 | Gemini, Gemma, AlphaFold | 94.2 |
| 2 | OpenAI | 37 | GPT-4o, o3, Sora | 92.8 |
| 3 | Anthropic | 56 | Claude Opus, Sonnet, Haiku | 89.5 |
| 4 | Meta AI | 122 | Llama 4, SAM, Code Llama | 85.1 |
| 5 | DeepSeek | 28 | DeepSeek V3, R1, Coder | 82.7 |
| 6 | Alibaba | 91 | Qwen 3, Qwen-VL, Tongyi | 79.3 |
| 7 | Mistral AI | 31 | Mistral Large, Codestral | 74.6 |

| # | Repository | Description | Stars | Δ | Badge |
|---|---|---|---|---|---|
| 1 | deepseek-ai/DeepSeek-V3 | Official impl + weights | 78.4K | +12.3K | HOT |
| 2 | anthropics/claude-code | CLI agentic coding tool | 64.1K | +9.8K | HOT |
| 3 | langchain-ai/langgraph | Agent orchestration framework | 38.2K | +4.1K | |
| 4 | QuivrHQ/quivr | OSS RAG second brain | 41.7K | +3.6K | |
| 5 | microsoft/autogen | Multi-agent conversation framework | 45.0K | +3.2K | |
| 6 | THUDM/ChatGLM-5 | Bilingual open LLM | 52.3K | +2.9K | |
| 7 | openai/whisper | Robust speech recognition | 77.9K | +2.4K | TOP |

| # | Tool | Maker | Category | Rating |
|---|---|---|---|---|
| 1 | Claude Code | Anthropic | Agentic IDE | 9.6 |
| 2 | Cursor | Anysphere | AI Editor | 9.4 |
| 3 | GitHub Copilot | Microsoft | Code Completion | 9.1 |
| 4 | v0 by Vercel | Vercel | UI Generation | 8.9 |
| 5 | Aider | Paul Gauthier | CLI Pair Programmer | 8.7 |
| 6 | Continue | Continue Dev | IDE Extension | 8.5 |
| 7 | Windsurf | Codeium | AI Editor | 8.4 |

| Date | Top Model | MMLU-Pro | HumanEval | GPQA |
|---|---|---|---|---|
| May 2026 | Claude Opus 4.7 | 89.2 | 95.1 | 71.4 |
| Mar 2026 | GPT-4o (2026-01) | 87.3 | 93.5 | 68.9 |
| Jan 2026 | Claude Opus 4.5 | 85.8 | 92.7 | 66.3 |
| Nov 2025 | Gemini 2.0 Pro | 83.1 | 90.8 | 63.7 |
| Sep 2025 | GPT-4o (2025-08) | 81.2 | 89.4 | 61.0 |
| Jul 2025 | Claude Sonnet 3.5 | 78.6 | 87.9 | 58.2 |
| May 2025 | Llama 3.1 405B | 75.3 | 84.6 | 54.8 |