AI GPT — LLM Analytics & Leaderboard

LLM Leaderboard — May 2026

overall score weights: MMLU-Pro 25% · HumanEval 20% · MATH 15% · GPQA 15% · IFEval 10% · Arena ELO 15% (IFEval column not shown)

#	Model	Provider	MMLU-Pro	HumanEval	MATH	GPQA	Arena ELO	Overall	Δ
1	Claude Opus 4.7	Anthropic	89.2	95.1	93.7	71.4	1328	90.6	NEW
2	GPT-4o (2026-04)	OpenAI	88.7	94.3	92.1	69.8	1314	89.3	+1.2
3	Gemini 2.5 Pro	Google DeepMind	87.9	93.0	90.4	68.3	1302	87.8
4	DeepSeek V3.1	DeepSeek	85.2	91.7	89.3	64.9	1287	85.6	+0.8
5	Qwen 3-Max	Alibaba	83.8	89.5	87.9	63.1	1268	83.5
6	Llama 4 Maverick	Meta AI	81.9	88.2	84.8	60.7	1244	81.3
7	Grok 3	xAI	80.6	87.3	83.5	59.4	1228	79.8
8	Mistral Large 3	Mistral AI	78.4	85.9	81.2	57.8	1206	77.6
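
The weighting listed for the leaderboard can be sketched as a weighted mean. This is only an illustration: the page does not say how Arena ELO is mapped onto a 0-100 scale, and the table has no IFEval column, so `normalize_elo` and its `lo`/`hi` bounds are hypothetical, and missing benchmarks are handled by renormalizing the remaining weights. The result is not expected to reproduce the published Overall column.

```python
# Weights as listed on the leaderboard (they sum to 1.0).
WEIGHTS = {
    "mmlu_pro": 0.25,
    "humaneval": 0.20,
    "math": 0.15,
    "gpqa": 0.15,
    "ifeval": 0.10,
    "arena_elo": 0.15,
}


def normalize_elo(elo, lo=1000.0, hi=1400.0):
    """Map an Arena ELO rating onto 0-100 by linear rescaling.

    ASSUMPTION: the bounds lo/hi are hypothetical; the page does not
    document its normalization.
    """
    return 100.0 * (elo - lo) / (hi - lo)


def overall(scores, weights=WEIGHTS):
    """Weighted mean over the benchmarks present in `scores`.

    Benchmarks without a score (e.g. IFEval, which the table omits)
    are skipped and the remaining weights are renormalized.
    """
    present = {name: w for name, w in weights.items() if name in scores}
    total = sum(present.values())
    if total == 0:
        raise ValueError("no weighted benchmarks present in scores")
    return sum(scores[name] * w for name, w in present.items()) / total


# Row 1 of the table, with ELO rescaled under the assumption above.
row = {
    "mmlu_pro": 89.2,
    "humaneval": 95.1,
    "math": 93.7,
    "gpqa": 71.4,
    "arena_elo": normalize_elo(1328),
}
print(round(overall(row), 1))
```

Renormalizing keeps the composite on a 0-100 scale when a column is missing, at the cost of quietly giving the remaining benchmarks more influence than their listed weights.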

AI Company Rankings

by research output + product influence

#	Company	Papers (12mo)	Flagship Models	Score
1	Google DeepMind	184	Gemini, Gemma, AlphaFold	94.2
2	OpenAI	37	GPT-4o, o3, Sora	92.8
3	Anthropic	56	Claude Opus, Sonnet, Haiku	89.5
4	Meta AI	122	Llama 4, SAM, Code Llama	85.1
5	DeepSeek	28	DeepSeek V3, R1, Coder	82.7
6	Alibaba	91	Qwen 3, Qwen-VL, Tongyi	79.3
7	Mistral AI	31	Mistral Large, Codestral	74.6

Trending AI Repos — This Week

GitHub ★

#	Repository	Description	Stars	Δ
1	deepseek-ai/DeepSeek-V3	Official impl + weights	78.4K	+12.3K	HOT
2	anthropics/claude-code	CLI agentic coding tool	64.1K	+9.8K	HOT
3	langchain-ai/langgraph	Agent orchestration framework	38.2K	+4.1K
4	QuivrHQ/quivr	OSS RAG second brain	41.7K	+3.6K
5	microsoft/autogen	Multi-agent conversation framework	45.0K	+3.2K
6	THUDM/ChatGLM-5	Bilingual open LLM	52.3K	+2.9K
7	openai/whisper	Robust speech recognition	77.9K	+2.4K	TOP
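
The list above appears to be ordered by weekly star gain (Δ) rather than by total stars. A minimal sketch of that ordering follows, together with a relative growth figure. The helper names `parse_count` and `weekly_growth` are hypothetical, and the sketch assumes the Stars column already includes this week's gain.

```python
def parse_count(text):
    """Convert an abbreviated count such as '78.4K' or '+12.3K' to an int."""
    text = text.strip().replace(",", "")
    multipliers = {"K": 1_000, "M": 1_000_000}
    suffix = text[-1].upper()
    if suffix in multipliers:
        return round(float(text[:-1]) * multipliers[suffix])
    return round(float(text))


def weekly_growth(stars, delta):
    """Relative growth over the week.

    ASSUMPTION: `stars` is the current total and already includes `delta`.
    """
    return delta / (stars - delta)


# (repository, stars, weekly delta) for three rows of the table.
rows = [
    ("openai/whisper", "77.9K", "+2.4K"),
    ("deepseek-ai/DeepSeek-V3", "78.4K", "+12.3K"),
    ("anthropics/claude-code", "64.1K", "+9.8K"),
]

# Sort by absolute weekly gain, largest first, as the table does.
ranked = sorted(rows, key=lambda r: parse_count(r[2]), reverse=True)
for name, stars, delta in ranked:
    growth = weekly_growth(parse_count(stars), parse_count(delta))
    print(f"{name}: {growth:.1%}")
```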

AI Dev Tools — Top Rated

community votes

#	Tool	Maker	Category	Rating
1	Claude Code	Anthropic	Agentic IDE	9.6
2	Cursor	Anysphere	AI Editor	9.4
3	GitHub Copilot	Microsoft	Code Completion	9.1
4	v0 by Vercel	Vercel	UI Generation	8.9
5	Aider	Paul Gauthier	CLI Pair Programmer	8.7
6	Continue	Continue Dev	IDE Extension	8.5
7	Windsurf	Codeium	AI Editor	8.4

Benchmark Progress — Last 12 Months

MMLU-Pro top score trend

Date	Top Model	MMLU-Pro	HumanEval	GPQA
May 2026	Claude Opus 4.7	89.2	95.1	71.4
Mar 2026	GPT-4o (2026-01)	87.3	93.5	68.9
Jan 2026	Claude Opus 4.5	85.8	92.7	66.3
Nov 2025	Gemini 2.0 Pro	83.1	90.8	63.7
Sep 2025	GPT-4o (2025-08)	81.2	89.4	61.0
Jul 2025	Claude 3.5 Sonnet	78.6	87.9	58.2
May 2025	Llama 3.1 405B	75.3	84.6	54.8
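
The trend in the table above can be summarized with simple differences. The sketch below uses only the MMLU-Pro values from the rows shown (May 2025 through May 2026, sampled every two months); the variable names are illustrative.

```python
# MMLU-Pro top scores from the table, oldest first.
mmlu_pro = [
    ("May 2025", 75.3),
    ("Jul 2025", 78.6),
    ("Sep 2025", 81.2),
    ("Nov 2025", 83.1),
    ("Jan 2026", 85.8),
    ("Mar 2026", 87.3),
    ("May 2026", 89.2),
]

scores = [score for _, score in mmlu_pro]

# Gain between consecutive snapshots (one value per two-month interval).
gains = [round(later - earlier, 1) for earlier, later in zip(scores, scores[1:])]

# Total and average gain across the rows shown.
total_gain = round(scores[-1] - scores[0], 1)
avg_gain = round(total_gain / len(gains), 2)

print(gains)
print(total_gain, avg_gain)
```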