July 2025 LLM Testing Lab
Test and compare the latest commercial AI models. Real responses, real-time pricing, and performance metrics.
Select Models to Compare
OpenAI
GPT-4o
Flagship multimodal
128k context • Text, image, audio
GPT-4.1
1M context pilot
128k-1M context • Text, vision
o3 Pro
Highest accuracy
200k context • Agentic reasoning
GPT-4o Mini
Cheapest quality
128k context • Text, vision
o3
Long-context agents
200k context • Reasoning-focused
o3-mini
Entry-level o3
200k context • Low cost
Google Gemini
Gemini 2.5 Pro
Balanced accuracy
256k context • Multimodal
Gemini 2.5 Flash
Real-time agents
256k context • Fast chains
Gemini 2.5 Flash-Lite
Ultra-low cost
128k context • Production
Gemini 1.5 Pro
2M context RAG
128k-2M context • Video
Anthropic Claude
Claude 4 Opus
Deep research
200k-1M context • Enterprise
Claude 4 Sonnet
Fast automation
200k context • Balanced
Claude 3.5/3.7
Production coding
200k context • Reasoning
Claude 3 Haiku
Cheapest Claude
200k context • Near-instant
xAI Grok
Grok-4
Live-web grounding
256k context • Scientist-level
Grok-3 / 3 Mini
Stepwise reasoning
128k context • Text
Grok-2 / 2 Mini
Aurora image gen
128k context • Text + image
Perplexity
Sonar Pro
Deep research
200k context • Citations
Sonar Reasoning
Fast logic & code
128k context • Fast
pplx-70b-online
Real-time web
4k context • Web answers
July 2025 Commercial LLM Quick Reference
Quick Decision Guide
Need...
- Highest raw reasoning & reliability: o3 Pro (Jun 2025) or Claude 4 Opus (Jun 2025)
- Largest context window (>1M tokens): Gemini 1.5 Pro (2M), Claude 4 Opus enterprise
- Best API/tool integration: GPT-4o family (Assistants, function calling)
- Live, citation-grounded answers: Perplexity Sonar or pplx-70b-online
Best for...
- Lowest cost at quality: GPT-4o Mini, Gemini 2.5 Flash-Lite, Claude 3 Haiku
- Real-time social/streaming: Grok-4 (SuperGrok), Grok-3
- Strict safety & compliance: Claude 4 series
- Multimodal (video) analysis: Gemini 1.5 Pro, GPT-4o
1 | OpenAI - GPT & o-series
Model | CTX | Modalities | Price ($/1M in/out) | Launch | Primary strengths |
---|---|---|---|---|---|
GPT-4o | 128k | Text·image·audio | 5/15 | May 2024 | Flagship multimodal, real-time chat |
GPT-4.1 | 128k→1M* | Text·vision | 3/12 | Apr 2025 | Extreme long-context (1M CTX pilot) |
o3 Pro | 200k | Text (+tools) | 20/80 | Jun 2025 | Highest accuracy + agentic reasoning |
GPT-4o Mini | 128k | Text·vision | 0.15/0.60 | Jul 2024 | Cheapest with GPT-4 class quality |
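The price column can be turned into a per-request estimate with simple arithmetic: tokens times the per-million rate, divided by one million. The sketch below is only an illustration, not the calculator this page uses; the `PRICES_PER_MILLION` dictionary and the `estimate_cost` function are hypothetical names, and the rates are copied from the table above, so they may be out of date.

```python
# Prices copied from the OpenAI table above, in USD per 1M tokens (input, output).
# These are the page's figures, not values fetched from a live rate card.
PRICES_PER_MILLION = {
    "GPT-4o": (5.00, 15.00),
    "GPT-4.1": (3.00, 12.00),
    "o3 Pro": (20.00, 80.00),
    "GPT-4o Mini": (0.15, 0.60),
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of one request (hypothetical helper)."""
    input_price, output_price = PRICES_PER_MILLION[model]
    # Input and output tokens are billed at different rates.
    total = input_tokens * input_price + output_tokens * output_price
    return total / 1_000_000


if __name__ == "__main__":
    # Example: a 10,000-token prompt with a 2,000-token answer.
    for name in PRICES_PER_MILLION:
        print(f"{name}: ${estimate_cost(name, 10_000, 2_000):.4f}")
```

At these rates the output side often dominates for long answers, since output tokens cost three to four times as much as input tokens in every row of the table.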
2 | Google Gemini
Model | CTX | Modalities | Price ($/1M in/out) | Launch | Best-fit use cases |
---|---|---|---|---|---|
Gemini 1.5 Pro | 128k→2M | Text·image·audio·video | Tiered | May 2024 | Huge-context RAG & video analysis |
Gemini 2.5 Pro | 256k | Text·image·audio·video | Premium | Jun 2025 | Balanced accuracy / coding |
Gemini 2.5 Flash | 256k | Text·image·audio·video | 0.25/1 | Jun 2025 | Real-time agent chains |
Gemini 2.5 Flash-Lite | 128k | Text·image·audio·video | 0.10/0.40 | Jul 2025 | Ultra-low-cost production |
Gemini 1.5 Flash | 128k | Text·image·audio·video | ≈0.15/0.60 | Jun 2024 | Interactive Q&A apps |
3 | xAI Grok
Model | CTX | Modalities | Access tier | Launch | Differentiators |
---|---|---|---|---|---|
Grok-4 | 256k | Text (+limited vision) | X Premium+/SuperGrok | Jul 2025 | Live-web grounding, scientist-level reasoning |
Grok-3 / 3 Mini | 128k | Text | X Premium+ | Feb 2025 | Stepwise reasoning modes |
Grok-2 / 2 Mini | 128k | Text + image gen | Free tier on X | Jul 2024 | Aurora image gen |
Grok-1.5 Vision | 128k | Text + vision | Legacy | Nov 2023 | First Grok with vision |
4 | Perplexity - Sonar & pplx
Model | CTX | Price ($/1M in/out) | Launch | Strengths |
---|---|---|---|---|
Sonar Pro | 200k | 1/4 (est.) | Feb 2025 | Deep multi-hop research w/ citations |
Sonar Reasoning | 128k | 0.70/3 | Feb 2025 | Fast logic & coding |
pplx-70b-online | 4k | 1/1 | Oct 2024 | Real-time web answers |
pplx-7b-online | 4k | Tiny | Aug 2024 | Edge demos & open inference |
* Prices marked (est.) are based on Perplexity's public blog statements; an updated formal rate card is still pending.
5 | Anthropic - Claude 4 & 3 series
Model | CTX | Price ($/1M in/out) | Launch | Sweet-spot strengths |
---|---|---|---|---|
Claude 4 Opus | 200k→1M* | 15/75 | Jun 2025 | Deep research, enterprise safety |
Claude 4 Sonnet | 200k | 3/15 | Jun 2025 | Fast automation, balanced logic |
Claude 3.5 / 3.7 | 200k | 3/15 | May 2024 | Production coding & reasoning |
Claude 3 Haiku | 200k | 0.25/1.25 | Mar 2024 | Cheapest Claude; near-instant |
Claude 2.1 / Instant | 200k / 100k | Legacy | Nov 2023 | Entry-level / archival |
* 1M CTX currently an enterprise-only preview.
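The decision guide can also be applied mechanically: filter on the context window a job needs, then rank what is left by estimated cost. The following is a minimal sketch under stated assumptions. `ModelRow`, `ROWS`, `request_cost`, and `cheapest_model` are hypothetical names; the rows are a small sample copied from the tables above (only rows with numeric prices), and "128k" is treated as 128,000 tokens for simplicity.

```python
from typing import NamedTuple, Optional


class ModelRow(NamedTuple):
    name: str
    context_tokens: int   # context window, e.g. "128k" taken as 128_000
    input_price: float    # USD per 1M input tokens, as listed above
    output_price: float   # USD per 1M output tokens, as listed above


# A sample of rows from the tables in this reference (not an exhaustive list).
ROWS = [
    ModelRow("GPT-4o Mini", 128_000, 0.15, 0.60),
    ModelRow("o3 Pro", 200_000, 20.00, 80.00),
    ModelRow("Gemini 2.5 Flash", 256_000, 0.25, 1.00),
    ModelRow("Gemini 2.5 Flash-Lite", 128_000, 0.10, 0.40),
    ModelRow("Sonar Reasoning", 128_000, 0.70, 3.00),
    ModelRow("Claude 4 Sonnet", 200_000, 3.00, 15.00),
]


def request_cost(row: ModelRow, input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost of one request at the listed rates."""
    return (input_tokens * row.input_price
            + output_tokens * row.output_price) / 1_000_000


def cheapest_model(min_context: int, input_tokens: int,
                   output_tokens: int) -> Optional[ModelRow]:
    """Cheapest sampled model whose context window is at least min_context."""
    candidates = [r for r in ROWS if r.context_tokens >= min_context]
    if not candidates:
        return None  # nothing in the sample has a large enough window
    return min(candidates,
               key=lambda r: request_cost(r, input_tokens, output_tokens))


if __name__ == "__main__":
    pick = cheapest_model(min_context=200_000, input_tokens=10_000, output_tokens=2_000)
    print(pick.name if pick else "no model fits")
```

Price and context are only two of the columns; a real selection would also weigh the strengths listed in each table, which this sketch ignores.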
Sources:
- OpenAI Community & Documentation
- Google Developers Blog
- Anthropic Documentation
- xAI & TechCrunch Reports
- Perplexity Blog