# Complete LLM Pricing Comparison (February 2026)
All prices per 1 million tokens. Sorted by input price from cheapest to most expensive.
| Model | Provider | Input | Output | Context |
|---|---|---|---|---|
| DeepSeek V3 | DeepSeek | $0.07 | $0.28 | 64K |
| Gemini 3 Flash | Google | $0.075 | $0.30 | 2M |
| GPT-4o-mini | OpenAI | $0.15 | $0.60 | 128K |
| Claude Haiku 4.5 | Anthropic | $0.25 | $1.25 | 200K |
| o3-mini | OpenAI | $1.10 | $4.40 | 200K |
| Gemini 3 Pro | Google | $1.25 | $5.00 | 2M |
| DeepSeek R1 | DeepSeek | $2.00 | $8.00 | 128K |
| GPT-4o | OpenAI | $2.50 | $10.00 | 128K |
| Claude Sonnet 4.5 | Anthropic | $3.00 | $15.00 | 200K |
| o3 | OpenAI | $10.00 | $40.00 | 200K |
| Claude Opus 4.5 | Anthropic | $12.00 | $60.00 | 200K |
| GPT-5 | OpenAI | $15.00 | $45.00 | 256K |
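Per-request cost follows directly from the table: tokens ÷ 1M × price, computed separately for input and output. A minimal sketch (the model keys and example token counts are illustrative, with prices taken from the table above):

```python
# Per-1M-token prices from the table above: (input $, output $).
PRICES = {
    "deepseek-v3": (0.07, 0.28),
    "gpt-4o-mini": (0.15, 0.60),
    "claude-sonnet-4.5": (3.00, 15.00),
    "gpt-5": (15.00, 45.00),
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD for one request: each side is tokens/1M times its price."""
    in_price, out_price = PRICES[model]
    return (input_tokens / 1_000_000) * in_price + (output_tokens / 1_000_000) * out_price

# Example: a 2,000-token prompt with a 500-token reply.
print(f"${request_cost('gpt-4o-mini', 2000, 500):.6f}")  # $0.000600
print(f"${request_cost('gpt-5', 2000, 500):.6f}")        # $0.052500
```

The same request is roughly 87x more expensive on GPT-5 than on GPT-4o-mini, which is why the right-sizing advice below matters so much.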
## Best LLM by Use Case
### Budget Tasks

- DeepSeek V3: cheapest overall
- Gemini 3 Flash: best context size
- GPT-4o-mini: most reliable
### Coding

- Claude Opus 4.5: best overall
- Claude Sonnet 4.5: best value
- DeepSeek R1: budget option
### Long Documents

- Gemini 3 Pro: 2M context
- GPT-5: 256K context
- Claude Opus 4.5: 200K context
### Chat / Conversation

- GPT-4o: best quality
- Claude Sonnet 4.5: balanced
- Gemini 3 Flash: budget pick
## How to Reduce Your LLM Costs

### #1: Right-Size Your Model
Use smaller models for simple tasks. GPT-4o-mini or Claude Haiku can handle 80% of tasks at 10-100x lower cost than flagship models.
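One way to right-size in practice is a small router that escalates only when a request actually needs flagship capability. A sketch with an assumed length threshold and illustrative model choices, not a benchmarked policy:

```python
# Hypothetical router: short, simple prompts go to a cheap model; long or
# reasoning-heavy ones go to a flagship. The 4,000-character threshold and
# the model names are illustrative assumptions, not tuned values.
def pick_model(prompt: str, needs_reasoning: bool = False) -> str:
    if needs_reasoning or len(prompt) > 4000:
        return "claude-sonnet-4.5"  # flagship-tier, 20x pricier input
    return "gpt-4o-mini"            # handles most routine tasks

print(pick_model("Summarize this tweet."))                      # gpt-4o-mini
print(pick_model("Review this PR for races", needs_reasoning=True))  # claude-sonnet-4.5
```

Even a crude heuristic like this captures most of the savings if, as the rule of thumb above suggests, ~80% of traffic is simple.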
### #2: Use Prompt Caching
Claude offers a 90% discount on cached input tokens; OpenAI offers 50%. If your application reuses the same system prompt across requests, enable caching.
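The savings compound quickly across repeated requests. A back-of-envelope sketch using the discounts above; it deliberately ignores provider-specific details such as Anthropic's cache-write surcharge and cache expiry, so treat it as an upper bound:

```python
# Simplified model: the first request pays full price for the system prompt,
# every later request pays the discounted cached rate. Ignores cache-write
# surcharges and TTL expiry, so real savings will be somewhat smaller.
def cached_prompt_cost(system_tokens: int, requests: int,
                       input_price_per_m: float, discount: float) -> float:
    first = system_tokens / 1_000_000 * input_price_per_m
    cached = first * (1 - discount)
    return first + cached * (requests - 1)

# 5,000-token system prompt, 1,000 requests, Claude Sonnet input at $3/1M,
# 90% cache discount:
no_cache = 5000 / 1_000_000 * 3.00 * 1000
with_cache = cached_prompt_cost(5000, 1000, 3.00, 0.90)
print(f"${no_cache:.2f} without caching vs ${with_cache:.2f} with caching")
```

In this scenario the repeated system prompt drops from $15.00 to about $1.51, close to the advertised 90% saving.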
### #3: Optimize Prompts
Remove unnecessary whitespace, be concise, and don't repeat instructions. Every token costs money.
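A minimal trimming pass can run before every request. The regex rules here are illustrative, and the exact token savings depend on the provider's tokenizer, but redundant whitespace generally does cost tokens:

```python
import re

def squeeze(prompt: str) -> str:
    """Collapse runs of spaces/tabs and excess blank lines before sending."""
    prompt = re.sub(r"[ \t]+", " ", prompt)     # runs of spaces/tabs -> one space
    prompt = re.sub(r"\n{3,}", "\n\n", prompt)  # cap consecutive blank lines
    return prompt.strip()

messy = "Summarize   this:\n\n\n\n\nSome    text   here.   "
print(squeeze(messy))
```

This is safe for most natural-language prompts; skip it for prompts containing code or other whitespace-sensitive content.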
## Frequently Asked Questions
### What is the cheapest AI model in 2026?
DeepSeek V3 at $0.07/1M tokens and Gemini 3 Flash at $0.075/1M are the cheapest. For reliable quality, GPT-4o-mini ($0.15/1M) and Claude Haiku ($0.25/1M) are excellent budget options.
### How do I reduce my LLM API costs?
1. Use smaller models for simple tasks.
2. Enable prompt caching.
3. Optimize prompts.
4. Use batch APIs for non-urgent work.
5. Consider open-source models for high volume.
### GPT-5 vs Claude Opus: which is better?
Both are flagship models. Claude Opus is better for coding and cheaper for inputs ($12 vs $15/1M). GPT-5 is cheaper for outputs ($45 vs $60/1M) and has larger context (256K vs 200K).
### Which LLM has the largest context window?
Gemini 3 Pro and Flash have the largest at 2 million tokens. GPT-5 offers 256K, Claude models offer 200K. For very long documents, Gemini is the best choice.