Key Highlights
- Google introduced Flex and Priority service tiers for the Gemini API
- The Flex tier cuts costs by 50% for non-urgent background processing
- The Priority tier costs 75–100% more and offers higher reliability for time-critical applications
- Batch API continues offering 50% savings with latency up to 24 hours
- Caching tier uses token-based pricing tied to storage time
On April 2, 2026, Google announced an update to its Gemini API pricing structure that adds two new service tiers, Flex and Priority, bringing the total to five: Standard, Flex, Priority, Batch, and Caching. The change gives developers more room to balance performance requirements, budget constraints, and urgency for each workload.
The new Flex tier targets background operations where immediate responses aren’t essential. It runs on off-peak compute capacity and costs 50% less than Standard pricing. Response times typically range from 1 to 15 minutes, although Google does not guarantee them. Suggested uses include CRM data synchronization, computational research tasks, and autonomous agent workflows.
What sets Flex apart from Google’s existing Batch API is that it uses synchronous endpoints. Developers get the same 50% discount without managing input and output files or polling for job completion.
The Priority tier, by contrast, addresses mission-critical, real-time requirements. Priced 75% to 100% above Standard rates, it is designed for the highest reliability, with response times in the range of milliseconds to seconds.
Google positions Priority as ideal for interactive customer service applications, real-time fraud prevention systems, and automated content filtering workflows. When Priority tier usage surpasses allocated limits, excess requests automatically route to the Standard tier instead of failing completely.
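The overflow behavior can be pictured with a toy model. This is a minimal sketch, not Google's implementation: the `PriorityQuota` class and its `limit` value are hypothetical stand-ins for whatever allocation an account actually has. It only shows the rule the article describes, where requests beyond the allocation drop to Standard instead of failing.

```python
from dataclasses import dataclass


@dataclass
class PriorityQuota:
    """Toy model of a Priority allocation; the limit value is hypothetical."""
    limit: int
    used: int = 0

    def route(self) -> str:
        """Return the tier that would serve the next request.

        Requests within the allocation go to Priority; anything beyond it
        is downgraded to Standard rather than rejected.
        """
        if self.used < self.limit:
            self.used += 1
            return "priority"
        return "standard"


quota = PriorityQuota(limit=3)
served = [quota.route() for _ in range(5)]
print(served)  # ['priority', 'priority', 'priority', 'standard', 'standard']
```

The point of the model is that an application never sees a hard failure from exceeding the Priority allocation; it sees slower, Standard-tier service instead.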
Complete Tier Overview
The existing Batch API still costs 50% less than Standard pricing and allows latencies of up to 24 hours. It suits large offline processing jobs where timing isn’t critical.
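The multipliers quoted for Standard, Flex, Batch, and Priority can be turned into a quick cost comparison. This sketch assumes a hypothetical workload that would cost 200 units at Standard rates; because Priority is quoted as a range, the function returns a (low, high) pair. Caching is left out because its price depends on token volume and retention time rather than a simple multiplier.

```python
# Multipliers relative to Standard pricing, as described in the article.
# Priority is a range (75% to 100% above Standard), so each entry is (low, high).
TIER_MULTIPLIERS = {
    "standard": (1.0, 1.0),
    "flex": (0.5, 0.5),
    "batch": (0.5, 0.5),
    "priority": (1.75, 2.0),
}


def estimate_cost(standard_cost: float, tier: str) -> tuple[float, float]:
    """Return the (low, high) cost of a workload billed at `tier`."""
    low, high = TIER_MULTIPLIERS[tier]
    return standard_cost * low, standard_cost * high


for tier in TIER_MULTIPLIERS:
    print(tier, estimate_cost(200.0, tier))
```

With a 200-unit Standard baseline, Flex and Batch both come to 100, while Priority lands between 350 and 400.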
The Caching tier prices content by the number of cached tokens and how long that content is retained. Google cites conversational agents with extensive system prompts, repeated analysis of large multimedia files, and searches across large document collections as good fits.
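Because storage cost depends on both token volume and retention time, it grows linearly in each. The sketch below models only that storage component; `rate_per_million_token_hours` is a placeholder, not a published Gemini rate, and the real per-model figures should come from Google’s pricing page.

```python
def caching_storage_cost(cached_tokens: int, retention_hours: float,
                         rate_per_million_token_hours: float) -> float:
    """Storage cost that scales linearly with cached tokens and retention time.

    The rate is a placeholder; real per-model rates are on Google's pricing page.
    """
    if cached_tokens < 0 or retention_hours < 0:
        raise ValueError("token count and retention time must be non-negative")
    # Token-hours, scaled to millions of tokens, times the per-unit rate.
    return cached_tokens * retention_hours * rate_per_million_token_hours / 1_000_000


# A 200,000-token system prompt kept for 6 hours at a hypothetical rate of 1.0:
print(caching_storage_cost(200_000, 6, 1.0))  # 1.2
```

Doubling either the prompt size or the retention window doubles the storage charge, which is why caching pays off mainly when the same large context is reused many times within the retention period.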
Flex and Priority are both selected through the same service_tier parameter in API calls. Developers can switch tiers with a simple configuration change, and API responses confirm which tier processed each request.
Flex is available to all paid-tier users on GenerateContent and Interactions API calls. Priority is limited to Tier 2 and Tier 3 paid accounts on the same endpoints.
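A minimal sketch of what switching tiers looks like at the request level. Several details here are assumptions rather than confirmed API behavior: that `service_tier` sits at the top level of a generateContent-style JSON body, that its values are lowercase tier names, and that the response reports the tier under the same key. The helpers `build_request`, `served_tier`, and `was_downgraded` are hypothetical, and no network call is made.

```python
import json
from typing import Optional

# Assumptions (not confirmed by the announcement): the tier is sent as a
# top-level "service_tier" field in the request body, the accepted values are
# lowercase tier names, and the response echoes the tier under the same key.
ALLOWED_TIERS = {"standard", "flex", "priority"}


def build_request(prompt: str, service_tier: str = "standard") -> dict:
    """Build a generateContent-style JSON body; only the tier differs per workload."""
    if service_tier not in ALLOWED_TIERS:
        raise ValueError(f"unknown service tier: {service_tier!r}")
    return {
        "contents": [{"parts": [{"text": prompt}]}],
        "service_tier": service_tier,
    }


def served_tier(response: dict, field: str = "service_tier") -> Optional[str]:
    """Return the tier the server reports having used, or None if absent."""
    return response.get(field)


def was_downgraded(requested: str, response: dict) -> bool:
    """True when a Priority request was served by Standard (overflow case)."""
    return requested == "priority" and served_tier(response) == "standard"


body = build_request("Summarize this CRM export.", service_tier="flex")
print(json.dumps(body))
```

Checking the reported tier on each response is what lets an application notice Priority overflow and, for example, log it or adjust its own traffic.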
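The availability rules and latency figures can be combined into a simple per-request tier chooser. This is a heuristic sketch, not guidance from Google: the 15-minute threshold comes from Flex’s typical (not guaranteed) latency, and the integer `account_tier` encoding (0 for unpaid, 1 to 3 for paid tiers) is hypothetical. Batch is deliberately excluded because it uses a separate asynchronous job interface rather than the shared service_tier parameter.

```python
from datetime import timedelta

# Heuristic threshold from the latencies quoted above; Flex's 1-15 minute
# range is typical, not guaranteed.
FLEX_TYPICAL_MAX = timedelta(minutes=15)


def choose_tier(deadline: timedelta, account_tier: int, time_critical: bool = False) -> str:
    """Pick a service_tier value for one synchronous request.

    account_tier uses a hypothetical encoding: 0 = unpaid, 1-3 = paid tiers.
    Priority requires paid Tier 2 or 3; Flex requires any paid tier.
    """
    if time_critical:
        return "priority" if account_tier in (2, 3) else "standard"
    if account_tier >= 1 and deadline >= FLEX_TYPICAL_MAX:
        return "flex"
    return "standard"


print(choose_tier(timedelta(hours=2), account_tier=1))                        # flex
print(choose_tier(timedelta(seconds=2), account_tier=3, time_critical=True))  # priority
print(choose_tier(timedelta(seconds=2), account_tier=1, time_critical=True))  # standard
```

Falling back to Standard whenever a cheaper or faster tier is unavailable keeps every request on the same synchronous code path, which is the practical benefit of a single parameter.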
Developer Benefits
The shared interface is the most significant change in this update. Previously, supporting both background and interactive operations required developers to maintain separate synchronous and asynchronous systems. The new structure handles both workload types through the same synchronous endpoints.
Google framed the change as part of its broader strategy for AI agents, which often need to run low-priority background tasks alongside time-sensitive interactive ones.
Gemini API product manager Lucia Loher and engineering lead Hussein Hassan Harrirou announced these changes on April 2, 2026.