Key Highlights
- Google has introduced its eighth-generation tensor processing units: the TPU 8t, designed for model training, and the TPU 8i, optimized for inference workloads
- The inference-focused TPU 8i delivers 80% better cost-performance than the preceding Ironwood generation
- Broadcom partnered with Google to co-engineer both processors, extending their decade-long collaboration
- The TPU 8t training processor supports configurations up to 9,600 chips with double the inter-chip communication bandwidth of Ironwood
- Google Cloud will make both new processors accessible to customers in the coming months
Google revealed a pair of specialized AI processors on Wednesday, representing the first time the company has divided its tensor processing unit architecture into distinct chips for training and inference operations.
The TPU 8t addresses the computational demands of training artificial intelligence models, while its counterpart, the TPU 8i, focuses exclusively on inference—deploying trained models in live environments. Broadcom collaborated with Google on both chips, building on a technological partnership spanning more than ten years.
This release represents a strategic departure from Google’s previous approach. Earlier TPU iterations combined training and inference capabilities within a single processor design. According to Google, the emergence of agentic AI systems—which operate in autonomous cycles with minimal human oversight—necessitates hardware tailored to specific functions.
“With the rise of AI agents, we determined the community would benefit from chips individually specialized to the needs of training and serving,” said Amin Vahdat, Google’s SVP and chief technologist for AI and infrastructure.
The inference-oriented TPU 8i packs 384 megabytes of SRAM per processor, a threefold increase over Ironwood’s capacity. Google claims this added on-chip memory eliminates what the company describes as the “waiting room” effect: the latency spikes that occur when many users query a model simultaneously.
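To make the “waiting room” idea concrete, the toy simulation below models a single fixed-rate server under random request arrivals and reports 99th-percentile queueing delay. It is a minimal sketch that assumes nothing about actual TPU internals; the service times and arrival rate are hypothetical. The point is simply that tail latency balloons as load approaches saturation, and that faster per-request service, the kind more on-chip memory can enable by avoiding off-chip memory trips, collapses the queue.

```python
import random

random.seed(0)

def p99_wait_ms(service_ms: float, arrivals_per_ms: float, n: int = 50_000) -> float:
    """p99 queueing delay for Poisson arrivals at a single fixed-rate server."""
    clock = 0.0    # current simulated time
    free_at = 0.0  # when the server next becomes idle
    waits = []
    for _ in range(n):
        clock += random.expovariate(arrivals_per_ms)  # next request arrives
        start = max(clock, free_at)                   # queue if the server is busy
        waits.append(start - clock)                   # time spent waiting in line
        free_at = start + service_ms
    waits.sort()
    return waits[int(0.99 * n)]

# Same arrival rate in both runs; only per-request service time changes.
for service in (0.9, 0.5):  # ms per request (hypothetical values)
    print(f"service={service} ms -> p99 wait ~ {p99_wait_ms(service, 1.0):.2f} ms")
```

At 90% utilization the p99 wait dwarfs the service time itself; at 50% it nearly vanishes, which is the general dynamic behind Google’s claim.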
Inference Chip Delivers Substantial Performance Gains
Compared to Ironwood, the TPU 8i delivers 80% better cost-performance. In operational terms, organizations can support nearly double the workload while spending the same amount.
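As a quick sanity check on that arithmetic, using hypothetical numbers rather than Google’s actual pricing: 80% better cost-performance means 1.8 times the throughput per dollar, so a fixed budget buys roughly 1.8 times, or nearly double, the workload.

```python
# Hypothetical normalization: Ironwood delivers 1.0 unit of work per dollar.
ironwood_perf_per_dollar = 1.0
tpu8i_perf_per_dollar = ironwood_perf_per_dollar * 1.8  # "80% better"

budget = 100_000  # arbitrary; the same spend on either platform
ratio = (budget * tpu8i_perf_per_dollar) / (budget * ironwood_perf_per_dollar)
print(ratio)  # 1.8 -> nearly double the workload at identical cost
```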
The chip also achieves up to twice the energy efficiency of its predecessor, using adaptive power management that scales consumption with real-time demand.
For the first time, both processors leverage Google’s Axion CPU architecture as the host platform, enabling optimizations at the system level rather than solely at the chip level.
Regarding training capabilities, the TPU 8t superpod configuration supports deployments of up to 9,600 processors with access to 2 petabytes of high-bandwidth memory. The architecture features double the inter-chip communication bandwidth found in Ironwood, enabling Google to compress frontier model development cycles from several months down to a matter of weeks.
The training chip delivers 2.8 times the computational performance of the seventh-generation Ironwood at an identical price point.
Early Adopters and Market Validation
The technology is gaining traction among enterprise and research organizations. Citadel Securities developed quantitative analysis tools using Google’s TPU infrastructure. All 17 research facilities in the U.S. Department of Energy’s national laboratory network operate AI research software on the platform. Anthropic has pledged to utilize multiple gigawatts of Google TPU computing capacity.
Analysts at DA Davidson published an estimate in September valuing the combined TPU business and Google DeepMind division at approximately $900 billion.
Google maintains an exclusive distribution model for TPUs: the processors are not sold as standalone hardware and are accessible only through Google Cloud. Nvidia continues to supply GPUs to Google, and Google confirmed it will be among the first cloud platforms to offer Nvidia’s forthcoming Vera Rubin architecture later this year.
Google DeepMind contributed to the chip design and has already used the new processors to train Gemini language models and to optimize algorithms powering Search and YouTube.
Google announced that both the TPU 8t training chip and TPU 8i inference chip will reach general availability for cloud customers before the end of this year.