A Complete Guide to AI Training ASICs

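AI training ASICs (Application-Specific Integrated Circuits) are custom chips optimized for AI training workloads, as opposed to general-purpose GPUs. They trade flexibility for better energy efficiency and a lower cost per unit of compute.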

Comparison of Mainstream AI Training ASICs

Model	Vendor	Process	Peak compute (BF16)	Memory (HBM)	Interconnect	Availability
TPU 8t (Trillium 2, training)	Google	3nm	~3,500 TFLOPS	216GB HBM	3D torus + Axion CPU	Google Cloud
TPU 8i (Trillium 2, inference)	Google	3nm	~5,500 TFLOPS	288GB HBM	3D torus	Google Cloud
TPU v7 (Ironwood)	Google	5nm	2,307 TFLOPS	192GB HBM	3D torus, 9,216-chip pod	Google Cloud
TPU v6e (Trillium)	Google	5nm	918 TFLOPS	32GB HBM	2D torus, 256-chip pod	Google Cloud
TPU v5p	Google	5nm	459 TFLOPS	95GB HBM	3D torus, 8,960-chip pod	Google Cloud
AWS Trainium 3 (Trn3)	Amazon	3nm	1,300 TFLOPS	144GB HBM	NeuronLink-v4, 144-chip UltraServer	AWS Cloud (GA 2025-12)
AWS Trainium 2	Amazon	4nm	667 TFLOPS	96GB HBM	NeuronLink, 64-chip UltraServer	AWS Cloud
AWS Trainium 1	Amazon	7nm	191 TFLOPS	32GB HBM	NeuronLink, 16-chip cluster	AWS Cloud
Intel Gaudi 3	Intel	5nm	1,835 TFLOPS	128GB HBM2e	24× 200GbE	Commercially available
Intel Gaudi 2	Intel	7nm	432 TFLOPS	96GB HBM2e	24× 100GbE	Commercially available
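
The per-chip figures and pod sizes in the table can be combined into a rough upper bound on pod-level compute. The sketch below simply multiplies peak BF16 TFLOPS per chip by the number of chips per pod or UltraServer. The names SYSTEMS, aggregate_pflops, and pod_table are illustrative, and the results are theoretical peaks only: they ignore utilization, interconnect overhead, and software efficiency. Rows with approximate (~) figures or no stated pod size are left out.

```python
# Peak BF16 throughput per chip (TFLOPS) and chips per pod / UltraServer,
# copied from the comparison table. These are peak numbers, not measured
# training throughput.
SYSTEMS = {
    "TPU v7 (Ironwood)": (2307, 9216),
    "TPU v6e (Trillium)": (918, 256),
    "TPU v5p": (459, 8960),
    "Trainium 3": (1300, 144),
    "Trainium 2": (667, 64),
    "Trainium 1": (191, 16),
}


def aggregate_pflops(tflops_per_chip: float, chips: int) -> float:
    """Theoretical peak of a full pod in PFLOPS (1 PFLOPS = 1,000 TFLOPS)."""
    return tflops_per_chip * chips / 1000


def pod_table() -> dict:
    """Map each system name to its theoretical pod-level peak in PFLOPS."""
    return {name: aggregate_pflops(t, n) for name, (t, n) in SYSTEMS.items()}


if __name__ == "__main__":
    # Print systems from largest to smallest pod-level peak.
    for name, pflops in sorted(pod_table().items(), key=lambda kv: -kv[1]):
        print(f"{name:20s} {pflops:>12,.1f} PFLOPS")
```

Comparing pods rather than single chips matters because large training jobs are sized against the whole interconnect domain, not against one accelerator.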

Google TPU Generations

Generation	Codename	Peak compute (BF16)	HBM	Interconnect	Primary use
v4	-	275 TFLOPS	32GB	3D torus	Training
v5p	-	459 TFLOPS	95GB	3D torus	Training
v5e	-	197 TFLOPS	16GB	2D torus	Inference
v6e	Trillium	918 TFLOPS	32GB	2D torus	Training / inference
v7	Ironwood	2,307 TFLOPS	192GB	3D torus	Inference-first
8t	Trillium 2 (training)	~3,500 TFLOPS	216GB	3D torus + Axion CPU	Training only
8i	Trillium 2 (inference)	~5,500 TFLOPS	288GB	3D torus	Inference only
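
To make the generation-over-generation change easier to read, the following sketch expresses each TPU generation's peak BF16 compute and HBM capacity as a multiple of a baseline generation (v4 by default). TPU_GENERATIONS and relative_to are illustrative names. The 8t and 8i rows are omitted because their compute figures are given only approximately.

```python
# (peak BF16 TFLOPS, HBM capacity in GB) per chip, from the generation table.
TPU_GENERATIONS = {
    "v4": (275, 32),
    "v5p": (459, 95),
    "v5e": (197, 16),
    "v6e": (918, 32),
    "v7": (2307, 192),
}


def relative_to(baseline: str = "v4") -> dict:
    """Return (compute ratio, HBM ratio) of each generation vs. the baseline."""
    base_tflops, base_hbm = TPU_GENERATIONS[baseline]
    return {
        gen: (round(tflops / base_tflops, 2), round(hbm / base_hbm, 2))
        for gen, (tflops, hbm) in TPU_GENERATIONS.items()
    }


if __name__ == "__main__":
    for gen, (compute_x, hbm_x) in relative_to("v4").items():
        print(f"{gen:4s} compute x{compute_x:<5} HBM x{hbm_x}")
```

Read this way, v6e raises compute over v4 while keeping HBM at 32GB, whereas v7 grows both compute and memory capacity.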

Google Cloud: TPU v5p / v6e / v7 Ironwood, plus the TPU 8t (training) and 8i (inference) split (2026-04)
AWS: Trainium 3 (GA 2025-12, 3nm) / Trainium 2
On-premises / private cloud: Intel Gaudi 3 (open, standards-based Ethernet)