AMD Instinct MI300A (APU)

Overview

AMD Instinct MI300A is a data-center APU for HPC and AI that packages GPU and CPU chiplets together with a unified memory architecture, conceptually similar to Apple's M-Series. Compared with the GPU-only MI300X, it swaps two of the eight GPU chiplets for three Zen 4 CPU chiplets (24 cores), and the CPU and GPU share a single 128GB HBM3 memory pool.

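Peak throughput is about 0.98 PFLOPS FP16 and 1.96 PFLOPS FP8 (dense), roughly double that with structured sparsity. El Capitan (Lawrence Livermore National Laboratory), which took the #1 spot on the November 2024 TOP500 list, uses 44,544 MI300A units. It is not the first exascale system; Frontier reached that milestone in 2022.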

Core Specifications

Item	Spec
Architecture	CDNA 3 + Zen 4 (APU)
Process	TSMC 5nm + 6nm Chiplet
GPU Stream Processors	14,592 (228 CUs)
CPU Cores	24 Zen 4 cores (3 CCDs × 8 cores)
Unified Memory	128 GB HBM3 (CPU+GPU shared)
Memory Bandwidth	5.3 TB/s
FP16 Compute	980.6 TFLOPS (dense) / 1,961.2 TFLOPS (sparse)
FP8 Compute	1,961.2 TFLOPS (dense) / 3,922.3 TFLOPS (sparse)
INT8	1,961.2 TOPS (dense) / 3,922.3 TOPS (sparse)
TDP	550 W (760 W peak)
Interface	PCIe Gen5 ×16 + Infinity Fabric
Interconnect	Infinity Fabric 4 (896 GB/s)
Launch	2023-12 (announced); El Capitan deployment during 2024
Price	$15,000-$20,000 (OEM)
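
Peak throughput follows from CU count × clock × operations per clock per CU. The sketch below works through that arithmetic. The 2.1 GHz peak engine clock and the per-CU rates are assumptions taken from AMD's public CDNA 3 material, not from the table above, and the results are theoretical peaks rather than sustained rates.

```python
# Peak-throughput sketch: CUs x clock x per-CU operations per clock.
# Assumptions (not in the spec table): 2.1 GHz peak engine clock and the
# per-CU dense rates below, as published by AMD for CDNA 3.
CUS = 228
LANES_PER_CU = 64            # stream processors per CU
PEAK_CLOCK_HZ = 2.1e9        # assumed peak engine clock

# Dense operations per clock per CU (assumed values).
OPS_PER_CLOCK_PER_CU = {
    "FP64 vector": 128,
    "FP64 matrix": 256,
    "FP16 matrix": 2048,
    "FP8 matrix": 4096,
    "INT8 matrix": 4096,
}


def peak_tera_ops(ops_per_clock_per_cu: int) -> float:
    """Return the theoretical peak in tera-operations per second."""
    return CUS * ops_per_clock_per_cu * PEAK_CLOCK_HZ / 1e12


stream_processors = CUS * LANES_PER_CU
peaks = {name: peak_tera_ops(rate) for name, rate in OPS_PER_CLOCK_PER_CU.items()}

print(f"Stream processors: {stream_processors}")
for name, value in peaks.items():
    print(f"{name}: {value:.1f} tera-ops/s (dense)")
```

Sparse figures are quoted at twice the dense rate, so they need no separate table entry here.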

APU Architecture Explained

Unified Memory Advantage

CPU and GPU share one 128GB HBM3 pool, so no explicit host-to-device copies are needed.
5.3 TB/s peak bandwidth from eight HBM3 stacks (HBM3, not HBM3e).
Ideal for HPC numerical simulation (CPU handles logic, GPU handles parallel computation).
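
To give a feel for what avoiding the copy is worth, here is a rough back-of-the-envelope sketch. It assumes a discrete GPU fed over PCIe Gen5 ×16 at about 64 GB/s per direction (a theoretical figure, not from this page) and a hypothetical 64 GB working set. Real transfers are slower, and real kernels rarely reach peak HBM bandwidth.

```python
# Rough cost of the host-to-device staging copy that a shared HBM3 pool avoids.
# Assumption: PCIe Gen5 x16 moves about 64 GB/s per direction (theoretical).
PCIE_GEN5_X16_GB_PER_S = 64.0   # assumed link rate, GB/s
HBM3_PEAK_GB_PER_S = 5300.0     # 5.3 TB/s peak, from the spec table


def transfer_seconds(size_gb: float, bandwidth_gb_per_s: float) -> float:
    """Time to move size_gb gigabytes at the given bandwidth."""
    return size_gb / bandwidth_gb_per_s


working_set_gb = 64.0           # hypothetical simulation state
copy_time = transfer_seconds(working_set_gb, PCIE_GEN5_X16_GB_PER_S)
stream_time = transfer_seconds(working_set_gb, HBM3_PEAK_GB_PER_S)

print(f"PCIe staging copy:          {copy_time:.2f} s")
print(f"One pass over data in HBM3: {stream_time * 1e3:.1f} ms")
```

Under these assumptions the staging copy alone costs about as much as 80 full passes over the data in HBM3, which is why codes that alternate between CPU and GPU phases benefit most.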

Chiplet Design

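6× 5nm XCD chiplets (GPU compute, 38 active CUs each)
3× 5nm CCD chiplets (8 Zen 4 cores each, 24 total)
4× 6nm IOD chiplets (memory controllers, Infinity Cache, Infinity Fabric)
8× HBM3 stacks (16GB each)
XCDs and CCDs are 3D-stacked on the IODs, which sit alongside the HBM3 stacks on a silicon interposer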
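
The headline totals can be cross-checked against the package composition. The tally below is a small sketch that assumes the layout AMD has publicly described for MI300A (six XCDs with 38 active CUs each, three 8-core CCDs, eight 16GB HBM3 stacks); the `Chiplet` class is a hypothetical helper for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chiplet:
    name: str
    count: int
    units_each: int   # CUs, CPU cores, or GB per die/stack
    unit: str


# Assumed package layout, based on AMD's public MI300A disclosures.
PACKAGE = [
    Chiplet("XCD", 6, 38, "active CUs"),   # 38 of 40 physical CUs enabled
    Chiplet("CCD", 3, 8, "Zen 4 cores"),
    Chiplet("HBM3 stack", 8, 16, "GB"),
]

totals = {c.name: c.count * c.units_each for c in PACKAGE}

for c in PACKAGE:
    print(f"{c.count} x {c.name}: {totals[c.name]} {c.unit}")
```

The totals line up with the spec table: 228 CUs, 24 CPU cores, and 128 GB of HBM3.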

Comparison with MI300X

Metric	MI300A	MI300X
CPU	24 Zen 4 cores	None
Memory	128GB HBM3	192GB HBM3
Bandwidth	5.3 TB/s	5.3 TB/s
FP16 (dense)	~0.98 PFLOPS	~1.31 PFLOPS
TDP	550W	750W
Use	HPC + AI	Pure AI

El Capitan Supercomputer

TOP500 #1 (November 2024 list)
1.742 ExaFLOPS FP64 on HPL (double precision)
44,544 MI300A units
Power consumption ~30 MW
HPC tasks: nuclear stockpile simulation, climate modeling, materials science
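
These figures imply a per-APU sustained rate and an overall energy efficiency. The sketch below simply divides the numbers listed above; the ~30 MW value is approximate, so the efficiency result is indicative only.

```python
# Per-APU share and energy efficiency implied by the El Capitan figures above.
HPL_FLOPS = 1.742e18     # FP64 HPL result
APU_COUNT = 44_544       # MI300A units
POWER_WATTS = 30e6       # ~30 MW, approximate

per_apu_tflops = HPL_FLOPS / APU_COUNT / 1e12
gflops_per_watt = HPL_FLOPS / POWER_WATTS / 1e9

print(f"Sustained HPL per APU: {per_apu_tflops:.1f} TFLOPS")
print(f"System efficiency:     {gflops_per_watt:.1f} GFLOPS/W")
```

The per-APU value is sustained HPL throughput for CPU and GPU together, so it sits well below the theoretical FP64 peak of a single device.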

Vendor Information

Item	Detail
Vendor	AMD
Product Page	https://www.amd.com/en/products/accelerators/instinct-mi300a.html
OEM Price	$15,000-$20,000
Target Market	HPC, exascale, AI training

Use Cases

✅ HPC + AI convergence (El Capitan-class supercomputers)
✅ Numerical simulation + ML hybrid (climate, materials, life sciences)
✅ Large model training when the working set fits in 128GB (MI300X offers 192GB for larger models)
✅ Graph neural networks with CPU-heavy sampling or preprocessing
❌ Pure LLM inference (use MI300X or H100)
❌ Edge deployment (550W TDP, data-center cooling required)

Related Cards

AMD MI300X - Pure GPU sibling
AMD MI350 - Next-gen CDNA 4
Apple M3 Ultra - Also a unified-memory CPU+GPU design
Cerebras WSE-3 - Supercomputing comparison
