
NVIDIA Vera Rubin Platform Deep Dive: 6-Chip Package, 288GB HBM4, 50 PFLOPS FP4

AI Compute Cards Wiki Editorial
Industry Research Team

The NVIDIA Vera Rubin platform is NVIDIA's next-generation flagship computing platform after Blackwell. This article provides an in-depth analysis covering the naming origin, 6-chip packaging, memory subsystem, compute matrix, networking architecture, rack-scale solution, and software ecosystem.

Naming Origin: Honoring Astronomer Vera Rubin

NVIDIA chose "Vera Rubin" as the codename for the next-generation platform, honoring astronomer Vera Florence Cooper Rubin (1928-2016). Through her studies of galactic rotation curves in the 1960s and 1970s, she provided some of the most compelling early observational evidence for the existence of dark matter.

Naming the next-generation AI compute platform after her draws a deliberate parallel:

  • Dark matter is the "invisible" yet dominant mass component of the universe
  • AI compute is the "invisible" yet dominant underlying infrastructure of the digital economy

6-Chip Packaging (CoWoS-L)

The Vera Rubin platform adopts the industry's first 6-chip CoWoS-L packaging:

| Chip | Quantity | Role | Process |
| --- | --- | --- | --- |
| Vera CPU | 1 | Host CPU / Prefetch / Interconnect Controller | TSMC 3NP |
| Rubin GPU Die | 2 | Matrix Compute Cores | TSMC 3NP / 4NP |
| I/O / HBM Base Die | 3 | HBM4 PHY + I/O + Interconnect | TSMC 4NP |

Compared with the Blackwell B300 Ultra's 2-chip package (2 GPU dies only), Vera Rubin packages 3× as many chips and substantially more transistors (about 340 billion per GPU).

Core Specifications (Per GPU)

| Item | Specification |
| --- | --- |
| Architecture | Rubin |
| Transistor Count | ~340 billion (per GPU) |
| Memory | 288 GB HBM4 |
| Memory Bandwidth | 22 TB/s (2.75× HBM3e) |
| FP4 Tensor (sparse) | 50 PFLOPS |
| FP8 Tensor (sparse) | 25 PFLOPS |
| FP16/BF16 Tensor | 12.5 PFLOPS |
| TDP | ~1,800 W (liquid cooling required) |
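
As a rough way to read the bandwidth and FLOPS rows together, a roofline-style estimate gives the arithmetic intensity at which a kernel stops being memory-bound. The sketch below is illustrative only: the function names are hypothetical, it plugs in the 50 PFLOPS sparse FP4 and 22 TB/s peak figures from the table, and real kernels rarely reach either peak.

```python
def ridge_point(peak_pflops: float, mem_tb_per_s: float) -> float:
    """FLOPs per byte of memory traffic needed to become compute-bound."""
    return (peak_pflops * 1e15) / (mem_tb_per_s * 1e12)


def attainable_pflops(flops_per_byte: float, peak_pflops: float,
                      mem_tb_per_s: float) -> float:
    """Roofline model: throughput is capped by compute or by memory traffic."""
    memory_bound_pflops = flops_per_byte * mem_tb_per_s * 1e12 / 1e15
    return min(peak_pflops, memory_bound_pflops)


if __name__ == "__main__":
    # Table values: 50 PFLOPS FP4 (sparse), 22 TB/s HBM4.
    print(f"ridge point: {ridge_point(50, 22):.0f} FLOP/byte")
    print(f"at 100 FLOP/byte: {attainable_pflops(100, 50, 22):.1f} PFLOPS")
```

Under these figures a kernel needs on the order of two thousand FLOPs per byte fetched before the quoted peak becomes reachable, which is why low-intensity workloads such as small-batch decoding tend to track the bandwidth number rather than the FLOPS number.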

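Data Convention: Post-Blackwell NVIDIA products continue to quote sparse throughput as the official metric, which is 2× the dense figure: 50 PF FP4 sparse corresponds to 25 PF dense. For cross-vendor comparison, put both sides on the same basis: AMD MI400's 40 PF dense FP4 would be 80 PF on the sparse convention, against 50 PF sparse for NVIDIA Rubin R200.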
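
The convention can be written down as a pair of one-line conversions. This is a minimal sketch that assumes the 2:1 sparse-to-dense ratio implied by the figures above; the helper names are hypothetical.

```python
# Assumption: "sparse" figures are exactly 2x the dense figure,
# as implied by "50 PF sparse = 25 PF dense".
SPARSITY_FACTOR = 2.0


def sparse_to_dense(pflops_sparse: float) -> float:
    """Convert a sparse throughput figure to its dense equivalent."""
    return pflops_sparse / SPARSITY_FACTOR


def dense_to_sparse(pflops_dense: float) -> float:
    """Convert a dense throughput figure to the sparse convention."""
    return pflops_dense * SPARSITY_FACTOR


if __name__ == "__main__":
    print(sparse_to_dense(50.0))  # Rubin FP4 on a dense basis
    print(dense_to_sparse(40.0))  # MI400 FP4 on the sparse convention
```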

Vera CPU Deep Dive

The Vera CPU succeeds Grace as NVIDIA's Arm-based server CPU and is the first to use NVIDIA's own custom-designed Arm cores (Grace used Arm Neoverse cores):

| Item | Specification |
| --- | --- |
| Architecture | ARM v9.2 Olympus |
| Core Count | 88 cores (single package) |
| Process | TSMC 3NP |
| L2 Cache | 1 MB per core |
| L3 Cache | Shared 264 MB |
| Memory | 12-channel DDR5-8000 |
| Memory Bandwidth | 614 GB/s |
| CXL 2.0 | Yes (Type-3 memory expansion) |
| PCIe | PCIe Gen 6 (128 lanes) |
| TDP | 300-450 W |
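
The memory rows can be sanity-checked with the usual channels × width × transfer-rate formula. The sketch below assumes 64-bit (8-byte) channels, which the table does not state, and the function name is hypothetical. On that assumption, 12 channels of DDR5-8000 peak at 768 GB/s, so the quoted 614 GB/s is about 80% of the theoretical peak. It also equals the peak of 12 channels at DDR5-6400, and the table does not say which reading is intended.

```python
BYTES_PER_TRANSFER = 8  # assumption: 64-bit data path per channel


def ddr_peak_gb_per_s(channels: int, mega_transfers_per_s: int) -> float:
    """Theoretical peak bandwidth in GB/s (decimal units)."""
    return channels * mega_transfers_per_s * BYTES_PER_TRANSFER / 1000


if __name__ == "__main__":
    peak = ddr_peak_gb_per_s(12, 8000)
    print(f"DDR5-8000 x12 peak: {peak:.1f} GB/s")
    print(f"quoted 614 GB/s as a fraction of peak: {614 / peak:.2f}")
    print(f"DDR5-6400 x12 peak: {ddr_peak_gb_per_s(12, 6400):.1f} GB/s")
```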

Why ARM? NVIDIA announced a deal to acquire Arm in 2020, but the acquisition was abandoned in 2022. NVIDIA nonetheless retained a long-term Arm architecture license, which lets it build its own cores. Vera's "Olympus" cores are that in-house design on the Arm architecture.

Rubin GPU: 4x Blackwell Performance

The Rubin GPU targets up to 4× the performance of Blackwell B300 Ultra on favorable workloads (peak FP4 throughput alone rises about 2.5×) through:

  • 2× SMs (Streaming Multiprocessors)
  • 2.75× memory bandwidth (22 TB/s HBM4 vs 8 TB/s HBM3e)
  • FP4 Tensor Cores (2× FP8 throughput: 50 vs 25 PFLOPS sparse)
  • Transformer Engine 3 (claimed up to 9× DNN inference acceleration)
  • RAS Engine (Reliability, Availability, Serviceability)

Memory Subsystem: HBM4 288GB

The HBM4 standard, published by JEDEC in 2025, brings:

  • Per-stack capacity: 36 GB → 48 GB (33% increase)
  • Per-stack bandwidth: 1.2 TB/s → 1.6 TB/s (33% increase)
  • Stack height: 12-Hi → 16-Hi
  • Per-package capacity: 288 GB (6 stacks × 48 GB)
  • Per-package bandwidth: 22 TB/s (theoretical peak)
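
The per-stack and per-package bullets can be cross-checked with simple multiplication. The sketch below uses only the numbers listed above, with hypothetical constant names. The capacity figures agree (6 × 48 GB = 288 GB), but six stacks at 1.6 TB/s sum to 9.6 TB/s, so the quoted 22 TB/s package figure would imply roughly 3.7 TB/s per stack. The two bandwidth bullets are best read as coming from different sources or configurations.

```python
STACKS = 6
STACK_CAPACITY_GB = 48.0
STACK_BANDWIDTH_TB_S = 1.6
PACKAGE_BANDWIDTH_TB_S = 22.0  # quoted per-package peak


def package_capacity_gb() -> float:
    """Total HBM capacity across all stacks."""
    return STACKS * STACK_CAPACITY_GB


def summed_stack_bandwidth_tb_s() -> float:
    """Bandwidth obtained by summing the per-stack figure."""
    return STACKS * STACK_BANDWIDTH_TB_S


def implied_per_stack_bandwidth_tb_s() -> float:
    """Per-stack bandwidth needed to reach the quoted package figure."""
    return PACKAGE_BANDWIDTH_TB_S / STACKS


if __name__ == "__main__":
    print(f"capacity: {package_capacity_gb():.0f} GB")
    print(f"summed bandwidth: {summed_stack_bandwidth_tb_s():.1f} TB/s")
    print(f"implied per stack: {implied_per_stack_bandwidth_tb_s():.2f} TB/s")
```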

Compute Matrix: FP4 Everywhere

Vera Rubin builds on the native FP4 support introduced with Blackwell and makes FP4 the headline data type, with throughput halving at each step up in precision:

| Precision | Throughput (per GPU, sparse) | Use Cases |
| --- | --- | --- |
| FP4 | 50 PFLOPS | Inference, small-batch training |
| FP8 | 25 PFLOPS | LLM training, large model inference |
| FP16/BF16 | 12.5 PFLOPS | Traditional training, scientific computing |
| FP32 | 6.25 PFLOPS | HPC, traditional scientific computing |
| FP64 | 3.1 PFLOPS | Numerical simulation |
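
To make the FP4 row concrete, the sketch below shows what rounding to a 4-bit float looks like in software. It is an illustration rather than a description of the hardware: it assumes the E2M1 layout (1 sign bit, 2 exponent bits, 1 mantissa bit), which the article does not specify, and a single scale factor per block. All names are hypothetical.

```python
# Assumption: FP4 uses the E2M1 layout, whose non-negative
# representable magnitudes are the eight values below.
E2M1_VALUES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]


def quantize_fp4(values):
    """Scale a block so its largest magnitude maps to 6.0, then round
    every element to the nearest representable E2M1 value."""
    amax = max(abs(v) for v in values)
    scale = amax / 6.0 if amax > 0 else 1.0
    codes = []
    for v in values:
        target = abs(v) / scale
        magnitude = min(E2M1_VALUES, key=lambda q: abs(target - q))
        codes.append(magnitude if v >= 0 else -magnitude)
    return codes, scale


def dequantize_fp4(codes, scale):
    """Recover approximate real values from FP4 codes and the block scale."""
    return [c * scale for c in codes]


if __name__ == "__main__":
    codes, scale = quantize_fp4([0.1, -0.7, 2.4, 3.0])
    print(codes, scale)
    print(dequantize_fp4(codes, scale))
```

With only eight magnitudes available, the choice of block scale dominates accuracy, which is why FP4 is typically paired with fine-grained scaling and used mainly for inference.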

Networking: ConnectX-9 28.8 TB/s

The ConnectX-9 is NVIDIA's next-generation network adapter:

  • Bandwidth: 1.6 Tbps per port (2× ConnectX-8)
  • GPU ratio: 2× ConnectX-9 per GPU
  • Aggregate bandwidth: 28.8 TB/s per NVL72 rack (72 GPUs × 2 adapters × 1.6 Tbps)
  • Protocols: NVLink 6, PCIe Gen 6, RoCE v2, InfiniBand NDR
  • Encryption: AES-256-GCM hardware acceleration
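
The bandwidth bullets can be cross-checked with a short unit conversion. The sketch below is a back-of-the-envelope calculation, not a vendor formula: the constant and function names are hypothetical, and it assumes one 1.6 Tbps port per adapter, 8 bits per byte, and no protocol overhead. Under those assumptions each GPU gets 0.4 TB/s of adapter bandwidth, and the 28.8 TB/s figure is reproduced by summing over 72 GPUs.

```python
PORT_TBPS = 1.6        # terabits per second per ConnectX-9 port
ADAPTERS_PER_GPU = 2   # from the "GPU ratio" bullet
BITS_PER_BYTE = 8


def per_gpu_tb_per_s() -> float:
    """Adapter bandwidth available to one GPU, in terabytes per second."""
    return PORT_TBPS * ADAPTERS_PER_GPU / BITS_PER_BYTE


def aggregate_tb_per_s(gpu_count: int) -> float:
    """Total adapter bandwidth across gpu_count GPUs."""
    return per_gpu_tb_per_s() * gpu_count


if __name__ == "__main__":
    print(f"per GPU: {per_gpu_tb_per_s():.1f} TB/s")
    print(f"72 GPUs: {aggregate_tb_per_s(72):.1f} TB/s")
```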

Rack-Scale Solutions

Vera Rubin supports two rack-scale configurations:

1. Rubin NVL72 (1 Rack)

  • GPU count: 72 (NVL72 1×72 topology)
  • CPU count: 36 Vera
  • Total HBM4: 20.7 TB
  • Total bandwidth: 1.6 PB/s
  • Peak FP4: 3.6 EFLOPS sparse (72 × 50 PFLOPS, single rack)
  • Power: ~130 kW

2. Rubin NVL576 (8 Racks)

  • GPU count: 576 (NVL576 8×72)
  • CPU count: 288 Vera
  • Total HBM4: 165 TB
  • Total bandwidth: 12.7 PB/s
  • Peak FP4: 28.8 EFLOPS (full cluster)
  • Power: ~1.1 MW (single data hall)

At roughly 1.1 MW, a single NVL576 deployment draws enough power to call for a dedicated, substation-level supply in one data hall, which poses new challenges for data center design.
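
The rack-level totals follow from multiplying the per-GPU specifications by the GPU count. The sketch below assumes linear scaling with no interconnect or efficiency losses and uses decimal units (1 TB = 1000 GB); the class and function names are hypothetical. With 288 GB, 22 TB/s, and 50 PFLOPS sparse FP4 per GPU, 72 GPUs come to about 20.7 TB, 1.6 PB/s, and 3.6 EFLOPS, and 576 GPUs come to about 166 TB, 12.7 PB/s, and 28.8 EFLOPS.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GpuSpec:
    """Per-GPU figures taken from the specification table."""
    hbm_gb: float = 288.0
    mem_tb_per_s: float = 22.0
    fp4_sparse_pflops: float = 50.0


def rack_totals(n_gpus: int, gpu: GpuSpec = GpuSpec()) -> dict:
    """Aggregate per-GPU figures, assuming perfect linear scaling."""
    return {
        "hbm_tb": n_gpus * gpu.hbm_gb / 1000,
        "mem_pb_per_s": n_gpus * gpu.mem_tb_per_s / 1000,
        "fp4_eflops": n_gpus * gpu.fp4_sparse_pflops / 1000,
    }


if __name__ == "__main__":
    for name, count in (("NVL72", 72), ("NVL576", 576)):
        print(name, rack_totals(count))
```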

Software Ecosystem

The Vera Rubin platform's software stack:

  • CUDA 13.0: Full FP4 / FP8 / FP16 / BF16 / TF32 support
  • cuDNN 9.0: FP4 Tensor Core acceleration
  • TensorRT 11.0: LLM FP4 quantization deployment
  • Megatron-LM 0.12: Distributed training framework
  • TensorRT-LLM 2.0: LLM inference optimization
  • NeMo 2.0: LLM/Speech AI/Visual Agent framework

Competitive Analysis

| Vendor | Product | Peak FP4 dense | Memory | Memory BW | Process |
| --- | --- | --- | --- | --- | --- |
| NVIDIA | Vera Rubin R200 | 25 PF (50 PF sparse) | 288 GB HBM4 | 22 TB/s | 3NP |
| AMD | MI400 | 40 PF dense | 432 GB HBM4 | ~12 TB/s | 3N + 5N |
| Google | TPU Ironwood v7 | 2.3 PF dense | 192 GB HBM | 7.4 TB/s | 5N |
| AWS | Trainium 3 | 1.8 PF dense | 144 GB HBM | 2.7 TB/s | 3N |
| Huawei | Ascend 920 | 0.45 PF dense | 192 GB HBM2e | 4 TB/s | 6N SMIC |

Apple Silicon M5 Ultra (predicted for 2026 H2): by this comparison, Rubin R200's 25 PF FP8 (sparse) is about 6.25× the M5 Ultra's predicted FP8 throughput. Apple therefore remains roughly 6× behind in single-chip performance, but with 384 GB of unified memory (UMA) and 1 TB/s of bandwidth it is a strong option for local LLM inference.
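
Memory capacity matters for local and single-device inference because it bounds the model size that fits at a given precision. The sketch below is a rough upper bound with hypothetical names: it counts weights only and ignores the KV cache, activations, and quantization scale factors, so usable model sizes are smaller in practice.

```python
BITS_PER_PARAM = {"fp4": 4, "fp8": 8, "fp16": 16}


def max_params_billions(memory_gb: float, precision: str) -> float:
    """Largest weight-only model (billions of parameters) that fits in
    memory_gb gigabytes at the given precision."""
    bytes_per_param = BITS_PER_PARAM[precision] / 8
    # GB divided by bytes-per-parameter gives billions of parameters.
    return memory_gb / bytes_per_param


if __name__ == "__main__":
    for name, gb in (("Rubin R200", 288), ("M5 Ultra", 384), ("MI400", 432)):
        print(f"{name}: {max_params_billions(gb, 'fp4'):.0f}B params at FP4")
```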

Conclusion

Vera Rubin is the culmination of 5 architectural innovations:

  1. 6-chip CoWoS-L packaging (industry first)
  2. 288GB HBM4 (1.5× B200's 192 GB capacity)
  3. 22 TB/s memory bandwidth (2.75× HBM3e)
  4. 50 PFLOPS FP4 sparse (2.5× B300 Ultra)
  5. ConnectX-9 at 1.6 Tbps per port (2× ConnectX-8), 28.8 TB/s aggregate

For AI workloads, Vera Rubin is the next-generation foundation for trillion-parameter model training and Agentic AI inference.

💡 Disclaimer: All "predicted" and "rumored" data points are clearly marked and do not constitute investment advice.