
NVIDIA Rubin R200

NVIDIA Rubin R200 is NVIDIA's next-generation AI GPU, announced at GTC 2026 as the successor to the Blackwell architecture. It is built on a TSMC 3nm process with 336 billion transistors, carries 288GB of HBM4 memory, and delivers 50 PFLOPS of FP4 inference compute.

Key Specifications

| Specification | Value |
| --- | --- |
| GPU Architecture | Rubin (multi-chip module, MCM, design) |
| Process Node | TSMC 3nm-class custom process |
| Transistor Count | 336 billion |
| FP4 Inference Compute | 50 PFLOPS |
| FP8 Training Compute | 35 PFLOPS |
| FP16 Compute | ~25 PFLOPS (estimated) |
| FP32 Compute | 130 TFLOPS |
| FP64 Compute | 200 TFLOPS |
| INT8 Compute | ~100 PFLOPS (estimated) |
| Memory Capacity | 288 GB |
| Memory Type | HBM4 |
| Memory Bandwidth | 22 TB/s |
| Interconnect | NVLink 6 (3.6 TB/s per GPU, bidirectional) |
| TDP | 1,800-2,300 W (liquid cooling required) |
| Release Date | March 17, 2026 |
| Mass Production | Second half of 2026 |
| Pricing | $3.5-4M per NVL72 rack |
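One way to read the compute and memory-bandwidth rows together is a simple roofline estimate: a kernel is limited by memory traffic until its arithmetic intensity (FLOPs per byte moved) reaches the ratio of peak compute to bandwidth. The sketch below uses only the 50 PFLOPS FP4 and 22 TB/s figures listed above. The function names are hypothetical, and the result is an idealized upper bound, since real kernels rarely reach either peak.

```python
# Figures from the specification table; everything else is illustrative.
FP4_PFLOPS = 50.0    # peak FP4 inference compute, in PFLOPS
MEM_BW_TBPS = 22.0   # HBM4 memory bandwidth, in TB/s


def ridge_point(peak_pflops: float, bandwidth_tbps: float) -> float:
    """Arithmetic intensity (FLOPs per byte) where compute and memory limits meet."""
    # 1 PFLOPS = 1000 TFLOPS, and TFLOPS divided by TB/s gives FLOPs per byte.
    return peak_pflops * 1000.0 / bandwidth_tbps


def attainable_pflops(intensity: float,
                      peak_pflops: float = FP4_PFLOPS,
                      bandwidth_tbps: float = MEM_BW_TBPS) -> float:
    """Roofline bound: the lower of peak compute and intensity times bandwidth."""
    memory_bound_pflops = intensity * bandwidth_tbps / 1000.0
    return min(peak_pflops, memory_bound_pflops)


if __name__ == "__main__":
    print(f"ridge point: {ridge_point(FP4_PFLOPS, MEM_BW_TBPS):.0f} FLOPs/byte")
    for intensity in (10.0, 100.0, 1000.0, 10000.0):
        print(f"{intensity:>8.0f} FLOPs/byte -> {attainable_pflops(intensity):.2f} PFLOPS")
```

Under these assumptions, a workload needs on the order of a couple of thousand FP4 operations per byte read from HBM before the quoted 50 PFLOPS becomes the limiting factor.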

Architecture & Specifications

Rubin R200 uses a multi-chip module (MCM) design. Each package comprises:

  • 2 compute dies (GPU dies)
  • 2 I/O dies (handling HBM controllers and NVLink physical layer)
  • 8 HBM4 memory stacks
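As a quick sanity check on the package layout, the sketch below divides the quoted 288 GB and 22 TB/s across the 8 HBM4 stacks. It assumes capacity and bandwidth are spread evenly over the stacks, which the source does not state, and the helper name is hypothetical.

```python
# Package-level figures quoted for Rubin R200.
HBM_STACKS = 8
TOTAL_CAPACITY_GB = 288.0
TOTAL_BANDWIDTH_TBPS = 22.0


def per_stack(total: float, stacks: int = HBM_STACKS) -> float:
    """Split a package-level figure evenly across the HBM stacks (assumption)."""
    return total / stacks


if __name__ == "__main__":
    print(f"capacity per stack:  {per_stack(TOTAL_CAPACITY_GB):.0f} GB")
    print(f"bandwidth per stack: {per_stack(TOTAL_BANDWIDTH_TBPS):.2f} TB/s")
```

With an even split, each stack would hold 36 GB and supply 2.75 TB/s.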

Key Technical Innovations

  1. Third-Generation Transformer Engine

    • Supports hardware-level adaptive precision compression
    • Switches precision dynamically across FP4, FP6, FP8, FP16, BF16, TF32, FP32, and FP64 without rewriting model code

  2. NVLink 6 Interconnect

    • Per-GPU NVLink bandwidth of 3.6 TB/s (bidirectional)
    • Aggregate NVLink bandwidth of about 260 TB/s per NVL72 rack (72 × 3.6 TB/s)

  3. HBM4 Memory

    • 288GB capacity (1.5× increase over Blackwell B200's 192GB HBM3e)
    • 22 TB/s bandwidth (2.75× increase over Blackwell B200's 8 TB/s)
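To illustrate why lower-precision formats matter alongside the 288GB capacity, the sketch below estimates the weight footprint of a model at several of the listed precisions. It is a rough calculation under stated assumptions: 1 GB is taken as 10^9 bytes, only weights are counted (no KV cache, activations, or per-block scaling factors), and TF32 is left out because its storage size differs from its nominal bit width. The names are hypothetical.

```python
# Storage bits per value for formats with an unambiguous width.
BITS_PER_VALUE = {
    "FP4": 4, "FP6": 6, "FP8": 8,
    "FP16": 16, "BF16": 16, "FP32": 32, "FP64": 64,
}
CAPACITY_BYTES = 288 * 10**9  # 288 GB of HBM4, assuming decimal gigabytes


def weight_bytes(num_params: int, fmt: str) -> float:
    """Bytes needed to store num_params weights in the given format."""
    return num_params * BITS_PER_VALUE[fmt] / 8


def max_params(fmt: str, capacity_bytes: int = CAPACITY_BYTES) -> int:
    """Largest weight count that fits in capacity_bytes (weights only)."""
    return capacity_bytes * 8 // BITS_PER_VALUE[fmt]


if __name__ == "__main__":
    for fmt in ("FP4", "FP8", "FP16"):
        print(f"{fmt}: up to {max_params(fmt) / 1e9:.0f}B parameters per GPU")
    trillion = 10**12
    print(f"1T parameters at FP4: {weight_bytes(trillion, 'FP4') / 1e9:.0f} GB")
```

On these assumptions, a trillion-parameter model needs about 500 GB for weights alone even at FP4, so it would still have to be sharded across several GPUs.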

Performance Comparison

| Comparison | Blackwell B200 | Rubin R200 | Improvement |
| --- | --- | --- | --- |
| Transistor Count | 208 billion | 336 billion | 1.6× |
| Memory Capacity | 192GB HBM3e | 288GB HBM4 | 1.5× |
| Memory Bandwidth | 8 TB/s | 22 TB/s | 2.75× |
| NVLink Bandwidth | 1.8 TB/s | 3.6 TB/s | 2× |
| FP4 Inference | ~10 PFLOPS | 50 PFLOPS | ~5× |
| TDP | 1,000W | 1,800-2,300W | 1.8-2.3× |
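The improvement factors can be recomputed from the raw figures in the comparison, and the same numbers give a rough FP4 compute-per-kilowatt estimate. This is a sketch with hypothetical names. It treats the approximate ~10 PFLOPS Blackwell figure as exactly 10 and uses TDP as a stand-in for actual power draw, so the efficiency values are indicative only.

```python
# Raw figures from the comparison table (B200 FP4 is quoted as approximate).
B200 = {"transistors_b": 208, "memory_gb": 192, "mem_bw_tbps": 8.0,
        "nvlink_tbps": 1.8, "fp4_pflops": 10.0}
R200 = {"transistors_b": 336, "memory_gb": 288, "mem_bw_tbps": 22.0,
        "nvlink_tbps": 3.6, "fp4_pflops": 50.0}
B200_TDP_W = 1000
R200_TDP_W = (1800, 2300)  # quoted TDP range


def improvement(metric: str) -> float:
    """Ratio of the R200 figure to the B200 figure for one metric."""
    return R200[metric] / B200[metric]


def pflops_per_kw(fp4_pflops: float, tdp_w: float) -> float:
    """Peak FP4 PFLOPS per kilowatt of TDP."""
    return fp4_pflops / (tdp_w / 1000)


if __name__ == "__main__":
    for metric in R200:
        print(f"{metric}: {improvement(metric):.2f}x")
    print(f"B200: {pflops_per_kw(B200['fp4_pflops'], B200_TDP_W):.1f} PFLOPS/kW")
    for tdp in R200_TDP_W:
        print(f"R200 at {tdp} W: {pflops_per_kw(R200['fp4_pflops'], tdp):.1f} PFLOPS/kW")
```

On these figures, NVLink bandwidth doubles, FP4 compute rises about fivefold, and peak FP4 compute per kilowatt improves by roughly 2.2× to 2.8× depending on where in the TDP range the part operates.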

Platform Configurations

Vera Rubin NVL72

  • 72× Rubin R200 GPUs
  • 36× Vera CPUs
  • Total Compute: ~3.6 EFLOPS FP4
  • Total Memory: 20.7 TB HBM4
  • Total Memory Bandwidth: 1.58 PB/s
  • TDP: ~180kW (full liquid cooling required)
  • Pricing: $3.5-4M per rack

Vera Rubin NVL144 (Planned)

  • 144× Rubin R200 GPUs
  • 72× Vera CPUs
  • Total Compute: ~7.2 EFLOPS FP4
  • LLM Inference Cost: Reduced to about 1/10 that of the Blackwell platform
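The rack-level totals follow from multiplying the per-GPU figures by the GPU count. The sketch below reproduces the NVL72 numbers and extrapolates the NVL144 memory and bandwidth totals, which the source does not list. It assumes perfectly linear scaling and decimal units, and the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GpuSpec:
    """Per-GPU figures quoted for Rubin R200."""
    fp4_pflops: float = 50.0
    memory_gb: float = 288.0
    mem_bw_tbps: float = 22.0
    nvlink_tbps: float = 3.6


def rack_totals(num_gpus: int, gpu: GpuSpec = GpuSpec()) -> dict[str, float]:
    """Aggregate per-GPU figures over a rack, assuming linear scaling."""
    return {
        "fp4_eflops": num_gpus * gpu.fp4_pflops / 1000,    # PFLOPS -> EFLOPS
        "memory_tb": num_gpus * gpu.memory_gb / 1000,      # GB -> TB
        "mem_bw_pbps": num_gpus * gpu.mem_bw_tbps / 1000,  # TB/s -> PB/s
        "nvlink_tbps": num_gpus * gpu.nvlink_tbps,
    }


if __name__ == "__main__":
    for name, count in (("NVL72", 72), ("NVL144", 144)):
        totals = rack_totals(count)
        summary = ", ".join(f"{key}={value:.2f}" for key, value in totals.items())
        print(f"{name}: {summary}")
```

For 72 GPUs this gives 3.6 EFLOPS FP4, about 20.7 TB of HBM4, about 1.58 PB/s of memory bandwidth, and about 259 TB/s of NVLink bandwidth, matching the quoted NVL72 figures. The 144-GPU values are extrapolations, not published specifications.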

Mass Production & Delivery

  • Mass Production: Second half of 2026
  • First Customers: AWS, Azure, Google Cloud, Oracle Cloud
  • On-Premises Users: Q1 2027 availability
  • Partner Products: Market delivery in second half of 2026

Application Scenarios

Rubin R200 targets data center and supercomputing deployments and is suited to:

  • Trillion-parameter LLM training
  • High-performance AI inference (low latency, high throughput)
  • Reinforcement learning (RL) and agentic AI
  • Scientific computing and AI factories
