NVIDIA Hopper GH100 Chiplet GPU to Feature 144 SMs, 9216 FP32 Cores, 4608 FP64 Cores, 128GB+ HBM3 Memory [Report]

NVIDIA plans to launch its next-gen Hopper data center GPUs later this year, with a possible announcement at GTC 2022 in late March. Based on a chiplet design, the GH100 will accelerate inferencing and other machine learning workloads, while also targeting record big-data and HPC performance. The full-fat GH100 die will reportedly pack a total of 144 SMs (Streaming Multiprocessors) across 8 GPCs (Graphics Processing Clusters) and 72 TPCs (Texture Processing Clusters).

| Data Center GPU | NVIDIA Tesla P100 | NVIDIA Tesla V100 | NVIDIA A100 | NVIDIA H100 |
| --- | --- | --- | --- | --- |
| GPU Codename | GP100 | GV100 | GA100 | GH100 |
| GPU Architecture | NVIDIA Pascal | NVIDIA Volta | NVIDIA Ampere | NVIDIA Hopper |
| FP32 Cores / SM | 64 | 64 | 64 | 64 |
| FP32 Cores / GPU | 3584 | 5120 | 6912 | 9216 |
| FP64 Cores / SM | 32 | 32 | 32 | 32 |
| FP64 Cores / GPU | 1792 | 2560 | 3456 | 4608 |
| INT32 Cores / SM | NA | 64 | 64 | 64 |
| INT32 Cores / GPU | NA | 5120 | 6912 | 9216 |
| Tensor Cores / SM | NA | 8 | 4 | ? |
| Tensor Cores / GPU | NA | 640 | 432 | ? |
| Texture Units | 224 | 320 | 432 | 576 |
| Memory Interface | 4096-bit HBM2 | 4096-bit HBM2 | 5120-bit HBM2 | 6144-bit HBM3? |
| Memory Size | 16 GB | 32 GB / 16 GB | 40 GB | 128 GB? |
| Memory Data Rate | 703 MHz DDR | 877.5 MHz DDR | 1215 MHz DDR | 1600 MHz DDR? |
| Memory Bandwidth | 720 GB/sec | 900 GB/sec | 1555 GB/sec | ? |
| L2 Cache Size | 4096 KB | 6144 KB | 40960 KB | 96000 KB? |
| TDP | 300 Watts | 300 Watts | 400 Watts | 500 Watts? |
| TSMC Manufacturing Process | 16 nm FinFET+ | 12 nm FFN | 7 nm N7 | 5 nm N5 |
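
The table leaves the H100's memory bandwidth blank, but the listed bandwidths for the older parts are consistent with the usual relation of bus width times data rate times two (for DDR), divided by eight bits per byte. The Python sketch below applies that relation to the table's figures. The function name `peak_bandwidth_gb_s` is ours, and the H100 inputs are the rumored values from the table, so the result is an estimate rather than a reported spec.

```python
# Theoretical peak bandwidth from the table's figures:
#   bus width (bits) / 8 bits per byte * memory clock (MHz) * 2 transfers per clock (DDR)
# The H100 entry uses the rumored, unconfirmed values.
SPECS = {
    "P100": (4096, 703.0),
    "V100": (4096, 877.5),
    "A100": (5120, 1215.0),
    "H100 (rumored)": (6144, 1600.0),
}


def peak_bandwidth_gb_s(bus_width_bits: int, clock_mhz: float) -> float:
    """Return theoretical peak bandwidth in GB/s (1 GB/s = 1000 MB/s)."""
    mega_transfers_per_sec = clock_mhz * 2       # DDR: two transfers per clock
    bytes_per_transfer = bus_width_bits / 8      # full bus width, in bytes
    mb_per_sec = bytes_per_transfer * mega_transfers_per_sec
    return mb_per_sec / 1000


for name, (width, clock) in SPECS.items():
    print(f"{name}: {peak_bandwidth_gb_s(width, clock):.1f} GB/s")
```

With the rumored 6144-bit interface at 1600 MHz DDR, this works out to roughly 2.46 TB/s, assuming the same relation holds for HBM3.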

The key upgrade over Ampere is one additional TPC per GPC, bringing the total to 72 (each GPC packing 9 TPCs). This increases the overall ALU count by about 33% over the A100’s 6,912 FP32 cores, bringing the FP32 figure to 9,216 and FP64 to 4,608. If Hopper does use a chiplet approach, then we’re probably looking at just one of the two dies making up the entire SKU.

Running the numbers for a two-die package, you get a total of 288 SMs, 144 TPCs, and 16 GPCs, resulting in overall core counts of 18,432 and 9,216 for the FP32 and FP64 ALUs, respectively. If NVIDIA wants to beat AMD’s recently announced Instinct MI200 family, figures of this size are likely necessary.
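
The arithmetic behind these counts is easy to check. The Python sketch below multiplies out the rumored hierarchy (8 GPCs per die, 9 TPCs per GPC, 2 SMs per TPC as implied by 72 TPCs and 144 SMs) and the per-SM core counts from the table. The names `DieConfig` and `package_totals` are illustrative, and the two-die case is the speculative chiplet scenario, not a confirmed configuration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DieConfig:
    gpcs: int
    tpcs_per_gpc: int
    sms_per_tpc: int = 2     # implied by 72 TPCs -> 144 SMs
    fp32_per_sm: int = 64    # from the spec table
    fp64_per_sm: int = 32    # from the spec table


def package_totals(die: DieConfig, dies: int = 1) -> dict:
    """Multiply out unit counts for a package made of `dies` identical dies."""
    gpcs = die.gpcs * dies
    tpcs = gpcs * die.tpcs_per_gpc
    sms = tpcs * die.sms_per_tpc
    return {
        "gpcs": gpcs,
        "tpcs": tpcs,
        "sms": sms,
        "fp32": sms * die.fp32_per_sm,
        "fp64": sms * die.fp64_per_sm,
    }


GH100_DIE = DieConfig(gpcs=8, tpcs_per_gpc=9)   # rumored figures
A100_FP32 = 6912                                # A100 column of the table

single = package_totals(GH100_DIE)
dual = package_totals(GH100_DIE, dies=2)        # hypothetical two-die SKU
uplift_pct = (single["fp32"] / A100_FP32 - 1) * 100

print("single die:", single)
print("two dies:  ", dual)
print(f"FP32 uplift over A100: {uplift_pct:.1f}%")
```

Changing `dies` or `tpcs_per_gpc` shows how sensitive the headline core counts are to each rumored parameter.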

The memory pool will also grow significantly with Hopper, increasing from “just” 40GB of HBM2 on the GA100 to at least 128GB of HBM3 on the GH100. HBM3 allows for 24GB stacks across a 1,024-bit bus, which means NVIDIA can use 16GB to 24GB stacks across six 1,024-bit memory controllers. The L2 cache is another mystery, but considering that the GA100 already had 40MB, it’s probable that we’ll see 96MB on the GH100, the same as the AD102.
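
To see what the stack options imply, the Python sketch below computes total bus width and capacity for six HBM3 stacks. It assumes one stack per 1,024-bit controller and identical stack sizes, neither of which the report confirms. The helper `memory_config` is a name made up for this illustration.

```python
BITS_PER_STACK = 1024   # one HBM3 stack per 1,024-bit controller (assumption)
STACKS = 6              # six memory controllers, per the report


def memory_config(stacks: int, gb_per_stack: int) -> tuple:
    """Return (total bus width in bits, total capacity in GB)."""
    return stacks * BITS_PER_STACK, stacks * gb_per_stack


for gb_per_stack in (16, 24):
    width, capacity = memory_config(STACKS, gb_per_stack)
    print(f"{STACKS} x {gb_per_stack} GB stacks: {width}-bit bus, {capacity} GB")
```

Under these assumptions the bus width matches the rumored 6,144 bits, while capacity comes out at 96GB with 16GB stacks or 144GB with 24GB stacks. Only the 24GB case clears the 128GB figure.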

Areej Syed
