NVIDIA Next-Gen Hopper GH100 Data Center GPU Unveiled: 4nm, 18432 Cores, 700W Power Draw, 4000 TFLOPs of Mixed Precision Compute

NVIDIA announced its much-anticipated Hopper data center GPU architecture today. Retaining the building blocks of the GA100 “Ampere” die, the GH100 significantly expands its low-precision compute capabilities. We’re looking at 4000 TOPS of INT4, 2000 TOPS of INT8, 1000 TFLOPs of BF16 and FP16, and a respectable 500 TFLOPs of TF32 performance, all when leveraging sparse matrices.

The non-matrix performance is less impressive, promising 60 TFLOPs of FP64, 60 TFLOPs of FP32, and 120 TFLOPs of FP16/BF16 compute. In comparison, AMD’s recently launched Instinct MI250X offers 96 TFLOPs of FP32 and 47 TFLOPs of FP64 compute performance.

Going by the specs on paper, NVIDIA has invested heavily in integer matrix multiplication, offering roughly 5x to 10x the MI250X’s throughput in these workloads (2000 TOPS of INT8 and 4000 TOPS of INT4 with sparsity vs 383 TOPS). AMD, on the other hand, has focused on traditional FP32 and FP64 performance.
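
As a quick check on that comparison, the ratios can be worked out directly from the quoted peak figures. This is only a minimal sketch: the dictionary and function names are illustrative, and the inputs are the paper specs cited above, not measured results.

```python
# Peak integer matrix throughput quoted in the article, in tera-ops per second.
# The H100 figures assume sparse matrices; names here are illustrative only.
H100_SPARSE_TOPS = {"INT4": 4000, "INT8": 2000}
MI250X_INT_TOPS = 383  # quoted MI250X integer matrix peak


def speedup(nvidia_tops: float, amd_tops: float) -> float:
    """Return how many times faster the first figure is than the second."""
    return nvidia_tops / amd_tops


ratios = {
    fmt: round(speedup(tops, MI250X_INT_TOPS), 1)
    for fmt, tops in H100_SPARSE_TOPS.items()
}
print(ratios)  # {'INT4': 10.4, 'INT8': 5.2}
```

So the "10x" figure holds for sparse INT4, while sparse INT8 comes out closer to 5x.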

Internally, the per-SM FP64 and INT32 core counts are unchanged, but FP32 has been bumped up to 128 per SM, just like consumer Ampere (GA10x) and Ada. There are four Tensor cores per SM for a total of 528 for the entire H100 GPU. For memory, we’re looking at a moderate 60MB of L2 cache and (up to) six 512-bit HBM2e memory stacks. The memory is said to be clocked at 1600MHz.
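
The per-SM and per-GPU figures quoted here imply the number of enabled SMs. The sketch below works that out, assuming every SM carries the same number of units; the helper name `sm_count` is made up for illustration.

```python
def sm_count(units_per_gpu: int, units_per_sm: int) -> int:
    """Derive the SM count from a per-GPU total and a per-SM count."""
    sms, remainder = divmod(units_per_gpu, units_per_sm)
    if remainder:
        raise ValueError("per-GPU total is not a multiple of the per-SM count")
    return sms


# H100 as announced: 528 Tensor cores at 4 per SM.
h100_sms = sm_count(528, 4)
# FP32 cores implied by 128 FP32 cores per SM.
h100_fp32 = h100_sms * 128
# The headline's 18432 cores at 128 per SM.
full_die_sms = sm_count(18432, 128)

print(h100_sms, h100_fp32, full_die_sms)  # 132 16896 144
```

The 132-SM result lines up with the H100's 16896 FP32 cores, while the 18432 cores in the headline correspond to 144 SMs at the same per-SM count.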

| Data Center GPU | NVIDIA Tesla P100 | NVIDIA Tesla V100 | NVIDIA A100 | NVIDIA H100 |
| --- | --- | --- | --- | --- |
| GPU Codename | GP100 | GV100 | GA100 | GH100 |
| GPU Architecture | NVIDIA Pascal | NVIDIA Volta | NVIDIA Ampere | NVIDIA Hopper |
| FP32 Cores / SM | 64 | 64 | 64 | 128 |
| FP32 Cores / GPU | 3584 | 5120 | 6912 | 16896 |
| FP64 Cores / SM | 32 | 32 | 32 | 32 |
| FP64 Cores / GPU | 1792 | 2560 | 3456 | 8448 |
| INT32 Cores / SM | NA | 64 | 64 | 64 |
| INT32 Cores / GPU | NA | 5120 | 6912 | 8448 |
| Tensor Cores / SM | NA | 8 | 4 | 4 |
| Tensor Cores / GPU | NA | 640 | 432 | 528 |
| Texture Units | 224 | 320 | 432 | 528 |
| Memory Interface | 4096-bit HBM2 | 4096-bit HBM2 | 5120-bit HBM2 | 512-bit x5 |
| Memory Size | 16 GB | 32 GB / 16 GB | 40 GB | 128 GB? |
| Memory Data Rate | 703 MHz DDR | 877.5 MHz DDR | 1215 MHz DDR | 1600 MHz DDR? |
| Memory Bandwidth | 720 GB/sec | 900 GB/sec | 1555 GB/sec | ? |
| L2 Cache Size | 4096 KB | 6144 KB | 40960 KB | 60 MB |
| TDP | 300 Watts | 300 Watts | 400 Watts | 700 Watts |
| TSMC Manufacturing Process | 16 nm FinFET+ | 12 nm FFN | 7 nm N7 | 4 nm N4 |
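
The memory bandwidth row follows from the interface width and data rate rows. A rough sketch of that arithmetic is below, assuming the usual double-data-rate relationship (two transfers per clock); the function name is illustrative, and the H100 is left out because its memory figures are still marked as unconfirmed.

```python
def hbm_bandwidth_gbs(bus_width_bits: int, clock_mhz: float) -> float:
    """Peak bandwidth in GB/s for a double-data-rate memory interface."""
    bytes_per_transfer = bus_width_bits / 8
    transfers_per_second = clock_mhz * 1e6 * 2  # DDR: two transfers per clock
    return bytes_per_transfer * transfers_per_second / 1e9


# (bus width in bits, memory clock in MHz) as listed in the table.
specs = {
    "P100": (4096, 703.0),
    "V100": (4096, 877.5),
    "A100": (5120, 1215.0),
}

bandwidth = {
    gpu: round(hbm_bandwidth_gbs(width, clock), 1)
    for gpu, (width, clock) in specs.items()
}
print(bandwidth)
```

The results land within rounding of the 720, 900, and 1555 GB/sec listed for the three older GPUs.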

The other highlight is the inclusion of PCIe Gen 5 and the NVLink bus interface, the latter enabling up to 900GB/s of GPU-to-GPU bandwidth. Overall, the H100 offers a substantial 4.9 TB/s of external bandwidth. Finally, this monster GPU has a TDP of 700W despite being built on TSMC’s N4 node, a refinement of the N5 (5nm) node.

Areej Syed
