NVIDIA is expected to launch its next-gen RTX 40 series graphics cards later this year (most likely September 2022). Based on the Ada Lovelace microarchitecture and TSMC’s N5 (5nm) process node, these GPUs are slated to offer more than twice the performance of the preceding RTX 30 “Ampere” family. All this will, of course, come at a cost, not only for consumers but for NVIDIA as well. According to the company’s Q4 earnings report, Team Green had spent up to $9 billion on inventory purchases and prepayments for future products by the end of the quarter.
Although the hard launch of the RTX 40 series parts won’t come any sooner than Q3 2022, the first GPU featuring the Ada Lovelace microarchitecture may be announced at GTC 2022. We’re talking about the successor to the A100. It’s worth noting that the A100 was fabbed on TSMC’s 7nm node, whereas the RTX 30 series leveraged Samsung’s 8nm process.
Data Center GPU | NVIDIA Tesla P100 | NVIDIA Tesla V100 | NVIDIA A100 |
---|---|---|---|
GPU Codename | GP100 | GV100 | GA100 |
GPU Architecture | NVIDIA Pascal | NVIDIA Volta | NVIDIA Ampere |
GPU Board Form Factor | SXM | SXM2 | SXM4 |
SMs | 56 | 80 | 108 |
TPCs | 28 | 40 | 54 |
FP32 Cores / SM | 64 | 64 | 64 |
FP32 Cores / GPU | 3584 | 5120 | 6912 |
FP64 Cores / SM | 32 | 32 | 32 |
FP64 Cores / GPU | 1792 | 2560 | 3456 |
INT32 Cores / SM | NA | 64 | 64 |
INT32 Cores / GPU | NA | 5120 | 6912 |
Tensor Cores / SM | NA | 8 | 4 |
Tensor Cores / GPU | NA | 640 | 432 |
GPU Boost Clock | 1480 MHz | 1530 MHz | 1410 MHz |
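The per-GPU rows in the table are simply the per-SM counts multiplied by the SM count. The short Python sketch below re-derives them as a sanity check. The dictionary layout and the helper name `per_gpu_totals` are our own illustrative choices, not anything from NVIDIA. Note that the A100's 432 Tensor Cores spread across 108 SMs works out to 4 per SM, which is the value used here.

```python
# Per-SM figures and SM counts copied from the table above.
# The structure and names are illustrative assumptions, not an NVIDIA API.
SPECS = {
    "P100": {"sms": 56, "fp32_per_sm": 64, "fp64_per_sm": 32},
    "V100": {"sms": 80, "fp32_per_sm": 64, "fp64_per_sm": 32,
             "int32_per_sm": 64, "tensor_per_sm": 8},
    "A100": {"sms": 108, "fp32_per_sm": 64, "fp64_per_sm": 32,
             "int32_per_sm": 64, "tensor_per_sm": 4},
}


def per_gpu_totals(spec):
    """Multiply every '*_per_sm' entry by the SM count."""
    sms = spec["sms"]
    return {
        key.replace("_per_sm", "_per_gpu"): sms * value
        for key, value in spec.items()
        if key.endswith("_per_sm")
    }


if __name__ == "__main__":
    for name, spec in SPECS.items():
        print(name, per_gpu_totals(spec))
```

Running it reproduces the FP32, FP64, INT32 and Tensor Core totals listed per GPU in the table.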
Looking at the GV100 and the GA100, it’d be fair to assume that the AD100 will feature a different SM floorplan as well, and possibly increase the SM count to over 130. As with its predecessors, the FP32:FP64 core ratio should stay at 2:1 (64 FP32 and 32 FP64 cores per SM in the table above), with Tensor cores and sparse matrix multiplication getting special attention. Furthermore, unlike the RTX 40 series, it’ll be paired with HBM2e memory (over 100GB of it).
The RTX 4080, on the other hand, should feature 16GB of GDDR6X memory running at around 21Gbps, while the RTX 4090 should pack somewhere between 20GB and 30GB of GDDR6X memory. In terms of specifications, we’re looking at an FP32 core count of up to 18,432: the AD102 flagship is rumored to feature 144 SMs distributed across 12 GPCs. That results in a raw compute gain of roughly 2.4x (90 TFLOPs versus 37.6 TFLOPs) over the GA102, provided the core clocks well above 2GHz (about 2.4GHz, assuming one fused multiply-add per FP32 core per cycle).
GPU | TU102 | GA102 | AD102 | GH202 |
---|---|---|---|---|
Arch | Turing | Ampere | Ada Lovelace | Hopper |
Process | TSMC 12nm | Samsung 8nm LPP | TSMC 5nm | 3nm? |
GPC | 6 | 7 | 12 | ~20 |
TPC | 36 | 42 | 72 | ~140 |
SMs | 72 | 84 | 144 | ~300 |
Shaders | 4,608 | 10,752 | 18,432 | ~36,000? |
TFLOPs | 16.1 | 37.6 | 90? | 150+ |
Memory | 11GB GDDR6 | 24GB GDDR6X | 24GB GDDR6X | 32GB GDDR7? |
Bus Width | 384-bit | 384-bit | 384-bit | 512-bit |
TGP | 250W | 350W | 600W? | 600W+ |
Launch | Sep 2018 | Sep 2020 | Aug-Sep 2022 | 2024 |
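The TFLOPs row can be sanity-checked with the usual back-of-the-envelope formula: peak FP32 throughput = 2 × shader count × clock, assuming each shader retires one fused multiply-add (2 FLOPs) per cycle. The Python sketch below applies it to the shader counts in the table. The function names are our own, and the clocks it returns are implied values under that assumption, not confirmed specifications.

```python
def peak_fp32_tflops(shaders: int, clock_ghz: float) -> float:
    """Theoretical peak FP32 TFLOPs, assuming 2 FLOPs (one FMA) per shader per clock."""
    return 2 * shaders * clock_ghz / 1000.0


def clock_for_tflops(shaders: int, tflops: float) -> float:
    """Clock in GHz needed to reach a TFLOPs figure under the same assumption."""
    return tflops * 1000.0 / (2 * shaders)


if __name__ == "__main__":
    # Shader counts and TFLOPs figures taken from the table above.
    print(f"AD102 at 2.0 GHz: {peak_fp32_tflops(18432, 2.0):.1f} TFLOPs")
    print(f"Clock implied by 90 TFLOPs on AD102: {clock_for_tflops(18432, 90):.2f} GHz")
    print(f"Clock implied by 37.6 TFLOPs on GA102: {clock_for_tflops(10752, 37.6):.2f} GHz")
```

Under these assumptions, 18,432 shaders at exactly 2GHz land at about 73.7 TFLOPs, so the rumored 90 TFLOPs figure implies a clock of roughly 2.44GHz.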