AMD Ryzen 9 5900X3D Prototype with 192MB of L3, Multiple 3D V-Cache Dies Spotted

When AMD launched the Ryzen 9 7950X3D and the 7900X3D, most of us hoped for multiple 3D V-Cache dies, one for each CCD. We instead got a hybrid design with one 3D stacked and a standard Zen 4 CCD. The resulting chips were slower than the Ryzen 7 7800X3D, which came with a single 3D stacked V-Cache die. We have also heard rumors of a potential Ryzen 5 5600X3D, a late AM4 V-Cache part with two cores disabled and slightly lower clocks.

GamersNexus recently visited AMD’s Austin campus, getting a first look at the inner workings of the engineering team. The team witnessed the testing and validation of the engineering samples when they first came from the foundry and the subsequent fusion process.

The engineers explained the birth of the Ryzen 7 5800X3D, AMD’s most successful AM4 CPU. Turns out that 3D V-Cache was originally supposed to be a server-only offering. The Ryzen 7 5800X3D was the result of a spare sample and an engineer’s curiosity to test it on AM4. The result was the fastest gaming consumer CPU at the time of launch.

AMD has been testing different configurations of the V-Cache technology, including multiple 3D stacked cache dies on the same SKU. One such sample can be seen in the above shot. We’re looking at a 12-core Ryzen 5000 chip with 12 cores featuring 192MB of L3 cache (128MB of V-Cache). Both the CCDs have their respective cache die.

AMD decided against this configuration because it failed to provide any tangible gains in gaming. The whole point of 3D V-Cache is to have a uniform L3 cache pool where a large chunk of the code can be cached. This stores the frequently accessed shader data on-die, significantly improving the hit rates and reducing cache/memory-related pipeline stalls.

With two distinct cache pools, the latency benefit is somewhat lost. The reasons are multiple. Most games leverage only up to eight CPU cores, and storing data related to these threads on the other die does the opposite of what it’s supposed to (increases latency). Furthermore, splitting the cache also splits the data into two distinct pools that are physically apart, worsening the cache residency.

The game doesn’t know which cache pool exists on which die and next to which cores. This can lead to data related to CCD0 being stored on CCD1 or vice versa, randomly or due to the former being full. Consequently, you’re fetching data from a different die with an induced latency, producing results similar to what you’d expect with high-speed DDR5 RAM.

Source: YouTube.