
What Happened to the 9950X3Dv2?

IMHO, waiting for the 10950X3D2 (11950X? AMD went 3 -> 5, 5 -> 7, 7 -> 9, so probably 9 -> 11) makes more sense, as it will have much lower CCD-to-CCD latency. The big open question is what the full impact on cache misses is if you have a bigger cache on each CCD.
 
Yeah - this seems to be a CPU only for workstation-type workloads and not for gaming. AM6 is the play for anyone that is a gamer. Although there will be a TON of people that treat this as AMD's "Intel KS" part because they want the best now (until next year). It did not last long on Amazon so I wonder if they even made that many.
 
A 10800x3D with 12 cores and 1xCCD would be the best processor in AMD’s gaming lineup.
But mainly because of the higher IPC/clock speed, not because of the higher core count, since, as we can see right now, 8 cores are more than enough for 99% of games.
And once Steam starts offering shader caches as downloads, we won’t need cores for shader compilation anytime soon.
 
Exactly this. Zen 6 single-CCD X3D will be the ultimate end processor for gamers on AM5 to upgrade to, not any of the dual-CCD options.
 
Welp, now we know that more v-cache doesn't help in gaming at all it seems.

It explains why they never released it before. Who is the CPU really for?
 
Disappointing. I wonder if drivers and/or bios updates would help.
I don't think so.
Welp, now we know that more v-cache doesn't help in gaming at all it seems.

It explains why they never released it before. Who is the CPU really for?
Well, to amend your statement to be more accurate: We now know that extra v-cache on the second CCD doesn't really help in gaming. The cross-CCD latency penalty still applies.

In other news, the one game that I was curious about (Star Citizen) with the 9950X3D2 has results exactly as I feared... mediocre. Zen 6 X3D with twelve cores on a single die is definitely the play.

View: https://youtu.be/zYeu141cRFw?t=470
 
Welp, now we know that more v-cache doesn't help in gaming at all it seems.
If it were on the same CCD, maybe it would. A game wants to live entirely on one CCD, so this product was never really expected to help gaming (outside of cases where the OS scheduler makes a mistake).

Who is the CPU really for?
In the Phoronix test suite, the 9950X3D -> 9950X3D2 boost is not that dissimilar to going from a 9900X to a 9950X. You need a workload that loves cache and is parallelizable across multiple CCDs; some scientific computing will be able to take advantage.

https://www.phoronix.com/review/amd-ryzen-9950x3d2-linux/6
Look at OpenFOAM 10, small mesh size: 30-35% faster than the regular 9950X3D (which was already much faster than the 9950X).
 
It's just as expected anyhow, you can still find games where it helps maybe, but do you play them? And how much do you care about them, specifically?

Wendell found Borderlands 4, for example:
View: https://youtu.be/1PcO2k6vprY?t=664

He also highlights some evident driver issues though, since the 8 core part smashes the 16 core one in some titles so, yeah... dual CCD still sucks for gamers :) (can easily be fixed but fails out of the box)

Oh and it also gets really hot and pulls a fair amount of power with all that cache, again not very surprising, but I would actually not want to put it on my entry level AM5 motherboard, I think.
 
I found this in a video, so I guess AMD has done something to optimize latency between CCDs, and it helps, as we can see in some of the Phoronix benchmarks, where the dual cache provides a significant benefit.

1776796596544.png
 
I strained my eyes but your screenshot doesn't show any improvement or significant difference?

In fact many numbers look lower aka better on the right part, which is the V1.
 
Yes, I expected them to perform worse, but their results are very close.
I wonder if AMD copied the data to both caches to reduce the need for data transfer between the two CCDs.
 
where the dual cache provides a significant benefit.
How so? It seems to be about a tie (which is logical), even a bit worse. The difference is probably just the slightly lower clock speed (that would explain why it is worse for both same-CCD and cross-CCD)?

In real-world usage, cache misses could go down, but that advantage will not show up in this graph, which looks at the latency when you need to read from the other CCD.

copied the data to both caches to reduce the need for data transfer between the two CCDs.
If you need coherence, mirroring would guarantee that you always have to transfer the data and keep a fast dirty/clean up-to-date status for everything, and of course you cut the effective cache capacity in half. If the working set is small enough to fit in 50% of the cache, you may as well simply run the whole task on a single CCD for now. I imagine some special cases exist for such a mirror mode if a large portion can safely be considered read-only; that would be interesting to see.
 
Yep, essentially as expected. We also now know a 7950X3D2 would have been a major flop given that the dual-CCD penalties don't magically go away just because the caches are symmetrical.

So basically the same Windows Game Bar and scheduler issues that have plagued the dual CCD parts in gaming from the beginning. Symmetrical cache is not a fix for the scheduler issues.
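
For anyone who wants to sidestep the scheduler and keep a game on one CCD by hand, here is a minimal sketch of the usual affinity workaround, written for Linux since that is where Python's standard library exposes it. The core numbering is an assumption (cores numbered CCD by CCD, SMT siblings as a second block), and `ccd_cpus` and `pin_to_ccd` are hypothetical helper names, so check your own topology with `lscpu` before trusting it.

```python
import os


def ccd_cpus(ccd, cores_per_ccd=8, num_ccds=2, smt=True):
    """Return the logical CPU ids of one CCD under an ASSUMED numbering.

    Assumption: physical cores are numbered CCD by CCD (0-7 on CCD0,
    8-15 on CCD1) and SMT siblings follow as a second block, so core i
    pairs with logical CPU i + total_cores. Real systems may differ.
    """
    if not 0 <= ccd < num_ccds:
        raise ValueError("ccd out of range")
    total_cores = cores_per_ccd * num_ccds
    first = ccd * cores_per_ccd
    cpus = set(range(first, first + cores_per_ccd))
    if smt:
        # Add the assumed SMT sibling of every core on this CCD.
        cpus |= {c + total_cores for c in cpus}
    return cpus


def pin_to_ccd(pid, ccd):
    """Restrict a process to one CCD. Linux only; pid 0 means this process."""
    if not hasattr(os, "sched_setaffinity"):
        raise NotImplementedError("os.sched_setaffinity is not available here")
    os.sched_setaffinity(pid, ccd_cpus(ccd))
```

Calling `pin_to_ccd(game_pid, 0)` would keep the game's threads on CCD0 under that assumed numbering. It does nothing for a game that genuinely wants more than 8 cores, which is the cross-CCD case discussed above.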
 
Many of the squares, especially the worse ones in the lighter green color, are 2 - 3 nanoseconds better. And some are more like 4 - 6 ns better.

In latency-sensitive scenarios, that may show up as much as a handful of percent better performance. If it's latency sensitive and also scales well with lots of threads, the cumulative benefits could be double digits.
 
Yea but they are better on the V1 according to this picture, but maybe that's a mistake.
 
The difference is 200 MHz plus another 200 MHz from PBO (if used, and hardly ever achieved).
In some cases, a higher TDP can be more helpful than the clock speed, or a combination of both.

Yes, these aren't real-world applications, but they emulate them, so they come close to the real thing.

1776811242300.png


Here the frequency didn't help on 1xCCDs.
1776811433952.png

Here is very bad CCD to CCD latency on V1.
1776811527162.png


Here is double vs 1xCCD
1776811857622.png

etc.
 
Here is very bad core to core latency on V1.
That's not a latency benchmark. Having more cache can help increase the cache hit rate and avoid needing to read from RAM (or from the other CCD's L3); this can easily happen even with worse latency (which the V2 has, according to your graph).
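
To put rough numbers on that, here is a small sketch of the standard average-access-time arithmetic. Every figure in it is made up purely for illustration (hit rates, L3 latency and miss latency are all assumptions, not measurements from any review), and `average_access_ns` is a hypothetical helper name.

```python
def average_access_ns(l3_hit_rate, l3_latency_ns, miss_latency_ns):
    """Average latency of an access that reaches L3.

    Hits cost the L3 latency; misses cost the full latency of the slower
    path (RAM or the other CCD's L3).
    """
    if not 0.0 <= l3_hit_rate <= 1.0:
        raise ValueError("hit rate must be between 0 and 1")
    return l3_hit_rate * l3_latency_ns + (1.0 - l3_hit_rate) * miss_latency_ns


# HYPOTHETICAL numbers: a smaller cache with slightly better latency...
smaller_cache = average_access_ns(0.80, 10.0, 80.0)
# ...versus a bigger cache that hits more often but is 2 ns slower per hit.
bigger_cache = average_access_ns(0.90, 12.0, 80.0)

print(f"smaller cache: {smaller_cache:.1f} ns, bigger cache: {bigger_cache:.1f} ns")
```

With these invented inputs the bigger cache still comes out ahead on average despite the slower hits, which is the whole point: a latency chart alone cannot show the benefit of a higher hit rate.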
 
Yeah, I mean CCD to CCD: on the V2 everything is in cache, while on the V1 it just jumps between both CCDs.
 
Here’s the most interesting part of the games.
Does the game engine need more cache, or is there simply too much data being requested—or both?
It’s very intriguing to know why this happens.

1776814525142.png
 
It’s very intriguing to know why this happens.
It is a strange one: it was able to push a 5600X to 100% usage (12 threads), and at launch it gained a lot going from 8 P-cores + 8 E-cores to 8 P-cores + 16 E-cores:
https://www.pcgamer.com/hardware/wh...quality-not-core-count-that-matters-the-most/

If I had to make a rough guess: it can use more than 8 cores, and the work being done on the second CCD takes advantage of the larger cache available there...

It could be preferring an L3 hit, even on the other CCD, to going to RAM (even though the latency can be worse, the bandwidth is much higher and the latency more consistent; the best DDR5 is 10-15 ns faster, but when it is busy with many cores using it, things can get worse than that).

For games that fit well on a single CCD, the extra cache on the other does not bring much; for a game that can spread heavy work over 12 cores, like Borderlands 4, it starts to show.
 
That's a pretty different result from what TechSpot got. TechSpot's result was more along the lines of "meh, don't worry about it". It was basically tied with the 9950X3D in BL4. Of course they tested different settings - medium and badass, while that test was run on "very high".
TechSpot review: https://www.techspot.com/review/3114-amd-ryzen-9-9950x3d2/

So far it seems like the 9950X3D2 is pretty much a "don't bother" for gaming. A bit faster once in a while, but not worth it $ wise over a 9800X3D or 9850X3D for just gaming. Then for productivity + gaming it only seems worth it over a 9950X3D if you have some app that really likes the extra cache on both CCDs.
 
Yeah, I mean CCD to CCD: on the V2 everything is in cache, while on the V1 it just jumps between both CCDs.
But the cache is split on the V2, so there is still jumping between both CCDs, anyways. V1 is 32+64MB on CCD0, 32MB on CCD1, V2 is 32+64 on each CCD, that's it.

Unless your data is small enough to fit in the cache of one CCD... but then the V1 (or a 9800X3D) will provide the same gaming experience... which is exactly what the reviews show.
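
The capacity arithmetic can be sketched in a few lines. The per-CCD figures are the ones quoted in this post; the working-set sizes in the usage note and the helper names are hypothetical, only there to show the comparison.

```python
# L3 per CCD in MB, as described above: 32 MB base plus 64 MB of V-Cache.
LAYOUTS = {
    "9950X3D (V1)": [32 + 64, 32],
    "9950X3D2 (V2)": [32 + 64, 32 + 64],
}


def total_l3_mb(layout):
    """Total L3 across both CCDs. It is split, not one shared pool."""
    return sum(layout)


def largest_ccd_l3_mb(layout):
    """The most L3 any single CCD can offer to threads running on it."""
    return max(layout)


def fits_on_one_ccd(working_set_mb, layout):
    """True if the working set fits in the L3 of the best CCD alone."""
    return working_set_mb <= largest_ccd_l3_mb(layout)


for name, layout in LAYOUTS.items():
    print(f"{name}: total {total_l3_mb(layout)} MB, "
          f"best single CCD {largest_ccd_l3_mb(layout)} MB")
```

Both chips top out at 96 MB on their best CCD, so a hypothetical 80 MB working set fits on one CCD either way, and a 120 MB one fits on neither. The V2 only adds capacity for work that actually lands on the second CCD.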
 
Intel cores are faster, but AMD cores get the data faster, so in the real world the scheduler is what makes one or the other the "quality" option.

Since the Borderlands 4 engine has plenty of room for parallelism and can take advantage of multiple cores, having a large cache is helpful in this situation.
But yes, that’s rare for a game engine.
I didn't know that for BL4 :)

Steven Walton's results sometimes differ from those in other parts of the world and have a good chance of being wrong.

Yes, for gaming, the 9950X3D2 is simply too expensive a toy, but if someone can afford it, why not? After all, people buy $2,000 phones when they could make calls with one for $10.
 
The game probably scales up with extra cores. And having V-Cache on both CCDs means the only penalty is cross-CCD communication. With the 9950X3D, you would also have the imbalance of the second CCD not having extra cache. So it may either try to go to RAM, or simply try to keep everything on the V-Cache CCD, thus losing the core-scaling benefits.

Cyberpunk used to benefit from the extra cores of dual CCD CPUs. But, it seems like they have changed how that game schedules CPUs, like 3 or 4 times.
 
It would be cool if they could make one big cache that both cores could use in the 10x series
Won't happen. Dual-CCD chips will still have per-CCD L3 cache. Zen 6 is going up to 12 cores per CCD though, so you'll be able to get a 12-core chip with a single cache or a 24-core setup with 2x 12-core CCDs. Zen 6 is also supposed to bring big improvements to the CCD to I/O die interconnect.
 
Not that it really matters much but from reviewing Skatterbencher's findings:

Stock PPT is 270 W (ouch) vs 200 W on the V1, but the CPU never actually reaches that, probably because of hard current/voltage limits meant to protect the cache. It is still a lot more power than the V1 for sure. After overclocking, the V1 and V2 were essentially tied in power draw. The V2 does not have an improved memory controller at all; no surprise there, but some thought it might.

CCD0 is better than CCD1 by 200 MHz, so this is still not even a symmetrical chip (and we're just talking +100 MHz on V2 CCD0 vs the V1 CCD0).

All in all, despite the price tag, the silicon is clearly not that different in quality; any 9950X3D V1 can already be pushed to 5.6 GHz easily on the V-Cache CCD, after all.

One theoretical upside is that even without special drivers/settings, and unlike the V1 chip, Windows should want to use CCD0 when running a game (and focusing the application window, which some reviewers clearly forget to haha), but this of course does not fix the cross CCD latency issue if the game tries to use more threads when it shouldn't.

Eh, let's see what happens with Zen 6 and Intel stuff instead :)
 