A few weeks ago, UL (formerly Futuremark) launched the newest test in its ongoing 3DMark gaming benchmark suite, CPU Profile. The premise behind this new CPU-specific test is a simulation to measure how processor performance scales with cores and threads. Normally 3DMark tests are designed to measure overall gaming performance, and thus are largely a GPU benchmark, but this one is a little different because it focuses more specifically on CPU performance. So we wanted to take a look at UL's newest test to get a better idea of what exactly it's testing, what exactly it's trying to accomplish, and just how useful it might be.
UL's 3DMark and the New Test
The 3DMark software (with the tagline 'The Gamer's Benchmark') has been a staple of the synthetic benchmarking community for its variety of tests designed to emulate different levels of gaming complexity. From a single interface, users can run simple tests aimed at mobile and integrated graphics performance, mid-level gaming tests at reasonable resolutions and detail, and overengineered tests for systems that don't exist yet. Each of the tests provides a baseline set of graphics calculations designed to emulate video game performance and produces a composite number to represent that performance for that market. If you've ever heard of Time Spy or Fire Strike, two popular benchmarking tests, particularly for overclockers, then 3DMark is where they come from.

3DMark also acts as a vehicle for new feature tests. Over the years UL has launched separate specific tests to find draw call limitations, DirectX Raytracing processing performance, Variable Rate Shading (VRS) performance, PCIe 4.0 testing, and NVIDIA DLSS performance. The latest test in this portfolio is the CPU Profile, the focus of this article.
What is the CPU Test Measuring
The CPU Profile test showcases a simple low resolution scene derived from the imagery of the latest gaming tests. The rate limiter of this scene is the raw CPU calculations in the background: the test runs an effective 150 frames of images, but each frame involves a parallel compute framework based on the flocking of birds.
Bird flocking, also known in simulation as boids (bird-oid object, not an accent thing), involves the interaction of a large number of objects moving relative to each other, depending on small random movement and rules regarding separation, alignment, and cohesion. Each boid has to account for:
- its distance to other boids in a pack (separation),
- its direction of travel relative to others (alignment), and
- the desire to move towards an average position within line of sight (cohesion)
We've all seen how birds move in mass flocks, or fish in shoals, and there are exact mathematical models that can be used to simulate it. A minor adjustment in separation, alignment, and cohesion can modify exactly how they all interact and move.
From a simulation standpoint, each boid is independent in its actions such that it can be calculated in parallel to others, but each boid needs to have knowledge of its local environment and the positions and directions of other boids within that environment. The more boids in the local environment, the bigger the lookup table for that individual has to be. The size of that lookup table on each time step is usually a mix between separation distance and line of sight: the more objects an individual can see or is interacting with at once, the bigger that calculation. The data for this lookup table has to be polled from many different places in cache and memory, almost at random, and for good simulation, on every timestep as well.
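The three rules and the per-boid neighbour lookup can be sketched in a few lines of Python. This is only an illustration of the general boids technique, not UL's implementation: the `Boid` class, the `step` and `make_flock` helpers, and every tuning constant are assumptions made up for this example, and the small random movement term is left out for brevity.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class Boid:
    x: float
    y: float
    vx: float
    vy: float


def step(boids, view_radius=50.0, min_distance=10.0,
         separation=0.05, alignment=0.05, cohesion=0.005, max_speed=5.0):
    """Advance every boid by one timestep and return the new list.

    All weights and radii are hypothetical values chosen for illustration.
    """
    updated = []
    for b in boids:
        # The per-boid "lookup table": every other boid within line of sight.
        neighbours = [o for o in boids
                      if o is not b
                      and math.hypot(o.x - b.x, o.y - b.y) < view_radius]
        vx, vy = b.vx, b.vy
        if neighbours:
            n = len(neighbours)
            # Cohesion: steer towards the average position of the neighbours.
            vx += (sum(o.x for o in neighbours) / n - b.x) * cohesion
            vy += (sum(o.y for o in neighbours) / n - b.y) * cohesion
            # Alignment: steer towards the average velocity of the neighbours.
            vx += (sum(o.vx for o in neighbours) / n - b.vx) * alignment
            vy += (sum(o.vy for o in neighbours) / n - b.vy) * alignment
            # Separation: push away from any neighbour that is too close.
            for o in neighbours:
                if math.hypot(o.x - b.x, o.y - b.y) < min_distance:
                    vx += (b.x - o.x) * separation
                    vy += (b.y - o.y) * separation
        # Clamp the speed so the flock does not accelerate without bound.
        speed = math.hypot(vx, vy)
        if speed > max_speed:
            vx, vy = vx / speed * max_speed, vy / speed * max_speed
        updated.append(Boid(b.x + vx, b.y + vy, vx, vy))
    return updated


def make_flock(count, size=200.0, seed=0):
    """Create `count` boids at seeded random positions with small velocities."""
    rng = random.Random(seed)
    return [Boid(rng.uniform(0, size), rng.uniform(0, size),
                 rng.uniform(-1, 1), rng.uniform(-1, 1))
            for _ in range(count)]
```

Each new boid is computed only from the previous timestep's list, which is what makes the per-boid work independent and therefore easy to spread across threads. The neighbour search is also where the cost grows: a wider `view_radius` or a denser flock means a longer `neighbours` list for every boid on every step.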
For anyone that wants to play with a 100 boid simulation in their browser, Ben Eater has a great one, or users can play with GitHub code here with a JavaScript version. This is a single threaded design, and can easily scale to a few thousand boids on a single core without any optimized code.

Boids with simple edge boundary conditions
Beyond that, boid simulation isn't usually run on CPU cores anyway. Users can interact with a GPU version in their browser today, with 65000+ boids running very happily.
So with all this talk about boids, the CPU Profile test in 3DMark is doing exactly this simulation exclusively*. The workload defined on 3DMark's website states that it is a simple, highly optimized simulation of boids split into two parts.
- One: Half the boids use SSSE3 optimized instructions
- Two: Half the boids use AVX2 optimized instructions, otherwise SSSE3
The benchmark does six separate sub-tests based on the number of threads: 1, 2, 4, 8, 16, max. Rather than giving an overall score, the test hands the user six different scores, based on a simple calculation:
- Score = 350,000 / average frame time (in milliseconds)
The simulation lasts for a fixed 150 frames, so each sub-test has the same fixed calculation simulation (and we assume the same fixed seeds for RNG). On the fastest processors, the max threads section can take under 10 seconds, allowing the simulation to run with CPUs working entirely within turbo clockspeeds (we'll get back to why this matters later), while the single thread section on the slowest processors can take 5 minutes or so.
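As a rough sketch of how the score and the fixed frame count relate, the snippet below computes both the score and the wall-clock length of a sub-test from an average frame time. The function names are made up for this example, and the duration estimate assumes the run time is simply the frame count multiplied by the average frame time, ignoring any setup or loading.

```python
REFERENCE_VALUE = 350_000   # numerator of the score formula
FRAMES_PER_SUBTEST = 150    # fixed length of every sub-test


def cpu_profile_score(average_frame_time_ms):
    """Score = 350,000 / average frame time in milliseconds."""
    return REFERENCE_VALUE / average_frame_time_ms


def subtest_duration_seconds(average_frame_time_ms, frames=FRAMES_PER_SUBTEST):
    """Approximate run time of one sub-test (hypothetical helper)."""
    return average_frame_time_ms * frames / 1000.0


# A sub-test finishing in 10 seconds averages about 66.7 ms per frame,
# while a 5 minute sub-test averages 2000 ms per frame.
for ms in (66.7, 2000.0):
    print(f"{ms:7.1f} ms/frame -> {subtest_duration_seconds(ms):6.1f} s, "
          f"score {cpu_profile_score(ms):.0f}")
```

Because the amount of work is fixed, a faster processor shortens the run rather than doing more work, which is why the fastest chips can finish the max thread sub-test inside their turbo window.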
The end results page is something that looks like this:

The test gives you six different results along with a system info tracker if it was enabled.
The ultimate goal of the test is to benchmark CPU performance at several different thread counts, creating a test that can scale up to use all of the threads a consumer CPU can provide, but that also offers a look at performance with lower thread counts, which is where many games lie today. Put another way, on deciding whether to have a single-threaded or multi-threaded gaming test, UL decided to do both by testing with multiple thread counts.
*On launch UL's website stated the test was in two parts with a physics engine, but UL has clarified to us in email that this was a copy/paste error from a previous test. The website has since been updated.
Discussion of the Test
Typically when probing a new test for our benchmark suite, it pays off to take a critical eye to what exactly the test is measuring and how it relates to the real world. Every benchmark has a place in a review lineup, though it is always important to quantify where it should be, and what weight should be given to the results. For example, we have real-world tests that speak to performance in that specific software, but we also have a mix of synthetic tests for overall performance perception. Usually we focus more on the real world tests for analysis and recommendation, but the small portion of synthetics helps in maintaining baselines and is there for those who like to see them.
Typically we filter 3DMark's gaming tests into that latter portion of synthetic testing. With the same program version and the same video drivers, we can see how different processors and graphics cards scale in light of the synthetic workload, even if the synthetic workload is trying to emulate an average gaming experience. UL has been quite clear that the goal of 3DMark's gaming tests is to do just that: emulate real world performance.
Unfortunately, the commentary around the CPU Profile test is rather unclear. You might be forgiven for thinking that the test is designed to showcase where a processor might be limited in gaming; after all, the test is shipped alongside a half-dozen other GPU gaming tests, and within the test itself we're treated to some very game-looking imagery.

The arrows on the left look to be boids (300-ish?), but unsure if related to the simulation at all
In practice, it's unclear whether the images shown on screen have anything to do with the simulation at hand (while UL has responded to a couple of emails, they haven't answered this directly yet). We only see 300 or so boids on screen, and yet a simple simulation on a single core of a Core i7-6950X can easily do a few thousand.
If we go into UL's press release for the test, the headline for the page is 'New CPU benchmarks for gamers and overclockers', and the page describes that it runs a CPU simulation across 1, 2, 4, 8, 16, and max threads. For each of those sub-tests, it also gives a brief indication of what the test is useful for. Here is our summary of UL's press release on the sub-tests:
- 1 Thread: Raw CPU performance, but other scores are better indicators of gaming.
- 2 Threads: Best for DX9 games such as DOTA2, League, and CS:GO
- 4 Threads: Best for DX9 games such as DOTA2, League, and CS:GO
- 8 Threads: Modern DX12 games, correlates well with 3DMark Time Spy
- 16 Threads: Computational tasks, less relevant for gaming
- Max Threads: Full performance, not relevant for gaming
In gaming workloads, we would generally agree with this. However, the underlying workload used in CPU Profile is not a gaming workload. This is where the confusion kicks in. UL says that its boid simulation is akin to similar situations in games, even to the point where having half using SSSE3 and half using AVX2 is more akin to game engines using different optimizations; however it completely skips over the fact that in every one of its sub-tests, the 'game' is CPU limited, even at 8 threads, and at 16 threads. This is fine for a CPU-specific test, but it is naive about how most games function on high-end hardware.
As mentioned above, UL hasn't stated how dense its boid simulation is, nor how it scales; by AnandTech's estimates you need 2000 or more boids to saturate a single thread with unoptimized code, so with optimized code scaled across 8 threads or 16 threads, we must be looking at 50000 or 100000 flocking objects in a simulation space. For games that showcase boid flocking environments, most of them are using secondary physics, i.e. physics that cannot be influenced by the character, and those that do have interacting physics are unlikely to be simulating on this scale. Moreover, there's nothing to say that a game engine won't simply increase or decrease the boids in the simulation based on performance.
Orthogonal to all of this is the length of the test. Because the test is a fixed 150 frames regardless of how many threads are working, it means the best processors can churn through the max threads sub-test in a few seconds, while the slowest processors take several minutes in 1T mode. The discussion point here is down to how each processor induces its Turbo modes.
At various times in the past decade, Intel and AMD have privately expressed concern about big max thread workloads that take only a few seconds to complete. Usually max thread workloads require enough time for a processor to hit a steady state frequency, and so finishing within the turbo window makes the test an unrepresentative metric. Take, for example, CineBench R20, which can complete in 5 seconds with a higher average pixels per second than a Cinema4D test that can take a few hours. Moreover, gaming is a long journey through turbo behavior, and not a fixed workload always at turbo. If Intel and AMD have previously stated that these sorts of in-turbo max thread tests are irrelevant for performance comparisons, then the new CPU Profile test would fall to a similar fate.
We approached UL with this, along with the idea that the CPU Profile simulation should be a fixed time instead of a fixed set of frames, but in the end UL disagreed. One of the goals of the test was apparently having a short test length. They wanted the 8 thread result to correlate to Time Spy Extreme results, which meant that finding a time that worked while also being short was a goal of the project. UL also stated that a 150 fixed frame test results in a fixed amount of work, and suggested that slower systems will process less with fixed time steps. I should point out that this is irrelevant if you're taking an average when fixed time steps are in place. Over 150 frames, UL stated they could guarantee a balanced workload across all threads (something which doesn't happen in gaming), and beyond that the consistency of the test would diverge in its results.
Ultimately, I disagree with some of UL's decisions here, and find that a lot of these arguments seem arbitrary at best, especially given my own experience in building our in-house tests such as 3DPM (which incidentally does do fixed time, not fixed compute). This also means that I'm having a hard time correlating what this benchmark is doing to how a user can interpret the results for a gaming workload. What UL has done here is create a CPU benchmark, first and foremost, and it appears that simply using a simulation mechanic 'that can be used in games' is being described as a tool to help identify gaming performance. At this stage, with the information I have at hand, I remain unconvinced that the workload is gaming-relevant.
Results
Typical for a UL benchmark, CPU Profile generates a series of dimensionless scores. These scores directly correlate to the underlying benchmark, but they aren't a specific measurement in and of themselves. Complicating matters a bit for CPU Profile, the benchmark generates half a dozen scores, so unless you read the documentation, the data can come off as a glut of numbers that are lacking context.

Example from UL's website
Taking a look at these numbers, UL states on its website that the results help showcase the result compared to others, but also the overclocking potential for your processor. This is a hint that this benchmark is actually better for overclockers than anyone else, as having six different result numbers and six different suggestions for CPU overclocking doesn't help much in interpreting gaming performance, especially given that the bar showcasing the score is quite small and doesn't offer any additional context.

Results from one of our CPUs, hard to see those bars
There's also the matter of presenting the result as a score. All of UL's tests give a score at the end, and as we've showcased above, the results for this test are a calculation of an arbitrary number (350000) divided by the average frame time (in milliseconds). The reason for not giving the results as a raw frame time is simple psychology: bigger numbers look better on graphs and are easier to interpret. So by dividing a number by the average frame time, everything gets a scale. It also helps that removing the units of the result might reduce confusion. The downside here is that the initial number is very arbitrary.
On the website, UL calls it a reference value using 'a time constant set to 70 multiplied by a score constant set to 5000', which comes to 350000. There are no explanations as to why these numbers exist, though we can interpret that 70 is meant to be 70 milliseconds, and if a result achieves 70 milliseconds (note you need an 8 core processor to get that) then the final result is 5000 points. Almost all processors in all sub-tests will score under this, showcasing that the pivot for the results scaling is actually higher than most processors will achieve.
With the data, UL could have simply represented the results as an average frame rate. For example, here are some results for the Ryzen 7 2700X, an 8 core/16 thread processor, running at stock with JEDEC memory. The table showcases the raw average frame time, UL's score, and an average frame rate metric.
| 3DMark CPU Profile AMD Ryzen 7 2700X | |||
| AnandTech | Average Frame Time (milliseconds) | 3DMark CPU Score | Average Frames Per Second |
| 1T | 660.9 ms | 530 | 1.5 fps |
| 2T | 380.8 ms | 919 | 2.6 fps |
| 4T | 217.3 ms | 1611 | 4.6 fps |
| 8T | 121.5 ms | 2881 | 8.2 fps |
| 16T | 78.6 ms | 4453 | 12.7 fps |
| nT | 78.0 ms | 4487 | 12.8 fps |
Note that if your game is running at 12 frames per second on a Ryzen 7 2700X, then something is set too high anyway.
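The conversions in the table above can be reproduced from the average frame times alone. The short sketch below does that; the frame times are the Ryzen 7 2700X figures from the table, while the helper names and the rounding choices (nearest integer for the score, one decimal place for frame rate) are assumptions made for this example.

```python
REFERENCE_VALUE = 350_000  # UL's reference value: 70 x 5000

# Average frame times in milliseconds for the Ryzen 7 2700X, per sub-test.
frame_times_ms = {
    "1T": 660.9, "2T": 380.8, "4T": 217.3,
    "8T": 121.5, "16T": 78.6, "nT": 78.0,
}


def to_score(frame_time_ms):
    """UL-style score, rounded to the nearest integer."""
    return round(REFERENCE_VALUE / frame_time_ms)


def to_fps(frame_time_ms):
    """Average frames per second, rounded to one decimal place."""
    return round(1000.0 / frame_time_ms, 1)


for label, ms in frame_times_ms.items():
    print(f"{label:>3}: {ms:6.1f} ms  score {to_score(ms):4d}  {to_fps(ms):4.1f} fps")
```

Both columns carry the same information as the frame time; the score is simply the frame rate multiplied by 350.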
But as we start listing multiple processors, this data gets excessive and dense very quickly.
| 3DMark CPU Profile Results Given as Average FPS | ||||||
| AnandTech | R9 5950X | R9 3950X | R7 2700X | i9 11900K | i9 9900KS | |
| 1T | 2.7 | 2.2 | 1.5 | 3.1 | 2.3 | |
| 2T | 5.1 | 4.0 | 2.6 | 6.2 | 4.7 | |
| 4T | 8.4 | 6.4 | 4.6 | 11.7 | 9.2 | |
| 8T | 14.1 | 11.0 | 8.2 | 20.7 | 17.2 | |
| 16T | 22.4 | 19.1 | 12.7 | 24.8 | 20.7 | |
| nT | 31.1 | 28.6 | 12.8 | 24.8 | 20.7 | |
How should we order this table? Should it be ordered by 1T results, or by max thread results? If we're focusing on gaming, perhaps we should order by 2T/4T or 8T instead, which makes the other results extra data that we're discarding for being irrelevant or for making things too complicated. As is often the case, the downside to offering multi-dimensional data (in this case, results at multiple thread counts) is that it becomes a whole lot harder to present it in a simple manner.
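To make the ordering problem concrete, here is a small sketch that ranks the five processors from the table above by whichever sub-test is chosen. The data is copied from that table; the `rank_by` helper is a made-up name for this example.

```python
# Average FPS per sub-test, copied from the table above.
results = {
    "R9 5950X":  {"1T": 2.7, "2T": 5.1, "4T": 8.4,  "8T": 14.1, "16T": 22.4, "nT": 31.1},
    "R9 3950X":  {"1T": 2.2, "2T": 4.0, "4T": 6.4,  "8T": 11.0, "16T": 19.1, "nT": 28.6},
    "R7 2700X":  {"1T": 1.5, "2T": 2.6, "4T": 4.6,  "8T": 8.2,  "16T": 12.7, "nT": 12.8},
    "i9 11900K": {"1T": 3.1, "2T": 6.2, "4T": 11.7, "8T": 20.7, "16T": 24.8, "nT": 24.8},
    "i9 9900KS": {"1T": 2.3, "2T": 4.7, "4T": 9.2,  "8T": 17.2, "16T": 20.7, "nT": 20.7},
}


def rank_by(subtest):
    """Return processor names ordered from fastest to slowest in one sub-test."""
    return sorted(results, key=lambda cpu: results[cpu][subtest], reverse=True)


for subtest in ("1T", "8T", "nT"):
    print(f"{subtest:>3}: " + " > ".join(rank_by(subtest)))
```

Sorting by 1T or 8T puts the i9 11900K first, while sorting by max threads puts the R9 5950X first and drops the 11900K to third, so any single ordering hides part of the picture.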
So far I've run the test on 24 processors, from a 64-core Threadripper down to a dual core Apollo Lake. Rather than a table of results, these results are shown in a graph, ordered by which processor scores the highest for each of the sub-tests. There's even a Sandy Bridge and an Ivy Bridge in there.

The resulting graph is quite noisy, especially as the fastest high thread count processors are not the fastest low thread count processors (and vice versa). Ultimately a graph like this would look better with only a few elements on it, such as here:

This showcases that the Core i9-11900K scores best in this test, until it hits 16 threads when the extra memory bandwidth of the 3990X takes over. It should be noted that Tiger Lake does abysmally in this test, just behind the R9 3950X in 1T and behind the i3-9100F in max threads, as the power limits of the mobile processor matter more than the extra threads. I will need to check with a U-series AMD to see what the difference is here.
By and large when we scale out to more threads, we see that having a more complete system helps in this test, but in the single threaded mode it doesn't all seem to be about IPC, which is perhaps one of the limits of the boid simulation. We can actually see the Core i3 perform better in 2T/4T compared to the Ryzen 9 3950X, perhaps due to cross-thread talk over the chiplets being more of a concern.

The benchmark in full: unsure if any of this relates to what is actually being calculated...
How this all relates to gaming though is a question left unanswered. It's a strong CPU test, and as a simulation of flocking behavior, it has the right elements for a scientific workload worth analyzing. However, interpreting the performance scaling as a function of gaming performance with a CPU-limited workload isn't really relevant here, I feel, at least not without more information from UL about how they are interpreting this test. We have been emailing back and forth with UL to understand the test, and we are waiting to see if any further information will be made available. The thing is that most games that are CPU limited, especially DX9 esports titles, are bottlenecking on draw calls from the processor to the GPU, and aren't waiting on CPU compute except in a few fractional scenarios. This makes the CPU Profile test more for the extreme overclockers in that regard, trying to eke out performance across CPU and memory.