AMD and Intel will gain between 44% and 82% performance under Linux

A new Linux patch brings a very specific improvement that affects both Intel and AMD, though not at the hardware level. Just as happened with Windows 10 and Windows 11, the task scheduler is under continuous development and optimization, and the latest change concerns NUMA handling in Linux kernel 5.20, from which both AMD and Intel will gain performance.

Task-to-core affinity has a great deal to do with a processor's final performance, especially under loads as varied and complex as a server's. With AMD's move toward NUMA-style topologies, it is relatively easy to change performance for better or for worse.

Linux kernel 5.20 optimization and improvements on AMD and Intel


The modification to the kernel affects the find_idlest_group() function, so systems with more than one socket will benefit most from the changes, as explained by AMD engineer Prateek Nayak:

For systems containing multiple LLCs per socket, such as AMD Zen systems, users want to distribute bandwidth-intensive applications across multiple LLCs. “Stream” is one of those representative workloads where you get the best performance by limiting one stream thread per LLC. To ensure this, users have been known to pin tasks to a specific subset of processor(s) consisting of one processor per LLC when performing such bandwidth-intensive tasks.

Ideally, we would prefer each Stream thread to run on a different processor from the allowed list of processors. However, the current heuristics in find_idlest_group() do not allow this on initial placement.

For example, once the first four threads are distributed among the allowed CPUs on socket one, the rest of the threads start accumulating on those same CPUs when there are clearly usable CPUs on the second socket.

After the initial pileup on a small number of CPUs, the load balancer eventually kicks in, but it takes a while to reach equilibrium, and even a {4}{4} split (four threads per socket) may not be stable: we see a lot of "ping-pong" between {4}{4} and {5}{3} distributions before a stable state (one Stream thread per allowed CPU) is reached much later, after which no further migration is needed.

We can detect this pileup and avoid it by checking whether the number of allowed CPUs in the local group is less than the number of tasks running in it, and use this information to send the fifth task to the next socket (after all, the goal of this slow path is to find the idlest group and the idlest processor during initial placement!).

What Nayak means is that, until now, tasks were stacked onto the CPUs of socket 1 until they were full, and only then did the scheduler move on to the processor in socket 2. Stacking tasks this way keeps them close together, which helps latency, but it costs performance in bandwidth-bound workloads, as we will see below.

Intel and AMD will get a significant performance boost


Using the Stream memory benchmark, the patch increases average performance by more than 40% on AMD Zen processors. On Intel Xeon servers the improvement is no less than between 54% and 82% in the best case, the latter corresponding to the Copy test.


Considering how few lines of code this change requires, the improvement is remarkable, so as soon as kernel 5.20 is ready toward the end of this summer, system administrators will want to apply these improvements and verify the performance gains and load balancing for themselves.
