

Windows 10 poor performance compared to Windows 7 (page fault handling is not sc...
source link: https://stackoverflow.com/questions/45024029/windows-10-poor-performance-compared-to-windows-7-page-fault-handling-is-not-sc
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.

We set up two identical HP Z840 Workstations with the following specs
- 2 x Xeon E5-2690 v4 @ 2.60GHz (Turbo Boost ON, HT OFF, total 28 logical CPUs)
- 32GB DDR4 2400 Memory, Quad-channel
and installed Windows 7 SP1 (x64) and Windows 10 Creators Update (x64) on each.
Then we ran a small memory benchmark (code below, built with VS2015 Update 3, 64-bit architecture) which performs memory allocation-fill-free simultaneously from multiple threads.
#include <Windows.h>
#include <vector>
#include <ppl.h>
unsigned __int64 ZQueryPerformanceCounter()
{
unsigned __int64 c;
::QueryPerformanceCounter((LARGE_INTEGER *)&c);
return c;
}
unsigned __int64 ZQueryPerformanceFrequency()
{
unsigned __int64 c;
::QueryPerformanceFrequency((LARGE_INTEGER *)&c);
return c;
}
class CZPerfCounter {
public:
CZPerfCounter() : m_st(ZQueryPerformanceCounter()) {};
void reset() { m_st = ZQueryPerformanceCounter(); };
unsigned __int64 elapsedCount() { return ZQueryPerformanceCounter() - m_st; };
unsigned long elapsedMS() { return (unsigned long)(elapsedCount() * 1000 / m_freq); };
unsigned long elapsedMicroSec() { return (unsigned long)(elapsedCount() * 1000 * 1000 / m_freq); };
static unsigned __int64 frequency() { return m_freq; };
private:
unsigned __int64 m_st;
static unsigned __int64 m_freq;
};
unsigned __int64 CZPerfCounter::m_freq = ZQueryPerformanceFrequency();
int main(int argc, char ** argv)
{
SYSTEM_INFO sysinfo;
GetSystemInfo(&sysinfo);
int ncpu = sysinfo.dwNumberOfProcessors;
if (argc == 2) {
ncpu = atoi(argv[1]);
}
{
printf("No of threads %d\n", ncpu);
try {
concurrency::Scheduler::ResetDefaultSchedulerPolicy();
int min_threads = 1;
int max_threads = ncpu;
concurrency::SchedulerPolicy policy
(2 // two entries of policy settings
, concurrency::MinConcurrency, min_threads
, concurrency::MaxConcurrency, max_threads
);
concurrency::Scheduler::SetDefaultSchedulerPolicy(policy);
}
catch (concurrency::default_scheduler_exists &) {
printf("Cannot set concurrency runtime scheduler policy (Default scheduler already exists).\n");
}
static int cnt = 100;
static int num_fills = 1;
CZPerfCounter pcTotal;
// malloc/free
printf("malloc/free\n");
{
CZPerfCounter pc;
for (int i = 1 * 1024 * 1024; i <= 8 * 1024 * 1024; i *= 2) {
concurrency::parallel_for(0, 50, [i](size_t x) {
std::vector<void *> ptrs;
ptrs.reserve(cnt);
for (int n = 0; n < cnt; n++) {
auto p = malloc(i);
ptrs.emplace_back(p);
}
for (int x = 0; x < num_fills; x++) {
for (auto p : ptrs) {
memset(p, num_fills, i);
}
}
for (auto p : ptrs) {
free(p);
}
});
printf("size %4d MB, elapsed %8.2f s, \n", i / (1024 * 1024), pc.elapsedMS() / 1000.0);
pc.reset();
}
}
printf("\n");
printf("Total %6.2f s\n", pcTotal.elapsedMS() / 1000.0);
}
return 0;
}
Surprisingly, the result is very bad in Windows 10 CU compared to Windows 7. I plotted the result below for 1MB chunk size and 8MB chunk size, varying the number of threads from 2,4,.., up to 28. While Windows 7 gave slightly worse performance when we increased the number of threads, Windows 10 gave much worse scalability.
We have tried to make sure all Windows update is applied, update drivers, tweak BIOS settings, without success. We also ran the same benchmark on several other hardware platforms, and all gave similar curve for Windows 10. So it seems to be a problem of Windows 10.
Does anyone have similar experience, or maybe know-how about this (maybe we missed something ?). This behavior has made our multithreaded application got significant performance hit.
*** EDITED
Using https://github.com/google/UIforETW (thanks to Bruce Dawson) to analyze the benchmark, we found that most of the time is spent inside kernels KiPageFault. Digging further down the call tree, all leads to ExpWaitForSpinLockExclusiveAndAcquire. Seems that the lock contention is causing this issue.
*** EDITED
Collected Server 2012 R2 data on the same hardware. Server 2012 R2 is also worse than Win7, but still a lot better than Win10 CU.
*** EDITED
It happens in Server 2016 as well. I added the tag windows-server-2016.
*** EDITED
Using info from @Ext3h, I modified the benchmark to use VirtualAlloc and VirtualLock. I can confirmed significant improvement compared to when VirtualLock is not used. Overall Win10 is still 30% to 40% slower than Win7 when both using VirtualAlloc and VirtualLock.
Recommend
-
15
Aug 1, 2017 Another Attempt At Speculative Page Fault Handling https://lwn.net/Articles/730531/ Speculative page-fault handling(投机性缺页异常处理)的又一次尝试 大家都知道避免缺页异常带来的性能损耗最好的办法是避...
-
12
Degrees and certs are poor indicators of performance I just love certifications and degrees. They make excellent kindling. Back when I was working for a school district, one of the "tech people" at a certain elementary sch...
-
9
Topaz – Transient Fault Handling Application Block Here’s a neat little NuGet package. The Transient Fault Handling Application Block for Windows...
-
10
Common Issues with Poor Performance in Ionic ApplicationsOne of the services I often perform is to spend a few hours doing a high-level code review of an Ionic codebase, and then writing up any advice or recommendations I have based on what I...
-
12
This article offers a few tips on how to improve your website performance and diagnose why it may not be running as efficiently as possible. This optimization article is for any websites that were custom-built using HTML/C...
-
9
Fault handling in F# with Polly In this week's post, Matt takes a look at Polly, the open-source .NET library for resilience and handling transient faults. By
-
4
Reading Time: 4 minutesIn the last blog, we discussed some basics of Apigee. If you haven’t created your proxies yet then, read the blog Apigee101: Constru...
-
12
Question Poor performance, please help ...
-
6
Nic Lin's Blog喜歡在地上滾的工程師最近研究一個在 Postgres 奇怪的效能問題。情況是這樣的,我有一個通知系統 (Notifications) 的 table,也有對常用的搜尋打 index...
-
14
xamarin forms Xamarin.Forms/MAUI CollectionView performance issue: it...
About Joyk
Aggregate valuable and interesting links.
Joyk means Joy of geeK