Blog.Xinxing

Articles about tech, physics, and everything else related to life!

Speed up dotnet core app

Run with scalable VM pool

I have a dotnet core app that I use to run batch jobs, and I have a lot of them. To get things done faster, I rent some powerful virtual machines (VMs) from Microsoft Azure; AWS or Google Cloud would work just as well. These machines have tens of cores and tens of GB of memory, so they should run the jobs much faster than my laptop. In addition, the VM pool scales out when jobs are pending, so that more worker VMs run in parallel, and it stops all VMs when no jobs are left, so I only pay for the time jobs are actually running.

Parallelize dotnet core jobs

Sounds like a perfect plan, right? The result turned out to be disappointing: one job, which took 19 hours on my laptop, still took 10 hours on the VM. Soon I found the issue: my CPU utilization was low! Just as we can parallelize jobs across machines, within one VM we can parallelize them across processes, for example by launching the application multiple times from a script. We can also parallelize the jobs across threads within a single dotnet core process. I chose the latter approach because it makes it easier to manage job data in C# code. This time the job took only 78 minutes! A significant improvement, right? Let’s take a look at the CPU utilization:
Figure: CPU utilization before setting the CPU group flags for the dotnet core app
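
The in-process, multi-threaded approach can be sketched roughly as follows. This is a minimal illustration and not the app's actual code: `JobRunner`, `RunJob`, and `RunAll` are hypothetical names, the job body is a dummy CPU-bound loop, and the cap of three threads per logical processor is an assumption chosen for the example.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class JobRunner
{
    // Hypothetical stand-in for one batch job: some CPU-bound work keyed by a job id.
    public static long RunJob(int jobId)
    {
        long sum = 0;
        for (int i = 1; i <= 1_000_000; i++)
        {
            sum += (long)i * jobId % 7;
        }
        return sum;
    }

    // Runs all jobs inside one process, using at most maxThreads jobs at a time.
    public static IDictionary<int, long> RunAll(IEnumerable<int> jobIds, int maxThreads)
    {
        // A concurrent dictionary lets worker threads store results without explicit locks.
        var results = new ConcurrentDictionary<int, long>();
        var options = new ParallelOptions { MaxDegreeOfParallelism = maxThreads };

        Parallel.ForEach(jobIds, options, jobId =>
        {
            results[jobId] = RunJob(jobId);
        });

        return results;
    }

    public static void Main()
    {
        // Assumption for this sketch: allow up to three concurrent jobs per logical processor.
        int maxThreads = Environment.ProcessorCount * 3;
        var results = RunAll(Enumerable.Range(1, 200), maxThreads);
        Console.WriteLine($"Finished {results.Count} jobs with up to {maxThreads} threads.");
    }
}
```

Keeping everything in one process means the results land in an ordinary in-memory collection, which is the convenience that motivated choosing threads over multiple processes. `MaxDegreeOfParallelism` is only an upper bound; the thread pool decides how many threads actually run at once.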

Use all CPU cores

After parallelizing the jobs with up to 100 threads, triple the number of cores, I expected to use all CPU cores. Sadly, utilization stayed below 20%. With a scalable VM pool like this, I was wasting 80% of my money and time! A bit of online investigation revealed the root cause: Windows Server organizes CPU cores into CPU groups, and a dotnet core app uses only one CPU group by default! To fully utilize the CPU, set the following environment variables before launching the app:
set COMPlus_Thread_UseAllCpuGroups=1
set COMPlus_GCCpuGroup=1
set COMPlus_gcServer=1
With this trick my CPU utilization rose to ~90%:
Figure: CPU utilization after setting the CPU group flags for the dotnet core app
Unsurprisingly, the job now took only 22 minutes!
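
Because the runtime reads these variables when the process starts, it can be worth logging what the process actually sees. The following is a small sketch of such a check, assuming it is called once at startup; `RuntimeCheck` and `Describe` are hypothetical names. How `Environment.ProcessorCount` responds to CPU groups can vary between runtime versions, so treat that number as a hint rather than proof.

```csharp
using System;
using System.Runtime;

public static class RuntimeCheck
{
    // Builds a short report of the settings as seen from inside the running process.
    public static string Describe()
    {
        // Null when a variable is not set in the environment of this process.
        string useAllGroups = Environment.GetEnvironmentVariable("COMPlus_Thread_UseAllCpuGroups") ?? "(not set)";
        string gcCpuGroup = Environment.GetEnvironmentVariable("COMPlus_GCCpuGroup") ?? "(not set)";
        string gcServer = Environment.GetEnvironmentVariable("COMPlus_gcServer") ?? "(not set)";

        return $"COMPlus_Thread_UseAllCpuGroups={useAllGroups}, " +
               $"COMPlus_GCCpuGroup={gcCpuGroup}, " +
               $"COMPlus_gcServer={gcServer}, " +
               $"server GC active: {GCSettings.IsServerGC}, " +
               $"logical processors reported: {Environment.ProcessorCount}";
    }

    public static void Main()
    {
        Console.WriteLine(Describe());
    }
}
```

If the variables show up as "(not set)", they were probably set in a different shell session than the one that launched the app.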

Summary

By parallelizing job execution and setting the proper dotnet core flags, I reduced the running time of a job from 19 hours to 22 minutes, with the help of a scalable, pay-as-you-go VM pool.
