Increased Thread Limit for Compile Tools

From Valve Developer Community

VVIS and VRAD both use at most 16 threads because they share the same threading code, threads.cpp. This limit has been part of the Source SDK since 2006 and reflects the processors common in that era. If you run an older version of Source (2013 or older), the source code is available, so you can compile the tools yourself with a higher limit. For a newer version, you will need to patch the binaries.

Warning: Spawning more threads will not necessarily decrease your compile time. These DLLs are meant for machines that have more than 8 physical cores!
Note: Patching DLLs and EXEs is a grey zone of software licensing. These DLLs were patched only to improve the tools that rely on them and to serve Source developers' interests. This is not a 'crack'!

Patched DLLs for CS:GO SDK

If you need more threads for compiling CS:GO maps, you can download the patched DLLs below. Please read the disclaimer and back up your original vvis_dll.dll and vrad_dll.dll before you replace them:

Mirror: CM2.Network (Note: outdated!)

Replace your vvis_dll.dll and vrad_dll.dll in SteamApps\common\Counter-Strike Global Offensive\bin. The section below explains how the patch works.

Patching the DLLs

Figure 1
Figure 2
Note: You will need advanced knowledge of assembly and of how memory works to perform these steps. This is not a step-by-step guide.

The maximum number of threads used by both VVIS and VRAD is limited to 16 by preprocessor defines.

threads.h

#define MAX_TOOL_THREADS 16

threads.cpp

#include "threads.h"
#define	MAX_THREADS 16

CRunThreadsData g_RunThreadsData[MAX_THREADS];
HANDLE g_ThreadHandles[MAX_THREADS];

if ( numthreads > MAX_TOOL_THREADS )
    numthreads = MAX_TOOL_THREADS;

This means we will need to change 3 things:

  1. Find the if statement and change it to check for > 32 instead of > 16 (easy).
  2. Increase the size of the .data memory segment to make room for a bigger g_RunThreadsData array.
  3. Update all references to point to the new memory addresses.
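
For orientation, this is roughly what the binary patch corresponds to at source level. It is only a sketch: kMaxToolThreads, ThreadHandle, RunThreadsData and ClampThreadCount are illustrative stand-ins and not the SDK's own declarations.

```cpp
// Illustrative stand-ins, NOT the SDK's declarations.
using ThreadHandle = void*;            // stands in for the Win32 HANDLE type

struct RunThreadsData                  // placeholder for CRunThreadsData
{
    int   thread;
    void* userData;
    void (*fn)(int, void*);
};

constexpr int kMaxToolThreads = 32;    // the limit the patch raises from 16

// Both arrays grow with the limit, which is why the binary needs more room.
RunThreadsData g_runThreadsData[kMaxToolThreads];
ThreadHandle   g_threadHandles[kMaxToolThreads];

// Same clamp as the original if statement, against the raised limit.
constexpr int ClampThreadCount(int numthreads)
{
    return numthreads > kMaxToolThreads ? kMaxToolThreads : numthreads;
}

static_assert(ClampThreadCount(64) == 32, "requests above the limit are clamped");
static_assert(ClampThreadCount(8) == 8, "requests below the limit pass through");
```

If you can recompile the tools (Source 2013 or older), changing the defines this way is all that is needed. The steps below are only required for binaries without source.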

We can reuse the old location of g_RunThreadsData (16 * (12 + 4) = 256 bytes) for the enlarged g_ThreadHandles. Start by increasing the size of the .data segment with CFF Explorer (luckily there is some free space between .data and _RDATA).
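
The size arithmetic behind this reuse can be checked with a few constants. This is a sketch under two assumptions: each g_RunThreadsData element occupies the 12 + 4 bytes quoted above, and a HANDLE is 4 bytes because the tools are 32-bit. All names are made up for the example.

```cpp
#include <cstddef>

constexpr std::size_t kOldThreads = 16;
constexpr std::size_t kNewThreads = 32;
constexpr std::size_t kRunDataElemBytes = 12 + 4;  // per-element size as quoted in the text
constexpr std::size_t kHandleBytes = 4;            // assumption: 32-bit HANDLE

constexpr std::size_t ArrayBytes(std::size_t count, std::size_t elemBytes)
{
    return count * elemBytes;
}

// Space occupied by the original g_RunThreadsData array.
constexpr std::size_t kOldRunDataBytes = ArrayBytes(kOldThreads, kRunDataElemBytes);
// Space the 32-entry g_ThreadHandles array needs.
constexpr std::size_t kNewHandlesBytes = ArrayBytes(kNewThreads, kHandleBytes);
// Space the 32-entry g_RunThreadsData array needs in the enlarged .data segment.
constexpr std::size_t kNewRunDataBytes = ArrayBytes(kNewThreads, kRunDataElemBytes);

static_assert(kNewHandlesBytes <= kOldRunDataBytes,
              "the enlarged handle array must fit into the old g_RunThreadsData slot");
```

Under these assumptions the old slot holds 256 bytes, the new handle array needs 128 bytes, and the new g_RunThreadsData needs 512 bytes of fresh space.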

Now launch IDA Free/Pro and find all cross-references to 'CreateThread'. Look for the subroutine that resembles Figure 1 (g_RunThreadsData and g_ThreadHandles will not carry those names in the disassembly).

Leave the graph mode (Space) and enable opcodes. The line marked in Figure 2 sets the register used for the if statement from the C++ code. Using your favorite hex editor, find BE 10 00 00 00 (Warning: NOT unique) and replace it with BE 20 00 00 00 (0x20 = 32 decimal). Congrats, the first task is complete.
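
Because the byte sequence is not unique, it helps to list every match first and then patch only the offset that IDA shows for the marked line. The following is a minimal sketch of that idea, not a finished patcher. FindPattern, PatchAt and Bytes are hypothetical names, and loading the DLL into the buffer is left out.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

using Bytes = std::vector<std::uint8_t>;

// Returns the offset of every occurrence of `pattern` in `image`.
std::vector<std::size_t> FindPattern(const Bytes& image, const Bytes& pattern)
{
    std::vector<std::size_t> offsets;
    if (pattern.empty())
        return offsets;

    auto it = image.begin();
    while ((it = std::search(it, image.end(), pattern.begin(), pattern.end())) != image.end())
    {
        offsets.push_back(static_cast<std::size_t>(it - image.begin()));
        ++it;  // keep searching after the start of this match
    }
    return offsets;
}

// Overwrites the bytes at `offset` only if they currently equal `expected`.
bool PatchAt(Bytes& image, std::size_t offset, const Bytes& expected, const Bytes& replacement)
{
    if (expected.size() != replacement.size())
        return false;
    if (offset > image.size() || image.size() - offset < expected.size())
        return false;

    const auto target = image.begin() + static_cast<std::ptrdiff_t>(offset);
    if (!std::equal(expected.begin(), expected.end(), target))
        return false;

    std::copy(replacement.begin(), replacement.end(), target);
    return true;
}
```

Call FindPattern with BE 10 00 00 00, compare the returned offsets with the file offset of the marked line, and pass only that one offset to PatchAt together with BE 20 00 00 00.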

Now jump to the memory address of g_RunThreadsData (double-click unk_XXXXXXX), then copy and save that address (for example .data:12461E50). Go to the end of the old .data segment (whose size you increased with CFF Explorer earlier) and copy that address as well. Find all cross-references to g_RunThreadsData (unk_XXXXXXX) and, using the hex editor and the opcodes, change every reference so that it points to the new address.

Then change every reference to g_ThreadHandles that you find via cross-references so that it points to the old address of g_RunThreadsData. After this you are done.

Note: Make sure to convert the addresses to little-endian byte order when entering them with the hex editor.
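
As a quick illustration of the byte order, the example address 12461E50 from above has to be typed into the hex editor as 50 1E 46 12. A small helper (ToLittleEndian is a made-up name) shows the conversion for any 32-bit address:

```cpp
#include <array>
#include <cstdint>

// Splits a 32-bit address into the byte order in which it is stored in the file
// (least significant byte first).
std::array<std::uint8_t, 4> ToLittleEndian(std::uint32_t address)
{
    return {{
        static_cast<std::uint8_t>(address & 0xFF),
        static_cast<std::uint8_t>((address >> 8) & 0xFF),
        static_cast<std::uint8_t>((address >> 16) & 0xFF),
        static_cast<std::uint8_t>((address >> 24) & 0xFF),
    }};
}
```

ToLittleEndian(0x12461E50) yields the bytes 50 1E 46 12, which is the sequence to write at each patched reference.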

Performance Results

32 vCore machine running VVIS with all cores used (patched vvis used)
Figure 3: Average CPU usage with 32 cores & 32 threads
Note: The tests were conducted with an unoptimised .vmf file to cover the worst case. The results show only VVIS.

Servers used

32 vCores (1.8 GHz), 224 GB RAM, 500 GB SSD, CoreOS (32 threads, DigitalOcean)

4 physical cores (3.40 GHz), 32 GB RAM, 3 TB HDD, Debian (8 threads, Hetzner)

Results

The tests showed that raising the thread count above 16 lowers the average CPU usage to about 70% (Figure 3) and shortens the overall compile time by only about 5% (43 minutes, 10 seconds versus 45 minutes, 23 seconds). The reason is that towards the end only a few vCores are still working on difficult world units (WU) while the others sit idle (Figure 3). Other factors that may have influenced performance: Wine (used to run VVIS on Linux) and the VPS itself (multiple vCores sharing the same physical core).

  • 32 vCores (32 threads, ~70% Usage): 43 minutes, 10 seconds elapsed
  • 32 vCores (32 threads, 16 vCores @ ~100% Usage (unpatched VVIS)): 45 minutes, 23 seconds elapsed
  • 4 physical Cores (8 threads, ~100% Usage): 52 minutes, 53 seconds elapsed
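
To put the patched and unpatched runs into relation, the elapsed times listed above can be converted to seconds and compared. This is only a small arithmetic sketch, and the helper names are invented for the example.

```cpp
// Converts an elapsed time to seconds.
constexpr int ToSeconds(int hours, int minutes, int seconds)
{
    return hours * 3600 + minutes * 60 + seconds;
}

// Percentage by which `newSeconds` is shorter than `baselineSeconds`.
constexpr double PercentReduction(int baselineSeconds, int newSeconds)
{
    return 100.0 * (baselineSeconds - newSeconds) / baselineSeconds;
}

constexpr int kUnpatched16Threads = ToSeconds(0, 45, 23);  // 2723 s
constexpr int kPatched32Threads   = ToSeconds(0, 43, 10);  // 2590 s

// 133 s saved out of 2723 s, i.e. just under 5 %.
constexpr double kGainFromPatch = PercentReduction(kUnpatched16Threads, kPatched32Threads);
```

With the times listed, the patched 32-thread run saves 133 seconds over the unpatched 16-thread run on the same machine, which is just under 5% of the compile time.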

Additional Results

  • 4 physical cores (3.4GHz) (8 threads, ~100% Usage) + 4x4 vCores (VMPI): 47 minutes, 1 second elapsed
  • 16 vCores (High Memory) (16 threads, ~90% Usage): 55 minutes, 11 seconds elapsed
  • 16 vCores (Standard) (16 threads, ~80% Usage): 1 hour, 9 minutes, 42 seconds elapsed
  • 10 x 4 vCores (Standard) (VMPI, ~100% Usage): 1 hour, 12 minutes, 3 seconds elapsed
  • 8 vCores (Standard) (8 threads, ~90% Usage): 1 hour, 45 minutes, 19 seconds elapsed
  • 4 vCores (Standard) (4 threads, ~100% Usage): 1 hour, 52 minutes, 10 seconds elapsed

Conclusion

Throwing more cores at the problem does not help. Optimize your maps before compiling!

Disclaimer

These DLLs are based on the corresponding DLLs from the CS:GO SDK (vvis_dll.dll & vrad_dll.dll), which we all know and trust not to destroy our PCs. That said, use them at your own risk. No responsibility is accepted for damage to your computer or games that may occur while using these patched DLLs.