Optimizing DLLs
January 2024
The default project properties in the Source SDK Visual Studio projects are acceptable for many mods, but the resulting DLLs can be made faster with a few small changes that make better use of Microsoft's optimizing C++ compiler. These changes produce code better tuned for newer processors without the need to rewrite anything.
These changes are made by adding extra switches and options to the property pages of your client and server projects. You can find these pages by selecting Project, then Properties, from the menu in the Visual C++ 2003 IDE.
Options
In the project properties page of your client and server, change these settings to optimize your DLL files.
Disable debugging
Set the compiler to use the release configuration instead of the debug configuration. This applies several stock options. Most notably, it defines the NDEBUG macro instead of the DEBUG macro, allowing the preprocessor to remove various debugging aids, such as range checks and assertions, that consume time.
It also switches the runtime library from the debugging version to the optimized release version.
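As a minimal sketch of what NDEBUG changes, the standard assert macro expands to nothing when NDEBUG is defined, so the check below costs nothing in a release build. The names GetAmmo and DebugChecksCompiledIn are hypothetical and do not come from the SDK.

```cpp
#include <cassert>

// Hypothetical helper: returns a table entry, range-checked only in debug builds.
int GetAmmo(const int* ammoTable, int count, int slot)
{
    // When NDEBUG is defined (release configuration), assert expands to
    // nothing, so this comparison is not compiled into the DLL at all.
    assert(slot >= 0 && slot < count);
    (void)count; // avoids an unused-parameter warning when the assert is removed
    return ammoTable[slot];
}

// Reports which variant of the code the preprocessor selected.
bool DebugChecksCompiledIn()
{
#ifdef NDEBUG
    return false; // release: debugging aids stripped
#else
    return true;  // debug: debugging aids active
#endif
}
```

Calling DebugChecksCompiledIn() from each configuration is a quick way to confirm which one a DLL was built with.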
Function inlining
The compiler normally implements a function call by emitting code that saves any state the call could destroy, followed by a branch instruction to the code of the called function.
With inlining, you tell the compiler to place a copy of the called function's body at the call site instead. This has several advantages. Most notably, the body of a trivial function (like many in the C++ standard library) can be much cheaper than the code that handles the function call itself. By copying the function's code to the call site, the compiler skips that overhead.
Note that the resulting code may be slightly larger, but it should usually run faster.
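The following sketch shows the kind of function that benefits. Vec3, Dot and LengthSquared are hypothetical names used only for illustration. In Visual C++ this behaviour is governed by the Inline Function Expansion setting (/Ob1 for functions marked inline, /Ob2 for any suitable function); the inline keyword itself is only a hint.

```cpp
// Hypothetical vector type, for illustration only.
struct Vec3
{
    float x, y, z;
};

// Trivial function: three multiplies and two adds, which is less work
// than saving registers, branching, and returning.
inline float Dot(const Vec3& a, const Vec3& b)
{
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

float LengthSquared(const Vec3& v)
{
    // With inlining enabled, the compiler may replace this call with
    // v.x*v.x + v.y*v.y + v.z*v.z directly, leaving no call overhead.
    return Dot(v, v);
}
```

The source stays the same either way; only the generated code differs, which is why no rewrite is needed to benefit.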
Loop unrolling
Loops allow you to do the same thing over and over again, but they are not free. Each time the loop body ends there has to be code that deals with the loop semantics. That code has a slight overhead.
Loop unrolling aims to trade space for that overhead. The compiler will simply replace the loop with the loop body repeated the requested number of times.
Of course, this makes the resulting code larger, but that is the trade-off for avoiding the overhead of the loop semantics.
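To make the trade-off concrete, here is a hand-written sketch of roughly what the compiler does when it unrolls a loop by four. SumRolled and SumUnrolled are hypothetical names; in practice you would keep the first form and let the optimizer produce something like the second.

```cpp
// Straightforward loop: one compare-and-branch per element.
int SumRolled(const int* values, int count)
{
    int total = 0;
    for (int i = 0; i < count; ++i)
        total += values[i];
    return total;
}

// Unrolled by four: one compare-and-branch per four elements,
// at the cost of a larger function body.
int SumUnrolled(const int* values, int count)
{
    int total = 0;
    int i = 0;
    for (; i + 3 < count; i += 4)
    {
        total += values[i];
        total += values[i + 1];
        total += values[i + 2];
        total += values[i + 3];
    }
    // Handle the 0 to 3 elements left over when count is not a multiple of 4.
    for (; i < count; ++i)
        total += values[i];
    return total;
}
```

Both functions return the same result for any input; the second simply executes fewer loop-control instructions per element.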
Target a modern CPU version
- Optimization -> Optimize for Processor - set to Pentium 4 and Above (/G7)
The /G7 flag produces code optimized for Pentium 4 processors and above, which in some cases can lead to a 10% improvement in execution speed. The code will still run on older processors, just not as fast.
- Code Generation -> Enable Enhanced Instruction Set - set to Streaming SIMD Extensions (/arch:SSE)
The /arch flag enables the use of instructions found on processors that support enhanced instruction sets, e.g., the SSE and SSE2 extensions of Intel 32-bit processors.
Note: Be careful with this setting, as it will prevent the code from running on processors that don't support these extensions.
You can enable SSE (Streaming SIMD Extensions) optimizations because the minimum requirement of HL2 is a 1.2 GHz CPU, which in practice means a Pentium 4 or AMD Athlon processor that supports this technology.
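With /arch:SSE the compiler may emit SSE instructions on its own, so no source changes are required. Purely to illustrate what such an instruction does, the sketch below adds four floats at a time using the SSE intrinsics from xmmintrin.h, with a scalar fallback. AddArrays and MOD_HAS_SSE are hypothetical names, and the _M_IX86_FP check assumes a compiler version that defines that macro, which is not confirmed here for Visual C++ 2003.

```cpp
#include <cstddef>

// Assumption: these predefined macros indicate SSE availability on the
// compilers that define them; otherwise the scalar path is used.
#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#include <xmmintrin.h>
#define MOD_HAS_SSE 1
#else
#define MOD_HAS_SSE 0
#endif

// Adds a[i] + b[i] into out[i] for every element.
void AddArrays(const float* a, const float* b, float* out, std::size_t count)
{
    std::size_t i = 0;
#if MOD_HAS_SSE
    // One SSE add processes four floats per instruction.
    for (; i + 4 <= count; i += 4)
    {
        __m128 va = _mm_loadu_ps(a + i); // unaligned load of 4 floats
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(out + i, _mm_add_ps(va, vb));
    }
#endif
    // Scalar path: leftover elements, or everything when SSE is unavailable.
    for (; i < count; ++i)
        out[i] = a[i] + b[i];
}
```

The SSE path is exactly the code that would fail on a processor without these extensions, which is the risk the note above warns about.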
Linker
- Optimization -> Enable COMDAT Folding - set to Remove Redundant COMDATs (/OPT:ICF)
/OPT:ICF removes redundant COMDATs from the linker output by folding those with identical contents into a single copy.
- Optimization -> Optimize for Windows98 - set to No (/OPT:NOWIN98)
/OPT:NOWIN98 controls the section alignment in the final image. You should use this switch only when building DLLs for Windows NT, Windows 2000, Windows XP or later.
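As a rough illustration of where redundant COMDATs come from, each template instantiation is emitted as its own COMDAT even when the generated machine code is identical. The names below are hypothetical, and whether the two instantiations actually fold depends on the compiler and linker output.

```cpp
// Hypothetical helper: each instantiation becomes a separate COMDAT.
template <typename T>
T ClampToRange(T value, T low, T high)
{
    if (value < low)  return low;
    if (value > high) return high;
    return value;
}

// On 32-bit Windows, int and long are both 32 bits wide, so these two
// instantiations typically compile to the same instructions. With
// /OPT:ICF the linker may keep one copy and point both names at it.
int  ClampInt(int v)   { return ClampToRange<int>(v, 0, 100); }
long ClampLong(long v) { return ClampToRange<long>(v, 0L, 100L); }
```

One side effect worth knowing about: after folding, two distinct functions can share an address, so code that compares function pointers for identity may behave differently.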
Whole Program Optimization / Link time code generation
- Whole Program Optimization - set to Yes.
Specifies that the program will be optimized across .obj boundaries.
Whole program optimization makes each build take longer, since the compiler effectively processes all the files together on every build. In exchange, the compiler can optimize across compilation units, which results in better-performing output.
Profile guided optimization (PGO)
The compiler normally has to guess a lot about how the code will run as it compiles. With PGO, you run the program and capture actual usage statistics that you can feed back to the compiler, so that it works from real behaviour instead of guesses. This can allow the compiler to emit hints for the CPU about what is likely and what is unlikely, letting the CPU use its own optimizations more efficiently.
This is difficult to automate, since it involves actually playing the game and ensuring that the various parts of the code get executed.
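To give a feel for what "usage statistics" means, the sketch below counts by hand how often a branch is taken during a simulated session. This is only an imitation of the idea, not how the compiler's instrumentation actually works, and all names (BranchProfile, ComputeDamage, RunSession) are hypothetical.

```cpp
// Hand-written stand-in for the counters an instrumented build would keep.
struct BranchProfile
{
    unsigned long taken;
    unsigned long notTaken;
};

static BranchProfile g_headshotBranch = { 0, 0 };

int ComputeDamage(int baseDamage, bool headshot)
{
    if (headshot)
    {
        ++g_headshotBranch.taken;    // rare path in this workload
        return baseDamage * 3;
    }
    ++g_headshotBranch.notTaken;     // common path in this workload
    return baseDamage;
}

// Simulated play session: every tenth shot is a headshot.
int RunSession(int shots)
{
    int total = 0;
    for (int i = 0; i < shots; ++i)
        total += ComputeDamage(10, i % 10 == 0);
    return total;
}
```

After a session, the counters show that the headshot branch is the unlikely one. A profile of this kind is only as good as the session behind it, which is why the code paths that matter must actually be exercised while recording.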