This article relates to the game "Dota 2".
This article relates to the SDK/Workshop Tools for "Dota 2 Workshop Tools".
This article's documentation is for Source 2.

Particle Performance

From Valve Developer Community

Max Particles

By default, each system is set to 1000 max particles. That much memory is allocated for the system whether or not it is used. After setting up your system, check the particle counter to see how many particles it actually uses, then set max particles to that number. This keeps memory usage to a sane amount.
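As a rough illustration of why trimming the limit matters, the sketch below estimates how much reserved memory goes unused. The per-particle size is a placeholder chosen only for the arithmetic, not an engine figure, and `wasted_bytes` is a hypothetical helper rather than anything in the tools.

```python
def wasted_bytes(max_particles: int, peak_particles: int, bytes_per_particle: int) -> int:
    """Memory reserved but never used, assuming a fixed size per particle."""
    if peak_particles > max_particles:
        raise ValueError("peak count exceeds the configured maximum")
    return (max_particles - peak_particles) * bytes_per_particle


# ASSUMPTION: 64 bytes per particle is a made-up value for illustration only.
ASSUMED_BYTES_PER_PARTICLE = 64

# A system whose counter peaks at 120 particles, with the limit left at 1000.
untuned_waste = wasted_bytes(1000, 120, ASSUMED_BYTES_PER_PARTICLE)

# The same system after setting max particles to the observed peak.
tuned_waste = wasted_bytes(120, 120, ASSUMED_BYTES_PER_PARTICLE)
```

Whatever the real per-particle size is, the unused portion scales with the gap between the limit and the observed peak, which is why matching the two is worthwhile.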

Threading

Particles are multithreaded by system, so if you have multiple systems, they'll be distributed over as many threads as are available. On its face, that makes it attractive to split complex systems into multiple systems. That said, each system carries its own overhead, so the benefit of this approach is limited, especially for simpler systems. Splitting up a system that only has a few dozen particles will almost certainly be a net loss. However, if your system has particles in the thousand+ range, it's worth treating it as multiple lower-count particle systems that can be multithreaded. A system and its children are always on the same thread, because parents and children can pass data to each other and therefore need to be grouped together.
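To make the grouping rule concrete, here is a minimal sketch of how families of systems (a parent plus all of its children) might be spread over threads. It is not the engine's scheduler: the least-loaded-thread heuristic, the function names, and the example system names and counts are all assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, Optional


def root_of(name: str, parent_of: Dict[str, Optional[str]]) -> str:
    """Walk up the parent links until a system with no parent is reached."""
    while parent_of[name] is not None:
        name = parent_of[name]
    return name


def assign_families(parent_of: Dict[str, Optional[str]],
                    particle_counts: Dict[str, int],
                    thread_count: int) -> Dict[str, int]:
    """Give each family to the least-loaded thread, largest family first."""
    families = defaultdict(list)
    for name in parent_of:
        families[root_of(name, parent_of)].append(name)

    def family_size(members):
        return sum(particle_counts[m] for m in members)

    loads = [0] * thread_count
    assignment = {}
    for members in sorted(families.values(), key=family_size, reverse=True):
        thread = loads.index(min(loads))
        loads[thread] += family_size(members)
        for member in members:
            # Parent and children always land on the same thread.
            assignment[member] = thread
    return assignment


# Hypothetical systems: "fire_sparks" is a child of "fire".
parent_of = {"fire": None, "fire_sparks": "fire", "smoke": None, "dust": None}
particle_counts = {"fire": 400, "fire_sparks": 200, "smoke": 500, "dust": 50}
assignment = assign_families(parent_of, particle_counts, thread_count=2)
```

In this example the `fire` family stays together on one thread while `smoke` and `dust` share the other, which is the only kind of parallelism that splitting a large system can buy.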

SIMD

Right now, most particle operators, initializers, etc. work in SIMD. On current hardware, this means they generally do all math on groups of four particles simultaneously. In theory, this will scale up to wider groups on other kinds of hardware in the future. The point is that if you stick to multiples of four, you'll make better use of the system. Scaling a system down to the closest multiple of four will make it slightly more optimal. This isn't a huge gain, but it's something to be aware of.
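The multiple-of-four advice comes down to simple arithmetic, sketched below. The helper names are hypothetical, and the group width of four is taken from the description above rather than queried from any hardware.

```python
SIMD_WIDTH = 4  # group size described above for current hardware


def simd_groups(count: int, width: int = SIMD_WIDTH) -> int:
    """Number of SIMD groups needed to process `count` particles."""
    return -(-count // width)  # ceiling division


def idle_lanes(count: int, width: int = SIMD_WIDTH) -> int:
    """Lanes in the last group that do work on no particle."""
    return (-count) % width


def round_down_to_simd(count: int, width: int = SIMD_WIDTH) -> int:
    """Closest multiple of `width` at or below `count` (small counts unchanged)."""
    if count < width:
        return count
    return count - (count % width)


peak = 122
groups_before = simd_groups(peak)    # 31 groups, the last one half empty
idle_before = idle_lanes(peak)       # 2 idle lanes
tuned = round_down_to_simd(peak)     # 120
idle_after = idle_lanes(tuned)       # 0 idle lanes
```

Dropping from 122 to 120 particles removes one whole group of work, which matches the "slightly more optimal" scale of the gain.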

Sharing data

There are a variety of ways to share data between parent and child systems. Some of the more complex initializers or operators can minimize their performance impact by writing their results out to a control point. Children can then read that control point and use the same data without having to repeat any of the work.
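The pattern is essentially compute once, read many times. The sketch below models it with a plain dictionary standing in for control points; the class, the control point index, and the returned vector are hypothetical and do not correspond to any actual particle operator.

```python
from typing import Dict, List, Tuple

Vec3 = Tuple[float, float, float]


class ExpensiveOperator:
    """Stand-in for a costly operator; counts how often it actually runs."""

    def __init__(self) -> None:
        self.runs = 0

    def compute(self) -> Vec3:
        self.runs += 1
        return (1.0, 2.0, 3.0)  # placeholder result


# ASSUMPTION: control point 5 is an arbitrary index chosen for this example.
SHARED_CONTROL_POINT = 5

control_points: Dict[int, Vec3] = {}
operator = ExpensiveOperator()

# The parent does the work once and publishes the result.
control_points[SHARED_CONTROL_POINT] = operator.compute()

# Each child only reads the published value instead of recomputing it.
child_inputs: List[Vec3] = [control_points[SHARED_CONTROL_POINT] for _ in range(3)]
```

Three children consume the value, yet the expensive step ran only once.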

Collision

In general, collision is an expensive operation. The default, mode 0, performs a trace for each particle every frame, which is costly and doesn't scale well. However, the Collision constraint offers a few other collision modes that are much faster, trading accuracy for speed. Collision Mode 3 is the best accuracy/performance tradeoff for dynamic collisions with particles in unpredictable locations/movements.