User:Braindawg/performance

From Valve Developer Community
Revision as of 12:02, 22 October 2025 by Braindawg

This page includes tips and tricks for optimizing VScript performance. All of these performance tests were done in Team Fortress 2, and many of the tips also apply to other Source 2013-based titles. Your mileage may vary in VScript-supported games released prior to the SDK update (Left 4 Dead 2, Portal 2, Alien Swarm).

Benchmark figures come from this benchmarking tool.

Warning: Only optimize your scripts if you need to! Some of these tips may introduce unnecessary complexity to your projects. Premature optimization without knowing where your performance issues actually come from is extremely ill-advised.
Note: The built-in performance counter for VScript has a lot of "noise" and depends heavily on whatever else is executing on the main game server thread. Memory speed, CPU speed, and even file I/O will greatly impact your results and the variance between them, even on repeated runs of the same code. The numbers shown here are averages and "ballpark" figures taken from 5 or more repeated runs. A third-party benchmarking tool does exist, but unfortunately there is no convenient download link for it. Look around mapping/content creation Discord servers for it.

Folding

Functions

Folding functions in the context of VScript means copying a reference to them out of their original table (such as NetProps) into the root table or a local variable. This only needs to be done once on script load, and is recommended for functions that are commonly used.
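As a minimal sketch of the idea in plain Squirrel: MyLib, prefix and Describe are hypothetical stand-ins for a native table such as NetProps and one of its functions, not real VScript names. It shows the same function folded once into the root table and once into a local.

```squirrel
// Hypothetical stand-in for a native table such as NetProps.
local MyLib = {
    prefix = "lib:",
    function Describe( name ) { return prefix + name }
}

// Fold once at script load. bindenv() keeps "this" pointing at MyLib,
// so the folded function can still read MyLib.prefix.
::Describe <- MyLib.Describe.bindenv( MyLib )            // root table fold
local DescribeLocal = MyLib.Describe.bindenv( MyLib )    // local fold

// Later calls no longer go through the MyLib table.
print( Describe( "door" ) + "\n" )        // lib:door
print( DescribeLocal( "door" ) + "\n" )   // lib:door
```

A root table fold is visible to every script, while a local fold is only visible in the file (or function) that declares it.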

Benchmark

Note: Benchmark done on mvm_bigrock
/***********************************************************************************************************
 * FOLDING:                                                                                                *
 * Folding functions from their original scope into local/root scope is noticeably faster (~15-30%)        *
 * skips extra lookup instructions, also less verbose                                                      *
 ***********************************************************************************************************/
local GetPropString = NetProps.GetPropString.bindenv( NetProps )
local GetPropBool = NetProps.GetPropBool.bindenv( NetProps )
const MAX_EDICTS = 2048

function Benchmark::Unfolded() {

    for ( local i = 0, ent; i < Constants.Server.MAX_EDICTS; ent = EntIndexToHScript( i ), i++ ) {

        if ( ent ) {

            NetProps.GetPropString( ent, "m_iName" )
            NetProps.GetPropString( ent, "m_iClassname" )
            NetProps.GetPropBool( ent, "m_bForcePurgeFixedupStrings" )
        }
    }
}

// 20% faster, maybe more
function Benchmark::Folded() {

    for ( local i = 0, ent; i < MAX_EDICTS; ent = EntIndexToHScript( i ), i++ ) {

        if ( ent ) {

            GetPropString( ent, "m_iName" )
            GetPropString( ent, "m_iClassname" )
            GetPropBool( ent, "m_bForcePurgeFixedupStrings" )
        }
    }
}

Result:

Configuration Results
Unfolded 1.76ms
Folded 1.32ms

Constants

Similar to folding functions, folding pre-defined Constant values into the constant table (or the root table) increases performance significantly.

Benchmark

local ROOT = getroottable()
local _CONST = getconsttable()

// fold every pre-defined constant into the const table
if ( !( "ConstantNamingConvention" in ROOT ) )
    foreach( a, b in Constants )
        foreach( k, v in b )
            _CONST[k] <- v != null ? v : 0

setconsttable(_CONST)

function Benchmark::UnfoldedConst() {

    for (local i = 1; i <= Constants.Server.MAX_EDICTS; i++)
        local temp = i
}

function Benchmark::FoldedConst() {

    for (local i = 1; i <= MAX_EDICTS; i++)
        local temp = i
}

Result:

Configuration Results
Unfolded 0.356ms
Folded 0.033ms

Root table vs Constant table

Unlike values inserted into the root table, values inserted into the constant table are resolved when a script is compiled rather than when it runs. This means that if you intend to use constants (const keyword or getconsttable().foo <- "bar"), you must define them before any script that uses them is executed, otherwise that script will not be able to read them.
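The ordering requirement can be sketched in plain Squirrel, with compilestring() standing in for "another script being loaded". MY_LIMIT and LATE_LIMIT are hypothetical names, and the sketch assumes stock Squirrel 3 behavior, where the compiler consults the constant table while compiling.

```squirrel
// Defined BEFORE the snippet is compiled: the compiler finds MY_LIMIT
// in the constant table and bakes the value 64 into the bytecode.
getconsttable().MY_LIMIT <- 64
local early = compilestring( "return MY_LIMIT" )
print( early() + "\n" )

// Compiled BEFORE the constant exists: LATE_LIMIT compiles to an ordinary
// slot lookup, which still fails after the constant is added.
local late = compilestring( "return LATE_LIMIT" )
getconsttable().LATE_LIMIT <- 128

local late_failed = false
try { late() } catch ( e ) { late_failed = true }
print( late_failed ? "late lookup failed\n" : "late lookup worked\n" )
```

In practice this means constant definitions belong in whichever script is guaranteed to run first.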

Benchmark

::SomeGlobalVar <- 0x7FFFFFFF
const GLOBAL_VAR = 0x7FFFFFFF

function Benchmark::RootSetLookup() {

    for (local i = 1; i <= 10000; i++)
        local temp = ::SomeGlobalVar
}

// ~20-40% faster
function Benchmark::ConstSetLookup() {

    for (local i = 1; i <= 10000; i++)
        local temp = GLOBAL_VAR
}

Result:

Configuration Results
Root 0.267ms
Const 0.154ms

Strings

Formatting

Squirrel supports two main ways to format strings: concatenation using the + operator, and the format() function.

For large amounts of formatting, format() is significantly faster than concatenation. For fewer than three concatenations, however, format() is slower.

Tip: To format entity handles and functions, call .tostring() on them and format the result as a string (%s).
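A short sketch of both styles producing the same string, followed by the .tostring() approach from the tip. The values are hypothetical, and a plain function stands in for an entity handle so the snippet runs outside the game.

```squirrel
local name = "heavy"     // hypothetical values
local health = 300

// Concatenation: every + builds a new intermediate string.
local concat = "Player " + name + " has " + health + " health"

// format(): one call, one result string.
local formatted = format( "Player %s has %d health", name, health )

print( ( concat == formatted ? "identical" : "different" ) + "\n" )   // identical

// %s expects a string, so convert functions (and handles) with .tostring() first.
local callback = function() {}
local described = format( "callback: %s", callback.tostring() )
print( described + "\n" )
```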

ToKVString

The ToKVString() VScript function takes a Vector/QAngle and formats its values into a string. For example, Vector(0, 0, 0).ToKVString() returns "0 0 0".

On top of being less cumbersome to write, ToKVString() is marginally faster than format().

However, when combining multiple ToKVString() outputs into a new string, concatenation is faster due to fewer function calls.

Benchmark

local mins = Vector(-1, -2, -3)
local maxs = Vector(1, 2, 3)
local kvstring = ""

function Benchmark::StringConcat() {

    for ( local i = 0; i < 10000; i++ )
        kvstring = mins.x + "," + mins.y + "," + mins.z + "," + maxs.x + "," + maxs.y + "," + maxs.z
}

function Benchmark::StringFormat() {

    for ( local i = 0; i < 10000; i++ )
        kvstring = format("%g,%g,%g,%g,%g,%g", mins.x, mins.y, mins.z, maxs.x, maxs.y, maxs.z)
}

function Benchmark::KVStringFormat() {

    for (local i = 0; i < 10000; i++ )
        kvstring = format("%s %s", mins.ToKVString(), maxs.ToKVString())
}

function Benchmark::KVStringConcat() {

    for (local i = 0; i < 10000; i++ )
        kvstring = mins.ToKVString() + " " + maxs.ToKVString()
}

Result:

Configuration Results
StringConcat 35.0847ms
StringFormat 23.0143ms
KVStringFormat 19.9377ms
KVStringConcat 18.3142ms

Character Comparisons

Strings in Squirrel, as in many C-style languages, are just arrays of characters, and characters are just integers in disguise (their ASCII codes). This means that for simple comparisons (e.g. only checking the first character for chat commands) you can index directly into the string to get a character, then do a very simple (and fast) integer comparison instead of unnecessary function calls and string comparisons. Note that character literals are written with single quotes ('a') rather than double quotes.
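For the chat command case mentioned above, a minimal sketch might look like the following. IsChatCommand is a hypothetical helper name, and '!' is assumed to be the command prefix.

```squirrel
// Hypothetical helper: true when a chat message starts with '!'.
function IsChatCommand( text ) {
    // ( 0 in text ) guards against empty strings before indexing;
    // text[0] is an integer, so this is a plain integer comparison.
    return ( 0 in text ) && text[0] == '!'
}

print( ( IsChatCommand( "!help" ) ? "command" : "chat" ) + "\n" )   // command
print( ( IsChatCommand( "hello" ) ? "command" : "chat" ) + "\n" )   // chat
```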

Benchmark

local map_name = GetMapName()

function Benchmark::StartsWith() {

    for ( local i = 0; i < 10000; i++ )
        if ( startswith( map_name, "workshop/" ) )
            local test = true
}

function Benchmark::CharCompare() {
    
    // workshop loaded maps all have the "workshop/" prefix, meaning '/' is always the 9th character
    // arrays are 0-indexed, so the 9th character would be map_name[8] (9 - 1)
    for ( local i = 0; i < 10000; i++ )
        if ( 8 in map_name && map_name[8] == '/' )
            local test = true
}

Result:

Configuration Results
StartsWith 1.87ms
CharCompare 0.492ms

Spawning Entities

In VScript, there are four common ways to spawn entities:

  1. CreateByClassname + DispatchSpawn
  2. SpawnEntityFromTable
  3. SpawnEntityGroupFromTable
  4. point_script_template entity + AddTemplate

CreateByClassname + DispatchSpawn vs SpawnEntityFromTable

In general, performance is not a major concern when spawning entities. In special circumstances though, you may need to spawn and kill a temporary entity in an already expensive function. A notable example of an entity that would need this is trigger_stun. This entity will not attempt to re-stun the same player multiple times, so it is not possible to spawn a single entity and repeatedly fire StartTouch/EndTouch on the same target.

In situations like this, CreateByClassname + DispatchSpawn is noticeably faster than SpawnEntityFromTable (roughly 2.7x in the results below).

Benchmark

local CreateByClassname = Entities.CreateByClassname.bindenv( Entities )
local SetPropBool = NetProps.SetPropBool.bindenv( NetProps )
local SetPropString = NetProps.SetPropString.bindenv( NetProps )
local DispatchSpawn = Entities.DispatchSpawn.bindenv( Entities )

// anywhere from 15-30% faster for single entity spawning
// The table passed to SpawnEntityFromTable needs to be interpreted and converted to something C++ can understand
// meanwhile CreateByClassname/netprop/keyvaluefromstring are simple 1:1 C++ bindings
function Benchmark::ByClassname() {

    for (local i = 0; i < 100; i++) {

        local ent = CreateByClassname( "logic_relay" )
        DispatchSpawn( ent )
        SetPropString( ent, "m_iName", "__relay" )
    }
}

function Benchmark::FromTable() {

    for (local i = 0; i < 100; i++) {

        SpawnEntityFromTable( "logic_relay", { targetname = "__relay" } )
    }
}

Result:

Configuration Results
FromTable 0.0428ms
ByClassname 0.0156ms

SpawnEntityGroupFromTable vs point_script_template

When spawning multiple entities at the same time, it is more efficient to use either SpawnEntityGroupFromTable or a point_script_template entity. These options also have the added benefit of respecting parent hierarchy, so the parentname keyvalue works as intended.

point_script_template is both more flexible and faster. SpawnEntityGroupFromTable has several major limitations in comparison to point_script_template, and is generally not recommended. See the VScript documentation for more details on how to use point_script_template.

Benchmark

function Benchmark::EntityGroupFromTable() {

    // spawn origins are right outside of bigrock spawn
    SpawnEntityGroupFromTable({
        [0] = {
            func_rotating =
            {
                message = "hl1/ambience/labdrone2.wav",
                volume = 8,
                responsecontext = "-1 -1 -1 1 1 1",
                targetname = "crystal_spin",
                vscripts = "rotatefix", // see func_rotating vdc page for this
                spawnflags = 65,
                solidbsp = 0,
                rendermode = 10,
                rendercolor = "255 255 255",
                renderamt = 255,
                maxspeed = 48,
                fanfriction = 20,
                origin = Vector(278.900513, -2033.692993, 516.067200),
            }
        },
        [2] = {
            tf_glow =
            {
                targetname = "crystalglow",
                parentname = "crystal",
                target = "crystal",
                Mode = 2,
                origin = Vector(278.900513, -2033.692993, 516.067200),
                GlowColor = "0 78 255 255"
            }
        },
        [3] = {
            prop_dynamic =
            {
                targetname = "crystal",
                solid = 6,
                renderfx = 15,
                rendercolor = "255 255 255",
                renderamt = 255,
                physdamagescale = 1.0,
                parentname = "crystal_spin",
                modelscale = 1.3,
                model = "models/props_moonbase/moon_gravel_crystal_blue.mdl",
                MinAnimTime = 5,
                MaxAnimTime = 10,
                fadescale = 1.0,
                fademindist = -1.0,
                origin = Vector(278.900513, -2033.692993, 516.067200),
                angles = QAngle(45, 0, 0)
            }
        },
    })
}

// ~15-25% faster for batch entity spawning
function Benchmark::PointScriptTemplate() {

    local script_template = Entities.CreateByClassname("point_script_template")

    script_template.AddTemplate("func_rotating", {
        message = "hl1/ambience/labdrone2.wav",
        volume = 8,
        targetname = "crystal_spin2",
        spawnflags = 65,
        solidbsp = 0,
        rendermode = 10,
        rendercolor = "255 255 255",
        vscripts = "rotatefix",
        renderamt = 255,
        maxspeed = 48,
        fanfriction = 20,
        origin = Vector(175.907211, -2188.908691, 516.031311),
    })

    script_template.AddTemplate("tf_glow", {
        target = "crystal2",
        Mode = 2,
        origin = Vector(175.907211, -2188.908691, 516.031311),
        GlowColor = "0 78 255 255"
    })

    script_template.AddTemplate("prop_dynamic", {
        targetname = "crystal2",
        solid = 6,
        renderfx = 15,
        rendercolor = "255 255 255",
        renderamt = 255,
        physdamagescale = 1.0,
        parentname = "crystal_spin2",
        modelscale = 1.3,
        model = "models/props_moonbase/moon_gravel_crystal_blue.mdl",
        MinAnimTime = 5,
        MaxAnimTime = 10,
        fadescale = 1.0,
        fademindist = -1.0,
        origin = Vector(175.907211, -2188.908691, 516.031311),
        angles = QAngle(45, 0, 0)
    })

    script_template.AcceptInput( "ForceSpawn", null, null, null )
}

Result:

Configuration Results
SpawnEntityGroupFromTable 0.72ms
PointScriptTemplate 0.61ms

Iterating through players

When iterating over all players in the map, it is generally not recommended to use FindByClassname on the player entity in high playercount environments (>8-12 players). Iterating over the first MaxClients number of entindexes and grabbing the player from PlayerInstanceFromIndex(i) is notably faster and not much more complex to write in these circumstances.

The performance of player iteration depends heavily on how many players are actively in the server. In low playercount environments, the PlayerInstanceFromIndex approach is slower due to extra unnecessary iterations. In high playercount environments, FindByClassname runs a more expensive loop on every entity in the map to find players.

If you want the fastest option at the cost of complexity, you should collect player entities in your own global table or array in an event such as player_team or player_activate, remove them on player_disconnect, then iterate over that when necessary. Using a table gives you the added bonus of having a cache of player user IDs, which is faster to look up compared to reading the player_manager netprop.

Warning: player_activate does not fire for TFBots!

Benchmark

The first script must be executed before the second one!

Todo: update benchmarks
::ALL_PLAYERS <- {}
::Events <- {
    function OnGameEvent_player_team(params)
    {
        local player = GetPlayerFromUserID(params.userid)

        // GetPlayerFromUserID can return null; skip invalid or already-tracked players
        if ( !player || ( player in ALL_PLAYERS ) ) return
        ALL_PLAYERS[ player ] <- params.userid
    }
    
    function OnGameEvent_player_disconnect(params) 
    {
        local player = GetPlayerFromUserID(params.userid)
    
        if ( !(player in ALL_PLAYERS) ) return

        delete ALL_PLAYERS[ player ]
    }
}
__CollectGameEventCallbacks(Events)

::maxClients <- MaxClients().tointeger()

for (local player; player = Entities.FindByClassname(player, "player");)
{
    printl(player)
}

for (local i = 1; i <= maxClients; i++)
{
    local player = PlayerInstanceFromIndex(i)
    
    if (!player) continue

    printl(player)
}

foreach(player in ALL_PLAYERS.keys())
{
    printl(player)
}

Result:

Configuration Results
FindByClassname 0.1289ms
Index iteration 0.0856ms
Array/Table iteration 0.0679ms

Squirrel Performance Tips

Arrays and Tables

Arrays in Squirrel behave, in practice, like tables where the index is an integer value.

The .len() function call is relatively expensive. We can avoid this overhead by directly checking the index.

/*****************
 * LENGTH CHECKS *
 *****************/
local arr = array( 1000 ) // 1000-element array used by the checks below

function Benchmark::Len() {

    for ( local i = 0; i < 1000; i++ )
        if ( arr.len() == 1000 )
            local len = true
}

// ~40% faster, no _OP_PREPCALLK/_OP_CALL instructions
function Benchmark::Idx() {

    for ( local i = 0; i < 1000; i++ )
        if ( 999 in arr && !(1000 in arr) )
            local len = true
}

Additionally, the integer 0 evaluates to false in Squirrel. When specifically checking for an empty array, this falsy evaluation is slightly faster than explicitly comparing the length to 0.

/****************************
 * EMPTY ARRAY/TABLE CHECKS *
 ****************************/
function Benchmark::LenExplicit() {

    for ( local i = 0; i < 1000; i++ )
        if ( arr.len() != 0 )
            local len = true
}

// ~2-5% faster, no _OP_NE instruction
function Benchmark::LenFalsy() {
    
    for ( local i = 0; i < 1000; i++ )
        if ( arr.len() )
            local len = true
}

Tables

As shown above, we can circumvent the performance cost of .len() by using direct index look-ups where possible. Instead of using .len() for tables, we can create a helper class with a "length" member, and add to/subtract from it whenever we insert/delete an item from the table.

// direct length index lookups instead of .len() calls.
Benchmark.NewTable <- class {

    _tbl   = null // the real table in our class
    length = 0 // running item count, kept in sync by set() and del()

    constructor( tbl = null ) {  this._tbl = ( tbl || {} ) ; this.length = this._tbl.len() }

    function get(k) { return _tbl[k] }

    function set(k, v) {
        if ( k in _tbl ) { _tbl[k] = v; return } // existing key, count unchanged
        length++
        _tbl[k] <- v
    }

    function del(k) {
        if ( !( k in _tbl ) ) return // a missing key must not change the count
        length--
        delete _tbl[k]
    }
}

local tab = Benchmark.NewTable()
local _tbl = tab._tbl

// insert 1000 items into the table, incrementing the table length each time
for (local i = 0; i < 1000; i++)
{
    tab.set("value_" + i, i )
}

// .len() eval
function Benchmark::Len() {
    for (local i = 0; i < 1000; i++)
        print(_tbl.len() == 1000)
}

// index lookup, ~2-5% faster
function Benchmark::Length() {
    for (local i = 0; i < 1000; i++)
        print(tab.length == 1000)
}

This of course has performance implications of its own, and heavily depends on how often you are reading data from a table vs writing to it. You may only see performance benefits if you are checking table lengths a lot, but writing/reading infrequently.

Benchmark

Configuration Results
Len 0.075ms
Idx 0.046ms
LenExplicit 0.071ms
LenFalsy 0.067ms
Len (table) 9.4ms
Length (table) 9.3ms

Variable look-up and caching

Squirrel will look for variables in the following order:

  1. local variables
  2. "outer" local variables (locals that are in parent scope)
  3. constants
  4. root table

For example, with the script below:

  1. as written, it prints the number 3 (outer local)
  2. commenting out local thing1 = 3 prints 2 (const)
  3. also commenting out const thing1 prints 1 (root)
  4. uncommenting local thing1 = 0 prints 0 (local)
::thing1 <- 1
const thing1 = 2
local thing1 = 3

::GetThing1 <- function() {
    // local thing1 = 0
    return thing1
}

print( GetThing1() )

Traversing scopes to find variables like this will negatively impact performance. It is better to cache variables as locals before expensive loops or inside fast-firing functions (such as think functions).

::SomeGlobalVar <- 0

function Benchmark::_OnDestroy() { delete ::SomeGlobalVar }

function Benchmark::SlowIncrement() 
{
    for (local i = 1; i <= 1000; i++)
        SomeGlobalVar++
}

// 10x faster!?
function Benchmark::FastIncrement()
{
    local myvar = 0

    for (local i = 1; i <= 1000; i++)
        myvar++

    SomeGlobalVar += myvar
}
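The same caching idea applies to values buried inside tables. In this sketch ::Settings, limits, max_health and CountAtCap are all hypothetical names, and no timing is claimed for it. The chained lookup is resolved once before the loop instead of on every iteration.

```squirrel
// Hypothetical global configuration table.
::Settings <- { limits = { max_health = 300 } }

function CountAtCap( healths ) {

    // One chained root/table lookup, done once and kept in a local.
    local cap = ::Settings.limits.max_health
    local count = 0

    foreach ( hp in healths )
        if ( hp >= cap )
            count++

    return count
}

print( CountAtCap( [ 100, 300, 450 ] ) + "\n" )   // 2
```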

Root table lookups

Todo: confusing result. :: generates an extra _OP_LOADROOT instruction, and the in-game counter shows the exact opposite result of the third-party benchmarking tool by a significant degree...?
function Benchmark::NormalLookup() {

    for (local i = 1; i <= 1000; i++)
        SomeGlobalVar++
}

// 10x faster!?
function Benchmark::RootLookup() {

    for (local i = 1; i <= 1000; i++)
        ::SomeGlobalVar++
}

Benchmark

Configuration Results
SlowIncrement 0.591ms
FastIncrement 0.021ms
NormalLookup 0.584ms
RootLookup 0.058ms

Conditional operators

Squirrel supports the && / || operators for condensing your if-statements and other expressions into less code. However, there is a performance consideration to using these operators instead of "unrolling" your statements into multiple lines.

Whenever these operators are used, Squirrel places the result of the expression on the stack, so you can use them for something like this:

// this function can return a table OR null
function GetTable( input ) {
   return input > 5 ? null : { blah = input }
}
// make sure we always have a table for this variable, even if the function returns null
local mytable = GetTable( 10 ) || {}

If you are using these operators in an if-statement, however, Squirrel still "creates" this variable that you have no use for, then sends it straight to the garbage collector. if-statements with no conditional operators do simple jump instructions instead, and short-circuiting a long else/if chain does not do any stack assignments.

If evaluation has to go through every branch of the else/if chain, however, it will be slower than the conditional operators, because the VM needs to churn through more bytecode by comparison.

In summary: unroll your conditional operators into else/ifs in situations where the condition fails more often than it passes, or only passes in situations where performance is not a concern (pre-round), and condense your long else/if chains into conditional operators for the opposite scenario.
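As an illustration of what "unrolling" looks like for an && condition, here is a sketch with hypothetical function names and a plain table standing in for a player. Both functions return the same result; which one is faster depends on how often the early checks fail, as described above.

```squirrel
// Condensed: one if-statement using && (hypothetical example).
function IsLowHealthEnemy_Condensed( p ) {

    if ( p.alive && p.team == 3 && p.health < 50 )
        return true

    return false
}

// Unrolled: each check is its own if-statement and bails out early.
function IsLowHealthEnemy_Unrolled( p ) {

    if ( !p.alive ) return false
    if ( p.team != 3 ) return false
    if ( p.health >= 50 ) return false

    return true
}

local target = { alive = true, team = 3, health = 25 }   // stand-in for a player
print( ( IsLowHealthEnemy_Condensed( target ) ? "yes" : "no" ) + "\n" )   // yes
print( ( IsLowHealthEnemy_Unrolled( target ) ? "yes" : "no" ) + "\n" )    // yes
```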

Benchmark

/************************************************************************************************************************************
 * CONDITIONAL TESTING:                                                                                                             *
 * Squirrel's bytecode compiler is not smart enough to replace conditionals in boolean expressions with simple _OP_JZ instructions. *
 * Instead, it will always output _OP_AND or _OP_OR, which assign an extra variable to the stack.                                   *
 * This means if/else chains are slightly faster to short circuit, as they skip this stack assignment.                              *
 * More if/else chains means more bytecode, so && and || are still faster when the condition needs to go down the whole chain.      *
 * Difference is ~5-15%                                                                                                             *
 ************************************************************************************************************************************/

local a = 1, b = 2, c = 3
blah <- true
test <- false

function Benchmark::ConditionalShortCircuit_OR() {

    test = false

    for (local i = 0; i < 100000; i++) {

        if ( a == 1 || b == 2 || c == 3 )
            test <- true

        blah <- test

    }
}

// ~5-15% faster to short circuit
function Benchmark::IfElseShortCircuit_OR() {

    test = false

    for (local i = 0; i < 100000; i++) {

        if ( a == 1 )
            test <- true
        else if ( b == 2 )
            test <- true
        else if ( c == 3 )
            test <- true

        blah <- test

    }
}

Result:

Configuration Results
ConditionalShortCircuit_OR 9.58ms
IfElseShortCircuit_OR 9.49ms