User:Braindawg/performance

From Valve Developer Community

This page includes tips and tricks for optimizing VScript performance. While some of these tricks can be transferred to other supported titles (such as Left 4 Dead 2), they were all tested in Team Fortress 2. Your mileage may vary in other games.

In most practical use cases, performance is a non-issue when working with VScript. However, there are times when a script is simply too slow to work effectively, or when you are getting many performance warnings in the console. There are many ways to speed up your scripts, and many of them are simple to implement.

Warning: Only optimize your scripts if you need to! Some of these tips may introduce extra unnecessary complexity to your project. Premature optimization without knowing where your performance issues actually come from is extremely ill-advised.
Note: Benchmarks were done using a benchmarking tool that is not publicly released. Set vscript_perf_warning_spew_ms to 0 and run these scripts separately if you would like to benchmark them yourself.

VScript Performance Tips

This section covers performance tips specific to the VScript API (specifically Team Fortress 2's).

Folding Functions

Folding functions in the context of VScript means folding them into the root table. It is recommended that you do this for functions that are commonly used in expensive operations.

Benchmark

Note: Benchmark done on tc_hydro
::ROOT <- getroottable()
foreach(k, v in ::NetProps.getclass())
	if (k != "IsValid" && !(k in ROOT))
		ROOT[k] <- ::NetProps[k].bindenv(::NetProps)

foreach(k, v in ::Entities.getclass())
	if (k != "IsValid" && !(k in ROOT))
		ROOT[k] <- ::Entities[k].bindenv(::Entities)

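// Unfolded: every call goes through the Entities / NetProps instances
for (local prop; prop = Entities.FindByClassname(prop, "prop_dynamic");)
    NetProps.GetPropString(prop, "m_iName")

// Folded: the same functions, looked up directly in the root table
for (local prop; prop = FindByClassname(prop, "prop_dynamic");)
    GetPropString(prop, "m_iName")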

Result:

Configuration Results
Unfolded 0.1439ms
Folded 0.0999ms

Constants

Folding Constants

Similar to folding functions, folding pre-defined Constant values into the constant table (or the root table) increases performance significantly.

Benchmark

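// Unfolded: two table lookups (Constants, then Server) on every iteration
local j = 0

for (local i = 1; i <= Constants.Server.MAX_EDICTS; i++)
    j++

// "Folded": the value is a compile-time constant
const MAX_EDICTS = 2048
local e = 0

for (local i = 1; i <= MAX_EDICTS; i++)
    e++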

Result:

Configuration Results
Unfolded 0.1127ms
"Folded" 0.0423ms

Root table vs Constant table

Unlike values inserted into the root table, values in the constant table are resolved when a script is compiled, not when it runs. Accessing them is therefore faster, but a script file cannot read constants that it inserts into the constant table itself at runtime, because its own references were already compiled before the insertion happened.

If you intend to insert values into the constant table, you must do this before any script that references them is executed, otherwise those scripts will not be able to read the values.
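
The ordering requirement can be sketched as follows. This is a minimal illustration, not a definitive implementation: the guard slot __constants_folded and the constant MY_MAX_EDICTS are hypothetical names, the null check is a defensive assumption, and compilestring() is used only to stand in for "a script compiled after the insertion".

```squirrel
// Hypothetical guard slot so the fold only runs once.
if (!("__constants_folded" in getroottable()))
{
    ::__constants_folded <- true
    local const_table = getconsttable()

    // "Constants" only exists inside the game; skip this part elsewhere.
    if ("Constants" in getroottable())
        foreach (enum_name, enum_table in ::Constants)
            foreach (name, value in enum_table)
                const_table[name] <- value != null ? value : 0 // assumption: replace null values with 0

    // Hypothetical custom constant, inserted at runtime.
    const_table["MY_MAX_EDICTS"] <- 2048
}

// Code compiled AFTER the insertion can resolve the constant.
// Referencing MY_MAX_EDICTS directly in this file would not work,
// because this file was compiled before the insertion ran.
local reader = compilestring("return MY_MAX_EDICTS")
print(reader() + "\n")
```

In practice this kind of fold would live in a file that runs before everything else (such as the mapspawn.nut mentioned later on this page), with the scripts that use the constants included afterwards.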

Benchmark

::ROOT_VALUE <- 2
const CONST_VALUE = 2

// Constant table
for (local i = 0; i <= 10000; i++)
    i += CONST_VALUE

// Root table
for (local i = 0; i <= 10000; i++)
    i += ROOT_VALUE

Result:

Configuration Results
Constant 0.0767ms
Root 0.1037ms

String Formatting

Squirrel supports two main ways to format strings: concatenation using the + operator, and the format() function.

format() does not support formatting entity handles and other VScript-specific datatypes, but it does support strings, integers, and floats. It is also significantly faster than concatenation.
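
As a small illustration of the two approaches, the sketch below builds the same string both ways. The variable names and values are made up for the example.

```squirrel
// Hypothetical values for the example
local name = "sentry"
local level = 3
local health = 216.5

// Concatenation: every + creates an intermediate string
local by_concat = name + " (level " + level + ") has " + health + " health"

// format(): one call, printf-style specifiers (%s string, %d integer, %.1f float)
local by_format = format("%s (level %d) has %.1f health", name, level, health)

print(by_concat + "\n")
print(by_format + "\n")
```

Both lines should print the same text; the format() version also gives explicit control over how the float is rendered.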

ToKVString

The ToKVString() VScript function takes a Vector/QAngle and formats its values into a string. For example, Vector(0, 0, 0).ToKVString() would be "0 0 0".

On top of being less cumbersome to write, ToKVString() is marginally faster than format(). Interestingly though, when formatting multiple ToKVString() outputs into a new string, concatenation may be faster.

Benchmark

local mins = Vector(-1, -2, -3)
local maxs = Vector(1, 2, 3)
local keyvalues = { responsecontext = "-10 -10 -10 10 10 10" }

for (local i = 0; i < 10000; i++)
    keyvalues.responsecontext <- mins.x.tostring() + "," + mins.y.tostring() + "," + mins.z.tostring() + "," + maxs.x.tostring() + "," + maxs.y.tostring() + "," + maxs.z.tostring()

for (local i = 0; i < 10000; i++)
    keyvalues.responsecontext <- format("%g,%g,%g,%g,%g,%g", mins.x, mins.y, mins.z, maxs.x, maxs.y, maxs.z)

for (local i = 0; i < 10000; i++)
    keyvalues.responsecontext <- format("%s %s", mins.ToKVString(), maxs.ToKVString())

for (local i = 0; i < 10000; i++)
    keyvalues.responsecontext <- mins.ToKVString() + " " + maxs.ToKVString()

Result:

Configuration Results
concat 39.0847ms
format 24.0123ms
ToKVString 19.9377ms
ToKVString + concat 18.5166ms

Spawning Entities

In VScript, there are four common ways to spawn entities:

- CreateByClassname + DispatchSpawn

- SpawnEntityFromTable

- SpawnEntityGroupFromTable

- point_script_template entity + AddTemplate

CreateByClassname + DispatchSpawn vs SpawnEntityFromTable

In general, performance is not a major concern when spawning entities. In special circumstances though, you may need to spawn and kill a temporary entity in an already expensive function. A notable example of an entity that would need this is trigger_stun. This entity will not attempt to re-stun the same player multiple times, so it is not possible to spawn a single entity and repeatedly fire StartTouch/EndTouch on the same target.

In situations like this, CreateByClassname + DispatchSpawn can be several times faster than SpawnEntityFromTable (roughly 2.7x in the benchmark below).

Benchmark

// SpawnEntityFromTable
local trigger_stun = SpawnEntityFromTable("trigger_stun",
{
    stun_type = 2,
    stun_effects = true,
    stun_duration = 3,
    move_speed_reduction = 0.1,
    trigger_delay = 0.1,
    spawnflags = 1,
})

// CreateByClassname + DispatchSpawn
trigger_stun = Entities.CreateByClassname("trigger_stun")
trigger_stun.KeyValueFromInt("stun_type", 2)
trigger_stun.KeyValueFromInt("stun_effects", 1)
trigger_stun.KeyValueFromFloat("stun_duration", 3.0)
trigger_stun.KeyValueFromFloat("move_speed_reduction", 0.1)
trigger_stun.KeyValueFromFloat("trigger_delay", 0.1)
trigger_stun.KeyValueFromInt("spawnflags", 1)
DispatchSpawn(trigger_stun)

Result:

Configuration Results
SpawnEntityFromTable 0.0428ms
CreateByClassname 0.0156ms

SpawnEntityGroupFromTable vs point_script_template

When spawning multiple entities at the same time, it is more efficient to use either SpawnEntityGroupFromTable or a point_script_template entity. These options also have the added benefit of respecting parent hierarchy, so the parentname keyvalue works as intended.

point_script_template is both more flexible and faster. SpawnEntityGroupFromTable has several major limitations in comparison to point_script_template, and is generally not recommended. See the VScript documentation for more details on how to use point_script_template.

Benchmark

//spawn origins are right outside of bigrock spawn
SpawnEntityGroupFromTable({
    [0] = {
        func_rotating =
        {
            message = "hl1/ambience/labdrone2.wav",
            volume = 8,
            responsecontext = "-1 -1 -1 1 1 1",
            targetname = "crystal_spin",
            spawnflags = 65,
            solidbsp = 0,
            rendermode = 10,
            rendercolor = "255 255 255",
            renderamt = 255,
            maxspeed = 48,
            fanfriction = 20,
            origin = Vector(278.900513, -2033.692993, 516.067200),
        }
    },
    [2] = {
        tf_glow =
        {
            targetname = "crystalglow",
            parentname = "crystal",
            target = "crystal",
            Mode = 2,
            origin = Vector(278.900513, -2033.692993, 516.067200),
            GlowColor = "0 78 255 255"
        }
    },
    [3] = {
        prop_dynamic =
        {
            targetname = "crystal",
            solid = 6,
            renderfx = 15,
            rendercolor = "255 255 255",
            renderamt = 255,
            physdamagescale = 1.0,
            parentname = "crystal_spin",
            modelscale = 1.3,
            model = "models/props_moonbase/moon_gravel_crystal_blue.mdl",
            MinAnimTime = 5,
            MaxAnimTime = 10,
            fadescale = 1.0,
            fademindist = -1.0,
            origin = Vector(278.900513, -2033.692993, 516.067200),
            angles = QAngle(45, 0, 0)
        }
    },
})

local script_template = Entities.CreateByClassname("point_script_template")

script_template.AddTemplate("func_rotating", {
    message = "hl1/ambience/labdrone2.wav",
    volume = 8,
    targetname = "crystal_spin2",
    spawnflags = 65,
    solidbsp = 0,
    rendermode = 10,
    rendercolor = "255 255 255",
    renderamt = 255,
    maxspeed = 48,
    fanfriction = 20,
    origin = Vector(175.907211, -2188.908691, 516.031311),
})

script_template.AddTemplate("tf_glow", {
    target = "crystal2",
    Mode = 2,
    origin = Vector(175.907211, -2188.908691, 516.031311),
    GlowColor = "0 78 255 255"
})

script_template.AddTemplate("prop_dynamic",{
    targetname = "crystal2",
    solid = 6,
    renderfx = 15,
    rendercolor = "255 255 255",
    renderamt = 255,
    physdamagescale = 1.0,
    parentname = "crystal_spin2",
    modelscale = 1.3,
    model = "models/props_moonbase/moon_gravel_crystal_blue.mdl",
    MinAnimTime = 5,
    MaxAnimTime = 10,
    fadescale = 1.0,
    fademindist = -1.0,
    origin = Vector(175.907211, -2188.908691, 516.031311),
    angles = QAngle(45, 0, 0)
})

EntFireByHandle(script_template, "ForceSpawn", "", -1, null, null)

Result:

Configuration Results
SpawnEntityGroupFromTable 0.2382ms
point_script_template 0.1100ms

Iterating through players

When iterating over all players in the map, it is generally not recommended to use FindByClassname on the player entity if performance is a concern. Iterating over the first MaxClients number of entindexes and grabbing the player from PlayerInstanceFromIndex(i) is notably faster and not much more complex to write. If you want the fastest option at the cost of complexity though, you should collect player entities in your own global array in an event such as player_team or player_activate (and remove them in player_disconnect), then iterate over that array when necessary.

Benchmark

Players are collected in mapspawn.nut:

::playerArray <- []
::Events <- {
    function OnGameEvent_player_team(params)
    {
        local player = GetPlayerFromUserID(params.userid)
        
        if (playerArray.find(player) != null) return
        
        playerArray.append(player)
    }
    
    function OnGameEvent_player_disconnect(params)
    {
        local player = GetPlayerFromUserID(params.userid)

        // array.remove() takes an index, not a value, so look the index up first
        local index = playerArray.find(player)

        if (index == null) return

        playerArray.remove(index)
    }
}
__CollectGameEventCallbacks(Events)
::maxClients <- MaxClients().tointeger()

// FindByClassname
for (local player; player = Entities.FindByClassname(player, "player");)
{
    printl(player)
}

// Index iteration
for (local i = 1; i <= maxClients; i++)
{
    local player = PlayerInstanceFromIndex(i)

    if (player == null) continue

    printl(player)
}

// Array iteration
foreach(player in playerArray)
{
    printl(player)
}

Result:

Configuration Results
FindByClassname 0.1289ms
Index iteration 0.0856ms
Array iteration 0.0679ms

Squirrel Performance Tips

This section is for general Squirrel tips that are largely independent of the VScript API.

array.len() and table.len()

If you know the variable you are working with is always going to be an array or table, you can optimize your array/table length checks significantly.

Arrays

For lookup purposes, arrays in Squirrel behave like tables whose keys are integer indexes. This means we can use the in operator to perform an index lookup on an array and check whether it has a specific length.

local arr = array(1000)

// .len() comparison
for (local i = 0; i < 1000; i++)
    print(arr.len() == 1000)

// Index lookup: true only if index 999 exists and index 1000 does not
for (local i = 0; i < 1000; i++)
    print(999 in arr && !(1000 in arr))

Additionally, the integer 0 evaluates to false in Squirrel. When checking specifically for an empty array, this falsy evaluation is faster than directly comparing the length to 0.

local arr = []

// Explicit comparison
for (local i = 0; i < 1000; i++)
    if (arr.len() == 0)
        print(i)

// Falsy evaluation
for (local i = 0; i < 1000; i++)
    if (!arr.len()) // 0 = false
        print(i)

Tables

As shown above, we can avoid the performance cost of .len() by using direct index lookups where possible, and by relying on falsy evaluation instead of an explicit comparison.

Instead of using .len(), we can keep a slot called "length" in our table and increment or decrement it whenever we insert or delete an item.

local tab = {length = 0}

//insert stuff into the table and increment the table length
for (local i = 0; i < 1000; i++)
{
    tab[format("value_%d", i)] <- i
    tab.length++
}

//.len() eval (the "length" slot itself is counted, hence 1001)
for (local i = 0; i < 1000; i++)
    print(tab.len() == 1001)

//index lookup
for (local i = 0; i < 1000; i++)
    print(tab.length == 1000)

This of course has performance implications of its own, and heavily depends on how often you are reading data from a table vs writing to it.
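
One way to keep the counter from drifting out of sync is to route every write through a pair of small helpers. The sketch below is only one possible arrangement: StoreSet, StoreDelete, and the store layout are hypothetical names, and the data is kept in a separate items table so the counter does not show up as a key of its own.

```squirrel
// Hypothetical container: data in "items", tracked count in "length"
local store = { items = {}, length = 0 }

// Insert or overwrite a key; only count it if it is new
function StoreSet(s, key, value)
{
    if (!(key in s.items))
        s.length++

    s.items[key] <- value
}

// Delete a key if present and decrement the tracked count
function StoreDelete(s, key)
{
    if (key in s.items)
    {
        delete s.items[key]
        s.length--
    }
}

StoreSet(store, "a", 1)
StoreSet(store, "b", 2)
StoreSet(store, "a", 3) // overwrite, length unchanged
StoreDelete(store, "b")

print(store.length + "\n")
```

The extra function call on every write is the trade-off mentioned above, so this is only worthwhile when the length is read far more often than the table is modified.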