User:Braindawg/performance: Difference between revisions
Revision as of 11:13, 22 October 2025
This page includes tips and tricks for optimizing VScript performance. All of these performance tests were done in
and many can be used in other
-based titles. Your mileage may vary in VScript supported games prior to the SDK update.
Benchmark figures come from this benchmarking tool.
Folding
Functions
Folding functions in the context of VScript means folding them into the root table. This only needs to be done once on script load, and is recommended for functions that are commonly used.
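As a rough sketch of what folding into the root table can look like, the snippet below copies every function of an object into the root table, bound to that object with bindenv(). FakeProps, GetValue and FoldIntoRoot are hypothetical names invented so the snippet runs in plain Squirrel; in a real script the object would be something like NetProps.

```squirrel
// Hypothetical stand-in for a script object such as NetProps.
class FakeProps {
    function GetValue( key ) { return "value of " + key }
}
local Props = FakeProps()

// Copy every function of obj's class into the root table, bound to obj.
// Names that already exist in the root table are left untouched.
local function FoldIntoRoot( obj ) {
    local root = getroottable()
    foreach ( name, member in obj.getclass() )
        if ( typeof member == "function" && !( name in root ) )
            root[name] <- member.bindenv( obj )
}

FoldIntoRoot( Props )

// The function is now reachable without going through Props first.
print( ::GetValue( "m_iName" ) + "\n" )
```

The "name in root" guard is a design choice for this sketch: it avoids silently overwriting an existing global with the same name.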
Benchmark
/***********************************************************************************************************
* FOLDING: *
* Folding functions from their original scope into local/root scope is noticeably faster (~15-30%) *
* skips extra lookup instructions, also less verbose *
***********************************************************************************************************/
local GetPropString = NetProps.GetPropString.bindenv( NetProps )
local GetPropBool = NetProps.GetPropBool.bindenv( NetProps )
const MAX_EDICTS = 2048
function Benchmark::Unfolded() {
for ( local i = 0, ent; i < Constants.Server.MAX_EDICTS; ent = EntIndexToHScript( i ), i++ ) {
if ( ent ) {
NetProps.GetPropString( ent, "m_iName" )
NetProps.GetPropString( ent, "m_iClassname" )
NetProps.GetPropBool( ent, "m_bForcePurgeFixedupStrings" )
}
}
}
// 20% faster, maybe more
function Benchmark::Folded() {
for ( local i = 0, ent; i < MAX_EDICTS; ent = EntIndexToHScript( i ), i++ ) {
if ( ent ) {
GetPropString( ent, "m_iName" )
GetPropString( ent, "m_iClassname" )
GetPropBool( ent, "m_bForcePurgeFixedupStrings" )
}
}
}
Result:
| Configuration | Results |
|---|---|
| Unfolded | 1.76ms |
| Folded | 1.32ms |
Constants
Similar to folding functions, folding pre-defined Constant values into the constant table (or the root table) increases performance significantly.
Benchmark
local ROOT = getroottable()
local _CONST = getconsttable()
// fold every pre-defined constant into the const table
// (skipped when "ConstantNamingConvention" already exists in the root table)
if ( !( "ConstantNamingConvention" in ROOT ) )
    foreach( a, b in Constants )
        foreach( k, v in b )
            _CONST[k] <- v != null ? v : 0
setconsttable(_CONST)
function Benchmark::UnfoldedConst() {
for (local i = 1; i <= Constants.Server.MAX_EDICTS; i++)
local temp = i
}
function Benchmark::FoldedConst() {
for (local i = 1; i <= MAX_EDICTS; i++)
local temp = i
}
Result:
| Configuration | Results |
|---|---|
| Unfolded | 0.356ms |
| Folded | 0.033ms |
Root table vs Constant table
Unlike values inserted into the root table, values inserted into the constant table are cached at the pre-processor level: they are substituted when a script is compiled rather than looked up at run time. This makes accessing them faster, but it also means it may not be feasible to fold your constants into the constant table if they are folded in the same script file that references them.
If you intend to insert values into the constant table (const keyword or getconsttable().foo <- "bar"), you must do this before any other scripts are executed, otherwise your script will not be able to read any values from it.
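The ordering requirement can be illustrated with a small sketch. MY_MAX_PLAYERS is a hypothetical constant name, and compilestring() is used here only as a stand-in for a script file that gets compiled after the constant was inserted. A bare reference to MY_MAX_PLAYERS in the same file would already have been compiled before the insertion ran, so it would not pick up the value.

```squirrel
// MY_MAX_PLAYERS is a hypothetical constant name used only for this sketch.
getconsttable().MY_MAX_PLAYERS <- 100

// Code compiled after the insertion can resolve the name from the constant table.
// compilestring() stands in for a script file that is loaded later.
local later = compilestring( "return MY_MAX_PLAYERS" )
print( later() + "\n" )
```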
Benchmark
::SomeGlobalVar <- 0x7FFFFFFF
const GLOBAL_VAR = 0x7FFFFFFF
function Benchmark::RootSetLookup() {
for (local i = 1; i <= 10000; i++)
local temp = ::SomeGlobalVar
}
// ~20-40% faster
function Benchmark::ConstSetLookup() {
for (local i = 1; i <= 10000; i++)
local temp = GLOBAL_VAR
}
Result:
| Configuration | Results |
|---|---|
| Root | 0.267ms |
| Const | 0.154ms |
Strings
Formatting
Squirrel supports two main ways to format strings: concatenation using the + symbol, and the format() function.
For large amounts of formatting, format() is significantly faster than concatenation. For fewer than three concatenations, however, format() is slower.
.tostring() and format it as a string
ToKVString
The ToKVString() VScript function takes a Vector/QAngle and formats its values into a string. For example, Vector(0, 0, 0).ToKVString() returns "0 0 0".
On top of being less cumbersome to write, ToKVString() is marginally faster than format().
However, when formatting multiple ToKVString() outputs into a new string, concatenation is faster due to fewer function calls.
Benchmark
local mins = Vector(-1, -2, -3)
local maxs = Vector(1, 2, 3)
local kvstring = ""
function Benchmark::StringConcat() {
for ( local i = 0; i < 10000; i++ )
kvstring = mins.x + "," + mins.y + "," + mins.z + "," + maxs.x + "," + maxs.y + "," + maxs.z
}
function Benchmark::StringFormat() {
for ( local i = 0; i < 10000; i++ )
kvstring = format("%g,%g,%g,%g,%g,%g", mins.x, mins.y, mins.z, maxs.x, maxs.y, maxs.z)
}
function Benchmark::KVStringFormat() {
for (local i = 0; i < 10000; i++ )
kvstring = format("%s %s", mins.ToKVString(), maxs.ToKVString())
}
function Benchmark::KVStringConcat() {
for (local i = 0; i < 10000; i++ )
kvstring = mins.ToKVString() + " " + maxs.ToKVString()
}
Result:
| Configuration | Results |
|---|---|
| StringConcat | 35.0847ms |
| StringFormat | 23.0143ms |
| KVStringFormat | 19.9377ms |
| KVStringConcat | 18.3142ms |
Character Comparisons
Strings in Squirrel, as in many C-style languages, are just arrays of characters, and characters are just integers in disguise (the ASCII code). This means that for simple comparisons (e.g. only checking the first character for chat commands) you can index directly into the string to get the character, then do a very simple (and fast) integer comparison, rather than making unnecessary function calls and string comparisons. Note that characters are written with single quotes ('a') rather than double quotes.
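For the chat command case mentioned above, a minimal sketch might look like the following. IsChatCommand is a hypothetical helper, and the choice of '!' and '/' as command prefixes is an assumption made for illustration.

```squirrel
// Hypothetical helper: true when a chat message starts with '!' or '/'.
local function IsChatCommand( text ) {
    // An empty string has no index 0, so bail out before indexing.
    if ( !( 0 in text ) )
        return false
    local first = text[0]   // integer character code, not a one-character string
    return first == '!' || first == '/'
}

foreach ( msg in [ "!votemenu", "/kick", "hello" ] )
    print( msg + " -> " + ( IsChatCommand( msg ) ? "command" : "chat" ) + "\n" )
```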
Benchmark
local map_name = GetMapName()
function Benchmark::StartsWith() {
for ( local i = 0; i < 10000; i++ )
if ( startswith( map_name, "workshop/" ) )
local test = true
}
function Benchmark::CharCompare() {
// workshop loaded maps all have the "workshop/" prefix, meaning '/' is always the 9th character
// arrays are 0-indexed, so the 9th character would be map_name[8] (9 - 1)
for ( local i = 0; i < 10000; i++ )
if ( 8 in map_name && map_name[8] == '/' )
local test = true
}
Result:
| Configuration | Results |
|---|---|
| StartsWith | 1.87ms |
| CharCompare | 0.492ms |
Spawning Entities
In VScript, there are four common ways to spawn entities:
- CreateByClassname + DispatchSpawn
- SpawnEntityFromTable
- SpawnEntityGroupFromTable
- point_script_template entity + AddTemplate
CreateByClassname + DispatchSpawn vs SpawnEntityFromTable
In general, performance is not a major concern when spawning entities. In special circumstances though, you may need to spawn and kill a temporary entity in an already expensive function. A notable example of an entity that would need this is trigger_stun. This entity will not attempt to re-stun the same player multiple times, so it is not possible to spawn a single entity and repeatedly fire StartTouch/EndTouch on the same target.
In situations like this, CreateByClassname + DispatchSpawn is considerably faster than SpawnEntityFromTable: roughly 2.7x in the results below (0.0156ms vs 0.0428ms).
Benchmark
local CreateByClassname = Entities.CreateByClassname.bindenv( Entities )
local SetPropBool = NetProps.SetPropBool.bindenv( NetProps )
local SetPropString = NetProps.SetPropString.bindenv( NetProps )
local DispatchSpawn = Entities.DispatchSpawn.bindenv( Entities )
// anywhere from 15-30% faster for single entity spawning
// The table passed to SpawnEntityFromTable needs to be interpreted and converted to something C++ can understand
// meanwhile CreateByClassname/netprop/keyvaluefromstring are simple 1:1 C++ bindings
function Benchmark::ByClassname() {
for (local i = 0; i < 100; i++) {
local ent = CreateByClassname( "logic_relay" )
DispatchSpawn( ent )
SetPropString( ent, "m_iName", "__relay" )
}
}
function Benchmark::FromTable() {
for (local i = 0; i < 100; i++) {
SpawnEntityFromTable( "logic_relay", { targetname = "__relay" } )
}
}
Result:
| Configuration | Results |
|---|---|
| FromTable | 0.0428ms |
| ByClassname | 0.0156ms |
SpawnEntityGroupFromTable vs point_script_template
When spawning multiple entities at the same time, it is more efficient to use either SpawnEntityGroupFromTable or a point_script_template entity. These options also have the added benefit of respecting parent hierarchy, so the parentname keyvalue works as intended.
point_script_template is both more flexible and faster. SpawnEntityGroupFromTable has several major limitations in comparison to point_script_template, and is generally not recommended. See the VScript documentation for more details on how to use point_script_template.
Benchmark
function Benchmark::EntityGroupFromTable() {
// spawn origins are right outside of bigrock spawn
SpawnEntityGroupFromTable({
[0] = {
func_rotating =
{
message = "hl1/ambience/labdrone2.wav",
volume = 8,
responsecontext = "-1 -1 -1 1 1 1",
targetname = "crystal_spin",
vscripts = "rotatefix", // see func_rotating vdc page for this
spawnflags = 65,
solidbsp = 0,
rendermode = 10,
rendercolor = "255 255 255",
renderamt = 255,
maxspeed = 48,
fanfriction = 20,
origin = Vector(278.900513, -2033.692993, 516.067200),
}
},
[2] = {
tf_glow =
{
targetname = "crystalglow",
parentname = "crystal",
target = "crystal",
Mode = 2,
origin = Vector(278.900513, -2033.692993, 516.067200),
GlowColor = "0 78 255 255"
}
},
[3] = {
prop_dynamic =
{
targetname = "crystal",
solid = 6,
renderfx = 15,
rendercolor = "255 255 255",
renderamt = 255,
physdamagescale = 1.0,
parentname = "crystal_spin",
modelscale = 1.3,
model = "models/props_moonbase/moon_gravel_crystal_blue.mdl",
MinAnimTime = 5,
MaxAnimTime = 10,
fadescale = 1.0,
fademindist = -1.0,
origin = Vector(278.900513, -2033.692993, 516.067200),
angles = QAngle(45, 0, 0)
}
},
})
}
// ~15-25% faster for batch entity spawning
function Benchmark::PointScriptTemplate() {
local script_template = Entities.CreateByClassname("point_script_template")
script_template.AddTemplate("func_rotating", {
message = "hl1/ambience/labdrone2.wav",
volume = 8,
targetname = "crystal_spin2",
spawnflags = 65,
solidbsp = 0,
rendermode = 10,
rendercolor = "255 255 255",
vscripts = "rotatefix",
renderamt = 255,
maxspeed = 48,
fanfriction = 20,
origin = Vector(175.907211, -2188.908691, 516.031311),
})
script_template.AddTemplate("tf_glow", {
target = "crystal2",
Mode = 2,
origin = Vector(175.907211, -2188.908691, 516.031311),
GlowColor = "0 78 255 255"
})
script_template.AddTemplate("prop_dynamic", {
targetname = "crystal2",
solid = 6,
renderfx = 15,
rendercolor = "255 255 255",
renderamt = 255,
physdamagescale = 1.0,
parentname = "crystal_spin2",
modelscale = 1.3,
model = "models/props_moonbase/moon_gravel_crystal_blue.mdl",
MinAnimTime = 5,
MaxAnimTime = 10,
fadescale = 1.0,
fademindist = -1.0,
origin = Vector(175.907211, -2188.908691, 516.031311),
angles = QAngle(45, 0, 0)
})
script_template.AcceptInput( "ForceSpawn", null, null, null )
}
Result:
| Configuration | Results |
|---|---|
| SpawnEntityGroupFromTable | 0.72ms |
| PointScriptTemplate | 0.61ms |
Iterating through players
When iterating over all players in the map, it is generally not recommended to use FindByClassname on the player entity in high playercount environments (>8-12 players). Iterating over the first MaxClients number of entindexes and grabbing the player from PlayerInstanceFromIndex(i) is notably faster and not much more complex to write in these circumstances.
The performance of player iteration depends heavily on how many players are actively in the server. In low playercount environments, the PlayerInstanceFromIndex approach is slower due to extra unnecessary iterations. In high playercount environments, FindByClassname runs a more expensive loop on every entity in the map to find players.
If you want the fastest option at the cost of complexity, you should collect player entities in your own global table or array in an event such as player_team or player_activate, remove them on player_disconnect, then iterate over that when necessary. Using a table gives you the added bonus of having a cache of player user IDs, which is faster to look up compared to reading the player_manager netprop.
player_activate does not fire for tfbots!
Benchmark
The first script must be executed before the second one!
::ALL_PLAYERS <- {}
::Events <- {
function OnGameEvent_player_team(params)
{
local player = GetPlayerFromUserID(params.userid)
if ( player in ALL_PLAYERS ) return
ALL_PLAYERS[ player ] <- params.userid
}
function OnGameEvent_player_disconnect(params)
{
local player = GetPlayerFromUserID(params.userid)
if ( !(player in ALL_PLAYERS) ) return
delete ALL_PLAYERS[ player ]
}
}
__CollectGameEventCallbacks(Events)
::maxClients <- MaxClients().tointeger()
for (local player; player = Entities.FindByClassname(player, "player");)
{
printl(player)
}
for (local i = 1; i <= maxClients; i++)
{
local player = PlayerInstanceFromIndex(i)
if (!player) continue
printl(player)
}
foreach(player in ALL_PLAYERS.keys())
{
printl(player)
}
Result:
| Configuration | Results |
|---|---|
| FindByClassname | 0.1289ms |
| Index iteration | 0.0856ms |
| Array/Table iteration | 0.0679ms |
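The user ID cache mentioned earlier in this section can be read back with a plain table lookup. The sketch below assumes a table shaped like ALL_PLAYERS (player handle as key, user ID as value). FakePlayer, players and GetCachedUserID are hypothetical names so that it runs outside the game.

```squirrel
// Hypothetical stand-in for a player handle so the sketch runs outside the game.
class FakePlayer {}

// Same shape as the ALL_PLAYERS table: player handle -> user ID.
local players = {}
local alice = FakePlayer()
local bob = FakePlayer()
players[alice] <- 12
players[bob] <- 57

// Returns the cached user ID, or null when the player is not in the cache.
local function GetCachedUserID( cache, player ) {
    return player in cache ? cache[player] : null
}

print( GetCachedUserID( players, alice ) + "\n" )
```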
Squirrel Performance Tips
Arrays and Tables
Arrays in squirrel are, in practice, tables where the index is an integer value.
The .len() function call is relatively expensive. We can avoid this overhead by directly checking the index.
/*****************
* LENGTH CHECKS *
*****************/
local arr = array( 1000, 0 ) // 1000-element array used by the checks below
function Benchmark::Len() {
for ( local i = 0; i < 1000; i++ )
if ( arr.len() == 1000 )
local len = true
}
// ~40% faster, no _OP_PREPCALLK/_OP_CALL instructions
function Benchmark::Idx() {
for ( local i = 0; i < 1000; i++ )
if ( 999 in arr && !(1000 in arr) )
local len = true
}
Additionally, the integer 0 evaluates to false in Squirrel. When specifically checking for an empty array, this falsy evaluation is slightly faster than explicitly comparing the length to 0.
/****************************
* EMPTY ARRAY/TABLE CHECKS *
****************************/
function Benchmark::LenExplicit() {
for ( local i = 0; i < 1000; i++ )
if ( arr.len() != 0 )
local len = true
}
// ~2-5% faster, no _OP_NE instruction
function Benchmark::LenFalsy() {
for ( local i = 0; i < 1000; i++ )
if ( arr.len() )
local len = true
}
Tables
As shown above, we can circumvent the performance cost of .len() by using direct index look-ups where possible. Instead of using .len() for tables, we can create a helper class with a "length" member, and add to or subtract from it whenever we insert or delete an item.
// direct length index lookups instead of .len() calls.
Benchmark.NewTable <- class {
    _tbl = null // the real table in our class
    length = 0 // entry count, kept in sync by set() and del()
    constructor( tbl = null ) { this._tbl = ( tbl || {} ) ; this.length = this._tbl.len() }
    function get(k) { return _tbl[k] }
    function set(k, v) { k in _tbl ? _tbl[k] = v : (length++, _tbl[k] <- v) }
    function del(k) { ( length--, delete _tbl[k] ) }
}
local tab = Benchmark.NewTable()
local _tbl = tab._tbl
// insert 1000 items into the table, incrementing the table length each time
for (local i = 0; i < 1000; i++)
{
    tab.set("value_" + i, i )
}
// .len() eval
function Benchmark::Len() {
for (local i = 0; i < 1000; i++)
print(_tbl.len() == 1000)
}
// index lookup, ~2.5% faster
function Benchmark::Length() {
for (local i = 0; i < 1000; i++)
print(tab.length == 1000)
}
This of course has performance implications of its own, and heavily depends on how often you are reading data from a table vs writing to it. You may only see performance benefits if you check table lengths a lot but read or write entries infrequently.
Benchmark
| Configuration | Results |
|---|---|
| Len | 0.075ms |
| Idx | 0.046ms |
| LenExplicit | 0.071ms |
| LenFalsy | 0.067ms |
| Len (table) | 9.4ms |
| Length (table) | 9.3ms |
Variable look-up and caching
Squirrel will look for variables in the following order:
- local variables
- "outer" local variables (locals that are in parent scope)
- constants
- root table
For example:
- as written, this will print the number 3 (outer local)
- commenting out local thing1 will print 2 (const)
- commenting out const thing1 as well would print 1 (root)
- uncommenting local thing1 = 0 would print 0 (local)
::thing1 <- 1
const thing1 = 2
local thing1 = 3
::GetThing1 <- function() {
// local thing1 = 0
return thing1
}
print( GetThing1() )
Traversing scopes to find variables like this will negatively impact performance. It is better to cache variables as locals before expensive loops or fast-firing functions (thinks).
::SomeGlobalVar <- 0
function Benchmark::_OnDestroy() { delete ::SomeGlobalVar }
function Benchmark::SlowIncrement()
{
for (local i = 1; i <= 1000; i++)
SomeGlobalVar++
}
// 10x faster!?
function Benchmark::FastIncrement()
{
local myvar = SomeGlobalVar
for (local i = 1; i <= 1000; i++)
myvar++
SomeGlobalVar = myvar
}
Root table lookups
Prefixing a root-scoped variable with :: will skip this traversal process and improve performance considerably.
function Benchmark::NormalLookup() {
for (local i = 1; i <= 1000; i++)
SomeGlobalVar++
}
// 10x faster!?
function Benchmark::RootLookup() {
for (local i = 1; i <= 1000; i++)
::SomeGlobalVar++
}
Benchmark
| Configuration | Results |
|---|---|
| SlowIncrement | 0.591ms |
| FastIncrement | 0.021ms |
| NormalLookup | 0.584ms |
| RootLookup | 0.058ms |