Source Multiplayer Networking

From Valve Developer Community
Revision as of 14:59, 30 June 2005 by King2500 (talk | contribs)
(diff) ← Older revision | Latest revision (diff) | Newer revision → (diff)
Jump to: navigation, search


Multiplayer games based on the Source engine use a Client/Server networking architecture. Usually a server is a dedicated host that runs the game and is authoritative about world simulation, game rules, and player input processing. A client is a player's computer connected to a game server. The client and server communicate with each other by sending small data packets at a high frequency (usually 20 to 30 packets per second). A client receives the current world state from the server and generates video and audio output based on these updates The client also samples data from input devices (keyboard, mouse, microphone, etc.) and sends these input samples back to the server for further processing. Clients only communicate with the game server and not between each other (like in a peer-to-peer application). In contrast with a single player game, a multiplayer game has to deal with a variety of new problems caused by packet-based communication.

Network bandwidth is limited, so the server can't send a new update packet to all clients for every single world change. Instead, the server takes snapshots of the current world state at a constant rate and broadcasts these snapshots to the clients. Network packets take a certain amount of time to travel between the client and the server (i.e. the ping time). This means that the client time is always a little bit behind the server time. Furthermore, client input packets are also delayed on their way back, so the server is processing temporally delayed user commands. In addition, each client has a different network delay which varies over time due to other background traffic and the client's framerate. These time differences between server and client causes logical problems, becoming worse with increasing network latencies. In fast-paced action games, even a delay of a few milliseconds can cause a laggy gameplay feeling and make it hard to hit other players or interact with moving objects. Besides bandwidth limitations and network latencies, information can get lost due to network packet loss.

PIC: networking1.gif

To cope with all these issues introduced by network communication, the Source engine uses multiple techniques to solve these problems, or at least make them less visible to the player. These techniques include data compression, interpolation, prediction, and lag compensation. These techniques are tightly coupled, and changes made within one system may affect other systems. This document describes the general functionality of these systems and how they work together.

Basic Networking

The server simulates the game in discrete time steps called ticks. By default, 66 ticks per second are simulated, but mods can specify their own tickrate. For example Counter-Strike:Source uses a lower tickrate of 33 ticks/second to reduce the server CPU load. During each tick, the server processes incoming user commands, runs a physical simulation step, checks the game rules, and updates all object states. After simulating a tick, the server decides if any client needs a world update and takes a snapshot of the current world state if necessary. A higher tickrate increases the simulation precision, but also requires more CPU power and available bandwidth on both server and client. The server admin may override the default tickrate with the -tickrate command line parameter, though tickrate changes done this way are not recommended because the mod may not work as designed if its tickrate is changed.

Clients usually have only a limited amount of available bandwidth. In the worst case, players with a modem connection can't receive more than 5 to 7 KB/sec. If the server would tried to send them updates with a higher data rate, packet loss would be unavoidable. Therefore, the client has to tell the server its incoming bandwidth capacity by setting the console variable rate (in bytes/second). This is the most important network variable for clients and it has to be set correctly for an optimal gameplay experience. The client can request a certain snapshot rate by changing cl_updaterate (default 20), but the server will never send more updates than simulated ticks or exceed the requested client rate limit. Server admins can limit data rate values requested by clients with sv_minrate and sv_maxrate (both in bytes/second). Also the snapshot rate can be restricted with sv_minupdaterate and sv_maxupdaterate (both in snapshots/second).

The client creates user commands from sampling input devices with the same tick rate that the server is running with. A user command is basically a snapshot of the current keyboard and mouse state. But instead of sending a new packet to the server for each user command, the client sends command packets at a certain rate of packets per second (usually 30). This means two or more user commands are transmitted within the same packet. Clients can increase the command rate with cl_cmdrate. This will increase responsiveness but requires more outgoing bandwidth, too.

Game data is compressed using delta compression to reduce network load. That means the server doesn't send a full world snapshot each time, but rather only changes (a delta snapshot) that happened since the last acknowledged update. With each packet sent between the client and server, acknowledge numbers are attached to keep track of their data flow. Usually full (non-delta) snapshots are only sent when a game starts or a client suffers from heavy packet loss for a couple of seconds. Clients can request a full snapshot manually with the cl_fullupdate command.

Responsiveness, or the time between user input and its visible feedback in the game world, are determined by lots of factors, including the server/client CPU load, simulation tickrate, data rate and snapshot update settings, but mostly by the network packet traveling time. The time between the client sending a user command, the server responding to it, and the client receiving the server's response is called the latency or ping (or round trip time). Low latency is a significant advantage when playing a multiplayer online game. Techniques like prediction and lag compensation try to minimize that advantage and allow a fair game for players with slower connections. Tweaking networking setting can help to gain a better experience if the necessary bandwidth and CPU power is available. We recommend keeping the default settings, since improper changes may cause more negative side effects than actual benefits.

Entity Interpolation

By default, the client receives about 20 snapshot per second. If the objects (entities) in the world were only rendered at the positions received by the server, moving objects and animation would look choppy and jittery. Dropped packets would also cause noticeable glitches. The trick to solve this problem is to go back in time for rendering, so positions and animations can be continuously interpolated between two recently received snapshot. This technique is called client side entity interpolation and is enabled by default with cl_interpolate 1. With 20 snapshots per second, a new update arrives about every 50 milliseconds. If the client render time is shifted back by 50 milliseconds, entities can be always interpolated between the last received snapshot and the snapshot before that. The Source engine does the entity interpolation with a 100-millisecond delay (cl_interp 0.1). This way, even if one snapshot is lost, there are always two valid snapshots to interpolate between. Take a look at the following figure showing the arrival times of incoming world snapshots:

PIC: interpolation.gif

The last snapshot received on the client was at tick 344 or 10.30 seconds. The client time continues to increase based on this snapshot and the client frame rate. If a new video frame is rendered, the rendering time is the current client time 10.32 minus the view interpolation delay of 0.1 seconds. This would be 10.22 in our example and all entities and their animations are interpolated using the correct fraction between snapshot 340 and 342.

Since we have an interpolation delay of 100 milliseconds, the interpolation would even work if snapshot 342 were missing due to packet loss. Then the interpolation could use snapshots 340 and 344. If more than one snapshot in a row is dropped, interpolation can't work perfectly because it runs out of snapshots in the history buffer. In that case the renderer uses extrapolation (cl_extrapolate 1) and tries a simple linear extrapolation of entities based on their known history so far. The extrapolation is done only for 0.25 seconds of packet loss (cl_extrapolate_amount), since the prediction errors would become to big after that.

The entity interpolation is causing a constant view "lag" of 100 milliseconds, even if you're playing on a listen server (server and client on the same machine). So if you turn on sv_showhitboxes the player hitboxes are drawn in server time, meaning they are ahead of the rendered player model by 100 milliseconds. This doesn't mean you have to lead you're aiming when shooting at other players since the server-side lag compensation knows about client entity interpolation and corrects this error. If you turn off interpolation on a listen server (cl_interpolate 0), the drawn hitboxes will match the rendered player model again, but the animations and moving objects will become very jittery.

Input Prediction

Lets assume a player has a network latency of 100 milliseconds and starts to move forward. The information that the +FORWARD key is pressed is stored in a user command and send to the server. There the user command is processed by the movement code and the player's character is moved forward in the game world. This world state change is transmitted to all clients with the next snapshot update. So the player would see his own change of movement with a 100 milliseconds delay after he started walking. This delay applies to all players actions like movement, shooting weapons, etc. and becomes worst with higher latencies.

A delay between player input and corresponding visual feedback creates a strange, unnatural feeling and makes it hard to move or aim precisely. Client-side input prediction (cl_predict 1) is a way to remove this delay and let the player's actions feel more instant. Instead of waiting for the server to update your own position, the local client just predicts the results of its own user commands. Therefore the clients runs exactly the same code and rules the server will use to process the user commands. After the prediction is finished, the local player will move instantly to the new location while the server still sees him at the old place.

After 100 milliseconds, the client will receive the server snapshot that contains the changes based on the user command he predicted earlier. Then the client compares the server position with his predicted position. If they are different, a prediction error has occurred. This indicates that the client didn't have the correct information about other entities and the environment when it processed the user command. Then the client has to correct its own position, since the server has final authority over client-side prediction. If cl_showerror 1 is turned on, clients can see when prediction errors happen. Corrections of prediction errors can be quite noticeable and will cause your view to jump a bit. To visually smooth that effect, the prediction error is corrected gradually over a short amount of time (cl_smoothtime). Prediction error smoothing can be turned off with cl_smooth 0.

Predicting an object's behavior only works if the clients knows same the rules and state of the object like the server does. That's usually not the case since the server knows more internal information about objects than the clients do. Clients see only a small part of the world and just get enough information to render objects. Therefore, prediction works only for your own player, and the weapons controlled by you. Proper prediction of other players or interactive objects is not possible on the client at this point.

Lag Compensation

Let's say a player shoots at a target at client time 10.5. The firing information is packed into a user command and sent to the server. While the packet is on its way through the network, the server continues to simulate the world, and the target might have moved to a different position. The user commands arrives at server time 10.6 and the server wouldn't detect the hit, even though the player has aimed exactly at the target. This error is corrected by the server-side lag compensation (sv_unlag 1)

The lag compensation system keeps a history of all recent player positions for a time span of about one second (can be changed with sv_maxunlag). If a user command is executed, the server estimates at what time the command was created. This command execution time is calculated as followed:

Command Execution Time = Current Server Time - Client Latency - Client View Interpolation 

Then the server moves all other players back to where they were at the command execution time. The user command is executed and the hit is detected correctly. After the user command has been processed, the players are moved back to their original position. On a listen server you can enable sv_showimpacts 1 to see the different server and client hitboxes:

PIC: lag_compensation.jpg

This screenshot was taken on a listen server with 200 milliseconds of lag (using net_fakelag), right after the server confirmed the hit. The red hitbox shows the target position on the client where it was 100 milliseconds ago. Since then, the target continued to move to the left while the user command was traveling to the server. After the user command arrived, the server restored the target position (blue hitbox) based on the estimated command execution time. The server traces the shot and confirms the hit (the client sees blood effects). Client and server hitboxes don't exactly match because of small precision errors in time measurement. Even a small difference of a few milliseconds can cause an error of several inches for fast-moving objects. Multiplayer hit detection is not pixel perfect and has known precision limitations based on the tickrate and the speed of moving objects. Increasing the tickrate does improve the precision of hit detection, but also requires more CPU, memory, and bandwidth capacity for server and clients.

The question arises, why is hit detection so complicated on the server? Doing the back tracking of player positions and dealing with precision errors while hit detection could be done client-side way easier and with pixel precision. The client would just tell the server with a "hit" message what player has been hit and where. We can't allow that simply because a game server can't trust the clients on such important decisions. Even if the client is "clean" and protected by VAC (Valve-Anti-Cheat), the packets could be still modified on a 3rd machine while routed to the game server. These "cheat proxies" could inject "hit" messages into the network packet without being detected by VAC (a "man-in-the-middle" attack).

Network latencies and lag compensation can creates paradoxes that seem illogical compared to the real world. For example, you can be hit by an attacker you can't even see anymore because you already took cover. What happened is that the server moved your player hitboxes back in time, where you were still exposed to your attacker. This inconsistency problem can't be solved in general because of the relative slow packet speeds. In the real world, you don't notice this problem because light (the packets) travels so fast and you and everybody around you see the same world as it is right now.

Net Graph

The Source engine offers a couple of tools to check your client connection speed and quality. The most popular one is the net graph, which can be enabled with net_graph 2. Incoming packets are represented by small lines moving from right to left. The height of each line reflects size of a packet. If a gap appears between lines, a packet was lost or arrived out of order. The lines are color-coded depending on what kind of data they contain.

Under the net graph, the first line shows your current rendered frames per second, your average latency, and the current value of cl_updaterate. The second line shows the size in bytes of the last incoming packet (snapshots), the average incoming bandwidth, and received packets per second. The third line shows the same data just for outgoing packets (user commands).

PIC: net_graph.jpg