Source BSP File Format

From Valve Developer Community
Revision as of 18:13, 25 April 2009 by TomEdwards (talk | contribs) (Geocities is shutting down, and this needs to be preserved)

By Rof (rof(at)mellish.org.uk), October 2005. This article has been retrieved from Geocities to preserve it after the service's imminent shutdown.

Introduction

This document describes the structure of the BSP file format used by the Source engine. The format is similar but not identical to the BSP file format of the Half-Life 1 engine, which is in turn based on the Quake 1 and Quake 2 file formats, plus that of the later Quake 3: Arena. Because of this, Max McGuire's article, "Quake 2 BSP File Format", has been of invaluable help in understanding the overall structure of the format and the parts of it that have remained the same or similar to its predecessors.

This document is an extension of notes made during the writing of my Half-Life 2 bsp file decompiler, VMEX. It therefore focusses on those parts of the format necessary to perform map decompilation (conversion of the bsp file back into a VMF file which can be loaded by the Hammer map editor). Some parts of the format are not needed for this process, but what information I have about these sections will be mentioned.

Most of the information in this document comes from the Max McGuire article referenced above, from the source code included in the Source SDK (particularly the C header file public/bspfile.h), and from my own experimentation during the writing of VMEX. This document is completely unofficial and should not be considered any kind of official specification from Valve, nor am I affiliated with Valve in any way. Any corrections or information on the unknown parts of the format will be gratefully received.

This document describes version 19 of the BSP file format as used by the Source engine, which is used by Half-Life 2 single player (HL2SP), Half-Life 2 Deathmatch (HL2DM), and Counter-Strike:Source (CS:S). The game Vampire: The Masquerade Bloodlines (VTMB) uses a modified form of an earlier format, version 17; known differences will be mentioned in their respective sections, however because no SDK is available for VTMB, this information is mostly guesswork.

Preliminary information on version 20 of the format, which supports high dynamic range (HDR) lighting as used in Day of Defeat: Source (DoD:S), is tentatively covered. Until the Source SDK is updated, this information is also mostly guesswork.

A certain familiarity with C/C++, geometry, and HL2 mapping terms is assumed on the part of the reader. Code (mostly C structures) is given in a fixed width font. Sometimes the structures as shown are modified from their actual definitions in the SDK header files, for reasons of clarity and consistency.

Overview

The BSP file contains the vast majority of the information needed by the Source engine to render and play a map. This includes the geometry of all the polygons in the level; references to the names and orientation of the textures to be drawn on those polygons; the data used to simulate the physical behaviour of the player and other items during the game; the location and properties of all brush-based, model (prop) based, and non-visible (logical) entities in the map; and the BSP tree and visibility table used to locate the player location in the map geometry and to render the visible map as efficiently as possible. Optionally, the map file can also contain any custom textures and models used on the level, embedded inside the map's Pakfile lump (see below).

Information not stored in the BSP file includes the map description text displayed by HL2DM and CS:S after loading the map (stored in the file mapname.txt) and the AI navigation file used by non-player characters (NPCs) which need to navigate the map (stored in the file mapname.nav). Because of the way the Source engine file system works, these external files may also be embedded in the bsp file's Pakfile lump, though usually they are not.

Official map files are stored in the Steam Game Cache File (GCF) format, and are accessed through the Steam filesystem by the game engine. They can be extracted from the GCF files using Nemesis' GCFScape for perusal outside of Steam.

The data in the BSP file is stored throughout in little-endian byte order, in common with the preceding BSP formats as used by HL1, Quake, etc. Byte-swapping is required when loading the file on a big-endian platform, or in an environment that reads binary data as big-endian by default, such as Java.
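As a minimal sketch of byte-order-independent loading, the helpers below assemble integers from individual bytes, so they behave the same on little-endian and big-endian hosts. The names read_le32 and read_le16 are illustrative only and do not come from the SDK.

```c
#include <stdint.h>

/* Read a 32-bit little-endian signed integer from a byte buffer.
   The bytes are combined explicitly, so the host byte order is irrelevant. */
static int32_t read_le32(const unsigned char *p)
{
    uint32_t v = (uint32_t)p[0]
               | ((uint32_t)p[1] << 8)
               | ((uint32_t)p[2] << 16)
               | ((uint32_t)p[3] << 24);
    return (int32_t)v;
}

/* Read a 16-bit little-endian unsigned integer from a byte buffer. */
static uint16_t read_le16(const unsigned char *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}
```

For example, the version field of a version 19 file is stored as the bytes 13 00 00 00 (hex), which read_le32 decodes as 19.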

BSP File Header

The BSP file starts with a header. This structure identifies the file as a Valve Source Engine BSP file, identifies the version of the format, and is then followed by a directory listing of the location, length, and version of up to 64 subsections of the file, known as lumps, that store different parts of the map data. Finally, the map revision is given.

The structure of the header is given in the SDK's public/bspfile.h header file, a file which I will be referencing extensively throughout this document. The header is 1036 bytes long in total:

struct dheader_t
{
	int	ident;	// BSP file identifier
	int	version;	// BSP file version
	lump_t	lumps[HEADER_LUMPS];	// lump directory array
	int	mapRevision;	// the map's revision (iteration, version) number
};

Here ident is a 4-byte magic number defined as:

IDBSPHEADER	(('P'<<24)+('S'<<16)+('B'<<8)+'V')	//little-endian 'VBSP';

The first four bytes of the file are thus always 'V' 'B' 'S' 'P' (in ASCII). These bytes identify the file as a Valve BSP file; other BSP file formats use a different magic number (such as for iD Software's Quake engine games, which start with 'IBSP'). The HL1 BSP format does not use any magic number at all.
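A loader might check the magic number before doing anything else. The following is a sketch only; the function name is_vbsp is hypothetical, and the macro is written out in full here so that the example is self-contained.

```c
#include <stdint.h>

/* 'V','B','S','P' interpreted as a little-endian 32-bit integer. */
#define IDBSPHEADER (('P' << 24) + ('S' << 16) + ('B' << 8) + 'V')

/* Returns 1 if the buffer starts with the bytes 'V' 'B' 'S' 'P', else 0. */
static int is_vbsp(const unsigned char *file, unsigned long size)
{
    uint32_t ident;

    if (size < 4)
        return 0;

    /* Assemble the first four bytes in little-endian order. */
    ident = (uint32_t)file[0]
          | ((uint32_t)file[1] << 8)
          | ((uint32_t)file[2] << 16)
          | ((uint32_t)file[3] << 24);

    return ident == (uint32_t)IDBSPHEADER;
}
```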

The second integer is the version of the BSP file format (BSPVERSION); for HL2 games (until the release of HDR lighting) this was 19 (decimal); VTMB uses an earlier version of the format, 17. Note that BSP file formats for other engines (HL1, Quake series, etc.) use entirely different version number ranges.

The newest Valve BSP version is 20, for maps supporting HDR lighting. This is currently only used in maps for DoD:S and The Lost Coast, but presumably all forthcoming maps for the Source engine will use this version number.

Then follows an array of 16-byte lump_t structures. HEADER_LUMPS is defined as 64, so there are 64 entries; however, only 52 of these lumps are currently used, the rest being undefined.

Each lump_t is defined in bspfile.h:

struct lump_t
{
	int	fileofs;	// offset into file (bytes)
	int	filelen;	// length of lump(bytes)
	int	version;	// lump format version
	char	fourCC[4];	// lump ident code
};

The first two integers contain the byte offset (from the beginning of the bsp file) and the byte length of that lump's data block. They are followed by an integer giving the version number of that lump's format (usually zero), and then a four-byte identifier that is in practice always 0, 0, 0, 0. Unused members of the lump_t array (those that have no data to point to) have all elements set to zero.

Lump offsets (and their corresponding data lumps) are always rounded up to the nearest 4-byte boundary, though the lump length may not be.

The type of data pointed to by the lump_t array is defined by its position in the array; for example, the first lump in the array (Lump 0) is always the BSP file's entity data (see below). The actual location of the data in the BSP file is defined by the offset and length entries for that lump, and does not need to be in any particular order in the file; for example, the entity data is usually stored towards the end of the BSP file despite being first in the lump array. The array of lump_t headers is therefore a directory of the actual lump data, which may be located anywhere else in the file.
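The directory lookup can be sketched as below. This assumes the whole file has already been read into memory and that the host is little-endian with 32-bit int32_t fields laid out without padding, so the header can be copied directly into the structure. The function name find_lump and its bounds checks are illustrative, not taken from the SDK.

```c
#include <stdint.h>
#include <string.h>

#define HEADER_LUMPS 64

typedef struct {
    int32_t fileofs;    /* offset into file (bytes) */
    int32_t filelen;    /* length of lump (bytes) */
    int32_t version;    /* lump format version */
    char    fourCC[4];  /* lump ident code */
} lump_t;

typedef struct {
    int32_t ident;
    int32_t version;
    lump_t  lumps[HEADER_LUMPS];
    int32_t mapRevision;
} dheader_t;

/* Return a pointer to the data of lump `index` inside the in-memory file,
   storing its byte length in *len. Returns NULL if the index is invalid or
   the directory entry points outside the file. */
static const unsigned char *find_lump(const unsigned char *file, size_t size,
                                      int index, size_t *len)
{
    dheader_t h;
    const lump_t *l;

    if (index < 0 || index >= HEADER_LUMPS || size < sizeof h)
        return NULL;

    memcpy(&h, file, sizeof h);   /* little-endian host assumed */
    l = &h.lumps[index];

    if (l->fileofs < 0 || l->filelen < 0)
        return NULL;
    if ((size_t)l->fileofs > size ||
        (size_t)l->filelen > size - (size_t)l->fileofs)
        return NULL;

    *len = (size_t)l->filelen;
    return file + l->fileofs;
}
```

Because the lump type is determined purely by its position in the array, a call such as find_lump(file, size, 0, &len) would locate the entity data wherever it happens to be stored in the file.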

The order of the lumps in the array is defined as (lumps with unknown or uncertain purpose are marked with (?)):

Lump Name Purpose
0 Entities Map entities
1 Planes Plane array
2 Texdata Index to texture names
3 Vertexes Vertex array
4 Visibility Compressed visibility bit arrays
5 Nodes BSP tree nodes
6 Texinfo Face texture array
7 Faces Face array
8 Lighting Lightmap samples
9 Occlusion Occlusion data(?)
10 Leaves BSP tree leaf nodes
11 Unused
12 Edges Edge array
13 Surfedges Index of edges
14 Models Brush models (geometry of brush entities)
15 Worldlights Light entities
16 LeafFaces Index to faces in each leaf
17 LeafBrushes Index to brushes in each leaf
18 Brushes Brush array
19 Brushsides Brushside array
20 Areas Area array
21 AreaPortals Portals between areas
22 Portals Polygons defining the boundary between adjacent leaves(?)
23 Clusters Leaves that are enterable by the player
24 PortalVerts Vertices of portal polygons
25 Clusterportals Polygons defining the boundary between adjacent clusters(?)
26 Dispinfo Displacement surface array
27 OriginalFaces Brush faces array before BSP splitting
28 Unused
29 PhysCollide Physics collision data(?)
30 VertNormals Vertex normals(?)
31 VertNormalIndices Vertex normal index array(?)
32 DispLightmapAlphas Displacement lightmap data(?)
33 DispVerts Vertices of displacement surface meshes
34 DispLightmapSamplePos Displacement lightmap data(?)
35 GameLump Game-specific data lump
36 LeafWaterData (?)
37 Primitives Non-polygonal primitives(?)
38 PrimVerts (?)
39 PrimIndices (?)
40 Pakfile Embedded uncompressed-Zip format file
41 ClipPortalVerts (?)
42 Cubemaps Env_cubemap location array
43 TexdataStringData Texture name data
44 TexdataStringTable Index array into texdata string data
45 Overlays Info_overlay array
46 LeafMinDistToWater (?)
47 FaceMacroTextureInfo (?)
48 DispTris Displacement surface triangles
49 PhysCollideSurface Physics collision surface data(?)
50-52 Unused
53 LightingHDR HDR related lighting data(?)
54 WorldlightsHDR HDR related worldlight data(?)
55 LeaflightHDR1 HDR related leaf lighting data(?)
56 LeaflightHDR2 HDR related leaf lighting data(?)
57-63 Unused

Lumps 53-56 are only used in version 20 BSP files; the lump names are unofficial and currently only guesses can be made about their content.

The structure of the data lumps for the known entries is described below. Many of the lumps are simple arrays of structures; however some are of variable length depending on their content. The maximum size or number of entries in each lump is also defined in the bspfile.h file, as MAX_MAP_*.

Finally, the header ends with an integer containing the map revision number. This number is based on the revision number of the map's vmf file, which seems to increase each time the map is saved in the Hammer editor.

Immediately following the header is the first data lump. This can be any lump in the preceding list (pointed to using the offset field of that lump), though in practice the first data lump is Lump 1, the plane data array.

Lumps

Plane

The basis of the BSP geometry is defined by planes, which are used as splitting surfaces across the BSP tree structure.

The plane lump (1) is an array of dplane_t structures:

struct dplane_t
{
	Vector	normal;	// normal vector
	float	dist;	// distance from origin
	int	type;	// plane axis identifier
};

where the Vector type is a 3-vector defined as:

struct Vector
{
	float x;
	float y;
	float z;
};

Floats are 4 bytes long; there are thus 20 bytes per plane, and the plane lump should be a multiple of 20 bytes long.

The plane is represented by the element normal, a normal vector, which is a unit vector (length 1.0) perpendicular to the plane's surface. The position of the plane is given by dist, which is the distance from the map origin (0,0,0) to the nearest point on the plane.

Mathematically, the plane is described by the set of points (x, y, z) in the equation:

F(x,y,z) = Ax + By + Cz - D

where A, B, and C are given by the components normal.x, normal.y and normal.z, and D is dist. Each plane is infinite in extent, and divides the whole of the map coordinate volume into three pieces, on the plane (F=0), in front of the plane (F>0), and behind the plane (F<0).

Note that planes have a particular orientation, corresponding to which side is considered "in front" of the plane, and which is "behind". The orientation of a plane can be flipped by negating the A, B, C, and D components.
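The side test can be sketched as follows. It assumes the convention of the Quake-family engines, in which dist equals the dot product of the normal with any point on the plane, so the signed distance of a point is its dot product with the normal minus dist. The function names plane_distance and plane_flip are illustrative only.

```c
typedef struct { float x, y, z; } Vector;

typedef struct {
    Vector normal;  /* unit normal vector */
    float  dist;    /* distance from origin along the normal */
    int    type;    /* plane axis identifier */
} dplane_t;

/* Signed distance of point v from the plane:
   > 0 in front of the plane, < 0 behind it, 0 on the plane. */
static float plane_distance(const dplane_t *p, Vector v)
{
    return p->normal.x * v.x + p->normal.y * v.y + p->normal.z * v.z - p->dist;
}

/* Return the same plane with the opposite orientation, obtained by
   negating the normal components and the distance. The type member is
   copied unchanged. */
static dplane_t plane_flip(dplane_t p)
{
    p.normal.x = -p.normal.x;
    p.normal.y = -p.normal.y;
    p.normal.z = -p.normal.z;
    p.dist     = -p.dist;
    return p;
}
```

A point that is in front of a plane is behind the flipped plane by the same distance.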

The type member of the structure seems to contain flags that indicate planes that are perpendicular to coordinate axes, but is usually not used.

There can be up to 65536 planes in a map (MAX_MAP_PLANES).

Vertex

The vertex lump (3) is an array of coordinates of all the vertices (corners) of brushes in the map geometry. Each vertex is a Vector of 3 floats (x, y, and z), giving 12 bytes per vertex.

Note that vertices can be shared between faces, if the vertices coincide exactly.

There are a maximum of 65536 vertices in a map (MAX_MAP_VERTS).

Edge

The edge lump (12) is an array of dedge_t structures:

struct dedge_t
{
	unsigned short	v[2];	// vertex indices
};

Each edge is simply a pair of vertex indices (which index into the vertex lump array). The edge is defined as the straight line between the two vertices. Usually, the edge array is referenced through the Surfedge array (see below).

As for vertices, edges can be shared between adjacent faces. There is a limit of 256000 edges in a map (MAX_MAP_EDGES).

Surfedge

The Surfedge lump (13), presumably short for surface edge, is an array of (signed) integers. Surfedges are used to reference the edge array, in a somewhat complex way. The value in the surfedge array can be positive or negative. The absolute value of this number is an index into the edge array: if positive, it means the edge is defined from the first to the second vertex; if negative, from the second to the first vertex.

By this method, the Surfedge array allows edges to be referenced for a particular direction. (See the face lump entry below for more on why this is done).
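Resolving a single surfedge into an ordered pair of vertex indices might look like this. It is a sketch; the function name surfedge_vertices is hypothetical.

```c
#include <stdlib.h>

typedef struct {
    unsigned short v[2];  /* vertex indices */
} dedge_t;

/* Resolve one surfedge value into its start and end vertex indices.
   A non-negative value traces the edge from v[0] to v[1];
   a negative value traces edge number -value from v[1] to v[0]. */
static void surfedge_vertices(const dedge_t *edges, int surfedge,
                              unsigned short *start, unsigned short *end)
{
    const dedge_t *e = &edges[abs(surfedge)];

    if (surfedge >= 0) {
        *start = e->v[0];
        *end   = e->v[1];
    } else {
        *start = e->v[1];
        *end   = e->v[0];
    }
}
```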

There is a limit of 512000 (MAX_MAP_SURFEDGES) surfedges per map. Note that the number of surfedges is not necessarily the same as the number of edges in the map.

Face and Original Face

The face lump (7) contains the major geometry of the map, used by the game engine to render the viewpoint of the player. The face lump contains faces after they have undergone the BSP splitting process; they therefore do not directly correspond to the faces of brushes created in Hammer. Faces are always flat, convex polygons, though they can contain edges that are co-linear.

The face lump is one of the more complex structures of the map file. It is an array of dface_t entries, each 56 bytes long:

struct dface_t
{
	unsigned short	planenum;	// the plane number
	byte	side;	// faces opposite to the node's plane direction
	byte	onNode;	// 1 if on node, 0 if in leaf
	int	firstedge;	// index into surfedges
	short	numedges;	// number of surfedges
	short	texinfo;	// texture info
	short	dispinfo;	// displacement info
	short	surfaceFogVolumeID;	// ?
	byte	styles[4];	// switchable lighting info
	int	lightofs;	// offset into lightmap lump
	float	area;	// face area in units^2
	int	LightmapTextureMinsInLuxels[2];	// texture lighting info
	int	LightmapTextureSizeInLuxels[2];	// texture lighting info
	int	origFace;	// original face this was split from
	unsigned short	numPrims;	// primitives
	unsigned short	firstPrimID;
	unsigned int	smoothingGroups;	// lightmap smoothing group
};

The first member planenum is the plane number, i.e., the index into the plane array that corresponds to the plane that is aligned with this face in the world. Side is zero if this plane faces in the same direction as the face (i.e. "out" of the face) or non-zero otherwise.

Firstedge is an index into the Surfedge array; this and the following numedges entries in the surfedge array define the edges of the face. As mentioned above, whether the value in the surfedge array is positive or negative indicates whether the corresponding pair of vertices listed in the Edge array should be traced from the first vertex to the second, or vice versa. The vertices which make up the face are thus referenced in clockwise order; when looking towards the face, each edge is traced in a clockwise direction. This makes rendering the faces easier, and allows quick culling of faces that face away from the viewpoint.
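Collecting the polygon of a face from its firstedge and numedges members can be sketched as below. Each surfedge contributes its starting vertex, which yields the vertices in the order in which the edges are traced. The function name face_vertex_indices and the output-buffer convention are assumptions made for this example.

```c
typedef struct {
    unsigned short v[2];  /* vertex indices */
} dedge_t;

/* Write the vertex indices of a face, in traced order, into out[].
   firstedge and numedges are the corresponding dface_t members.
   Returns the number of indices written (at most max_out). */
static int face_vertex_indices(const dedge_t *edges, const int *surfedges,
                               int firstedge, int numedges,
                               unsigned short *out, int max_out)
{
    int i, n = 0;

    for (i = 0; i < numedges && n < max_out; i++) {
        int se = surfedges[firstedge + i];

        /* Positive: edge runs v[0] -> v[1], so it starts at v[0].
           Negative: edge -se runs v[1] -> v[0], so it starts at v[1]. */
        out[n++] = (se >= 0) ? edges[se].v[0] : edges[-se].v[1];
    }
    return n;
}
```

The resulting indices refer to entries of the vertex lump, so the actual corner coordinates are obtained with one further lookup.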

Texinfo is an index into the Texinfo array (see below), and represents the texture to be drawn on the face. Dispinfo is an index into the Dispinfo array if the face is a displacement surface (in which case, the face defines the boundaries of the surface); otherwise, it is -1. SurfaceFogVolumeID appears to be related to drawing fogging when the player's viewpoint is underwater or looking through water.

OrigFace is the index of the original face which was split to produce this face. NumPrims and firstPrimID are related to the drawing of "Non-polygonal primitives" (see below). The other members of the structure are used to reference face-lighting info (see the Lighting lump, below).

The face array is limited to 65536 (MAX_MAP_FACES) entries.

The original face lump (27) has the same structure as the face lump, but contains the array of faces before the BSP splitting process is done. These faces are therefore closer to the original brush faces present in the pre-compile map than the face array, and there are fewer of them. The origFace entry for all original faces is zero. The maximum size of the original face array is also 65536 entries.

Both the face and original face arrays are culled; that is, many faces present before compilation of the map (primarily those that face towards the "void" outside the map) are removed from the array.

Version 17 BSP files contain a substantially modified dface_t structure. The known elements are:

struct dface_bsp17_t
{
	byte	unknown1[32];
	unsigned short	planenum;
	byte	side;
	byte	onNode;
	int	firstedge;
	short	numedges;
	short	texinfo;
	short	dispinfo;
	byte	unknown2[50];
	int	origFace;
	unsigned int	smoothingGroups;
};

The extra data seems to be related to lighting of the face, and makes the length of the structure 104 bytes per face. Both the face lump and the original face lump in version 17 files use this structure.

Brush and Brushside

The brush lump (18) contains all brushes that were present in the original vmf file before compiling. (It is the presence of the brush and brushside lumps in HL2 bsp files that makes decompiling them a much easier job than for HL1 files, which lacked this info). The lump is an array of 12-byte dbrush_t structures:

struct dbrush_t
{
	int	firstside;	// first brushside
	int	numsides;	// number of brushsides
	int	contents;	// contents flags
};

The first integer firstside is an index into the brushside array lump; this and the following numsides brushsides make up all the sides in this brush. The contents entry contains bitflags which determine the contents of this brush. The values are binary-ORed together, and are defined in the public/bspflags.h file:

CONTENTS_EMPTY	0	// No contents
CONTENTS_SOLID	0x1	// an eye is never valid in a solid
CONTENTS_WINDOW	0x2	// translucent, but not watery (glass)
CONTENTS_AUX	0x4
CONTENTS_GRATE	0x8	// alpha-tested "grate" textures. Bullets/sight pass through, but solids don't
CONTENTS_SLIME	0x10
CONTENTS_WATER	0x20
CONTENTS_MIST	0x40
CONTENTS_OPAQUE	0x80	// things that cannot be seen through (may be non-solid though)
CONTENTS_TESTFOGVOLUME	0x100	// can see into a fogvolume (water)
CONTENTS_MOVEABLE	0x4000
CONTENTS_AREAPORTAL	0x8000
CONTENTS_PLAYERCLIP	0x10000
CONTENTS_MONSTERCLIP	0x20000
CONTENTS_CURRENT_0	0x40000
CONTENTS_CURRENT_90	0x80000
CONTENTS_CURRENT_180	0x100000
CONTENTS_CURRENT_270	0x200000
CONTENTS_CURRENT_UP	0x400000
CONTENTS_CURRENT_DOWN	0x800000
CONTENTS_ORIGIN	0x1000000	// removed before bsping an entity
CONTENTS_MONSTER	0x2000000	// should never be on a brush, only in game
CONTENTS_DEBRIS	0x4000000
CONTENTS_DETAIL	0x8000000	// brushes to be added after vis leafs
CONTENTS_TRANSLUCENT	0x10000000	// auto set if any surface has trans
CONTENTS_LADDER	0x20000000
CONTENTS_HITBOX	0x40000000	// use accurate hitboxes on trace

Some of these flags seem to be inherited from previous game engines and are not used in Source maps. They are also used to describe the contents of the map's leaves (see below). The CONTENTS_DETAIL flag is used to mark brushes that were in func_detail entities before compiling.
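Since the contents member is a set of ORed bitflags, individual properties are tested with a bitwise AND. The sketch below repeats only three of the flag values listed above; the helper names are hypothetical.

```c
#define CONTENTS_SOLID   0x1
#define CONTENTS_WATER   0x20
#define CONTENTS_DETAIL  0x8000000

/* Non-zero if the brush was part of a func_detail entity before compiling. */
static int brush_is_detail(int contents)
{
    return (contents & CONTENTS_DETAIL) != 0;
}

/* Non-zero if every flag in `mask` is set in `contents`. */
static int contents_has_all(int contents, int mask)
{
    return (contents & mask) == mask;
}
```

A decompiler could use a test like brush_is_detail to decide whether a brush should be written back out inside a func_detail entity.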

The brush array is limited to 8192 entries (MAX_MAP_BRUSHES).

The brushside lump (19) is an array of 8-byte structures:

struct dbrushside_t
{
	unsigned short	planenum;	// facing out of the leaf
	short	texinfo;	// texture info
	short	dispinfo;	// displacement info
	short	bevel;	// is the side a bevel plane?
};

Planenum is an index into the plane array, giving the plane corresponding to this brushside. Texinfo and dispinfo are references into the texture and displacement info lumps. Bevel is zero for normal brush sides, but 1 if the side is a bevel plane (which seems to be used for collision detection).

Unlike the face array, brushsides are not culled (removed) where they touch the void. Void-facing sides do however have their texinfo entry changed to that of a NODRAW texture during the compile process. Note there is no direct way of linking brushes and brushsides and the corresponding face array entries which are used to render that brush.

The maximum number of brushsides is 65536 (MAX_MAP_BRUSHSIDES). The maximum number of brushsides on a single brush is 128 (MAX_BRUSH_SIDES).

Node and Leaf

The node array (lump 5) and leaf array (lump 10) define the Binary Space Partition (BSP) tree structure of the map. The BSP tree is used by the engine to quickly determine the location of the player's viewpoint with respect to the map geometry, and along with the visibility information (see below), to decide which parts of the map are to be drawn.

The nodes and leaves form a tree structure. Each leaf represents a defined volume of the map, and each node represents the volume which is the sum of all its child nodes and leaves further down the tree.

Each node has exactly two children, which can be either another node or a leaf. A child node has two further children, and so on until all branches of the tree are terminated with leaves, which have no children. Each node also references a plane in the plane array. When determining the player's viewpoint, the engine is trying to find which leaf the viewpoint falls inside. It first compares the coordinates of the point with the plane referenced in the headnode (Node 0). If the point is in front of the plane, it then moves to the first child of the node; otherwise, it moves to the second child. If the child is a leaf, then it has completed its task. If it is another node, it then performs the same check against the plane referenced in this node, and follows the children as before. It therefore traverses the BSP tree until it finds which leaf the viewpoint lies in. The leaves, then, completely divide up the map volume into a set of non-overlapping, convex volumes defined by the planes of their parent nodes.

For more information on how the BSP tree is constructed, see the article "BSP for dummies" (http://www.planetquake.com/qxx/bsp/).

The node array consists of 32-byte structures:

struct dnode_t
{
	int	planenum;	// index into plane array
	int	children[2];	// negative numbers are -(leafs+1), not nodes
	short	mins[3];	// for frustum culling
	short	maxs[3];
	unsigned short	firstface;	// index into face array
	unsigned short	numfaces;	// counting both sides
	short	area;	// If all leaves below this node are in the same area, then
	// this is the area index. If not, this is -1.
	short	paddding;	// pad to 32 bytes length
};

Planenum is the entry in the plane array. The children[] members are the two children of this node; if positive, they are node indices; if negative, the value (-1-child) is the index into the leaf array (e.g., the value -100 would reference leaf 99).

The members mins[] and maxs[] are coordinates of a rough bounding box surrounding the contents of this node. The firstface and numfaces are indices into the face array that show which map faces are contained in this node, or zero if none are. The area value is the map area of this node (see below). There can be a maximum of 65536 nodes in a map (MAX_MAP_NODES).
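The tree walk described in this section can be sketched as below. It assumes the signed distance to a plane is the dot product of the point with the normal, minus dist, and it sends points lying exactly on a plane to the second child, following the wording above; the engine's own handling of that case may differ. The function name find_leaf is illustrative only.

```c
typedef struct { float x, y, z; } Vector;

typedef struct {
    Vector normal;
    float  dist;
    int    type;
} dplane_t;

typedef struct {
    int            planenum;     /* index into plane array */
    int            children[2];  /* negative numbers are -(leafs+1), not nodes */
    short          mins[3];
    short          maxs[3];
    unsigned short firstface;
    unsigned short numfaces;
    short          area;
    short          paddding;     /* member name as given in the structure above */
} dnode_t;

/* Walk the tree from `headnode` and return the index of the leaf
   containing point p. */
static int find_leaf(const dnode_t *nodes, const dplane_t *planes,
                     int headnode, Vector p)
{
    int idx = headnode;

    while (idx >= 0) {
        const dnode_t  *n  = &nodes[idx];
        const dplane_t *pl = &planes[n->planenum];
        float d = pl->normal.x * p.x + pl->normal.y * p.y
                + pl->normal.z * p.z - pl->dist;

        /* In front of the plane: first child; otherwise: second child. */
        idx = n->children[d > 0.0f ? 0 : 1];
    }
    return -1 - idx;  /* convert the negative child value to a leaf index */
}
```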

The leaf array is an array with 56 bytes per element:

struct dleaf_t
{
	int	contents;	// OR of all brushes (not needed?)
	short	cluster;	// cluster this leaf is in
	short	area:9;	// area this leaf is in
	short	flags:7;	// flags
	short	mins[3];	// for frustum culling
	short	maxs[3];
	unsigned short	firstleafface;	// index into leaffaces
	unsigned short	numleaffaces;
	unsigned short	firstleafbrush;	// index into leafbrushes
	unsigned short	numleafbrushes;
	short	leafWaterDataID;	// -1 for not in water
	CompressedLightCube	ambientLighting;	// Precalculated light info for entities.
	short	padding;	// padding to 4-byte boundary
};

The leaf structure has similar contents to the node structure, except it has no children and no reference plane. Additional entries are the contents flags (see the brush lump, above), which shows the contents of any brushes in the leaf, and the cluster number of the leaf (see below). The area and flags members share a 16-bit bitfield and contain the area number and flags relating to the leaf. Firstleafface and numleaffaces index into the leafface array and show which faces are inside this leaf, if any. Firstleafbrush and numleafbrushes likewise index brushes inside this leaf through the leafbrush array.
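When reading the file without relying on compiler bitfields, the combined 16-bit value can be split by hand. The layout of C bitfields is implementation-defined, so the sketch below rests on an assumption: that the first-declared member (area) occupies the 9 least significant bits and flags the 7 bits above it, as is typical of compilers targeting little-endian x86. It also treats both values as unsigned. The helper names are hypothetical.

```c
#include <stdint.h>

/* Extract the 9-bit area number from the packed 16-bit value
   (assumed to occupy the low 9 bits). */
static int leaf_area(uint16_t packed)
{
    return packed & 0x1FF;
}

/* Extract the 7-bit flags value from the packed 16-bit value
   (assumed to occupy the high 7 bits). */
static int leaf_flags(uint16_t packed)
{
    return (packed >> 9) & 0x7F;
}
```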

The ambientLighting element is related to lighting of objects in the leaf, and consists of a CompressedLightCube structure, which is 24 bytes in length. Version 17 BSP files have a modified dleaf_t structure that omits the ambient lighting data, making the entry for each leaf only 32 bytes in length. The same shortened structure is also used for version 20 BSP files, with the ambient lighting information for LDR and HDR probably contained in the new lumps 55 and 56.

All leaves are convex polyhedra, and are defined by the planes of their parent nodes. They do not overlap. Any point in the coordinate space is in one and only one leaf of the map. A leaf which is not filled with a solid brush and can be entered by the player in the usual course of the game has a cluster number set; this is used in conjunction with the visibility information (below).

There are usually multiple, unconnected BSP trees in a map. Each one corresponds to an entry in the model array (see below) and the headnode of each tree is referenced there. The first tree is the worldspawn model, the overall geometry of the level. Successive trees are the models of each brush entity in the map.

The creation of the BSP tree is done by the VBSP program, during the first phase of map compilation. Exactly how the tree is created, and how the map is divided into leaves, can be influenced by the map author by the use of HINT brushes, func_details, and the careful layout of all brushes in the map.

LeafFace and LeafBrush

The leafface lump (16) is an array of shorts which are used to map from faces referenced in the leaf structure to indices in the face array. The leafbrush lump (17) does the same thing for brushes referenced in leaves. Their maximum sizes are both 65536 entries (MAX_MAP_LEAFFACES, MAX_MAP_LEAFBRUSHES).

Texinfo, Texdata, TexdataStringData and TexdataStringTable

The texture information in a map is split across a number of different lumps. The Texinfo lump is the most fundamental, referenced by the face and brushside arrays, and it in turn references the other texture lumps.

The texinfo lump (6) contains an array of texinfo_t structures:

struct texinfo_t
{
	float	textureVecs[2][4];	// [s/t][xyz offset]
	float	lightmapVecs[2][4];	// [s/t][xyz offset] - length is in units of texels/area
	int	flags;	// miptex flags overrides
	int	texdata;	// Pointer to texture name, size, etc.
};

Each texinfo is 72 bytes long.

The first array of floats is in essence two vectors that represent how the texture is orientated and scaled when rendered on the world geometry. The two vectors, s and t, are the mapping of the left-to-right and down-to-up directions in the texture pixel coordinate space, onto the world. Each vector has an x, y, and z component, plus an offset which is the "shift" of the texture in that direction relative to the world. The lengths of the vectors represent the scaling of the texture in each direction.

The 2D coordinates (u, v) of a texture pixel (or texel) are obtained from the world coordinates (x, y, z) of a point on a face by:

u = tv0,0 * x + tv0,1 * y + tv0,2 * z + tv0,3

v = tv1,0 * x + tv1,1 * y + tv1,2 * z + tv1,3

where tvA,B is textureVecs[A][B].
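Computing the texel coordinates for a world-space point can be sketched as follows: each coordinate is the dot product of the point with the first three components of the corresponding vector, plus the offset in the fourth component. The function name texture_coords is an assumption made for this example.

```c
typedef struct { float x, y, z; } Vector;

typedef struct {
    float textureVecs[2][4];   /* [s/t][xyz offset] */
    float lightmapVecs[2][4];  /* [s/t][xyz offset] */
    int   flags;
    int   texdata;
} texinfo_t;

/* Map world-space point p to texture pixel coordinates (u, v). */
static void texture_coords(const texinfo_t *ti, Vector p, float *u, float *v)
{
    const float *s = ti->textureVecs[0];
    const float *t = ti->textureVecs[1];

    *u = s[0] * p.x + s[1] * p.y + s[2] * p.z + s[3];
    *v = t[0] * p.x + t[1] * p.y + t[2] * p.z + t[3];
}
```

With an s vector of (0.25, 0, 0) and an offset of 16, for instance, a point at x = 64 maps to u = 32: four world units per texel, shifted by 16 texels.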

The lightmapVecs float array performs a similar mapping of the lightmap samples of the texture onto the world.

The flags entry contains bitflags which are defined in bspflags.h:

SURF_LIGHT	0x0001	// value will hold the light strength
SURF_SLICK	0x0002	// effects game physics
SURF_SKY	0x0004	// don't draw, but add to skybox
SURF_WARP	0x0008	// turbulent water warp
SURF_TRANS	0x0010	// surface is transparent
SURF_WET	0x0020	// the surface is wet
SURF_FLOWING	0x0040	// scroll towards angle
SURF_NODRAW	0x0080	// don't bother referencing the texture
SURF_HINT	0x0100	// make a primary bsp splitter
SURF_SKIP	0x0200	// completely ignore, allowing non-closed brushes
SURF_NOLIGHT	0x0400	// Don't calculate light on this surface
SURF_BUMPLIGHT	0x0800	// calculate three lightmaps for the surface for bumpmapping
SURF_NOSHADOWS	0x1000	// Don't receive shadows
SURF_NODECALS	0x2000	// Don't receive decals
SURF_NOCHOP	0x4000	// Don't subdivide patches on this surface
SURF_HITBOX	0x8000	// surface is part of a hitbox

The flags seem to be derived from the texture's .vmt file contents, and specify special properties of that texture.

Finally the texdata entry is an index into the Texdata array, and specifies the actual texture.

The index of a Texinfo (referenced from a face or brushside) may be given as -1; this indicates that no texture information is associated with this face. This occurs on compiling brush faces given the SKIP, CLIP, or INVISIBLE type textures in the editor.

The texdata array (lump 2) consists of the structures:

struct dtexdata_t
{
	Vector	reflectivity;	// RGB reflectivity
	int	nameStringTableID;	// index into TexdataStringTable
	int	width, height;		// source image
	int	view_width, view_height;
};

The reflectivity vector corresponds to the RGB components of the reflectivity of the texture, as derived from the material's .vtf file. This is probably used in radiosity (lighting) calculations of what light bounces from the texture's surface. The nameStringTableID is an index into the TexdataStringTable array (below). The other members relate to the texture's source image.

The TexdataStringTable (lump 44) is an array of integers which are offsets into the TexdataStringData (lump 43). The TexdataStringData lump consists of concatenated null-terminated strings giving the texture name.
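The chain of lookups from a texdata entry to its texture name can be sketched as below, assuming the three lumps have already been loaded into memory as arrays. The function name texdata_name, its count parameters, and the bounds checks are illustrative and not part of the SDK.

```c
#include <stddef.h>

typedef struct { float x, y, z; } Vector;

typedef struct {
    Vector reflectivity;       /* RGB reflectivity */
    int    nameStringTableID;  /* index into TexdataStringTable */
    int    width, height;      /* source image */
    int    view_width, view_height;
} dtexdata_t;

/* Return the null-terminated name of texdata entry `index`, or NULL if
   any index or offset along the way is out of range.
   string_table is lump 44 (offsets), string_data is lump 43 (characters). */
static const char *texdata_name(const dtexdata_t *texdata, int num_texdata,
                                const int *string_table, int num_strings,
                                const char *string_data, int string_data_len,
                                int index)
{
    int id, ofs;

    if (index < 0 || index >= num_texdata)
        return NULL;

    id = texdata[index].nameStringTableID;
    if (id < 0 || id >= num_strings)
        return NULL;

    ofs = string_table[id];
    if (ofs < 0 || ofs >= string_data_len)
        return NULL;

    return string_data + ofs;
}
```

Starting from a face, the full chain is therefore face.texinfo, then texinfo.texdata, then the lookup above.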

There can be a maximum of 12288 texinfos in a map (MAX_MAP_TEXINFO). There is a limit of 2048 texdatas in the array (MAX_MAP_TEXDATA) and up to 256000 bytes in the TexdataStringData data block (MAX_MAP_TEXDATA_STRING_DATA). Texture name strings are limited to 128 characters (TEXTURE_NAME_LENGTH).

Model

A Model, in the terminology of the BSP file format, is a collection of brushes and faces, often called a "bmodel". It should not be confused with the prop models used in Hammer, which are usually called "studiomodels" in the SDK source.

The model lump (14) consists of an array of 48-byte dmodel_t structures:

struct dmodel_t
{
	Vector	mins, maxs;	// bounding box
	Vector	origin;	// for sounds or lights
	int	headnode;	// index into node array
	int	firstface, numfaces;	// index into face array
};

Mins and maxs are the bounding points of the model. Origin is the coordinates of the model in the world, if set. Headnode is the index of the top node in the node array of the BSP tree which describes this model. Firstface and numfaces index into the face array and give the faces which make up this model.

The first model in the array (Model 0) is always "worldspawn", the overall geometry of the whole map excluding entities (but including func_detail brushes). The subsequent models in the array are associated with brush entities, and referenced from the entity lump.

There is a limit of 1024 models in a map (MAX_MAP_MODELS), including the worldspawn model zero.

Visibility

The visibility lump (4) is in a somewhat different format to the previously mentioned lumps. To understand it, some discussion of how the Source engine's visibility system works is necessary.

As mentioned in the "Node and Leaf Lumps" section above, every point in the map falls into exactly one convex volume called a leaf. All leaves that are on the inside of the map (not touching the void), and that are not covered by a solid brush, can potentially have the player's viewpoint inside them during normal gameplay. Each of these enterable leaves (also called visleaves) gets assigned a cluster number. In HL2 BSP files, each enterable leaf corresponds to just one cluster.

(The terminology is slightly confusing here. According to the "Quake 2 BSP File Format" article, in the Q2 engine there could be multiple adjacent leaves in each cluster - thus the cluster is so called because it is a cluster of leaves. As I understand it, it seems from the HL2 SDK source that this situation may also occur during the compilation of HL2 maps; however, after the VVIS compile process is finished these adjacent leaves (and their parent nodes) are merged into a single leaf. In all finished HL2 maps I have examined, it seems there is only ever one leaf per cluster. Therefore, in HL2 BSP files the distinction between clusters and enterable leaves (visleaves) is not meaningful.)

Each cluster, then, is a volume in the map that the player can potentially be in. To render the map quickly, the game engine draws the geometry of only those clusters which are visible from the current cluster. Clusters which are completely occluded from view from the player's cluster need not be drawn. Calculating cluster-to-cluster visibility is the responsibility of the VVIS compile tool, and the resulting data is stored in the Visibility lump.

Once the engine knows a cluster is visible, the leaf data references all faces present in that cluster, allowing the contents of the cluster to be rendered.

The data is stored as an array of bit-vectors; for each cluster, a list of which other clusters are visible from it is stored as individual bits (1 if visible, 0 if occluded) in an array, with the nth bit position corresponding to the nth cluster. This is known as the cluster's Potentially Visible Set (PVS). Because of the large size of this data, the bit vectors are compressed by run-length encoding groups of zero bits in each vector.

There is also a Potentially Audible Set (PAS) array created for each cluster; this marks which clusters can hear sounds occurring in other clusters. The PAS seems to be created by merging the PVS bits of all clusters in the current cluster's PVS.

The Visibility lump is defined as:

struct dvis_t
{
	int	numclusters;
	int	byteofs[numclusters][2];
};

The first integer is the number of clusters in the map. It is followed by an array of integers giving the byte offset from the start of the lump to the start of the PVS bit array for each cluster, followed by the offset to the PAS array. Immediately following the array are the compressed bit vectors.

The decoding of the run-length compression works as follows: To find the PVS of a given cluster, start at the byte given by the offset in the byteofs[] array. If the current byte in the PVS buffer is zero, the following byte multiplied by 8 is the number of clusters to skip that are not visible. If the current byte is non-zero, the bits that are set correspond to clusters that are visible from this cluster. Continue until the number of clusters in the map is reached.

Example C code to decompress the bit vectors can be found in the "Quake 2 BSP File Format" document.
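
A minimal sketch of the decoding described above follows. The function names are my own. The bit order within each byte (cluster n is bit n & 7 of byte n >> 3, least significant bit first) is assumed to follow the Quake convention rather than being confirmed here.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Sketch: expand one run-length-compressed PVS (or PAS) row.
 *   in:     start of the row, i.e. lump base + byteofs[cluster][0 or 1]
 *   in_len: bytes available from `in` to the end of the lump
 *   out:    receives (numclusters + 7) / 8 bytes, one bit per cluster
 * Returns 0 on success, -1 if the input is invalid or ends early. */
int decompress_vis(const uint8_t *in, size_t in_len, int numclusters, uint8_t *out)
{
    size_t row_bytes, ip = 0, op = 0;

    if (numclusters < 0)
        return -1;
    row_bytes = ((size_t)numclusters + 7) / 8;

    while (op < row_bytes) {
        uint8_t b;
        size_t run;

        if (ip >= in_len)
            return -1;
        b = in[ip++];
        if (b != 0) {
            out[op++] = b;            /* literal byte: 8 visibility bits */
            continue;
        }
        if (ip >= in_len)
            return -1;
        run = in[ip++];               /* a zero byte is followed by a run length */
        if (run > row_bytes - op)     /* clamp a run that overshoots the row */
            run = row_bytes - op;
        memset(out + op, 0, run);     /* run * 8 clusters that are not visible */
        op += run;
    }
    return 0;
}

/* Test whether cluster c is marked in a decompressed row (assumed bit order). */
int cluster_visible(const uint8_t *row, int c)
{
    return (row[c >> 3] >> (c & 7)) & 1;
}
```

For example, the compressed bytes 05 00 02 80 for a 32-cluster map expand to 05 00 00 80, which marks clusters 0, 2 and 31 as visible.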

The maximum size of the Visibility lump is 0x1000000 bytes (MAX_MAP_VISIBILITY); that is, 16 MB.

Entity

The entity lump (0) is an ASCII text buffer, and stores the entity data in a format very similar to that used in the pre-compiled vmf files. Its general form is as follows:

{
	"world_maxs" "480 480 480"
	"world_mins" "-480 -480 -224"
	"maxpropscreenwidth" "-1"
	"skyname" "sky_wasteland02"
	"classname" "worldspawn"
}
{
	"origin" "-413.793 -384 -192"
	"angles" "0 0 0"
	"classname" "info_player_start"
}
{
	"model" "*1"
	"targetname" "secret_1"
	"origin" "424 -1536 1800"
	"Solidity" "1"
	"StartDisabled" "0"
	"InputFilter" "0"
	"disablereceiveshadows" "0"
	"disableshadows" "0"
	"rendermode" "0"
	"renderfx" "0"
	"rendercolor" "255 255 255"
	"renderamt" "255"
	"classname" "func_brush"
}

Entities are defined between opening and closing braces ("{" and "}") and list on each line a pair of key/value properties inside quotation marks. The first entity is always "worldspawn". The "classname" property gives the entity type, and the "targetname" property gives the entity's name as defined in Hammer (if it has one). The "model" property is slightly special: if it starts with an asterisk (*), the following number is an index into the model array (see above) which corresponds to the brushes associated with that entity. Otherwise, the value contains the name of a prop model. Other key/value pairs correspond to the properties of the entity as set in Hammer.

Note that func_detail, env_cubemap, info_overlay and prop_static entities are stripped out of the entity data by the compile process, and stored elsewhere in the bsp file.
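
The special handling of the "model" value might be sketched as below. The helper name brush_model_index is hypothetical, and the check is stricter than the engine necessarily is (it insists on digits only after the asterisk).

```c
#include <limits.h>
#include <stdlib.h>

/* Returns the index into the model array for a value such as "*1",
 * or -1 if the value does not have that form (e.g. a prop model name). */
int brush_model_index(const char *value)
{
    char *end;
    long n;

    if (value == NULL || value[0] != '*')
        return -1;
    if (value[1] < '0' || value[1] > '9')    /* require a digit right after '*' */
        return -1;
    n = strtol(value + 1, &end, 10);
    if (*end != '\0' || n > INT_MAX)          /* trailing junk or too large */
        return -1;
    return (int)n;
}
```

With the func_brush entity shown above, the value "*1" would yield 1, i.e. the second dmodel_t in the model lump.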

The entity lump can be a maximum of 256 kbytes long (MAX_MAP_ENTSTRING) and contain up to 4096 entities (MAX_MAP_ENTITIES). Each key string can be a maximum of 32 characters (MAX_KEY) and the value strings up to 1024 characters (MAX_VALUE).

Game

The Game lump (35) seems to be intended to be used for map data that is specific to a particular game using the Source engine, so that the file format can be extended without altering the previously defined format. It starts with a game lump header:

struct dgamelumpheader_t
{
	int lumpCount;	// number of game lumps
	dgamelump_t gamelump[lumpCount];
};

where the gamelump directory array is defined by:

struct dgamelump_t
{
	int	id;	// gamelump ID
	unsigned short flags;	// flags
	unsigned short version;	// gamelump version
	int	fileofs;	// offset to this gamelump
	int	filelen;	// length
};

The gamelump is identified by the 4-byte id member, which defines what data is stored in it; the byte position of the data (from the start of the file) and its length are given in fileofs and filelen.

Of interest is the gamelump which is used to store prop_static entities, which uses the gamelump ID of 'sprp' ASCII (1936749168 decimal). Unlike most other entities, prop_statics are not stored in the entity lump. The gamelump formats used in HL2 are defined in the public/gamebspfile.h header file.
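
The ID value can be reproduced by packing the four ASCII characters into an integer with the first character in the most significant byte. The sketch below does this and searches a gamelump directory for a given ID; the GAMELUMP_ID macro and find_gamelump function are illustrative names of my own, and the struct uses fixed-width types to mirror the layout given above.

```c
#include <stddef.h>
#include <stdint.h>

/* Pack four ASCII characters into a gamelump ID, first character highest. */
#define GAMELUMP_ID(a, b, c, d) \
    (((uint32_t)(a) << 24) | ((uint32_t)(b) << 16) | \
     ((uint32_t)(c) << 8)  |  (uint32_t)(d))

typedef struct {
    int32_t  id;       /* gamelump ID */
    uint16_t flags;
    uint16_t version;  /* gamelump version */
    int32_t  fileofs;  /* offset from the start of the bsp file */
    int32_t  filelen;
} dgamelump_t;

/* Returns the directory entry with the given ID, or NULL if it is absent. */
const dgamelump_t *find_gamelump(const dgamelump_t *dir, int32_t lump_count,
                                 uint32_t id)
{
    int32_t i;

    for (i = 0; i < lump_count; i++)
        if ((uint32_t)dir[i].id == id)
            return &dir[i];
    return NULL;
}
```

GAMELUMP_ID('s', 'p', 'r', 'p') evaluates to 0x73707270, which is the 1936749168 quoted above.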

The first element of the prop_static game lump is the dictionary; this is an integer count followed by the list of model (prop) names used in the map:

struct StaticPropDictLump_t
{
	int	dictEntries;
	char	name[dictEntries][128];	// model names
};

Each name entry is 128 characters long, null-padded to this length.

Following the dictionary is the leaf array:

struct StaticPropLeafLump_t
{
	int leafEntries;
	unsigned short	leaf[leafEntries];
};

Presumably, this array is used to index into the leaf lump to locate the leaves that each prop static is located in. Note that a prop static may span several leaves.

Next, an integer giving the number of StaticPropLump_t entries, followed by that many structures themselves:

struct StaticPropLump_t
{
	Vector	Origin;	// origin
	QAngle	Angles;	// orientation (pitch yaw roll)
	unsigned short	PropType;	// index into model name dictionary
	unsigned short	FirstLeaf;	// index into leaf array
	unsigned short	LeafCount;
	unsigned char	Solid;	// solidity type
	unsigned char	Flags;
	int	Skin;	// model skin numbers
	float	FadeMinDist;
	float	FadeMaxDist;
	Vector	LightingOrigin;	// for lighting
	float	ForcedFadeScale;	// only present in version 5 gamelump
};

The coordinates of the prop are given by the Origin member; its orientation (pitch, yaw, roll) is given by the Angles entry, which is a 3-float vector. The PropType element is an index into the dictionary of prop model names, given above. The other elements correspond to the location of the prop in the BSP structure of the map, its lighting, and other entity properties as set in Hammer. The last element (ForcedFadeScale) is only present in the prop_static structure if the gamelump is specified as version 5 (dgamelump_t.version above); both version 4 and version 5 static prop gamelumps are used in official HL2 maps.
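
Putting the three parts together, the position of the prop records inside the 'sprp' gamelump can be found by stepping over the dictionary and the leaf array. The following is a sketch only: the function names are hypothetical, integers are read as little-endian, and the record sizes of 56 and 60 bytes are simply the sums of the members listed above, assuming 4-byte floats and no padding.

```c
#include <stddef.h>
#include <stdint.h>

/* Size in bytes of one StaticPropLump_t record; 0 for versions not covered. */
size_t static_prop_size(int version)
{
    if (version == 4) return 56;
    if (version == 5) return 60;   /* adds ForcedFadeScale */
    return 0;
}

/* Read a non-negative little-endian 32-bit count at *pos and step past it. */
static int read_count(const uint8_t *lump, size_t len, size_t *pos, int32_t *count)
{
    const uint8_t *p;

    if (*pos > len || len - *pos < 4)
        return -1;
    p = lump + *pos;
    *count = (int32_t)((uint32_t)p[0] | (uint32_t)p[1] << 8 |
                       (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24);
    *pos += 4;
    return *count < 0 ? -1 : 0;
}

/* Step over `count` elements of `elem` bytes each, with a bounds check. */
static int skip(size_t len, size_t *pos, int32_t count, size_t elem)
{
    if ((size_t)count > (len - *pos) / elem)
        return -1;
    *pos += (size_t)count * elem;
    return 0;
}

/* Returns the byte offset (within the gamelump) of the first prop record and
 * stores the number of records in *prop_count; returns -1 on truncated data. */
long static_prop_offset(const uint8_t *lump, size_t len, int32_t *prop_count)
{
    size_t pos = 0;
    int32_t n;

    if (read_count(lump, len, &pos, &n) || skip(len, &pos, n, 128))
        return -1;                               /* dictionary: 128-byte names */
    if (read_count(lump, len, &pos, &n) || skip(len, &pos, n, 2))
        return -1;                               /* leaf array: unsigned shorts */
    if (read_count(lump, len, &pos, prop_count))
        return -1;
    return (long)pos;
}
```

Record i then starts at the returned offset plus i * static_prop_size(version), where version comes from the gamelump's directory entry.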

Other gamelumps used in HL2 BSP files are the detail prop gamelump (ID is 'dprp'), and the detail prop lighting lump (ID: 'dplt'). These are used for the prop_detail entities (grass tufts, etc.) automatically emitted by certain textures when placed on displacement surfaces. In version 20 BSP files there is also another gamelump (ID: 'dplh') which is probably related to HDR lighting of detail props.

There does not seem to be a specified limit on the size of the game lump.

Dispinfo, DispVerts and DispTris

Displacement surfaces are the most complex parts of a BSP file, and I will cover only part of their format here. Their data is split over a number of different data lumps in the file, but the fundamental reference to them is through the dispinfo lump (26). Dispinfos are referenced from the face, original face, and brushside arrays.

struct ddispinfo_t
{
	Vector	startPosition;	// start position used for orientation
	int	DispVertStart;	// Index into LUMP_DISP_VERTS.
	int	DispTriStart;	// Index into LUMP_DISP_TRIS.
	int	power;	// power - indicates size of surface (2^power + 1)
	int	minTess;	// minimum tesselation allowed
	float	smoothingAngle;	// lighting smoothing angle
	int	contents;	// surface contents
	unsigned short	MapFace;	// Which map face this displacement comes from.
	int	LightmapAlphaStart;	// Index into ddisplightmapalpha.
	int	LightmapSamplePositionStart;	// Index into LUMP_DISP_LIGHTMAP_SAMPLE_POSITIONS.
	CDispNeighbor	EdgeNeighbors[4];	// Indexed by NEIGHBOREDGE_ defines.
	CDispCornerNeighbors	CornerNeighbors[4];	// Indexed by CORNER_ defines.
	unsigned long	AllowedVerts[ALLOWEDVERTS_SIZE];	// active vertices
};

The structure is 176 bytes long. The startPosition element is the coordinates of the first corner of the displacement. DispVertStart and DispTriStart are indices into the DispVerts and DispTris lumps. The power entry gives the number of subdivisions in the displacement surface - allowed values are 2, 3 and 4, and these correspond to 4, 8 and 16 subdivisions on each side of the displacement surface. The structure also references any neighbouring displacements on the sides or the corners of this displacement through the EdgeNeighbors and CornerNeighbors members. There are complex rules governing the order that these neighbour displacements are given; see the comments in bspfile.h for more. The MapFace value is an index into the face array and is the face that was turned into a displacement surface. This face is used to set the texture and overall physical location and boundaries of the displacement.

The DispVerts lump (33) contains the vertex data of the displacements. It is given by:

struct dDispVert
{
	Vector	vec;	// Vector field defining displacement volume.
	float	dist;	// Displacement distances.
	float	alpha;	// "per vertex" alpha values.
};

where vec is the normalized vector of the offset of each displacement vertex from its original (flat) position; dist is the distance the offset has taken place; and alpha is the alpha-blending of the texture at that vertex.

A displacement of power p references (2^p + 1)^2 dispverts in the array, starting from the DispVertStart index.

The DispTris lump (48) contains "triangle tags" or flags related to the properties of a particular triangle in the displacement mesh:

struct dDispTri
{
	unsigned short Tags;	// Displacement triangle tags.
};

where the flags are:

DISPTRI_TAG_SURFACE	1
DISPTRI_TAG_WALKABLE	2
DISPTRI_TAG_BUILDABLE	4
DISPTRI_FLAG_SURFPROP1	8
DISPTRI_FLAG_SURFPROP2	16

There are 2x(2^p)^2 DispTri entries for a displacement of power p. They are presumably used to indicate properties for each triangle of the displacement such as whether the surface is walkable at that point (not too steep to climb).
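
The two counts follow directly from the power: a displacement has 2^p subdivisions per side, hence 2^p + 1 vertices per side and two triangles per grid cell. As a small sketch (function names are my own):

```c
/* Number of dDispVert entries referenced by a displacement of this power:
 * (2^p + 1) vertices along each side. */
int disp_vert_count(int power)
{
    int side = (1 << power) + 1;
    return side * side;
}

/* Number of dDispTri entries: two triangles per cell of a 2^p x 2^p grid. */
int disp_tri_count(int power)
{
    int cells = 1 << power;
    return 2 * cells * cells;
}
```

For the allowed powers 2, 3 and 4 this gives 25, 81 and 289 vertices, and 32, 128 and 512 triangles respectively.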

There is a limit of 2048 Dispinfos per map, and the limits of DispVerts and DispTris are such that all 2048 displacements could be of power 4 (maximally subdivided).

Other displacement-related data are the DispLightmapAlphas (32) and DispLightmapSamplePos (34) lumps, which seem to relate to lighting of each displacement surface.

Pakfile

The Pakfile lump (40) is a special lump that can contain multiple files which are embedded into the bsp file. Usually, it contains special texture (.vtf) and material (.vmt) files which are used to store the reflection maps from env_cubemap entities in the map; these files are built and placed in the Pakfile lump when the "buildcubemaps" console command is executed. The Pakfile can optionally contain such things as custom textures and prop models used in the map, which are placed into the bsp file by using the BSPZIP program (or alternative programs such as Pakrat). These files are integrated into the game engine's filesystem and will be loaded in preference to externally located files.

The format of the Pakfile lump is identical to that used by the Zip compression utility when no compression is specified (i.e., the individual files are stored in uncompressed format). If the Pakfile lump is extracted and written to a file, it can therefore be opened with WinZip and similar programs.

The header public/zip_uncompressed.h defines the structures present in the Pakfile lump. The last element in the lump is a ZIP_EndOfCentralDirRecord structure. This points to an array of ZIP_FileHeader structures immediately preceding it, one for each file present in the Pak. Each of these headers then points to a ZIP_LocalFileHeader structure that is followed by that file's data.
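
Since the lump is an ordinary Zip archive, the end-of-central-directory record can be located in the usual Zip way: scan backwards from the end of the lump for the signature bytes 50 4B 05 06 ("PK" followed by 5 and 6). The sketch below relies on the standard Zip record layout rather than on the SDK header, and the function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns the offset of the end-of-central-directory record within the
 * Pakfile lump, or -1 if no signature is found. The fixed part of the
 * record is 22 bytes, so the search starts 22 bytes from the end. */
long find_end_of_central_dir(const uint8_t *pak, size_t len)
{
    size_t i;

    if (len < 22)
        return -1;
    for (i = len - 22; ; i--) {
        if (pak[i] == 0x50 && pak[i + 1] == 0x4B &&
            pak[i + 2] == 0x05 && pak[i + 3] == 0x06)
            return (long)i;
        if (i == 0)
            break;
    }
    return -1;
}

/* Total number of files in the archive: a little-endian 16-bit field at
 * byte 10 of the record in the standard Zip layout. */
unsigned zip_entry_count(const uint8_t *eocd)
{
    return (unsigned)eocd[10] | ((unsigned)eocd[11] << 8);
}
```

In the same layout, the 32-bit field at byte 16 of the record gives the offset of the first central directory header, measured from the start of the archive (here, the start of the lump).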

The Pakfile lump is usually the last element of the bsp file.

Cubemap

The Cubemap lump (42) contains the location of all env_cubemap entities in the map:

struct dcubemapsample_t
{
	int	origin[3];	// position of light snapped to the nearest integer
	unsigned char	size;	// resolution of cubemap, 0 - default
};

The origin member contains the integer x,y,z coordinates of the cubemap, and the size member is the resolution of the cubemap, specified as 2^(size-1) pixels square. If set as 0, the default size of 6 (32x32 pixels) is used. There can be a maximum of 1024 (MAX_MAP_CUBEMAPSAMPLES) cubemaps in a file.

When the "buildcubemaps" console command is performed, six snapshots of the map (one for each direction) are taken at the location of each env_cubemap entity. These snapshots are stored in a multi-frame texture (vtf) file, which is added to the Pakfile lump (see above). The textures are named cX_Y_Z.vtf, where (X,Y,Z) are the (integer) coordinates of the corresponding cubemap.
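
The size encoding and the texture naming scheme can be illustrated as follows. This is a sketch under assumptions: the function names are mine, and negative coordinates are assumed to be written with a plain minus sign, which I have not verified against the engine.

```c
#include <stdio.h>

/* Edge length in pixels of a cubemap face: 2^(size-1), where a size of 0
 * selects the default of 6 (32x32). Intended for small size values only. */
int cubemap_resolution(unsigned char size)
{
    if (size == 0)
        size = 6;
    return 1 << (size - 1);
}

/* Writes the cX_Y_Z.vtf name for a cubemap at the given integer origin.
 * Returns the name length, as snprintf does. */
int cubemap_texture_name(char *buf, size_t buf_len, const int origin[3])
{
    return snprintf(buf, buf_len, "c%d_%d_%d.vtf",
                    origin[0], origin[1], origin[2]);
}
```

A cubemap at (128, -64, 32) with size 0 would thus be 32 pixels square and, under the assumption above, stored as c128_-64_32.vtf.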

Faces containing materials that are environment mapped (e.g. shiny textures) reference their assigned cubemap through their material name. A face with a material named (e.g.) walls/shiny.vmt is altered (new Texinfo & Texdata entries are created) to refer to a renamed material maps/mapname/walls/shiny_X_Y_Z.vmt, where (X,Y,Z) are the cubemap coordinates as before. This .vmt file is also stored in the Pakfile, and references the cubemap .vtf file through its $envmap property.

Version 20 files contain extra cX_Y_Z.hdr.vtf files in the Pakfile lump, containing HDR texture files in RGBA16161616F (16-bit per channel) format.

Overlay

Unlike the simpler decals (infodecal entities), info_overlays are removed from the entity lump and stored separately in the Overlay lump (45). The structure reflects the properties of the entity in Hammer almost exactly:

struct doverlay_t
{
	int	Id;
	short	TexInfo;
	unsigned short	FaceCountAndRenderOrder;
	int	Ofaces[OVERLAY_BSP_FACE_COUNT];
	float	U[2];
	float	V[2];
	Vector	UVPoints[4];
	Vector	Origin;
	Vector	BasisNormal;
};

The FaceCountAndRenderOrder member is split into two parts; the lower 14 bits are the number of faces that the overlay appears on, with the top 2 bits being the render order of the overlay (for overlapping decals). The Ofaces array, which is 64 elements in size (OVERLAY_BSP_FACE_COUNT), holds the indices into the face array indicating which map faces the overlay should be displayed on. The other elements set the texture, scale, and orientation of the overlay decal. There can be a maximum of 512 overlays per file (MAX_MAP_OVERLAYS).
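
Splitting the packed member is a matter of masking and shifting. A minimal sketch, with function names of my own choosing:

```c
#include <stdint.h>

/* Lower 14 bits: number of valid entries in Ofaces[]. */
unsigned overlay_face_count(uint16_t face_count_and_render_order)
{
    return face_count_and_render_order & 0x3FFF;
}

/* Top 2 bits: render order (0 to 3). */
unsigned overlay_render_order(uint16_t face_count_and_render_order)
{
    return face_count_and_render_order >> 14;
}
```

For instance, a stored value of 0x8005 describes an overlay on 5 faces with render order 2.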

Lighting

The lighting lump (8) is used to store the static lightmap samples of map faces. Each lightmap sample is a colour tint that multiplies the colours of the underlying texture pixels, to produce lighting of varying intensity. These lightmaps are created during the VRAD phase of map compilation and are referenced from the dface_t structure. The current lighting lump version is 1.

Each dface_t may have up to four lightstyles defined in its styles[] array (which contains 255 to represent no lightstyle). The number of luxels in each direction of the face is given by the two LightmapTextureSizeInLuxels[] members (plus 1), and the total number of luxels per face is thus (LightmapTextureSizeInLuxels[0] + 1) * (LightmapTextureSizeInLuxels[1] + 1).

Each face gives a byte offset into the lighting lump in its lightofs member (if no lighting information is used for this face, e.g. faces with skybox, nodraw and invisible textures, lightofs is -1). There are (number of lightstyles)*(number of luxels) lightmap samples for each face, where each sample is a 4-byte ColorRGBExp32 structure:

struct ColorRGBExp32
{
	byte r, g, b;
	signed char exponent;
};

Standard RGB format can be obtained from this by multiplying each colour component by 2^(exponent). For faces with bumpmapped textures, there are four times the usual number of lightmap samples, presumably containing samples used to compute the bumpmapping.
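
The conversion can be sketched as below. The struct mirrors the one above using fixed-width types; the function names are illustrative, and the result is left as unclamped floating-point values since I do not know what range or gamma the engine expects afterwards.

```c
#include <stdint.h>

typedef struct {
    uint8_t r, g, b;
    int8_t  exponent;
} ColorRGBExp32;

/* 2^e for small positive or negative e, without needing the maths library. */
static float pow2f_int(int e)
{
    float s = 1.0f;

    while (e > 0) { s *= 2.0f; e--; }
    while (e < 0) { s *= 0.5f; e++; }
    return s;
}

/* Expand one lightmap sample: each component multiplied by 2^exponent. */
void color_rgbexp32_to_float(const ColorRGBExp32 *c, float out[3])
{
    float scale = pow2f_int(c->exponent);

    out[0] = (float)c->r * scale;
    out[1] = (float)c->g * scale;
    out[2] = (float)c->b * scale;
}
```

A sample of (128, 64, 0) with exponent -7 therefore expands to (1.0, 0.5, 0.0).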

Immediately preceding the lightofs-referenced sample group, there are single samples containing the average lighting on the face, one for each lightstyle, in reverse order from that given in the styles[] array.
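
The bookkeeping for one face's lighting data might be sketched as follows. The function names are hypothetical, and I assume that the lightstyles in use occupy the leading entries of styles[], so that counting stops at the first 255.

```c
/* Luxels on a face: one more than the stored size in each direction. */
int face_luxel_count(const int size_in_luxels[2])
{
    return (size_in_luxels[0] + 1) * (size_in_luxels[1] + 1);
}

/* Lightstyles in use, assuming they are packed at the front of styles[]. */
int face_style_count(const unsigned char styles[4])
{
    int n = 0;

    while (n < 4 && styles[n] != 255)
        n++;
    return n;
}

/* Bytes of lightmap samples starting at lightofs: 4 bytes per sample,
 * with four times as many samples for bumpmapped faces. */
long face_sample_bytes(int luxels, int style_count, int bumpmapped)
{
    return 4L * luxels * style_count * (bumpmapped ? 4 : 1);
}

/* Byte offset of the first average-lighting sample, which sits
 * immediately before the sample group, one 4-byte sample per style. */
long face_average_offset(long lightofs, int style_count)
{
    return lightofs - 4L * style_count;
}
```

For example, a face with a stored size of 3 by 7 luxels and a single lightstyle has 32 luxels and 128 bytes of samples, with its one average sample 4 bytes before lightofs.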

Version 20 BSP files contain a second, identically sized lighting lump in lump 53. This is presumed to store more accurate (higher-precision) HDR data for each lightmap sample. The format is currently unknown, but is also 32 bits per sample.

The maximum size of the lighting lump is 0x1000000 bytes, i.e. 16 MB (MAX_MAP_LIGHTING).

Other

There are nineteen other lumps defined in the HL2 BSP file format that have not yet been covered. These lumps were not needed for the creation of a decompiler, and so I have not researched them or their formats. There are also four lumps only present in version 20 BSP files. I will give general information and likely guesses to the content of these lumps.

The Occlusion lump (9) contains data on func_occluder entities which are switchable entities that block the drawing of visible entities behind them.

The Worldlights lump (15) contains information on each static light entity in the world, and seems to be used to provide semi-dynamic lighting for moving entities.

The Areas lump (20) references the Areaportals lump (21) and is used with func_areaportal and func_areaportalwindow entities to define sections of the map that can be switched to render or not render.

The Portals (22), Clusters (23), PortalVerts (24), ClusterPortals (25), and ClipPortalVerts (41) lumps are used by the VVIS phase of the compile to ascertain which clusters can see which other clusters. A cluster is a player-enterable leaf volume in the map (see above). A "portal" is a polygon boundary between a cluster or leaf and an adjacent cluster or leaf. Most of this information is also used by the VRAD program to calculate static lighting, and then is removed from the bsp file.

Lumps 29 (PhysCollide) and 49 (PhysCollideSurface) seem to be related to the physical simulation of entity collisions in the game engine.

The VertNormal (30) and VertNormalIndices (31) lumps may be related to smoothing of lightmaps on faces.

The FaceMacroTextureInfo lump (47) is a short array containing the same number of members as the number of faces in the map. If the entry for a face contains anything other than -1 (0xFFFF), it is an index of a texture name in the TexDataStringTable. In VRAD, the corresponding texture is mapped onto the world extents, and used to modulate the lightmaps of that face. There is also a base macro texture (located at materials/macro/mapname/base.vtf) that is applied to all faces if found. Only maps in VTMB seem to make any use of macro textures.

LeafWaterData (36) and LeafMinDistToWater (46) lumps may be used to determine player position with respect to water volumes.

The Primitives (37), PrimVerts (38) and PrimIndices (39) lumps are used in reference to "non-polygonal primitives". They are also sometimes called "waterstrips", "waterverts" and "waterindices" in the SDK source, since they were originally only used to subdivide water meshes. They are now used to prevent the appearance of cracks between adjacent faces, if the face edges contain a "T-junction" (a vertex collinearly between two other vertices). The PrimIndices lump defines a set of triangles between face vertices, that tessellate the face. They are referenced from the Primitives lump, which is in turn referenced by the face lump data. Current maps do not seem to use the PrimVerts lump at all.

Version 20 files containing HDR lighting information have four extra lumps, the contents of which are currently uncertain. Lump 53 is always the same size as the standard lighting lump (8) and probably contains higher-precision data for each lightmap sample. Lump 54 is the same size as the worldlight lump (15) and presumably contains HDR-related data for each light entity. Lumps 55 and 56 both seem to be 24-byte records (possibly CompressedLightCube structures) with the same count as the number of leaves in the map. They are probably thus HDR-related per-leaf lighting information.