VPK (file format): Difference between revisions

From Valve Developer Community
mNo edit summary
(VPK2)
Line 1: Line 1:
The [[VPK]] file format is a package format used by newer Source engine games like [[Left 4 Dead]] and [[Portal 2]] to store game related content.
The [[VPK]] file format is a package format used by post-GCF Source engine games to store content.


== Conception ==
== Conception ==
Line 5: Line 5:
Prior to [[Left 4 Dead]], typical Source engine games stored their content in [[GCF]] files.  Executable files, modifiable files (e.g. configuration files) and custom content were copied and stored locally on the user's hard drive.  Possibly brought on by [http://nemesis.thewavelength.net/index.php?c=216 poor performance], the [[NCF]] file format was introduced and all game content was copied entirely to the hard drive.  This, however, introduced a new problem.  Source engine materials and models are stored in thousands of small files and it would be expensive to continuously open and close these files.  The solution was the conception of the VPK file format which is used to store Left 4 Dead materials, models and particles in a handful of files which can be quickly accessed.
Prior to [[Left 4 Dead]], typical Source engine games stored their content in [[GCF]] files.  Executable files, modifiable files (e.g. configuration files) and custom content were copied and stored locally on the user's hard drive.  Possibly brought on by [http://nemesis.thewavelength.net/index.php?c=216 poor performance], the [[NCF]] file format was introduced and all game content was copied entirely to the hard drive.  This, however, introduced a new problem.  Source engine materials and models are stored in thousands of small files and it would be expensive to continuously open and close these files.  The solution was the conception of the VPK file format which is used to store Left 4 Dead materials, models and particles in a handful of files which can be quickly accessed.


== Design Decisions ==
== Features ==


=== Preload Data ===
=== Preload Data ===
Line 14: Line 14:


Previous Source engine games that had been distributed by the more advanced GCF file format had the luxury of internally fragmenting new and updated files.  This meant that new and updated files could be efficiently downloaded and saved with minimal bandwidth and disk IO.  Because the new VPK files are independent of distribution (Left 4 Dead is distributed by NCF files and Steam knows nothing of the VPK file format), their content is split up over multiple archives that seem to be limited to about 32 MB in size.  Because of this, when a file in a specific archive is updated, only the VPK archive that contains the file needs to be updated.  Additionally, new files can be downloaded to their own individual archives.  This is why most of the newer archives are small in size; their contents are limited to the files added in a single update.
Previous Source engine games that had been distributed by the more advanced GCF file format had the luxury of internally fragmenting new and updated files.  This meant that new and updated files could be efficiently downloaded and saved with minimal bandwidth and disk IO.  Because the new VPK files are independent of distribution (Left 4 Dead is distributed by NCF files and Steam knows nothing of the VPK file format), their content is split up over multiple archives that seem to be limited to about 32 MB in size.  Because of this, when a file in a specific archive is updated, only the VPK archive that contains the file needs to be updated.  Additionally, new files can be downloaded to their own individual archives.  This is why most of the newer archives are small in size; their contents are limited to the files added in a single update.
== Versions ==
; 1
: Left 4 Dead
: Left 4 Dead 2
: Alien Swarm
: Portal 2
: Source Filmmaker
: Dota 2
; 2
: Counter-Strike: Global Offensive


== File Format ==
== File Format ==
Line 19: Line 31:
A VPK package is actually spread out over multiple files sharing the same extension.  The directory is stored in a specific file called <name>_dir.vpk and the content is spread over several additional archive files called <name>_*.vpk (where * is the zero-based archive index).  Consequently, there are two file formats:
A VPK package is actually spread out over multiple files sharing the same extension.  The directory is stored in a specific file called <name>_dir.vpk and the content is spread over several additional archive files called <name>_*.vpk (where * is the zero-based archive index).  Consequently, there are two file formats:


=== Directory File Format ===
=== Directory ===


==== Header ====
==== Header ====


===== VPK 1 =====
Originally, the VPK file had no header or identifier.  This changed when the [http://store.steampowered.com/news/2620/ June 25, 2009 Left 4 Dead update] was released adding support for third party campaigns.  VPK directory files created after this date have the following header:
Originally, the VPK file had no header or identifier.  This changed when the [http://store.steampowered.com/news/2620/ June 25, 2009 Left 4 Dead update] was released adding support for third party campaigns.  VPK directory files created after this date have the following header:


struct VPKHeader
<source lang=cpp>struct VPKHeader_v1
{
{
unsigned int Signature;
const unsigned int Signature = 0x55aa1234;
unsigned int Version;
const unsigned int Version = 1;
unsigned int DirectoryLength;
unsigned int TreeLength; // The length of the directory
};
};
</source>


{| class=standard-table
If the file data is stored in the same file as the directory, its offset is <code>(sizeof(VPKHeader_v1) + TreeLength)</code>.
| '''Variable''' || '''Description'''
|-
| Signature || Identifying signature.  Always 0x55aa1234.
|-
| Version || Version number.  Always 1.
|-
| DirectoryLength || The length of the directory.  If the file data is not stored in archives, its offset is the offset of the start of the directory plus this number.
|}


One can check if a VPK file was created before the update by checking if the first four bytes do not match the above signature.
One can check if a VPK file was created before the update by checking if the first four bytes do not match the above signature.


==== Directory ====
===== VPK 2 =====
 
<source lang=cpp>struct VPKHeader_v2
{
const unsigned int Signature = 0x55aa1234;
const unsigned int Version = 2;
unsigned int TreeLength; // The length of the directory tree
 
int Unknown1; // 0 in CSGO
unsigned int FooterLength;
int Unknown3; // 48 in CSGO
int Unknown4; // 0 in CSGO
};
</source>
 
If the file data is stored in the same file as the directory, its offset is <code>(sizeof(VPKHeader_v2) + TreeLength)</code>.
 
==== Tree ====


The format of the directory is a little unorthodox.  It consists of a tree three levels deep that seems to be structured for file size.  The first level of the tree consists of file extensions (e.g. ''vmt'', ''vtf'' and ''mdl''), the second level consists of directory paths (e.g. ''materials/brick'', ''materials/decals/asphalt'' and ''models/infected''), and the third level consists of file names, file information and preload data.  Each tree node begins with a null terminated ASCII string and empty strings are used to signify the end of a parent node.  Pseudo-code to read the directory might look something like:
The format of the directory tree is a little unorthodox.  It consists of a tree three levels deep that seems to be structured for file size.  The first level of the tree consists of file extensions (e.g. ''vmt'', ''vtf'' and ''mdl''), the second level consists of directory paths (e.g. ''materials/brick'', ''materials/decals/asphalt'' and ''models/infected''), and the third level consists of file names, file information and preload data.  Each tree node begins with a null terminated ASCII string and empty strings are used to signify the end of a parent node.  Pseudo-code to read the directory might look something like:


  ReadString(file)
  ReadString(file)
Line 73: Line 96:
Immediately following the null terminator for the file name is the following structure:
Immediately following the null terminator for the file name is the following structure:


struct VPKDirectoryEntry
<source lang=cpp>
{
struct VPKDirectoryEntry
unsigned int CRC;
{
unsigned short PreloadBytes;
unsigned int CRC; // A 32bit CRC of the file's data.
unsigned short ArchiveIndex;
unsigned short PreloadBytes; // The number of bytes contained in the index file.
unsigned int EntryOffset;
 
unsigned int EntryLength;
// A zero based index of the archive this file's data is contained in.
unsigned short Terminator;
// If 0x7fff, the data follows the directory.
};
unsigned short ArchiveIndex;
 
// If ArchiveIndex is 0x7fff, the offset of the file data relative to the end of the directory (see the header for more details).
// Otherwise, the offset of the data from the start of the specified archive.
unsigned int EntryOffset;
 
// If zero, the entire file is stored in the preload data.
// Otherwise, the number of bytes stored starting at EntryOffset.
unsigned int EntryLength;


{| class=standard-table
const unsigned short Terminator = 0xffff;
| '''Variable''' || '''Description'''
};
|-
</source>
| CRC || A 32bit CRC of the file's data.
|-
| PreloadBytes || The number of preload bytes.
|-
| ArchiveIndex || The zero based index of the archive this file's data is contained in.  If 0x7fff, the data follows the directory.
|-
| EntryOffset || If ArchiveIndex is 0x7fff, the offset of the file data relative to the end of the directory (see the header for more details).  Otherwise, the offset of the data from the start of the specified archive.
|-
| EntryLength || If zero, the entire file is stored in the preload data, otherwise, the number of bytes stored starting at EntryOffset.
|-
| Terminator || Always 0xffff.
|}


If a file contains preload data, the preload data immediately follows the above structure.  The entire size of a file is PreloadBytes + EntryLength.
If a file contains preload data, the preload data immediately follows the above structure.  The entire size of a file is PreloadBytes + EntryLength.


=== Archive File Format ===
==== Footer ====
 
VPK2 adds a footer section (separate from embedded file data). Its purpose is unknown.
 
=== Archive ===


VPK Archives store raw file data.  They have no identifying header and know nothing of their contents.  Though not necessary, the raw file data is typically tightly packed.
VPK Archives store raw file data.  They have no identifying header and know nothing of their contents.  Though not necessary, the raw file data is typically tightly packed.
Line 115: Line 138:
If there is 1 null, the extension is skipped (It's the same extension as the last read entry), and then the path and name are read as usual.
If there is 1 null, the extension is skipped (It's the same extension as the last read entry), and then the path and name are read as usual.
If there are no nulls, the extension and path are the same as the last entry (Skipped), and only the name is read.
If there are no nulls, the extension and path are the same as the last entry (Skipped), and only the name is read.
This system has only been observed in VPK1.


==See also==
==See also==

Revision as of 04:53, 15 September 2012

The VPK file format is a package format used by post-GCF Source engine games to store content.

Conception

Prior to Left 4 Dead, typical Source engine games stored their content in GCF files. Executable files, modifiable files (e.g. configuration files) and custom content were copied and stored locally on the user's hard drive. Possibly brought on by poor performance, the NCF file format was introduced and all game content was copied entirely to the hard drive. This, however, introduced a new problem. Source engine materials and models are stored in thousands of small files and it would be expensive to continuously open and close these files. The solution was the conception of the VPK file format which is used to store Left 4 Dead materials, models and particles in a handful of files which can be quickly accessed.

Features

Preload Data

In order to efficiently access small or critical files, the beginning of each file can optionally be stored in the VPK directory. In practice, this seems to be limited to the first 1000 bytes of Source engine materials (VMT files) which are typically only a few hundred bytes in size.

Multiple Archives

Previous Source engine games that had been distributed by the more advanced GCF file format had the luxury of internally fragmenting new and updated files. This meant that new and updated files could be efficiently downloaded and saved with minimal bandwidth and disk IO. Because the new VPK files are independent of distribution (Left 4 Dead is distributed by NCF files and Steam knows nothing of the VPK file format), their content is split up over multiple archives that seem to be limited to about 32 MB in size. Because of this, when a file in a specific archive is updated, only the VPK archive that contains the file needs to be updated. Additionally, new files can be downloaded to their own individual archives. This is why most of the newer archives are small in size; their contents are limited to the files added in a single update.

Versions

1
Left 4 Dead
Left 4 Dead 2
Alien Swarm
Portal 2
Source Filmmaker
Dota 2
2
Counter-Strike: Global Offensive

File Format

A VPK package is actually spread out over multiple files sharing the same extension. The directory is stored in a specific file called <name>_dir.vpk and the content is spread over several additional archive files called <name>_*.vpk (where * is the zero-based archive index). Consequently, there are two file formats:

Directory

Header

VPK 1

Originally, the VPK file had no header or identifier. This changed when the June 25, 2009 Left 4 Dead update was released adding support for third party campaigns. VPK directory files created after this date have the following header:

struct VPKHeader_v1
{
	const unsigned int Signature = 0x55aa1234;
	const unsigned int Version = 1;
	unsigned int TreeLength; // The length of the directory
};

If the file data is stored in the same file as the directory, its offset is (sizeof(VPKHeader_v1) + TreeLength).

One can check if a VPK file was created before the update by checking if the first four bytes do not match the above signature.

VPK 2
struct VPKHeader_v2
{
	const unsigned int Signature = 0x55aa1234;
	const unsigned int Version = 2;
	unsigned int TreeLength; // The length of the directory tree

	int Unknown1; // 0 in CSGO
	unsigned int FooterLength;
	int Unknown3; // 48 in CSGO
	int Unknown4; // 0 in CSGO
};

If the file data is stored in the same file as the directory, its offset is (sizeof(VPKHeader_v2) + TreeLength).
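As a rough illustration of the two headers above, the following sketch reads the header from the first bytes of a directory file and derives the offset at which embedded file data would start. It is only a sketch under stated assumptions: the fields are taken to be little-endian, both versions are assumed to share the signature 0x55aa1234, and the names VpkHeaderInfo, ReadU32LE and ParseVpkHeader are hypothetical helpers, not part of any Valve API.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical summary of a parsed header (illustrative names only).
struct VpkHeaderInfo {
    uint32_t version;     // 0 = no header (file predates the June 2009 update)
    uint32_t treeLength;  // TreeLength from the header, 0 if there is no header
    uint32_t headerSize;  // 0, 12 (VPK 1) or 28 (VPK 2)

    // Offset of file data stored in the directory file itself.
    uint32_t DataOffset() const { return headerSize + treeLength; }
};

// Reads a 32-bit little-endian value; the caller must have checked bounds.
inline uint32_t ReadU32LE(const std::vector<uint8_t>& buf, std::size_t pos) {
    return static_cast<uint32_t>(buf[pos]) |
           (static_cast<uint32_t>(buf[pos + 1]) << 8) |
           (static_cast<uint32_t>(buf[pos + 2]) << 16) |
           (static_cast<uint32_t>(buf[pos + 3]) << 24);
}

// Returns false if the buffer is truncated or the version is not 1 or 2.
inline bool ParseVpkHeader(const std::vector<uint8_t>& buf, VpkHeaderInfo& out) {
    const uint32_t kSignature = 0x55aa1234;  // assumed identical for v1 and v2
    if (buf.size() < 4 || ReadU32LE(buf, 0) != kSignature) {
        // No signature: treat as a headerless file whose tree starts at byte 0.
        out = VpkHeaderInfo{0, 0, 0};
        return true;
    }
    if (buf.size() < 12) return false;
    const uint32_t version = ReadU32LE(buf, 4);
    const uint32_t treeLength = ReadU32LE(buf, 8);
    if (version == 1) {
        out = VpkHeaderInfo{1, treeLength, 12};
        return true;
    }
    if (version == 2) {
        if (buf.size() < 28) return false;  // four more 32-bit fields follow
        out = VpkHeaderInfo{2, treeLength, 28};
        return true;
    }
    return false;
}
```

The header sizes 12 and 28 are simply sizeof(VPKHeader_v1) and sizeof(VPKHeader_v2) for 32-bit fields without padding; the additional VPK 2 fields are skipped here because most of them are undocumented above.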

Tree

The format of the directory tree is a little unorthodox. It consists of a tree three levels deep that seems to be structured to keep the file small, since each extension and each directory path is stored only once. The first level of the tree consists of file extensions (e.g. vmt, vtf and mdl), the second level consists of directory paths (e.g. materials/brick, materials/decals/asphalt and models/infected), and the third level consists of file names, file information and preload data. Each tree node begins with a null-terminated ASCII string, and an empty string signifies the end of a parent node. Pseudo-code to read the directory might look something like:

ReadString(file)
	string = ""
	while true
		char = ReadChar(file)
		if char = null
			return string
		string = string + char
ReadDirectory(file)
	while true
		extension = ReadString(file)
		if extension = ""
			break
		while true
			path = ReadString(file)
			if path = ""
				break
			while true
				name = ReadString(file)
				if name = ""
					break
				ReadFileInformationAndPreloadData(file)

Immediately following the null terminator for the file name is the following structure:

struct VPKDirectoryEntry
{
	unsigned int CRC; // A 32bit CRC of the file's data.
	unsigned short PreloadBytes; // The number of bytes contained in the index file.

	// A zero based index of the archive this file's data is contained in.
	// If 0x7fff, the data follows the directory.
	unsigned short ArchiveIndex;

	// If ArchiveIndex is 0x7fff, the offset of the file data relative to the end of the directory (see the header for more details).
	// Otherwise, the offset of the data from the start of the specified archive.
	unsigned int EntryOffset;

	// If zero, the entire file is stored in the preload data.
	// Otherwise, the number of bytes stored starting at EntryOffset.
	unsigned int EntryLength;

	const unsigned short Terminator = 0xffff;
};

If a file contains preload data, the preload data immediately follows the above structure. The entire size of a file is PreloadBytes + EntryLength.
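The tree walk and the entry structure can be combined into a small in-memory reader. The sketch below is a minimal illustration, not a reference implementation: it assumes little-endian fields, assumes the whole tree has already been loaded into a byte buffer, and joins the three levels as path/name.extension. The names VpkEntry and TreeReader are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// One parsed directory entry (illustrative type, not Valve's).
struct VpkEntry {
    std::string path;              // "directory/name.extension"
    uint32_t crc;
    uint16_t archiveIndex;         // 0x7fff: data follows the directory
    uint32_t entryOffset;
    uint32_t entryLength;
    std::vector<uint8_t> preload;  // PreloadBytes bytes stored in the tree

    // Entire size of the file: PreloadBytes + EntryLength.
    uint32_t TotalSize() const {
        return static_cast<uint32_t>(preload.size()) + entryLength;
    }
};

class TreeReader {
public:
    // pos is the offset of the first byte of the tree (the header size).
    TreeReader(const std::vector<uint8_t>& buf, std::size_t pos)
        : buf_(buf), pos_(pos) {}

    // Walks extensions, then paths, then file names; an empty string ends a level.
    std::vector<VpkEntry> ReadAll() {
        std::vector<VpkEntry> out;
        for (std::string ext = ReadString(); !ext.empty(); ext = ReadString())
            for (std::string dir = ReadString(); !dir.empty(); dir = ReadString())
                for (std::string name = ReadString(); !name.empty(); name = ReadString())
                    out.push_back(ReadEntry(dir + "/" + name + "." + ext));
        return out;
    }

private:
    void Need(std::size_t n) const {
        if (pos_ > buf_.size() || n > buf_.size() - pos_)
            throw std::runtime_error("truncated VPK tree");
    }
    std::string ReadString() {
        std::string s;
        for (;;) {
            Need(1);
            const char c = static_cast<char>(buf_[pos_++]);
            if (c == '\0') return s;
            s += c;
        }
    }
    uint16_t ReadU16() {
        Need(2);
        const uint16_t v = static_cast<uint16_t>(buf_[pos_] | (buf_[pos_ + 1] << 8));
        pos_ += 2;
        return v;
    }
    uint32_t ReadU32() {
        const uint32_t lo = ReadU16();
        const uint32_t hi = ReadU16();
        return lo | (hi << 16);
    }
    VpkEntry ReadEntry(const std::string& path) {
        VpkEntry e;
        e.path = path;
        e.crc = ReadU32();
        const uint16_t preloadBytes = ReadU16();
        e.archiveIndex = ReadU16();
        e.entryOffset = ReadU32();
        e.entryLength = ReadU32();
        if (ReadU16() != 0xffff) throw std::runtime_error("bad entry terminator");
        Need(preloadBytes);  // preload data immediately follows the structure
        e.preload.assign(buf_.begin() + pos_, buf_.begin() + pos_ + preloadBytes);
        pos_ += preloadBytes;
        return e;
    }

    const std::vector<uint8_t>& buf_;
    std::size_t pos_;
};
```

Throwing on a missing 0xffff terminator is a design choice of this sketch; it makes a misaligned read fail early instead of producing garbage entries.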

Footer

VPK2 adds a footer section (separate from embedded file data). Its purpose is unknown.

Archive

VPK Archives store raw file data. They have no identifying header and know nothing of their contents. Though not necessary, the raw file data is typically tightly packed.
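Because archives carry no structure of their own, locating a file's data comes down to the ArchiveIndex and EntryOffset fields of its directory entry. The sketch below shows one way to turn those into a file name and an absolute offset. It rests on assumptions: the archive index is formatted as three zero-padded digits (as in commonly seen names such as pak01_000.vpk, which the text above does not specify), and DataLocation and LocateEntryData are hypothetical names.

```cpp
#include <cstdint>
#include <cstdio>
#include <string>

// Where the non-preload bytes of an entry live (illustrative type).
struct DataLocation {
    std::string file;  // directory file or numbered archive
    uint64_t offset;   // absolute byte offset within that file
};

// dirPath:     path of the directory file, e.g. "pak01_dir.vpk"
// headerSize:  size of the directory header (0 if the file has none)
// treeLength:  TreeLength from the header
inline DataLocation LocateEntryData(const std::string& dirPath,
                                    uint32_t headerSize,
                                    uint32_t treeLength,
                                    uint16_t archiveIndex,
                                    uint32_t entryOffset) {
    if (archiveIndex == 0x7fff) {
        // Data follows the directory tree inside the directory file.
        return DataLocation{dirPath,
                            static_cast<uint64_t>(headerSize) + treeLength + entryOffset};
    }
    // Strip the "_dir.vpk" suffix to recover <name>.
    const std::string suffix = "_dir.vpk";
    std::string base = dirPath;
    if (base.size() >= suffix.size() &&
        base.compare(base.size() - suffix.size(), suffix.size(), suffix) == 0) {
        base.erase(base.size() - suffix.size());
    }
    // Assumption: three-digit, zero-padded archive index.
    char index[16];
    std::snprintf(index, sizeof index, "%03u", static_cast<unsigned>(archiveIndex));
    return DataLocation{base + "_" + index + ".vpk", entryOffset};
}
```

For example, an entry with ArchiveIndex 3 and EntryOffset 4096 in pak01_dir.vpk would resolve to offset 4096 of pak01_003.vpk under these assumptions.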

Todo: Rewrite and merge Notes section to the format section

Notes

Valve apparently added skipping to the specifications. I found out when trying to write my own VPK parser in C#. This should be merged with the format area later, and reworded.

Valve uses nulls to signal whether skipping is used. A normal entry is preceded by two nulls and is followed by the format above. However, an entry may be preceded by only one null, or by none, which means that some level of skipping is in use.

If there are two nulls, no skipping is used, and the extension, path and name are read as usual. If there is one null, the extension is skipped (it is the same as that of the last entry read), and the path and name are read as usual. If there are no nulls, both the extension and the path are the same as those of the last entry, and only the name is read. This matches the empty-string terminators described under Tree: one empty string closes the list of file names in a path, and a second closes the list of paths in an extension.

This system has only been observed in VPK1.

See also

External Links