String: Difference between revisions

From Valve Developer Community
m (revert)
No edit summary
'''Strings''' are extensions of the [[Char]]acter [[variable]]s. '''Strings''' allow the storage of text in memory. A typical length limit for a '''string''' is 255 [[char]]acters, but it can be larger if necessary. However, larger '''strings''' use more memory, so the 255-character limit helps to conserve memory for such common [[variable]]s.
{{toc-right}}


For localization purposes, modern software generally makes use of character encoding systems which may use more than one byte per character (see [[UTF-8]] and [[UTF-16]]), so it is best to use the appropriate string-handling routines provided by the system you are programming for. While a Chinese or Russian translation, say, may seem unnecessary at first, it is always good practice to program without imposing arbitrary limits at an early stage.
[[W:Array data structure|Array]]s of <code>[[char]]</code> are commonly used to store [[W:ASCII|ASCII]] text. These buffers have a special term: '''[[W:String (computer science)|strings]]''', or sometimes "C strings". They are important, but complicated.


A common method for storing variable length strings is [[String Zero]].
== Null terminator ==


{{stub}}
'''A string is always one <code>char</code> larger than it appears. The extra item is the null terminator (binary zero, typed as <code>\0</code>).''' This is required because a [[pointer]] to a string is generally passed around, rather than the array itself, and a pointer does not contain data on the string's length.
 
Without the terminator it would be impossible to determine where the string ended and where the next variable, or just unassigned memory, began. A function that reads or writes past the end of the buffer in this way causes a [[W:buffer overflow|buffer overflow]], and those are bad news!
 
{{tip|The null terminator does not need to be the last item in the array. The bytes after it will just be ignored by string-handling functions.}}
 
== Creating from string literal ==
 
<source lang=cpp>
const char* MyString = "Hello world"; // points at the literal, which must not be modified
</source>
 
This code creates a string from a [[W:string literal|string literal]]. The double quote marks are special syntax which generate an array of <code>char</code> from their contents. The code above therefore:
 
# Allocates 12 bytes of memory, somewhere arbitrary, to store the string. This is one byte for each of the 11 characters, plus an automatic twelfth for the null terminator.
# Creates a local <code>char</code> [[pointer]] containing the address of the first character (H in this case).
 
String literals are usually handled through <code>[[W:const|const]]</code> variables, because their contents must not be modified. A string literal stays in memory for the process's whole lifespan.
 
== Creating by size ==
 
<source lang=cpp>
char MyString[12]; // size must be a compile-time constant
 
int StringLen = 12;
char* pMyString = new char[StringLen]; // size can be a run-time value
 
delete[] pMyString; // always delete / delete[] anything created with 'new' after use
</source>
 
This code allocates two 12-byte strings, but does not write anything to them (so their contents will be either blank or gibberish). They need to be assigned to, ideally with a string function like <code>strcpy()</code> or <code>sprintf()</code>.
 
The difference between them is that one creates an array in the function's memory space, while the other creates a pointer in the function and uses <code>[[W:new (C++)|new]]</code> to allocate the string itself somewhere else. The advantage of <code>new</code> is that you can allocate an array of a size determined at run-time, but the downside is that if you aren't scrupulous about calling <code>[[W:delete (C++)|delete]]</code> (or <code>delete[]</code> for arrays) you will suffer a [[W:memory leak|memory leak]].
 
== String functions ==
 
There is a multitude of functions which process strings, the most common of which have Source-specific <code>V_*</code> equivalents. See [http://msdn.microsoft.com/en-us/library/f0151s4x.aspx MSDN] for a fairly comprehensive list, or search Visual Studio's Class View for "V_str".


[[category:Variables]]
[[Category:Glossary]]

Revision as of 06:13, 8 December 2010

Arrays of char are commonly used to store ASCII text. These buffers have a special term: strings, or sometimes "C strings". They are important, but complicated.

Null terminator

A string is always one char larger than it appears. The extra item is the null terminator (binary zero, typed as \0). This is required because a pointer to a string is generally passed around, rather than the array itself, and a pointer does not contain data on the string's length.

Without the terminator it would be impossible to determine where the string ended and where the next variable, or just unassigned memory, began. A function that reads or writes past the end of the buffer in this way causes a buffer overflow, and those are bad news!

Tip: The null terminator does not need to be the last item in the array. The bytes after it will just be ignored by string-handling functions.
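
The two points above (the hidden extra char, and the bytes after the terminator being ignored) can be sketched with the standard library alone. The function names BytesNeeded and VisibleLength are made up for this illustration and are not part of the Source SDK.

```cpp
#include <cstddef>
#include <cstring>

// Number of bytes a buffer needs in order to hold `text`, terminator included.
std::size_t BytesNeeded(const char* text)
{
    return std::strlen(text) + 1; // strlen() stops at '\0' and does not count it
}

// Shows that string functions ignore everything after the first '\0'.
std::size_t VisibleLength()
{
    char buffer[16] = "Hello world"; // 11 characters + '\0'; the unused bytes are zeroed
    buffer[5] = '\0';                // overwrite the space, ending the string after "Hello"
    return std::strlen(buffer);      // "world" is still in the array, but is never reached
}
```

Calling BytesNeeded("Hello world") gives 12, and VisibleLength() gives 5 even though the array still holds 16 bytes.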

Creating from string literal

const char* MyString = "Hello world"; // points at the literal, which must not be modified

This code creates a string from a string literal. The double quote marks are special syntax which generate an array of char from their contents. The code above therefore:

  1. Allocates 12 bytes of memory, somewhere arbitrary, to store the string. This is one byte for each of the 11 characters, plus an automatic twelfth for the null terminator.
  2. Creates a local char pointer containing the address of the first character (H in this case).

String literals are usually handled through const variables, because their contents must not be modified. A string literal stays in memory for the process's whole lifespan.
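
As a rough illustration of the above, the sketch below checks the 12-byte size of the literal and contrasts a pointer to the literal with an array initialised from it, which is a writable copy. GetGreeting, kGreetingBytes and FirstCharAfterEdit are hypothetical names, and constexpr assumes a C++11 or later compiler.

```cpp
#include <cstddef>
#include <cstring>

// Pointer to the literal itself: read-only, and valid for the whole run of the program,
// so it is safe to return from a function.
const char* GetGreeting()
{
    return "Hello world";
}

// sizeof on a literal counts the terminator: 11 characters + 1.
constexpr std::size_t kGreetingBytes = sizeof("Hello world");

// Initialising an array from a literal copies the characters into writable memory.
char FirstCharAfterEdit()
{
    char copy[] = "Hello world"; // array of 12 char, local to this function
    copy[0] = 'J';               // fine: this edits the copy, not the literal
    return copy[0];
}
```

Writing through a pointer to the literal itself (rather than to a copy) is undefined behaviour, which is why the pointer is declared const.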

Creating by size

char MyString[12]; // size must be a compile-time constant

int StringLen = 12;
char* pMyString = new char[StringLen]; // size can be a run-time value

delete[] pMyString; // always delete / delete[] anything created with 'new' after use

This code allocates two 12-byte strings, but does not write anything to them (so their contents will be either blank or gibberish). They need to be assigned to, ideally with a string function like strcpy() or sprintf().

The difference between them is that one creates an array in the function's memory space, while the other creates a pointer in the function and uses new to allocate the string itself somewhere else. The advantage of new is that you can allocate an array of a size determined at run-time, but the downside is that if you aren't scrupulous about calling delete (or delete[] for arrays) you will suffer a memory leak.
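
A minimal sketch of filling both kinds of buffer follows. It uses the standard snprintf() (C++11) instead of sprintf() so the write cannot run past the end of the array; that substitution, and the helper names FillStackBuffer and CopyToHeap, are choices made for this example only.

```cpp
#include <cstddef>
#include <cstdio>
#include <cstring>

// Fills a fixed-size array inside the function and returns the resulting length.
std::size_t FillStackBuffer()
{
    char MyString[12]; // contents are indeterminate until written
    // Writes at most sizeof(MyString) bytes, terminator included.
    std::snprintf(MyString, sizeof(MyString), "%s", "Hello world");
    return std::strlen(MyString);
}

// Allocates a copy of `text` whose size is only known at run-time.
// The caller owns the result and must release it with delete[].
char* CopyToHeap(const char* text)
{
    std::size_t StringLen = std::strlen(text) + 1; // +1 for the null terminator
    char* pMyString = new char[StringLen];
    std::memcpy(pMyString, text, StringLen);       // copies the terminator too
    return pMyString;
}
```

A caller would pair the heap version with a matching release, for example: char* p = CopyToHeap("Hello world"); ... delete[] p;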

String functions

There is a multitude of functions which process strings, the most common of which have Source-specific V_* equivalents. See MSDN for a fairly comprehensive list, or search Visual Studio's Class View for "V_str".
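
For a flavour of what these functions do, here is a small sketch built only on the standard <cstring> routines strlen(), strncmp() and strstr(). The V_* wrappers are deliberately not used, since they need the Source SDK headers; StartsWith and Contains are hypothetical helpers written for this example.

```cpp
#include <cstring>

// True if `text` begins with `prefix`.
bool StartsWith(const char* text, const char* prefix)
{
    // Compare only as many characters as the prefix contains.
    return std::strncmp(text, prefix, std::strlen(prefix)) == 0;
}

// True if `needle` occurs anywhere inside `haystack`.
bool Contains(const char* haystack, const char* needle)
{
    // strstr() returns a pointer to the first match, or a null pointer if there is none.
    return std::strstr(haystack, needle) != nullptr;
}
```

Both helpers rely on the null terminator to know where their arguments end, so they must only be given properly terminated strings.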