String: Difference between revisions
| HouJunhao33 (talk | contribs) m (Add language option for zh-cn.) | SirYodaJedi (talk | contribs)  No edit summary | ||
Latest revision as of 08:26, 4 August 2025
Arrays of char (ASCII) or wchar_t (Unicode) are commonly used to store text. Such buffers are called strings. They are important, but complicated.
C strings or Null-terminated byte strings
A string is always one character larger than it appears. The extra item is the null terminator (binary zero, typed as \0). This is required because a pointer to a string is generally passed around, rather than the array itself. 
Warning: A pointer does not contain data on the string's length.
Without the terminator it would be impossible to determine where the string ends and where the next variable, or just unassigned memory, begins. Code that runs past the end of the array in this way causes a buffer overflow, and those are bad news!
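To illustrate why the terminator matters, here is a minimal sketch of how a length function has to work when all it receives is a pointer. CountChars and g_Greeting are hypothetical names written for this example; the standard library's strlen() does the same job.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical helper: walks the string until it reaches the null terminator.
// A bare pointer carries no length, so scanning is the only option.
std::size_t CountChars(const char* pString)
{
    std::size_t count = 0;
    while (pString[count] != '\0')
        ++count;
    return count;
}

// The array holds the 5 visible characters plus the terminator.
const char g_Greeting[] = "Hello";
const std::size_t g_GreetingBytes = sizeof(g_Greeting); // 6 bytes
```

CountChars(g_Greeting) returns 5 even though the array occupies 6 bytes. If the terminator were missing, the loop would keep reading into whatever memory follows the array.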
Tip: The null terminator does not need to be the last item in the array. The bytes after it will just be ignored by string-handling functions.
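One consequence is that a string can be shortened in place by writing a terminator into the middle of its buffer. The sketch below assumes a writable buffer; TruncateAt is a hypothetical helper, not a Source or standard library function.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical helper: cuts a string short by writing a terminator into
// the middle of its buffer. Returns the new length as seen by strlen().
std::size_t TruncateAt(char* pBuffer, std::size_t index)
{
    if (index < std::strlen(pBuffer))
        pBuffer[index] = '\0'; // string functions now ignore everything after this byte
    return std::strlen(pBuffer);
}
```

Calling TruncateAt on a 12-byte buffer holding "Hello world" with an index of 5 leaves the string reading "Hello". The array is still 12 bytes long and the remaining characters are still in memory; they are simply never reached.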
Creating From String Literal
const char* MyString = "Hello world"; // must assign to pointer! (const, because literals must not be modified)
This code creates a string from a string literal. The double quote marks are special syntax which generate an array of char from their contents. The code above therefore:
- Allocates 12 bytes of memory, somewhere arbitrary, to store the string. This is one byte for each character, plus an automatic twelfth for the null terminator.
- Creates a local char pointer containing the address of the first character (H in this case).
String literals are usually handled through const pointers. This is because a string literal stays in memory for the process's whole lifespan and must not be modified.
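The points above can be sketched as follows. All names here (g_LiteralBytes, GetGreeting, CopyGreeting) are hypothetical and exist only for illustration. The example contrasts pointing at a literal with copying it into an array you are free to change.

```cpp
#include <cstddef>
#include <cstring>

// A literal is an array of char: 11 visible characters plus the terminator.
const std::size_t g_LiteralBytes = sizeof("Hello world"); // 12 bytes

// Pointing at the literal: no copy is made, and the characters are read-only.
const char* GetGreeting()
{
    return "Hello world"; // safe to return: the literal outlives the function
}

// Initialising an array from a literal copies it into writable memory.
// pOut must have room for at least 12 bytes.
void CopyGreeting(char* pOut)
{
    char localCopy[] = "Hello world"; // 12-byte array in the function's memory space
    localCopy[0] = 'J';               // fine: this is our own copy
    std::memcpy(pOut, localCopy, sizeof(localCopy));
}
```

Returning the pointer from GetGreeting works only because the literal is never freed. Returning a pointer to localCopy would be a bug, since that array disappears when the function ends.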
Note: If you examine MyString in the Visual Studio debugger, you will see the entire string. This is special behaviour to make examining strings easier; strictly speaking it should just show you the pointee (i.e. the first character, H).
Creating by Size
char MyString[12]; // accepts a static size only
int StringLen = 12;
char* pMyString = new char[StringLen]; // accepts a variable size
delete[] pMyString; // always delete / delete[] anything created with 'new' after use
This code allocates two 12-byte strings, but does not write anything to them (so their contents will be either blank or gibberish). They need to be assigned to, ideally with a string function like strcpy() or sprintf().
The difference between them is that one creates an array in the function's memory space, while the other creates a pointer in the function and uses new to allocate the string itself somewhere else. The advantage of new is that you can allocate an array of a size determined at run-time, but the downside is that if you aren't scrupulous about calling delete (or delete[] for arrays) you will suffer a memory leak.
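As a rough sketch of both approaches, the helpers below fill a buffer after allocating it. DuplicateString and FormatHealth are hypothetical names, and snprintf() is used in place of sprintf() because it takes the buffer size and cannot write past it.

```cpp
#include <cstddef>
#include <cstdio>
#include <cstring>

// Hypothetical helper: returns a heap-allocated copy of pSource.
// The caller owns the result and must release it with delete[].
char* DuplicateString(const char* pSource)
{
    std::size_t bytes = std::strlen(pSource) + 1; // +1 for the null terminator
    char* pCopy = new char[bytes];                // size only known at run-time
    std::memcpy(pCopy, pSource, bytes);           // copies the terminator too
    return pCopy;
}

// Hypothetical helper: fills a fixed-size buffer. snprintf() never writes more
// than 'bytes' bytes and always terminates the result when bytes > 0.
void FormatHealth(char* pBuffer, std::size_t bytes, int health)
{
    std::snprintf(pBuffer, bytes, "Health: %d", health);
}
```

A caller would typically pass a local array and its size, for example FormatHealth(MyString, sizeof(MyString), 100), and must pair every DuplicateString call with a delete[].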
Unicode Strings
Note: Source is internally ASCII. The only time you will deal with Unicode is if you delve into the inner workings of VGUI.
Unicode strings behave similarly to ASCII strings, but are instead arrays of wchar_t. They are operated on by their own set of string functions, normally with 'wc' or 'wcs' (wide char string) in their name.
const wchar_t* MyWideString = L"Здравей свят";
The L prefix marks the string literal as being Unicode (an array of wchar_t rather than char). You need to do this even if the characters are all ASCII-compatible.
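A short sketch of the wide-character functions in use, relying only on the standard wcslen() and wmemcpy() from the C library. CopyWide is a hypothetical helper. Note that sizes are counted in wchar_t elements rather than bytes.

```cpp
#include <cstddef>
#include <cwchar>

// Hypothetical helper: copies a wide string into a fixed-size buffer.
// 'count' is measured in wchar_t elements, not bytes.
// Returns false if the source (plus terminator) does not fit.
bool CopyWide(wchar_t* pDest, std::size_t count, const wchar_t* pSource)
{
    std::size_t needed = std::wcslen(pSource) + 1; // +1 for the L'\0' terminator
    if (needed > count)
        return false;
    std::wmemcpy(pDest, pSource, needed); // copies the terminator too
    return true;
}
```

The byte size of a wide string depends on the platform, because wchar_t is not the same width everywhere. This is why the helper counts elements instead.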
String Functions
There are a multitude of functions which process strings, of which the most common ASCII variants have Source-specific V_* equivalents. See MSDN (http://msdn.microsoft.com/en-us/library/f0151s4x.aspx) for a fairly comprehensive list, or search Visual Studio's Class View for "V_str".
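The sketch below uses only standard C library functions (strncmp, strncpy, strlen), since the V_* versions need the Source SDK headers and are not shown here. StartsWith and SafeCopy are hypothetical helpers, and the entity names are just sample data.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical helper: true if pString begins with pPrefix.
bool StartsWith(const char* pString, const char* pPrefix)
{
    return std::strncmp(pString, pPrefix, std::strlen(pPrefix)) == 0;
}

// Hypothetical helper: bounded copy that always terminates the destination.
// strncpy() on its own leaves the buffer unterminated when the source is too long.
void SafeCopy(char* pDest, std::size_t bytes, const char* pSource)
{
    if (bytes == 0)
        return;
    std::strncpy(pDest, pSource, bytes - 1);
    pDest[bytes - 1] = '\0';
}
```

For example, StartsWith("weapon_crowbar", "weapon_") is true, and copying that name into an 8-byte buffer with SafeCopy leaves "weapon_" followed by the terminator.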

























