Zh/String: Difference between revisions
HouJunhao33 (talk | contribs) No edit summary |
Kestrelguy (talk | contribs) (updated language bar. still needs some translation done.) |
Revision as of 21:34, 6 June 2022
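Arrays of char (ASCII) or wchar_t (Unicode) are commonly used to store text. These buffers have their own term: strings, or sometimes "C-style strings". They are important, but also somewhat complex.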
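Null terminator
A string is always one character longer than its text content. The extra character is the null terminator (binary zero). Strings need a terminator because they are usually passed around as pointers, and a pointer, unlike an array, carries no length information.
Without a terminator there is no way to tell where the string ends. Code would keep reading into the next variable or into unallocated memory, which is a buffer overflow and very bad news!
Tip: The null terminator does not have to sit at the very end of the array. String functions ignore everything after the first null terminator.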
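To make the role of the terminator concrete, here is a minimal sketch using only the standard library. The helper names CountUntilNull and VisibleLengthOfSplitBuffer are hypothetical and exist only for illustration; the first one does by hand what strlen() does.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical helper: counts characters up to, but not including,
// the first null terminator. This is essentially what std::strlen() does.
std::size_t CountUntilNull(const char* text)
{
    std::size_t length = 0;
    while (text[length] != '\0')
        ++length;
    return length;
}

// Hypothetical helper: a terminator written into the middle of a buffer
// hides everything after it from string functions.
std::size_t VisibleLengthOfSplitBuffer()
{
    char buffer[12] = "Hello world"; // 11 characters + 1 terminator
    buffer[5] = '\0';                // cut the string after "Hello"
    return std::strlen(buffer);      // stops at the first terminator
}
```

The array in the second helper is still 12 bytes long, but every string function now treats it as the 5-character string "Hello".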
Creating a string from a literal
const char* MyString = "Hello world"; // must be assigned to a pointer! (const, because a literal must not be modified)
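This code creates a string from a string literal. The double quotes are special syntax that generates a char array from their contents. The code above means: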
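- Allocate 12 bytes of memory to hold the string: one byte for each of the 11 characters, plus one byte for the null terminator, which is added automatically.
- Create a local char pointer that holds the address of the first character (H in this case).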
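String literals are usually assigned to const variables. This is because a string literal stays in memory for the whole lifetime of the program, not just of the function that uses it, and must not be modified.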
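The byte counts described above can be checked directly. The following is a small sketch rather than anything from the Source SDK; kGreeting and the three helper functions are hypothetical names chosen for this example.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical constant: a const pointer to the first character of the literal.
const char* const kGreeting = "Hello world";

// Storage occupied by the literal itself: 11 characters + 1 null terminator.
std::size_t LiteralStorageBytes()
{
    return sizeof("Hello world");
}

// Length as seen by string functions: the terminator is not counted.
std::size_t LiteralTextLength()
{
    return std::strlen(kGreeting);
}

// Dereferencing the pointer yields only the first character.
char FirstCharacter()
{
    return *kGreeting;
}
```

Note that sizeof is applied to the literal, not to kGreeting. Applied to the pointer it would return the size of a pointer, which is exactly the missing length information that makes the terminator necessary.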
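Note: If you inspect MyString in the Visual Studio debugger, you will see the whole string. This special behaviour makes strings easier to examine, because the debugger would normally show only what the pointer itself refers to (here, the first character, H).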
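Creating a string by size
char MyString[12]; // static size (allocated on the stack)
int StringLen = 12;
char* pMyString = new char[StringLen]; // dynamic size (allocated on the heap)
delete[] pMyString; // memory allocated with new / new[] must be released with delete / delete[]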
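Both snippets allocate a 12-byte string but write nothing to it, so the contents are indeterminate: they may happen to be empty, or they may be garbage. The strings still need to be given data, ideally with a string function such as strcpy() or sprintf().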
The difference between them is that one creates an array in the function's memory space, while the other creates a pointer in the function and uses new to allocate the string itself somewhere else. The advantage of new is that you can allocate an array of a size determined at run-time, but the downside is that if you aren't scrupulous about calling delete (or delete[] for arrays) you will suffer a memory leak.
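As a rough illustration of filling both kinds of buffer, the sketch below uses snprintf() for the fixed-size stack array, so that over-long input is truncated rather than overflowing, and strcpy() for a heap array sized at run-time. FillStackBuffer and FillHeapBuffer are hypothetical names, and std::string is used only to hand the result back to the caller.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstring>
#include <string>

// Hypothetical helper: writes text into a 12-byte stack buffer.
// snprintf() never writes more than sizeof(MyString) bytes and always
// terminates the result, so longer input is cut to 11 characters.
std::string FillStackBuffer(const char* text)
{
    assert(text != nullptr);
    char MyString[12];
    std::snprintf(MyString, sizeof(MyString), "%s", text);
    return std::string(MyString);
}

// Hypothetical helper: sizes a heap buffer at run-time, fills it,
// and releases it again with delete[] to avoid a memory leak.
std::string FillHeapBuffer(const char* text)
{
    assert(text != nullptr);
    const std::size_t StringLen = std::strlen(text) + 1; // +1 for the terminator
    char* pMyString = new char[StringLen];
    std::strcpy(pMyString, text); // safe here: the buffer was sized from text
    std::string result(pMyString);
    delete[] pMyString;
    return result;
}
```

The stack version trades flexibility for simplicity: there is nothing to free, but the size is fixed at compile time. The heap version accepts any length, at the cost of having to pair every new[] with a delete[].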
Unicode strings

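Note: Source is internally ASCII. The only time you will deal with Unicode is if you delve into the inner workings of VGUI.
Unicode strings behave similarly to ASCII strings, but are instead arrays of wchar_t. They are operated on by their own set of string functions, normally with 'wc' or 'wcs' (wide char string) in their name.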
const wchar_t* MyWideString = L"Здравей свят";
The L marks the string literal as being Unicode. You need to do this even if the characters are all ASCII-compatible.
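The wide-character functions mirror their char counterparts closely. The sketch below shows wcslen() and wcscmp() from the standard library; the names kWideGreeting, WideLength, WideStorageBytes and WideEquals are hypothetical. An ASCII-only literal is used so that the example does not depend on the encoding of the source file.

```cpp
#include <cstddef>
#include <cwchar>

// Hypothetical constant: the L prefix is required even for ASCII-only text.
const wchar_t* const kWideGreeting = L"Hello world";

// Number of wide characters before the terminator (wide strlen).
std::size_t WideLength()
{
    return std::wcslen(kWideGreeting);
}

// Storage in bytes: 12 wide characters (11 + terminator), each of
// which is sizeof(wchar_t) bytes and therefore usually more than one.
std::size_t WideStorageBytes()
{
    return sizeof(L"Hello world");
}

// Wide counterpart of a strcmp()-based equality test.
bool WideEquals(const wchar_t* a, const wchar_t* b)
{
    return std::wcscmp(a, b) == 0;
}
```

The size of wchar_t depends on the platform (commonly 2 bytes on Windows and 4 on Linux), so byte counts for wide strings should always be computed with sizeof(wchar_t) rather than assumed.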
String functions
There are a multitude of functions which process strings, of which the most common ASCII variants have Source-specific V_* equivalents. See MSDN for a quite comprehensive list, or search VS' Class View for "V_str".
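For orientation, here are three of the most common operations written with the standard C runtime functions. This sketch deliberately avoids the V_* names, because they need the Source SDK headers and their exact signatures are not given here. SameText, Contains and FormatHealth are hypothetical wrappers.

```cpp
#include <cstddef>
#include <cstdio>
#include <cstring>

// Compare: strcmp() returns 0 when both strings are identical.
bool SameText(const char* a, const char* b)
{
    return std::strcmp(a, b) == 0;
}

// Search: strstr() returns a pointer to the first match, or null if none.
bool Contains(const char* haystack, const char* needle)
{
    return std::strstr(haystack, needle) != nullptr;
}

// Format: snprintf() writes at most destSize bytes including the
// terminator and returns the length the full text would have had.
int FormatHealth(char* dest, std::size_t destSize, int health)
{
    return std::snprintf(dest, destSize, "Health: %d", health);
}
```

A typical call would be FormatHealth(buffer, sizeof(buffer), 100) on a local char array. Passing the buffer size explicitly is what keeps the function from writing past the end of the array.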