How to get the number of characters in a std::string?

Question

How do I get the number of characters in a string in C++?

Eclipse · Accepted Answer

If you're using a std::string, call length():

std::string str = "hello";
std::cout << str << ":" << str.length();
// Outputs "hello:5"

If you're using a C string, call strlen():

const char *str = "hello";
std::cout << str << ":" << strlen(str);
// Outputs "hello:5"

Or, if you happen to like using Pascal-style strings (or "f***** strings", as Joel Spolsky likes to call them when they have a trailing NULL), just dereference the first character:

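const char *str = "\005hello";
// *str is a char with value 5; cast it so it prints as a number, not a control character
std::cout << str + 1 << ":" << static_cast<int>(*str);
// Outputs "hello:5"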
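
As a minimal sketch of the Pascal-style layout described above, assuming a one-byte length prefix (so at most 255 characters), the hypothetical helpers make_pascal, pascal_length and pascal_text below build such a buffer from a std::string and read it back. No terminating NUL is stored, so the payload must not be passed to strlen().

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: builds a length-prefixed ("Pascal-style") buffer.
// The first byte holds the length, so payloads longer than 255 bytes are rejected.
std::vector<unsigned char> make_pascal(const std::string& s) {
    assert(s.size() <= 255 && "a one-byte prefix can only encode 0..255");
    std::vector<unsigned char> buf;
    buf.reserve(s.size() + 1);
    buf.push_back(static_cast<unsigned char>(s.size()));
    buf.insert(buf.end(), s.begin(), s.end());
    return buf;
}

// Hypothetical helper: the length is simply the value of the first byte.
std::size_t pascal_length(const unsigned char* p) {
    return p[0];
}

// Hypothetical helper: copies the payload (everything after the prefix) back out.
std::string pascal_text(const unsigned char* p) {
    return std::string(reinterpret_cast<const char*>(p + 1), p[0]);
}
```

Getting the length is O(1) here, while strlen() has to scan the whole string; the trade-off is the 255-character limit of a single prefix byte.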

John T · Answer

When dealing with C++ strings (std::string), you're looking for length() or size(). Both should give you the same value. However, when dealing with C-style strings, you would use strlen().

#include <iostream>
#include <string>
#include <string.h>

int main(int argc, char **argv)
{
    std::string str = "Hello!";
    const char *otherstr = "Hello!"; // C-Style string
    std::cout << str.size() << std::endl;
    std::cout << str.length() << std::endl;
    std::cout << strlen(otherstr) << std::endl; // C way for string length
    std::cout << strlen(str.c_str()) << std::endl; // convert C++ string to C-string then call strlen
    return 0;
}

Output:

6
6
6
6
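
The four values agree because "Hello!" contains no '\0'. A std::string stores its length separately, so it can hold embedded NUL characters, whereas strlen() stops at the first one. A minimal sketch of the difference (the helper names string_with_embedded_nul, stored_length and c_length are illustrative, not from the answer):

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Builds a std::string containing an embedded NUL. The (pointer, count)
// constructor copies all 11 bytes, including the '\0' in the middle.
std::string string_with_embedded_nul() {
    return std::string("Hello\0World", 11);
}

// size()/length() report every stored byte...
std::size_t stored_length(const std::string& s) { return s.size(); }

// ...but strlen() on c_str() stops at the first '\0'.
std::size_t c_length(const std::string& s) { return std::strlen(s.c_str()); }
```

For this string, stored_length() returns 11 while c_length() returns 5, so strlen(str.c_str()) is only a substitute for size() when the string is known to contain no '\0'.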

dcw · Answer

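It depends on what string type you're talking about. There are many types of strings:

1. const char* - C-style multibyte string
2. const wchar_t* - C-style wide string
3. std::string - "standard" multibyte string
4. std::wstring - "standard" wide string

For 3 and 4, you can use the .size() or .length() methods.

For 1, you can use strlen(), but you must make sure that the string variable is not NULL (i.e. not a null pointer).

For 2, you can use wcslen(), but you must make sure that the string variable is not NULL (i.e. not a null pointer).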
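
A minimal sketch of that NULL check for cases 1 and 2, assuming you want a null pointer to count as an empty string (safe_strlen and safe_wcslen are hypothetical names; passing a null pointer to strlen() or wcslen() directly is undefined behavior):

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <cwchar>

// Hypothetical helper: C-style multibyte string, null treated as empty.
std::size_t safe_strlen(const char* s) {
    return s ? std::strlen(s) : 0;
}

// Hypothetical helper: C-style wide string, null treated as empty.
std::size_t safe_wcslen(const wchar_t* s) {
    return s ? std::wcslen(s) : 0;
}
```

Returning 0 for a null pointer is a design choice; if a null pointer indicates a bug in your program, asserting or throwing may be more appropriate.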

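There are other string types in non-standard C++ libraries, such as MFC's CString, ATL's CComBSTR, and ACE's ACE_CString, with methods like .GetLength(). I can't remember the specifics of all of them off the top of my head.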

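The STLSoft libraries abstract all of this away with what they call string access shims, which can be used to get the string length (and other aspects) from any type. So for all of the above (including the non-standard library ones) you would use the same function, stlsoft::c_str_len(). This article describes how it all works, as it is not all entirely obvious or easy.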
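
The shim idea can be approximated in plain C++ with one overloaded name. The sketch below uses a hypothetical str_len; it is not STLSoft's stlsoft::c_str_len() and only illustrates the technique of letting overload resolution pick the right length query for each type:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <cwchar>
#include <string>

// Hypothetical overload set in the spirit of "string access shims":
// callers use one name, and overload resolution picks the right query.
inline std::size_t str_len(const char* s)         { return s ? std::strlen(s) : 0; }
inline std::size_t str_len(const wchar_t* s)      { return s ? std::wcslen(s) : 0; }
inline std::size_t str_len(const std::string& s)  { return s.size(); }
inline std::size_t str_len(const std::wstring& s) { return s.size(); }

// Generic code can then ask for a length without knowing the string type.
template <typename S>
bool is_empty_string(const S& s) { return str_len(s) == 0; }
```

Supporting another type (for example a library-specific class with .GetLength()) would mean adding one more overload. A bare str_len(nullptr) would be ambiguous between the two pointer overloads, so pass a typed pointer instead.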

Gal Goldman · Answer

If you're using std::string, there are two common methods for that:

std::string Str("Some String");
size_t Size = 0;
Size = Str.size();
Size = Str.length();

If you're using a C-style string (char * or const char *), you can use:

const char *pStr = "Some String";
size_t Size = strlen(pStr);

ChrisW · Answer

If you're using old C-style strings instead of the newer STL-style strings, the C runtime library provides the strlen function:

const char* p = "Hello";
size_t n = strlen(p);
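
strlen() counts the bytes before the terminating '\0', which is easy to confuse with sizeof. A small sketch, assuming the same literal is stored once in an array and once referenced through a pointer (kArray, kPointer and text_length are illustrative names):

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// An array initialized from a literal stores the terminating '\0' as well.
const char kArray[] = "Hello";
static_assert(sizeof(kArray) == 6, "5 characters plus the terminating NUL");

// A pointer only refers to the text; sizeof measures the pointer itself.
const char* const kPointer = "Hello";
static_assert(sizeof(kPointer) == sizeof(const char*), "size of the pointer, not of the text");

// strlen() walks the bytes until the first '\0', so it gives 5 for both.
std::size_t text_length(const char* s) { return std::strlen(s); }
```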

Alex Martelli · Answer

string foo; ... foo.length() ...

.length and .size are synonyms; I think "length" is a slightly clearer word.

stefanB · Answer

std::string str("a string");
std::cout << str.size() << std::endl;

Luke Schafer · Answer

For an actual string object:

yourstring.length();

or

yourstring.size();

Hape Entner · Answer

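In C++ std::string, the length() and size() methods give you the number of bytes, and not necessarily the number of characters! The same is true of the C-style strlen() (and of sizeof on a char array, which also counts the terminating NUL)!

For 7-bit ASCII characters this is the same value, but for characters that are not 7-bit ASCII it is definitely different.

There is no simple C/C++ function that can really count the number of characters. By the way, all of this is implementation-dependent and may differ in other environments (compiler, Win 16/32, Linux, embedded, ...).

See the following example (64-bit Linux) for actual results: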

#include <string>
#include <iostream>
#include <stdio.h>
#include <string.h>

using namespace std;

int main()
{
    /* C-style char arrays */
    const char * Test1 = "ABCD";
    const char * Test2 = "ÄÖÜ€";
    const char * Test3 = "αβγ????";

    /* C++ string objects */
    string sTest1 = "ABCD";
    string sTest2 = "ÄÖÜ€";
    string sTest3 = "αβγ????";

    printf("\nC Style Results:\n");
    printf("Test1: %s, strlen(): %d\n", Test1, (int) strlen(Test1));
    printf("Test2: %s, strlen(): %d\n", Test2, (int) strlen(Test2));
    printf("Test3: %s, strlen(): %d\n", Test3, (int) strlen(Test3));

    printf("\nC++ Style Results:\n");
    cout << "Test1: " << sTest1 << ", sTest1.size(): " << sTest1.size() << " sTest1.length(): " << sTest1.length() << endl;
    cout << "Test2: " << sTest2 << ", sTest2.size(): " << sTest2.size() << " sTest2.length(): " << sTest2.length() << endl;
    cout << "Test3: " << sTest3 << ", sTest3.size(): " << sTest3.size() << " sTest3.length(): " << sTest3.length() << endl;
    return 0;
}

The output of the example is:

C Style Results:
Test1: ABCD, strlen(): 4
Test2: ÄÖÜ€, strlen(): 9
Test3: αβγ????, strlen(): 10

C++ Style Results:
Test1: ABCD, sTest1.size(): 4 sTest1.length(): 4
Test2: ÄÖÜ€, sTest2.size(): 9 sTest2.length(): 9
Test3: αβγ????, sTest3.size(): 10 sTest3.length(): 10
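
size() and strlen() count bytes. If you need Unicode code points instead, and you can assume the data is valid UTF-8, a library is not strictly required: each code point begins with exactly one byte that is not a continuation byte (bit pattern 10xxxxxx). A minimal sketch (utf8_codepoints is a hypothetical name); it does not validate the input and counts code points, not user-perceived characters:

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Counts Unicode code points in a UTF-8 string by skipping continuation
// bytes (bit pattern 10xxxxxx). Assumes valid UTF-8; it does not validate,
// and it counts code points, not user-perceived characters (graphemes).
std::size_t utf8_codepoints(const std::string& s) {
    std::size_t count = 0;
    for (unsigned char c : s) {
        if ((c & 0xC0) != 0x80) {  // lead byte or plain ASCII byte
            ++count;
        }
    }
    return count;
}
```

For the ÄÖÜ€ string above this gives 4 instead of 9. Combining marks are still counted one by one, so counting graphemes needs a library such as ICU.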

Robert Fraser · Answer

For Unicode

Several answers here have pointed out that .length() gives the wrong result for multibyte characters, but there are 11 answers and none of them provides a solution.

For Z͉̳̺ͥͬ̾a̴͕̲̒̒͌̋ͪl̨͎̰̘͉̟ͤ̀̈̚͜g͕͔̤͖̟̒͝ͅo̵̡̡̼͚̐ͯ̅ͪ̆ͣ̚

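First of all, it's important to know what you mean by "length". As a motivating example, consider the string "Z͉̳̺ͥͬ̾a̴͕̲̒̒͌̋ͪl̨͎̰̘͉̟ͤ̀̈̚͜g͕͔̤͖̟̒͝ͅo̵̡̡̼͚̐ͯ̅ͪ̆ͣ̚" (note that some languages, notably Thai, actually use combining diacritical marks, so this isn't just for memes, but obviously that's the most important use case). Assume it is encoded in UTF-8. There are three ways we can talk about the length of this string: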

95 bytes

00000000: 5acd a5cd accc becd 89cc b3cc ba61 cc92  Z............a..
00000010: cc92 cd8c cc8b cdaa ccb4 cd95 ccb2 6ccd  ..............l.
00000020: a4cc 80cc 9acc 88cd 9ccc a8cd 8ecc b0cc  ................
00000030: 98cd 89cc 9f67 cc92 cd9d cd85 cd95 cd94  .....g..........
00000040: cca4 cd96 cc9f 6fcc 90cd afcc 9acc 85cd  ......o.........
00000050: aacc 86cd a3cc a1cc b5cc a1cc bccd 9a    ...............

50 code points

LATIN CAPITAL LETTER Z
COMBINING LEFT ANGLE BELOW
COMBINING DOUBLE LOW LINE
COMBINING INVERTED BRIDGE BELOW
COMBINING LATIN SMALL LETTER I
COMBINING LATIN SMALL LETTER R
COMBINING VERTICAL TILDE
LATIN SMALL LETTER A
COMBINING TILDE OVERLAY
COMBINING RIGHT ARROWHEAD BELOW
COMBINING LOW LINE
COMBINING TURNED COMMA ABOVE
COMBINING TURNED COMMA ABOVE
COMBINING ALMOST EQUAL TO ABOVE
COMBINING DOUBLE ACUTE ACCENT
COMBINING LATIN SMALL LETTER H
LATIN SMALL LETTER L
COMBINING OGONEK
COMBINING UPWARDS ARROW BELOW
COMBINING TILDE BELOW
COMBINING LEFT TACK BELOW
COMBINING LEFT ANGLE BELOW
COMBINING PLUS SIGN BELOW
COMBINING LATIN SMALL LETTER E
COMBINING GRAVE ACCENT
COMBINING DIAERESIS
COMBINING LEFT ANGLE ABOVE
COMBINING DOUBLE BREVE BELOW
LATIN SMALL LETTER G
COMBINING RIGHT ARROWHEAD BELOW
COMBINING LEFT ARROWHEAD BELOW
COMBINING DIAERESIS BELOW
COMBINING RIGHT ARROWHEAD AND UP ARROWHEAD BELOW
COMBINING PLUS SIGN BELOW
COMBINING TURNED COMMA ABOVE
COMBINING DOUBLE BREVE
COMBINING GREEK YPOGEGRAMMENI
LATIN SMALL LETTER O
COMBINING SHORT STROKE OVERLAY
COMBINING PALATALIZED HOOK BELOW
COMBINING PALATALIZED HOOK BELOW
COMBINING SEAGULL BELOW
COMBINING DOUBLE RING BELOW
COMBINING CANDRABINDU
COMBINING LATIN SMALL LETTER X
COMBINING OVERLINE
COMBINING LATIN SMALL LETTER H
COMBINING BREVE
COMBINING LATIN SMALL LETTER A
COMBINING LEFT ANGLE ABOVE

5 graphemes

Z with some s**t
a with some s**t
l with some s**t
g with some s**t
o with some s**t

Finding the lengths using ICU

ICU has C++ classes, but they require converting to UTF-16. You can use the C types and macros directly to get some UTF-8 support:

#include <memory>
#include <iostream>
#include <stdexcept>
#include <string>
#include <unicode/utypes.h>
#include <unicode/ubrk.h>
#include <unicode/utext.h>

//
// C++ helpers so we can use RAII
//
// Note that ICU internally provides some C++ wrappers (such as BreakIterator), however these only seem to work
// for UTF-16 strings, and require transforming UTF-8 to UTF-16 before use.
// If you already have UTF-16 strings or can take the performance hit, you should probably use those instead of
// the C functions. See: http://icu-project.org/apiref/icu4c/
//
struct UTextDeleter { void operator()(UText* ptr) { utext_close(ptr); } };
struct UBreakIteratorDeleter { void operator()(UBreakIterator* ptr) { ubrk_close(ptr); } };
using PUText = std::unique_ptr<UText, UTextDeleter>;
using PUBreakIterator = std::unique_ptr<UBreakIterator, UBreakIteratorDeleter>;

void checkStatus(const UErrorCode status)
{
    if(U_FAILURE(status))
    {
        throw std::runtime_error(u_errorName(status));
    }
}

size_t countGraphemes(UText* text)
{
    // source for most of this: http://userguide.icu-project.org/strings/utext
    UErrorCode status = U_ZERO_ERROR;
    PUBreakIterator it(ubrk_open(UBRK_CHARACTER, "en_us", nullptr, 0, &status));
    checkStatus(status);
    ubrk_setUText(it.get(), text, &status);
    checkStatus(status);
    size_t charCount = 0;
    while(ubrk_next(it.get()) != UBRK_DONE)
    {
        ++charCount;
    }
    return charCount;
}

size_t countCodepoints(UText* text)
{
    size_t codepointCount = 0;
    while(UTEXT_NEXT32(text) != U_SENTINEL)
    {
        ++codepointCount;
    }
    // reset the index so we can use the structure again
    UTEXT_SETNATIVEINDEX(text, 0);
    return codepointCount;
}

void printStringInfo(const std::string& utf8)
{
    UErrorCode status = U_ZERO_ERROR;
    PUText text(utext_openUTF8(nullptr, utf8.data(), utf8.length(), &status));
    checkStatus(status);

    std::cout << "UTF-8 string (might look wrong if your console locale is different): " << utf8 << std::endl;
    std::cout << "Length (UTF-8 bytes): " << utf8.length() << std::endl;
    std::cout << "Length (UTF-8 codepoints): " << countCodepoints(text.get()) << std::endl;
    std::cout << "Length (graphemes): " << countGraphemes(text.get()) << std::endl;
    std::cout << std::endl;
}

int main()
{
    // Before C++20, u8 literals are plain char arrays; under C++20 they are char8_t and need a cast
    printStringInfo(u8"Hello, world!");
    printStringInfo(u8"หวัดดีชาวโลก");
    printStringInfo(u8"\xF0\x9F\x90\xBF");
    printStringInfo(u8"Z͉̳̺ͥͬ̾a̴͕̲̒̒͌̋ͪl̨͎̰̘͉̟ͤ̀̈̚͜g͕͔̤͖̟̒͝ͅo̵̡̡̼͚̐ͯ̅ͪ̆ͣ̚");
    return 0;
}

This prints:

UTF-8 string (might look wrong if your console locale is different): Hello, world!
Length (UTF-8 bytes): 13
Length (UTF-8 codepoints): 13
Length (graphemes): 13

UTF-8 string (might look wrong if your console locale is different): หวัดดีชาวโลก
Length (UTF-8 bytes): 36
Length (UTF-8 codepoints): 12
Length (graphemes): 10

UTF-8 string (might look wrong if your console locale is different): ????
Length (UTF-8 bytes): 4
Length (UTF-8 codepoints): 1
Length (graphemes): 1

UTF-8 string (might look wrong if your console locale is different): Z͉̳̺ͥͬ̾a̴͕̲̒̒͌̋ͪl̨͎̰̘͉̟ͤ̀̈̚͜g͕͔̤͖̟̒͝ͅo̵̡̡̼͚̐ͯ̅ͪ̆ͣ̚
Length (UTF-8 bytes): 95
Length (UTF-8 codepoints): 50
Length (graphemes): 5

Boost.Locale wraps ICU and might provide a nicer interface. However, it still requires conversion to and from UTF-16.

Atul Rokade · Answer

The simplest way to get the length of a string without worrying about the std namespace is as follows.

String with or without spaces:

#include <iostream>
#include <string>
using namespace std;

int main(){
    string str;
    getline(cin, str);
    cout << "Length of given string is " << str.length();
    return 0;
}

String without spaces:

#include <iostream>
#include <string>
using namespace std;

int main(){
    string str;
    cin >> str;
    cout << "Length of given string is " << str.length();
    return 0;
}
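
The two programs differ only in how the string is read: std::getline() keeps the spaces, while operator>> stops at the first whitespace. A sketch that reads from any std::istream (line_length and word_length are illustrative helpers), so the behavior can be tried with a std::istringstream instead of the console:

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>

// Reads a whole line (spaces included) and returns its length in bytes.
std::size_t line_length(std::istream& in) {
    std::string str;
    std::getline(in, str);
    return str.length();
}

// Reads one whitespace-delimited word and returns its length in bytes.
std::size_t word_length(std::istream& in) {
    std::string str;
    in >> str;
    return str.length();
}
```

With the input "hello world", line_length() returns 11 and word_length() returns 5.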