How to print Unicode characters in C++?

Question

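I am trying to print the Russian character "ф" (U+0444 CYRILLIC SMALL LETTER EF), which has the decimal code 1092. How can I print this character using C++? I would have thought something along the lines of the following would work...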

    int main ()
    {
        wchar_t f = '1060';
        cout << f << endl;
    }

bames53 · Accepted Answer

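To represent the character you can use Universal Character Names (UCNs). The character 'ф' has the Unicode value U+0444, so in C++ you can write it as '\u0444' or '\U00000444'. If the source code encoding supports this character, you can also write it literally in your source code.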

    // both of these assume that the character can be represented with
    // a single char in the execution encoding
    char b = '\u0444';
    char a = 'ф'; // this line additionally assumes that the source character encoding supports this character

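How you print such characters depends on what you are printing to. If you are printing to a Unix terminal emulator, the terminal emulator uses an encoding that supports this character, and that encoding matches the compiler's execution encoding, then you can do the following: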

    #include <iostream>

    int main()
    {
        std::cout << "Hello, ф or \u0444!\n";
    }

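This program does not require that 'ф' can be represented in a single char. On OS X and most modern Linux installs this works fine, because the source, execution, and console encodings are all UTF-8 (which supports all Unicode characters).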
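
To make the "not a single char" point concrete, here is a minimal sketch that counts code points in a UTF-8 string. The helper name `count_code_points` is made up for this illustration, and it assumes its input is already valid UTF-8; it does no validation.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: counts Unicode code points in a string that is
// assumed to hold valid UTF-8. Continuation bytes look like 10xxxxxx,
// so every byte that does NOT match that pattern starts a new code point.
std::size_t count_code_points(const std::string& utf8)
{
    std::size_t count = 0;
    for (unsigned char byte : utf8) {
        if ((byte & 0xC0) != 0x80) {
            ++count;
        }
    }
    return count;
}
```

For the two UTF-8 bytes of 'ф' ("\xd1\x84"), `std::string::size()` reports 2 while `count_code_points` reports 1, which is why the string literal works even though a single `char` cannot hold the character.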

Things are harder on Windows, and there are several possibilities with different tradeoffs.

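Probably the best option, if you don't need portable code (you will be using wchar_t, which should really be avoided on every other platform), is to set the mode of the output file handle so that it takes only UTF-16 data.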

    #include <iostream>
    #include <io.h>
    #include <fcntl.h>

    int main()
    {
        _setmode(_fileno(stdout), _O_U16TEXT);
        std::wcout << L"Hello, \u0444!\n";
    }

Portable code is more difficult.

JAM · Answer

When compiling with -std=c++11, you can simply write:

    const char *s = u8"\u0444";
    cout << s << endl;
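
One caveat worth sketching: since C++20, `u8` literals have element type `char8_t` rather than `char`, so assigning one to `const char*` no longer compiles there. The helper below is a hypothetical workaround, not part of the original answer; the name `utf8_string` is invented for illustration. It copies the literal's bytes into a `std::string` and should compile from C++11 through C++20.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: copies a u8 string literal into a std::string.
// Char8 deduces to char before C++20 and to char8_t from C++20 on;
// either way the bytes are UTF-8 and may be read through const char*.
template <typename Char8, std::size_t N>
std::string utf8_string(const Char8 (&literal)[N])
{
    static_assert(sizeof(Char8) == 1, "expects a u8 string literal");
    // N counts the terminating null, which std::string does not need.
    return std::string(reinterpret_cast<const char*>(literal), N - 1);
}
```

With this, `std::cout << utf8_string(u8"\u0444") << std::endl;` writes the same two UTF-8 bytes regardless of the language standard selected.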

Puppy · Answer

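Ultimately, this is completely platform-dependent. Unicode support is, unfortunately, very poor in standard C++. For GCC you have to make it a narrow string, since GCC uses UTF-8; Windows wants a wide string, which you must output to wcout.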

    // GCC
    std::cout << "ф";

    // Windoze
    wcout << L"ф";

vladasimovic · Answer

If you use Windows (note that this uses printf(), not cout):

    // Save as UTF-8 without signature (BOM)
    #include <stdio.h>
    #include <windows.h>

    int main ()
    {
        SetConsoleOutputCP(65001);
        printf("ф\n");
    }

Not Unicode, but it works, using code page 1251 instead of UTF-8:

    // Save as Windows-1251
    #include <iostream>
    #include <windows.h>
    using namespace std;

    int main ()
    {
        SetConsoleOutputCP(1251);
        cout << "ф" << endl;
    }
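
Why the 1251 variant works can be sketched as follows: Windows-1251 is a single-byte code page, so a source file saved in it stores 'ф' as one byte, and the console is told to interpret bytes with the same table. The function below is an illustration only, not part of the original answer; `to_cp1251` is a made-up name, and it covers just the basic Russian alphabet range (ё and Ё live elsewhere in the table and are not handled).

```cpp
// Hypothetical helper: maps a code point to its Windows-1251 byte value.
// Only U+0410..U+044F (А..я) is handled; that range is stored
// contiguously at 0xC0..0xFF in Windows-1251. Returns -1 otherwise.
int to_cp1251(char32_t code_point)
{
    if (code_point >= 0x0410 && code_point <= 0x044F) {
        return static_cast<int>(code_point - 0x0410 + 0xC0);
    }
    return -1;
}
```

Under this mapping U+0444 becomes the single byte 0xF4, whereas UTF-8 needs the two bytes 0xD1 0x84 for the same character.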

Mike DeSimone · Answer

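'1060' is four characters, which makes it a multicharacter literal: the standard supports it only conditionally and its value is implementation-defined, so it will not give you the character you want. If your wide characters match Unicode 1:1 (check your locale settings), just treat the character as a number: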

    #include <iostream>
    using namespace std;

    int main ()
    {
        wchar_t f = 1060;
        wcout << f << endl;
    }
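
The locale remark can be sketched like this. On Linux with glibc, `std::wcout` typically cannot convert non-ASCII wide characters while the program is still in the default "C" locale, so the sketch first adopts the environment's locale. This is an assumption-laden illustration: `print_wide_char` is a hypothetical name, and it only helps if the environment locale (for example `en_US.UTF-8`) can represent the character. Note also that decimal 1060 is U+0424 (capital Ф); the lowercase ф from the question is 1092.

```cpp
#include <clocale>
#include <iostream>

// Decimal/hex sanity checks for the two code points discussed here.
static_assert(0x0424 == 1060, "U+0424 (capital EF) is decimal 1060");
static_assert(0x0444 == 1092, "U+0444 (small ef) is decimal 1092");

// Hypothetical helper: switches to the environment's locale, then writes
// one wide character followed by a newline. Returns false if the locale
// could not be set or the stream ended up in a failed state.
bool print_wide_char(wchar_t c)
{
    if (std::setlocale(LC_ALL, "") == nullptr) {
        return false;
    }
    std::wcout << c << L'\n';
    return static_cast<bool>(std::wcout);
}
```

A call such as `print_wide_char(1092)` would then be expected to show ф on a UTF-8 terminal; whether it succeeds depends on the locale the program is run under.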

Iro · Answer

This code works on Linux (C++11, geany, g++ 7.4.0):

    #include <iostream>
    #include <string>
    using namespace std;

    int utf8_to_unicode(string utf8_code);
    string unicode_to_utf8(int unicode);

    int main()
    {
        cout << unicode_to_utf8(36) << '\t';
        cout << unicode_to_utf8(162) << '\t';
        cout << unicode_to_utf8(8364) << '\t';
        cout << unicode_to_utf8(128578) << endl;

        cout << unicode_to_utf8(0x24) << '\t';
        cout << unicode_to_utf8(0xa2) << '\t';
        cout << unicode_to_utf8(0x20ac) << '\t';
        cout << unicode_to_utf8(0x1f642) << endl;

        cout << utf8_to_unicode("$") << '\t';
        cout << utf8_to_unicode("¢") << '\t';
        cout << utf8_to_unicode("€") << '\t';
        cout << utf8_to_unicode("🙂") << endl;

        cout << utf8_to_unicode("\x24") << '\t';
        cout << utf8_to_unicode("\xc2\xa2") << '\t';
        cout << utf8_to_unicode("\xe2\x82\xac") << '\t';
        cout << utf8_to_unicode("\xf0\x9f\x99\x82") << endl;

        return 0;
    }

    int utf8_to_unicode(string utf8_code)
    {
        unsigned utf8_size = utf8_code.length();
        int unicode = 0;

        for (unsigned p = 0; p < utf8_size; ++p)
        {
            // The lead byte carries fewer payload bits than the 6 bits
            // of each continuation byte.
            int bit_count = (p ? 6 : 8 - utf8_size - (utf8_size == 1 ? 0 : 1)),
                shift = (p < utf8_size - 1 ? (6 * (utf8_size - p - 1)) : 0);

            for (int k = 0; k < bit_count; ++k)
                unicode += ((utf8_code[p] & (1 << k)) << shift);
        }

        return unicode;
    }

    string unicode_to_utf8(int unicode)
    {
        string s;

        if (unicode >= 0 and unicode <= 0x7f)  // 7F(16) = 127(10)
        {
            s = static_cast<char>(unicode);
            return s;
        }
        else if (unicode <= 0x7ff)  // 7FF(16) = 2047(10)
        {
            unsigned char c1 = 192, c2 = 128;

            for (int k = 0; k < 11; ++k)
            {
                if (k < 6) c2 |= (unicode % 64) & (1 << k);
                else c1 |= (unicode >> 6) & (1 << (k - 6));
            }

            s = c1; s += c2;
            return s;
        }
        else if (unicode <= 0xffff)  // FFFF(16) = 65535(10)
        {
            unsigned char c1 = 224, c2 = 128, c3 = 128;

            for (int k = 0; k < 16; ++k)
            {
                if (k < 6) c3 |= (unicode % 64) & (1 << k);
                else if (k < 12) c2 |= (unicode >> 6) & (1 << (k - 6));
                else c1 |= (unicode >> 12) & (1 << (k - 12));
            }

            s = c1; s += c2; s += c3;
            return s;
        }
        else if (unicode <= 0x1fffff)  // 1FFFFF(16) = 2097151(10)
        {
            unsigned char c1 = 240, c2 = 128, c3 = 128, c4 = 128;

            for (int k = 0; k < 21; ++k)
            {
                if (k < 6) c4 |= (unicode % 64) & (1 << k);
                else if (k < 12) c3 |= (unicode >> 6) & (1 << (k - 6));
                else if (k < 18) c2 |= (unicode >> 12) & (1 << (k - 12));
                else c1 |= (unicode >> 18) & (1 << (k - 18));
            }

            s = c1; s += c2; s += c3; s += c4;
            return s;
        }
        else if (unicode <= 0x3ffffff)  // 3FFFFFF(16) = 67108863(10)
        {
            ;  // actually, there are no 5-bytes unicodes
        }
        else if (unicode <= 0x7fffffff)  // 7FFFFFFF(16) = 2147483647(10)
        {
            ;  // actually, there are no 6-bytes unicodes
        }
        else ;  // incorrect unicode (< 0 or > 2147483647)

        return "";
    }
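
For comparison, the same encoding step can be sketched more compactly with direct shifts and masks instead of bit-by-bit loops. This is an alternative illustration rather than a rewrite of the answer's code: `encode_utf8` is a hypothetical name, and unlike the code above it rejects surrogates and values beyond U+10FFFF by returning an empty string.

```cpp
#include <string>

// Hypothetical helper: encodes one code point as UTF-8.
// Returns an empty string for values that are not Unicode scalar values
// (surrogates U+D800..U+DFFF or anything above U+10FFFF).
std::string encode_utf8(char32_t cp)
{
    std::string out;
    if (cp <= 0x7F) {
        // 1 byte: 0xxxxxxx
        out += static_cast<char>(cp);
    } else if (cp <= 0x7FF) {
        // 2 bytes: 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp <= 0xFFFF) {
        if (cp >= 0xD800 && cp <= 0xDFFF) {
            return out;  // surrogate: not encodable, out is still empty
        }
        // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp <= 0x10FFFF) {
        // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}
```

On a UTF-8 terminal, `std::cout << encode_utf8(0x0444) << '\n';` would be expected to show ф, since the function produces the bytes 0xD1 0x84 for that code point.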

MGR · Answer

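I needed to show the string in the UI as well as save it to an XML configuration file. The format given above works for strings in C++. I would add that you can get an XML-compatible string for the special character by replacing "\u" with "&#x" and appending a ";" at the end.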

Example: C++: "\u0444" -> XML: "&#x0444;"
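
The replacement rule above can be sketched as a small function that builds the XML numeric character reference from a code point. The name `xml_char_ref` is hypothetical, and the sketch pads to at least four hex digits only to mirror the "\u0444" form; XML itself does not require the padding.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: formats a code point as an XML numeric character
// reference such as "&#x0444;". The buffer is large enough for "&#x",
// up to eight hex digits, ";" and the terminating null.
std::string xml_char_ref(char32_t code_point)
{
    char buffer[16];
    std::snprintf(buffer, sizeof buffer, "&#x%04lX;",
                  static_cast<unsigned long>(code_point));
    return buffer;
}
```

Here `xml_char_ref(0x0444)` yields the text `&#x0444;`, which an XML parser reads back as ф.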

VoyciecH · Answer

Another solution on Linux:

    #include <iostream>
    #include <string>
    using namespace std;

    int main()
    {
        // Print each character, its UTF-8 bytes in hex, and its byte length.
        string a = "Ф";
        cout << "Ф = \xd0\xa4 = " << hex
             << int(static_cast<unsigned char>(a[0]))
             << int(static_cast<unsigned char>(a[1]))
             << " (" << a.length() << "B)" << endl;

        string b = "√";
        cout << "√ = \xe2\x88\x9a = " << hex
             << int(static_cast<unsigned char>(b[0]))
             << int(static_cast<unsigned char>(b[1]))
             << int(static_cast<unsigned char>(b[2]))
             << " (" << b.length() << "B)" << endl;
    }

quanta · Answer

On Linux, I can just do:

std::cout << "ф";

I just copy-pasted characters from here, and it did not fail for at least the random sample I tried.