Why is splitting a string slower in C++ than Python?

Question

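I'm trying to convert some code from Python to C++ in an effort to gain a little bit of speed and sharpen my rusty C++ skills. Yesterday I was shocked when a naive implementation of reading lines from stdin was much faster in Python than in C++ (see this). Today, I finally figured out how to split a string in C++ with merging delimiters (similar semantics to Python's split()), and I am now experiencing deja vu! My C++ code takes much longer to do the work (though not an order of magnitude more, as was the case for yesterday's lesson).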

Python code:

```python
#!/usr/bin/env python
from __future__ import print_function
import time
import sys

count = 0
start_time = time.time()
dummy = None

for line in sys.stdin:
    dummy = line.split()
    count += 1

delta_sec = int(time.time() - start_time)
print("Python: Saw {0} lines in {1} seconds. ".format(count, delta_sec), end='')
if delta_sec > 0:
    lps = int(count/delta_sec)
    print("  Crunch Speed: {0}".format(lps))
else:
    print('')
```

C++ code:

```cpp
#include <iostream>
#include <string>
#include <sstream>
#include <time.h>
#include <vector>

using namespace std;

void split1(vector<string> &tokens, const string &str,
        const string &delimiters = " ") {
    // Skip delimiters at beginning
    string::size_type lastPos = str.find_first_not_of(delimiters, 0);

    // Find the end of the first token (the next delimiter)
    string::size_type pos = str.find_first_of(delimiters, lastPos);

    while (string::npos != pos || string::npos != lastPos) {
        // Found a token, add it to the vector
        tokens.push_back(str.substr(lastPos, pos - lastPos));
        // Skip delimiters
        lastPos = str.find_first_not_of(delimiters, pos);
        // Find the end of the next token
        pos = str.find_first_of(delimiters, lastPos);
    }
}

void split2(vector<string> &tokens, const string &str, char delim=' ') {
    stringstream ss(str); //convert string to stream
    string item;
    while(getline(ss, item, delim)) {
        tokens.push_back(item); //add token to vector
    }
}

int main() {
    string input_line;
    vector<string> spline;
    long count = 0;
    int sec, lps;
    time_t start = time(NULL);

    cin.sync_with_stdio(false); //disable synchronous IO

    while(cin) {
        getline(cin, input_line);
        spline.clear(); //empty the vector for the next line to parse

        //I'm trying one of the two implementations, per compilation, obviously:
//        split1(spline, input_line);
        split2(spline, input_line);

        count++;
    };

    count--; //subtract for final over-read
    sec = (int) time(NULL) - start;
    cerr << "C++   : Saw " << count << " lines in " << sec << " seconds." ;
    if (sec > 0) {
        lps = count / sec;
        cerr << "  Crunch speed: " << lps << endl;
    } else
        cerr << endl;
    return 0;
}

//compiled with: g++ -Wall -O3 -o split1 split_1.cpp
```
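A side note on the read loop: `while(cin) { getline(...); ... }` runs its body one extra time after the last line, which is why the code needs `count--`. The usual idiom tests the result of `std::getline` directly. The following is a minimal sketch. The function name `count_lines` is hypothetical, and the stream is a parameter so the function can be exercised with a `std::istringstream`.

```cpp
#include <cassert>
#include <iostream>
#include <sstream>
#include <string>

// Counts lines by testing the result of std::getline directly, so the loop
// body never runs on a failed read and no "count--" correction is needed.
long count_lines(std::istream& in) {
    std::string line;
    long count = 0;
    while (std::getline(in, line)) {
        ++count;  // the split call would go here, on a line that was actually read
    }
    return count;
}
```

With real input this would be `count_lines(std::cin)`, after `std::ios::sync_with_stdio(false);`.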

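I've tried two different split implementations. The first (split1) uses string methods to search for tokens; it can merge runs of delimiters and handle multiple delimiter characters (it comes from here). The second (split2) uses getline to read the string as a stream. It doesn't merge delimiters and supports only a single delimiter character; this one was posted by several StackOverflow users in answers to string-splitting questions.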
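For comparison, when the delimiters are whitespace, the standard library already offers Python-like merging semantics: `operator>>` on a stream skips any run of whitespace before each token. The following is a minimal sketch; the name `split_ws` is hypothetical. It is convenient, but because it is stream-based like split2, it probably carries similar stream overhead.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Splits on runs of whitespace, like Python's str.split() with no argument.
// operator>> skips leading whitespace, so repeated, leading and trailing
// spaces never produce empty tokens.
std::vector<std::string> split_ws(const std::string& str) {
    std::istringstream ss(str);
    std::vector<std::string> tokens;
    std::string item;
    while (ss >> item) {
        tokens.push_back(item);
    }
    return tokens;
}
```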

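I ran these multiple times in various orders. My test machine is a MacBook Pro (2011, 8GB, quad core), not that it matters much. I'm testing with a 20M-line text file with three space-separated columns, each line looking similar to this: "foo.bar 127.0.0.1 home.foo.bar"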
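To reproduce the benchmark without the original file, a file of the same shape can be generated. The following is a minimal sketch, assuming every line simply repeats the sample line; the actual contents of test_lines_double beyond that sample are not shown. The function name `write_test_lines` is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <ostream>
#include <sstream>
#include <string>

// Writes `n` lines shaped like the benchmark input: three columns separated
// by single spaces. Every line is the same sample line (an assumption).
void write_test_lines(std::ostream& out, std::size_t n) {
    const std::string sample = "foo.bar 127.0.0.1 home.foo.bar\n";  // 31 bytes
    for (std::size_t i = 0; i < n; ++i) {
        out << sample;
    }
}
```

For example, `std::ofstream f("test_lines"); write_test_lines(f, 20000000);` (with `<fstream>` included) writes 20M such lines.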

Results:

```
$ /usr/bin/time cat test_lines_double | ./split.py
       15.61 real         0.01 user         0.38 sys
Python: Saw 20000000 lines in 15 seconds.   Crunch Speed: 1333333
$ /usr/bin/time cat test_lines_double | ./split1
       23.50 real         0.01 user         0.46 sys
C++   : Saw 20000000 lines in 23 seconds.  Crunch speed: 869565
$ /usr/bin/time cat test_lines_double | ./split2
       44.69 real         0.02 user         0.62 sys
C++   : Saw 20000000 lines in 45 seconds.  Crunch speed: 444444
```

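What am I doing wrong? Is there a better way to split strings in C++ that meets all of these requirements?

- It does not rely on external libraries (i.e. no Boost).
- It supports merging sequences of delimiters (like Python's split).
- It is thread-safe (so no strtok).
- Its performance is at least on par with Python.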

Edit 1 / Partial Solution?:

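I tried making it a fairer comparison by having Python reset the dummy list and append to it each time, as the C++ code does. This still isn't exactly what the C++ code is doing, but it's a bit closer. Basically, the loop is now: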

```python
for line in sys.stdin:
    dummy = []
    dummy += line.split()
    count += 1
```

The performance of Python is now about the same as the split1 C++ implementation.

```
/usr/bin/time cat test_lines_double | ./split5.py
       22.61 real         0.01 user         0.40 sys
Python: Saw 20000000 lines in 22 seconds.   Crunch Speed: 909090
```

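I am still surprised that these C++ implementations are not faster, even if Python is highly optimized for string processing (as Matt Joiner suggested). If anyone has ideas about how to do this more efficiently in C++, please share your code. My next step will probably be to implement this in pure C. I don't intend to trade away programmer productivity by re-implementing my whole project in C, so that will just be an experiment in string-splitting speed.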

Thanks to all for your help.

Final Edit/Solution:

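Please see Alf's accepted answer. Since Python deals with strings strictly by reference, while STL strings are often copied, performance is better with the vanilla Python implementation. For comparison, I compiled Alf's code and ran my data through it. Here is the performance on the same machine as all the other runs. It is essentially identical to the naive Python implementation, and faster than the Python implementation that resets and appends to the list (shown in the edit above):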

```
$ /usr/bin/time cat test_lines_double | ./split6
       15.09 real         0.01 user         0.45 sys
C++   : Saw 20000000 lines in 15 seconds.  Crunch speed: 1333333
```

My only small remaining gripe is the amount of code needed to get C++ to perform well in this case.

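One lesson from this issue and yesterday's stdin line-reading issue (linked above) is that one should always benchmark instead of making naive assumptions about the relative "default" performance of languages. I appreciate the education.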

Thanks again to all for your suggestions!

Cheers and hth. - Alf · Accepted Answer

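As a guess: Python strings are reference-counted immutable strings, so no strings are copied around in the Python code. In contrast, C++ std::string is a mutable value type, and it is copied at the smallest opportunity.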
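The copy-versus-reference difference is easy to observe on the C++ side. `std::string::substr` returns a new string that owns its own characters. A view such as C++17's `std::string_view` only points into the original buffer. The following small sketch demonstrates this; the helper names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// std::string::substr returns a new string that owns a copy of the characters.
std::string token_copy(const std::string& line, std::size_t pos, std::size_t len) {
    return line.substr(pos, len);
}

// std::string_view::substr returns a (pointer, length) pair into the same
// buffer: O(1), no allocation, but only valid while `line` is alive and unchanged.
std::string_view token_view(const std::string& line, std::size_t pos, std::size_t len) {
    return std::string_view(line).substr(pos, len);
}
```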

If the goal is fast splitting, then one would use constant-time substring operations, which means only referring to parts of the original string, as in Python (and Java, and C#…).
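In standard C++17, `std::string_view` provides exactly this kind of constant-time substring. The following is a minimal sketch of a splitter with Python-like merging of delimiter runs. It assumes the caller keeps the source string alive while the tokens are in use. The name `split_sv` is hypothetical and does not come from any answer here.

```cpp
#include <cassert>
#include <string>
#include <string_view>
#include <vector>

// Splits `str` on any character in `delims`, merging runs of delimiters
// (no empty tokens), and appends views into `str` to `tokens`.
// No characters are copied; the views dangle if `str`'s storage goes away.
void split_sv(std::vector<std::string_view>& tokens, std::string_view str,
              std::string_view delims = " ") {
    std::string_view::size_type pos = str.find_first_not_of(delims);
    while (pos != std::string_view::npos) {
        std::string_view::size_type end = str.find_first_of(delims, pos);
        if (end == std::string_view::npos) {
            tokens.push_back(str.substr(pos));  // last token runs to the end
            break;
        }
        tokens.push_back(str.substr(pos, end - pos));
        pos = str.find_first_not_of(delims, end);
    }
}
```

If one vector is reused across lines, calling `tokens.clear()` before each line, it keeps its capacity. After the first few lines, the splitting itself then does no allocation at all.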

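The C++ std::string class has one redeeming feature, though: it is standard. It can therefore be used to pass strings around safely and portably wherever efficiency is not a main consideration. But enough chat. Here's the code. On my machine it is of course faster than Python, since Python's string handling is implemented in C, which is a subset of C++ (he he):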

```cpp
#include <iostream>
#include <string>
#include <sstream>
#include <time.h>
#include <vector>
using namespace std;

// A non-owning reference to a range of characters inside another string.
class StringRef
{
private:
    char const*     begin_;
    int             size_;

public:
    int size() const { return size_; }
    char const* begin() const { return begin_; }
    char const* end() const { return begin_ + size_; }

    StringRef( char const* const begin, int const size )
        : begin_( begin )
        , size_( size )
    {}
};

vector<StringRef> split3( string const& str, char delimiter = ' ' )
{
    vector<StringRef>   result;

    enum State { inSpace, inToken };

    State state = inSpace;
    char const*     pTokenBegin = 0;    // Init to satisfy compiler.
    for( auto it = str.begin(); it != str.end(); ++it )
    {
        State const newState = (*it == delimiter? inSpace : inToken);
        if( newState != state )
        {
            switch( newState )
            {
            case inSpace:
                result.push_back( StringRef( pTokenBegin, &*it - pTokenBegin ) );
                break;
            case inToken:
                pTokenBegin = &*it;
            }
        }
        state = newState;
    }
    if( state == inToken )
    {
        // str.data() + str.size() rather than &*str.end(): dereferencing end() is undefined.
        result.push_back( StringRef( pTokenBegin, str.data() + str.size() - pTokenBegin ) );
    }
    return result;
}

int main() {
    string input_line;
    long count = 0;
    int sec, lps;
    time_t start = time(NULL);

    cin.sync_with_stdio(false); //disable synchronous IO

    while(cin) {
        getline(cin, input_line);
        vector<StringRef> const v = split3( input_line );
        count++;
    };

    count--; //subtract for final over-read
    sec = (int) time(NULL) - start;
    cerr << "C++   : Saw " << count << " lines in " << sec << " seconds." ;
    if (sec > 0) {
        lps = count / sec;
        cerr << "  Crunch speed: " << lps << endl;
    } else
        cerr << endl;
    return 0;
}

//compiled with: g++ -Wall -O3 -o split1 split_1.cpp -std=c++0x
```

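Disclaimer: I hope there aren't any bugs. I haven't tested the functionality; I've only checked the speed. But I think that even if there are a bug or two, correcting them won't significantly affect the speed.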
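Since only the speed was measured, a cheap functional check is to compare a splitter against a simple reference implementation on edge cases: leading, trailing and repeated delimiters, and empty input. The following sketch assumes a `std::string_view`-returning variant of the same two-state (inSpace/inToken) scan. The names `split_states`, `split_reference` and `agree` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <string_view>
#include <vector>

// Two-state scan in the spirit of split3: a token starts on a
// delimiter->non-delimiter transition and ends on the reverse transition.
std::vector<std::string_view> split_states(std::string_view str, char delim = ' ') {
    std::vector<std::string_view> result;
    bool in_token = false;
    std::size_t begin = 0;
    for (std::size_t i = 0; i < str.size(); ++i) {
        bool is_delim = (str[i] == delim);
        if (in_token && is_delim) {
            result.push_back(str.substr(begin, i - begin));
            in_token = false;
        } else if (!in_token && !is_delim) {
            begin = i;
            in_token = true;
        }
    }
    if (in_token) {
        result.push_back(str.substr(begin));  // token reaching the end of the string
    }
    return result;
}

// Slow but obviously correct reference (splits on any whitespace).
std::vector<std::string> split_reference(const std::string& str) {
    std::istringstream ss(str);
    std::vector<std::string> out;
    for (std::string item; ss >> item;) out.push_back(item);
    return out;
}

// True if both splitters agree; meaningful for inputs whose only whitespace is ' '.
bool agree(const std::string& input) {
    auto fast = split_states(input);
    auto ref = split_reference(input);
    if (fast.size() != ref.size()) return false;
    for (std::size_t i = 0; i < fast.size(); ++i) {
        if (fast[i] != ref[i]) return false;
    }
    return true;
}
```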

tobbez · Answer

I'm not providing a better solution (at least performance-wise), but here is some additional data that could be interesting.

Using strtok_r (the reentrant variant of strtok):

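```cpp
#include <cstdlib>   // malloc, free
#include <string.h>  // strcpy, strlen, strtok_r (strtok_r is POSIX, not standard C++)

void splitc1(vector<string> &tokens, const string &str,
        const string &delimiters = " ") {
    char *saveptr;
    char *cpy, *token;

    // strtok_r modifies its input, so work on a private copy
    cpy = (char*)malloc(str.size() + 1);
    strcpy(cpy, str.c_str());

    for(token = strtok_r(cpy, delimiters.c_str(), &saveptr);
        token != NULL;
        token = strtok_r(NULL, delimiters.c_str(), &saveptr)) {
        tokens.push_back(string(token));
    }

    free(cpy);
}
```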

Additionally, using C strings (char*) for the parameters and fgets for input:

```cpp
void splitc2(vector<string> &tokens, const char *str,
        const char *delimiters) {
    char *saveptr;
    char *cpy, *token;

    cpy = (char*)malloc(strlen(str) + 1);
    strcpy(cpy, str);

    for(token = strtok_r(cpy, delimiters, &saveptr);
        token != NULL;
        token = strtok_r(NULL, delimiters, &saveptr)) {
        tokens.push_back(string(token));
    }

    free(cpy);
}
```

And, for cases where destroying the input string is acceptable:

```cpp
// Tokenizes `str` in place: strtok_r overwrites delimiters with '\0'.
void splitc3(vector<string> &tokens, char *str,
        const char *delimiters) {
    char *saveptr;
    char *token;

    for(token = strtok_r(str, delimiters, &saveptr);
        token != NULL;
        token = strtok_r(NULL, delimiters, &saveptr)) {
        tokens.push_back(string(token));
    }
}
```
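When destroying the input is acceptable, the same in-place trick (overwriting each delimiter with '\0') also works without `strtok_r`, which is POSIX rather than standard C++. The following is a minimal sketch; the name `split_in_place` is hypothetical. The returned pointers are only valid while `buf` is alive and unchanged.

```cpp
#include <cassert>
#include <cstring>  // std::strcmp, handy for inspecting the tokens
#include <string>
#include <vector>

// Tokenizes `buf` in place: every delimiter character is overwritten with
// '\0' and a pointer to the start of each token is recorded. Runs of
// delimiters are merged. All state is local, so this is thread-safe as long
// as each thread works on its own buffer.
std::vector<const char*> split_in_place(std::string& buf, char delim = ' ') {
    std::vector<const char*> tokens;
    bool in_token = false;
    for (char& c : buf) {
        if (c == delim) {
            c = '\0';              // terminates the previous token (if any)
            in_token = false;
        } else if (!in_token) {
            tokens.push_back(&c);  // first character of a new token
            in_token = true;
        }
    }
    // The last token is terminated by the '\0' that std::string keeps after its data.
    return tokens;
}
```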

The timings for these are as follows (including results for the other variants from the question and the accepted answer):

```
split1.cpp:  C++   : Saw 20000000 lines in 31 seconds.  Crunch speed: 645161
split2.cpp:  C++   : Saw 20000000 lines in 45 seconds.  Crunch speed: 444444
split.py:    Python: Saw 20000000 lines in 33 seconds.  Crunch Speed: 606060
split5.py:   Python: Saw 20000000 lines in 35 seconds.  Crunch Speed: 571428
split6.cpp:  C++   : Saw 20000000 lines in 18 seconds.  Crunch speed: 1111111
splitc1.cpp: C++   : Saw 20000000 lines in 27 seconds.  Crunch speed: 740740
splitc2.cpp: C++   : Saw 20000000 lines in 22 seconds.  Crunch speed: 909090
splitc3.cpp: C++   : Saw 20000000 lines in 20 seconds.  Crunch speed: 1000000
```

As you can see, the solution from the accepted answer is still the fastest.

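For anyone who wants to run further tests, I also put up a GitHub repo with all the programs from the question, the accepted answer and this answer. It also has a Makefile and a script to generate test data: https://github.com/tobbez/string-splitting.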

Vite Falcon · Answer

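I suspect this is because of the way std::vector gets resized during push_back() calls. If you use std::list, or std::vector::reserve() to reserve enough space for the tokens, you should get much better performance. Or you could use a combination of both, like below for split1():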
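The reallocation cost can be made visible by counting how often `capacity()` changes while elements are pushed, with and without a prior `reserve()`. The helper name `count_reallocations` in the sketch below is hypothetical. The exact growth factor is implementation-defined, so only the reserved case has a guaranteed result.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Pushes `n` short strings after reserving `reserve_first` slots, and counts
// how many times the vector had to grow its buffer (each growth moves all
// existing elements to a new allocation).
std::size_t count_reallocations(std::size_t n, std::size_t reserve_first) {
    std::vector<std::string> v;
    v.reserve(reserve_first);
    std::size_t reallocations = 0;
    std::size_t cap = v.capacity();
    for (std::size_t i = 0; i < n; ++i) {
        v.push_back("token");
        if (v.capacity() != cap) {
            ++reallocations;
            cap = v.capacity();
        }
    }
    return reallocations;
}
```

In the benchmark loop, the same effect comes for free when one vector is kept outside the loop and cleared per line, as the question already does, because `clear()` leaves the capacity unchanged.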

```cpp
#include <list>

void split1(vector<string> &tokens, const string &str,
        const string &delimiters = " ") {
    // Skip delimiters at beginning
    string::size_type lastPos = str.find_first_not_of(delimiters, 0);

    // Find the end of the first token (the next delimiter)
    string::size_type pos = str.find_first_of(delimiters, lastPos);
    list<string> token_list;

    while (string::npos != pos || string::npos != lastPos) {
        // Found a token, add it to the list
        token_list.push_back(str.substr(lastPos, pos - lastPos));
        // Skip delimiters
        lastPos = str.find_first_not_of(delimiters, pos);
        // Find the end of the next token
        pos = str.find_first_of(delimiters, lastPos);
    }
    // Copy all tokens into the vector in one go
    tokens.assign(token_list.begin(), token_list.end());
}
```

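EDIT: The other obvious thing I see is that the Python variable dummy is assigned each time but never modified, so it's not a fair comparison with C++. Try changing your Python code to initialize it with dummy = [] and then do dummy += line.split(). Can you report the runtime after this?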

EDIT2: To make it even fairer, can you change the while loop in the C++ code to:

```cpp
while(cin) {
    getline(cin, input_line);
    std::vector<string> spline; // create a new vector

    //I'm trying one of the two implementations, per compilation, obviously:
//  split1(spline, input_line);
    split2(spline, input_line);

    count++;
};
```

JiaHao Xu · Answer

I think the following code, which uses C++17 and C++14 features, is better:

```cpp
// C++17. This code was untested when posted; others are welcome to test
// and modify it.
#include <istream>      // std::istream
#include <string>       // std::basic_string, std::getline
#include <string_view>  // C++17; sizeof(std::string_view) == 16 with libc++ on my x86-64 Debian 9.4 machine
#include <utility>      // std::move (C++11)

template <template <class...> class Container, class Allocator>
void split1(Container<std::string_view, Allocator> &tokens,
            std::string_view str,
            std::string_view delimiter = " ")
{
    /*
     * Model of the input string:
     *
     *   (optional) delimiter | content | delimiter | content | delimiter |
     *   ... | delimiter | content
     *
     * Using std::string::find_first_not_of or
     * std::string_view::find_first_not_of is a bad idea, because it
     * "finds the first character not equal to any of the characters in
     * the given character sequence". That is, it treats the delimiter
     * not as a whole but as a set of characters. This has two effects:
     *
     * 1. When the delimiter is longer than one character, the function
     *    won't behave as you expect.
     *
     * 2. When the delimiter is a single character, there may be extra
     *    overhead: to stay correct, the function still runs an inner loop
     *    comparing each character against the (one-element) set.
     *
     * So the code below searches for the delimiter as a whole. It skips
     * a leading delimiter prefix; however, if there is nothing between
     * two delimiters, it still emits an (empty) token there.
     *
     * Note: this uses the standard library's substring search, but you
     * could switch to Boyer-Moore, KMP (needs extra memory), Rabin-Karp
     * or another algorithm to speed it up.
     */

    // Establish loop invariant 1.
    typename std::string_view::size_type next,
        delimiter_size = delimiter.size(),
        pos = str.find(delimiter) ? 0 : delimiter_size;

    // Loop invariants:
    // 1. The content that should be saved starts at pos.
    // 2. The position of the next delimiter is stored in next, which may
    //    be std::string_view::npos.
    do {
        // Find the next delimiter, maintaining invariant 2.
        next = str.find(delimiter, pos);

        // Found a token, add it to the container. substr takes a length,
        // so pass next - pos; when next == npos the length is clamped to
        // the rest of the string.
        tokens.push_back(str.substr(pos, next - pos));

        // Skip the delimiter, maintaining invariant 1. When next == npos
        // this wraps around (well-defined for unsigned types), but the
        // loop terminates anyway, so the value is never used.
        pos = next + delimiter_size;
    } while (next != std::string_view::npos);
}

template <template <class...> class Container, class traits,
          class Allocator2, class Allocator>
void split2(Container<std::basic_string<char, traits, Allocator2>, Allocator> &tokens,
            std::istream &stream,
            char delimiter = ' ')
{
    std::basic_string<char, traits, Allocator2> item;

    // Unfortunately, std::getline only accepts a single-character delimiter.
    while (std::getline(stream, item, delimiter))
        // Move item into the container. std::getline erases item before
        // extracting, so reusing the moved-from string is fine.
        tokens.push_back(std::move(item));
}
```

Choice of container:

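- std::vector

  Assuming the initial size of the allocated internal array is 1 and the final size is N, you will allocate and deallocate about log2(N) times, and copy elements at most (2^(log2(N) + 1) - 1) = (2N - 1) times. As pointed out in "Is the poor performance of std::vector due to not calling realloc a logarithmic number of times?", this can perform poorly when the size of the vector is unpredictable and could be very large. But if you can estimate the size, this is less of a problem.

- std::list

  Every push_back takes constant time, but an individual push_back will probably take longer than with std::vector, since each one allocates a node. A per-thread memory pool and a custom allocator can ease this problem.

- std::forward_list

  Same as std::list, but it uses less memory per element. It has no push_back API, so you need a wrapper class to use it this way.

- std::array

  If you know the upper bound on growth, you can use std::array. Of course, you can't use it directly, since it has no push_back API, but you can define a wrapper. I think that is the fastest option here, and it can save memory if your estimate is quite accurate.

- std::deque

  This option lets you trade memory for performance. Elements are not copied as the container grows (unlike the up to 2N - 1 copies with std::vector); it only allocates new blocks and never deallocates element blocks while growing. It also has constant-time random access and lets you add new elements at both ends.

  According to std::deque - cppreference:

  "On the other hand, deques typically have large minimal memory cost; a deque holding just one element has to allocate its full internal array (e.g. 8 times the object size on 64-bit libstdc++; 16 times the object size or 4096 bytes, whichever is larger, on 64-bit libc++)."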
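As noted above, `std::forward_list` has no `push_back`. One way to provide it is a thin wrapper that remembers the last node and appends with `insert_after`. The following is a minimal sketch; the class name `AppendList` is hypothetical.

```cpp
#include <cassert>
#include <forward_list>
#include <string>
#include <utility>

// Wraps std::forward_list to provide O(1) push_back by caching an iterator
// to the last element (before_begin() while the list is empty).
template <class T>
class AppendList {
public:
    AppendList() : tail_(list_.before_begin()) {}
    AppendList(const AppendList&) = delete;             // tail_ would point into the other list
    AppendList& operator=(const AppendList&) = delete;

    void push_back(T value) {
        tail_ = list_.insert_after(tail_, std::move(value));
    }
    void clear() {
        list_.clear();
        tail_ = list_.before_begin();
    }
    typename std::forward_list<T>::const_iterator begin() const { return list_.begin(); }
    typename std::forward_list<T>::const_iterator end() const { return list_.end(); }

private:
    std::forward_list<T> list_;                  // declared first: initialized before tail_
    typename std::forward_list<T>::iterator tail_;
};
```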

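Or you can use combinations of these:

1. std::vector< std::array<T, 2^M> >

   This is similar to std::deque; the only difference is that this container doesn't support adding elements at the front. But it is still fast, because it never copies the underlying std::arrays. It only copies the array of pointers to them (at most 2N/2^M - 1 pointer copies for N elements), allocates a new array only when the current one is full, and never deallocates an element array. This assumes the vector holds pointers to the arrays (e.g. std::unique_ptr<std::array<T, 2^M>>); a vector holding the arrays by value would copy them whenever it reallocates. As a bonus, you get constant-time random access.

2. std::list< std::array<T, ...> >

   This greatly reduces memory fragmentation. It allocates a new array only when the current one is full, and it never needs to copy anything. Compared to combo 1, you pay for one additional pointer per array.

3. std::forward_list< std::array<T, ...> >

   Same as 2, but costs the same memory as combo 1.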
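A minimal sketch of the first combination follows. It assumes the vector stores pointers to fixed-size chunks (here `std::unique_ptr<std::array<T, ChunkSize>>`), so that growing the vector only moves pointers. The class name `ChunkedVector` and the default chunk size are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Append-only container built from fixed-size chunks. Elements never move
// once written; growing the outer vector only moves unique_ptrs.
// T must be default-constructible, because whole chunks are created up front.
template <class T, std::size_t ChunkSize = 256>
class ChunkedVector {
public:
    void push_back(const T& value) {
        if (size_ % ChunkSize == 0) {
            // Current chunk is full (or none exists yet): allocate a new one.
            chunks_.push_back(std::make_unique<std::array<T, ChunkSize>>());
        }
        (*chunks_.back())[size_ % ChunkSize] = value;
        ++size_;
    }
    // Constant-time random access: chunk index, then offset within the chunk.
    T& operator[](std::size_t i) { return (*chunks_[i / ChunkSize])[i % ChunkSize]; }
    std::size_t size() const { return size_; }

private:
    std::vector<std::unique_ptr<std::array<T, ChunkSize>>> chunks_;
    std::size_t size_ = 0;
};
```

Because elements never move, pointers and references to them stay valid as the container grows, which `std::vector` does not guarantee.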

Paul Beckingham · Answer

If you take the split1 implementation and change its signature to match split2's more closely, changing this:

```cpp
void split1(vector<string> &tokens, const string &str, const string &delimiters = " ")
```

to this:

```cpp
void split1(vector<string> &tokens, const string &str, const char delimiters = ' ')
```

you get a more dramatic difference between split1 and split2, and a fairer comparison:
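For reference, the following sketch shows what the modified split1 might look like. `std::string::find_first_of` and `find_first_not_of` both have `char` overloads, so only the parameter type changes and the loop stays the same. With a single character, these overloads may avoid scanning a delimiter set. The name `split1_char` is hypothetical; it only keeps the sketch distinct from the original.

```cpp
#include <cassert>
#include <string>
#include <vector>

// split1 with a single-character delimiter; the loop is unchanged because
// find_first_of / find_first_not_of accept a char directly.
void split1_char(std::vector<std::string>& tokens, const std::string& str,
                 const char delimiter = ' ') {
    // Skip delimiters at the beginning.
    std::string::size_type lastPos = str.find_first_not_of(delimiter, 0);
    // Find the end of the first token.
    std::string::size_type pos = str.find_first_of(delimiter, lastPos);
    while (std::string::npos != pos || std::string::npos != lastPos) {
        // If pos == npos, the length is clamped to the rest of the string.
        tokens.push_back(str.substr(lastPos, pos - lastPos));
        lastPos = str.find_first_not_of(delimiter, pos);
        pos = str.find_first_of(delimiter, lastPos);
    }
}
```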

```
split1  C++   : Saw 10000000 lines in 41 seconds.  Crunch speed: 243902
split2  C++   : Saw 10000000 lines in 144 seconds.  Crunch speed: 69444
split1' C++   : Saw 10000000 lines in 33 seconds.  Crunch speed: 303030
```

Matt Joiner · Answer

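You're making the mistaken assumption that your chosen C++ implementation is necessarily faster than Python's. String handling in Python is highly optimized. See this question for more: Why do std::string operations perform poorly?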

n.m. · Answer

```cpp
// Single-delimiter state machine; runs of delim are merged.
void split5(vector<string> &tokens, const string &str, char delim=' ') {

    enum { do_token, do_delim } state = do_delim;
    int idx = 0, tok_start = 0;
    for (string::const_iterator it = str.begin() ; ; ++it, ++idx) {
        switch (state) {
            case do_token:
                if (it == str.end()) {
                    tokens.push_back (str.substr(tok_start, idx-tok_start));
                    return;
                }
                else if (*it == delim) {
                    state = do_delim;
                    tokens.push_back (str.substr(tok_start, idx-tok_start));
                }
                break;

            case do_delim:
                if (it == str.end()) {
                    return;
                }
                if (*it != delim) {
                    state = do_token;
                    tok_start = idx;
                }
                break;
        }
    }
}
```

Alex Collins · Answer

I suspect this is related to buffering: Python's sys.stdin is buffered, but the C++ implementation does no such buffering.

See this post for details on how to change the buffer size, then try the comparison again: Setting smaller buffer size for sys.stdin?
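On the C++ side, a comparable experiment is to take per-line I/O out of the equation. The sketch below reads the whole input in one go and walks it line by line with `std::string_view`, instead of calling `getline` once per line. It assumes the input fits in memory. `read_all` and `count_lines_sv` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <string_view>

// Reads an entire stream into one string with a single streambuf copy,
// instead of one getline call per line.
std::string read_all(std::istream& in) {
    std::ostringstream buf;
    buf << in.rdbuf();
    return buf.str();
}

// Visits each line as a view into `data` (no per-line copy) and counts them.
std::size_t count_lines_sv(std::string_view data) {
    std::size_t count = 0;
    while (!data.empty()) {
        std::size_t nl = data.find('\n');
        std::string_view line = data.substr(0, nl);  // the whole rest if nl == npos
        (void)line;  // a splitter would be called on `line` here
        ++count;
        data.remove_prefix(nl == std::string_view::npos ? data.size() : nl + 1);
    }
    return count;
}
```

With real input this would be `std::string all = read_all(std::cin);`, after `std::ios::sync_with_stdio(false);`.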