Splitting on commas outside quotes

Question

My program reads a line from a file. The line contains comma-separated text like this:

_123,test,444,"don't split, this",more test,1 _

The result of the split should be:

    123
    test
    444
    "don't split, this"
    more test
    1

If I use String.split(","), I get this instead:

    123
    test
    444
    "don't split
     this"
    more test
    1

In other words, the comma inside the substring _"don't split, this"_ is not a separator. How do I deal with this?

Thanks in advance.

Rohit Jain · Accepted Answer

You can try this regex:

str.split(",(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)");

This splits the string on every , that is followed by an even number of double quotes. In other words, it splits on commas outside the double quotes. It works provided the quotes in the string are balanced.

Explanation:

    ,                  // Split on comma
    (?=                // Followed by
      (?:              // Start a non-capture group
        [^"]*          // 0 or more non-quote characters
        "              // 1 quote
        [^"]*          // 0 or more non-quote characters
        "              // 1 quote
      )*               // 0 or more repetition of non-capture group (multiple of 2 quotes will be even)
      [^"]*            // Finally 0 or more non-quotes
      $                // Till the end (This is necessary, else every comma will satisfy the condition)
    )

You can also write it like this in your code by using the (?x) modifier with the regex. The modifier ignores whitespace in the regex, so a regex broken across several lines becomes easier to read:

    String[] arr = str.split("(?x)   " +
                     ",          " +   // Split on comma
                     "(?=        " +   // Followed by
                     "  (?:      " +   // Start a non-capture group
                     "    [^\"]* " +   // 0 or more non-quote characters
                     "    \"     " +   // 1 quote
                     "    [^\"]* " +   // 0 or more non-quote characters
                     "    \"     " +   // 1 quote
                     "  )*       " +   // 0 or more repetition of non-capture group (multiple of 2 quotes will be even)
                     "  [^\"]*   " +   // Finally 0 or more non-quotes
                     "  $        " +   // Till the end (This is necessary, else every comma will satisfy the condition)
                     ")          "     // End look-ahead
                         );
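To see the lookahead regex at work, here is a minimal sketch that applies it to the line from the question. The class and method names are made up for illustration, and passing a limit of -1 to split is an addition not present in the answer: it keeps trailing empty fields, which split(regex) would otherwise drop.

```java
class LookaheadSplitDemo {
    // A comma followed by an even number of double quotes up to the end of the input.
    static final String COMMA_OUTSIDE_QUOTES = ",(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)";

    // Hypothetical helper name; assumes the quotes in the line are balanced.
    static String[] splitOutsideQuotes(String line) {
        // Limit -1 (an assumption, not part of the answer) keeps trailing empty fields.
        return line.split(COMMA_OUTSIDE_QUOTES, -1);
    }

    public static void main(String[] args) {
        String line = "123,test,444,\"don't split, this\",more test,1";
        for (String field : splitOutsideQuotes(line)) {
            System.out.println(field);
        }
    }
}
```

Because the lookahead scans to the end of the input for each comma, this approach may be slow on very long lines; for ordinary line lengths that is unlikely to matter.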

zx81 · Answer

Why split when you can match?

I am resurrecting this question because, for some reason, the simple solution was never mentioned. Here is a beautifully compact regex:

"[^"]*"|[^,]+

This matches all the desired fragments (see the demo).
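In Java, the matching approach could look roughly like the sketch below, which collects every match with Matcher.find(). The class and method names are illustrative only. Note one consequence of the pattern: since [^,]+ needs at least one character, empty fields (as in a,,b) are skipped rather than returned as empty strings.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatchFieldsDemo {
    // Either a complete double-quoted string, or a run of non-comma characters.
    private static final Pattern FIELD = Pattern.compile("\"[^\"]*\"|[^,]+");

    // Hypothetical helper: returns each matched fragment in order of appearance.
    static List<String> fields(String line) {
        List<String> out = new ArrayList<>();
        Matcher m = FIELD.matcher(line);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        String line = "123,test,444,\"don't split, this\",more test,1";
        fields(line).forEach(System.out::println);
    }
}
```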

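Explanation

With "[^"]*", we match complete "double-quoted strings"
or |
we match [^,]+, any run of characters that are not a comma.

A possible refinement is to improve the quoted-string side of the alternation so that quoted strings can contain escaped quotes.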
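One way the escaped-quote refinement might look is sketched below. It assumes quotes inside a quoted string are escaped with a backslash (\"); that convention is an assumption, since the answer does not say which escaping style is meant, and CSV-style doubled quotes ("") would need a different pattern. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EscapedQuoteFieldsDemo {
    // Quoted side: any character except quote or backslash, or a backslash plus one character.
    // Regex without Java escaping: "(?:[^"\\]|\\.)*"|[^,]+
    private static final Pattern FIELD =
            Pattern.compile("\"(?:[^\"\\\\]|\\\\.)*\"|[^,]+");

    // Hypothetical helper: returns each matched fragment, escapes left untouched.
    static List<String> fields(String line) {
        List<String> out = new ArrayList<>();
        Matcher m = FIELD.matcher(line);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        // The raw line is: a,"say \"hi\", ok",b
        String line = "a,\"say \\\"hi\\\", ok\",b";
        fields(line).forEach(System.out::println);
    }
}
```

The matched fragments still contain the surrounding quotes and the backslashes; unescaping them would be a separate step.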

stefan.schwetschke · Answer

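You can do this very easily without a complex regular expression:

1. Split on the character ". You get a list of strings.
2. Process each string in the list: split every string at an even position in the list (indexing from zero) on "," (this gives you a list inside the list), and leave every string at an odd position as it is (putting it directly into a list inside the list).
3. Join the list of lists, so that you end up with a single list.

If you want to handle quoting of '"' itself, you have to adapt the algorithm a little (rejoin the parts that were split by mistake, or change the split to a simple regexp), but the basic structure stays the same.

So basically it is something like this: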

    public class SplitTest {
        public static void main(String[] args) {
            final String splitMe = "123,test,444,\"don't split, this\",more test,1";
            // Odd-indexed parts were inside quotes, even-indexed parts were outside.
            final String[] splitByQuote = splitMe.split("\"");
            final String[][] splitByComma = new String[splitByQuote.length][];
            for (int i = 0; i < splitByQuote.length; i++) {
                String part = splitByQuote[i];
                if (i % 2 == 0) {
                    // Outside quotes: commas are separators.
                    splitByComma[i] = part.split(",");
                } else {
                    // Inside quotes: keep the text whole (the quotes themselves are gone).
                    splitByComma[i] = new String[1];
                    splitByComma[i][0] = part;
                }
            }
            for (String[] parts : splitByComma) {
                for (String part : parts) {
                    System.out.println(part);
                }
            }
        }
    }

This would be much cleaner with lambdas.
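As a rough idea of what a lambda-based version could look like, here is a sketch using streams. It is not the answer author's code: the class and method names are invented, and the filter that drops empty strings is an added assumption. It removes the empty piece that split(",") produces when an outside-quotes part starts with a comma, at the cost of also dropping genuinely empty fields.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;
import java.util.stream.Stream;

class SplitByQuoteStreams {
    // Even-indexed parts were outside quotes and are split on commas;
    // odd-indexed parts were inside quotes and are kept whole.
    private static Stream<String> expand(String part, int index) {
        return index % 2 == 0 ? Arrays.stream(part.split(",")) : Stream.of(part);
    }

    // Hypothetical helper; like the loop version, it strips the quote characters.
    static List<String> split(String line) {
        String[] byQuote = line.split("\"");
        return IntStream.range(0, byQuote.length)
                .mapToObj(i -> expand(byQuote[i], i))
                .flatMap(s -> s)
                .filter(s -> !s.isEmpty()) // assumption: empty pieces are unwanted
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        split("123,test,444,\"don't split, this\",more test,1")
                .forEach(System.out::println);
    }
}
```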

LIttle Ancient Forest Kami · Answer

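Building on @zx81's answer, because the matching idea is really nice, I've added the Java 9 results call, which returns a Stream. Since the OP wanted to use split, I've collected the result into a _String[]_, as split does.

Be careful if you have spaces after your comma separators (_a, b, "c,d"_). In that case you need to change the pattern.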
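One possible way to adapt the pattern for spaces after the separators is sketched below; the answer does not give the changed pattern, so this is only one option. It skips leading whitespace with \s* and captures the field itself in group 1. The class and method names are hypothetical, and whitespace before a comma is not trimmed by this sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SpaceTolerantFieldsDemo {
    // Optional leading whitespace, then either a quoted string or a run of non-commas.
    private static final Pattern FIELD = Pattern.compile("\\s*(\"[^\"]*\"|[^,]+)");

    // Hypothetical helper: returns group 1 of each match, i.e. the field without leading spaces.
    static List<String> fields(String line) {
        List<String> out = new ArrayList<>();
        Matcher m = FIELD.matcher(line);
        while (m.find()) {
            out.add(m.group(1));
        }
        return out;
    }

    public static void main(String[] args) {
        fields("a, b, \"c,d\"").forEach(System.out::println);
    }
}
```

Without the \s* prefix, the space before "c,d" would be consumed by [^,]+ and the quoted field would be cut at its inner comma.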

Jshell demo

    $ jshell
    -> String so = "123,test,444,\"don't split, this\",more test,1";
    |  Added variable so of type String with initial value "123,test,444,"don't split, this",more test,1"

    -> Pattern.compile("\"[^\"]*\"|[^,]+").matcher(so).results();
    |  Expression value is: java.util.stream.ReferencePipeline$Head@2038ae61
    |    assigned to temporary variable $68 of type java.util.stream.Stream<MatchResult>

    -> $68.map(MatchResult::group).toArray(String[]::new);
    |  Expression value is: [Ljava.lang.String;@6b09bb57
    |    assigned to temporary variable $69 of type String[]

    -> Arrays.stream($69).forEach(System.out::println);
    123
    test
    444
    "don't split, this"
    more test
    1

Code

    // Requires java.util.regex.Pattern and java.util.regex.MatchResult (Java 9+).
    String so = "123,test,444,\"don't split, this\",more test,1";
    String[] parts = Pattern.compile("\"[^\"]*\"|[^,]+")
        .matcher(so)
        .results()
        .map(MatchResult::group)
        .toArray(String[]::new);

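Explanation

1. The regex _"[^"]"_ matches: a quote, anything but a quote, a quote.
2. The regex _"[^"]*"_ matches: a quote, anything but a quote 0 or more times, a quote.
3. That alternative has to come first so that it "wins"; otherwise matching anything but a comma 1 or more times, that is _[^,]+_, would "win".
4. results() requires Java 9 or higher.
5. It returns a _Stream<MatchResult>_, which I map with a group() call and collect into an array of Strings. A parameterless toArray() call would return _Object[]_.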

Abhijith Nagarajan · Answer

Please see the code snippet below. This code only considers the happy flow. Change it according to your requirements.

    // Needs: import java.util.LinkedList; import java.util.List;
    // Place this method inside a class.
    public static String[] splitWithEscape(final String str, char split, char escapeCharacter) {
        final List<String> list = new LinkedList<String>();
        char[] cArr = str.toCharArray();
        boolean isEscape = false; // true while inside a section delimited by escapeCharacter
        StringBuilder sb = new StringBuilder();

        for (char c : cArr) {
            if (isEscape && c != escapeCharacter) {
                // Inside the escaped section: keep everything, including the split character.
                sb.append(c);
            } else if (c != split && c != escapeCharacter) {
                sb.append(c);
            } else if (c == escapeCharacter) {
                if (!isEscape) {
                    isEscape = true;
                    if (sb.length() > 0) {
                        list.add(sb.toString());
                        sb = new StringBuilder();
                    }
                } else {
                    isEscape = false;
                }
            } else if (c == split) {
                list.add(sb.toString());
                sb = new StringBuilder();
            }
        }

        if (sb.length() > 0) {
            list.add(sb.toString());
        }

        String[] strArr = new String[list.size()];
        return list.toArray(strArr);
    }