Filter a .CSV file based on the values in its 5th column and write those records to a new file

Question

I have a .CSV file in the following format:

"column 1","column 2","column 3","column 4","column 5","column 6","column 7","column 8","column 9","column 10
"12310","42324564756","a simple string with a , comma","string with or, without commas","string 1","USD","12","70%","08/01/2013",""
"23455","12312255564","string, with, multiple, commas","string with or, without commas","string 2","USD","433","70%","07/15/2013",""
"23525","74535243123","string , with commas, and - hypens and: semicolans","string with or, without commas","string 1","CAND","744","70%","05/06/2013",""
"46476","15467534544","lengthy string, with commas, multiple: colans","string with or, without commas","string 2","CAND","388","70%","09/21/2013",""

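The 5th column of the file contains different strings, and I need to filter the file based on that column's value. Say I need a new file, built from the current one, that contains only the records whose fifth field is "string 1".

For this, I tried the following command: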

awk -F"," ' { if toupper($5) == "STRING 1") PRINT }' file1.csv > file2.csv

But it threw the following errors:

awk: { if toupper($5) == "STRING 1") PRINT }
awk: ^ syntax error
awk: { if toupper($5) == "STRING 1") PRINT }
awk: ^ syntax error

Next I used the following, which gives strange output:

awk -F"," '$5="string 1" {print}' file1.csv > file2.csv

Output:

"column 1" "column 2" "column 3" "column 4" string 1 "column 6" "column 7" "column 8" "column 9" "column 10
"12310" "42324564756" "a simple string with a comma" string 1 without commas" "string 1" "USD" "12" "70%" "08/01/2013" ""
"23455" "12312255564" "string with string 1 commas" "string with or without commas" "string 2" "USD" "433" "70%" "07/15/2013" ""
"23525" "74535243123" "string with commas string 1 "string with or without commas" "string 1" "CAND" "744" "70%" "05/06/2013" ""
"46476" "15467534544" "lengthy string with commas string 1 "string with or without commas" "string 2" "CAND" "388" "70%" "09/21/2013" ""

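PS: I don't know whether the string is in lower or upper case, so I used toupper to be on the safe side. I need to know what is wrong with my code, and whether spaces in the string matter when searching for a pattern with AWK.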

limovala · Accepted Answer

awk -F '","' 'BEGIN {OFS=","} { if (toupper($5) == "STRING 1") print }' file1.csv > file2.csv

Output:

"12310","42324564756","a simple string with a , comma","string with or, without commas","string 1","USD","12","70%","08/01/2013",""
"23525","74535243123","string , with commas, and - hypens and: semicolans","string with or, without commas","string 1","CAND","744","70%","05/06/2013",""

I think this is what you want.
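
As for what went wrong in the question's two attempts (this note and the sketch are an addition, not part of the original answer): in the first command the `if` condition is missing its opening parenthesis and awk keywords are lowercase, so `PRINT` is not the `print` statement. In the second, `=` is an assignment, not the comparison `==`. The pattern is therefore always true, `$5` is overwritten in every record, and awk rebuilds each record with the default output separator (a single space), which is why the commas vanish. Splitting on `","` rather than `,` keeps the commas inside quoted fields from counting as separators, and it leaves `$5` without its surrounding quotes. String comparison is exact, so spaces inside the field do matter. A minimal sketch, assuming a made-up six-column sample written to a temporary file:

```shell
# Hypothetical sample data; the contents are illustrative only.
sample=$(mktemp)
cat > "$sample" <<'EOF'
"id","text a","text b","text c","column 5","cur"
"1","a, b","c, d","e","string 1","USD"
"2","f, g","h","i","String 2","USD"
"3","j","k, l","m","STRING 1","CAD"
EOF

# Assignment (=) used as a pattern: it is true for every record, $5 is
# overwritten, and the record is rebuilt with OFS (a single space).
assigned=$(awk -F',' '$5="string 1" {print}' "$sample")

# Comparison (==) on fields split at "," (quote, comma, quote): only the
# matching records are printed, and they are printed unmodified.
filtered=$(awk -F'","' 'toupper($5) == "STRING 1"' "$sample")

printf 'with "=":\n%s\n\n' "$assigned"
printf 'with "==":\n%s\n' "$filtered"

rm -f "$sample"
```

Note that with this separator the first field keeps its leading quote and the last field keeps its trailing quote, so a test on `$1` or on the last column would need to account for that.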

user61786 · Answer

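The problem with CSV is that there is no standard. If you need to process CSV-formatted data often, it is worth looking into a more robust method than simply using "," as the field separator. In this case, Perl's Text::CSV CPAN module is very well suited to the job.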

$ perl -mText::CSV_XS -WlanE '
    BEGIN {our $csv = Text::CSV_XS->new;}
    $csv->parse($_);
    my @fields = $csv->fields();
    print if $fields[4] =~ /string 1/i;
' file1.csv
"12310","42324564756","a simple string with a , comma","string with or, without commas","string 1","USD","12","70%","08/01/2013",""
"23525","74535243123","string , with commas, and - hypens and: semicolans","string with or, without commas","string 1","CAND","744","70%","05/06/2013",""
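
To illustrate why a fixed separator is fragile, here is a small sketch using hypothetical rows that are not from the question. Both rows have "string 1" as their fifth CSV column, but the second leaves its first field unquoted, which many CSV dialects allow. Splitting on `","` then yields one field fewer for that row, so the awk filter silently skips it.

```shell
# Hypothetical rows: the second one has an unquoted first field.
rows='"1","a, b","c","d","string 1","USD"
2,"a, b","c","d","string 1","USD"'

# Count how many rows the fixed-separator filter keeps.
kept=$(printf '%s\n' "$rows" | awk -F'","' 'toupper($5) == "STRING 1"' | grep -c .)
echo "rows kept: $kept"

# Show what awk treats as field 5 in each row.
fields=$(printf '%s\n' "$rows" | awk -F'","' '{ print NR ": [" $5 "]" }')
printf '%s\n' "$fields"
```

A parser that understands quoting, such as the Perl module above, does not depend on every field being quoted the same way.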