Count the occurrences of a string using sed?

Question

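I have a file in which "title" is written many times. How can I use the sed command to find the number of times "title" appears in that file, counting it only when "title" is the first string on a line? For example:

# title
title
title

should output count = 2, because on the first line "title" is not the first string.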

Update

I used awk to find the total number of occurrences like this:

awk '$1 ~ /title/ {++c} END {print c}' FS=: myFile.txt

But how can I tell awk to count only the lines that have "title" as the first string, as explained in the example above?

pavium · Accepted Answer

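I don't think sed is appropriate here, unless you use it in a pipeline to transform the file so that the word you need appears on separate lines, and then use grep -c to count the occurrences.

I like Jonathan's idea of using tr to convert spaces to newlines. The beauty of this method is that successive spaces get converted to multiple blank lines, but that doesn't matter, because grep can count just the lines holding the single word "title".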
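To make the tr idea concrete, here is a minimal sketch. The sample file contents are hypothetical, and using grep's -x option (match whole lines only) is my assumption about how to count "just the lines holding the single word"; the answer itself does not name a flag.

```shell
# Hypothetical sample input; note the double space on the second line.
sample=$(mktemp)
printf '%s\n' '# title' 'title  title' 'title' > "$sample"

# Put every space-separated word on its own line. The double space
# produces an empty line, which grep simply does not match.
# -c counts matching lines, -x requires the whole line to be "title".
word_count=$(tr ' ' '\n' < "$sample" | grep -cx 'title')
echo "$word_count"   # prints 4

rm -f "$sample"
```

Note that this counts every occurrence of the word, not only those at the start of a line.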

Paused until further notice. · Answer

Never say never. Pure sed (although it may require the GNU version).

#!/bin/sed -nf
# based on a script from the sed info file (info sed)
# section 4.8 Numbering Non-blank Lines (cat -b)
# modified to count lines that begin with "title"
/^title/! be

x
/^$/ s/^.*$/0/
/^9*$/ s/^/0/
s/.9*$/x&/
h
s/^.*x//
y/0123456789/1234567890/
x
s/x.*$//
G
s/\n//
h

:e

$ {x;p}

Explanation:

#!/bin/sed -nf
# run sed without printing output by default (-n)
# using the following file as the sed script (-f)

/^title/! be      # if the current line doesn't begin with "title" branch to label e

x                 # swap the counter from hold space into pattern space
/^$/ s/^.*$/0/    # if pattern space is empty start the counter at zero
/^9*$/ s/^/0/     # if the counter consists only of nines, prepend a zero
s/.9*$/x&/        # mark the position of the last digit before a sequence of nines (if any)
h                 # copy the marked counter to hold space
s/^.*x//          # delete everything before the marker
y/0123456789/1234567890/   # increment the digits that were after the mark
x                 # swap pattern space and hold space
s/x.*$//          # delete everything after the marker leaving the leading digits
G                 # append hold space to pattern space
s/\n//            # remove the newline, leaving all the digits concatenated
h                 # save the counter into hold space

:e                # label e

$ {x;p}           # if this is the last line of input, swap in the counter and print it

Here are some excerpts from a trace of the script using sedsed:

$ echo -e 'title\ntitle\nfoo\ntitle\nbar\ntitle\ntitle\ntitle\ntitle\ntitle\ntitle\ntitle\ntitle' | sedsed-1.0 -d -f ./counter
PATT:title$
HOLD:$
COMM:/^title/ !b e
COMM:x
PATT:$
HOLD:title$
COMM:/^$/ s/^.*$/0/
PATT:0$
HOLD:title$
COMM:/^9*$/ s/^/0/
PATT:0$
HOLD:title$
COMM:s/.9*$/x&/
PATT:x0$
HOLD:title$
COMM:h
PATT:x0$
HOLD:x0$
COMM:s/^.*x//
PATT:0$
HOLD:x0$
COMM:y/0123456789/1234567890/
PATT:1$
HOLD:x0$
COMM:x
PATT:x0$
HOLD:1$
COMM:s/x.*$//
PATT:$
HOLD:1$
COMM:G
PATT:\n1$
HOLD:1$
COMM:s/\n//
PATT:1$
HOLD:1$
COMM:h
PATT:1$
HOLD:1$
COMM::e
COMM:$ {
PATT:1$
HOLD:1$
PATT:title$
HOLD:1$
COMM:/^title/ !b e
COMM:x
PATT:1$
HOLD:title$
COMM:/^$/ s/^.*$/0/
PATT:1$
HOLD:title$
COMM:/^9*$/ s/^/0/
PATT:1$
HOLD:title$
COMM:s/.9*$/x&/
PATT:x1$
HOLD:title$
COMM:h
PATT:x1$
HOLD:x1$
COMM:s/^.*x//
PATT:1$
HOLD:x1$
COMM:y/0123456789/1234567890/
PATT:2$
HOLD:x1$
COMM:x
PATT:x1$
HOLD:2$
COMM:s/x.*$//
PATT:$
HOLD:2$
COMM:G
PATT:\n2$
HOLD:2$
COMM:s/\n//
PATT:2$
HOLD:2$
COMM:h
PATT:2$
HOLD:2$
COMM::e
COMM:$ {
PATT:2$
HOLD:2$
PATT:foo$
HOLD:2$
COMM:/^title/ !b e
COMM:$ {
PATT:foo$
HOLD:2$
. . .
PATT:10$
HOLD:10$
PATT:title$
HOLD:10$
COMM:/^title/ !b e
COMM:x
PATT:10$
HOLD:title$
COMM:/^$/ s/^.*$/0/
PATT:10$
HOLD:title$
COMM:/^9*$/ s/^/0/
PATT:10$
HOLD:title$
COMM:s/.9*$/x&/
PATT:1x0$
HOLD:title$
COMM:h
PATT:1x0$
HOLD:1x0$
COMM:s/^.*x//
PATT:0$
HOLD:1x0$
COMM:y/0123456789/1234567890/
PATT:1$
HOLD:1x0$
COMM:x
PATT:1x0$
HOLD:1$
COMM:s/x.*$//
PATT:1$
HOLD:1$
COMM:G
PATT:1\n1$
HOLD:1$
COMM:s/\n//
PATT:11$
HOLD:1$
COMM:h
PATT:11$
HOLD:11$
COMM::e
COMM:$ {
COMM:x
PATT:11$
HOLD:11$
COMM:p
11
PATT:11$
HOLD:11$
COMM:}
PATT:11$
HOLD:11$

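The ellipsis represents lines of output that I omitted here. The line containing only "11" is where the final count is output. That is the only output you would see when the sedsed debugger isn't being used.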
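The increment step at the heart of the script can also be tried in isolation. The sketch below wraps the script's own counter commands in a shell function; the function name increment and the test values are mine, chosen to exercise the no-carry, carry, and all-nines cases.

```shell
# Add one to a decimal number using only sed editing commands.
# These are the commands the counter script runs on each matching line,
# minus the initial "start at zero" step.
increment() {
  printf '%s\n' "$1" | sed '
    /^9*$/ s/^/0/
    s/.9*$/x&/
    h
    s/^.*x//
    y/0123456789/1234567890/
    x
    s/x.*$//
    G
    s/\n//'
}

a=$(increment 129)   # only the last digit changes
b=$(increment 199)   # carry ripples through the trailing nines
c=$(increment 999)   # all nines: a leading zero is added first
echo "$a $b $c"      # prints 130 200 1000
```

The marker x splits the number into the untouched leading digits and the tail (one digit plus any trailing nines). The y command then maps each tail digit to its successor, turning every 9 into 0.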

Jonathan Leffler · Answer

Revised answer

Succinctly, you can't - sed is not the right tool for the job (it cannot count).

sed -n '/^title/p' file | grep -c '^'

This looks for lines starting with title and prints them, feeding the output into grep to count them. Or, equivalently:

grep -c '^title' file
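As a quick check against the sample from the question, both forms should report 2. The temporary file is only scaffolding for the demonstration, and grep -c '^' is used here to count every line the sed stage passes through.

```shell
# Recreate the three-line sample from the question.
sample=$(mktemp)
printf '%s\n' '# title' 'title' 'title' > "$sample"

# sed filters the lines that start with "title"; grep -c '^' counts them.
via_sed=$(sed -n '/^title/p' "$sample" | grep -c '^')

# grep alone does both the filtering and the counting.
via_grep=$(grep -c '^title' "$sample")

echo "$via_sed $via_grep"   # prints 2 2

rm -f "$sample"
```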

Original answer - before the question was edited

Succinctly, you can't - it is not the right tool for the job.

grep -c title file
sed -n /title/p file | wc -l

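The second uses sed as a surrogate for grep and sends the output to 'wc' to count lines. Both count the number of lines containing 'title', rather than the number of occurrences of title. You could fix that with something like: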

cat file | tr ' ' '\n' | grep -c title

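The 'tr' command converts blanks into newlines, putting each space-separated word on its own line, so grep only gets to count lines containing the word title. This works unless you have sequences such as 'title-entitlement', where no space separates the two occurrences of title.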
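The caveat is easy to see on a two-line hypothetical sample. For comparison, the last command uses grep -o, which is not part of this answer and is not specified by POSIX (GNU and BSD grep both provide it), to print every match on its own line before counting.

```shell
# Hypothetical sample: 4 occurrences of "title" spread over 2 lines.
sample=$(mktemp)
printf '%s\n' 'title title' 'title-entitlement' > "$sample"

# Lines containing "title": 2.
line_count=$(grep -c title "$sample")

# After splitting on spaces there are 3 lines, and "title-entitlement"
# still counts only once even though it holds two occurrences.
split_count=$(tr ' ' '\n' < "$sample" | grep -c title)

# Assumption: grep -o is available. It emits one line per match,
# so counting those lines gives the number of occurrences: 4.
occurrence_count=$(grep -o title "$sample" | grep -c title)

echo "$line_count $split_count $occurrence_count"   # prints 2 3 4

rm -f "$sample"
```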

ghostdog74 · Answer

sed 's/title/title\n/g' file | grep -c title

ghostdog74 · Answer

Just one gawk command will do. Don't use grep -c, because it only counts the lines that contain "title", regardless of how many times "title" appears on each line.

$ more file
# title
# title
one
two
#title
title title
three
title junk title
title
four
fivetitlesixtitle
last
$ awk '!/^#.*title/{m=gsub("title","");total+=m}END{print "total: "total}' file
total: 7

If you only want "title" as the first string, use "==" instead of ~

awk '$1 == "title"{++c}END{print c}' file
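The difference between the two operators shows up on a small hypothetical sample. With awk's default field separator, $1 is the first whitespace-separated word; ~ accepts any first word that contains "title", while == requires an exact match. The c+0 in the END block is my addition so that a count of zero prints as 0 rather than an empty line.

```shell
# Hypothetical sample: the last line starts with "subtitle".
sample=$(mktemp)
printf '%s\n' '# title' 'title' 'title one' 'subtitle two' > "$sample"

# Regex match: counts "title", "title" and "subtitle" as first words.
loose=$(awk '$1 ~ /title/ {++c} END {print c+0}' "$sample")

# String equality: counts only first words that are exactly "title".
exact=$(awk '$1 == "title" {++c} END {print c+0}' "$sample")

echo "$loose $exact"   # prints 3 2

rm -f "$sample"
```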

potong · Answer

This might work for you:

sed '/^title/!d' file | sed -n '$='
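A short sketch of how this behaves, using hypothetical sample files. The first sed deletes every line that does not start with "title"; the second prints the number of the last line it sees. One thing to be aware of: when nothing matches, the second sed receives no input and prints nothing at all, so the ${none:-0} fallback below is my own addition.

```shell
# Sample from the question, plus a file with no matching lines.
sample=$(mktemp)
nomatch=$(mktemp)
printf '%s\n' '# title' 'title' 'title' > "$sample"
printf '%s\n' 'foo' 'bar' > "$nomatch"

# '$=' prints the line number of the last input line, i.e. the line count.
count=$(sed '/^title/!d' "$sample" | sed -n '$=')

# With no matching lines the output is empty, not 0.
none=$(sed '/^title/!d' "$nomatch" | sed -n '$=')

echo "$count ${none:-0}"   # prints 2 0

rm -f "$sample" "$nomatch"
```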