How do I extract a substring from a string in Perl?

Question

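Consider the following strings:

1) Scheme ID: abc-456-hu5t10 (High priority) *

2) Scheme ID: frt-78f-hj542w (Balanced)

3) Scheme ID: 23f-f974-nm54w (super formula run) *

The strings above show the format; the ID, the text in parentheses, and the trailing * are the parts that change from string to string.

==> Imagine I have many strings in the format shown above. I want to pick three substrings from each of them:

A first substring containing the alphanumeric value (e.g. "abc-456-hu5t10" above)
A second substring containing the words (e.g. "High priority" above)
A third substring containing the * (if a * is present at the end of the string; otherwise leave it blank)

How do I pick these three substrings from each string shown above? I know it can be done with regular expressions in Perl... Can you help with this?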

Dave Webb · Accepted Answer

You could do something like this:

    my $data = <<END;
    1) Scheme ID: abc-456-hu5t10 (High priority) *
    2) Scheme ID: frt-78f-hj542w (Balanced)
    3) Scheme ID: 23f-f974-nm54w (super formula run) *
    END

    foreach (split(/\n/,$data)) {
      # skip any line that does not match the expected format
      $_ =~ /Scheme ID: ([a-z0-9-]+)\s+\(([^)]+)\)\s*(\*)?/ || next;
      # $star is undefined when the line has no trailing *
      my ($id,$word,$star) = ($1,$2,$3);
      print "$id $word $star\n";
    }

The key thing is the regular expression:

Scheme ID: ([a-z0-9-]+)\s+\(([^)]+)\)\s*(\*)?

Which breaks down as follows.

The fixed string "Scheme ID: ":

Scheme ID:

Followed by one or more of the characters a-z, 0-9 or -. We use parentheses to capture it as $1:

([a-z0-9-]+)

Followed by one or more whitespace characters:

\s+

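Followed by an opening parenthesis (which we escape), then any number of characters that are not a closing parenthesis, and then a closing parenthesis (escaped). We use unescaped parentheses to capture the words as $2:

\(([^)]+)\)

Followed by some spaces and then maybe a *, captured as $3: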

\s*(\*)?
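Putting the pieces above together, a minimal sketch might wrap the regex in a helper. The name `parse_scheme` is hypothetical (it is not part of the answer), and turning a missing star into an empty string is an assumption made here to avoid an "uninitialized value" warning under `use warnings`.

```perl
use strict;
use warnings;

# Hypothetical helper: returns (id, word, star), or an empty list on no match.
sub parse_scheme {
    my ($line) = @_;
    my ($id, $word, $star) =
        $line =~ /Scheme ID: ([a-z0-9-]+)\s+\(([^)]+)\)\s*(\*)?/
        or return;
    # The optional group leaves $star undefined when there is no trailing *.
    $star = '' unless defined $star;
    return ($id, $word, $star);
}

for my $line ('1) Scheme ID: abc-456-hu5t10 (High priority) *',
              '2) Scheme ID: frt-78f-hj542w (Balanced)') {
    my ($id, $word, $star) = parse_scheme($line) or next;
    print "$id|$word|$star\n";
}
```

Returning an empty list on failure lets the caller write `... or next`, mirroring the `|| next` in the loop above.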

Greg Hewgill · Answer

You could use a regular expression such as the following:

/([-a-z0-9]+)\s*\((.*?)\)\s*(\*)?/

So for example:

    $s = "abc-456-hu5t10 (High priority) *";
    $s =~ /([-a-z0-9]+)\s*\((.*?)\)\s*(\*)?/;
    print "$1\n$2\n$3\n";

prints

    abc-456-hu5t10
    High priority
    *
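One caveat worth illustrating: `$1`, `$2` and `$3` are only set by a successful match, so it is safer to test the match before using them. The sketch below assumes a hypothetical helper named `describe`, and the second and third sample strings are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: describe the match, or report that there was none.
sub describe {
    my ($s) = @_;
    if ($s =~ /([-a-z0-9]+)\s*\((.*?)\)\s*(\*)?/) {
        my $star = defined $3 ? $3 : '';   # $3 is undef without a trailing *
        return "id=$1 text=$2 star=$star";
    }
    return "no match: $s";
}

print describe($_), "\n"
    for 'abc-456-hu5t10 (High priority) *',
        'frt-78f-hj542w (Balanced)',
        'no parentheses here';
```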

Xetius · Answer

(\S*)\s*\((.*?)\)\s*(\*?)

    (\S*)    picks up anything which is NOT whitespace
    \s*      0 or more whitespace characters
    \(       a literal open parenthesis
    (.*?)    anything, non-greedy so stops on first occurrence of...
    \)       a literal close parenthesis
    \s*      0 or more whitespace characters
    (\*?)    0 or 1 occurrences of literal *
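To see why the non-greedy `(.*?)` matters, here is a small sketch comparing it with a greedy `(.*)`. The input string with two parenthesized groups is a made-up example, not one of the strings from the question.

```perl
use strict;
use warnings;

my $s = 'abc (one) (two)';   # hypothetical input with two groups

my ($lazy)   = $s =~ /\((.*?)\)/;   # stops at the first ")"
my ($greedy) = $s =~ /\((.*)\)/;    # runs to the last ")"

print "lazy:   $lazy\n";     # lazy:   one
print "greedy: $greedy\n";   # greedy: one) (two
```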

liam · Answer

Well, here is a one-liner:

perl -lne 'm|Scheme ID:\s+(.*?)\s+\((.*?)\)\s?(\*)?|g&&print "$1:$2:$3"' file.txt

Expanded into a simple script so that it is easier to explain:

    #!/usr/bin/perl -ln

    #-w : warnings
    #-l : print newline after every print
    #-n : apply script body to stdin or files listed at commandline, don't print $_

    use strict; #always do this.

    my $regex = qr{  # precompile regex
      Scheme\ ID:    # to match beginning of line.
      \s+            # 1 or more whitespace
      (.*?)          # Non greedy match of all characters up to
      \s+            # 1 or more whitespace
      \(             # parenthesis literal
      (.*?)          # non-greedy match to the next
      \)             # closing literal parenthesis
      \s*            # 0 or more whitespace (trailing * is optional)
      (\*)?          # 0 or 1 literal *s
    }x;  #x switch allows whitespace in regex to allow documentation.

    #values trapped in $1 $2 $3, so do whatever you need to:
    #Perl lets you use any characters as delimiters, i like pipes because
    #they reduce the amount of escaping when using file paths
    m|$regex| && print "$1 : $2 : $3";

    #alternatively if(m|$regex|) {doOne($1); doTwo($2) ... }

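Though if this were for anything other than formatting, I would implement a main loop to handle the files and flesh out the body of the script, rather than rely on the command-line switches for the looping.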
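A rough sketch of what such an explicit main loop could look like, under some assumptions: the helper name `format_line` is hypothetical, input files are taken from `@ARGV` only (no STDIN fallback), and a missing star is printed as an empty string.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $regex = qr{
    Scheme\ ID: \s+
    (.*?)       \s+      # the ID
    \( (.*?) \)          # text inside the parentheses
    \s* (\*)?            # optional trailing star
}x;

# Hypothetical helper: format one line, or return undef if it does not match.
sub format_line {
    my ($line) = @_;
    return unless $line =~ $regex;
    my $star = defined $3 ? $3 : '';
    return "$1 : $2 : $star";
}

# Explicit main loop over the files named on the command line.
for my $path (@ARGV) {
    open my $fh, '<', $path or die "Cannot open $path: $!";
    while (my $line = <$fh>) {
        chomp $line;
        my $out = format_line($line);
        print "$out\n" if defined $out;
    }
    close $fh;
}
```

Run as, for example, `perl script.pl file.txt`, where `script.pl` is whatever name the script is saved under. Keeping the matching in a separate sub makes it easy to replace the `print` with other processing later.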

Chas. Owens · Answer

This just requires a small change to my last answer:

    my ($guid, $scheme, $star) = $line =~ m{
        The [ ] Scheme [ ] GUID: [ ]
        ([a-zA-Z0-9-]+)          #capture the guid
        [ ]
        \( (.+) \)               #capture the scheme
        (?:
            [ ]
            ([*])                #capture the star
        )?                       #if it exists
    }x;

Michael Krelin - hacker · Answer

Long time no Perl

    while(<STDIN>) {
        next unless /:\s*(\S+)\s+\(([^\)]+)\)\s*(\*?)/;
        print "|$1|$2|$3|\n";
    }

Rap · Answer

String 1:

$input =~ /^\S+/; $s1 = $&;

String 2:

$input =~ /\(.*\)/; $s2 = $&;

String 3:

$input =~ /\*?$/; $s3 = $&;
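Note that `$&` holds the whole matched text, so the second string above includes the surrounding parentheses. A variant using capture groups in list context might look like the sketch below. It assumes, as this answer seems to, that `$input` starts directly with the ID rather than with the "1) Scheme ID:" prefix.

```perl
use strict;
use warnings;

# Assumed input: the ID comes first, with no "Scheme ID:" prefix.
my $input = 'abc-456-hu5t10 (High priority) *';

my ($s1) = $input =~ /^(\S+)/;       # leading run of non-whitespace
my ($s2) = $input =~ /\((.*?)\)/;    # text between the parentheses
my ($s3) = $input =~ /(\*?)$/;       # trailing *, or '' if there is none

print "$s1\n$s2\n$s3\n";
```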