Find string between two substrings

Question

How do I find a string between two substrings ('123STRINGabc' -> 'STRING')?

My current method is like this:

>>> start = 'asdf=5;'
>>> end = '123jasd'
>>> s = 'asdf=5;iwantthis123jasd'
>>> print((s.split(start))[1].split(end)[0])
iwantthis

However, this seems very inefficient and un-Pythonic. What is a better way to do something like this?

Forgot to mention: the string might not start with start and end with end. There may be more characters before and after.

Nikolaus Gradwohl · Accepted Answer

import re

s = 'asdf=5;iwantthis123jasd'
result = re.search('asdf=5;(.*)123jasd', s)
print(result.group(1))

cji · Answer

s = "123123STRINGabcabc"

def find_between(s, first, last):
    # Leftmost `first`, then the nearest `last` after it.
    try:
        start = s.index(first) + len(first)
        end = s.index(last, start)
        return s[start:end]
    except ValueError:
        return ""

def find_between_r(s, first, last):
    # Rightmost `first`, then the rightmost `last` after it.
    try:
        start = s.rindex(first) + len(first)
        end = s.rindex(last, start)
        return s[start:end]
    except ValueError:
        return ""

print(find_between(s, "123", "abc"))
print(find_between_r(s, "123", "abc"))

gives:

123STRING
STRINGabc

Depending on the behavior you need, you can mix index and rindex calls or go with one of the versions above (they are the equivalent of the regex (.*) and (.*?) groups).
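To illustrate the greedy versus lazy distinction mentioned here, the following is a minimal sketch on the same sample string. The helper names `between_greedy` and `between_lazy` are made up for this illustration and are not part of the original answer.

```python
import re

s = "123123STRINGabcabc"

def between_greedy(s, first, last):
    # (.*) is greedy: leftmost `first`, then as far right as possible,
    # so the match ends at the LAST occurrence of `last`.
    m = re.search(re.escape(first) + "(.*)" + re.escape(last), s)
    return m.group(1) if m else ""

def between_lazy(s, first, last):
    # (.*?) is lazy: leftmost `first`, then the NEAREST `last` after it,
    # which is what find_between does with index().
    m = re.search(re.escape(first) + "(.*?)" + re.escape(last), s)
    return m.group(1) if m else ""

print(between_lazy(s, "123", "abc"))    # 123STRING
print(between_greedy(s, "123", "abc"))  # 123STRINGabc
```

Note that `.` does not match newlines unless `re.DOTALL` is passed, so multi-line input would need that flag.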

ansetou · Answer

start = 'asdf=5;'
end = '123jasd'
s = 'asdf=5;iwantthis123jasd'
print(s[s.find(start)+len(start):s.rfind(end)])

gives

iwantthis

Tim McNamara · Answer

s[len(start):-len(end)]
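A caveat worth checking: this slice only uses the lengths of start and end, so it appears to assume that s begins with start and finishes with end. A small sketch of that assumption follows; the `padded` string is a made-up test input.

```python
start = 'asdf=5;'
end = '123jasd'

s = 'asdf=5;iwantthis123jasd'
print(s[len(start):-len(end)])  # iwantthis

# Hypothetical input with extra characters before and after the markers.
padded = 'xx' + s + 'yy'
shifted = padded[len(start):-len(end)]
print(shifted == 'iwantthis')  # False: the slice no longer lines up
```

For inputs like `padded`, the find/index based answers are the safer choice.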

Tim McNamara · Answer

String formatting adds some flexibility to what Nikolaus Gradwohl suggested. start and end can now be changed as needed.

import re

s = 'asdf=5;iwantthis123jasd'
start = 'asdf=5;'
end = '123jasd'

result = re.search('%s(.*)%s' % (start, end), s).group(1)
print(result)
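One thing to watch with this approach: start and end are interpolated into the pattern as regex source, so markers that contain metacharacters can change its meaning. The sketch below uses a made-up marker, `price(USD)=`, to show the effect and how `re.escape` avoids it.

```python
import re

s = 'price(USD)=42;rest'
start = 'price(USD)='   # hypothetical marker containing regex metacharacters
end = ';'

# Unescaped: "(USD)" is read as a capture group, so the literal
# parentheses in s are never matched and the search finds nothing.
unescaped = re.search('%s(.*)%s' % (start, end), s)
print(unescaped)  # None

# Escaped: both markers are matched literally.
pattern = '%s(.*)%s' % (re.escape(start), re.escape(end))
print(re.search(pattern, s).group(1))  # 42
```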

reubano · Answer

Just converting the OP's own solution into an answer:

def find_between(s, start, end):
    return (s.split(start))[1].split(end)[0]

John La Rooy · Answer

Here is one way to do it:

_, _, rest = s.partition(start)
result, _, _ = rest.partition(end)
print(result)

Another way, using a regular expression:

import re
print(re.findall(re.escape(start) + "(.*)" + re.escape(end), s)[0])

or

print(re.search(re.escape(start) + "(.*)" + re.escape(end), s).group(1))

tstoev · Answer

source = 'your token _here0@df and maybe _here1@df or maybe _here2@df'
start_sep = '_'
end_sep = '@df'
result = []
tmp = source.split(start_sep)
for par in tmp:
    if end_sep in par:
        result.append(par.split(end_sep)[0])

print(result)

This should show: here0, here1, here2

The regex is nicer, but it requires importing an extra module (re, which ships with the standard library), and you may prefer to use only built-in string methods.
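For comparison, a regex version of the same extraction might look like the sketch below. It reuses the variable names from the snippet above; `pattern` is a name introduced here for illustration.

```python
import re  # part of the Python standard library

source = 'your token _here0@df and maybe _here1@df or maybe _here2@df'
start_sep = '_'
end_sep = '@df'

# Non-greedy group, so each match stops at the nearest end_sep.
pattern = re.escape(start_sep) + '(.*?)' + re.escape(end_sep)
result = re.findall(pattern, source)
print(result)  # ['here0', 'here1', 'here2']
```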

Fernando Wittmann · Answer

If you don't want to import anything, try the string method .index():

text = 'I want to find a string between two substrings'
left = 'find a '
right = 'between two'

# Output: 'string ' (with a trailing space, since right does not include it)
print(text[text.index(left)+len(left):text.index(right)])

Wikis · Answer

To extract STRING, try:

myString = '123STRINGabc'
startString = '123'
endString = 'abc'

mySubString = myString[myString.find(startString)+len(startString):myString.find(endString)]

josh · Answer

My method would be to do something like this:

find index of start string in s => i
find index of end string in s => j

substring = substring(i+len(start) to j-1)
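The pseudocode above could be written in Python roughly as follows. This is one possible reading, not the answerer's own code: the function name `substring_between` and the choice to return None when a marker is missing are assumptions. The inclusive "to j-1" corresponds to a Python slice ending at j, because slice ends are exclusive.

```python
def substring_between(s, start, end):
    # i: index of the start marker in s
    i = s.find(start)
    if i == -1:
        return None          # start marker not found (assumed behavior)
    i += len(start)          # skip past the start marker itself
    # j: index of the end marker, searched only after the start marker
    j = s.find(end, i)
    if j == -1:
        return None          # end marker not found (assumed behavior)
    return s[i:j]            # characters i .. j-1

print(substring_between('asdf=5;iwantthis123jasd', 'asdf=5;', '123jasd'))  # iwantthis
```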

Wesley Kitlasten · Answer

These solutions assume that the start string and the end string are different. Here is a solution I use for an entire file when the opening and closing markers are the same, assuming the file is read with readlines():

def extractstring(line, flag='$'):
    if flag in line:  # $ is the flag
        dex1 = line.index(flag)
        subline = line[dex1+1:-1]  # leave out flag (+1) to end of line
        dex2 = subline.index(flag)
        string = subline[0:dex2].strip()  # does not include last flag, strip whitespace
        return string
    # no flag in this line: falls through and returns None

Example:

lines = ['asdf 1qr3 qtqay 45q at $A NEWT?$ asdfa afeasd',
         'afafoaltat $I GOT BETTER!$ derpity derp derp']
for line in lines:
    string = extractstring(line, flag='$')
    print(string)

Gives:

A NEWT?
I GOT BETTER!
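When the opening and closing markers are identical, `str.split` offers another way to get the same result. The sketch below is an alternative under stated assumptions, not the answerer's code: `extract_flagged` is a made-up name, and returning None when fewer than two flags are present is an assumed convention.

```python
def extract_flagged(line, flag='$'):
    # Splitting on the flag yields [before, inside, after, ...];
    # at least three parts means there were at least two flags.
    parts = line.split(flag)
    if len(parts) < 3:
        return None
    return parts[1].strip()

lines = ['asdf 1qr3 qtqay 45q at $A NEWT?$ asdfa afeasd',
         'afafoaltat $I GOT BETTER!$ derpity derp derp']
for line in lines:
    print(extract_flagged(line))
```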

Mnyikka · Answer

This is a function I wrote to return a list of the strings found between string1 and string2.

def GetListOfSubstrings(stringSubject, string1, string2):
    MyList = []
    intstart = 0
    strlength = len(stringSubject)
    continueloop = 1
    while (intstart < strlength and continueloop == 1):
        intindex1 = stringSubject.find(string1, intstart)
        if (intindex1 != -1):  # The substring was found, lets proceed
            intindex1 = intindex1 + len(string1)
            intindex2 = stringSubject.find(string2, intindex1)
            if (intindex2 != -1):
                subsequence = stringSubject[intindex1:intindex2]
                MyList.append(subsequence)
                intstart = intindex2 + len(string2)
            else:
                continueloop = 0
        else:
            continueloop = 0
    return MyList

# Usage Example
mystring = "s123y123o123pp123y6"
List = GetListOfSubstrings(mystring, "1", "y68")
for x in range(0, len(List)):
    print(List[x])
# output: (nothing, "y68" never occurs)

mystring = "s123y123o123pp123y6"
List = GetListOfSubstrings(mystring, "1", "3")
for x in range(0, len(List)):
    print(List[x])
# output:
# 2
# 2
# 2
# 2

mystring = "s123y123o123pp123y6"
List = GetListOfSubstrings(mystring, "1", "y")
for x in range(0, len(List)):
    print(List[x])
# output:
# 23
# 23o123pp123

Love and peace - Joe Codeswell · Answer

This is essentially cji's answer (Jul 30 '10 at 5:58). I changed the try/except structure to make it clearer what was causing the exception.

import sys

def find_between(inputStr, firstSubstr, lastSubstr):
    '''
    find between firstSubstr and lastSubstr in inputStr STARTING FROM THE LEFT
        http://stackoverflow.com/questions/3368969/find-string-between-two-substrings
        above also has a func that does this FROM THE RIGHT
    '''
    start, end = (-1, -1)
    try:
        start = inputStr.index(firstSubstr) + len(firstSubstr)
    except ValueError:
        print('    ValueError: ', end=' ')
        print("firstSubstr=%s  -  " % (firstSubstr), end=' ')
        print(sys.exc_info()[1])

    try:
        end = inputStr.index(lastSubstr, start)
    except ValueError:
        print('    ValueError: ', end=' ')
        print("lastSubstr=%s  -  " % (lastSubstr), end=' ')
        print(sys.exc_info()[1])

    return inputStr[start:end]

thecollinsprogram · Answer

You can use this code or copy the function below. All neatly in one line.

def substring(whole, sub1, sub2): return whole[whole.index(sub1) : whole.index(sub2)]

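If you run the function as follows:

print(substring("5+(5*2)+2", "(", ")"))

you will probably be left with the output:

(5*2

rather than

5*2

If you want the closing substring included at the end of the output, the code must look like this: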

return whole[whole.index(sub1) : whole.index(sub2) + 1]

But if you don't want the substrings in the output at all, the + 1 must go on the first value instead:

return whole[whole.index(sub1) + 1 : whole.index(sub2)]
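The + 1 in these snippets works because the delimiters in the example are a single character long. For longer delimiters, a variant along the following lines may be closer to what is wanted. It is a sketch, and the name `substring_exclusive` is made up here: it skips the full length of sub1 and only looks for sub2 after that point.

```python
def substring_exclusive(whole, sub1, sub2):
    # Skip the whole opening delimiter, whatever its length.
    start = whole.index(sub1) + len(sub1)
    # Search for the closing delimiter only after the opening one.
    end = whole.index(sub2, start)
    return whole[start:end]

print(substring_exclusive("5+(5*2)+2", "(", ")"))         # 5*2
print(substring_exclusive("123STRINGabc", "123", "abc"))  # STRING
```

Like `index` itself, this raises ValueError when either delimiter is missing.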

Matthew Dunn · Answer

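Parsing text with delimiters from different email platforms posed a larger version of this problem. They generally have a START and a STOP. Delimiter characters that double as wildcards kept choking the regex. The problem with split is mentioned here and elsewhere: oops, the delimiter character is gone. It occurred to me to use replace() to give split() something else to consume. Chunk of code:

import logging

textIn = 'Hello |*FNAME*|, welcome'  # hypothetical sample input

nuke = '~~~'   # throwaway separator for split() to consume
start = '|*'
stop = '*|'
julien = (textIn.replace(start, nuke + start).replace(stop, stop + nuke).split(nuke))
keep = [chunk for chunk in julien if start in chunk and stop in chunk]
logging.info('keep: %s', keep)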

Tony Veijalainen · Answer

I posted this before as a code snippet on Daniweb:

# picking up piece of string between separators
# function using partition, like partition, but drops the separators
def between(left, right, s):
    before, _, a = s.partition(left)
    a, _, after = a.partition(right)
    return before, a, after

s = "bla bla blaa <a>data</a> lsdjfasdjöf (important notice) 'Daniweb forum' tcha tcha tchaa"
print(between('<a>', '</a>', s))
print(between('(', ')', s))
print(between("'", "'", s))

""" Output (from Python 2, which shows ö as the bytes \xc3\xb6; Python 3 prints ö):
('bla bla blaa ', 'data', " lsdjfasdj\xc3\xb6f (important notice) 'Daniweb forum' tcha tcha tchaa")
('bla bla blaa <a>data</a> lsdjfasdj\xc3\xb6f ', 'important notice', " 'Daniweb forum' tcha tcha tchaa")
('bla bla blaa <a>data</a> lsdjfasdj\xc3\xb6f (important notice) ', 'Daniweb forum', ' tcha tcha tchaa')
"""

AXO · Answer

from timeit import timeit
from re import search, DOTALL


def partition_find(string, start, end):
    return string.partition(start)[2].rpartition(end)[0]


def re_find(string, start, end):
    # applying re.escape to start and end would be safer
    return search(start + '(.*)' + end, string, DOTALL).group(1)


def index_find(string, start, end):
    return string[string.find(start) + len(start):string.rfind(end)]


# The wikitext of the "Alan Turing law" article from English Wikipedia
# https://en.wikipedia.org/w/index.php?title=Alan_Turing_law&action=edit&oldid=763725886
string = """..."""
start = '==Proposals=='
end = '==Rival bills=='

assert index_find(string, start, end) \
    == partition_find(string, start, end) \
    == re_find(string, start, end)

print('index_find', timeit(
    'index_find(string, start, end)',
    globals=globals(),
    number=100_000,
))

print('partition_find', timeit(
    'partition_find(string, start, end)',
    globals=globals(),
    number=100_000,
))

print('re_find', timeit(
    're_find(string, start, end)',
    globals=globals(),
    number=100_000,
))

Result:

index_find 0.35047444528454114
partition_find 0.5327825636197754
re_find 7.552149639286381

In this example, re_find was roughly 20 times slower than index_find.

Akshay · Answer

Further to Nikolaus Gradwohl's answer, I needed to get the version number (i.e., 0.0.2) between 'ui:' and '-' from the file content below (filename: docker-compose.yml):

version: '3.1'
services:
  ui:
    image: repo-pkg.dev.io:21/website/ui:0.0.2-QA1
    #network_mode: Host
    ports:
      - 443:9999
    ulimits:
      nofile:test

This is how it worked for me (Python script):

import re

with open('docker-compose.yml', 'r') as f:
    lines = f.read()

# '.' does not match newlines, so the bare "ui:" service key is skipped
# and the match comes from the image line.
result = re.search('ui:(.*)-', lines)
print(result.group(1))

Result:

0.0.2