Python glob, but against a list of strings rather than the filesystem

Question

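I want to be able to match a glob-style pattern against a list of strings rather than against actual files in the filesystem. Is there a way to do this, or to convert a glob pattern into a regular expression easily?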

Nizam Mohamed · Accepted Answer

Good artists copy; great artists steal.

I stole ;)

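fnmatch.translate translates the globs ? and * into the regexes . and .* respectively. I tweaked it so that it does not, and the wildcards no longer match the path separator.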

    import re

    def glob2re(pat):
        """Translate a shell PATTERN to a regular expression.

        There is no way to quote meta-characters.
        """
        i, n = 0, len(pat)
        res = ''
        while i < n:
            c = pat[i]
            i = i + 1
            if c == '*':
                # res = res + '.*'
                res = res + '[^/]*'
            elif c == '?':
                # res = res + '.'
                res = res + '[^/]'
            elif c == '[':
                j = i
                if j < n and pat[j] == '!':
                    j = j + 1
                if j < n and pat[j] == ']':
                    j = j + 1
                while j < n and pat[j] != ']':
                    j = j + 1
                if j >= n:
                    # no closing bracket: treat '[' as a literal
                    res = res + '\\['
                else:
                    stuff = pat[i:j].replace('\\', '\\\\')
                    i = j + 1
                    if stuff[0] == '!':
                        stuff = '^' + stuff[1:]
                    elif stuff[0] == '^':
                        stuff = '\\' + stuff
                    res = '%s[%s]' % (res, stuff)
            else:
                res = res + re.escape(c)
        # Global inline flags must lead the pattern on Python 3.11+
        return '(?ms)' + res + r'\Z'

This one is à la fnmatch.filter; both re.match and re.search work.

    def glob_filter(names, pat):
        return (name for name in names if re.match(glob2re(pat), name))

The glob patterns and strings found on this page pass the test.

    pat_dict = {
        'a/b/*/f.txt': ['a/b/c/f.txt', 'a/b/q/f.txt', 'a/b/c/d/f.txt', 'a/b/c/d/e/f.txt'],
        '/foo/bar/*': ['/foo/bar/baz', '/spam/eggs/baz', '/foo/bar/bar'],
        '/*/bar/b*': ['/foo/bar/baz', '/foo/bar/bar'],
        '/*/[be]*/b*': ['/foo/bar/baz', '/foo/bar/bar'],
        '/foo*/bar': ['/foolicious/spamfantastic/bar', '/foolicious/bar']
    }
    for pat in pat_dict:
        print('pattern :\t{}\nstrings :\t{}'.format(pat, pat_dict[pat]))
        print('matched :\t{}\n'.format(list(glob_filter(pat_dict[pat], pat))))

Martijn Pieters · Answer

The glob module uses the fnmatch module for individual path elements.

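This means the path is split into the directory name and the filename, and if the directory name contains metacharacters (any of the characters _[_, _*_ or _?_), these are expanded recursively.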
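The element-by-element idea can be sketched for plain strings as follows. This is a minimal illustration, not the glob module's own code: the helper name `match_by_segment` and the fixed `/` separator are assumptions made for the example.

```python
import fnmatch

def match_by_segment(path, pattern, sep="/"):
    """Hypothetical helper: compare path and pattern one segment at a time."""
    path_parts = path.split(sep)
    pattern_parts = pattern.split(sep)
    # A wildcard never spans a separator here, so the segment counts must agree.
    if len(path_parts) != len(pattern_parts):
        return False
    # fnmatchcase is used so the result does not depend on the host platform.
    return all(
        fnmatch.fnmatchcase(part, sub)
        for part, sub in zip(path_parts, pattern_parts)
    )

paths = ["a/b/c/f.txt", "a/b/q/f.txt", "a/b/c/d/f.txt"]
matched = [p for p in paths if match_by_segment(p, "a/b/*/f.txt")]
print(matched)  # ['a/b/c/f.txt', 'a/b/q/f.txt']
```

Because every segment is matched separately, `a/b/c/d/f.txt` is rejected: it has one segment more than the pattern.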

If you have a list of strings that are simple filenames, then just using the fnmatch.filter() function is enough:

    import fnmatch

    matching = fnmatch.filter(filenames, pattern)

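But if they contain full paths, you need to do more work, because the regular expression generated does not take path segments into account (wildcards don't exclude the separators, nor are they adjusted for cross-platform path matching).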
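A short check makes the separator problem concrete. The sample paths are made up for illustration, and `fnmatch.fnmatchcase` is used instead of `fnmatch.filter` only so that the result does not depend on the platform's case normalisation.

```python
import fnmatch

paths = ["a/b/c/f.txt", "a/b/c/d/f.txt"]
# The `*` in the pattern is translated to `.*`, which also matches "/".
crossing = [p for p in paths if fnmatch.fnmatchcase(p, "a/b/*/f.txt")]
print(crossing)  # ['a/b/c/f.txt', 'a/b/c/d/f.txt']
```

The second path matches because `*` consumed `c/d`, which a filesystem glob would never do.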

You can construct a simple trie from the paths, then match your pattern against that:

    import fnmatch
    import glob
    import os.path
    from itertools import product


    # Cross-Python dictionary views on the keys
    if hasattr(dict, 'viewkeys'):
        # Python 2
        def _viewkeys(d):
            return d.viewkeys()
    else:
        # Python 3
        def _viewkeys(d):
            return d.keys()


    def _in_trie(trie, path):
        """Determine if path is completely in trie"""
        current = trie
        for elem in path:
            try:
                current = current[elem]
            except KeyError:
                return False
        return None in current


    def find_matching_paths(paths, pattern):
        """Produce a list of paths that match the pattern.

        * paths is a list of strings representing filesystem paths
        * pattern is a glob pattern as supported by the fnmatch module

        """
        if os.altsep:  # normalise
            pattern = pattern.replace(os.altsep, os.sep)
        pattern = pattern.split(os.sep)

        # build a trie out of path elements; efficiently search on prefixes
        path_trie = {}
        for path in paths:
            if os.altsep:  # normalise
                path = path.replace(os.altsep, os.sep)
            _, path = os.path.splitdrive(path)
            elems = path.split(os.sep)
            current = path_trie
            for elem in elems:
                current = current.setdefault(elem, {})
            current.setdefault(None, None)  # sentinel

        matching = []

        current_level = [path_trie]
        for subpattern in pattern:
            if not glob.has_magic(subpattern):
                # plain element, element must be in the trie or there are
                # 0 matches
                if not any(subpattern in d for d in current_level):
                    return []
                matching.append([subpattern])
                current_level = [d[subpattern] for d in current_level if subpattern in d]
            else:
                # match all next levels in the trie that match the pattern
                matched_names = fnmatch.filter({k for d in current_level for k in d}, subpattern)
                if not matched_names:
                    # nothing found
                    return []
                matching.append(matched_names)
                current_level = [d[n] for d in current_level for n in _viewkeys(d) & set(matched_names)]

        return [os.sep.join(p) for p in product(*matching)
                if _in_trie(path_trie, p)]

This mouthful can quickly find matches using globs anywhere along the path:

    >>> paths = ['/foo/bar/baz', '/spam/eggs/baz', '/foo/bar/bar']
    >>> find_matching_paths(paths, '/foo/bar/*')
    ['/foo/bar/baz', '/foo/bar/bar']
    >>> find_matching_paths(paths, '/*/bar/b*')
    ['/foo/bar/baz', '/foo/bar/bar']
    >>> find_matching_paths(paths, '/*/[be]*/b*')
    ['/foo/bar/baz', '/foo/bar/bar', '/spam/eggs/baz']

Veedrac · Answer

On Python 3.4+ you can just use PurePath.match.

pathlib.PurePath(path_string).match(pattern)

On Python 3.3 or earlier (including 2.x), get pathlib from PyPI.

Note that to get platform-independent results (which will depend on why you're running this), you'd want to explicitly state PurePosixPath or PureWindowsPath.
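To illustrate why the explicit class matters, here is a small comparison of the two flavours. The file names are invented for the example; the variable names are only labels for the four checks.

```python
from pathlib import PurePosixPath, PureWindowsPath

# Case sensitivity follows the flavour, not the machine running the code.
posix_case = PurePosixPath("b.py").match("*.PY")      # False
windows_case = PureWindowsPath("b.py").match("*.PY")  # True

# Only the Windows flavour treats a backslash as a path separator.
windows_sep = PureWindowsPath(r"foo\bar.txt").match("foo/*.txt")  # True
posix_sep = PurePosixPath(r"foo\bar.txt").match("foo/*.txt")      # False

print(posix_case, windows_case, windows_sep, posix_sep)
```

With plain `PurePath`, the class is chosen from the host operating system, so the same strings could give different answers on different machines.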

mu 無 · Answer

While _fnmatch.fnmatch_ can be used directly to check whether a pattern matches a filename, you can also use the _fnmatch.translate_ method to generate the regex from the given fnmatch pattern:

    >>> import fnmatch
    >>> fnmatch.translate('*.txt')
    '.*\.txt\Z(?ms)'

From the documentation:

fnmatch.translate(pattern)

Return the shell-style pattern converted to a regular expression.
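The translated pattern can then be compiled once and applied to any list of strings. A minimal sketch, with made-up file names:

```python
import fnmatch
import re

names = ["notes.txt", "image.png", "todo.txt"]
# Compile the translated glob once, then reuse it for every string.
regex = re.compile(fnmatch.translate("*.txt"))
text_files = [name for name in names if regex.match(name)]
print(text_files)  # ['notes.txt', 'todo.txt']
```

The exact regex text returned by `fnmatch.translate` differs between Python versions, so it is safer to compile and use it than to compare it against a fixed string.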

Jason S · Answer

Never mind, I found it. I want the fnmatch module.

Carson Gee · Answer

I wanted to add support for recursive glob patterns, i.e. things/**/*.py, and to have relative path matching so that example*.py doesn't match folder/example_stuff.py.

Here is my approach:

    from fnmatch import translate as fnmatch_translate
    from os import path
    import re

    def recursive_glob_filter(files, glob):
        # Convert to regex and add start of line match.
        # Note: the replacements below rely on fnmatch.translate rendering
        # each `*` as `.*`, which may not hold on newer Python versions.
        pattern_re = '^' + fnmatch_translate(glob)

        # fnmatch does not escape path separators so escape them
        if path.sep in pattern_re and not r'\{}'.format(path.sep) in pattern_re:
            pattern_re = pattern_re.replace('/', r'\/')

        # Replace `*` with one that ignores path separators
        sep_respecting_wildcard = r'[^\{}]*'.format(path.sep)
        pattern_re = pattern_re.replace('.*', sep_respecting_wildcard)

        # And now for `**` we have `[^\/]*[^\/]*`, so replace that with `.*`
        # to match all patterns in-between
        pattern_re = pattern_re.replace(2 * sep_respecting_wildcard, '.*')
        compiled_re = re.compile(pattern_re)
        return filter(compiled_re.search, files)
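As a separate illustration of the same goal (not the code above), a small hand-written translator avoids depending on the exact output of fnmatch.translate. This is only a sketch under stated assumptions: the names `glob_to_regex` and `recursive_filter` are hypothetical, the separator is fixed to `/`, and character classes such as `[abc]` are not handled.

```python
import re

def glob_to_regex(pattern):
    """Hypothetical helper: `**/` spans directories, `*` and `?` stay in one segment."""
    i, n = 0, len(pattern)
    parts = []
    while i < n:
        if pattern.startswith("**/", i):
            parts.append("(?:[^/]+/)*")  # zero or more whole directories
            i += 3
        elif pattern.startswith("**", i):
            parts.append(".*")           # anything, separators included
            i += 2
        elif pattern[i] == "*":
            parts.append("[^/]*")        # anything except a separator
            i += 1
        elif pattern[i] == "?":
            parts.append("[^/]")         # one character, not a separator
            i += 1
        else:
            parts.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(parts) + r"\Z")

def recursive_filter(files, pattern):
    """Hypothetical helper: keep the strings that match pattern from the start."""
    regex = glob_to_regex(pattern)
    return [f for f in files if regex.match(f)]

files = ["things/a.py", "things/x/y/b.py", "other/c.py", "folder/example_stuff.py"]
print(recursive_filter(files, "things/**/*.py"))  # ['things/a.py', 'things/x/y/b.py']
print(recursive_filter(files, "example*.py"))     # []
```

Using `regex.match` anchors the pattern at the start of the string, which is what keeps `example*.py` from matching `folder/example_stuff.py`.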

NumesSanguis · Answer

An extension to @Veedrac's _PurePath.match_ answer that can be applied to a list of strings:

    # Python 3.4+
    from pathlib import Path

    path_list = ["foo/bar.txt", "spam/bar.txt", "foo/eggs.txt"]
    # convert string to pathlib.PosixPath / .WindowsPath, then apply PurePath.match to list
    print([p for p in path_list if Path(p).match("ba*")])  # "*ba*" also works
    # output: ['foo/bar.txt', 'spam/bar.txt']
    print([p for p in path_list if Path(p).match("*o/ba*")])
    # output: ['foo/bar.txt']

It is preferable to use pathlib.PurePath() over pathlib.Path(), because then you don't have to worry about the underlying filesystem.
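A version of the same filter with a pure path class might look like this. It is a minimal sketch: `filter_paths` is a hypothetical helper name, and `PurePosixPath` is chosen on the assumption that the strings use `/` as their separator.

```python
from pathlib import PurePosixPath

def filter_paths(path_list, pattern):
    """Hypothetical helper: keep the strings whose PurePosixPath matches pattern."""
    return [p for p in path_list if PurePosixPath(p).match(pattern)]

path_list = ["foo/bar.txt", "spam/bar.txt", "foo/eggs.txt"]
by_name = filter_paths(path_list, "ba*")
by_parent = filter_paths(path_list, "*o/ba*")
print(by_name)    # ['foo/bar.txt', 'spam/bar.txt']
print(by_parent)  # ['foo/bar.txt']
```

Pure paths never touch the disk, so the strings do not have to refer to files that exist.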