Filter a Set for Matching String Permutations

Question

I am trying to use itertools.permutations() to generate all the permutations of string and return only those that are members of a set of words.

    import itertools

    def permutations_in_dict(string, words):
        '''
        Parameters
        ----------
        string : {str}
        words : {set}

        Returns
        -------
        list : {list} of {str}

        Example
        -------
        >>> permutations_in_dict('act', {'cat', 'rat', 'dog', 'act'})
        ['act', 'cat']
        '''

My current solution works fine in the terminal, but somehow it fails the test case...

        return list(set([''.join(p) for p in itertools.permutations(string)]) & words)

Any help would be appreciated.

AChampion · Accepted Answer

You can simply use collections.Counter() to compare each of the words to the string, without creating all the permutations (the number of permutations explodes with the length of the string):

    from collections import Counter

    def permutations_in_dict(string, words):
        c = Counter(string)
        return [w for w in words if c == Counter(w)]

    >>> permutations_in_dict('act', {'cat', 'rat', 'dog', 'act'})
    ['cat', 'act']

Note: sets are unordered, so if you need a specific order you may need to sort the result, e.g. return sorted(...)
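As a small illustration of why Counter equality is a suitable test here: it compares how many times each letter occurs, not just which letters are present. The strings below are made up for the example and do not come from the question.

```python
from collections import Counter

# Same letters with the same counts: these are permutations of each other.
print(Counter('aab') == Counter('aba'))   # True

# Same letters but different counts: not permutations.
print(Counter('aab') == Counter('abb'))   # False

# Comparing plain sets of letters would wrongly treat them as a match.
print(set('aab') == set('abb'))           # True
```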

Raymond Hettinger · Answer

Problem Category

The problem you're solving is best described as testing for anagram matches.

Solution using Sort

The traditional solution is to sort the target string, sort the candidate string, and test for equality.

    >>> def permutations_in_dict(string, words):
    ...     target = sorted(string)
    ...     return sorted(word for word in words if sorted(word) == target)
    ...
    >>> permutations_in_dict('act', {'cat', 'rat', 'dog', 'act'})
    ['act', 'cat']
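One way to picture the sort-and-compare test is as reducing each word to a canonical form. The sketch below is only an illustration; the helper name signature is hypothetical and does not appear in the answer.

```python
def signature(word):
    """Canonical form of a word: its letters in sorted order (hypothetical helper)."""
    return ''.join(sorted(word))

print(signature('cat'))                       # act
print(signature('cat') == signature('act'))   # True
print(signature('rat') == signature('act'))   # False
```

Two words are anagrams of each other exactly when their sorted forms are equal, which is what the comparison sorted(word) == target checks.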

Solution using a Multiset

Another approach is to use collections.Counter() to make a multiset equality test. This is algorithmically superior to the sort solution (O(n) versus O(n log n)), but it tends to lose unless the strings are large, because of the cost of hashing every character.

    >>> from collections import Counter
    >>> def permutations_in_dict(string, words):
    ...     target = Counter(string)
    ...     return sorted(word for word in words if Counter(word) == target)
    ...
    >>> permutations_in_dict('act', {'cat', 'rat', 'dog', 'act'})
    ['act', 'cat']

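Solution using a Perfect Hash

A unique anagram signature, or perfect hash, can be built by multiplying together the prime numbers corresponding to each possible character in a string.

The commutative property of multiplication guarantees that the hash value is invariant under any permutation of a single string. The uniqueness of the hash value is guaranteed by the fundamental theorem of arithmetic (also known as the unique prime factorization theorem).

    >>> from functools import reduce
    >>> from operator import mul
    >>> # Table of primes indexed by character code; candidates from 13 up
    >>> # are filtered with a Fermat test to bases 5 and 11.
    >>> primes = [2, 3, 5, 7, 11]
    >>> primes += [p for p in range(13, 1620) if all(pow(b, p-1, p) == 1 for b in (5, 11))]
    >>> anagram_hash = lambda s: reduce(mul, (primes[ord(c)] for c in s))
    >>> def permutations_in_dict(string, words):
    ...     target = anagram_hash(string)
    ...     return sorted(word for word in words if anagram_hash(word) == target)
    ...
    >>> permutations_in_dict('act', {'cat', 'rat', 'dog', 'act'})
    ['act', 'cat']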
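To make the prime-product idea easier to follow by hand, here is a reduced sketch. It assumes the input contains only lowercase ASCII letters a-z and uses a hard-coded table of the first 26 primes, which is a simplification for illustration and not the table built in the answer. It also assumes Python 3.8 or later for math.prod.

```python
from math import prod

# First 26 primes, one per lowercase letter (assumption: input is a-z only).
PRIMES = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41,
          43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97, 101]

def anagram_hash(s):
    """Multiply the primes assigned to each letter; the result ignores letter order."""
    return prod(PRIMES[ord(c) - ord('a')] for c in s)

print(anagram_hash('act'))                          # 710  (a=2, c=5, t=71)
print(anagram_hash('act') == anagram_hash('cat'))   # True
print(anagram_hash('act') == anagram_hash('rat'))   # False
```

Because each product factors into primes in only one way, two strings share a hash only if they contain the same letters with the same counts.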

Solution using Permutations

Searching by permutations of the target string using itertools.permutations() is reasonable when the string is small (a string of length n generates n factorial candidates).

The good news is that when n is small and the number of words is large, this approach runs very fast (because set membership testing is O(1)):

    >>> from itertools import permutations
    >>> def permutations_in_dict(string, words):
    ...     perms = set(map(''.join, permutations(string)))
    ...     return sorted(word for word in words if word in perms)
    ...
    >>> permutations_in_dict('act', {'cat', 'rat', 'dog', 'act'})
    ['act', 'cat']

As the OP surmised, the pure Python search loop can be sped up by using set.intersection():

    >>> from itertools import permutations
    >>> def permutations_in_dict(string, words):
    ...     perms = set(map(''.join, permutations(string)))
    ...     return sorted(words & perms)
    ...
    >>> permutations_in_dict('act', {'cat', 'rat', 'dog', 'act'})
    ['act', 'cat']
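To get a feel for the n factorial growth mentioned above, the following sketch prints the number of candidate permutations for a few lengths, and shows that repeated letters produce duplicates which the set removes. The lengths and the sample string 'aab' are arbitrary choices for illustration.

```python
from itertools import permutations
from math import factorial

# Number of permutations generated for a string of length n.
for n in (3, 5, 10, 20):
    print(n, factorial(n))

# With repeated letters, permutations() yields duplicates; set() collapses them.
all_perms = list(map(''.join, permutations('aab')))
unique_perms = set(all_perms)
print(len(all_perms))         # 6
print(sorted(unique_perms))   # ['aab', 'aba', 'baa']
```

At n=20 the count is already 2432902008176640000, which is why the permutation-based approaches are only practical for short strings.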

Best Solution

Which solution is best depends on the length of string and the length of words. Timings will show which is best for a particular problem.

Here are some comparative timings for the various approaches using two different string sizes:

    Timings with string_size=5 and words_size=1000000
    -------------------------------------------------
    0.01406    match_sort
    0.06827    match_multiset
    0.02167    match_perfect_hash
    0.00224    match_permutations
    0.00013    match_permutations_set

    Timings with string_size=20 and words_size=1000000
    --------------------------------------------------
    2.19771    match_sort
    8.38644    match_multiset
    4.22723    match_perfect_hash
    <takes "forever"> match_permutations
    <takes "forever"> match_permutations_set

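The results indicate that for small strings, the fastest approach is to search the permutations of the target string using set intersection.

For larger strings, the fastest approach is the traditional sort-and-compare solution.

Hope you found this little algorithmic study as interesting as I have. The take-aways are:

Sets, itertools, and collections make short work of problems like this.
Big-oh running times matter (n factorial disintegrates for large n).
Constant overhead matters (sorting beats multisets because of hashing overhead).
Discrete mathematics is a treasure trove of ideas.
It is hard to know what is best until you do the analysis and run timings :-)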

Timing Set-up

FWIW, here is the test set-up I used to run the comparative timings:

    from collections import Counter
    from functools import reduce
    from itertools import permutations
    from string import ascii_letters as letters
    from random import choice
    from operator import mul
    from time import time

    def match_sort(string, words):
        target = sorted(string)
        return sorted(word for word in words if sorted(word) == target)

    def match_multiset(string, words):
        target = Counter(string)
        return sorted(word for word in words if Counter(word) == target)

    primes = [2, 3, 5, 7, 11]
    primes += [p for p in range(13, 1620) if all(pow(b, p-1, p) == 1 for b in (5, 11))]
    anagram_hash = lambda s: reduce(mul, (primes[ord(c)] for c in s))

    def match_perfect_hash(string, words):
        target = anagram_hash(string)
        return sorted(word for word in words if anagram_hash(word) == target)

    def match_permutations(string, words):
        perms = set(map(''.join, permutations(string)))
        return sorted(word for word in words if word in perms)

    def match_permutations_set(string, words):
        perms = set(map(''.join, permutations(string)))
        return sorted(words & perms)

    string_size = 5
    words_size = 1000000

    population = letters[: string_size+2]
    words = set()
    for i in range(words_size):
        word = ''.join([choice(population) for i in range(string_size)])
        words.add(word)
    string = word                # Arbitrarily use the last word as the target

    print('Timings with string_size=%d and words_size=%d' % (string_size, words_size))
    for func in (match_sort, match_multiset, match_perfect_hash, match_permutations, match_permutations_set):
        start = time()
        func(string, words)
        end = time()
        print('%-10.5f %s' % (end - start, func.__name__))
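For a quicker experiment, a scaled-down variant along the same lines might look like the sketch below. The use of timeit, the fixed random seed, the smaller words_size, and the repeat count are all assumptions made for this illustration and are not part of the set-up above. The numbers it prints will differ from machine to machine.

```python
from itertools import permutations
from random import choice, seed
from string import ascii_lowercase
from timeit import timeit

def match_sort(string, words):
    target = sorted(string)
    return sorted(word for word in words if sorted(word) == target)

def match_permutations_set(string, words):
    perms = set(map(''.join, permutations(string)))
    return sorted(words & perms)

seed(0)                                   # assumption: fixed seed for repeatable data
string_size, words_size = 5, 10_000       # assumption: much smaller than the original run
population = ascii_lowercase[:string_size + 2]
words = {''.join(choice(population) for _ in range(string_size))
         for _ in range(words_size)}
string = next(iter(words))                # arbitrary target taken from the word set

for func in (match_sort, match_permutations_set):
    # Total time for 10 calls of each approach on the same data.
    elapsed = timeit(lambda: func(string, words), number=10)
    print('%-10.5f %s' % (elapsed, func.__name__))
```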

Błotosmętek · Answer

Apparently you're expecting the output to be sorted alphabetically, so this should do:

        return sorted(set(''.join(p) for p in itertools.permutations(string)) & words)

MishaVacic · Answer

Try this solution:

    >>> import itertools
    >>> list(map("".join, itertools.permutations('act')))
    ['act', 'atc', 'cat', 'cta', 'tac', 'tca']

We can call it listA:

    >>> listA = list(map("".join, itertools.permutations('act')))

Your list is listB:

    >>> listB = ['cat', 'rat', 'dog', 'act']

Then use set intersection:

    >>> list(set(listA) & set(listB))
    ['cat', 'act']