Aggregate a field in elasticsearch-dsl using Python

Question

Can someone tell me how to write Python statements that aggregate (sum and count) things about my documents?

Script

    from datetime import datetime
    from elasticsearch_dsl import DocType, String, Date, Integer
    from elasticsearch_dsl.connections import connections

    from elasticsearch import Elasticsearch
    from elasticsearch_dsl import Search, Q

    # Define a default Elasticsearch client
    client = connections.create_connection(hosts=['http://blahblahblah:9200'])

    s = Search(using=client, index="attendance")
    s = s.execute()

    for tag in s.aggregations.per_tag.buckets:
        print (tag.key)

Output

    File "/Library/Python/2.7/site-packages/elasticsearch_dsl/utils.py", line 106, in __getattr__
      '%r object has no attribute %r' % (self.__class__.__name__, attr_name))
    AttributeError: 'Response' object has no attribute 'aggregations'

What's causing this? Is the "aggregations" keyword wrong? Is there some other package I need to import? If a document in the "attendance" index has a field called emailAddress, how would I count which documents have a value for that field?

VISQL · Accepted Answer

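First of all, I notice now that what I wrote here actually has no aggregations defined. The documentation on how to use this is not very readable for me. I'll expand on what I wrote above, and I'm changing the index name to make for a nicer example.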

    from elasticsearch_dsl import Search
    from elasticsearch_dsl.connections import connections

    # Define a default Elasticsearch client
    client = connections.create_connection(hosts=['http://blahblahblah:9200'])

    s = Search(using=client, index="airbnb", doc_type="sleep_overs")

    # Executing now would be premature: no aggregation has been defined yet,
    # so the loop below fails with the AttributeError from the question.
    # response = s.execute()
    # for tag in response.aggregations.per_tag.buckets:
    #     print (tag.key)

    # Let's make an aggregation
    # 'by_house' is a name you choose, 'terms' is a keyword for the type of aggregator
    # 'field' is also a keyword, and 'house_number' is a field in our ES index
    s.aggs.bucket('by_house', 'terms', field='house_number', size=0)

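Above, we create one bucket per house number, so each bucket's key is a house number. ElasticSearch (ES) always reports the number of documents that fall into each bucket. size=0 means return all results, because ES by default returns only 10 results (or whatever your dev set it up to do).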
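To make the bucket idea concrete, here is a rough pure-Python sketch of what a terms aggregation computes. It is only an analogy: the sample documents and the helper name terms_buckets are invented for illustration, and the real grouping happens inside Elasticsearch, not in your client code.

```python
from collections import Counter

# Hypothetical sample documents; only the 'house_number' field matters here.
docs = [
    {"house_number": 12, "guest": "a"},
    {"house_number": 12, "guest": "b"},
    {"house_number": 7, "guest": "c"},
]


def terms_buckets(documents, field):
    """Group documents by `field` and count them, most common first."""
    counts = Counter(doc[field] for doc in documents if field in doc)
    return [{"key": key, "doc_count": n} for key, n in counts.most_common()]


buckets = terms_buckets(docs, "house_number")
for bucket in buckets:
    print(bucket["key"], bucket["doc_count"])
```

Each entry has a key (the house number) and a doc_count, which is the same pair you read off every bucket in the real response.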

    # This runs the query.
    response = s.execute()

    # let's see what's in our results
    # (a terms aggregation has no top-level doc_count; the counts live on each bucket)
    print(response.hits.total)
    print(response.aggregations.by_house.buckets)
    for item in response.aggregations.by_house.buckets:
        print(item.doc_count)

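My mistake before was thinking that an Elastic Search query has aggregations by default. You define them yourself, then execute the search. After that, the response can be split along the aggregators you defined.

The CURL for the above should look like this.
Note: I use SENSE, an ElasticSearch plugin/extension/add-on for Google Chrome. In SENSE you can use // to comment things out.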

    POST /airbnb/sleep_overs/_search
    {
      // the size 0 here actually means to not return any hits, just the aggregation part of the result
      "size": 0,
      "aggs": {
        "by_house": {
          "terms": {
            // the size 0 here means to return all results, not just the default 10 results
            "field": "house_number",
            "size": 0
          }
        }
      }
    }
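The same request shape can be adapted to the emailAddress part of the question. The following is a minimal sketch, not something from the original answer: it assumes a filter aggregation wrapping an exists query is acceptable for your ES version, and the names count_field_body and with_value are made up here. It only builds the body as a plain dict; you would still have to send it to the cluster.

```python
import json


def count_field_body(field):
    """Build a search body that counts documents having a value for `field`."""
    return {
        "size": 0,  # return no hits, only the aggregation result
        "aggs": {
            "with_value": {  # aggregation name chosen here for illustration
                "filter": {"exists": {"field": field}}
            }
        },
    }


body = count_field_body("emailAddress")
print(json.dumps(body, indent=2))
```

If this body is run against the "attendance" index, the count should come back as the doc_count of the with_value aggregation in the response.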

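A work-around. Someone on the GIT of the DSL told me to forget translating and just use this method. It's simpler, and you can write the tough stuff directly in CURL. That's why I call it a work-around.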

    from elasticsearch_dsl import Search
    from elasticsearch_dsl.connections import connections

    # Define a default Elasticsearch client
    client = connections.create_connection(hosts=['http://blahblahblah:9200'])

    # how simple: we just paste the CURL body here
    body = {
        "size": 0,
        "aggs": {
            "by_house": {
                "terms": {
                    "field": "house_number",
                    "size": 0
                }
            }
        }
    }

    # from_dict returns a new Search, so index and doc_type are set afterwards;
    # it runs against the default connection created above
    s = Search.from_dict(body)
    s = s.index("airbnb")
    s = s.doc_type("sleep_overs")
    body = s.to_dict()

    t = s.execute()

    for item in t.aggregations.by_house.buckets:
        # item.key will be the house number
        print("%s %s" % (item.key, item.doc_count))

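Hope this helps. I now design everything in CURL, then use Python statements to peel away at the results and get what I want. This helps with aggregations that have multiple levels (sub-aggregations).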
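As a rough illustration of the multi-level case, here is a sketch of a terms aggregation with a sum nested inside it, plus a helper that peels the result apart. Treat all of it as assumption: 'nights' is a hypothetical numeric field, total_nights and summarize are names invented here, and sample_response is hand-written in the shape ES uses rather than real output. The terms size is left out, so the default number of buckets would apply.

```python
# Request body: one bucket per house, with a sum computed inside each bucket.
# 'nights' is a hypothetical numeric field, not one from the original post.
body = {
    "size": 0,
    "aggs": {
        "by_house": {
            "terms": {"field": "house_number"},
            "aggs": {"total_nights": {"sum": {"field": "nights"}}},
        }
    },
}

# Hand-written response fragment in the shape Elasticsearch uses for this body.
sample_response = {
    "aggregations": {
        "by_house": {
            "buckets": [
                {"key": 12, "doc_count": 2, "total_nights": {"value": 5.0}},
                {"key": 7, "doc_count": 1, "total_nights": {"value": 3.0}},
            ]
        }
    }
}


def summarize(response, bucket_name, metric_name):
    """Return {bucket key: (doc_count, metric value)} for one terms aggregation."""
    buckets = response["aggregations"][bucket_name]["buckets"]
    return {b["key"]: (b["doc_count"], b[metric_name]["value"]) for b in buckets}


summary = summarize(sample_response, "by_house", "total_nights")
for house, (count, nights) in sorted(summary.items()):
    print(house, count, nights)
```

This covers both halves of the original question in one request: doc_count is the count per house and the nested sum is the total per house.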

ekmcd · Answer

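I don't have the rep to comment yet, but I wanted to add a small fix to Matthew's comment on VISQL's answer regarding from_dict. If you want to keep the search properties, use update_from_dict rather than from_dict.

According to the Docs, from_dict creates a new search object, whereas update_from_dict modifies the existing one in place. That is what you want if the Search already has properties such as index, using, etc.

So you would declare the query body before the search, and then create the search like this: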

    query_body = {
        "size": 0,
        "aggs": {
            "by_house": {
                "terms": {
                    "field": "house_number",
                    "size": 0
                }
            }
        }
    }

    s = Search(using=client, index="airbnb", doc_type="sleep_overs").update_from_dict(query_body)