Why is PyMongo count_documents slower than count?

Question

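_db['TF']_ holds about 60 million records.

I need to get the number of records.

Running db['TF'].count() returns immediately.

Running db['TF'].count_documents({}) takes a very long time to return a result.

However, the count method is deprecated.

So how can I get the count quickly when using _count_documents_? Is there some argument I have missed?

I have read the documentation and the code, but found nothing.

Thanks a lot!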

styvane · Answer

As already mentioned here, this behavior is not specific to PyMongo.

The reason is that PyMongo's count_documents method runs an aggregation query instead of reading the collection metadata. See collection.py#L1670-L1688:

```python
pipeline = [{'$match': filter}]
if 'skip' in kwargs:
    pipeline.append({'$skip': kwargs.pop('skip')})
if 'limit' in kwargs:
    pipeline.append({'$limit': kwargs.pop('limit')})
pipeline.append({'$group': {'_id': None, 'n': {'$sum': 1}}})
cmd = SON([('aggregate', self.__name),
           ('pipeline', pipeline),
           ('cursor', {})])
if "hint" in kwargs and not isinstance(kwargs["hint"], string_type):
    kwargs["hint"] = helpers._index_document(kwargs["hint"])
collation = validate_collation_or_none(kwargs.pop('collation', None))
cmd.update(kwargs)
with self._socket_for_reads(session) as (sock_info, slave_ok):
    result = self._aggregate_one_result(
        sock_info, slave_ok, cmd, collation, session)
if not result:
    return 0
return result['n']
```
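To make the cost visible, here is a minimal sketch that rebuilds the same pipeline outside PyMongo and runs it through `Collection.aggregate`. The names `build_count_pipeline` and `count_via_aggregate` are hypothetical helpers written for illustration, not part of PyMongo; `collection` is assumed to be a PyMongo `Collection` (or any object with a compatible `aggregate` method).

```python
from typing import Any, Dict, List, Optional


def build_count_pipeline(filter: Dict[str, Any],
                         skip: Optional[int] = None,
                         limit: Optional[int] = None) -> List[Dict[str, Any]]:
    """Hypothetical helper: the same stages count_documents assembles."""
    pipeline: List[Dict[str, Any]] = [{"$match": filter}]
    if skip is not None:
        pipeline.append({"$skip": skip})
    if limit is not None:
        pipeline.append({"$limit": limit})
    # $group has to receive every matching document to sum them, so even an
    # empty filter makes the server visit documents instead of reading a
    # stored collection size.
    pipeline.append({"$group": {"_id": None, "n": {"$sum": 1}}})
    return pipeline


def count_via_aggregate(collection: Any, filter: Dict[str, Any]) -> int:
    """Hypothetical helper: run the pipeline and unwrap the single result."""
    result = next(iter(collection.aggregate(build_count_pipeline(filter))), None)
    # When nothing matches, $group emits no document, hence the 0 fallback
    # (the same `if not result: return 0` branch PyMongo has).
    return 0 if result is None else result["n"]
```

With 60 million documents, `count_via_aggregate(db['TF'], {})` does the same amount of server-side work as `db['TF'].count_documents({})`, which is why neither returns quickly.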

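This command has the same behavior as the mongo shell's collection.countDocuments method.

That said, if you are willing to trade accuracy for performance, you can use the estimated_document_count method instead. It sends a count command to the database, with the same behavior as collection.estimatedDocumentCount. See collection.py#L1609-L1614: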

```python
if 'session' in kwargs:
    raise ConfigurationError(
        'estimated_document_count does not support sessions')
cmd = SON([('count', self.__name)])
cmd.update(kwargs)
return self._count(cmd)
```

self._count is the helper that sends the command to the server.
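In practice this suggests picking the method by whether a filter is involved. The sketch below assumes a PyMongo `Collection` (3.7 or later, where both `estimated_document_count()` and `count_documents()` exist); `fast_count` itself is a hypothetical helper, not a PyMongo API.

```python
from typing import Any, Dict, Optional


def fast_count(collection: Any, filter: Optional[Dict[str, Any]] = None) -> int:
    """Hypothetical helper: prefer the metadata-based count when possible.

    estimated_document_count() accepts no filter, so it only applies when
    the caller wants the size of the whole collection.
    """
    if not filter:
        # Sends a `count` command answered from collection metadata: fast,
        # but the result may be inaccurate (e.g. on sharded clusters).
        return collection.estimated_document_count()
    # A real filter needs an exact count, i.e. the aggregation path.
    return collection.count_documents(filter)
```

For the question's case, `fast_count(db['TF'])` would replace `db['TF'].count()` without relying on the deprecated method, while filtered counts still go through `count_documents`.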