Kafka console consumer error: "Offset commit failed on partition"

Question

I am using kafka-console-consumer to probe a Kafka topic.

Intermittently, I get the following error message, followed by two warnings:

[2018-05-01 18:14:38,888] ERROR [Consumer clientId=consumer-1, groupId=console-consumer-56648] Offset commit failed on partition my-topic-0 at offset 444: The coordinator is not aware of this member. (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator)
[2018-05-01 18:14:38,888] WARN [Consumer clientId=consumer-1, groupId=console-consumer-56648] Asynchronous auto-commit of offsets {my-topic-0=OffsetAndMetadata{offset=444, metadata=''}} failed: Commit cannot be completed since the group has already rebalanced and assigned the partitions to another member. This means that the time between subsequent calls to poll() was longer than the configured max.poll.interval.ms, which typically implies that the poll loop is spending too much time message processing. You can address this either by increasing the session timeout or by reducing the maximum size of batches returned in poll() with max.poll.records. (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator)
[2018-05-01 18:14:38,888] WARN [Consumer clientId=consumer-1, groupId=console-consumer-56648] Synchronous auto-commit of offsets {my-topic-0=OffsetAndMetadata{offset=447, metadata=''}} failed: Commit cannot be completed since the group has already rebalanced and assigned the partitions to another member. This means that the time between subsequent calls to poll() was longer than the configured max.poll.interval.ms, which typically implies that the poll loop is spending too much time message processing. You can address this either by increasing the session timeout or by reducing the maximum size of batches returned in poll() with max.poll.records. (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator)

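The warning log suggests the following:

This means that the time between subsequent calls to poll() was longer than the configured max.poll.interval.ms, which typically implies that the poll loop is spending too much time message processing. You can address this either by increasing the session timeout or by reducing the maximum size of batches returned in poll() with max.poll.records.

So it seems I need to either increase max.poll.interval.ms or decrease max.poll.records.

What does each approach imply, and which one is recommended in which situation?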

Nathan Walther · Accepted Answer

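Increasing max.poll.interval.ms says "it's OK to spend a long time processing a big set of records." You get better throughput if you can process larger batches more efficiently than smaller ones.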

Decreasing max.poll.records says "take fewer records so there is enough time to process them," and it favors latency over throughput.
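To make the trade-off concrete, here is a rough back-of-the-envelope sketch in Python. It assumes the consumer processes records one at a time at a roughly constant cost; the 800 ms per-record figure and the helper names are made up for illustration. The two constants are the Java consumer's defaults for max.poll.interval.ms and max.poll.records.

```python
# Defaults of the Kafka Java consumer.
DEFAULT_MAX_POLL_INTERVAL_MS = 300_000  # max.poll.interval.ms (5 minutes)
DEFAULT_MAX_POLL_RECORDS = 500          # max.poll.records


def worst_case_batch_ms(max_poll_records, per_record_ms):
    """Upper bound on the time spent processing one full batch from poll()."""
    return max_poll_records * per_record_ms


def fits(max_poll_records, per_record_ms, max_poll_interval_ms):
    """True if a full batch can be processed before the next poll() is due."""
    return worst_case_batch_ms(max_poll_records, per_record_ms) < max_poll_interval_ms


def largest_safe_batch(per_record_ms, max_poll_interval_ms, headroom=0.5):
    """Largest max.poll.records that uses at most `headroom` of the interval."""
    return max(1, int(max_poll_interval_ms * headroom / per_record_ms))


# Hypothetical workload: each record takes about 800 ms to process.
PER_RECORD_MS = 800

# With the defaults, a full batch needs 500 * 800 ms = 400 s, more than the 300 s allowed.
defaults_ok = fits(DEFAULT_MAX_POLL_RECORDS, PER_RECORD_MS, DEFAULT_MAX_POLL_INTERVAL_MS)

# Option A: keep the batch size, allow more time per batch (throughput-oriented).
option_a_ok = fits(DEFAULT_MAX_POLL_RECORDS, PER_RECORD_MS, 600_000)

# Option B: keep the interval, take fewer records per poll() (latency-oriented).
option_b_ok = fits(250, PER_RECORD_MS, DEFAULT_MAX_POLL_INTERVAL_MS)
```

Under these assumed numbers, either change alone brings the batch back inside the interval. Which one to pick depends on whether larger batches are actually cheaper per record for your workload.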

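Also consider that both may already be configured appropriately and that something else inside the poll loop is causing the performance problem. I would check that first, before changing the configuration, so you are not masking a bigger problem.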
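One way to check this in your own consumer code is to measure the gap between successive poll() calls and compare it with max.poll.interval.ms. The following is a minimal Python sketch of that idea; PollGapTracker is a hypothetical helper, not part of any Kafka client, and you would call before_poll() yourself right before each poll().

```python
import time


class PollGapTracker:
    """Records the longest gap between successive poll() calls (hypothetical helper)."""

    def __init__(self, max_poll_interval_ms, clock=time.monotonic):
        self.max_poll_interval_ms = max_poll_interval_ms
        self._clock = clock          # injectable so the tracker can be tested
        self._last = None            # time of the previous before_poll() call
        self.worst_gap_ms = 0.0

    def before_poll(self):
        """Call right before each poll(); returns the gap since the last call in ms."""
        now = self._clock()
        gap_ms = 0.0
        if self._last is not None:
            gap_ms = (now - self._last) * 1000.0
            self.worst_gap_ms = max(self.worst_gap_ms, gap_ms)
        self._last = now
        return gap_ms

    def exceeded(self):
        """True if any observed gap was longer than max.poll.interval.ms."""
        return self.worst_gap_ms > self.max_poll_interval_ms
```

The measured gap includes the time spent inside poll() itself, so it slightly overestimates the processing time. If the worst gap stays well under the limit and the commits still fail, the cause is probably somewhere other than these two settings.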