
How to optimize a keyset pagination query that uses CTEs on a large table

I tried to read up on this topic as much as I could before coming here to bother you, but here I am anyway.

I want to implement keyset pagination on this table:

create table api.subscription (
    subscription_id uuid primary key,
    token_id uuid not null,
    product_id uuid not null references api.product(product_id) deferrable,
    spid bigint null,
    attributes_snapshot jsonb not null,
    created_at timestamp not null,
    refreshed_at timestamp,
    enriched_at timestamp null,
    valid_until timestamp not null,
    is_cancelled boolean not null,
    has_been_expired boolean not null,
    has_quality_data boolean not null
);

To do so, I use this query to prepare the pagination metadata:

with book as (
    select created_at, subscription_id
    from api.subscription
    where token_id = $1
    and refreshed_at >= $2
    and valid_until >= now()
    and not is_cancelled
    and not has_been_expired
),
total as (
    select count(*) as total from book
),
page as (
    select * from book
    where (case when $4 is null then true else (created_at > $4 or (created_at = $4 and subscription_id > $5)) end)
    order by created_at asc, subscription_id asc
    limit $3
),
last_row as (
    select last_value(created_at) over() as last_seen_created_at,
    last_value(subscription_id) over() as last_seen_subscription_id
    from page
),
ids as (
    select array_agg(subscription_id) ids from page
)
select * from last_row, total, ids

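This gives me the total number of items the user is currently paginating through, the last seen key (for the next page), and the IDs of the current page (to be used in an IN clause, or with any, but that's another story).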

The problem is the time it takes when the table contains 19 million rows.

 Nested Loop  (cost=50000.14..50000.22 rows=1 width=64) (actual time=143.677..143.677 rows=0 loops=1)
   Output: last_row.last_seen_created_at, last_row.last_seen_subscription_id, total.total, ids.ids
   Buffers: shared hit=395 read=24611
   CTE book
     ->  Seq Scan on api.subscription  (cost=0.00..50000.00 rows=1 width=24) (actual time=143.646..143.646 rows=0 loops=1)
           Output: subscription.created_at, subscription.subscription_id
           Filter: ((NOT subscription.is_cancelled) AND (NOT subscription.has_been_expired) AND (subscription.token_id = '73759739-7af5-4d28-91bc-897c959bddcd'::uuid) AND (subscription.valid_until >= now()) AND (subscription.refreshed_at >= (now() - '4 days'::interval)))
           Rows Removed by Filter: 1000000
           Buffers: shared hit=389 read=24611
   CTE total
     ->  Aggregate  (cost=0.02..0.03 rows=1 width=8) (never executed)
           Output: count(*)
           ->  CTE Scan on book  (cost=0.00..0.02 rows=1 width=0) (never executed)
                 Output: book.created_at, book.subscription_id
   CTE page
     ->  Limit  (cost=0.03..0.04 rows=1 width=24) (actual time=143.674..143.674 rows=0 loops=1)
           Output: book_1.created_at, book_1.subscription_id
           Buffers: shared hit=395 read=24611
           ->  Sort  (cost=0.03..0.04 rows=1 width=24) (actual time=143.673..143.673 rows=0 loops=1)
                 Output: book_1.created_at, book_1.subscription_id
                 Sort Key: book_1.created_at, book_1.subscription_id
                 Sort Method: quicksort  Memory: 25kB
                 Buffers: shared hit=395 read=24611
                 ->  CTE Scan on book book_1  (cost=0.00..0.02 rows=1 width=24) (actual time=143.646..143.646 rows=0 loops=1)
                       Output: book_1.created_at, book_1.subscription_id
                       Buffers: shared hit=389 read=24611
   CTE last_row
     ->  WindowAgg  (cost=0.00..0.03 rows=1 width=24) (actual time=143.675..143.675 rows=0 loops=1)
           Output: last_value(page.created_at) OVER (?), last_value(page.subscription_id) OVER (?)
           Buffers: shared hit=395 read=24611
           ->  CTE Scan on page  (cost=0.00..0.02 rows=1 width=24) (actual time=143.674..143.674 rows=0 loops=1)
                 Output: page.created_at, page.subscription_id
                 Buffers: shared hit=395 read=24611
   CTE ids
     ->  Aggregate  (cost=0.02..0.03 rows=1 width=32) (never executed)
           Output: array_agg(page_1.subscription_id)
           ->  CTE Scan on page page_1  (cost=0.00..0.02 rows=1 width=16) (never executed)
                 Output: page_1.created_at, page_1.subscription_id
   ->  Nested Loop  (cost=0.00..0.05 rows=1 width=32) (actual time=143.677..143.677 rows=0 loops=1)
         Output: last_row.last_seen_created_at, last_row.last_seen_subscription_id, total.total
         Buffers: shared hit=395 read=24611
         ->  CTE Scan on last_row  (cost=0.00..0.02 rows=1 width=24) (actual time=143.676..143.676 rows=0 loops=1)
               Output: last_row.last_seen_created_at, last_row.last_seen_subscription_id
               Buffers: shared hit=395 read=24611
         ->  CTE Scan on total  (cost=0.00..0.02 rows=1 width=8) (never executed)
               Output: total.total
   ->  CTE Scan on ids  (cost=0.00..0.02 rows=1 width=32) (never executed)
         Output: ids.ids
 Planning time: 0.580 ms
 Execution time: 143.786 ms

Note: this example has only 1 million rows, but with more than 19 million rows the query takes over 20 seconds.
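For anyone who wants to reproduce a plan of this shape: the Output and Buffers lines above are what PostgreSQL prints for EXPLAIN with the ANALYZE, VERBOSE and BUFFERS options. The following is only a minimal sketch that runs the filter of the book CTE on its own, using the literal values visible in the plan's Filter line in place of $1 and $2; it is not the exact statement that produced the plan above.

```sql
-- Sketch: inspect how the "book" filter alone is executed.
-- The uuid and the 4-day interval are copied from the Filter line of the plan.
explain (analyze, verbose, buffers)
select created_at, subscription_id
from api.subscription
where token_id = '73759739-7af5-4d28-91bc-897c959bddcd'
and refreshed_at >= now() - interval '4 days'
and valid_until >= now()
and not is_cancelled
and not has_been_expired;
```

If this statement alone still shows a Seq Scan, the cost is coming from the filter itself rather than from the pagination CTEs layered on top of it.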

We tried various index configurations, with no luck:

Indexes:
    "subscription_pkey" PRIMARY KEY, btree (subscription_id)
    "has_been_expired_idx" btree (has_been_expired)
    "is_cancelled_idx" btree (is_cancelled)
    "pagination_idx" btree (token_id, refreshed_at, valid_until, is_cancelled, has_been_expired)
    "refreshed_at_idx" btree (refreshed_at)
    "token_and_created_at_idx" btree (token_id, created_at)
    "token_and_refreshed_at_idx" btree (token_id, refreshed_at)
    "token_id_idx" btree (token_id)
    "valid_until_idx" btree (valid_until)
Foreign-key constraints:
    "subscription_product_id_fkey" FOREIGN KEY (product_id) REFERENCES api.product(product_id) DEFERRABLE

So my questions are:

  • Is there a way to improve this query by finding the right index?
  • Is this a good approach in the first place?

Any hints are welcome :) I hope my question makes sense.

Thanks for reading.

Florian Klein

Without the total, it can be rewritten as follows:

with page as (
    select created_at, subscription_id
    from api.subscription
    where token_id = $1
    and refreshed_at >= $2
    and valid_until >= now()
    and not is_cancelled
    and not has_been_expired
    and (case when $4 is null then true else (created_at > $4 or (created_at = $4 and subscription_id > $5)) end)
    order by created_at asc, subscription_id asc
    limit $3
),
last_row as (
    select last_value(created_at) over() as last_seen_created_at,
    last_value(subscription_id) over() as last_seen_subscription_id
    from page
),
ids as (
    select array_agg(subscription_id) ids from page
)
select * from last_row, ids
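A possible variation on the page query (not part of the original answer): PostgreSQL supports row-value comparison, so the keyset condition can be written as (created_at, subscription_id) > ($4, $5). For non-null values this is equivalent to the expanded "or" form, and it is usually easier for the planner to match against a btree index whose trailing columns are (created_at, subscription_id). A minimal sketch, assuming the same parameters $1 to $5 as above and assuming the first page (where $4 is null) is served by a separate query without the row comparison, because comparing against a null key would match no rows:

```sql
-- Sketch: next-page query using a row-value comparison.
-- $4 / $5 are the last seen created_at / subscription_id of the previous page.
select created_at, subscription_id
from api.subscription
where token_id = $1
and refreshed_at >= $2
and valid_until >= now()
and not is_cancelled
and not has_been_expired
and (created_at, subscription_id) > ($4, $5)
order by created_at asc, subscription_id asc
limit $3;
```

Whether this actually changes the plan depends on the available indexes and the PostgreSQL version, so it is worth comparing both forms with explain analyze.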

A compound index on is_cancelled, has_been_expired, token_id, and one of refreshed_at, valid_until, or created_at would help.
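As a sketch, such an index could look like the first statement below, with created_at chosen as the trailing column so that it matches the order by of the page query. The index names are hypothetical, and adding subscription_id as a tie-breaker column is an assumption that goes beyond the suggestion above. The second statement is an alternative that is also an assumption: a partial index that moves the two boolean conditions into its where clause instead of its key.

```sql
-- Sketch 1: compound index as suggested, plus subscription_id (assumption)
-- so the index order matches "order by created_at, subscription_id".
create index subscription_keyset_idx
    on api.subscription (is_cancelled, has_been_expired, token_id, created_at, subscription_id);

-- Sketch 2 (alternative, assumption): partial index covering only rows
-- that are neither cancelled nor expired.
create index subscription_active_keyset_idx
    on api.subscription (token_id, created_at, subscription_id)
    where not is_cancelled and not has_been_expired;
```

Only one of the two should be needed. Checking the plan with explain analyze after creating either one shows whether the planner actually picks it up for the page query.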

Jasen