web-dev-qa-db-ja.com

nginx on ZFS on Linux with 16M recordsize: too much ARC fill, high I/O but low throughput

An Nginx setup on CentOS 7.7 with a 2x10Gbps bonded network, serving large static files (100MB-16GB). ZFS on Linux is used.

  • Pool size 50TB on 8x8TB disks
  • Max ARC size 65GB
  • L2ARC 1TB NVMe
  • recordsize=16M
  • ashift=12
  • nginx: sendfile off
  • nginx: aio on
  • nginx: output_buffers 1 128k

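The system has been running for several days. Too much CPU is being spent on filling the ARC. The disks are busy at 600MB/s, but nginx throughput is under 2Gbps and the L2ARC hit rate is very low. Any ideas?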
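To put the gap in numbers, here is a quick back-of-the-envelope check with the figures above. It assumes decimal units and treats 600MB/s as the aggregate read rate from the pool; since nginx is *below* 2Gbps, the resulting ratio is a lower bound.

```python
# Back-of-the-envelope check using the figures quoted in the question.
# Assumptions: decimal units (1 MB = 10**6 bytes, 1 Gbit = 10**9 bits), and
# 600 MB/s is the aggregate read rate from the pool.
disk_read_mb_s = 600.0   # disks busy at ~600 MB/s
network_gbit_s = 2.0     # nginx throughput is below this, so it is an upper bound

network_mb_s = network_gbit_s * 1000 / 8        # Gbit/s -> MB/s
amplification = disk_read_mb_s / network_mb_s   # bytes read per byte sent (lower bound)

print(f"network: at most {network_mb_s:.0f} MB/s")
print(f"disk reads are at least {amplification:.1f}x the bytes sent")
```

So at least 2.4 bytes come off the disks for every byte nginx sends, before even counting the requests that are served straight from the ARC.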

The arc_summary output and the perf report are below.

ZFS Subsystem Report                            Wed May 20 12:27:46 2020
ARC Summary: (HEALTHY)
        Memory Throttle Count:                  0

ARC Misc:
        Deleted:                                1.84m
        Mutex Misses:                           157.78k
        Evict Skips:                            157.78k

ARC Size:                               102.54% 66.97   GiB
        Target Size: (Adaptive)         100.00% 65.32   GiB
        Min Size (Hard Limit):          92.87%  60.66   GiB
        Max Size (High Water):          1:1     65.32   GiB

ARC Size Breakdown:
        Recently Used Cache Size:       46.89%  31.40   GiB
        Frequently Used Cache Size:     53.11%  35.57   GiB

ARC Hash Breakdown:
        Elements Max:                           159.31k
        Elements Current:               97.44%  155.23k
        Collisions:                             11.76k
        Chain Max:                              2
        Chains:                                 779

ARC Total accesses:                                     446.46m
        Cache Hit Ratio:                99.29%  443.29m
        Cache Miss Ratio:               0.71%   3.17m
        Actual Hit Ratio:               99.29%  443.29m

        Data Demand Efficiency:         99.28%  402.73m

        CACHE HITS BY CACHE LIST:
          Most Recently Used:           5.99%   26.57m
          Most Frequently Used:         94.01%  416.71m
          Most Recently Used Ghost:     0.00%   9.65k
          Most Frequently Used Ghost:   0.28%   1.26m

        CACHE HITS BY DATA TYPE:
          Demand Data:                  90.19%  399.81m
          Prefetch Data:                0.00%   0
          Demand Metadata:              9.81%   43.47m
          Prefetch Metadata:            0.00%   1.82k

        CACHE MISSES BY DATA TYPE:
          Demand Data:                  91.77%  2.91m
          Prefetch Data:                0.00%   0
          Demand Metadata:              7.85%   249.26k
          Prefetch Metadata:            0.38%   12.12k

L2 ARC Summary: (HEALTHY)
        Low Memory Aborts:                      0
        Free on Write:                          3
        R/W Clashes:                            0
        Bad Checksums:                          0
        IO Errors:                              0

L2 ARC Size: (Adaptive)                         458.07  GiB
        Compressed:                     99.60%  456.23  GiB
        Header Size:                    0.00%   5.34    MiB

L2 ARC Breakdown:                               3.17m
        Hit Ratio:                      15.02%  476.70k
        Miss Ratio:                     84.98%  2.70m
        Feeds:                                  55.31k

L2 ARC Writes:
        Writes Sent:                    100.00% 55.27k



ZFS Tunable:
        metaslab_debug_load                               0
        zfs_multihost_interval                            1000
        zfs_vdev_default_ms_count                         200
        zfetch_max_streams                                8
        zfs_nopwrite_enabled                              1
        zfetch_min_sec_reap                               2
        zfs_dbgmsg_enable                                 1
        zfs_dirty_data_max_max_percent                    25
        zfs_abd_scatter_enabled                           1
        zfs_remove_max_segment                            16777216
        zfs_deadman_ziotime_ms                            300000
        spa_load_verify_data                              1
        zfs_zevent_cols                                   80
        zfs_obsolete_min_time_ms                          500
        zfs_dirty_data_max_percent                        40
        zfs_vdev_mirror_non_rotating_inc                  0
        zfs_resilver_disable_defer                        0
        zfs_sync_pass_dont_compress                       8
        zvol_volmode                                      1
        l2arc_write_max                                   8388608
        zfs_disable_ivset_guid_check                      0
        zfs_vdev_scrub_max_active                         128
        zfs_vdev_sync_write_min_active                    64
        zvol_prefetch_bytes                               131072
        zfs_send_unmodified_spill_blocks                  1
        metaslab_aliquot                                  524288
        zfs_no_scrub_prefetch                             0
        zfs_abd_scatter_max_order                         10
        zfs_arc_shrink_shift                              0
        zfs_vdev_queue_depth_pct                          1000
        zfs_txg_history                                   100
        zfs_vdev_removal_max_active                       2
        zil_maxblocksize                                  131072
        metaslab_force_ganging                            16777217
        zfs_delay_scale                                   500000
        zfs_free_bpobj_enabled                            1
        zfs_vdev_async_write_active_min_dirty_percent     30
        metaslab_debug_unload                             1
        zfs_read_history                                  0
        zfs_vdev_initializing_max_active                  1
        zvol_max_discard_blocks                           16384
        zfs_recover                                       0
        zfs_scan_fill_weight                              3
        spa_load_print_vdev_tree                          0
        zfs_key_max_salt_uses                             400000000
        zfs_metaslab_segment_weight_enabled               1
        zfs_dmu_offset_next_sync                          0
        l2arc_headroom                                    2
        zfs_deadman_synctime_ms                           600000
        zfs_dirty_data_sync_percent                       20
        zfs_free_min_time_ms                              1000
        zfs_dirty_data_max                                4294967296
        zfs_vdev_async_read_min_active                    64
        dbuf_metadata_cache_max_bytes                     314572800
        zfs_mg_noalloc_threshold                          0
        zfs_dedup_prefetch                                0
        dbuf_cache_lowater_pct                            10
        zfs_slow_io_events_per_second                     20
        zfs_vdev_max_active                               1000
        l2arc_write_boost                                 8388608
        zfs_resilver_min_time_ms                          3000
        zfs_max_missing_tvds                              0
        zfs_vdev_async_write_max_active                   10
        zvol_request_sync                                 0
        zfs_async_block_max_blocks                        100000
        metaslab_df_max_search                            16777216
        zfs_prefetch_disable                              1
        metaslab_lba_weighting_enabled                    1
        zio_dva_throttle_enabled                          1
        metaslab_df_use_largest_segment                   0
        zfs_vdev_trim_max_active                          2
        zfs_unlink_suspend_progress                       0
        zfs_sync_taskq_batch_pct                          75
        zfs_arc_min_prescient_prefetch_ms                 0
        zfs_scan_max_ext_gap                              2097152
        zfs_initialize_value                              16045690984833335022
        zfs_mg_fragmentation_threshold                    95
        zil_nocacheflush                                  0
        l2arc_feed_again                                  1
        zfs_trim_metaslab_skip                            0
        zfs_zevent_console                                0
        zfs_immediate_write_sz                            32768
        zfs_condense_indirect_commit_entry_delay_ms       0
        zfs_dbgmsg_maxsize                                4194304
        zfs_trim_extent_bytes_max                         134217728
        zfs_trim_extent_bytes_min                         32768
        zfs_user_indirect_is_special                      1
        zfs_lua_max_instrlimit                            100000000
        zfs_free_leak_on_eio                              0
        zfs_special_class_metadata_reserve_pct            25
        zfs_deadman_enabled                               1
        dmu_object_alloc_chunk_shift                      7
        vdev_validate_skip                                0
        zfs_commit_timeout_pct                            5
        zfs_arc_meta_limit_percent                        75
        metaslab_bias_enabled                             1
        zfs_send_queue_length                             16777216
        zfs_arc_p_dampener_disable                        1
        zfs_object_mutex_size                             64
        zfs_metaslab_fragmentation_threshold              70
        zfs_delete_blocks                                 20480
        zfs_arc_dnode_limit_percent                       10
        zfs_no_scrub_io                                   0
        zfs_dbuf_state_index                              0
        zio_deadman_log_all                               0
        zfs_vdev_sync_read_min_active                     64
        zfs_deadman_checktime_ms                          60000
        metaslab_fragmentation_factor_enabled             1
        zfs_override_estimate_recordsize                  0
        zfs_multilist_num_sublists                        0
        zvol_inhibit_dev                                  0
        zfs_scan_legacy                                   0
        zfetch_max_distance                               16777216
        zap_iterate_prefetch                              1
        zfs_scan_strict_mem_lim                           0
        zfs_vdev_async_write_active_max_dirty_percent     60
        zfs_scan_checkpoint_intval                        7200
        dmu_prefetch_max                                  134217728
        zfs_recv_queue_length                             16777216
        zfs_vdev_mirror_rotating_seek_inc                 5
        dbuf_cache_shift                                  5
        dbuf_metadata_cache_shift                         6
        zfs_condense_min_mapping_bytes                    131072
        zfs_vdev_cache_size                               0
        spa_config_path                                   /etc/zfs/zpool.cache
        zfs_dirty_data_max_max                            4294967296
        zfs_arc_lotsfree_percent                          10
        zfs_vdev_ms_count_limit                           131072
        zfs_zevent_len_max                                1024
        zfs_checksum_events_per_second                    20
        zfs_arc_sys_free                                  0
        zfs_scan_issue_strategy                           0
        zfs_arc_meta_strategy                             1
        zfs_condense_max_obsolete_bytes                   1073741824
        zfs_vdev_cache_bshift                             16
        zfs_compressed_arc_enabled                        1
        zfs_arc_meta_adjust_restarts                      4096
        zfs_max_recordsize                                16777216
        zfs_vdev_scrub_min_active                         48
        zfs_zil_clean_taskq_maxalloc                      1048576
        zfs_lua_max_memlimit                              104857600
        zfs_vdev_raidz_impl                               cycle [fastest] original scalar sse2 ssse3
        zfs_per_txg_dirty_frees_percent                   5
        zfs_vdev_read_gap_limit                           32768
        zfs_scan_vdev_limit                               4194304
        zfs_zil_clean_taskq_minalloc                      1024
        zfs_multihost_history                             0
        zfs_scan_mem_lim_fact                             20
        zfs_arc_meta_limit                                0
        spa_load_verify_shift                             4
        zfs_vdev_sync_write_max_active                    128
        l2arc_norw                                        0
        zfs_arc_meta_prune                                10000
        zfs_vdev_removal_min_active                       1
        metaslab_preload_enabled                          1
        dbuf_cache_max_bytes                              629145600
        zfs_vdev_mirror_non_rotating_seek_inc             1
        zfs_spa_discard_memory_limit                      16777216
        zfs_vdev_initializing_min_active                  1
        zvol_major                                        230
        zfs_vdev_aggregation_limit                        1048576
        zfs_flags                                         0
        zfs_vdev_mirror_rotating_seek_offset              1048576
        spa_asize_inflation                               24
        zfs_admin_snapshot                                0
        l2arc_feed_secs                                   1
        vdev_removal_max_span                             32768
        zfs_trim_txg_batch                                32
        zfs_multihost_fail_intervals                      10
        zfs_abd_scatter_min_size                          1536
        zio_taskq_batch_pct                               75
        zfs_sync_pass_deferred_free                       2
        zfs_arc_min_prefetch_ms                           0
        zvol_threads                                      32
        zfs_condense_indirect_vdevs_enable                1
        zfs_arc_grow_retry                                0
        zfs_multihost_import_intervals                    20
        zfs_read_history_hits                             0
        zfs_vdev_min_ms_count                             16
        zfs_zil_clean_taskq_nthr_pct                      100
        zfs_vdev_async_write_min_active                   2
        zfs_vdev_async_read_max_active                    128
        zfs_vdev_aggregate_trim                           0
        zfs_delay_min_dirty_percent                       60
        zfs_vdev_cache_max                                16384
        zfs_removal_suspend_progress                      0
        zfs_vdev_trim_min_active                          1
        zfs_scan_mem_lim_soft_fact                        20
        ignore_hole_birth                                 1
        spa_slop_shift                                    5
        zfs_vdev_write_gap_limit                          4096
        dbuf_cache_hiwater_pct                            10
        spa_load_verify_metadata                          1
        l2arc_noprefetch                                  1
        send_holes_without_birth_time                     1
        zfs_vdev_mirror_rotating_inc                      0
        zfs_arc_dnode_reduce_percent                      10
        zfs_arc_pc_percent                                0
        zfs_metaslab_switch_threshold                     2
        zfs_vdev_scheduler                                deadline
        zil_slog_bulk                                     786432
        zfs_expire_snapshot                               300
        zfs_sync_pass_rewrite                             2
        zil_replay_disable                                0
        zfs_nocacheflush                                  0
        zfs_vdev_aggregation_limit_non_rotating           131072
        zfs_arc_max                                       70132659200
        zfs_arc_min                                       65132659200
        zfs_read_chunk_size                               1048576
        zfs_txg_timeout                                   5
        zfs_trim_queue_limit                              10
        zfs_arc_dnode_limit                               0
        zfs_scan_ignore_errors                            0
        zfs_pd_bytes_max                                  52428800
        zfs_scrub_min_time_ms                             1000
        l2arc_headroom_boost                              200
        zfs_send_corrupt_data                             0
        l2arc_feed_min_ms                                 200
        zfs_arc_meta_min                                  0
        zfs_arc_average_blocksize                         8192
        zfetch_array_rd_sz                                1048576
        zfs_autoimport_disable                            1
        zio_slow_io_ms                                    30000
        zfs_arc_p_min_shift                               0
        zio_requeue_io_start_cut_in_line                  1
        zfs_removal_ignore_errors                         0
        zfs_scan_suspend_progress                         0
        zfs_vdev_sync_read_max_active                     128
        zfs_deadman_failmode                              wait
        zfs_reconstruct_indirect_combinations_max         4096
        zfs_ddt_data_is_special                           1

[Image: perf report for the nginx process]

raitech

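What size are the I/Os your clients are requesting? My gut feeling is that the recordsize is too large and is causing massive read amplification. Even when a client fetches a smaller block, ZFS has to read the entire 16MB record to verify its checksum. The ARC and L2ARC are designed to resist caching sequential I/O, because caching it brings little benefit. And since whole records have to be cached, the ARC can only hold about 4000 of them at most.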
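The arithmetic behind this can be sketched as follows. It is a rough model, not a measurement: it assumes nginx reads in 128 KiB chunks (from `output_buffers 1 128k`), ignores ARC compression and header overhead, and the names `arc_capacity_records` and `read_amplification` are made up for illustration.

```python
# Rough numbers: how many whole records fit in the ARC, and how much is read
# from disk for one small client read that misses the cache.
# Assumptions: 128 KiB reads per request; compression and ARC headers ignored.
KiB = 1024
MiB = 1024 * KiB

ARC_MAX = 70_132_659_200   # zfs_arc_max from the tunables above (~65.32 GiB)
CLIENT_READ = 128 * KiB    # hypothetical per-request read size


def arc_capacity_records(recordsize: int) -> int:
    """Number of whole records of `recordsize` bytes that fit in ARC_MAX."""
    return ARC_MAX // recordsize


def read_amplification(recordsize: int, request: int = CLIENT_READ) -> float:
    """Bytes read from disk per byte requested when `request` falls inside a
    single uncached record: ZFS reads and checksums the whole record."""
    return max(recordsize, request) / request


for rs in (16 * MiB, 1 * MiB):
    print(f"recordsize {rs // MiB} MiB: "
          f"{arc_capacity_records(rs)} records fit in ARC, "
          f"worst case {read_amplification(rs):.0f}x read amplification")
```

This gives 4180 records at 16 MiB against 66883 at 1 MiB, and a worst case of 128x against 8x. The worst case applies when a record has been evicted before nginx asks for its next chunk, which becomes more likely when only a few thousand records fit and many large files are streamed at once.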

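Reduce the recordsize to 1MB, then cp + mv a few of the most frequently used files back into place. A recordsize change only applies to newly written data, so the copy is what actually rewrites those files with 1MB records. Then check whether disk I/O and network I/O line up more closely.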
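A minimal Python sketch of the cp + mv step, assuming the dataset's recordsize has already been lowered (for example with `zfs set recordsize=1M <pool>/<dataset>`, names yours to fill in). It copies with a plain read/write loop into a temporary file in the same directory, then renames it over the original. The helper name `rewrite_in_place` is hypothetical; ownership is not copied, and clients that already have the file open keep reading the old copy until they reopen it.

```python
import os
import shutil
import tempfile

CHUNK = 1 << 20  # copy in 1 MiB pieces


def rewrite_in_place(path: str) -> None:
    """cp + mv: rewrite `path` so its data is written out again.

    Assumes the dataset's recordsize was already changed; ZFS only applies
    the new recordsize to data written after the change.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # Temporary file in the same directory (same filesystem), so the final
    # rename is an atomic metadata operation rather than another copy.
    fd, tmp = tempfile.mkstemp(dir=directory, prefix=".rewrite-")
    try:
        with os.fdopen(fd, "wb") as dst, open(path, "rb") as src:
            # Plain read/write loop so every byte really is written again.
            while chunk := src.read(CHUNK):
                dst.write(chunk)
            dst.flush()
            os.fsync(dst.fileno())
        shutil.copystat(path, tmp)  # permission bits and timestamps, not owner
        os.replace(tmp, path)       # the "mv" step
    except BaseException:
        # Remove the partial copy; the original file is left untouched.
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise
```

For example, `for p in popular_files: rewrite_in_place(p)`, where `popular_files` is your own list of the most-requested paths.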

Gordan Bobic