
High-availability PostgreSQL with Patroni

I am following this guide to set up a Postgres HA lab.

I followed the guide exactly (changing only the IP addresses to match my environment), and everything works on Postgres server 1.

The trouble starts with the patroni.yml setup on Postgres server 2.

The guide uses the same patroni.yml on both Postgres servers, but after I restart the patroni service, this is what I get on server 1:

quanlm@DB1:~$ sudo service patroni status
● patroni.service - Runners to orchestrate a high-availability PostgreSQL
   Loaded: loaded (/etc/systemd/system/patroni.service; disabled; vendor preset: enabled)
   Active: active (running) since Tue 2019-11-12 07:35:33 UTC; 14min ago
 Main PID: 411 (patroni)
    Tasks: 12
   Memory: 77.6M
      CPU: 4.041s
   CGroup: /system.slice/patroni.service
           ├─411 /usr/bin/python3 /usr/local/bin/patroni /etc/patroni.yml
           ├─431 postgres -D /data/patroni --config-file=/data/patroni/postgresql.conf --listen_addresses=192.168.122.77 --max_prepared_tran
           ├─435 postgres: postgres: checkpointer process                                                                                   
           ├─436 postgres: postgres: writer process                                                                                         
           ├─439 postgres: postgres: stats collector process                                                                                
           ├─447 postgres: postgres: postgres postgres 192.168.122.77(49984) idle                                                           
           ├─455 postgres: postgres: wal writer process                                                                                     
           └─456 postgres: postgres: autovacuum launcher process                                                                            

Nov 12 07:49:28 DB1 patroni[411]: 2019-11-12 07:49:28,533 INFO: no action.  i am the leader with the lock
Nov 12 07:49:38 DB1 patroni[411]: 2019-11-12 07:49:38,459 INFO: Lock owner: postgresql0; I am postgresql0
Nov 12 07:49:38 DB1 patroni[411]: 2019-11-12 07:49:38,536 INFO: no action.  i am the leader with the lock
Nov 12 07:49:48 DB1 patroni[411]: 2019-11-12 07:49:48,459 INFO: Lock owner: postgresql0; I am postgresql0
Nov 12 07:49:48 DB1 patroni[411]: 2019-11-12 07:49:48,544 INFO: no action.  i am the leader with the lock
Nov 12 07:49:58 DB1 patroni[411]: 2019-11-12 07:49:58,458 INFO: Lock owner: postgresql0; I am postgresql0
Nov 12 07:49:58 DB1 patroni[411]: 2019-11-12 07:49:58,548 INFO: no action.  i am the leader with the lock
Nov 12 07:50:08 DB1 patroni[411]: 2019-11-12 07:50:08,457 INFO: Lock owner: postgresql0; I am postgresql0
Nov 12 07:50:08 DB1 patroni[411]: 2019-11-12 07:50:08,539 INFO: no action.  i am the leader with the lock
Nov 12 07:50:19 DB1 patroni[411]: 2019-11-12 07:50:19,949 INFO: acquired session lock as a leader

Server 1 is fine, but on server 2:

quanlm@DB2:~$ sudo service patroni status
● patroni.service - Runners to orchestrate a high-availability PostgreSQL
   Loaded: loaded (/etc/systemd/system/patroni.service; disabled; vendor preset: enabled)
   Active: failed (Result: exit-code) since Tue 2019-11-12 07:50:02 UTC; 2s ago
  Process: 9514 ExecStart=/usr/local/bin/patroni /etc/patroni.yml (code=exited, status=1/FAILURE)
 Main PID: 9514 (code=exited, status=1/FAILURE)

Nov 12 07:50:02 DB2 patroni[9514]:   File "/usr/lib/python3.5/socketserver.py", line 440, in __init__
Nov 12 07:50:02 DB2 patroni[9514]:     self.server_bind()
Nov 12 07:50:02 DB2 patroni[9514]:   File "/usr/lib/python3.5/http/server.py", line 138, in server_bind
Nov 12 07:50:02 DB2 patroni[9514]:     socketserver.TCPServer.server_bind(self)
Nov 12 07:50:02 DB2 patroni[9514]:   File "/usr/lib/python3.5/socketserver.py", line 454, in server_bind
Nov 12 07:50:02 DB2 patroni[9514]:     self.socket.bind(self.server_address)
Nov 12 07:50:02 DB2 patroni[9514]: OSError: [Errno 99] Cannot assign requested address
Nov 12 07:50:02 DB2 systemd[1]: patroni.service: Main process exited, code=exited, status=1/FAILURE
Nov 12 07:50:02 DB2 systemd[1]: patroni.service: Unit entered failed state.
Nov 12 07:50:02 DB2 systemd[1]: patroni.service: Failed with result 'exit-code'.

So Patroni does not start on server 2.

I have allowed remote connections on both servers by setting listen_addresses = '*' in postgresql.conf and adding

host all all 0.0.0.0/0 md5

to pg_hba.conf.

As a result, once HAProxy was running, the second server did not take over when the first server went down.

The problem is clearly with Patroni on server 2, but how do I fix it?

Failing that, is there any other workaround for an HA PostgreSQL server?

P.S. Firewall settings:

quanlm@DB1:~$ sudo ufw status
Status: inactive
quanlm@DB1:~$ sudo iptables -L
Chain INPUT (policy ACCEPT)
target     prot opt source               destination         

Chain FORWARD (policy ACCEPT)
target     prot opt source               destination         

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination  
quanlm@DB2:~$ sudo ufw status
Status: inactive
quanlm@DB2:~$ sudo iptables -L
Chain INPUT (policy ACCEPT)
target     prot opt source               destination         

Chain FORWARD (policy ACCEPT)
target     prot opt source               destination         

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination  

P.P.S. My patroni.yml:

quanlm@DB1:~$ cat /etc/patroni.yml 
scope: postgres
namespace: /db/
name: postgresql0

restapi:
    listen: 192.168.122.77:8008
    connect_address: 192.168.122.77:8008

etcd:
    host: 192.168.122.156:2379

bootstrap:
    dcs:
        ttl: 30
        loop_wait: 10
        retry_timeout: 10
        maximum_lag_on_failover: 1048576
        postgresql:
            use_pg_rewind: true

    initdb:
    - encoding: UTF8
    - data-checksums

    pg_hba:
    - host replication replicator 127.0.0.1/32 md5
    - host replication replicator 192.168.122.77/0 md5
    - host replication replicator 192.168.122.240/0 md5
    - host all all 0.0.0.0/0 md5

    users:
        admin:
            password: admin
            options:
                - createrole
                - createdb

postgresql:
    listen: 192.168.122.77:5432
    connect_address: 192.168.122.77:5432
    data_dir: /data/patroni
    pgpass: /tmp/pgpass
    authentication:
        replication:
            username: replicator
            password: password
        superuser:
            username: postgres
            password: password
    parameters:
        unix_socket_directories: '.'

tags:
    nofailover: false
    noloadbalance: false
    clonefrom: false
    nosync: false

Yes, this file is the same on both servers.

Update #1


In patroni.yml I changed name: postgresql0 -> name: postgresql1

The REST API is now set to the host IP 192.168.122.240.

However, with

postgresql:
    listen: 192.168.122.77:5432
    connect_address: 192.168.122.77:5432

this problem occurred:

quanlm@DB2:~$ sudo service patroni status
● patroni.service - Runners to orchestrate a high-availability PostgreSQL
   Loaded: loaded (/etc/systemd/system/patroni.service; disabled; vendor preset: enabled)
   Active: active (running) since Wed 2019-11-13 02:17:16 UTC; 25s ago
 Main PID: 32363 (patroni)
    Tasks: 6
   Memory: 45.7M
      CPU: 6.326s
   CGroup: /system.slice/patroni.service
           ├─ 1014 postgres -D /data/patroni --config-file=/data/patroni/postgresql.conf --port=5432 --wal_level=hot_standby --max_wal_senders=10 --cluster
           └─32363 /usr/bin/python3 /usr/local/bin/patroni /etc/patroni.yml

Nov 13 02:17:39 DB2 patroni[32363]: 192.168.122.77:5432 - accepting connections
Nov 13 02:17:39 DB2 patroni[32363]: 192.168.122.77:5432 - accepting connections
Nov 13 02:17:39 DB2 patroni[32363]: 2019-11-13 02:17:39,940 INFO: Lock owner: postgresql0; I am postgresql1
Nov 13 02:17:39 DB2 patroni[32363]: 2019-11-13 02:17:39,940 INFO: does not have lock
Nov 13 02:17:39 DB2 patroni[32363]: 2019-11-13 02:17:39,940 INFO: establishing a new patroni connection to the postgres cluster
Nov 13 02:17:40 DB2 patroni[32363]: LOG:  could not bind IPv4 socket: Cannot assign requested address
Nov 13 02:17:40 DB2 patroni[32363]: HINT:  Is another postmaster already running on port 5432? If not, wait a few seconds and retry.
Nov 13 02:17:40 DB2 patroni[32363]: WARNING:  could not create listen socket for "192.168.122.77"
Nov 13 02:17:40 DB2 patroni[32363]: FATAL:  could not create any TCP/IP sockets
Nov 13 02:17:40 DB2 patroni[32363]: 2019-11-13 02:17:40,042 INFO: demoting self because i do not have the lock and i was a leader

And if I change it to

postgresql:
    listen: 192.168.122.240:5432
    connect_address: 192.168.122.240:5432

this happened:

quanlm@DB2:~$ sudo service patroni status
● patroni.service - Runners to orchestrate a high-availability PostgreSQL
   Loaded: loaded (/etc/systemd/system/patroni.service; disabled; vendor preset: enabled)
   Active: active (running) since Wed 2019-11-13 02:18:57 UTC; 40s ago
 Main PID: 3785 (patroni)
    Tasks: 11
   Memory: 59.6M
      CPU: 770ms
   CGroup: /system.slice/patroni.service
           ├─3785 /usr/bin/python3 /usr/local/bin/patroni /etc/patroni.yml
           ├─3818 postgres -D /data/patroni --config-file=/data/patroni/postgresql.conf --max_replication_slots=10 --port=5432 --max_connections=100 --max_
           ├─3853 postgres: postgres: startup process   recovering 000000040000000000000006                                                                
           ├─3857 postgres: postgres: checkpointer process                                                                                                 
           ├─3858 postgres: postgres: writer process                                                                                                       
           ├─3859 postgres: postgres: stats collector process                                                                                              
           └─3916 postgres: postgres: postgres postgres 192.168.122.240(39576) idle                                                                        

Nov 13 02:19:19 DB2 patroni[3785]:         
Nov 13 02:19:24 DB2 patroni[3785]: FATAL:  could not start WAL streaming: ERROR:  replication slot "postgresql1" does not exist
Nov 13 02:19:24 DB2 patroni[3785]:         
Nov 13 02:19:27 DB2 patroni[3785]: 2019-11-13 02:19:27,938 INFO: Lock owner: postgresql0; I am postgresql1
Nov 13 02:19:27 DB2 patroni[3785]: 2019-11-13 02:19:27,938 INFO: does not have lock
Nov 13 02:19:27 DB2 patroni[3785]: 2019-11-13 02:19:27,966 INFO: no action.  i am a secondary and i am following a leader
Nov 13 02:19:29 DB2 patroni[3785]: FATAL:  could not start WAL streaming: ERROR:  replication slot "postgresql1" does not exist
Nov 13 02:19:29 DB2 patroni[3785]:         
Nov 13 02:19:34 DB2 patroni[3785]: FATAL:  could not start WAL streaming: ERROR:  replication slot "postgresql1" does not exist
Nov 13 02:19:34 DB2 patroni[3785]:   

Update #2


After setting patroni.yml back to name: postgresql0, with

postgresql:
    listen: 192.168.122.240:5432
    connect_address: 192.168.122.240:5432

and restarting the service, both DBs are up. I don't think it is supposed to look like this when setting up active-passive servers for HA, and the two servers did not replicate to each other.

Image: https://raw.githubusercontent.com/lmq1999/Mytest/master/Mytest/Mytest/Screenshot%20from%202019-11-13%2009-26-19.png

Lê Minh Quân

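On the first node, the name (in patroni.yml) is fine as postgresql0, but on the second node you have to change it to (for example) postgresql1. Also, pick the IP of the other node (listed at the top of that tutorial) and update the YAML with it as well (there are a number of occurrences under restapi and postgresql).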
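To make the per-node differences explicit, here is a minimal Python sketch. The helper `node_settings` and its dict layout are hypothetical, for illustration only, and not part of Patroni. The key names, ports, and IP addresses are taken from the patroni.yml and logs in the question.

```python
# Hypothetical helper: the function name and dict layout are illustrative,
# not a Patroni API. Keys mirror the patroni.yml shown in the question.
def node_settings(name: str, ip: str) -> dict:
    """Return the patroni.yml values that must be unique to one node."""
    return {
        "name": name,
        "restapi": {
            "listen": f"{ip}:8008",
            "connect_address": f"{ip}:8008",
        },
        "postgresql": {
            "listen": f"{ip}:5432",
            "connect_address": f"{ip}:5432",
        },
    }


# Values from the question: DB1 is 192.168.122.77, DB2 is 192.168.122.240.
node1 = node_settings("postgresql0", "192.168.122.77")
node2 = node_settings("postgresql1", "192.168.122.240")

# Sanity check: none of the per-node values may be shared between nodes.
assert node1["name"] != node2["name"]
for section in ("restapi", "postgresql"):
    for key in ("listen", "connect_address"):
        assert node1[section][key] != node2[section][key]
```

Everything else in the file (scope, namespace, the etcd host, the bootstrap section) can stay the same on both nodes.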

When in doubt, the sample files in the Patroni repository ( https://github.com/zalando/patroni/blob/master/postgres0.yml and the other two) always contain working values.
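As for the `OSError: [Errno 99] Cannot assign requested address` in the traceback: on Linux that is what a bind to an IP address not assigned to the local machine typically looks like, which fits server 2 trying to listen on server 1's address. A minimal sketch for checking an address before restarting Patroni follows. The function name `can_bind` is my own, and port 0 is used so that only the address is tested, not whether a particular port is free.

```python
import socket


def can_bind(ip: str, port: int = 0) -> bool:
    """Return True if this host can bind a TCP socket to ip:port.

    Port 0 asks the OS for any free port, so the result reflects
    only whether the IP address is usable on this machine.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((ip, port))
        except OSError:
            # Covers EADDRNOTAVAIL (errno 99 on Linux) when the IP is not
            # assigned to a local interface, and EADDRINUSE for a busy port.
            return False
    return True


if __name__ == "__main__":
    # Addresses from the question; run this on each server.
    for ip in ("192.168.122.77", "192.168.122.240"):
        status = "bindable here" if can_bind(ip) else "NOT bindable here"
        print(ip, status)
```

On each server, only that server's own address should come back as bindable, and that is the address to use in its `listen` and `connect_address` entries.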

dezso