プライマリDBサーバーと接続マネージャー/ ICMが同時に終了し、ICMがオンラインに戻った場合、InformixHDRフェイルオーバーは機能しません

Question

自動切り替えは、プライマリ/セカンダリHDRサーバー設定で構成されます。

HDRクラスターが接続され、レプリケーションがオンになります。プライマリサーバー（NODE-A）でサービスが正常に停止するか、VMが終了すると、フェイルオーバーが発生し、セカンダリサーバー（NODE-B）がプライマリになります。

Connection Manager/ICMが稼働している限り、以下は問題なく機能します。

service informix stop

または

onmade -yuk

または単にプライマリDBノードをシャットダウン/終了します（NODE-A）

ただし、プライマリDBサーバー（NODE-A）と接続マネージャー/ ICMの両方が同時にダウンしたDRイベントの場合、接続マネージャーマシンが復旧した場合、セカンダリにフェイルオーバーしません。ノード（NODE-B）

Informixサーバー上でawwdst10aおよびawwdst10b次のレプリケーションパラメータがあります

DRAUTO 3 DRINTERVAL 0 DRTIMEOUT 30 HA_ALIAS HA_FOC_ORDER DRLOSTFOUND /logs/informix/t10/ DRIDXAUTO 1 LOG_INDEX_BUILDS 1 SDS_ENABLE 0 SDS_TIMEOUT 20 SDS_PAGING UPDATABLE_SECONDARY 0 FAILOVER_CALLBACK TEMPTAB_NOLOG 1

両方のInformixサーバーのsqlhostsファイルは次のとおりです。

cluster_1 group - - i=10 awwdst10a_t10 onsoctcp awwdst10a informix_t10 g=cluster_1 awwdst10b_t10 onsoctcp awwdst10b informix_t10 g=cluster_1 sla_cluster_1 onsoctcp localhost informix_t10

私が実行するsecondary awwdst10bで、レプリケーションがセットアップされ、機能しています。

onstat -g cluster IBM Informix Dynamic Server Version 12.10.FC10 -- Read-Only (Sec) -- Up 02:42:00 -- 1556768 Kbytes Primary Server:awwdst10a_t10 Index page logging status: Enabled Index page logging was enabled at: 2018/06/06 09:47:35 Server ACKed Log Supports Status (log, page) Updates awwdst10b_t10 54494,1135 No ASYNC(HDR),Connected,On

primary awwdst10aで実行：

$ onstat -g cluster IBM Informix Dynamic Server Version 12.10.FC10 -- On-Line (Prim) -- Up 02:37:16 -- 1556768 Kbytes Primary Server:awwdst10a_t10 Current Log Page:54494,1135 Index page logging status: Enabled Index page logging was enabled at: 2018/06/06 09:47:35 Server ACKed Log Applied Log Supports Status (log, page) (log, page) Updates awwdst10b_t10 54494,1135 54494,1135 No ASYNC(HDR),Connected,On

したがって、レプリケーションは機能しており、サーバーは同期されています。

3台目のマシンでawguht10次のパラメータを使用して接続マネージャ/ ICMをセットアップしました。

NAME whics-t10 DEBUG 1 LOG 1 LOGFILE /logs/informix/cmlog CM_TIMEOUT 60 EVENT_TIMEOUT 60 SECONDARY_EVENT_TIMEOUT 60 SQLHOSTS local CLUSTER cluster_1 { INFORMIXSERVER cluster_1 SLA sla_cluster_1 DBSERVERS=primary FOC ORDER=ENABLED \ PRIORITY=1 #CMALARMPROGRAM $INFORMIXDIR/etc/cmalarmprogram.sh }

そして、接続マネージャーのsqlhostsファイル：

cluster_1 group - - i=10 awwdst40a_t40 onsoctcp awwdst40a informix_t10 g=cluster_1 awwdst40b_t40 onsoctcp awwdst40b informix_t10 g=cluster_1 sla_cluster_whics onsoctcp localhost informix_t10

今、私は強制的にダウンさせますawwdst10a Informixサーバーのオンラインログで以下を取得しますawwdst10b：

Mon Oct 22 22:46:30 2018 22:46:30 Insufficient free huge pages in /proc/meminfo for shared memory segment. Requested: 56774656 bytes. Available: 0 bytes. The default memory page size will be used. 22:46:30 Requested shared memory segment size rounded from 55443KB to 55444KB 22:46:30 Segment locked: addr=0x73877000, size=56774656 22:46:30 Successfully added a bufferpool of page size 2K. 22:46:30 Insufficient free huge pages in /proc/meminfo for shared memory segment. Requested: 107974656 bytes. Available: 0 bytes. The default memory page size will be used. 22:46:30 Requested shared memory segment size rounded from 105443KB to 105444KB 22:46:30 Segment locked: addr=0x76e9c000, size=107974656 22:46:30 Successfully added a bufferpool of page size 4K. 22:46:30 Insufficient free huge pages in /proc/meminfo for shared memory segment. Requested: 210374656 bytes. Available: 0 bytes. The default memory page size will be used. 22:46:30 Requested shared memory segment size rounded from 205443KB to 205444KB 22:46:30 Segment locked: addr=0x7d595000, size=210374656 22:46:30 Successfully added a bufferpool of page size 8K. 22:46:30 Insufficient free huge pages in /proc/meminfo for shared memory segment. Requested: 415174656 bytes. Available: 0 bytes. The default memory page size will be used. 22:46:30 Requested shared memory segment size rounded from 405443KB to 405444KB 22:46:30 Segment locked: addr=0x89e36000, size=415174656 22:46:30 Successfully added a bufferpool of page size 16K. 22:46:30 Warning: ONCONFIG dump directory (DUMPDIR) '/dump/informix/t10' has insecure permissions 22:46:30 Event alarms enabled. ALARMPROG = '/usr/informix/etc/no_log.sh' 22:46:30 Booting Language <c> from module <> 22:46:30 Loading Module <CNULL> 22:46:30 Booting Language <builtin> from module <> 22:46:30 Loading Module <BUILTINNULL> 22:46:36 Affinitied VP 1 to phys proc 1 22:46:36 DR: DRAUTO is 3 (CMSM) 22:46:36 DR: ENCRYPT_HDR is 0 (HDR encryption Disabled) 22:46:36 Event notification facility epoll enabled. 22:46:36 Trusted Host cache successfully built:/etc/hosts.equiv. 22:46:36 CCFLAGS2 value set to 0x200 22:46:36 SQL_FEAT_CTRL value set to 0x8008 22:46:36 SQL_DEF_CTRL value set to 0x4b0 22:46:36 IBM Informix Dynamic Server Version 12.10.FC10 Software Serial Number AAA#B000000 22:46:37 Performance Advisory: The current size of the physical log buffer is smaller than recommended. 22:46:37 Results: Transaction performance might not be optimal. 22:46:37 Action: For better performance, increase the physical log buffer size to 128. 22:46:38 IBM Informix Dynamic Server Initialized -- Shared Memory Initialized. 22:46:38 Started 1 B-tree scanners. 22:46:38 B-tree scanner threshold set at 5000. 22:46:38 B-tree scanner range scan size set to -1. 22:46:38 B-tree scanner ALICE mode set to 6. 22:46:38 B-tree scanner index compression level set to med. 22:46:38 Physical Recovery Started at Page (3:164234). 22:46:38 Physical Recovery Complete: 0 Pages Examined, 0 Pages Restored. 22:46:38 DR: Trying to connect to primary server = awwdst10a_t10 22:48:39 DR: Cannot connect to primary server 22:48:39 DR: Turned off on secondary server 22:48:39 Dataskip is now OFF for all dbspaces 22:48:39 Restartable Restore has been ENABLED 22:48:39 Recovery Mode 23:23:50 Booting Language <spl> from module <> 23:23:50 Loading Module <SPLNULL> 23:24:12 CM:Connection manager whics-t10 registered with the server

Connection Manager-投稿失敗

Mon Oct 22 23:23:44 2018 23:23:44 IBM Informix Connection Manager 23:23:44 IBM Informix CSDK Version 4.10, IBM Informix-ESQL Version 4.10.FC10 23:23:44 Build Number: N046 23:23:44 Build Host: hans 23:23:44 Build OS: Linux 2.6.18-128.el5 23:23:44 Build Date: Mon Oct 23 07:06:25 CDT 2017 23:23:44 GLS Version: glslib-6.00.FC11 23:23:44 23:23:44 set CM_TIMEOUT 60 23:23:44 set global EVENT_TIMEOUT 60 23:23:44 set global SECONDARY_EVENT_TIMEOUT 60 23:23:44 DEBUG[TID139636001822496]:add type primary to SLA sla_cluster_1 [cmsm_sla.c:parse_sla:2412] 23:23:44 SLA sla_cluster_1 listener mode REDIRECT 23:23:44 Connection Manager name is whics-t10 23:23:44 Starting Connection Manager... 23:23:44 DEBUG[TID139636001822496]:Password File /opt/informix-12.10.fc10/etc/passwd_file failed error:No such file or directory[2] [cmsm_main.c:cmsm_pw_init:1782] 23:23:44 Warning: Password Manager failed; working in trusted node mode 23:23:44 the current maximum number of file descriptors is 32767 23:23:44 DEBUG[TID139636001822496]:new daemon pid is 3804 [cmsm_main.c:cmsm_daemonize:4653] 23:23:44 DEBUG[TID139636001822496]:dbservername = cluster_1 [cmsm_server.ec:cmsm_primary:2819] 23:23:44 DEBUG[TID139636001822496]:nettype = [cmsm_server.ec:cmsm_primary:2820] 23:23:44 DEBUG[TID139636001822496]:hostname = - [cmsm_server.ec:cmsm_primary:2821] 23:23:44 DEBUG[TID139636001822496]:servicename = - [cmsm_server.ec:cmsm_primary:2822] 23:23:44 DEBUG[TID139636001822496]:options = i=10 [cmsm_server.ec:cmsm_primary:2823] 23:23:44 DEBUG[TID139636001822496]:connect to cluster_1 for CLUSTER cluster_1 [cmsm_server.ec:cmsm_primary:2839] 23:23:44 DEBUG[TID139636001822496]:connect to group cluster_1 server awwdst10a_t10 for CLUSTER cluster_1 [cmsm_server.ec:cmsm_primary:2860] 23:23:44 DEBUG[TID139636001822496]:create new thread -1995749632 for awwdst10a_t10 [cmsm_sla.c:cmsm_update_server_ex:996] 23:23:44 DEBUG[TID139636001822496]:connect to group cluster_1 server awwdst10b_t10 for CLUSTER cluster_1 [cmsm_server.ec:cmsm_primary:2860] 23:23:44 DEBUG[TID139636001822496]:create new thread -1997850880 for awwdst10b_t10 [cmsm_sla.c:cmsm_update_server_ex:996] 23:23:44 listener sla_cluster_1 initializing 23:23:44 DEBUG[TID139635976775424]:listenter sla_cluster_1 Host=localhost service=informix_t10 nettype=soctcp engtypeon [cmsm_main.c:cmsm_listener_thread:1166] 23:23:49 DEBUG[TID139635980977920]:CONNECT to @awwdst10a_t10|onsoctcp|awwdst10a|informix_t10 AS awwdst10a_t101 SQLCODE = (-908,107,awwdst10a_t10) [cmsm_server.ec:cmsm_connect_byurl_opt:136] 23:23:49 ALARM 4001 unable to connect to Informix server awwdst10a_t10 from awguht10 error -908 23:23:49 DEBUG[TID139635980977920]:Arbitrator return, server awwdst10a_t10 is not primary [cmsm_arb.c:arb_server_down:1399] 23:23:50 DEBUG[TID139635978876672]:CONNECT to @awwdst10b_t10|onsoctcp|awwdst10b|informix_t10 AS awwdst10b_t101 SQLCODE = (0,0,) [cmsm_server.ec:cmsm_connect_byurl_opt:136] 23:23:50 DEBUG[TID139635978876672]:database sysmaster @awwdst10b_t10|onsoctcp|awwdst10b|informix_t10 AS awwdst10b_t101 SQLCODE = (0,0,) [cmsm_server.ec:cmsm_connect_byurl_opt:145] 23:23:50 Listener sla_cluster_1 DBSERVERS=primary is active with 4 worker threads 23:23:50 DEBUG[TID139635978876672]:awwdst10b_t10 protocols = 1 [cmsm_server.ec:cmsm_add_dbaliases:2427] 23:23:50 DEBUG[TID139635978876672]:skip awwdst10b_t10 alias awwdst10b_t10_drda with protocol 1 [cmsm_server.ec:cmsm_add_dbaliases:2694] 23:23:50 DEBUG[TID139635978876672]:SQL fetch sysconfig_cursor sqlcode = (100,0,) [cmsm_server.ec:cmsm_add_dbaliases:2540] 23:23:52 DEBUG[TID139635980977920]:CONNECT to @awwdst10a_t10|onsoctcp|awwdst10a|informix_t10 AS awwdst10a_t102 SQLCODE = (-908,107,awwdst10a_t10) [cmsm_server.ec:cmsm_connect_byurl_opt:136] 23:23:52 DEBUG[TID139635980977920]:Arbitrator return, server awwdst10a_t10 is not primary [cmsm_arb.c:arb_server_down:1399] 23:24:12 DEBUG[TID139635980977920]:Monitor awwdst10a_t10 exit by arbitrator [cmsm_er.ec:cmsm_er_monitor:2766] 23:24:12 DEBUG[TID139636001822496]:fcntl(/opt/informix-12.10.fc10/tmp/cmsm.pid.whics-t10) success error:Success[0] [cmsm_main.c:cmsm_pidfile:2193] 23:24:12 DEBUG[TID139635968210688]:Warning: no server returned for SLA sla_cluster_1 [cmsm_sla.c:select_server_from_sla:3037] **23:24:12 Connection Manager successfully connected to awwdst10b_t10** **23:24:12 DEBUG[TID139635966109440]:Warning: no server returned for SLA sla_cluster_1 [cmsm_sla.c:select_server_from_sla:3037]** **23:24:12 DEBUG[TID139635966109440]:SLA sla_cluster_1 cannot find database server for SQLI client [cmsm_main.c:cmsm_process_sqli:1522]** 23:24:12 Connection Manager started successfully 23:24:12 DEBUG[TID139635980977920]:all monitors exited, server awwdst10a_t10 register count = 0 [cmsm_er.ec:cmsm_er_monitor:3133] 23:24:12 DEBUG[TID139635968210688]:SLA sla_cluster_1 cannot find database server for SQLI client [cmsm_main.c:cmsm_process_sqli:1522] 23:24:12 The server type of cluster cluster_1 server awwdst10a_t10 is Primary. 23:24:12 DEBUG[TID139635978876672]:create new thread -1995749632 for awwdst10a_t10 [cmsm_sla.c:cmsm_update_server_ex:996] 23:24:12 DEBUG[TID139635978876672]:cluster cluster_1 primary is not active [cmsm_arb.c:arb_set_primary:774] 23:24:12 DEBUG[TID139635978876672]:server awwdst10b_t10 version is 12.10 [cmsm_er.ec:cmsm_er_monitor:2971] 23:24:12 DEBUG[TID139635978876672]:ACTIVE MONITOR awwdst10b_t10 awwdst10b_t10 172.29.26.122 53089 [cmsm_er.ec:cmsm_er_monitor:3013] 23:24:12 DEBUG[TID139635978876672]:register CM successfully whics-t10 with token 1838996547 [cmsm_server.ec:cmsm_cm_register:2330] 23:24:12 DEBUG[TID139635978876672]:Register server awwdst10b_t10 for awwdst10b_t10 count = 1 [cmsm_er.ec:cmsm_er_event_process:1280] 23:24:12 DEBUG[TID139635978876672]:get awwdst10b_t10 CLUST_CHG event 1:4[DISCONNECT] awwdst10a_t10|Primary| [cmsm_er.ec:cmsm_er_event_process:1733] 23:24:17 DEBUG[TID139635980977920]:CONNECT to @awwdst10a_t10|onsoctcp|awwdst10a|informix_t10 AS awwdst10a_t103 SQLCODE = (-908,107,awwdst10a_t10) [cmsm_server.ec:cmsm_connect_byurl_opt:136] 23:24:17 DEBUG[TID139635980977920]:Arbitrator return, server awwdst10a_t10 is not a DBSERVERALIAS of primary [cmsm_arb.c:arb_server_down:1417] 23:24:17 DEBUG[TID139635980977920]:Monitor awwdst10a_t10 exit by arbitrator [cmsm_er.ec:cmsm_er_monitor:2766] 23:24:17 DEBUG[TID139635980977920]:all monitors exited, server awwdst10a_t10 register count = 0 [cmsm_er.ec:cmsm_er_monitor:3133] 23:24:46 The server type of cluster cluster_1 server awwdst10a_t10 is Primary. 23:24:46 DEBUG[TID139635978876672]:create new thread -1995749632 for awwdst10a_t10 [cmsm_sla.c:cmsm_update_server_ex:996] 23:24:52 DEBUG[TID139635980977920]:CONNECT to @awwdst10a_t10|onsoctcp|awwdst10a|informix_t10 AS awwdst10a_t104 SQLCODE = (-908,107,awwdst10a_t10) [cmsm_server.ec:cmsm_connect_byurl_opt:136] 23:24:52 DEBUG[TID139635980977920]:Arbitrator return, server awwdst10a_t10 is not a DBSERVERALIAS of primary [cmsm_arb.c:arb_server_down:1417] 23:24:52 DEBUG[TID139635980977920]:Monitor awwdst10a_t10 exit by arbitrator [cmsm_er.ec:cmsm_er_monitor:2766] 23:24:52 DEBUG[TID139635980977920]:all monitors exited, server awwdst10a_t10 register count = 0 [cmsm_er.ec:cmsm_er_monitor:3133] 23:25:02 DEBUG[TID139635896350464]:Warning: no server returned for SLA sla_cluster_1 [cmsm_sla.c:select_server_from_sla:3037] 23:25:02 DEBUG[TID139635896350464]:SLA sla_cluster_1 cannot find database server for SQLI client [cmsm_main.c:cmsm_process_sqli:1522] 23:25:20 The server type of cluster cluster_1 server awwdst10a_t10 is Primary. 23:25:23 DEBUG[TID139635980977920]:Arbitrator return, server awwdst10a_t10 is not a DBSERVERALIAS of primary [cmsm_arb.c:arb_server_down:1417] 23:25:23 DEBUG[TID139635980977920]:Monitor awwdst10a_t10 exit by arbitrator [cmsm_er.ec:cmsm_er_monitor:2766] 23:25:23 DEBUG[TID139635980977920]:all monitors exited, server awwdst10a_t10 register count = 0 [cmsm_er.ec:cmsm_er_monitor:3133]

Connection Manager-故障前

 17:09:02 IBM Informix Connection Manager 17:09:02 IBM Informix CSDK Version 4.10, IBM Informix-ESQL Version 4.10.FC10 ..... 17:09:02 set CM_TIMEOUT 60 17:09:02 set global EVENT_TIMEOUT 60 17:09:02 set global SECONDARY_EVENT_TIMEOUT 60 17:09:02 DEBUG[TID139992173627168]:add type primary to SLA sla_cluster_1 [cmsm_sla.c:parse_sla:2412] 17:09:02 SLA sla_cluster_1 listener mode REDIRECT 17:09:02 Connection Manager name is whics-t10 17:09:02 Starting Connection Manager... 17:09:02 DEBUG[TID139992173627168]:dbservername = cluster_1 [cmsm_server.ec:cmsm_primary:2819] 17:09:02 DEBUG[TID139992173627168]:nettype = [cmsm_server.ec:cmsm_primary:2820] 17:09:02 DEBUG[TID139992173627168]:hostname = - [cmsm_server.ec:cmsm_primary:2821] 17:09:02 DEBUG[TID139992173627168]:servicename = - [cmsm_server.ec:cmsm_primary:2822] 17:09:02 DEBUG[TID139992173627168]:options = i=10 [cmsm_server.ec:cmsm_primary:2823] 17:09:02 DEBUG[TID139992173627168]:connect to cluster_1 for CLUSTER cluster_1 [cmsm_server.ec:cmsm_primary:2839] 17:09:02 DEBUG[TID139992173627168]:connect to group cluster_1 server awwdst10b_t10 for CLUSTER cluster_1 [cmsm_server.ec:cmsm_primary:2860] 17:09:02 DEBUG[TID139992173627168]:create new thread 1986635520 for awwdst10b_t10 [cmsm_sla.c:cmsm_update_server_ex:996] 17:09:02 listener sla_cluster_1 initializing 17:09:02 DEBUG[TID139992152782592]:CONNECT to @awwdst10a_t10|onsoctcp|awwdst10a|informix_t10 AS awwdst10a_t101 SQLCODE = (0,0,) [cmsm_server.ec:cmsm_connect_byurl_opt:136] 17:09:02 DEBUG[TID139992152782592]:database sysmaster @awwdst10a_t10|onsoctcp|awwdst10a|informix_t10 AS awwdst10a_t101 SQLCODE = (0,0,) [cmsm_server.ec:cmsm_connect_byurl_opt:145] 17:09:02 DEBUG[TID139992152782592]:awwdst10a_t10 protocols = 1 [cmsm_server.ec:cmsm_add_dbaliases:2427] 17:09:02 DEBUG[TID139992152782592]:skip awwdst10a_t10 alias awwdst10a_t10_drda with protocol 1 [cmsm_server.ec:cmsm_add_dbaliases:2694] ....... setting primary name = awwdst10a_t10 [cmsm_arb.c:arb_set_primary:795] 17:09:02 Cluster cluster_1 Arbitrator FOC ORDER=ENABLED PRIORITY=1 17:09:02 Connection Manager successfully connected to awwdst10a_t10 17:09:02 The server type of cluster cluster_1 server awwdst10b_t10 is HDR. 17:09:02 DEBUG[TID139992150681344]:CONNECT to @awwdst10b_t10|onsoctcp|awwdst10b|informix_t10 AS awwdst10b_t101 SQLCODE = (0,0,) [cmsm_server.ec:cmsm_connect_byurl_opt:136] 17:09:02 DEBUG[TID139992152782592]:server awwdst10a_t10 version is 12.10 [cmsm_er.ec:cmsm_er_monitor:2971] 17:09:02 DEBUG[TID139992152782592]:register CM successfully whics-t10 with token 216854841 [cmsm_server.ec:cmsm_cm_register:2330] 17:09:02 DEBUG[TID139992152782592]:Register server awwdst10a_t10 for awwdst10a_t10 count = 1 [cmsm_er.ec:cmsm_er_event_process:1280] 17:09:03 DEBUG[TID139992150681344]:database sysmaster @awwdst10b_t10|onsoctcp|awwdst10b|informix_t10 AS awwdst10b_t101 SQLCODE = (0,0,) [cmsm_server.ec:cmsm_connect_byurl_opt:145] 17:09:03 Listener sla_cluster_1 DBSERVERS=primary is active with 4 worker threads 17:09:03 DEBUG[TID139992150681344]:awwdst10b_t10 protocols = 1 [cmsm_server.ec:cmsm_add_dbaliases:2427] 17:09:03 DEBUG[TID139992150681344]:skip awwdst10b_t10 alias awwdst10b_t10_drda with protocol 1 [cmsm_server.ec:cmsm_add_dbaliases:2694] 17:09:14 Connection Manager successfully connected to awwdst10b_t10 17:09:14 DEBUG[TID139992173627168]:fcntl(/opt/informix-12.10.fc10/tmp/cmsm.pid.whics-t10) success error:Success[0] [cmsm_main.c:cmsm_pidfile:2193] 17:09:14 Connection Manager started successfully 17:09:14 SLA sla_cluster_1 redirect SQLI client from ::1 to awwdst10a_t10 172.29.26.129.51001 17:09:14 DEBUG[TID139992150681344]:server awwdst10b_t10 version is 12.10 [cmsm_er.ec:cmsm_er_monitor:2971] .......... 17:11:01 SLA sla_cluster_1 redirect SQLI client from ::1 to awwdst10a_t10 172.29.26.129.51001

awwdst10a-故障前

16:47:21 IBM Informix Dynamic Server Started. 16:47:21 Requested shared memory segment size rounded from 275020KB to 280576K Mon Oct 22 16:47:23 2018 16:47:23 Requested shared memory segment size rounded from 405443KB to 405504KB 16:47:23 Shared memory segment will use huge pages. 16:47:23 Segment locked: addr=0x8a600000, size=415236096 16:47:23 Successfully added a bufferpool of page size 16K. ...... 16:47:28 Affinitied VP 1 to phys proc 1 16:47:28 DR: DRAUTO is 3 (CMSM) 16:47:28 DR: ENCRYPT_HDR is 0 (HDR encryption Disabled) 16:47:28 Event notification facility epoll enabled. 16:47:28 Trusted Host cache successfully built:/etc/hosts.equiv. 16:47:28 IBM Informix Dynamic Server Version 12.10.FC10 Software 16:47:29 Performance Advisory: The current size of the physical log buffer is smaller than recommended. 16:47:29 Results: Transaction performance might not be optimal. 16:47:29 Action: For better performance, increase the physical log buffer size to 128. 16:47:30 IBM Informix Dynamic Server Initialized -- Shared Memory Initialized. 16:47:30 Started 1 B-tree scanners. 16:47:30 B-tree scanner threshold set at 5000. 16:47:30 B-tree scanner range scan size set to -1. 16:47:30 B-tree scanner ALICE mode set to 6. 16:47:30 B-tree scanner index compression level set to med. 16:47:30 DR: Reservation of the last logical log for log backup turned on 16:47:30 DR: Trying to connect to secondary server = awwdst10b_t10 16:47:30 DR: Cannot connect to secondary server 16:47:30 DR: Turned off on primary server 16:47:30 Physical Recovery Started at Page (3:152094). 16:47:30 Physical Recovery Complete: 0 Pages Examined, 0 Pages Restored. 16:47:30 Logical Recovery Started. 16:47:30 10 recovery worker threads will be started. 16:47:30 Logical Recovery has reached the transaction cleanup phase. 16:47:30 Logical Recovery Complete. 0 Committed, 0 Rolled Back, 0 Open, 0 Bad Locks 16:47:31 Dataskip is now OFF for all dbspaces 16:47:32 Checkpoint Completed: duration was 0 seconds. 16:47:32 Mon Oct 22 - loguniq 54478, logpos 0xe30c0, timestamp: 0xd41004bc Interval: 399667 16:47:39 Logical Log 54478 Complete, timestamp: 0xd41102e7. 16:47:41 Action: Increase the size of the individual logical log files so that it takes at least 30 seconds to fill each one. Look at the online log to determine how quickly the log files are filling, and then increase the size of the files proportionately. 16:47:41 Logical Log 54479 Complete, timestamp: 0xd411f36b. 16:47:44 DR: Primary server connected 16:47:44 DR: Secondary server needs failure recovery 16:47:46 DR_ERR set to -1 16:47:46 DR: Failure recovery error (14) 16:47:46 Logical Log 54480 Complete, timestamp: 0xd4131922. 16:47:47 DR: Turned off on primary server 16:47:47 DR: Cannot connect to secondary server 16:47:54 Logical Log 54483 Complete, timestamp: 0xd415bc16. 16:47:58 DR: Primary server connected 16:47:58 DR: Secondary server needs failure recovery 16:48:00 DR: Sending log 54478, size 2500 pages, 100.00 percent used 16:48:00 Logical Log 54484 Complete, timestamp: 0xd4172711. . 16:48:05 Logical Log 54485 Complete, timestamp: 0xd4180cc0. 16:48:07 DR: Sending log 54482, size 2500 pages, 100.00 percent used 16:48:20 Checkpoint Completed: duration was 1 seconds. 17:09:02 CM:Connection manager whics-t10 registered with the server 17:09:18 Logical Log 54478 - Backup Started 17:09:22 Logical Log 54488 - Backup Completed 17:09:22 Logical Log 54489 - Backup Started 17:33:32 Checkpoint Completed: duration was 0 seconds. 17:33:32 Mon Oct 22 - loguniq 54494, logpos 0x3aa018, timestamp: 0xd41e0c9d Interval: 399671 17:33:32 Maximum server connections 2 17:33:32 Checkpoint Statistics - Avg. Txn Block Time 0.000, # Txns blocked 0, Plog used 24, Llog used 9

awwdst10b-故障前

16:47:46 IBM Informix Dynamic Server Started. Mon Oct 22 16:47:47 2018 16:47:48 Requested shared memory segment size rounded from 405443KB to 405504KB 16:47:48 Shared memory segment will use huge pages. 16:47:48 Segment locked: addr=0x8a600000, size=415236096 16:47:48 Successfully added a bufferpool of page size 16K. 16:47:48 Warning: ONCONFIG dump directory (DUMPDIR) '/dump/informix/t10' has insecure permissions 16:47:48 Event alarms enabled. ALARMPROG = '/usr/informix/etc/no_log.sh' 16:47:48 Booting Language <c> from module <> 16:47:48 Loading Module <CNULL> 16:47:48 Booting Language <builtin> from module <> 16:47:48 Loading Module <BUILTINNULL> 16:47:53 Affinitied VP 1 to phys proc 1 16:47:53 DR: DRAUTO is 3 (CMSM) 16:47:53 DR: ENCRYPT_HDR is 0 (HDR encryption Disabled) 16:47:53 Event notification facility epoll enabled. 16:47:53 Trusted Host cache successfully built:/etc/hosts.equiv. 16:47:53 CCFLAGS2 value set to 0x200 16:47:53 SQL_FEAT_CTRL value set to 0x8008 16:47:53 SQL_DEF_CTRL value set to 0x4b0 16:47:53 IBM Informix Dynamic Server Version 12.10.FC10 Software Serial Number AAA#B000000 16:47:54 Performance Advisory: The current size of the physical log buffer is smaller than recommended. 16:47:54 Results: Transaction performance might not be optimal. 16:47:54 Action: For better performance, increase the physical log buffer size to 128. 16:47:54 IBM Informix Dynamic Server Initialized -- Shared Memory Initialized. 16:47:54 Started 1 B-tree scanners. 16:47:54 B-tree scanner threshold set at 5000. 16:47:54 B-tree scanner range scan size set to -1. 16:47:54 B-tree scanner ALICE mode set to 6. 16:47:54 B-tree scanner index compression level set to med. 16:47:54 Physical Recovery Started at Page (3:151957). 16:47:54 Physical Recovery Complete: 0 Pages Examined, 0 Pages Restored. 16:47:54 DR: Trying to connect to primary server = awwdst10a_t10 16:47:54 smx creates 1 transports to server awwdst10a_t10 16:47:55 Dataskip is now OFF for all dbspaces 16:47:55 Restartable Restore has been ENABLED 16:47:55 Recovery Mode 16:47:57 DR: Secondary server connected 16:47:58 DR: Secondary server needs failure recovery 16:47:59 DR: Failure recovery from disk in progress ... 16:47:59 Logical Recovery Started. 16:47:59 10 recovery worker threads will be started. 16:47:59 Start Logical Recovery - Start Log 54478, End Log ? 16:47:59 Starting Log Position - 54478 0xdd018 16:48:00 Started processing open transactions on secondary during startup Server in fast recovery until these open transactions finish 16:48:00 Finished processing open transactions on secondary during startup. 16:48:00 Checkpoint Completed: duration was 0 seconds. 16:48:00 Mon Oct 22 - loguniq 54478, logpos 0xdf0c0, timestamp: 0xd410117e Interval: 399666 16:48:00 Checkpoint Completed: duration was 0 seconds. 16:48:00 Mon Oct 22 - loguniq 54478, logpos 0xe30c0, timestamp: 0xd41011cf Interval: 399668 16:48:00 Maximum server connections 0 16:48:00 Checkpoint Statistics - Avg. Txn Block Time 0.000, # Txns blocked 0, Plog used 2, Llog used 0 16:48:03 Logical Log 54478 Complete, timestamp: 0xd410f549. 16:48:17 Logical Log 54486 Complete, timestamp: 0xd418e8e4. 16:48:19 B-tree scanners disabled. 16:48:20 DR: HDR secondary server operational 16:48:20 Checkpoint Completed: duration was 0 seconds. 16:48:20 Mon Oct 22 - loguniq 54487, logpos 0xd018, timestamp: 0xd418e973 Interval: 399669 16:48:20 Maximum server connections 0 16:48:20 Checkpoint Statistics - Avg. Txn Block Time 0.000, # Txns blocked 0, Plog used 7761, Llog used 0 16:48:51 Logical Log 54492 Complete, timestamp: 0xd41d1f30. 16:56:04 Logical Log 54493 Complete, timestamp: 0xd41dafde. 17:03:31 Checkpoint Completed: duration was 0 seconds. 17:03:31 Mon Oct 22 - loguniq 54494, logpos 0x35f598, timestamp: 0xd41dff42 Interval: 399670 17:09:03 Booting Language <spl> from module <> 17:09:03 Loading Module <SPLNULL> 17:09:14 CM:Connection manager whics-t10 registered with the server 17:18:32 Checkpoint Completed: duration was 0 seconds. 17:18:32 Mon Oct 22 - loguniq 54494, logpos 0x3a1018, timestamp: 17:18:32 Maximum server connections 1 17:18:32 Checkpoint Statistics - Avg. Txn Block Time 0.000, # Txns blocked 0, Plog used 27, Llog used 0 17:33:32 Checkpoint Completed: duration was 0 seconds. 17:33:32 Mon Oct 22 - loguniq 54494, logpos 0x3aa018, timestamp: 0xd41e0c7e Interval: 399672

Misha · Answer

フェイルオーバーを実行するには、少なくとも1つの接続マネージャーが稼働している必要があります。これは、データベースの高可用性を確保するために、接続マネージャーの役割がノードの障害を検出し、自動的にフェイルオーバーを実行するためです。データベースクラスターの前に接続マネージャーが1つしかない場合、それをシャットダウンした後、自動フェイルオーバーは発生しないため、実行する必要があります。

onmode -d make primary force

プライマリサーバーに昇格させるためにセカンダリサーバーで。適切に構成された接続マネージャーが複数ある場合は、フェイルオーバーが保証されます。

onmode -d make primary force

プライマリサーバーに昇格させるためにセカンダリサーバーで。適切に構成された接続マネージャーが複数ある場合は、フェイルオーバーが保証されます。

Nagaraju Inturi · Answer

複数の接続マネージャーが存在する可能性があるため、プライマリサーバーは「アクティブアービトレーター」の役割を接続マネージャー（CM）に割り当てます。プライマリサーバーと「アクティブアービトレーター」CMが同時に強制終了された場合、再起動後、CMにはアクティブアービトレーターの役割が割り当てられていないため、CMはHDRをプライマリにプロモートできません。 Informixは、マルチポイント障害からの自動リカバリーをサポートしていません。 CMは、プライマリサーバーハードウェアとは異なるハードウェアで実行する必要があります。

Lu&#237;s Marques · Answer

ラボのセットアップを行いました。比較できるように、いくつかの設定とログを投稿します。

ifx_aのプライマリとしてInformixサーバーmachine1を、ifx_bのHDRセカンダリとしてInformixサーバーmachine2を使用しています。

Informixサーバーではifx_a次のレプリケーションパラメーターがあります。 HA_ALIASはDBSERVERNAMEと同じです。 Informixサーバーのifx_bパラメーターは、HA_ALIASとDBSERVERNAMEを除いて同じです。

DRAUTO 3 DRINTERVAL 0 HDR_TXN_SCOPE ASYNC DRTIMEOUT 30 HA_ALIAS ifx_a HA_FOC_ORDER SDS,HDR,RSS DRLOSTFOUND $INFORMIXDIR/etc/dr.lostfound DRIDXAUTO 0 LOG_INDEX_BUILDS 1 SDS_ENABLE SDS_TIMEOUT 20 SDS_TEMPDBS SDS_PAGING SDS_LOGCHECK 10 SDS_ALTERNATE NONE SDS_FLOW_CONTROL 0 UPDATABLE_SECONDARY 0 FAILOVER_CALLBACK FAILOVER_TX_TIMEOUT 0 TEMPTAB_NOLOG 1 DELAY_APPLY 0 STOP_APPLY 0 LOG_STAGING_DIR $INFORMIXDIR/tmp RSS_FLOW_CONTROL 0 SMX_NUMPIPES 1 ENABLE_SNAPSHOT_COPY 0 SMX_COMPRESS 0 SMX_PING_INTERVAL 10 SMX_PING_RETRY 6 CLUSTER_TXN_SCOPE SERVER

両方のInformixサーバーのsqlhostsファイルは次のとおりです。

ifx_a onsoctcp machine1.local 15010 k=1 ifx_b onsoctcp machine2.local 15020 k=1

レプリケーションがセットアップされ、動作しています。セカンダリifx_bで実行します。

$ onstat -g cluster IBM Informix Dynamic Server Version 12.10.FC12 -- Read-Only (Sec) -- Up 00:05:08 -- 156276 Kbytes Primary Server:ifx_a Index page logging status: Enabled Index page logging was enabled at: 2018/07/31 23:44:14 Server ACKed Log Supports Status (log, page) Updates ifx_b 16,6 No ASYNC(HDR),Connected,On $ onstat -g dri IBM Informix Dynamic Server Version 12.10.FC12 -- Read-Only (Sec) -- Up 00:12:23 -- 156276 Kbytes Data Replication at 0x45ac6028: Type State Paired server Last DR CKPT (id/pg) Supports Proxy Writes HDR Secondary on ifx_a 16 / 5 N DRINTERVAL 0 DRTIMEOUT 30 DRAUTO 3 DRLOSTFOUND /opt/IBM/informix/12.10/etc/dr.lostfound DRIDXAUTO 0 ENCRYPT_HDR 0 Backlog 0 Last Send 2018/08/01 00:12:12 Last Receive 2018/08/01 00:12:12 Last Ping 2018/08/01 00:12:03 Last log page applied(log id,page): 0,0

プライマリifx_aで実行します：

$ onstat -g cluster IBM Informix Dynamic Server Version 12.10.FC12 -- On-Line (Prim) -- Up 00:21:30 -- 156276 Kbytes Primary Server:ifx_a Current Log Page:16,6 Index page logging status: Enabled Index page logging was enabled at: 2018/07/31 23:44:14 Server ACKed Log Applied Log Supports Status (log, page) (log, page) Updates ifx_b 16,6 16,6 No ASYNC(HDR),Connected,On $ onstat -g dri IBM Informix Dynamic Server Version 12.10.FC12 -- On-Line (Prim) -- Up 00:27:49 -- 156276 Kbytes Data Replication at 0x45ac6028: Type State Paired server Last DR CKPT (id/pg) Supports Proxy Writes primary on ifx_b 16 / 5 NA DRINTERVAL 0 DRTIMEOUT 30 DRAUTO 3 DRLOSTFOUND /opt/IBM/informix/12.10/etc/dr.lostfound DRIDXAUTO 0 ENCRYPT_HDR 0 Backlog 0 Last Send 2018/08/01 00:11:53 Last Receive 2018/08/01 00:11:53 Last Ping 2018/08/01 00:11:32 Last log page applied(log id,page): 16,6

したがって、レプリケーションは機能しており、サーバーは同期されています。

3番目のマシンでmachine3次のパラメーターを使用して接続マネージャーをセットアップします。

NAME icm_1 LOG 1 LOGFILE ${INFORMIXDIR}/tmp/icm_1.log DEBUG 1 CM_TIMEOUT 60 EVENT_TIMEOUT 60 SECONDARY_EVENT_TIMEOUT 60 SQLHOSTS LOCAL CLUSTER ifx_abc { INFORMIXSERVER servers_abc SLA sla_ifx_abc DBSERVERS=PRI FOC ORDER=ENABLED \ PRIORITY=1 }

そして、接続マネージャーのsqlhostsファイル：

servers_abc group - - i=10,c=1,e=ifx_b ifx_a onsoctcp machine1.local 15010 k=1,g=servers_abc ifx_b onsoctcp machine2.local 15020 k=1,g=servers_abc sla_ifx_abc onsoctcp machine3.local 15030 k=1

machine1を強制的に停止したところ、Informixサーバーifx_bのオンラインログで次のようになりました。

08/02/18 01:00:26 Maximum server connections 1 08/02/18 01:00:26 Checkpoint Statistics - Avg. Txn Block Time 0.000, # Txns blocked 0, Plog used 38, Llog used 0 08/02/18 01:06:39 The SMX connection between high availability servers was closed because the peer server was unresponsive for the timeout period (60 seconds times the number of retries). 08/02/18 01:06:39 The SMX connection between high availability servers was closed because the peer server was unresponsive for the timeout period (60 seconds times the number of retries). 08/02/18 01:07:28 DR: ping timeout 08/02/18 01:07:28 DR: Receive error 08/02/18 01:07:28 dr_secrcv thread : asfcode = -25582: oserr = 4: errstr = : Network connection is broken. System error = 4. 08/02/18 01:07:28 DR_ERR set to -1 08/02/18 01:07:28 DR: Receive Btree error 08/02/18 01:07:29 DR: Turned off on secondary server 08/02/18 01:07:36 SCHAPI: Issued Task() or Admin() command "task( 'ha make primary force', 'ifx_b' )". 08/02/18 01:07:37 Skipping failover callback. 08/02/18 01:07:48 Logical Recovery has reached the transaction cleanup phase. 08/02/18 01:07:48 Checkpoint Completed: duration was 0 seconds. 08/02/18 01:07:48 Thu Aug 2 - loguniq 19, logpos 0xcb4018, timestamp: 0xf287b Interval: 148 08/02/18 01:07:48 Maximum server connections 2 08/02/18 01:07:48 Checkpoint Statistics - Avg. Txn Block Time 0.000, # Txns blocked 0, Plog used 72, Llog used 1 08/02/18 01:07:48 Logical Recovery Complete. 9001 Committed, 0 Rolled Back, 0 Open, 0 Bad Locks 08/02/18 01:07:48 Logical Recovery Complete. 08/02/18 01:07:48 Performance Advisory: Based on the current workload, the physical log might be too small to accommodate the time it takes to flush the buffer pool. 08/02/18 01:07:48 Results: The server might block transactions during checkpoints. 08/02/18 01:07:48 Action: If transactions are blocked during the checkpoint, increase the size of the physical log to at least 716800 KB. 08/02/18 01:07:48 Performance Advisory: Based on the current workload, the logical log space might be too small to accommodate the time it takes to flush the buffer pool. 08/02/18 01:07:48 Results: The server might block transactions during checkpoints. 08/02/18 01:07:48 Action: If transactions are blocked during the checkpoint, increase the size of the logical log space to at least 89600 KB. 08/02/18 01:07:48 Performance Advisory: The physical log is too small for automatic checkpoints. 08/02/18 01:07:48 Results: Automatic checkpoints are disabled. 08/02/18 01:07:48 Action: To enable automatic checkpoints, increase the physical log to at least 716800 KB. 08/02/18 01:07:48 Quiescent Mode 08/02/18 01:07:48 Checkpoint Completed: duration was 0 seconds. 08/02/18 01:07:48 Thu Aug 2 - loguniq 19, logpos 0xcb6018, timestamp: 0xf288d Interval: 149 08/02/18 01:07:48 Maximum server connections 2 08/02/18 01:07:48 Checkpoint Statistics - Avg. Txn Block Time 0.000, # Txns blocked 0, Plog used 4, Llog used 2 08/02/18 01:07:48 B-tree scanners enabled. 08/02/18 01:07:48 DR: Reservation of the last logical log for log backup turned on 08/02/18 01:07:48 DR: new type = primary, secondary server name = ifx_a 08/02/18 01:07:48 DR: Trying to connect to secondary server = ifx_a 08/02/18 01:07:48 Starting BldNotification 08/02/18 01:07:49 On-Line Mode 08/02/18 01:07:50 SCHAPI: Started dbScheduler thread. 08/02/18 01:07:50 Auto Registration is synced 08/02/18 01:07:50 SCHAPI: Started 2 dbWorker threads. 08/02/18 01:07:51 DR: Cannot connect to secondary server 08/02/18 01:07:51 DR: Turned off on primary server 08/02/18 01:07:51 DDR Log staging: Using the directory /opt/IBM/informix/12.10/tmp/ifmxddrlog_2. 08/02/18 01:07:51 SCHAPI: dbutil threads is already running. 08/02/18 01:07:51 SCHAPI: dbScheduler threads is already running. 08/02/18 01:07:52 Defragmenter cleaner thread now running 08/02/18 01:07:52 Defragmenter cleaner thread cleaned:0 partitions 08/02/18 01:08:12 DR: Cannot connect to secondary server 08/02/18 01:08:12 DR: Turned off on primary server 08/02/18 01:09:14 DR: Cannot connect to secondary server 08/02/18 01:09:14 DR: Turned off on primary server 08/02/18 01:10:14 DR: Cannot connect to secondary server 08/02/18 01:10:14 DR: Turned off on primary server 08/02/18 01:11:14 DR: Cannot connect to secondary server 08/02/18 01:11:14 DR: Turned off on primary server 08/02/18 01:12:14 DR: Cannot connect to secondary server 08/02/18 01:12:14 DR: Turned off on primary server 08/02/18 01:12:53 Checkpoint Completed: duration was 0 seconds. 08/02/18 01:12:53 Thu Aug 2 - loguniq 19, logpos 0xcf0704, timestamp: 0xf3ab0 Interval: 150 08/02/18 01:12:53 Maximum server connections 2 08/02/18 01:12:53 Checkpoint Statistics - Avg. Txn Block Time 0.000, # Txns blocked 0, Plog used 39, Llog used 58 08/02/18 01:13:14 DR: Cannot connect to secondary server 08/02/18 01:13:14 DR: Turned off on primary server

そして、接続マネージャーのログで次のようになります。

Thu Aug 2 00:57:42 2018 00:57:42 IBM Informix Connection Manager 00:57:42 IBM Informix CSDK Version 4.10, IBM Informix-ESQL Version 4.10.FC12 00:57:42 Build Number: N037 00:57:42 Build Host: hans 00:57:42 Build OS: Linux 2.6.18-128.el5 00:57:42 Build Date: Tue Jun 26 08:55:45 CDT 2018 00:57:42 GLS Version: glslib-6.00.FC13 00:57:42 00:57:42 set CM_TIMEOUT 60 00:57:42 set global EVENT_TIMEOUT 60 00:57:42 set global SECONDARY_EVENT_TIMEOUT 60 00:57:42 DEBUG[TID139959615362880]:add type PRI to SLA sla_ifx_abc [cmsm_sla.c:parse_sla:2412] 00:57:42 SLA sla_ifx_abc listener mode REDIRECT 00:57:42 Connection Manager name is icm_1 00:57:42 Starting Connection Manager... 00:57:42 DEBUG[TID139959615362880]:Password File /opt/IBM/informix/4.10/etc/passwd_file failed error:No such file or directory[2] [cmsm_main.c:cmsm_pw_init:1782] 00:57:42 Warning: Password Manager failed; working in trusted node mode 00:57:42 set the maximum number of file descriptors 32767 failed 00:57:42 DEBUG[TID139959615362880]:setrlimit(RLIMIT_NOFILE, 32767) = -1, error:Operation not permitted[1] [cmsm_main.c:cmsm_run:4901] 00:57:42 the current maximum number of file descriptors is 1024 00:57:42 DEBUG[TID139959615362880]:new daemon pid is 11761 [cmsm_main.c:cmsm_daemonize:4653] 00:57:42 DEBUG[TID139959615362880]:dbservername = servers_abc [cmsm_server.ec:cmsm_primary:2825] 00:57:42 DEBUG[TID139959615362880]:nettype = [cmsm_server.ec:cmsm_primary:2826] 00:57:42 DEBUG[TID139959615362880]:hostname = - [cmsm_server.ec:cmsm_primary:2827] 00:57:42 DEBUG[TID139959615362880]:servicename = - [cmsm_server.ec:cmsm_primary:2828] 00:57:42 DEBUG[TID139959615362880]:options = i=10,c=1,e=ifx_b [cmsm_server.ec:cmsm_primary:2829] 00:57:42 DEBUG[TID139959615362880]:connect to servers_abc for CLUSTER ifx_abc [cmsm_server.ec:cmsm_primary:2845] 00:57:42 DEBUG[TID139959615362880]:connect to group servers_abc server ifx_a for CLUSTER ifx_abc [cmsm_server.ec:cmsm_primary:2866] 00:57:42 DEBUG[TID139959615362880]:create new thread -509528320 for ifx_a [cmsm_sla.c:cmsm_update_server_ex:996] 00:57:42 DEBUG[TID139959615362880]:connect to group servers_abc server ifx_b for CLUSTER ifx_abc [cmsm_server.ec:cmsm_primary:2866] 00:57:42 DEBUG[TID139959615362880]:create new thread -511629568 for ifx_b [cmsm_sla.c:cmsm_update_server_ex:996] 00:57:42 listener sla_ifx_abc initializing 00:57:42 DEBUG[TID139959585543936]:listenter sla_ifx_abc Host=machine3.local service=15030 nettype=soctcp engtypeon [cmsm_main.c:cmsm_listener_thread:1166] 00:57:42 DEBUG[TID139959589746432]:CONNECT to @ifx_a|onsoctcp|machine1.local|15010 AS ifx_a1 SQLCODE = (0,0,) [cmsm_server.ec:cmsm_connect_byurl_opt:136] 00:57:42 DEBUG[TID139959589746432]:database sysmaster @ifx_a|onsoctcp|machine1.local|15010 AS ifx_a1 SQLCODE = (0,0,) [cmsm_server.ec:cmsm_connect_byurl_opt:145] 00:57:42 DEBUG[TID139959589746432]:ifx_a protocols = 1 [cmsm_server.ec:cmsm_add_dbaliases:2433] 00:57:42 DEBUG[TID139959589746432]:Cluster ifx_abc Arbitrator setting primary name = ifx_a [cmsm_arb.c:arb_set_primary:793] 00:57:42 Cluster ifx_abc Arbitrator FOC ORDER=ENABLED PRIORITY=1 00:57:42 Connection Manager successfully connected to ifx_a 00:57:42 DEBUG[TID139959587645184]:CONNECT to @ifx_b|onsoctcp|machine2.local|15020 AS ifx_b1 SQLCODE = (0,0,) [cmsm_server.ec:cmsm_connect_byurl_opt:136] 00:57:42 DEBUG[TID139959587645184]:database sysmaster @ifx_b|onsoctcp|machine2.local|15020 AS ifx_b1 SQLCODE = (0,0,) [cmsm_server.ec:cmsm_connect_byurl_opt:145] 00:57:42 Listener sla_ifx_abc DBSERVERS=PRI is active with 4 worker threads 00:57:42 DEBUG[TID139959587645184]:ifx_b protocols = 1 [cmsm_server.ec:cmsm_add_dbaliases:2433] 00:57:42 Connection Manager successfully connected to ifx_b 00:57:42 DEBUG[TID139959589746432]:server ifx_a version is 12.10 [cmsm_er.ec:cmsm_er_monitor:2994] 00:57:42 DEBUG[TID139959589746432]:ACTIVE MONITOR ifx_a ifx_a 192.168.56.101 49464 [cmsm_er.ec:cmsm_er_monitor:3036] 00:57:42 DEBUG[TID139959589746432]:register CM successfully icm_1 with token 1782701540 [cmsm_server.ec:cmsm_cm_register:2336] 00:57:42 DEBUG[TID139959587645184]:server ifx_b version is 12.10 [cmsm_er.ec:cmsm_er_monitor:2994] 00:57:42 DEBUG[TID139959587645184]:ACTIVE MONITOR ifx_b ifx_b 192.168.56.101 44376 [cmsm_er.ec:cmsm_er_monitor:3036] 00:57:42 DEBUG[TID139959589746432]:Register server ifx_a for ifx_a count = 1 [cmsm_er.ec:cmsm_er_event_process:1281] 00:57:42 DEBUG[TID139959587645184]:register CM successfully icm_1 with token 1120237392 [cmsm_server.ec:cmsm_cm_register:2336] 00:57:42 DEBUG[TID139959587645184]:Register server ifx_b for ifx_b count = 1 [cmsm_er.ec:cmsm_er_event_process:1281] 00:57:43 DEBUG[TID139959615362880]:fcntl(/opt/IBM/informix/4.10/tmp/cmsm.pid.icm_1) success error:Success[0] [cmsm_main.c:cmsm_pidfile:2193] 00:57:43 Connection Manager started successfully 00:57:43 DEBUG[TID139959589746432]:SQL get ifx_a event SRV_ADM 16:5 SDS,HDR,RSS [cmsm_er.ec:cmsm_er_event_process:1883] 00:57:43 DEBUG[TID139959589746432]:Cluster ifx_abc Arbitrator reinitialized CM names [cmsm_arb.c:arb_clear_cms_list:655] 00:57:43 DEBUG[TID139959589746432]:Cluster ifx_abc Arbitrator added CM name = icm_1 [cmsm_arb.c:arb_add_cm:584] 00:57:43 CM icm_1 arbitrator for ifx_abc is active 00:57:43 Cluster ifx_abc Arbitrator FOC ORDER=SDS,HDR,RSS PRIORITY=1 TIMEOUT=1 00:57:43 DEBUG[TID139959589746432]:Cluster ifx_abc Arbitrator reinitialized CM names [cmsm_arb.c:arb_clear_cms_list:655] 00:57:43 DEBUG[TID139959589746432]:Cluster ifx_abc Arbitrator added CM name = icm_1 [cmsm_arb.c:arb_add_cm:584] 00:57:43 DEBUG[TID139959589746432]:Cluster ifx_abc Arbitrator reinitialized CM names [cmsm_arb.c:arb_clear_cms_list:655] 00:57:43 DEBUG[TID139959589746432]:Cluster ifx_abc Arbitrator added CM name = icm_1 [cmsm_arb.c:arb_add_cm:584] 00:57:43 DEBUG[TID139959589746432]:Cluster ifx_abc Arbitrator reinitialized CM names [cmsm_arb.c:arb_clear_cms_list:655] 00:57:43 DEBUG[TID139959589746432]:Cluster ifx_abc Arbitrator added CM name = icm_1 [cmsm_arb.c:arb_add_cm:584] 01:01:25 DEBUG[TID139959583442688]:SLA sla_ifx_abc 3 ifx_a time=1 latency= 0.00 readyQ=19 session=0 adjustSession=0 [cmsm_sla.c:select_server_from_sla:2947] 01:01:25 SLA sla_ifx_abc redirect SQLI client from 192.168.56.101 to ifx_a 192.168.56.102.15010 01:01:39 DEBUG[TID139959581341440]:SLA sla_ifx_abc 3 ifx_a time=0 latency= 0.00 readyQ=18 session=0 adjustSession=0 [cmsm_sla.c:select_server_from_sla:2947] 01:01:39 SLA sla_ifx_abc redirect SQLI client from 192.168.56.101 to ifx_a 192.168.56.102.15010 01:01:39 DEBUG[TID139959579240192]:SLA sla_ifx_abc 3 ifx_a time=0 latency= 0.00 readyQ=18 session=0 adjustSession=1 [cmsm_sla.c:select_server_from_sla:2947] 01:01:39 SLA sla_ifx_abc redirect SQLI client from 192.168.56.101 to ifx_a 192.168.56.102.15010 01:02:09 DEBUG[TID139959577138944]:SLA sla_ifx_abc 3 ifx_a time=6 latency= 0.00 readyQ=16 session=0 adjustSession=0 [cmsm_sla.c:select_server_from_sla:2947] 01:02:09 SLA sla_ifx_abc redirect SQLI client from 192.168.56.101 to ifx_a 192.168.56.102.15010 01:06:16 Server ifx_a no response over EVENT_TIMEOUT 60 seconds 01:06:16 Connection Manager disconnected from ifx_a 01:06:16 DEBUG[TID139959589746432]:start failover in sqlbreakcallback for ifx_a [cmsm_server.ec:cmsm_event_callback_common:1274] 01:06:16 DEBUG[TID139959575037696]:starting the failover process [cmsm_server.ec:cmsm_failover_incallback:1184] 01:06:17 Cluster ifx_abc Arbitrator detected primary server is down. 01:06:17 Cluster ifx_abc Arbitrator detected majority of secondarys are still connected. 01:06:17 Cluster ifx_abc Arbitrator will not perform failover. 01:06:17 DEBUG[TID139959575037696]:failover process returns -1 [cmsm_server.ec:cmsm_failover_incallback:1207] 01:06:39 DEBUG[TID139959587645184]:create new thread -524237056 for ifx_a [cmsm_sla.c:cmsm_update_server_ex:996] 01:06:39 DEBUG[TID139959587645184]:create new thread -526338304 for ifx_a [cmsm_sla.c:cmsm_update_server_ex:996] 01:06:39 DEBUG[TID139959575037696]:Exit server ifx_a monitor machine3.local thread status=8 [cmsm_er.ec:cmsm_er_monitor:2608] 01:07:19 DEBUG[TID139959589746432]:fetch sysrepstats_cursor SQLCODE = (-213,0,) [cmsm_er.ec:cmsm_er_event_process:1711] 01:07:19 DEBUG[TID139959589746432]:unregister event failed, sqlcode = (-1811,0,) [cmsm_server.ec:cmsm_cm_unregister:2383] 01:07:19 DEBUG[TID139959589746432]:Unregister server ifx_a for ifx_a count = 0 [cmsm_er.ec:cmsm_er_event_process:2092] 01:07:19 Connection Manager disconnected from ifx_a 01:07:19 ALARM 3002 detected lost connection to Informix server ifx_a from machine3.local 01:07:19 DEBUG[TID139959589746432]:Connection Manager disconnected from ifx_a sqlcode=-1803 [cmsm_er.ec:cmsm_er_monitor:3090] 01:07:19 Cluster ifx_abc Arbitrator detected primary server is down. 01:07:19 Cluster ifx_abc Arbitrator detected majority of secondarys are still connected. 01:07:19 Cluster ifx_abc Arbitrator will not perform failover. 01:07:19 Cluster ifx_abc Arbitrator detected primary server is down. 01:07:19 Cluster ifx_abc Arbitrator detected majority of secondarys are still connected. 01:07:19 Cluster ifx_abc Arbitrator will not perform failover. 01:07:20 Cluster ifx_abc Arbitrator detected primary server is down. 01:07:20 Cluster ifx_abc Arbitrator detected majority of secondarys are still connected. 01:07:20 Cluster ifx_abc Arbitrator will not perform failover. 01:07:21 DEBUG[TID139959572936448]:CONNECT to @ifx_a|onsoctcp|machine1.local|15010 AS ifx_a2 SQLCODE = (-908,107,ifx_a) [cmsm_server.ec:cmsm_connect_byurl_opt:136] 01:07:21 ALARM 3001 unable to connect to Informix server ifx_a from machine3.local error -908 01:07:21 Cluster ifx_abc Arbitrator detected primary server is down. 01:07:21 Cluster ifx_abc Arbitrator detected majority of secondarys are still connected. 01:07:21 Cluster ifx_abc Arbitrator will not perform failover. 01:07:22 DEBUG[TID139959572936448]:Exit server ifx_a monitor machine3.local thread status=8 [cmsm_er.ec:cmsm_er_monitor:2608] 01:07:24 DEBUG[TID139959589746432]:CONNECT to @ifx_a|onsoctcp|machine1.local|15010 AS ifx_a3 SQLCODE = (-908,107,ifx_a) [cmsm_server.ec:cmsm_connect_byurl_opt:136] 01:07:24 Cluster ifx_abc Arbitrator detected primary server is down. 01:07:24 Cluster ifx_abc Arbitrator detected majority of secondarys are still connected. 01:07:24 Cluster ifx_abc Arbitrator will not perform failover. 01:07:27 DEBUG[TID139959589746432]:CONNECT to @ifx_a|onsoctcp|machine1.local|15010 AS ifx_a4 SQLCODE = (-908,107,ifx_a) [cmsm_server.ec:cmsm_connect_byurl_opt:136] 01:07:27 Cluster ifx_abc Arbitrator detected primary server is down. 01:07:27 Cluster ifx_abc Arbitrator detected majority of secondarys are still connected. 01:07:27 Cluster ifx_abc Arbitrator will not perform failover. 01:07:30 DEBUG[TID139959587645184]:get ifx_b CLUST_CHG event 1:6[DROFF] ifx_a [cmsm_er.ec:cmsm_er_event_process:1734] 01:07:30 DEBUG[TID139959589746432]:CONNECT to @ifx_a|onsoctcp|machine1.local|15010 AS ifx_a5 SQLCODE = (-908,107,ifx_a) [cmsm_server.ec:cmsm_connect_byurl_opt:136] 01:07:30 DEBUG[TID139959589746432]:Cluster ifx_abc Arbitrator start failover logic now... [cmsm_arb.c:arb_server_down:1564] 01:07:30 ALARM 2001 failover arbitrator automated failover in progress. 01:07:30 DEBUG[TID139959589746432]:Cluster ifx_abc Arbitrator try failover on generic SDS [cmsm_arb.c:arb_failover_by_foc:1174] 01:07:30 DEBUG[TID139959589746432]:Cluster ifx_abc Arbitrator try failover on generic HDR [cmsm_arb.c:arb_failover_by_foc:1200] 01:07:30 DEBUG[TID139959589746432]:CLuster ifx_abc Arbitrator failover to ifx_b, waiting 150 seconds for confirmation [cmsm_arb.c:arb_post_wait:1764] 01:07:30 DEBUG[TID139959589746432]:Arbitrator found failover node = ifx_b [cmsm_server_arb.ec:arb_failover_to:303] 01:07:37 DEBUG[TID139959589746432]:CONNECT to @ifx_b|onsoctcp|machine2.local|15020 AS ifx_b2 SQLCODE = (0,0,) [cmsm_server.ec:cmsm_connect_byurl_opt:136] 01:07:37 DEBUG[TID139959589746432]:database sysmaster @ifx_b|onsoctcp|machine2.local|15020 AS ifx_b2 SQLCODE = (0,0,) [cmsm_server.ec:cmsm_connect_byurl_opt:145] 01:07:37 DEBUG[TID139959589746432]:Arbitrator connected to failover node = ifx_b [cmsm_server_arb.ec:arb_failover_to:316] 01:07:37 DEBUG[TID139959589746432]:Arbitrator get HA_ALIAS on ifx_b, HA_ALIAS [ifx_b] [cmsm_server_arb.ec:arb_failover_to:326] 01:07:50 DEBUG[TID139959587645184]:SQL get ifx_b event SRV_ADM 16:3 1 [cmsm_er.ec:cmsm_er_event_process:1883] 01:07:50 Server ifx_b is in quiescent mode. 01:07:50 DEBUG[TID139959587645184]:get ifx_b CLUST_CHG event 1:6[DROFF] ifx_a [cmsm_er.ec:cmsm_er_event_process:1734] 01:07:50 DEBUG[TID139959587645184]:SQL get ifx_b event SRV_ADM 16:3 5 [cmsm_er.ec:cmsm_er_event_process:1883] 01:07:50 The server type of cluster ifx_abc server ifx_b is Primary. 01:07:50 DEBUG[TID139959587645184]:Cluster ifx_abc Arbitrator setting primary name = ifx_b [cmsm_arb.c:arb_set_primary:793] 01:07:50 Cluster ifx_abc Arbitrator FOC ORDER=SDS,HDR,RSS PRIORITY=1 TIMEOUT=1 01:07:50 Server ifx_b is in on-line mode. 01:07:50 DEBUG[TID139959587645184]:get ifx_b CLUST_CHG event 1:5[NEWPRI] ifx_b [cmsm_er.ec:cmsm_er_event_process:1734] 01:07:50 Arbitrator make primary on node = ifx_b successful 01:07:50 ALARM 2002 failover arbitrator automated failover completed 01:07:50 DEBUG[TID139959589746432]:Monitor ifx_a exit by arbitrator [cmsm_er.ec:cmsm_er_monitor:2789] 01:07:50 DEBUG[TID139959589746432]:all monitors exited, server ifx_a register count = 0 [cmsm_er.ec:cmsm_er_monitor:3156] 01:07:50 DEBUG[TID139959589746432]:enqueue a cleanup task for ifx_a [cmsm_er.ec:cmsm_enqueu_cleanup:2378] 01:07:50 DEBUG[TID139959587645184]:server monitor ifx_b dequeue a cleanup task [cmsm_er.ec:cmsm_er_event_process:2060] 01:07:50 DEBUG[TID139959587645184]:SQL get ifx_b event SRV_ADM 16:5 SDS,HDR,RSS [cmsm_er.ec:cmsm_er_event_process:1883] 01:07:50 DEBUG[TID139959587645184]:Cluster ifx_abc Arbitrator reinitialized CM names [cmsm_arb.c:arb_clear_cms_list:655] 01:07:50 DEBUG[TID139959587645184]:Cluster ifx_abc Arbitrator added CM name = icm_1 [cmsm_arb.c:arb_add_cm:584] 01:07:57 DEBUG[TID139959587645184]:get ifx_b CLUST_CHG event 1:6[DROFF] ifx_a [cmsm_er.ec:cmsm_er_event_process:1734] 01:08:13 DEBUG[TID139959587645184]:get ifx_b CLUST_CHG event 1:6[DROFF] ifx_a [cmsm_er.ec:cmsm_er_event_process:1734] 01:09:19 DEBUG[TID139959587645184]:get ifx_b CLUST_CHG event 1:6[DROFF] ifx_a [cmsm_er.ec:cmsm_er_event_process:1734] 01:10:17 DEBUG[TID139959587645184]:get ifx_b CLUST_CHG event 1:6[DROFF] ifx_a [cmsm_er.ec:cmsm_er_event_process:1734] 01:11:22 DEBUG[TID139959587645184]:get ifx_b CLUST_CHG event 1:6[DROFF] ifx_a [cmsm_er.ec:cmsm_er_event_process:1734] 01:12:20 DEBUG[TID139959587645184]:get ifx_b CLUST_CHG event 1:6[DROFF] ifx_a [cmsm_er.ec:cmsm_er_event_process:1734] 01:13:17 DEBUG[TID139959587645184]:get ifx_b CLUST_CHG event 1:6[DROFF] ifx_a [cmsm_er.ec:cmsm_er_event_process:1734]

セカンダリのプロモーション中にAF（アサーションエラー）が発生したようですので、違いがあります。

ログをもっと投稿してもらえますか？プライマリをプルダウンする直前からセカンダリがリカバリでスタックするまで。