PostgreSQL High Availability with repmgr: Failover

2025/9/18 19:31:36 / Source: https://www.cnblogs.com/wy123/p/19098589

PostgreSQL high availability: automatic switchover with repmgr

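I previously wrote up how to build a repmgr high-availability cluster (https://www.cnblogs.com/wy123/p/18531710). The setup is fairly simple, so I will not repeat it here. To keep things simple, this test uses one primary and two standbys. I had never found time to test repmgr's manual and automatic failover, so I finally set aside an environment and ran a failover test.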

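Environment:

ubuntu05: 192.168.152.111 (PostgreSQL service: postgresql9000, repmgr service: repmgr9000)
ubuntu06: 192.168.152.112 (PostgreSQL service: postgresql9000, repmgr service: repmgr9000)
ubuntu07: 192.168.152.113 (PostgreSQL service: postgresql9000, repmgr service: repmgr9000)

1. ubuntu05, ubuntu06 and ubuntu07 form one repmgr cluster; ubuntu05 is the primary and the other two are standbys
2. Forcibly stop the PostgreSQL service on ubuntu05
3. repmgr completes the automatic failover and promotes ubuntu06 to primary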

 

repmgr configuration

The repmgr configuration file, repmgr.conf (this copy is from ubuntu06):

node_id=2
node_name='ubuntu06'
conninfo='host=192.168.152.112 user=repmgr dbname=repmgr port=9000 connect_timeout=100'
data_directory='/usr/local/pgsql16/pg9000/data'
pg_bindir='/usr/local/pgsql16/server/bin'
priority=80
# Automatic failover settings
failover=automatic
promote_command='/usr/local/pgsql16/server/bin/repmgr standby promote -f /usr/local/pgsql16/repmgr/repmgr.conf --log-to-file'
follow_command='/usr/local/pgsql16/server/bin/repmgr standby follow -f /usr/local/pgsql16/repmgr/repmgr.conf --log-to-file --upstream-node-id=%n'
log_file='/usr/local/pgsql16/repmgr/repmgr.log'
# To enable monitoring by the repmgrd daemon, set monitoring_history=yes in repmgr.conf
monitoring_history=true
# Monitoring interval in seconds (the default is 2)
monitor_interval_secs=5
# Number of attempts to reconnect to the primary before failing over (the default is 6)
reconnect_attempts=12
# Wait 5 seconds between reconnection attempts
reconnect_interval=5
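
The retry settings above determine how long repmgrd waits before it gives up on the primary. The following is a minimal sketch, not part of repmgr, that reads key=value settings and estimates that wait. It assumes the wait is roughly reconnect_attempts × reconnect_interval and ignores connect_timeout and the time taken by the election itself. The helper names and the fallback interval of 10 seconds (the repmgr default as far as I know) are my own assumptions.

```python
# Sketch: estimate how long repmgrd keeps retrying before it starts a failover.
# Assumption: wait ~= reconnect_attempts * reconnect_interval (seconds).

SAMPLE_CONF = """
failover=automatic
monitor_interval_secs=5
# attempts before failing over
reconnect_attempts=12
# seconds between attempts
reconnect_interval=5
"""


def parse_repmgr_conf(text):
    """Parse simple key=value lines; '#' starts a comment.

    This naive parser would mangle values that contain '#'
    (for example a password), so treat it as an illustration only.
    """
    settings = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip().strip("'\"")
    return settings


def detection_window_seconds(settings):
    """Return attempts * interval, falling back to assumed defaults."""
    attempts = int(settings.get("reconnect_attempts", 6))
    interval = int(settings.get("reconnect_interval", 10))
    return attempts * interval


conf = parse_repmgr_conf(SAMPLE_CONF)
window = detection_window_seconds(conf)
print(f"repmgrd gives up on the primary after about {window} s")
```

With the values used in this test (12 attempts, 5 seconds apart) the estimate is 60 seconds. Lowering either value shortens the outage but makes a false failover on a brief network glitch more likely.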

The systemd unit for repmgrd, so that repmgrd starts automatically

[Unit]
Description=PostgreSQL Replication Manager Daemon
After=network.target postgresql9000.service
Requires=postgresql9000.service

[Service]
Type=forking
User=postgres
Group=postgres
ExecStart=/usr/local/pgsql16/server/bin/repmgrd -f /usr/local/pgsql16/repmgr/repmgr.conf --pid-file /usr/local/pgsql16/repmgr/repmgrd.pid
ExecStop=/bin/kill -QUIT $MAINPID
PIDFile=/usr/local/pgsql16/repmgr/repmgrd.pid
Restart=always
RestartSec=5
# Environment variables (if needed)
Environment=PATH=/usr/local/pgsql16/server/bin:/usr/local/bin:/usr/bin:/bin

[Install]
WantedBy=multi-user.target

 

Manual switchover

repmgr requires passwordless SSH between the nodes as a prerequisite.

For a manual switchover, run the following on whichever standby is to become the new primary:

/usr/local/pgsql16/server/bin/repmgr -f /usr/local/pgsql16/repmgr/repmgr.conf standby switchover --siblings-follow

--siblings-follow makes all the other standbys switch their replication source to the new primary automatically.

Internally, the switchover proceeds as follows:
1. Shut down the current primary, ubuntu06
2. Once the old primary has shut down completely, run pg_promote() on ubuntu05
3. Restart the old primary ubuntu06, demoted to a standby, with ubuntu05 as its replication source
4. The sibling nodes are likewise redirected to replicate from ubuntu05
5. The switchover is complete

Check the cluster status on the current node, ubuntu05:

repmgr -f /usr/local/pgsql16/repmgr/repmgr.conf cluster show

postgres@ubuntu05:~$ repmgr -f /usr/local/pgsql16/repmgr/repmgr.conf cluster show
 ID | Name     | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string
----+----------+---------+-----------+----------+----------+----------+----------+------------------------------------------------------------------------------
 1  | ubuntu05 | standby |   running | ubuntu06 | default  | 80       | 2        | host=192.168.152.111 user=repmgr dbname=repmgr port=9000 connect_timeout=100
 2  | ubuntu06 | primary | * running |          | default  | 80       | 2        | host=192.168.152.112 user=repmgr dbname=repmgr port=9000 connect_timeout=100
 3  | ubuntu07 | standby |   running | ubuntu06 | default  | 60       | 2        | host=192.168.152.113 user=repmgr dbname=repmgr port=9000 connect_timeout=100

Run the switchover:

postgres@ubuntu05:~$ /usr/local/pgsql16/server/bin/repmgr -f /usr/local/pgsql16/repmgr/repmgr.conf standby switchover --siblings-follow
NOTICE: executing switchover on node "ubuntu05" (ID: 1)
NOTICE: attempting to pause repmgrd on 3 nodes
NOTICE: local node "ubuntu05" (ID: 1) will be promoted to primary; current primary "ubuntu06" (ID: 2) will be demoted to standby
NOTICE: stopping current primary node "ubuntu06" (ID: 2)
NOTICE: issuing CHECKPOINT on node "ubuntu06" (ID: 2)
DETAIL: executing server command "/usr/local/pgsql16/server/bin/pg_ctl  -D '/usr/local/pgsql16/pg9000/data' -W -m fast stop"
INFO: checking for primary shutdown; 1 of 60 attempts ("shutdown_check_timeout")
INFO: checking for primary shutdown; 2 of 60 attempts ("shutdown_check_timeout")
INFO: checking for primary shutdown; 3 of 60 attempts ("shutdown_check_timeout")
INFO: checking for primary shutdown; 4 of 60 attempts ("shutdown_check_timeout")
INFO: checking for primary shutdown; 5 of 60 attempts ("shutdown_check_timeout")
INFO: checking for primary shutdown; 6 of 60 attempts ("shutdown_check_timeout")
NOTICE: current primary has been cleanly shut down at location 0/18000028
NOTICE: promoting standby to primary
DETAIL: promoting server "ubuntu05" (ID: 1) using pg_promote()
NOTICE: waiting up to 60 seconds (parameter "promote_check_timeout") for promotion to complete
NOTICE: STANDBY PROMOTE successful
DETAIL: server "ubuntu05" (ID: 1) was successfully promoted to primary
NOTICE: node "ubuntu05" (ID: 1) promoted to primary, node "ubuntu06" (ID: 2) demoted to standby
NOTICE: executing STANDBY FOLLOW on 1 of 1 siblings
INFO: STANDBY FOLLOW successfully executed on all reachable sibling nodes
NOTICE: switchover was successful
DETAIL: node "ubuntu05" is now primary and node "ubuntu06" is attached as standby
NOTICE: STANDBY SWITCHOVER has completed successfully

Check the cluster status again; ubuntu05 is now the primary and both other nodes follow it:

postgres@ubuntu05:~$ repmgr -f /usr/local/pgsql16/repmgr/repmgr.conf cluster show
 ID | Name     | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string
----+----------+---------+-----------+----------+----------+----------+----------+------------------------------------------------------------------------------
 1  | ubuntu05 | primary | * running |          | default  | 80       | 3        | host=192.168.152.111 user=repmgr dbname=repmgr port=9000 connect_timeout=100
 2  | ubuntu06 | standby |   running | ubuntu05 | default  | 80       | 2        | host=192.168.152.112 user=repmgr dbname=repmgr port=9000 connect_timeout=100
 3  | ubuntu07 | standby |   running | ubuntu05 | default  | 60       | 2        | host=192.168.152.113 user=repmgr dbname=repmgr port=9000 connect_timeout=100
postgres@ubuntu05:~$
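
When checking a switchover from a script, it can help to turn the `cluster show` table into records rather than reading it by eye. The sketch below is a hypothetical helper, not a repmgr feature. It assumes the columns are separated by `|` and that the ruler line contains no `|`, as in the output in this section; the sample text is the post-switchover table trimmed to shorter connection strings.

```python
# Sketch: parse the text table printed by `repmgr cluster show`.
# Assumption: data lines contain '|' and the ruler line does not.

SAMPLE = """\
 ID | Name     | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string
----+----------+---------+-----------+----------+----------+----------+----------+------------------
 1  | ubuntu05 | primary | * running |          | default  | 80       | 3        | host=192.168.152.111 port=9000
 2  | ubuntu06 | standby |   running | ubuntu05 | default  | 80       | 2        | host=192.168.152.112 port=9000
 3  | ubuntu07 | standby |   running | ubuntu05 | default  | 60       | 2        | host=192.168.152.113 port=9000
"""


def parse_cluster_show(output):
    """Return one dict per node, keyed by the header names."""
    lines = [ln for ln in output.splitlines() if "|" in ln]
    header = [col.strip() for col in lines[0].split("|")]
    rows = []
    for ln in lines[1:]:
        cells = [cell.strip() for cell in ln.split("|")]
        if len(cells) == len(header):
            rows.append(dict(zip(header, cells)))
    return rows


rows = parse_cluster_show(SAMPLE)
primaries = [r["Name"] for r in rows
             if r["Role"] == "primary" and r["Status"] == "* running"]
followers = {r["Name"]: r["Upstream"] for r in rows if r["Role"] == "standby"}

print("running primaries:", primaries)
print("standby upstreams:", followers)
```

A healthy result has exactly one running primary and every standby pointing at it. More than one entry in `primaries`, or a status such as `! running as primary`, would be worth investigating before doing anything else.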

 

Manual failover

1. Simulate a primary failure by killing or stopping the PostgreSQL service on the primary:

systemctl stop postgresql9000

2. Check the cluster status from a standby; the original primary is now unreachable:

postgres@ubuntu06:~$ repmgr -f /usr/local/pgsql16/repmgr/repmgr.conf cluster show
 ID | Name     | Role    | Status        | Upstream   | Location | Priority | Timeline | Connection string
----+----------+---------+---------------+------------+----------+----------+----------+------------------------------------------------------------------------------
 1  | ubuntu05 | primary | ? unreachable | ?          | default  | 80       |          | host=192.168.152.111 user=repmgr dbname=repmgr port=9000 connect_timeout=100
 2  | ubuntu06 | standby |   running     | ? ubuntu05 | default  | 80       | 3        | host=192.168.152.112 user=repmgr dbname=repmgr port=9000 connect_timeout=100
 3  | ubuntu07 | standby |   running     | ? ubuntu05 | default  | 60       | 3        | host=192.168.152.113 user=repmgr dbname=repmgr port=9000 connect_timeout=100

WARNING: following issues were detected
  - unable to connect to node "ubuntu05" (ID: 1)
  - node "ubuntu05" (ID: 1) is registered as an active primary but is unreachable
  - unable to connect to node "ubuntu06" (ID: 2)'s upstream node "ubuntu05" (ID: 1)
  - unable to determine if node "ubuntu06" (ID: 2) is attached to its upstream node "ubuntu05" (ID: 1)
  - unable to connect to node "ubuntu07" (ID: 3)'s upstream node "ubuntu05" (ID: 1)
  - unable to determine if node "ubuntu07" (ID: 3) is attached to its upstream node "ubuntu05" (ID: 1)

HINT: execute with --verbose option to see connection error messages
postgres@ubuntu06:~$

3. Manually promote ubuntu06 to primary:

/usr/local/pgsql16/server/bin/repmgr -f /usr/local/pgsql16/repmgr/repmgr.conf standby promote --siblings-follow

postgres@ubuntu06:~$ /usr/local/pgsql16/server/bin/repmgr -f /usr/local/pgsql16/repmgr/repmgr.conf standby promote --siblings-follow
NOTICE: promoting standby to primary
DETAIL: promoting server "ubuntu06" (ID: 2) using pg_promote()
NOTICE: waiting up to 60 seconds (parameter "promote_check_timeout") for promotion to complete
NOTICE: STANDBY PROMOTE successful
DETAIL: server "ubuntu06" (ID: 2) was successfully promoted to primary
NOTICE: executing STANDBY FOLLOW on 1 of 1 siblings
INFO: STANDBY FOLLOW successfully executed on all reachable sibling nodes

Check the cluster status: ubuntu06 is now the primary, and the old primary ubuntu05 is marked as failed:

postgres@ubuntu06:~$ repmgr -f /usr/local/pgsql16/repmgr/repmgr.conf cluster show
 ID | Name     | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string
----+----------+---------+-----------+----------+----------+----------+----------+------------------------------------------------------------------------------
 1  | ubuntu05 | primary | - failed  | ?        | default  | 80       |          | host=192.168.152.111 user=repmgr dbname=repmgr port=9000 connect_timeout=100
 2  | ubuntu06 | primary | * running |          | default  | 80       | 4        | host=192.168.152.112 user=repmgr dbname=repmgr port=9000 connect_timeout=100
 3  | ubuntu07 | standby |   running | ubuntu06 | default  | 60       | 3        | host=192.168.152.113 user=repmgr dbname=repmgr port=9000 connect_timeout=100

WARNING: following issues were detected
  - unable to connect to node "ubuntu05" (ID: 1)

HINT: execute with --verbose option to see connection error messages
postgres@ubuntu06:~$

4. Rejoin the old primary to the cluster

4.1 Start the old primary

root@ubuntu05:~# systemctl start postgresql9000
root@ubuntu05:~# su - postgres
postgres@ubuntu05:~$ /usr/local/pgsql16/server/bin/repmgr -f /usr/local/pgsql16/repmgr/repmgr.conf cluster show
 ID | Name     | Role    | Status               | Upstream   | Location | Priority | Timeline | Connection string
----+----------+---------+----------------------+------------+----------+----------+----------+------------------------------------------------------------------------------
 1  | ubuntu05 | primary | * running            |            | default  | 80       | 3        | host=192.168.152.111 user=repmgr dbname=repmgr port=9000 connect_timeout=100
 2  | ubuntu06 | standby | ! running as primary |            | default  | 80       | 4        | host=192.168.152.112 user=repmgr dbname=repmgr port=9000 connect_timeout=100
 3  | ubuntu07 | standby |   running            | ! ubuntu06 | default  | 60       | 3        | host=192.168.152.113 user=repmgr dbname=repmgr port=9000 connect_timeout=100

WARNING: following issues were detected
  - node "ubuntu06" (ID: 2) is registered as standby but running as primary
  - node "ubuntu07" (ID: 3) reports a different upstream (reported: "ubuntu06", expected "ubuntu05")

postgres@ubuntu05:~$

Seen from the old primary's own, now stale, metadata, ubuntu05 still appears as the primary while ubuntu06 is flagged as "running as primary".

4.2 Run pg_rewind (through repmgr node rejoin)

/usr/local/pgsql16/server/bin/repmgr -f /usr/local/pgsql16/repmgr/repmgr.conf node rejoin -d 'host=ubuntu06 dbname=repmgr user=repmgr password=****** port=9000' --force-rewind --dry-run

postgres@ubuntu05:~$ /usr/local/pgsql16/server/bin/repmgr -f /usr/local/pgsql16/repmgr/repmgr.conf node rejoin -d 'host=ubuntu06 dbname=repmgr user=repmgr password=****** port=9000' --force-rewind --dry-run
NOTICE: rejoin target is node "ubuntu06" (ID: 2)
INFO: replication connection to the rejoin target node was successful
INFO: local and rejoin target system identifiers match
DETAIL: system identifier is 7550951818891860956
NOTICE: pg_rewind execution required for this node to attach to rejoin target node 2
DETAIL: rejoin target server's timeline 4 forked off current database system timeline 3 before current recovery point 0/1B000028
INFO: prerequisites for using pg_rewind are met
INFO: pg_rewind would now be executed
DETAIL: pg_rewind command is:
  /usr/local/pgsql16/server/bin/pg_rewind -D '/usr/local/pgsql16/pg9000/data' --source-server='host=192.168.152.112 user=repmgr dbname=repmgr port=9000 connect_timeout=100'
INFO: prerequisites for executing NODE REJOIN are met
postgres@ubuntu05:~$

With --dry-run, repmgr only checks the prerequisites; run the same command without --dry-run to actually perform the rejoin.

Alternatively, take the blunt approach: delete the local data and re-clone the database (again, drop --dry-run to actually perform the clone):

/usr/local/pgsql16/server/bin/repmgr -h 192.168.152.112 -p 9000 -U repmgr -d repmgr -f /usr/local/pgsql16/repmgr/repmgr.conf standby clone --dry-run

After cloning, simply start the database service.

# Unregister; this actually deletes the node's row from the nodes table
/usr/local/pgsql16/server/bin/repmgr -f /usr/local/pgsql16/repmgr/repmgr.conf standby unregister

# Re-register; this loads the settings in repmgr.conf into the nodes table again
/usr/local/pgsql16/server/bin/repmgr -f /usr/local/pgsql16/repmgr/repmgr.conf standby register

# Force registration; this overwrites the existing entry
/usr/local/pgsql16/server/bin/repmgr -f /usr/local/pgsql16/repmgr/repmgr.conf standby register --force

# Specify the upstream node; usually unnecessary, because it is found from postgresql.auto.conf
/usr/local/pgsql16/server/bin/repmgr -f /usr/local/pgsql16/repmgr/repmgr.conf standby register  --upstream-node-id=2

Re-registering a healthy node is useful after changing its configuration, because it reloads the new settings into the nodes table. Run it on the standby nodes (pg02 and pg03 in this example):

$ repmgr -f /home/postgres/repmgr/repmgr.conf standby unregister
$ repmgr -f /home/postgres/repmgr/repmgr.conf standby register --upstream-node-id=1
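
Since the rejoin is normally run twice, first with --dry-run and then for real, a small wrapper can build both command lines from one place. This is a sketch only: the function name is hypothetical, it uses just the options shown in this section, and it assumes the password comes from ~/.pgpass rather than the connection string. It prints the commands instead of executing them.

```python
# Sketch: build the `repmgr node rejoin` command line used in this section,
# once as a dry run and once for real. Nothing is executed here.
import shlex

REPMGR = "/usr/local/pgsql16/server/bin/repmgr"
CONF = "/usr/local/pgsql16/repmgr/repmgr.conf"


def rejoin_command(target_conninfo, dry_run=True):
    """Return the argument list for rejoining this node to the new primary."""
    cmd = [REPMGR, "-f", CONF, "node", "rejoin",
           "-d", target_conninfo, "--force-rewind"]
    if dry_run:
        cmd.append("--dry-run")
    return cmd


# Assumption: the password is supplied through ~/.pgpass, not the conninfo.
target = "host=ubuntu06 dbname=repmgr user=repmgr port=9000"

for dry in (True, False):
    # shlex.join quotes the conninfo so the line can be pasted into a shell.
    print(shlex.join(rejoin_command(target, dry_run=dry)))
```

Keeping the dry run as the default makes the destructive variant an explicit choice, which suits a step that rewrites the old primary's data directory.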

 

Automatic failover

Forcibly stop the PostgreSQL service on the primary node ubuntu05 to simulate a failure.

The automatic failover process is as follows:
[Figure: two screenshots of the automatic failover process]

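The repmgrd log of the failover is shown below. repmgrd keeps retrying according to reconnect_interval and reconnect_attempts from the configuration file above, and starts the failover once the primary is finally judged unreachable. With 12 attempts 5 seconds apart, the whole process takes about one minute.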
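
The one-minute figure can be cross-checked against the log timestamps. The sketch below uses three lines copied from the log in this section and measures the time from the first connection failure to the point where repmgrd gives up, and to the successful promotion. The helper names are my own, and the parsing assumes every line begins with a `[YYYY-MM-DD HH:MM:SS]` field.

```python
# Sketch: measure failover timing from repmgrd log lines.
from datetime import datetime

LOG_LINES = [
    '[2025-09-18 13:30:01] [WARNING] unable to connect to upstream node "ubuntu05" (ID: 1)',
    '[2025-09-18 13:30:57] [WARNING] unable to reconnect to node "ubuntu05" (ID: 1) after 12 attempts',
    '[2025-09-18 13:30:58] [NOTICE] STANDBY PROMOTE successful',
]


def timestamp(line):
    """Parse the leading [YYYY-MM-DD HH:MM:SS] field of a log line."""
    return datetime.strptime(line[1:20], "%Y-%m-%d %H:%M:%S")


def first_match(lines, needle):
    """Return the first line containing the given text."""
    return next(ln for ln in lines if needle in ln)


failure = timestamp(first_match(LOG_LINES, "unable to connect to upstream node"))
gave_up = timestamp(first_match(LOG_LINES, "unable to reconnect to node"))
promoted = timestamp(first_match(LOG_LINES, "STANDBY PROMOTE successful"))

detect_s = (gave_up - failure).total_seconds()
total_s = (promoted - failure).total_seconds()
print(f"detection: {detect_s:.0f} s, failure to promotion: {total_s:.0f} s")
```

For this run the retry phase takes 56 seconds and the promotion completes one second later, which matches the rough estimate of 12 × 5 seconds.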

[2025-09-18 13:24:00] [INFO] monitoring connection to upstream node "ubuntu05" (ID: 1)
[2025-09-18 13:26:26] [INFO] node "ubuntu06" (ID: 2) monitoring upstream node "ubuntu05" (ID: 1) in normal state
[2025-09-18 13:26:26] [DETAIL] last monitoring statistics update was 5 seconds ago
[2025-09-18 13:29:01] [INFO] node "ubuntu06" (ID: 2) monitoring upstream node "ubuntu05" (ID: 1) in normal state
[2025-09-18 13:29:01] [DETAIL] last monitoring statistics update was 5 seconds ago
*************** The primary failure is simulated here; the standby starts retrying ***************
[2025-09-18 13:30:01] [WARNING] unable to ping "host=192.168.152.111 user=repmgr dbname=repmgr port=9000 connect_timeout=100"
[2025-09-18 13:30:01] [DETAIL] PQping() returned "PQPING_NO_RESPONSE"
[2025-09-18 13:30:01] [WARNING] unable to connect to upstream node "ubuntu05" (ID: 1)
[2025-09-18 13:30:01] [INFO] checking state of node "ubuntu05" (ID: 1), 1 of 12 attempts
[2025-09-18 13:30:01] [WARNING] unable to ping "user=repmgr connect_timeout=100 dbname=repmgr host=192.168.152.111 port=9000 fallback_application_name=repmgr"
[2025-09-18 13:30:01] [DETAIL] PQping() returned "PQPING_NO_RESPONSE"
[2025-09-18 13:30:01] [INFO] sleeping up to 5 seconds until next reconnection attempt
[2025-09-18 13:30:02] [WARNING] unable to ping "host=192.168.152.111 user=repmgr dbname=repmgr port=9000 connect_timeout=100"
[2025-09-18 13:30:02] [DETAIL] PQping() returned "PQPING_NO_RESPONSE"
[2025-09-18 13:30:02] [WARNING] unable to connect to upstream node "ubuntu05" (ID: 1)
[2025-09-18 13:30:02] [INFO] checking state of node "ubuntu05" (ID: 1), 1 of 12 attempts
[2025-09-18 13:30:02] [WARNING] unable to ping "user=repmgr connect_timeout=100 dbname=repmgr host=192.168.152.111 port=9000 fallback_application_name=repmgr"
[2025-09-18 13:30:02] [DETAIL] PQping() returned "PQPING_NO_RESPONSE"
[2025-09-18 13:30:02] [INFO] sleeping up to 5 seconds until next reconnection attempt
[2025-09-18 13:30:06] [INFO] checking state of node "ubuntu05" (ID: 1), 2 of 12 attempts
[2025-09-18 13:30:06] [WARNING] unable to ping "user=repmgr connect_timeout=100 dbname=repmgr host=192.168.152.111 port=9000 fallback_application_name=repmgr"
[2025-09-18 13:30:06] [DETAIL] PQping() returned "PQPING_NO_RESPONSE"
[2025-09-18 13:30:06] [INFO] sleeping up to 5 seconds until next reconnection attempt
[2025-09-18 13:30:07] [INFO] checking state of node "ubuntu05" (ID: 1), 2 of 12 attempts
[2025-09-18 13:30:07] [WARNING] unable to ping "user=repmgr connect_timeout=100 dbname=repmgr host=192.168.152.111 port=9000 fallback_application_name=repmgr"
[2025-09-18 13:30:07] [DETAIL] PQping() returned "PQPING_NO_RESPONSE"
[2025-09-18 13:30:07] [INFO] sleeping up to 5 seconds until next reconnection attempt
[2025-09-18 13:30:11] [INFO] checking state of node "ubuntu05" (ID: 1), 3 of 12 attempts
[2025-09-18 13:30:11] [WARNING] unable to ping "user=repmgr connect_timeout=100 dbname=repmgr host=192.168.152.111 port=9000 fallback_application_name=repmgr"
[2025-09-18 13:30:11] [DETAIL] PQping() returned "PQPING_NO_RESPONSE"
[2025-09-18 13:30:11] [INFO] sleeping up to 5 seconds until next reconnection attempt
[2025-09-18 13:30:12] [INFO] checking state of node "ubuntu05" (ID: 1), 3 of 12 attempts
[2025-09-18 13:30:12] [WARNING] unable to ping "user=repmgr connect_timeout=100 dbname=repmgr host=192.168.152.111 port=9000 fallback_application_name=repmgr"
[2025-09-18 13:30:12] [DETAIL] PQping() returned "PQPING_NO_RESPONSE"
[2025-09-18 13:30:12] [INFO] sleeping up to 5 seconds until next reconnection attempt
[2025-09-18 13:30:16] [INFO] checking state of node "ubuntu05" (ID: 1), 4 of 12 attempts
[2025-09-18 13:30:16] [WARNING] unable to ping "user=repmgr connect_timeout=100 dbname=repmgr host=192.168.152.111 port=9000 fallback_application_name=repmgr"
[2025-09-18 13:30:16] [DETAIL] PQping() returned "PQPING_NO_RESPONSE"
[2025-09-18 13:30:16] [INFO] sleeping up to 5 seconds until next reconnection attempt
[2025-09-18 13:30:17] [INFO] checking state of node "ubuntu05" (ID: 1), 4 of 12 attempts
[2025-09-18 13:30:17] [WARNING] unable to ping "user=repmgr connect_timeout=100 dbname=repmgr host=192.168.152.111 port=9000 fallback_application_name=repmgr"
[2025-09-18 13:30:17] [DETAIL] PQping() returned "PQPING_NO_RESPONSE"
[2025-09-18 13:30:17] [INFO] sleeping up to 5 seconds until next reconnection attempt
[2025-09-18 13:30:22] [INFO] checking state of node "ubuntu05" (ID: 1), 5 of 12 attempts
[2025-09-18 13:30:22] [WARNING] unable to ping "user=repmgr connect_timeout=100 dbname=repmgr host=192.168.152.111 port=9000 fallback_application_name=repmgr"
[2025-09-18 13:30:22] [DETAIL] PQping() returned "PQPING_NO_RESPONSE"
[2025-09-18 13:30:22] [INFO] sleeping up to 5 seconds until next reconnection attempt
[2025-09-18 13:30:22] [INFO] checking state of node "ubuntu05" (ID: 1), 5 of 12 attempts
[2025-09-18 13:30:22] [WARNING] unable to ping "user=repmgr connect_timeout=100 dbname=repmgr host=192.168.152.111 port=9000 fallback_application_name=repmgr"
[2025-09-18 13:30:22] [DETAIL] PQping() returned "PQPING_NO_RESPONSE"
[2025-09-18 13:30:22] [INFO] sleeping up to 5 seconds until next reconnection attempt
[2025-09-18 13:30:27] [INFO] checking state of node "ubuntu05" (ID: 1), 6 of 12 attempts
[2025-09-18 13:30:27] [WARNING] unable to ping "user=repmgr connect_timeout=100 dbname=repmgr host=192.168.152.111 port=9000 fallback_application_name=repmgr"
[2025-09-18 13:30:27] [DETAIL] PQping() returned "PQPING_NO_RESPONSE"
[2025-09-18 13:30:27] [INFO] sleeping up to 5 seconds until next reconnection attempt
[2025-09-18 13:30:27] [INFO] checking state of node "ubuntu05" (ID: 1), 6 of 12 attempts
[2025-09-18 13:30:27] [WARNING] unable to ping "user=repmgr connect_timeout=100 dbname=repmgr host=192.168.152.111 port=9000 fallback_application_name=repmgr"
[2025-09-18 13:30:27] [DETAIL] PQping() returned "PQPING_NO_RESPONSE"
[2025-09-18 13:30:27] [INFO] sleeping up to 5 seconds until next reconnection attempt
[2025-09-18 13:30:32] [INFO] checking state of node "ubuntu05" (ID: 1), 7 of 12 attempts
[2025-09-18 13:30:32] [WARNING] unable to ping "user=repmgr connect_timeout=100 dbname=repmgr host=192.168.152.111 port=9000 fallback_application_name=repmgr"
[2025-09-18 13:30:32] [DETAIL] PQping() returned "PQPING_NO_RESPONSE"
[2025-09-18 13:30:32] [INFO] sleeping up to 5 seconds until next reconnection attempt
[2025-09-18 13:30:32] [INFO] checking state of node "ubuntu05" (ID: 1), 7 of 12 attempts
[2025-09-18 13:30:32] [WARNING] unable to ping "user=repmgr connect_timeout=100 dbname=repmgr host=192.168.152.111 port=9000 fallback_application_name=repmgr"
[2025-09-18 13:30:32] [DETAIL] PQping() returned "PQPING_NO_RESPONSE"
[2025-09-18 13:30:32] [INFO] sleeping up to 5 seconds until next reconnection attempt
[2025-09-18 13:30:37] [INFO] checking state of node "ubuntu05" (ID: 1), 8 of 12 attempts
[2025-09-18 13:30:37] [WARNING] unable to ping "user=repmgr connect_timeout=100 dbname=repmgr host=192.168.152.111 port=9000 fallback_application_name=repmgr"
[2025-09-18 13:30:37] [DETAIL] PQping() returned "PQPING_NO_RESPONSE"
[2025-09-18 13:30:37] [INFO] sleeping up to 5 seconds until next reconnection attempt
[2025-09-18 13:30:37] [INFO] checking state of node "ubuntu05" (ID: 1), 8 of 12 attempts
[2025-09-18 13:30:37] [WARNING] unable to ping "user=repmgr connect_timeout=100 dbname=repmgr host=192.168.152.111 port=9000 fallback_application_name=repmgr"
[2025-09-18 13:30:37] [DETAIL] PQping() returned "PQPING_NO_RESPONSE"
[2025-09-18 13:30:37] [INFO] sleeping up to 5 seconds until next reconnection attempt
[2025-09-18 13:30:42] [INFO] checking state of node "ubuntu05" (ID: 1), 9 of 12 attempts
[2025-09-18 13:30:42] [WARNING] unable to ping "user=repmgr connect_timeout=100 dbname=repmgr host=192.168.152.111 port=9000 fallback_application_name=repmgr"
[2025-09-18 13:30:42] [DETAIL] PQping() returned "PQPING_NO_RESPONSE"
[2025-09-18 13:30:42] [INFO] sleeping up to 5 seconds until next reconnection attempt
[2025-09-18 13:30:42] [INFO] checking state of node "ubuntu05" (ID: 1), 9 of 12 attempts
[2025-09-18 13:30:42] [WARNING] unable to ping "user=repmgr connect_timeout=100 dbname=repmgr host=192.168.152.111 port=9000 fallback_application_name=repmgr"
[2025-09-18 13:30:42] [DETAIL] PQping() returned "PQPING_NO_RESPONSE"
[2025-09-18 13:30:42] [INFO] sleeping up to 5 seconds until next reconnection attempt
[2025-09-18 13:30:47] [INFO] checking state of node "ubuntu05" (ID: 1), 10 of 12 attempts
[2025-09-18 13:30:47] [WARNING] unable to ping "user=repmgr connect_timeout=100 dbname=repmgr host=192.168.152.111 port=9000 fallback_application_name=repmgr"
[2025-09-18 13:30:47] [DETAIL] PQping() returned "PQPING_NO_RESPONSE"
[2025-09-18 13:30:47] [INFO] sleeping up to 5 seconds until next reconnection attempt
[2025-09-18 13:30:47] [INFO] checking state of node "ubuntu05" (ID: 1), 10 of 12 attempts
[2025-09-18 13:30:47] [WARNING] unable to ping "user=repmgr connect_timeout=100 dbname=repmgr host=192.168.152.111 port=9000 fallback_application_name=repmgr"
[2025-09-18 13:30:47] [DETAIL] PQping() returned "PQPING_NO_RESPONSE"
[2025-09-18 13:30:47] [INFO] sleeping up to 5 seconds until next reconnection attempt
[2025-09-18 13:30:52] [INFO] checking state of node "ubuntu05" (ID: 1), 11 of 12 attempts
[2025-09-18 13:30:52] [WARNING] unable to ping "user=repmgr connect_timeout=100 dbname=repmgr host=192.168.152.111 port=9000 fallback_application_name=repmgr"
[2025-09-18 13:30:52] [DETAIL] PQping() returned "PQPING_NO_RESPONSE"
[2025-09-18 13:30:52] [INFO] sleeping up to 5 seconds until next reconnection attempt
[2025-09-18 13:30:52] [INFO] checking state of node "ubuntu05" (ID: 1), 11 of 12 attempts
[2025-09-18 13:30:52] [WARNING] unable to ping "user=repmgr connect_timeout=100 dbname=repmgr host=192.168.152.111 port=9000 fallback_application_name=repmgr"
[2025-09-18 13:30:52] [DETAIL] PQping() returned "PQPING_NO_RESPONSE"
[2025-09-18 13:30:52] [INFO] sleeping up to 5 seconds until next reconnection attempt
[2025-09-18 13:30:57] [INFO] checking state of node "ubuntu05" (ID: 1), 12 of 12 attempts
[2025-09-18 13:30:57] [WARNING] unable to ping "user=repmgr connect_timeout=100 dbname=repmgr host=192.168.152.111 port=9000 fallback_application_name=repmgr"
[2025-09-18 13:30:57] [DETAIL] PQping() returned "PQPING_NO_RESPONSE"
[2025-09-18 13:30:57] [WARNING] unable to reconnect to node "ubuntu05" (ID: 1) after 12 attempts
[2025-09-18 13:30:57] [INFO] 1 active sibling nodes registered
[2025-09-18 13:30:57] [INFO] 3 total nodes registered
[2025-09-18 13:30:57] [INFO] primary node  "ubuntu05" (ID: 1) and this node have the same location ("default")
[2025-09-18 13:30:57] [INFO] local node's last receive lsn: 0/220000A0
[2025-09-18 13:30:57] [INFO] checking state of sibling node "ubuntu07" (ID: 3)
[2025-09-18 13:30:57] [INFO] node "ubuntu07" (ID: 3) reports its upstream is node 1, last seen 56 second(s) ago
[2025-09-18 13:30:57] [INFO] standby node "ubuntu07" (ID: 3) last saw primary node 56 second(s) ago
[2025-09-18 13:30:57] [INFO] last receive LSN for sibling node "ubuntu07" (ID: 3) is: 0/220000A0
[2025-09-18 13:30:57] [INFO] node "ubuntu07" (ID: 3) has same LSN as current candidate "ubuntu06" (ID: 2)
[2025-09-18 13:30:57] [INFO] node "ubuntu07" (ID: 3) has lower priority (60) than current candidate "ubuntu06" (ID: 2) (80)
[2025-09-18 13:30:57] [INFO] visible nodes: 2; total nodes: 2; no nodes have seen the primary within the last 10 seconds
[2025-09-18 13:30:57] [NOTICE] promotion candidate is "ubuntu06" (ID: 2)
[2025-09-18 13:30:57] [NOTICE] this node is the winner, will now promote itself and inform other nodes
[2025-09-18 13:30:57] [INFO] promote_command is:"/usr/local/pgsql16/server/bin/repmgr standby promote -f /usr/local/pgsql16/repmgr/repmgr.conf --log-to-file"
[2025-09-18 13:30:57] [NOTICE] redirecting logging output to "/usr/local/pgsql16/repmgr/repmgr.log"
[2025-09-18 13:30:57] [WARNING] 1 sibling nodes found, but option "--siblings-follow" not specified
[2025-09-18 13:30:57] [DETAIL] these nodes will remain attached to the current primary:ubuntu07 (node ID: 3)
[2025-09-18 13:30:57] [NOTICE] promoting standby to primary
[2025-09-18 13:30:57] [DETAIL] promoting server "ubuntu06" (ID: 2) using pg_promote()
[2025-09-18 13:30:57] [NOTICE] waiting up to 60 seconds (parameter "promote_check_timeout") for promotion to complete
[2025-09-18 13:30:57] [INFO] checking state of node "ubuntu05" (ID: 1), 12 of 12 attempts
[2025-09-18 13:30:57] [WARNING] unable to ping "user=repmgr connect_timeout=100 dbname=repmgr host=192.168.152.111 port=9000 fallback_application_name=repmgr"
[2025-09-18 13:30:57] [DETAIL] PQping() returned "PQPING_NO_RESPONSE"
[2025-09-18 13:30:57] [WARNING] unable to reconnect to node "ubuntu05" (ID: 1) after 12 attempts
[2025-09-18 13:30:57] [INFO] 1 active sibling nodes registered
[2025-09-18 13:30:57] [INFO] 3 total nodes registered
[2025-09-18 13:30:57] [INFO] primary node  "ubuntu05" (ID: 1) and this node have the same location ("default")
[2025-09-18 13:30:57] [INFO] local node's last receive lsn: 0/220000A0
[2025-09-18 13:30:57] [INFO] checking state of sibling node "ubuntu07" (ID: 3)
[2025-09-18 13:30:57] [INFO] node "ubuntu07" (ID: 3) reports its upstream is node 1, last seen 56 second(s) ago
[2025-09-18 13:30:57] [INFO] standby node "ubuntu07" (ID: 3) last saw primary node 56 second(s) ago
[2025-09-18 13:30:57] [INFO] last receive LSN for sibling node "ubuntu07" (ID: 3) is: 0/220000A0
[2025-09-18 13:30:57] [INFO] node "ubuntu07" (ID: 3) has same LSN as current candidate "ubuntu06" (ID: 2)
[2025-09-18 13:30:57] [INFO] node "ubuntu07" (ID: 3) has lower priority (60) than current candidate "ubuntu06" (ID: 2) (80)
[2025-09-18 13:30:57] [INFO] visible nodes: 2; total nodes: 2; no nodes have seen the primary within the last 10 seconds
[2025-09-18 13:30:57] [NOTICE] promotion candidate is "ubuntu06" (ID: 2)
[2025-09-18 13:30:57] [NOTICE] this node is the winner, will now promote itself and inform other nodes
[2025-09-18 13:30:57] [INFO] promote_command is:"/usr/local/pgsql16/server/bin/repmgr standby promote -f /usr/local/pgsql16/repmgr/repmgr.conf --log-to-file"
[2025-09-18 13:30:57] [NOTICE] redirecting logging output to "/usr/local/pgsql16/repmgr/repmgr.log"
[2025-09-18 13:30:57] [ERROR] STANDBY PROMOTE can only be executed on a standby node
[2025-09-18 13:30:57] [ERROR] promote command failed
[2025-09-18 13:30:57] [DETAIL] promote command exited with error code 8
[2025-09-18 13:30:57] [INFO] checking if original primary node has reappeared
[2025-09-18 13:30:57] [ERROR] connection to database failed
[2025-09-18 13:30:57] [DETAIL]
connection to server at "192.168.152.111", port 9000 failed: Connection refused
Is the server running on that host and accepting TCP/IP connections?
[2025-09-18 13:30:57] [DETAIL] attempted to connect using:
user=repmgr connect_timeout=100 dbname=repmgr host=192.168.152.111 port=9000 fallback_application_name=repmgr options=-csearch_path=
[2025-09-18 13:30:57] [WARNING] unable to ping "host=192.168.152.111 user=repmgr dbname=repmgr port=9000 connect_timeout=100"
[2025-09-18 13:30:57] [DETAIL] PQping() returned "PQPING_NO_RESPONSE"
[2025-09-18 13:30:57] [NOTICE] local node is primary, checking local node state
[2025-09-18 13:30:57] [NOTICE] resuming monitoring as primary node after 0 seconds
[2025-09-18 13:30:57] [INFO] 1 followers to notify
[2025-09-18 13:30:57] [INFO] reconnecting to node "ubuntu07" (ID: 3)...
[2025-09-18 13:30:57] [NOTICE] notifying node "ubuntu07" (ID: 3) to follow node 2
INFO:  node 3 received notification to follow node 2
[2025-09-18 13:30:57] [NOTICE] monitoring cluster primary "ubuntu06" (ID: 2)
[2025-09-18 13:30:58] [NOTICE] STANDBY PROMOTE successful
[2025-09-18 13:30:58] [DETAIL] server "ubuntu06" (ID: 2) was successfully promoted to primary
[2025-09-18 13:30:58] [INFO] checking state of node 2, 1 of 12 attempts
[2025-09-18 13:30:58] [NOTICE] node 2 has recovered, reconnecting
[2025-09-18 13:30:58] [INFO] connection to node 2 succeeded
[2025-09-18 13:30:58] [INFO] original connection is still available
[2025-09-18 13:30:58] [INFO] 1 followers to notify
[2025-09-18 13:30:58] [NOTICE] notifying node "ubuntu07" (ID: 3) to follow node 2
INFO:  node 3 received notification to follow node 2
[2025-09-18 13:30:58] [INFO] switching to primary monitoring mode
[2025-09-18 13:30:58] [NOTICE] monitoring cluster primary "ubuntu06" (ID: 2)
[2025-09-18 13:30:58] [INFO] child node "ubuntu07" (ID: 3) is attached
[2025-09-18 13:31:02] [NOTICE] new standby "ubuntu07" (ID: 3) has connected
[2025-09-18 13:35:57] [INFO] monitoring primary node "ubuntu06" (ID: 2) in normal state
[2025-09-18 13:35:58] [INFO] monitoring primary node "ubuntu06" (ID: 2) in normal state
[2025-09-18 13:40:58] [INFO] monitoring primary node "ubuntu06" (ID: 2) in normal state
[2025-09-18 13:40:58] [INFO] monitoring primary node "ubuntu06" (ID: 2) in normal state
[2025-09-18 13:45:58] [INFO] monitoring primary node "ubuntu06" (ID: 2) in normal state
[2025-09-18 13:45:59] [INFO] monitoring primary node "ubuntu06" (ID: 2) in normal state
[2025-09-18 13:50:58] [INFO] monitoring primary node "ubuntu06" (ID: 2) in normal state
[2025-09-18 13:50:59] [INFO] monitoring primary node "ubuntu06" (ID: 2) in normal state
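
The election lines in the log (same LSN, then "lower priority (60) than current candidate") suggest how the promotion candidate is chosen. The following is a simplified sketch of that ordering, not repmgr's actual code. The log confirms only the LSN and priority comparisons; the final tie-break on the lowest node_id is an assumption based on my reading of the repmgr documentation, and the class and function names are hypothetical.

```python
# Sketch: rank standbys the way the log above suggests.
# Confirmed by the log: compare last-received LSN, then priority.
# Assumed: remaining ties go to the lowest node_id.
from dataclasses import dataclass


def lsn_to_int(lsn):
    """Convert a PostgreSQL LSN such as '0/220000A0' to an integer."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


@dataclass
class Standby:
    node_id: int
    name: str
    priority: int
    last_receive_lsn: str


def pick_candidate(standbys):
    """Most WAL received wins, then higher priority, then lower node_id."""
    return max(
        standbys,
        key=lambda s: (lsn_to_int(s.last_receive_lsn), s.priority, -s.node_id),
    )


nodes = [
    Standby(2, "ubuntu06", 80, "0/220000A0"),
    Standby(3, "ubuntu07", 60, "0/220000A0"),
]
winner = pick_candidate(nodes)
print(f"promotion candidate: {winner.name} (ID: {winner.node_id})")
```

With the values from this test both standbys have received the same LSN, so the higher priority of ubuntu06 decides the outcome, as in the log. In this ordering a standby that had received more WAL would win even with a lower priority.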

 

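Summary: pros and cons of repmgr

As a high-availability solution, repmgr is just about usable.
Its advantage is that installation and configuration are fairly simple.
Its drawback is that it cannot handle consecutive automatic failovers: after the first failover completes, the failed node has to be rewound manually with pg_rewind before it can be brought back.
repmgr stores its metadata in the local PostgreSQL database, so before the database starts the repmgr process does not know the cluster state and therefore cannot rewind automatically. This is the weakness of keeping cluster metadata in PostgreSQL itself, and it is where repmgr falls short of Patroni.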

