Linux Software Installation: PostgreSQL High-Availability Cluster (PostgreSQL + repmgr primary/standby replication + keepalived failover)

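Contents

I. Node overview
II. Software downloads
III. Installing PostgreSQL
IV. Installing repmgr
  - 1. Prerequisites
  - 2. Build and install (both nodes)
  - 3. Configuring repmgr
  - 4. Adding the primary and standby to the repmgr cluster
V. Installing keepalived (as root)
  - 1. Build and install (both nodes)
  - 2. Configuring keepalived
  - 3. Starting keepalived
VI. Testing and verification
  - 1. Initial state check
  - 2. Replication test
  - 3. Manual switchover triggering VIP drift
  - 4. VIP drift and recovery after the primary fails
  - 5. Summary
VII. Possible problems
  - 1. Failover fails after the primary goes down
  - 2. Manual switchover fails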

I. Node overview

1. Software per node

IP	Hostname	Software deployed
192.168.10.102	node02	postgresql, keepalived, repmgr
192.168.10.103	node03	postgresql, keepalived, repmgr
192.168.10.110	vip

II. Software downloads

PostgreSQL: PostgreSQL File Browser

Keepalived: Keepalived for Linux

repmgr: repmgr - Replication Manager for PostgreSQL clusters

This article uses:

postgresql-16.10.tar.gz

keepalived-2.2.4.tar.gz

repmgr-5.5.0.tar.gz

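III. Installing PostgreSQL

See my earlier article: Linux Software Installation: PostgreSQL Cluster Installation (primary/standby replication cluster).

That article builds a primary/standby cluster with PostgreSQL's native streaming replication. This article adds high availability on top of that streaming replication setup.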

IV. Installing repmgr

1. Prerequisites

(1) Install dependencies (both nodes)

# Install the build dependencies
yum install -y libcurl-devel json-c-devel openssl-devel \
    postgresql16-devel libevent-devel ncurses-devel libedit-devel \
    libselinux-devel libxslt-devel perl-devel systemd-devel libxml2-devel \
    krb5-devel flex bison popt-devel

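2. Build and install (both nodes)

# Switch to the postgres user
su - postgres
# Unpack
tar -zxvf repmgr-5.5.0.tar.gz
cd /opt/module/pgsql16/repmgr-5.5.0
# Build and install; PG_CONFIG points at the pg_config executable in the PostgreSQL bin directory
./configure PG_CONFIG=/opt/module/pgsql16/pgsql/bin/pg_config
make && make install
# Check the version number to confirm the installation succeeded
repmgr --version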

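3. Configuring repmgr

(1) Configure postgresql.conf and pg_hba.conf (primary only; the standby is re-cloned from the primary with repmgr)

# Edit postgresql.conf
vim $PGDATA/postgresql.conf
# Change this setting
shared_preload_libraries = 'repmgr'

# Add the allowed addresses
vi $PGDATA/pg_hba.conf
# Allow the repmgr user to make streaming replication connections, authenticated with scram-sha-256.
host    replication    repmgr    192.168.10.102/32    scram-sha-256
host    replication    repmgr    192.168.10.103/32    scram-sha-256
# Allow the repmgr user to connect to the repmgr database for metadata management, authenticated with scram-sha-256.
host    repmgr         repmgr    192.168.10.102/32    scram-sha-256
host    repmgr         repmgr    192.168.10.103/32    scram-sha-256

# Restart the primary
pg_ctl -D $PGDATA -l $PGHOME/logfile restart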
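The pg_hba.conf entries above repeat the same two lines for every node, which is easy to get wrong once a third node is added. A minimal sketch of a generator, assuming the same user (repmgr), database (repmgr), and scram-sha-256 method as in this article; the function name gen_hba_lines is hypothetical and not part of PostgreSQL or repmgr:

```shell
#!/bin/bash
# gen_hba_lines (hypothetical helper): print the pg_hba.conf entries for each node IP.
# Assumes user "repmgr", database "repmgr", and scram-sha-256, as used in this article.
gen_hba_lines() {
    local ip
    for ip in "$@"; do
        printf 'host    replication    repmgr    %s/32    scram-sha-256\n' "$ip"
        printf 'host    repmgr         repmgr    %s/32    scram-sha-256\n' "$ip"
    done
}

# Print the entries for the two nodes; review them before appending to pg_hba.conf.
gen_hba_lines 192.168.10.102 192.168.10.103
```

The output only goes to the terminal, so nothing is changed until the lines are pasted into pg_hba.conf and the primary is restarted or reloaded.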

(2) Configure passwordless login for repmgr (both nodes)

# Create this file
vim /home/postgres/.pgpass
# hostname:port:database:username:password
192.168.10.102:5432:replication:repmgr:repmgr
192.168.10.103:5432:replication:repmgr:repmgr
192.168.10.102:5432:repmgr:repmgr:repmgr
192.168.10.103:5432:repmgr:repmgr:repmgr
192.168.10.102:5432:*:postgres:postgres
192.168.10.103:5432:*:postgres:postgres
# The file mode must be 600
chmod 600 /home/postgres/.pgpass
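libpq ignores a .pgpass file whose permissions are looser than 0600, and the symptom is an unexpected password prompt rather than a clear error. A small sketch of a check that could be run on each node; the function name pgpass_mode_ok is hypothetical, and the demonstration uses a throwaway file so the real .pgpass is not touched:

```shell
#!/bin/bash
# pgpass_mode_ok (hypothetical helper): succeed only if the file is a regular file
# whose permission string is exactly rw------- (mode 600).
pgpass_mode_ok() {
    local file=$1
    [ -f "$file" ] && [ "$(ls -l "$file" | cut -c1-10)" = "-rw-------" ]
}

# Demonstration on a temporary file.
demo=$(mktemp)
chmod 644 "$demo"
pgpass_mode_ok "$demo" || echo "mode too open, password file would be ignored"
chmod 600 "$demo"
pgpass_mode_ok "$demo" && echo "mode ok"
rm -f "$demo"
```

On a real node the call would be pgpass_mode_ok /home/postgres/.pgpass, run as the postgres user.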

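(3) Create the replication user and database (on the primary)

# Open the psql client
psql
-- Create the repmgr user; it needs superuser privileges
CREATE USER repmgr WITH SUPERUSER LOGIN ENCRYPTED PASSWORD 'repmgr';
-- Create the repmgr database, owned by the repmgr user
CREATE DATABASE repmgr WITH OWNER repmgr;
GRANT ALL PRIVILEGES ON DATABASE repmgr TO repmgr;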

(4) repmgr.conf

Primary configuration

# This file does not exist by default and has to be created; any location outside the PostgreSQL data directory is fine
mkdir -p /opt/module/pgsql16/repmgr/
cd /opt/module/pgsql16/repmgr/
vim repmgr.conf

node_id=2                                      # unique node ID
node_name='node02'                             # node name
conninfo='host=192.168.10.102 port=5432 user=repmgr dbname=repmgr connect_timeout=2'    # connection info
data_directory='/opt/module/pgsql16/pgdata'    # data directory
pg_bindir='/opt/module/pgsql16/pgsql/bin'
repmgr_bindir='/opt/module/pgsql16/pgsql/bin'
use_replication_slots=yes                      # use replication slots so WAL is not removed too early

# repmgrd
repmgrd_service_start_command='repmgrd -f /opt/module/pgsql16/repmgr/repmgr.conf -p /opt/module/pgsql16/repmgr/repmgrd.pid -d'    # start
repmgrd_service_stop_command='kill `cat /opt/module/pgsql16/repmgr/repmgrd.pid`'    # stop
promote_command='repmgr standby promote -f /opt/module/pgsql16/repmgr/repmgr.conf --log-to-file'    # promote command
follow_command='repmgr standby follow -f /opt/module/pgsql16/repmgr/repmgr.conf --log-to-file -W --upstream-node-id=%n'    # follow command

# Replication settings
failover=automatic                 # failover mode
monitor_interval_secs=5            # monitoring interval
connection_check_type=ping         # connection check type
reconnect_attempts=3               # number of reconnection attempts
reconnect_interval=5               # interval between reconnection attempts (seconds)

# Logging
log_level=INFO                                     # log level
log_file='/opt/module/pgsql16/repmgr/repmgr.log'   # log file
log_status_interval=300                            # status log interval

Standby configuration

# This file does not exist by default and has to be created; any location outside the PostgreSQL data directory is fine
mkdir -p /opt/module/pgsql16/repmgr/
cd /opt/module/pgsql16/repmgr/
vim repmgr.conf

node_id=3                                      # unique node ID
node_name='node03'                             # node name
conninfo='host=192.168.10.103 port=5432 user=repmgr dbname=repmgr connect_timeout=2'    # connection info
data_directory='/opt/module/pgsql16/pgdata'    # data directory
pg_bindir='/opt/module/pgsql16/pgsql/bin'
repmgr_bindir='/opt/module/pgsql16/pgsql/bin'
use_replication_slots=yes                      # use replication slots so WAL is not removed too early

# repmgrd
repmgrd_service_start_command='repmgrd -f /opt/module/pgsql16/repmgr/repmgr.conf -p /opt/module/pgsql16/repmgr/repmgrd.pid -d'    # start
repmgrd_service_stop_command='kill `cat /opt/module/pgsql16/repmgr/repmgrd.pid`'    # stop
promote_command='repmgr standby promote -f /opt/module/pgsql16/repmgr/repmgr.conf --log-to-file'    # promote command
follow_command='repmgr standby follow -f /opt/module/pgsql16/repmgr/repmgr.conf --log-to-file --upstream-node-id=%n'    # follow command

# Replication settings
failover=automatic                 # failover mode
monitor_interval_secs=5            # monitoring interval
connection_check_type=ping         # connection check type
reconnect_attempts=3               # number of reconnection attempts
reconnect_interval=5               # interval between reconnection attempts (seconds)

# Logging
log_level=INFO                                     # log level
log_file='/opt/module/pgsql16/repmgr/repmgr.log'   # log file
log_status_interval=300                            # status log interval
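The two repmgr.conf files are nearly identical: node_id, node_name, and the host in conninfo are the main per-node values. One way to keep them from drifting apart is to render the node-specific part from a single template. The sketch below assumes the paths used in this article; the function name render_repmgr_conf is hypothetical, and it prints only the node-specific lines, not the full file:

```shell
#!/bin/bash
# render_repmgr_conf (hypothetical helper): print the node-specific part of repmgr.conf.
# Arguments: node ID, node name, node IP. Paths are the ones used in this article.
render_repmgr_conf() {
    local id=$1 name=$2 ip=$3
    cat <<EOF
node_id=${id}
node_name='${name}'
conninfo='host=${ip} port=5432 user=repmgr dbname=repmgr connect_timeout=2'
data_directory='/opt/module/pgsql16/pgdata'
pg_bindir='/opt/module/pgsql16/pgsql/bin'
EOF
}

# Render both nodes to stdout; redirect to a file and append the shared settings as needed.
render_repmgr_conf 2 node02 192.168.10.102
render_repmgr_conf 3 node03 192.168.10.103
```

The shared repmgrd, failover, and logging settings can then live in one fragment that is appended unchanged on every node.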

4. Adding the primary and standby to the repmgr cluster

(1) Register the primary with repmgr (run on the primary)

# Register the primary
repmgr -f /opt/module/pgsql16/repmgr/repmgr.conf primary register -F
# Start repmgrd
repmgr -f /opt/module/pgsql16/repmgr/repmgr.conf daemon start
# Show the repmgr cluster information
repmgr -f /opt/module/pgsql16/repmgr/repmgr.conf cluster show
repmgr -f /opt/module/pgsql16/repmgr/repmgr.conf service status

(2) Clone the primary's data to the standby (run on the standby)

# Stop the standby
pg_ctl -D $PGDATA -l $PGHOME/logfile stop
# Empty the standby's data directory first
rm -rf /opt/module/pgsql16/pgdata/*
# Check whether the standby meets the conditions for cloning
repmgr -h 192.168.10.102 -U repmgr -d repmgr -f /opt/module/pgsql16/repmgr/repmgr.conf standby clone --dry-run
# Clone the standby
repmgr -h 192.168.10.102 -U repmgr -d repmgr -f /opt/module/pgsql16/repmgr/repmgr.conf standby clone
# postgresql.auto.conf now shows the repmgr account's connection details
# primary_conninfo = 'host=192.168.10.102 user=repmgr application_name=''pg-node2'' password=repmgr port=5432'
# Start the standby
pg_ctl -D $PGDATA -l $PGHOME/logfile start

Check the replication status on the primary and on the standby

psql -U repmgr
repmgr=# \x
-- On the primary
repmgr=# select * from pg_stat_replication;
-- On the standby
repmgr=# SELECT * FROM pg_stat_wal_receiver;
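Right after the standby starts, the WAL receiver may need a few seconds to connect, so a single query can come back empty even though nothing is wrong. A sketch of a small retry helper that polls until replication is streaming; wait_until and standby_streaming are hypothetical names, and the retry count and delay in the usage comment are arbitrary:

```shell
#!/bin/bash
# wait_until (hypothetical helper): retry a command until it succeeds.
# Usage: wait_until <max_tries> <sleep_seconds> <command> [args...]
wait_until() {
    local tries=$1 delay=$2 i=1
    shift 2
    while [ "$i" -le "$tries" ]; do
        if "$@"; then
            return 0
        fi
        # Sleep between attempts, but not after the last one.
        if [ "$i" -lt "$tries" ]; then
            sleep "$delay"
        fi
        i=$((i + 1))
    done
    return 1
}

# standby_streaming: succeeds once the local WAL receiver reports status "streaming".
standby_streaming() {
    [ "$(psql -U repmgr -d repmgr -tAc 'SELECT status FROM pg_stat_wal_receiver;' 2>/dev/null)" = "streaming" ]
}

# On the standby, for example:
#   wait_until 10 3 standby_streaming && echo "replication is streaming"
```

If the helper gives up, the PostgreSQL log file on the standby is the next place to look.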

Once replication is working, the standby can be registered with repmgr.

(3) Register the standby with repmgr (run on the standby)

# Register the standby
repmgr -f /opt/module/pgsql16/repmgr/repmgr.conf standby register -F
# Start repmgrd
repmgr -f /opt/module/pgsql16/repmgr/repmgr.conf daemon start
# Show the repmgr cluster information
repmgr -f /opt/module/pgsql16/repmgr/repmgr.conf cluster show
repmgr -f /opt/module/pgsql16/repmgr/repmgr.conf service status

V. Installing keepalived (as root)

1. Build and install (both nodes)

# Unpack
tar -zxvf keepalived-2.2.4.tar.gz
cd keepalived-2.2.4
# Build and install
mkdir -p /opt/module/keepalived/
./configure --prefix=/opt/module/keepalived/
make && make install
# Set the environment variables
vim /etc/profile.d/my_env.sh
# KEEPALIVED_HOME
export KEEPALIVED_HOME=/opt/module/keepalived
export PATH=$PATH:$KEEPALIVED_HOME/bin:$KEEPALIVED_HOME/sbin
source /etc/profile.d/my_env.sh
# Check the version number to confirm the installation succeeded
keepalived --version
# Create a symlink so the system can find the binary
ln -s /opt/module/keepalived/sbin/keepalived /usr/sbin/

2. Configuring keepalived

# Back up the original file
cd /opt/module/keepalived/etc/keepalived
cp keepalived.conf keepalived.conf.bak
vim keepalived.conf

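(1) Primary node configuration

! Configuration File for keepalived
global_defs {
    router_id node02
}

vrrp_script check_pg_alived {
    script "/opt/module/keepalived/etc/keepalived/check_postgres.sh"
    interval 5      # run the check every 5 seconds
    weight 20       # add 20 to the priority while the check succeeds
    fall 2          # 2 consecutive failures mark the check as KO
    rise 1          # 1 success marks it as recovered
    timeout 5       # script execution timeout
}

vrrp_instance VI_PG {
    state BACKUP
    # nopreempt                 # non-preemptive mode (left commented out)
    interface ens33             # network interface (adjust to your system)
    virtual_router_id 110       # virtual router ID, must be identical on all nodes
    priority 100                # node priority
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass 1111          # cluster password, must be identical on all nodes
    }
    track_script {
        check_pg_alived
    }
    virtual_ipaddress {
        # VIP address, interface, and label
        192.168.10.110/24 dev ens33 label ens33:pgvip
    }
}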

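(2) Standby node configuration

! Configuration File for keepalived
global_defs {
    router_id node03
}

vrrp_script check_pg_alived {
    script "/opt/module/keepalived/etc/keepalived/check_postgres.sh"
    interval 5      # run the check every 5 seconds
    weight 20       # add 20 to the priority while the check succeeds
    fall 2          # 2 consecutive failures mark the check as KO
    rise 1          # 1 success marks it as recovered
    timeout 5       # script execution timeout
}

vrrp_instance VI_PG {
    state BACKUP
    # nopreempt                 # non-preemptive mode (left commented out)
    interface ens33             # network interface name
    virtual_router_id 110       # virtual router ID, must be identical on all nodes
    priority 90                 # node priority; works together with preemption and the script weight
    advert_int 1                # VRRP advertisement interval: 1 second
    authentication {
        auth_type PASS
        auth_pass 1111          # cluster password, must be identical on all nodes
    }
    track_script {
        check_pg_alived
    }
    virtual_ipaddress {
        # VIP address, interface, and label
        192.168.10.110/24 dev ens33 label ens33:pgvip
    }
}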
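The VIP follows the PostgreSQL primary because of how the base priorities (100 and 90) combine with the script weight (20). With a positive weight, keepalived adds the weight to a node's priority while its tracked script succeeds, and check_postgres.sh succeeds only on the primary. The arithmetic can be sketched as follows; effective_priority is a hypothetical helper that only models this rule and does not talk to keepalived:

```shell
#!/bin/bash
# effective_priority (hypothetical helper): model keepalived's priority for a positive weight.
# With weight > 0, the weight is added while the tracked script succeeds.
# Arguments: base priority, weight, script result ("ok" or "fail").
effective_priority() {
    local base=$1 weight=$2 result=$3
    if [ "$result" = "ok" ]; then
        echo $((base + weight))
    else
        echo "$base"
    fi
}

# node02 is the PostgreSQL primary: its check script exits 0, node03's exits 1.
echo "node02=$(effective_priority 100 20 ok) node03=$(effective_priority 90 20 fail)"
# prints: node02=120 node03=90

# After a switchover node03 is the primary.
echo "node02=$(effective_priority 100 20 fail) node03=$(effective_priority 90 20 ok)"
# prints: node02=100 node03=110
```

Because nopreempt is commented out in both configurations, the node with the higher effective priority takes over the VIP, which is why the weight has to be larger than the gap between the two base priorities.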

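(3) check_postgres.sh (identical on both nodes)

# Create the script in /opt/module/keepalived/etc/keepalived on both servers
vim /opt/module/keepalived/etc/keepalived/check_postgres.sh

#!/bin/bash
# 1. Check that a PostgreSQL server process is running.
#    pgrep -x matches the process name exactly, so this script (whose file name
#    also contains "postgres") is not counted by mistake.
if ! pgrep -x postgres > /dev/null; then
    exit 1    # service stopped
fi
# 2. Check that the database accepts connections
if ! su - postgres -c "psql -tAc 'SELECT 1;'" > /dev/null 2>&1; then
    exit 1    # cannot connect
fi
# 3. Check whether this node is the primary
ROLE=$(su - postgres -c "psql -tAc 'SELECT pg_is_in_recovery();'" 2>/dev/null)
if [ "$ROLE" = "f" ]; then
    exit 0    # primary
else
    exit 1    # standby
fi

# Make the script executable
chmod +x /opt/module/keepalived/etc/keepalived/check_postgres.sh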

3. Starting keepalived

# Start
keepalived -f /opt/module/keepalived/etc/keepalived/keepalived.conf -D
# Check the processes
ps aux | grep keepalived
# Stop
pkill -TERM keepalived

VI. Testing and verification

1. Initial state check

# 1. Run on both node02 and node03 to find the VIP (it should be on node02)
ip addr show ens33 | grep 192.168.10.110
# Or open an ssh session to the VIP; it should land on node02
ssh 192.168.10.110
# 2. Check the repmgr cluster status
repmgr -f /opt/module/pgsql16/repmgr/repmgr.conf cluster show
repmgr -f /opt/module/pgsql16/repmgr/repmgr.conf service status
# 3. Check the streaming replication status (run on the primary)
psql -c "SELECT client_addr, state, sync_state FROM pg_stat_replication;"
# 4. Test a connection through the VIP
psql -h 192.168.10.110 -U postgres -c "SELECT inet_server_addr(), pg_is_in_recovery();"
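The VIP query in step 4 returns the address of the server that answered and whether it is in recovery, so it shows directly which node currently holds the VIP and whether that node is writable. A sketch that turns the unaligned psql output (address and flag separated by a vertical bar, as printed with -tA) into a readable line; describe_vip_target is a hypothetical name:

```shell
#!/bin/bash
# describe_vip_target (hypothetical helper): turn "address|flag" into a readable line.
# The input is what psql -tA prints for: SELECT inet_server_addr(), pg_is_in_recovery();
describe_vip_target() {
    local row=$1 addr flag
    addr=${row%%|*}
    flag=${row##*|}
    case "$flag" in
        f) echo "$addr is the primary" ;;
        t) echo "$addr is a standby" ;;
        *) echo "unexpected answer: $row"; return 1 ;;
    esac
}

# Sample input as literal text; on a live cluster pass the psql output instead:
#   describe_vip_target "$(psql -h 192.168.10.110 -U postgres -tAc 'SELECT inet_server_addr(), pg_is_in_recovery();')"
describe_vip_target "192.168.10.102|f"
```

If the VIP ever answers as a standby, keepalived and repmgr disagree about which node is the primary, and writes through the VIP will fail.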

2. Replication test

# The current primary is node02; run these against node02
psql -h 192.168.10.102 -U postgres -c "CREATE TABLE test_rep2 (id int, name text);"
psql -h 192.168.10.102 -U postgres -c "INSERT INTO test_rep2 VALUES (1, 'Hello from Master');"
# Write through the VIP
psql -h 192.168.10.110 -U postgres -c "INSERT INTO test_rep2 VALUES (2, 'Hello from VIP');"
# Query on the standby
psql -h 192.168.10.103 -U postgres -c "SELECT * FROM test_rep2;"
# The standby is read-only, so inserts, updates, and deletes fail with an error
psql -h 192.168.10.103 -U postgres -c "INSERT INTO test_rep2 VALUES (2, 'Hello from Slave');"

3. Manual switchover triggering VIP drift

# The current primary is node02; run these on the standby, node03
# Check the current status
repmgr -f /opt/module/pgsql16/repmgr/repmgr.conf cluster show
repmgr -f /opt/module/pgsql16/repmgr/repmgr.conf service status
# Run on the standby to promote it to primary
repmgr -f /opt/module/pgsql16/repmgr/repmgr.conf standby switchover --siblings-follow
# Check the current status
repmgr -f /opt/module/pgsql16/repmgr/repmgr.conf cluster show
repmgr -f /opt/module/pgsql16/repmgr/repmgr.conf service status
# Check whether the VIP has moved to node03
ip addr show ens33 | grep 192.168.10.110
# Or open an ssh session to the VIP; it should land on node03
ssh 192.168.10.110

# Verify replication
# Write on the primary
psql -h 192.168.10.103 -U postgres -c "INSERT INTO test_rep2 VALUES (3, 'Repmgr from Master');"
# Write through the VIP
psql -h 192.168.10.110 -U postgres -c "INSERT INTO test_rep2 VALUES (4, 'Repmgr from VIP');"
# Query on the standby
psql -h 192.168.10.102 -U postgres -c "SELECT * FROM test_rep2;"
# The standby is read-only, so inserts, updates, and deletes fail with an error
psql -h 192.168.10.102 -U postgres -c "INSERT INTO test_rep2 VALUES (5, 'Hello from Slave');"

4. VIP drift and recovery after the primary fails

# The current primary is node03
# Stop the primary
pg_ctl -D $PGDATA -l $PGHOME/logfile stop
# Check the repmgr status on node02; node02 has become the primary
repmgr -f /opt/module/pgsql16/repmgr/repmgr.conf cluster show
repmgr -f /opt/module/pgsql16/repmgr/repmgr.conf service status
# Check whether the VIP has moved to node02
ip addr show ens33 | grep 192.168.10.110
# Or open an ssh session to the VIP; it should land on node02
ssh 192.168.10.110
# Run on node03: this restarts PostgreSQL there and makes it a standby of node02
repmgr -h 192.168.10.102 -U repmgr -p 5432 -d repmgr -f /opt/module/pgsql16/repmgr/repmgr.conf node rejoin --force-rewind
# Check the current status
repmgr -f /opt/module/pgsql16/repmgr/repmgr.conf cluster show
repmgr -f /opt/module/pgsql16/repmgr/repmgr.conf service status

# Verify replication
# Write on the primary
psql -h 192.168.10.102 -U postgres -c "INSERT INTO test_rep2 VALUES (5, 'Repmgr02 from Master');"
# Write through the VIP
psql -h 192.168.10.110 -U postgres -c "INSERT INTO test_rep2 VALUES (6, 'Repmgr02 from VIP');"
# Query on the standby
psql -h 192.168.10.103 -U postgres -c "SELECT * FROM test_rep2;"
# The standby is read-only, so inserts, updates, and deletes fail with an error
psql -h 192.168.10.103 -U postgres -c "INSERT INTO test_rep2 VALUES (6, 'Hello from Slave');"
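The failover in this test is not instantaneous. As a rough estimate based only on the settings used in this article, repmgrd retries the lost primary reconnect_attempts times at reconnect_interval seconds before promoting the standby, and keepalived needs fall consecutive failed checks at interval seconds before it drops the old primary's priority bonus. The sketch below just does that multiplication; it ignores connection timeouts, monitor_interval_secs, and the promotion itself, so real timings will differ. detection_window is a hypothetical name:

```shell
#!/bin/bash
# detection_window (hypothetical helper): attempts (or failures) multiplied by the interval in seconds.
detection_window() {
    echo $(( $1 * $2 ))
}

# repmgrd: reconnect_attempts=3, reconnect_interval=5
echo "repmgrd gives up on the old primary after about $(detection_window 3 5)s"
# keepalived: fall 2, interval 5
echo "keepalived marks the check as failed after about $(detection_window 2 5)s"
```

With these values the cluster status and the VIP can briefly disagree during a failover, so it is worth waiting a little before concluding that the VIP did not move.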

5. Summary

# Manual switchover: run on the standby
repmgr -f /opt/module/pgsql16/repmgr/repmgr.conf standby switchover --siblings-follow
# After a failover caused by the primary going down: run on the failed node to restart PostgreSQL and rejoin as a standby; -h points to the new primary
repmgr -h 192.168.10.103 -U repmgr -p 5432 -d repmgr -f /opt/module/pgsql16/repmgr/repmgr.conf node rejoin --force-rewind
# Check the repmgr status
repmgr -f /opt/module/pgsql16/repmgr/repmgr.conf cluster show
repmgr -f /opt/module/pgsql16/repmgr/repmgr.conf service status
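The easiest mistake in the rejoin step is pointing -h at the wrong host: it has to be the new primary, not the node being rejoined. A sketch of a helper that only prints the command for a given new primary, so it can be reviewed before it is run on the failed node; rejoin_command is a hypothetical name, and the user, port, database, and configuration path are the ones used in this article:

```shell
#!/bin/bash
# rejoin_command (hypothetical helper): print the node rejoin command for a given new primary.
# It only prints the command; nothing is executed.
rejoin_command() {
    local new_primary=$1
    local conf=/opt/module/pgsql16/repmgr/repmgr.conf
    echo "repmgr -h $new_primary -U repmgr -p 5432 -d repmgr -f $conf node rejoin --force-rewind"
}

# Example: node03 became the primary, so the failed node02 rejoins against 192.168.10.103.
rejoin_command 192.168.10.103
```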

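VII. Possible problems

1. Failover fails after the primary goes down

In the repmgr service status output, a Paused? value of yes means the repmgr cluster has been logically paused. If the primary goes down in this state, repmgr does not act on the PostgreSQL failure, so the failover never happens. In production, make sure Paused? is no.

# Restart the primary
pg_ctl -D $PGDATA -l $PGHOME/logfile start
# Remove the pause immediately
repmgr -f /opt/module/pgsql16/repmgr/repmgr.conf service unpause
# Pause manually
# repmgr -f /opt/module/pgsql16/repmgr/repmgr.conf service pause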
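Since a paused repmgrd silently disables automatic failover, it can be worth checking the Paused? column from a script. The sketch below assumes the status output is a table with columns separated by vertical bars and a header cell reading Paused?; that layout, the sample rows, and the function name any_paused are assumptions for illustration. The column is located by its header text rather than by position:

```shell
#!/bin/bash
# any_paused (hypothetical helper): read a status table on stdin and succeed if any
# row has "yes" in the "Paused?" column. The column is found by its header text.
any_paused() {
    awk -F'|' '
        function trim(s) { gsub(/^[ \t]+|[ \t]+$/, "", s); return s }
        NR == 1 {
            for (i = 1; i <= NF; i++) if (trim($i) == "Paused?") col = i
            next
        }
        col && trim($col) == "yes" { found = 1 }
        END { exit(found ? 0 : 1) }
    '
}

# Sample table as literal text (layout assumed); on a live node pipe in the real output:
#   repmgr -f /opt/module/pgsql16/repmgr/repmgr.conf service status | any_paused
printf '%s\n' \
    ' ID | Name   | Role    | repmgrd | Paused? ' \
    '----+--------+---------+---------+---------' \
    ' 2  | node02 | primary | running | yes     ' \
    ' 3  | node03 | standby | running | no      ' \
    | any_paused && echo "repmgrd is paused on at least one node: run service unpause"
```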

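2. Manual switchover fails

This happens because the system cannot find the /opt/module/pgsql16/pgsql/lib directory.

# Set the system environment variables
sudo vim /etc/profile.d/my_env.sh
export PGHOME=/opt/module/pgsql16/pgsql
export PGDATA=/opt/module/pgsql16/pgdata
export PATH=$PATH:$PGHOME/bin
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$PGHOME/lib
# Reload the environment variables (source is a shell builtin, so it is run without sudo)
source /etc/profile.d/my_env.sh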
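To confirm that the fix took effect in the current shell, the library directory should appear as one entry of LD_LIBRARY_PATH. A small sketch of that check; path_contains is a hypothetical helper that works on any colon-separated list:

```shell
#!/bin/bash
# path_contains (hypothetical helper): succeed if a colon-separated list contains a directory.
path_contains() {
    local list=$1 dir=$2
    case ":$list:" in
        *":$dir:"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Check the current shell; an empty or unset LD_LIBRARY_PATH simply reports "missing".
if path_contains "${LD_LIBRARY_PATH:-}" /opt/module/pgsql16/pgsql/lib; then
    echo "PostgreSQL lib directory is on LD_LIBRARY_PATH"
else
    echo "PostgreSQL lib directory is missing from LD_LIBRARY_PATH"
fi
```

Wrapping the list in colons before matching avoids false positives such as /opt/module/pgsql16/pgsql/lib64 matching /opt/module/pgsql16/pgsql/lib.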