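Table of Contents
- I. Node Overview
- II. Software Downloads
- III. Installing PostgreSQL
- IV. Installing repmgr
  - 1. Prerequisites
  - 2. Build and install (both nodes)
  - 3. Configure repmgr
  - 4. Register the primary and standby with the repmgr cluster
- V. Installing keepalived (as root)
  - 1. Build and install (both nodes)
  - 2. Configure keepalived
  - 3. Start
- VI. Testing and Verification
  - 1. Verify the initial state
  - 2. Replication test
  - 3. Manual switchover moves the VIP
  - 4. VIP failover and recovery after the primary goes down
  - 5. Summary
- VII. Possible Problems
  - 1. Failover does not happen after the primary goes down
  - 2. Manual switchover fails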
I. Node Overview
1. Software per node

| IP | Hostname | Software deployed |
|---|---|---|
| 192.168.10.102 | node02 | PostgreSQL, keepalived, repmgr |
| 192.168.10.103 | node03 | PostgreSQL, keepalived, repmgr |
| 192.168.10.110 | vip | Virtual IP managed by keepalived |
II. Software Downloads
- PostgreSQL: PostgreSQL: File Browser
- Keepalived: Keepalived for Linux
- repmgr: repmgr - Replication Manager for PostgreSQL clusters

Versions used in this article:
- postgresql-16.10.tar.gz
- keepalived-2.2.4.tar.gz
- repmgr-5.5.0.tar.gz
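III. Installing PostgreSQL
See my earlier article: Linux Software Installation: PostgreSQL Cluster Installation (Primary/Standby Replication Cluster).
That article builds a primary/standby cluster with PostgreSQL's native streaming replication; this article adds high availability on top of that streaming-replication setup.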
IV. Installing repmgr
1. Prerequisites
(1) Install dependencies (both nodes)
```bash
# Install dependency packages
yum install -y libcurl-devel json-c-devel openssl-devel \
    postgresql16-devel libevent-devel ncurses-devel libedit-devel \
    libselinux-devel libxslt-devel perl-devel systemd-devel libxml2-devel \
    krb5-devel flex bison popt-devel
```

2. Build and install (both nodes)
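```bash
# Work as the postgres user
su - postgres
# Extract the source (the tarball is assumed to be in /opt/module/pgsql16)
tar -zxvf repmgr-5.5.0.tar.gz
cd /opt/module/pgsql16/repmgr-5.5.0
# Build and install; PG_CONFIG is the pg_config executable in the PostgreSQL bin directory
./configure PG_CONFIG=/opt/module/pgsql16/pgsql/bin/pg_config
make && make install
# Check the version to confirm the installation
repmgr --version
```

3. Configure repmgr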
(1) Configure postgresql.conf and pg_hba.conf (on the primary only; the standby will be re-cloned with repmgr)
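```bash
# Edit postgresql.conf and set (append to any existing value):
#   shared_preload_libraries = 'repmgr'
vim $PGDATA/postgresql.conf
# Add the node addresses to pg_hba.conf
vi $PGDATA/pg_hba.conf
```

```
# Allow the repmgr user to make replication connections, authenticated with scram-sha-256
host    replication    repmgr    192.168.10.102/32    scram-sha-256
host    replication    repmgr    192.168.10.103/32    scram-sha-256
# Allow the repmgr user to connect to the repmgr database for metadata management, with scram-sha-256
host    repmgr         repmgr    192.168.10.102/32    scram-sha-256
host    repmgr         repmgr    192.168.10.103/32    scram-sha-256
```

```bash
# Restart the primary
pg_ctl -D $PGDATA -l $PGHOME/logfile restart
```

(2) Configure passwordless login for repmgr (both nodes)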
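The same entries are needed on both nodes, so instead of typing them by hand the file can also be generated. A minimal sketch, assuming the hosts, port 5432 and the passwords used throughout this article (repmgr/repmgr, postgres/postgres); `write_pgpass` is a hypothetical helper, not part of PostgreSQL or repmgr. libpq ignores a password file that allows group or world access, so the helper forces mode 600 before any password is written.

```bash
#!/bin/bash
# Hypothetical helper: write a .pgpass file for the repmgr and postgres users.
# Assumes the passwords used in this article (repmgr/repmgr, postgres/postgres).
write_pgpass() {
    local target="$1"; shift      # output file, e.g. /home/postgres/.pgpass
    local port=5432
    local host
    # Create/truncate the file with restrictive permissions before writing passwords
    ( umask 077 && : > "$target" ) || return 1
    chmod 600 "$target"
    {
        echo "# hostname:port:database:username:password"
        for host in "$@"; do
            echo "${host}:${port}:replication:repmgr:repmgr"
        done
        for host in "$@"; do
            echo "${host}:${port}:repmgr:repmgr:repmgr"
        done
        for host in "$@"; do
            echo "${host}:${port}:*:postgres:postgres"
        done
    } >> "$target"
}

# Usage (as the postgres user on each node):
# write_pgpass /home/postgres/.pgpass 192.168.10.102 192.168.10.103
```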
```bash
# Create the file
vim /home/postgres/.pgpass
```

```
# hostname:port:database:username:password
192.168.10.102:5432:replication:repmgr:repmgr
192.168.10.103:5432:replication:repmgr:repmgr
192.168.10.102:5432:repmgr:repmgr:repmgr
192.168.10.103:5432:repmgr:repmgr:repmgr
192.168.10.102:5432:*:postgres:postgres
192.168.10.103:5432:*:postgres:postgres
```

```bash
# The permissions must be 600
chmod 600 /home/postgres/.pgpass
```

(3) Create the replication user and database (on the primary)
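```sql
-- Open the psql client first: psql
-- Create the repmgr user (repmgr needs superuser privileges)
CREATE USER repmgr WITH SUPERUSER LOGIN ENCRYPTED PASSWORD 'repmgr';
-- Create the repmgr database, owned by the repmgr user
CREATE DATABASE repmgr WITH OWNER repmgr;
GRANT ALL PRIVILEGES ON DATABASE repmgr TO repmgr;
```

(4) repmgr.conf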
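The primary and standby files that follow differ only in `node_id`, `node_name` and the `host` in `conninfo` (plus an extra `-W` in the primary's `follow_command`). If you prefer to generate them, a sketch like this keeps the two in step. `gen_repmgr_conf` is a hypothetical helper that hard-codes this article's paths and emits the standby's `follow_command` (without `-W`); add the flag back if you want the primary's file exactly as shown.

```bash
#!/bin/bash
# Hypothetical helper: print a repmgr.conf for one node using this article's paths.
# Arguments: node_id node_name host
gen_repmgr_conf() {
    local node_id="$1" node_name="$2" host="$3"
    local base=/opt/module/pgsql16
    local conf="$base/repmgr/repmgr.conf"
    # Unquoted heredoc: ${...} is expanded, \` produces a literal backtick
    cat <<EOF
node_id=${node_id}
node_name='${node_name}'
conninfo='host=${host} port=5432 user=repmgr dbname=repmgr connect_timeout=2'
data_directory='${base}/pgdata'
pg_bindir='${base}/pgsql/bin'
repmgr_bindir='${base}/pgsql/bin'
use_replication_slots=yes

repmgrd_service_start_command='repmgrd -f ${conf} -p ${base}/repmgr/repmgrd.pid -d'
repmgrd_service_stop_command='kill \`cat ${base}/repmgr/repmgrd.pid\`'
promote_command='repmgr standby promote -f ${conf} --log-to-file'
follow_command='repmgr standby follow -f ${conf} --log-to-file --upstream-node-id=%n'

failover=automatic
monitor_interval_secs=5
connection_check_type=ping
reconnect_attempts=3
reconnect_interval=5

log_level=INFO
log_file='${base}/repmgr/repmgr.log'
log_status_interval=300
EOF
}

# Usage:
# gen_repmgr_conf 2 node02 192.168.10.102 > /opt/module/pgsql16/repmgr/repmgr.conf   # on node02
# gen_repmgr_conf 3 node03 192.168.10.103 > /opt/module/pgsql16/repmgr/repmgr.conf   # on node03
```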
Primary configuration
```bash
# This file does not exist by default; create it anywhere outside the PostgreSQL
# data directory (the standby's data directory is wiped and re-cloned later)
mkdir -p /opt/module/pgsql16/repmgr/
cd /opt/module/pgsql16/repmgr/
vim repmgr.conf
```

```ini
node_id=2                                       # unique node ID
node_name='node02'                              # node name
conninfo='host=192.168.10.102 port=5432 user=repmgr dbname=repmgr connect_timeout=2'   # connection info
data_directory='/opt/module/pgsql16/pgdata'     # data directory
pg_bindir='/opt/module/pgsql16/pgsql/bin'
repmgr_bindir='/opt/module/pgsql16/pgsql/bin'
use_replication_slots=yes                       # use replication slots so WAL is not removed too early

# repmgrd
repmgrd_service_start_command='repmgrd -f /opt/module/pgsql16/repmgr/repmgr.conf -p /opt/module/pgsql16/repmgr/repmgrd.pid -d'   # start
repmgrd_service_stop_command='kill `cat /opt/module/pgsql16/repmgr/repmgrd.pid`'   # stop
promote_command='repmgr standby promote -f /opt/module/pgsql16/repmgr/repmgr.conf --log-to-file'   # promote command
follow_command='repmgr standby follow -f /opt/module/pgsql16/repmgr/repmgr.conf --log-to-file -W --upstream-node-id=%n'   # follow command

# Failover settings
failover=automatic              # failover mode
monitor_interval_secs=5         # monitoring interval (seconds)
connection_check_type=ping      # connection check type
reconnect_attempts=3            # reconnection attempts
reconnect_interval=5            # reconnection interval (seconds)

# Logging
log_level=INFO                                  # log level
log_file='/opt/module/pgsql16/repmgr/repmgr.log'   # log file
log_status_interval=300                         # status log interval (seconds)
```

Standby configuration
```bash
# This file does not exist by default; create it anywhere outside the PostgreSQL
# data directory (the standby's data directory is wiped and re-cloned later)
mkdir -p /opt/module/pgsql16/repmgr/
cd /opt/module/pgsql16/repmgr/
vim repmgr.conf
```

```ini
node_id=3                                       # unique node ID
node_name='node03'                              # node name
conninfo='host=192.168.10.103 port=5432 user=repmgr dbname=repmgr connect_timeout=2'   # connection info
data_directory='/opt/module/pgsql16/pgdata'     # data directory
pg_bindir='/opt/module/pgsql16/pgsql/bin'
repmgr_bindir='/opt/module/pgsql16/pgsql/bin'
use_replication_slots=yes                       # use replication slots so WAL is not removed too early

# repmgrd
repmgrd_service_start_command='repmgrd -f /opt/module/pgsql16/repmgr/repmgr.conf -p /opt/module/pgsql16/repmgr/repmgrd.pid -d'   # start
repmgrd_service_stop_command='kill `cat /opt/module/pgsql16/repmgr/repmgrd.pid`'   # stop
promote_command='repmgr standby promote -f /opt/module/pgsql16/repmgr/repmgr.conf --log-to-file'   # promote command
follow_command='repmgr standby follow -f /opt/module/pgsql16/repmgr/repmgr.conf --log-to-file --upstream-node-id=%n'   # follow command

# Failover settings
failover=automatic              # failover mode
monitor_interval_secs=5         # monitoring interval (seconds)
connection_check_type=ping      # connection check type
reconnect_attempts=3            # reconnection attempts
reconnect_interval=5            # reconnection interval (seconds)

# Logging
log_level=INFO                                  # log level
log_file='/opt/module/pgsql16/repmgr/repmgr.log'   # log file
log_status_interval=300                         # status log interval (seconds)
```

4. Register the primary and standby with the repmgr cluster
(1) Register the primary with repmgr (on the primary)
```bash
# Register the primary
repmgr -f /opt/module/pgsql16/repmgr/repmgr.conf primary register -F
# Start repmgrd
repmgr -f /opt/module/pgsql16/repmgr/repmgr.conf daemon start
# Show repmgr status
repmgr -f /opt/module/pgsql16/repmgr/repmgr.conf cluster show
repmgr -f /opt/module/pgsql16/repmgr/repmgr.conf service status
```

(2) Clone the primary's data to the standby (on the standby)
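This step deletes the standby's data directory, so a guard that refuses to run when the path is empty or when PostgreSQL still appears to be running can prevent an expensive mistake (for example `rm -rf $PGDATA/*` with `$PGDATA` unset). A minimal sketch; `clear_pgdata` is a hypothetical helper, and it treats the presence of `postmaster.pid` as "server still running", which is a heuristic since a crashed server can leave a stale pid file behind.

```bash
#!/bin/bash
# Hypothetical guard around "rm -rf $PGDATA/*" on the standby before cloning.
clear_pgdata() {
    local dir="$1"
    # Refuse empty or root paths, e.g. when $PGDATA is unset
    if [ -z "$dir" ] || [ "$dir" = "/" ]; then
        echo "refusing to clear '$dir'" >&2
        return 1
    fi
    if [ ! -d "$dir" ]; then
        echo "$dir is not a directory" >&2
        return 1
    fi
    # postmaster.pid exists while the server runs (it may also be stale after a crash)
    if [ -e "$dir/postmaster.pid" ]; then
        echo "$dir/postmaster.pid exists; stop PostgreSQL first" >&2
        return 1
    fi
    # Remove the contents, including hidden files, but keep the directory itself
    find "$dir" -mindepth 1 -delete
}

# Usage on the standby, after "pg_ctl ... stop":
# clear_pgdata /opt/module/pgsql16/pgdata
```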
```bash
# Stop the standby
pg_ctl -D $PGDATA -l $PGHOME/logfile stop
# Empty the standby's data directory first
rm -rf /opt/module/pgsql16/pgdata/*
# Check whether the standby meets the prerequisites for cloning
repmgr -h 192.168.10.102 -U repmgr -d repmgr -f /opt/module/pgsql16/repmgr/repmgr.conf standby clone --dry-run
# Clone the standby
repmgr -h 192.168.10.102 -U repmgr -d repmgr -f /opt/module/pgsql16/repmgr/repmgr.conf standby clone
# postgresql.auto.conf now contains a primary_conninfo that uses the repmgr account, e.g.
# primary_conninfo = 'host=192.168.10.102 user=repmgr application_name=''pg-node2'' password=repmgr port=5432'
# Start the standby
pg_ctl -D $PGDATA -l $PGHOME/logfile start
```

Check the replication status on both the primary and the standby:
```
$ psql -U repmgr
repmgr=# \x
-- On the primary
repmgr=# SELECT * FROM pg_stat_replication;
-- On the standby
repmgr=# SELECT * FROM pg_stat_wal_receiver;
```

Once replication looks healthy, register the standby with repmgr.
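To put a number on "replication looks healthy", compare the primary's WAL position with the standby's `replay_lsn`. In SQL this is what `pg_wal_lsn_diff()` does, for example `SELECT application_name, pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn) AS replay_lag_bytes FROM pg_stat_replication;` on the primary. For a monitoring script that only has two LSN strings, the same arithmetic can be done in bash: an LSN printed as `X/Y` is two hexadecimal halves of a 64-bit WAL position, so its byte value is `X * 2^32 + Y`. The helper names below are hypothetical.

```bash
#!/bin/bash
# Hypothetical helpers mirroring pg_wal_lsn_diff() for LSN strings such as "0/3000148".
# An LSN "X/Y" encodes the WAL position (X << 32) | Y, both parts in hex.
# Bash integers are signed 64-bit, which covers realistic LSN values.
lsn_to_bytes() {
    local hi="${1%%/*}" lo="${1##*/}"
    echo $(( (16#$hi << 32) | 16#$lo ))
}

# Bytes by which LSN $2 (e.g. the standby's replay_lsn) trails LSN $1 (e.g. the primary's sent_lsn)
lsn_diff() {
    echo $(( $(lsn_to_bytes "$1") - $(lsn_to_bytes "$2") ))
}
```

For example, `lsn_diff 0/3000148 0/3000000` prints `328`, meaning the standby has replayed everything except the last 328 bytes of WAL.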
(3) Register the standby with repmgr (on the standby)
```bash
# Register the standby
repmgr -f /opt/module/pgsql16/repmgr/repmgr.conf standby register -F
# Start repmgrd
repmgr -f /opt/module/pgsql16/repmgr/repmgr.conf daemon start
# Show repmgr status
repmgr -f /opt/module/pgsql16/repmgr/repmgr.conf cluster show
repmgr -f /opt/module/pgsql16/repmgr/repmgr.conf service status
```

V. Installing keepalived (as root)
1. Build and install (both nodes)
```bash
# Extract
tar -zxvf keepalived-2.2.4.tar.gz
cd keepalived-2.2.4
# Build and install
mkdir -p /opt/module/keepalived/
./configure --prefix=/opt/module/keepalived/
make && make install
# Configure environment variables: add the lines below to my_env.sh
vim /etc/profile.d/my_env.sh
```

```bash
# KEEPALIVED_HOME
export KEEPALIVED_HOME=/opt/module/keepalived
export PATH=$PATH:$KEEPALIVED_HOME/bin:$KEEPALIVED_HOME/sbin
```

```bash
source /etc/profile.d/my_env.sh
# Check the version to confirm the installation
keepalived --version
# Create a symlink so the system can find the binary
ln -s /opt/module/keepalived/sbin/keepalived /usr/sbin/
```

2. Configure keepalived
```bash
# Back up the original file
cd /opt/module/keepalived/etc/keepalived
cp keepalived.conf keepalived.conf.bak
vim keepalived.conf
```

(1) Primary node configuration
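```
! Configuration File for keepalived

global_defs {
    router_id node02
}

vrrp_script check_pg_alived {
    script "/opt/module/keepalived/etc/keepalived/check_postgres.sh"
    interval 5      # run the check every 5 seconds
    weight 20       # positive weight: add 20 to the priority while the script succeeds
    fall 2          # 2 consecutive failures before the check counts as failed
    rise 1          # 1 success and the check counts as recovered
    timeout 5       # script timeout (seconds)
}

vrrp_instance VI_PG {
    state BACKUP
    # nopreempt     # non-preemptive mode (left commented out, so preemption is enabled)
    interface ens33             # network interface (adjust to your environment)
    virtual_router_id 110       # virtual router ID, must be identical on all nodes
    priority 100                # node priority
    advert_int 1                # VRRP advertisement interval: 1 second
    authentication {
        auth_type PASS
        auth_pass 1111          # cluster password, must be identical on all nodes
    }
    track_script {
        check_pg_alived
    }
    virtual_ipaddress {
        # VIP and the NIC it is bound to
        192.168.10.110/24 dev ens33 label ens33:pgvip   # VIP address
    }
}
```

(2) Standby node configuration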
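```
! Configuration File for keepalived

global_defs {
    router_id node03
}

vrrp_script check_pg_alived {
    script "/opt/module/keepalived/etc/keepalived/check_postgres.sh"
    interval 5      # run the check every 5 seconds
    weight 20       # positive weight: add 20 to the priority while the script succeeds
    fall 2          # 2 consecutive failures before the check counts as failed
    rise 1          # 1 success and the check counts as recovered
    timeout 5       # script timeout (seconds)
}

vrrp_instance VI_PG {
    state BACKUP
    # nopreempt     # non-preemptive mode (left commented out, so preemption is enabled)
    interface ens33             # NIC name
    virtual_router_id 110       # virtual router ID, must be identical on all nodes
    priority 90                 # node priority; with preemption it combines with weight
    advert_int 1                # VRRP advertisement interval: 1 second
    authentication {
        auth_type PASS
        auth_pass 1111          # cluster password, must be identical on all nodes
    }
    track_script {
        check_pg_alived
    }
    virtual_ipaddress {
        # VIP and the NIC it is bound to
        192.168.10.110/24 dev ens33 label ens33:pgvip   # VIP address
    }
}
```

(3) check_postgres.sh (identical on both nodes)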
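The check script's exit status is what moves the VIP. Both nodes start as `state BACKUP` with `nopreempt` commented out, so the node with the higher effective priority takes the VIP. With a positive `weight 20`, keepalived adds 20 to a node's priority while `check_pg_alived` succeeds (exit 0, meaning the node is the PostgreSQL primary) and keeps the base priority while it fails. The sketch below only models that arithmetic for the priorities used here (100 on node02, 90 on node03); it does not talk to keepalived, and `effective_priority` and `vip_holder` are hypothetical names.

```bash
#!/bin/bash
# Model of keepalived's priority arithmetic for this setup (positive weight):
# effective priority = base + weight while the tracked script succeeds, else base.
WEIGHT=20

effective_priority() {
    local base="$1" script_ok="$2"   # script_ok: 1 if check_postgres.sh exits 0, else 0
    if [ "$script_ok" -eq 1 ]; then
        echo $(( base + WEIGHT ))
    else
        echo "$base"
    fi
}

# Print which node should hold the VIP, given whether each node passes the check.
# (No ties occur with priorities 100/90 and weight 20.)
vip_holder() {
    local p02 p03
    p02=$(effective_priority 100 "$1")   # node02: priority 100
    p03=$(effective_priority 90 "$2")    # node03: priority 90
    if [ "$p02" -ge "$p03" ]; then
        echo "node02 ($p02 vs $p03)"
    else
        echo "node03 ($p03 vs $p02)"
    fi
}
```

`vip_holder 1 0` prints `node02 (120 vs 90)` and, after a switchover, `vip_holder 0 1` prints `node03 (110 vs 100)`. Two consequences follow: the weight must exceed the priority gap (20 > 100 - 90), since a weight below 10 would leave the VIP on node02 even after node03 is promoted; and when neither node passes the check, node02 holds the VIP on base priority alone (`vip_holder 0 0` prints `node02 (100 vs 90)`).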
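```bash
# Create the script on both servers
vim /opt/module/keepalived/etc/keepalived/check_postgres.sh
```

```bash
#!/bin/bash
# Exit 0 only when this node runs PostgreSQL as the primary.

# 1. Check that the PostgreSQL server is running.
#    pgrep -x matches the process name exactly; "ps -ef | grep postgres" would also
#    match this script itself (its path contains "postgres") and never report 0.
if ! pgrep -x postgres >/dev/null; then
    exit 1  # server is not running
fi

# 2. Check that the database accepts connections
if ! su - postgres -c "psql -tAc 'SELECT 1;'" >/dev/null 2>&1; then
    exit 1  # cannot connect
fi

# 3. Check whether this node is the primary
ROLE=$(su - postgres -c "psql -tAc 'SELECT pg_is_in_recovery();'" 2>/dev/null)
if [ "$ROLE" = "f" ]; then
    exit 0  # primary
else
    exit 1  # standby
fi
```

```bash
chmod +x /opt/module/keepalived/etc/keepalived/check_postgres.sh
```

3. Start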
```bash
# Start (-D enables detailed logging)
keepalived -f /opt/module/keepalived/etc/keepalived/keepalived.conf -D
# Check the process
ps aux | grep keepalived
# Stop
pkill -TERM keepalived
```

VI. Testing and Verification
1. Verify the initial state
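Step 1 below matches the VIP with a plain `grep`, which is a substring match. For scripting, a slightly stricter check is to look for the address as a complete `inet` entry in `ip -o -4 addr show` output (one line per address). `has_vip` is a hypothetical helper that reads that output on stdin.

```bash
#!/bin/bash
# Hypothetical helper: succeed if the VIP appears as an inet address in
# `ip -o -4 addr show` output read from stdin.
has_vip() {
    local vip="$1"
    awk -v vip="$vip" '
        # field 3 is the address family, field 4 is address/prefix
        $3 == "inet" { split($4, addr, "/"); if (addr[1] == vip) found = 1 }
        END { exit (found ? 0 : 1) }
    '
}

# Usage on node02 / node03:
# ip -o -4 addr show ens33 | has_vip 192.168.10.110 && echo "VIP is on $(hostname)"
```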
```bash
# 1. Run on node02 and node03 to see where the VIP is (it should be on node02)
ip addr show ens33 | grep 192.168.10.110
# Or ssh to the VIP; the session should land on node02
ssh 192.168.10.110
# 2. Check the repmgr cluster status
repmgr -f /opt/module/pgsql16/repmgr/repmgr.conf cluster show
repmgr -f /opt/module/pgsql16/repmgr/repmgr.conf service status
# 3. Check the streaming replication status (run on the primary)
psql -c "SELECT client_addr, state, sync_state FROM pg_stat_replication;"
# 4. Test a connection through the VIP
psql -h 192.168.10.110 -U postgres -c "SELECT inet_server_addr(), pg_is_in_recovery();"
```

2. Replication test
```bash
# The current primary is node02; run these against node02
psql -h 192.168.10.102 -U postgres -c "CREATE TABLE test_rep2 (id int, name text);"
psql -h 192.168.10.102 -U postgres -c "INSERT INTO test_rep2 VALUES (1, 'Hello from Master');"
# Write through the VIP
psql -h 192.168.10.110 -U postgres -c "INSERT INTO test_rep2 VALUES (2, 'Hello from VIP');"
# Query the standby
psql -h 192.168.10.103 -U postgres -c "SELECT * FROM test_rep2;"
# The standby is read-only, so inserts, updates and deletes fail with an error
psql -h 192.168.10.103 -U postgres -c "INSERT INTO test_rep2 VALUES (2, 'Hello from Slave');"
```

3. Manual switchover moves the VIP
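After a switchover the VIP does not move instantly: the old primary's check has to fail `fall 2` times at a 5-second `interval` before its priority drops, and VRRP then needs a few advertisements. Checking `ip addr` right away can therefore give a false negative. A generic polling helper avoids that; `wait_until` is a hypothetical function, and the timeout and interval in the usage line are assumptions to tune.

```bash
#!/bin/bash
# Hypothetical helper: run a command every INTERVAL seconds until it succeeds
# or TIMEOUT seconds have passed. Returns 0 on success, 1 on timeout.
wait_until() {
    local timeout="$1" interval="$2"; shift 2
    local waited=0
    until "$@"; do
        if [ "$waited" -ge "$timeout" ]; then
            return 1
        fi
        sleep "$interval"
        waited=$(( waited + interval ))
    done
}

# Usage on node03 after "standby switchover" (adjust the NIC name):
# wait_until 60 2 sh -c 'ip -o -4 addr show ens33 | grep -q "inet 192.168.10.110/"' \
#     && echo "VIP is on node03" || echo "VIP did not arrive within 60s"
```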
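```bash
# The current primary is node02; run the following on the standby node03
# (standby switchover runs commands on the old primary over SSH, so the postgres
# user needs passwordless SSH between the nodes)
# Check the current state
repmgr -f /opt/module/pgsql16/repmgr/repmgr.conf cluster show
repmgr -f /opt/module/pgsql16/repmgr/repmgr.conf service status
# Run on the standby to promote it to primary
repmgr -f /opt/module/pgsql16/repmgr/repmgr.conf standby switchover --siblings-follow
# Check the state again
repmgr -f /opt/module/pgsql16/repmgr/repmgr.conf cluster show
repmgr -f /opt/module/pgsql16/repmgr/repmgr.conf service status
# Check whether the VIP has moved to node03
ip addr show ens33 | grep 192.168.10.110
# Or ssh to the VIP; the session should land on node03
ssh 192.168.10.110
# Verify replication
# On the new primary (node03)
psql -h 192.168.10.103 -U postgres -c "INSERT INTO test_rep2 VALUES (3, 'Repmgr from Master');"
# Write through the VIP
psql -h 192.168.10.110 -U postgres -c "INSERT INTO test_rep2 VALUES (4, 'Repmgr from VIP');"
# Query the standby (node02)
psql -h 192.168.10.102 -U postgres -c "SELECT * FROM test_rep2;"
# The standby is read-only, so inserts, updates and deletes fail with an error
psql -h 192.168.10.102 -U postgres -c "INSERT INTO test_rep2 VALUES (5, 'Hello from Slave');"
```

4. VIP failover and recovery after the primary goes down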
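The recovery step in this section uses `repmgr node rejoin --force-rewind`, which relies on pg_rewind. pg_rewind requires the cluster to have data checksums enabled at initdb time or to run with `wal_log_hints = on` (and `full_page_writes = on`, the default). If the base installation enabled neither, check before relying on a rewind, for example with `psql -tAc 'SHOW wal_log_hints;'` and `psql -tAc 'SHOW data_checksums;'`. `node rejoin` also accepts `--dry-run`, so a cautious wrapper can check the prerequisites first and only then perform the real rejoin. `rejoin_as_standby` and `REPMGR_CONF` are hypothetical names; the repmgr options are the ones used in this article.

```bash
#!/bin/bash
# Hypothetical wrapper: run "node rejoin" as a dry run first, then for real.
# Run on the failed node, with PostgreSQL stopped, as the postgres user.
REPMGR_CONF=${REPMGR_CONF:-/opt/module/pgsql16/repmgr/repmgr.conf}

rejoin_as_standby() {
    local new_primary="$1"
    local args=(-h "$new_primary" -U repmgr -p 5432 -d repmgr -f "$REPMGR_CONF"
                node rejoin --force-rewind)
    # Check prerequisites without changing anything
    if ! repmgr "${args[@]}" --dry-run; then
        echo "dry run failed; not rejoining" >&2
        return 1
    fi
    repmgr "${args[@]}"
}

# Usage on node03 after node02 has been promoted:
# rejoin_as_standby 192.168.10.102
```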
```bash
# The current primary is node03
# Stop the primary (on node03)
pg_ctl -D $PGDATA -l $PGHOME/logfile stop
# On node02, check the repmgr status: node02 has become the primary
repmgr -f /opt/module/pgsql16/repmgr/repmgr.conf cluster show
repmgr -f /opt/module/pgsql16/repmgr/repmgr.conf service status
# Check whether the VIP has moved to node02
ip addr show ens33 | grep 192.168.10.110
# Or ssh to the VIP; the session should land on node02
ssh 192.168.10.110
# On node03: start PostgreSQL again and rejoin as a standby of node02
repmgr -h 192.168.10.102 -U repmgr -p 5432 -d repmgr -f /opt/module/pgsql16/repmgr/repmgr.conf node rejoin --force-rewind
# Check the current state
repmgr -f /opt/module/pgsql16/repmgr/repmgr.conf cluster show
repmgr -f /opt/module/pgsql16/repmgr/repmgr.conf service status
# Verify replication
# On the primary (node02)
psql -h 192.168.10.102 -U postgres -c "INSERT INTO test_rep2 VALUES (5, 'Repmgr02 from Master');"
# Write through the VIP
psql -h 192.168.10.110 -U postgres -c "INSERT INTO test_rep2 VALUES (6, 'Repmgr02 from VIP');"
# Query the standby (node03)
psql -h 192.168.10.103 -U postgres -c "SELECT * FROM test_rep2;"
# The standby is read-only, so inserts, updates and deletes fail with an error
psql -h 192.168.10.103 -U postgres -c "INSERT INTO test_rep2 VALUES (6, 'Hello from Slave');"
```

5. Summary
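Every command in this summary repeats `-f /opt/module/pgsql16/repmgr/repmgr.conf`. A tiny shell function, for example in the postgres user's `~/.bashrc`, shortens them; `rmgr` is a hypothetical name, and it assumes the repmgr binary is on `PATH`.

```bash
#!/bin/bash
# Hypothetical shorthand for repmgr with this article's config file.
REPMGR_CONF=/opt/module/pgsql16/repmgr/repmgr.conf

rmgr() {
    repmgr -f "$REPMGR_CONF" "$@"
}

# Usage:
# rmgr cluster show
# rmgr service status
# rmgr standby switchover --siblings-follow
```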
```bash
# Manual switchover: run on the standby
repmgr -f /opt/module/pgsql16/repmgr/repmgr.conf standby switchover --siblings-follow
# After the primary fails and a failover happens: run on the failed node to start
# PostgreSQL and rejoin as a standby; -h points to the new primary
repmgr -h 192.168.10.103 -U repmgr -p 5432 -d repmgr -f /opt/module/pgsql16/repmgr/repmgr.conf node rejoin --force-rewind
# Check the repmgr status
repmgr -f /opt/module/pgsql16/repmgr/repmgr.conf cluster show
repmgr -f /opt/module/pgsql16/repmgr/repmgr.conf service status
```

VII. Possible Problems
1. Failover does not happen after the primary goes down
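The figure above shows the `Paused?` column indicating that repmgrd has been paused. A paused repmgrd keeps monitoring but takes no failover action, so when the primary goes down the standby is never promoted. In production, make sure `Paused?` is `no` on every node.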
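Because a paused repmgrd silently disables automatic failover, it is worth checking the `Paused?` column from a monitoring script. A sketch, assuming the pipe-separated table that `repmgr service status` prints (a header row containing `Paused?`, a dashed separator, then one row per node); the column is located by its header name rather than by position. `any_paused` is a hypothetical helper.

```bash
#!/bin/bash
# Hypothetical helper: read "repmgr service status" output on stdin and
# succeed (exit 0) if any node reports Paused? = yes, printing its name.
any_paused() {
    awk -F'|' '
        function trim(s) { gsub(/^[ \t]+|[ \t]+$/, "", s); return s }
        col == 0 {
            # Find the "Paused?" column in the header row
            for (i = 1; i <= NF; i++) if (trim($i) == "Paused?") col = i
            next
        }
        /^-+\+/ { next }   # skip the separator line
        trim($col) == "yes" { paused = 1; print "paused: " trim($2) }
        END { exit (paused ? 0 : 1) }
    '
}

# Usage:
# repmgr -f /opt/module/pgsql16/repmgr/repmgr.conf service status | any_paused \
#     && echo "repmgrd is paused: automatic failover is disabled"
```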
```bash
# Restart the primary
pg_ctl -D $PGDATA -l $PGHOME/logfile start
# Unpause repmgrd immediately
repmgr -f /opt/module/pgsql16/repmgr/repmgr.conf service unpause
# To pause manually:
# repmgr -f /opt/module/pgsql16/repmgr/repmgr.conf service pause
```

2. Manual switchover fails
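The cause is that the system cannot find the PostgreSQL shared library directory /opt/module/pgsql16/pgsql/lib. Add it to the library search path: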
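```bash
# Configure system-wide environment variables
sudo vim /etc/profile.d/my_env.sh
```

```bash
export PGHOME=/opt/module/pgsql16/pgsql
export PGDATA=/opt/module/pgsql16/pgdata
export PATH=$PATH:$PGHOME/bin
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$PGHOME/lib
```

```bash
# Reload the environment in the current shell; source is a shell builtin,
# so "sudo source" does not work
source /etc/profile.d/my_env.sh
```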
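Two small pitfalls remain with this fix. When `LD_LIBRARY_PATH` starts out empty, `$LD_LIBRARY_PATH:$PGHOME/lib` leaves an empty leading entry, and re-sourcing the file keeps appending duplicates. Also, files in /etc/profile.d are read by login shells, so commands that repmgr runs over non-interactive SSH may not see them; registering the directory with the dynamic linker (a file such as /etc/ld.so.conf.d/pgsql.conf containing /opt/module/pgsql16/pgsql/lib, then `ldconfig` as root) is a common alternative. For the environment-variable route, a hypothetical `path_append` helper avoids empty and duplicate entries; it uses bash-specific features (`${!var}`, `printf -v`).

```bash
#!/bin/bash
# Hypothetical helper: append DIR to the colon-separated variable named VAR,
# skipping it if already present and avoiding an empty leading entry.
path_append() {
    local var="$1" dir="$2"
    local current="${!var}"                           # indirect expansion: value of $var
    case ":$current:" in
        *":$dir:"*) ;;                                # already present
        "::") printf -v "$var" '%s' "$dir" ;;         # variable was empty
        *) printf -v "$var" '%s:%s' "$current" "$dir" ;;
    esac
    export "$var"
}

# Possible replacement for the LD_LIBRARY_PATH line in /etc/profile.d/my_env.sh
# (define the function in the same file):
# path_append LD_LIBRARY_PATH "$PGHOME/lib"
```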