23. Enterprise-Grade Kubernetes Architecture Design and Implementation
- 23. Enterprise-Grade Kubernetes Architecture Design and Implementation
- 1. Enterprise Kubernetes architecture design
- 1.1 Kubernetes cluster architecture -- high availability
- 1.2 Production resource sizing
- 1.3 Production disk layout
- 1.4 Cluster network segmentation
- 2. Basic environment configuration
- 2.1 Cluster plan (lab/test environment)
- 2.2 Disk mounting
- 2.3 Basic environment configuration
- 2.3.1 Configure hosts (all nodes)
- 2.3.2 Configure the Aliyun mirror repos (all nodes)
- 2.3.3 Disable firewalld, SELinux, dnsmasq and swap; enable rsyslog (all nodes)
- 2.3.4 Time synchronization (all nodes)
- 2.3.5 Configure limits (all nodes)
- 2.3.6 Upgrade the system (all nodes)
- 2.3.7 Configure passwordless SSH (Master01 node)
- 2.4 Kernel configuration (all nodes)
- 2.4.1 Install ipvsadm
- 2.4.2 Configure the IPVS modules
- 2.4.3 Kernel tuning
- 2.4.4 Reboot
- 3. High-availability components (Master nodes)
- 3.1 Install HAProxy and KeepAlived
- 3.2 Configure HAProxy
- 3.3 Configure KeepAlived
- 3.4 Configure the KeepAlived health-check script
- 3.5 Verify that the KeepAlived VIP works
- 4. Runtime installation (all nodes)
- 4.1 Configure the repository
- 4.2 Install docker-ce
- 4.3 Configure the kernel modules required by Containerd
- 4.4 Load the modules
- 4.5 Configure the kernel parameters required by Containerd
- 4.6 Generate the Containerd configuration file
- 5. Install the Kubernetes components (all nodes)
- 6. Cluster initialization
- 6.1 Create the kubeadm config file (Master01 node)
- 6.2 Copy new.yaml to the other master nodes (Master01 node)
- 6.3 Pre-pull the images (all Master nodes; the other nodes need no configuration changes, not even IP addresses)
- 6.4 Initialize the cluster (Master01 node)
- 6.5 Configure the environment variable used to access the cluster (Master01 node)
- 6.6 If initialization fails, reset and initialize again (do not run the reset if nothing failed)
- 6.7 Highly available Masters (run the join command on master02 and master03)
- 6.8 Worker node configuration (run the join command on node01 and node02)
- 7. Installing Calico
- 7.1 Stop NetworkManager from managing Calico's interfaces (all nodes)
- 7.2 Install Calico (master01 only; the .x does not need to be changed)
- 8. Metrics deployment (master01 node)
- 9. Dashboard deployment (master01 node)
- 9.1 Installation
- 9.2 Log in to the dashboard
- 10. Required configuration changes (master01 node)
- 11. Cluster maintenance
- 11.1 Taking a node offline
- 11.1.1 Offline procedure
- 11.1.2 Performing the offline steps
- 11.2 Adding a node
- 11.2.1 Basic environment configuration; set the new node's hostname (see section 2.3)
- 11.2.2 Kernel configuration (see section 2.4)
- 11.2.3 Install Containerd (see chapter 4)
- 11.2.4 Install the Kubernetes components (see chapter 5)
- 11.2.5 Configure the repository on the new node (mind the version number)
- 11.3 Cluster upgrade
- 11.3.1 Upgrade flow and caveats
- 11.3.2 Upgrade the master nodes
- 11.3.3 Upgrade the other master nodes
- 11.3.4 Upgrade the worker nodes
- Appendix: what makes a cluster truly production-ready?
1. Enterprise Kubernetes architecture design
1.1 Kubernetes cluster architecture -- high availability
1.2 Production resource sizing
Worker node count | Minimum worker nodes | Worker node spec | Control-plane node count | Control-plane node spec | Etcd node spec | Master & Etcd (combined) |
---|---|---|---|---|---|---|
0-100 | 3 | 8C32G/16C64G | 3 | / | / | 8C32G+128G SSD |
100-250 | 3 | 8C32G/16C64G | 3 | / | / | 16C32G+256G SSD |
250-500 | 3 | 8C32G/16C64G | 3 | 16C32G+ | 8C32G+512G SSD*5 | / |
1.3 Production disk layout
Node | Root partition (100G) | Etcd data disk (100G NVMe SSD) | Data disk (500G SSD) |
---|---|---|---|
Control-plane node | / | /var/lib/etcd | /data /var/lib/kubelet /var/lib/containers |
Worker node | / | - | /data /var/lib/kubelet /var/lib/containers |
1.4 Cluster network segmentation
- Node network: 192.168.181.0/24
- Service network: 10.96.0.0/16
- Pod network: 172.16.0.0/16
- Reserved Service IPs:
- CoreDNS Service IP: 10.96.0.10
- APIServer Service IP: 10.96.0.1
2. Basic environment configuration
2.1 Cluster plan (lab/test environment)
Hostname | Physical IP | OS | Resources | System disk | Etcd disk | Data disk (in production this can be an LVM logical volume) |
---|---|---|---|---|---|---|
k8s-master01 | 192.168.200.61 | Rocky9.4 | 4C4G | 40G | 20G | 40G |
k8s-master02 | 192.168.200.62 | Rocky9.4 | 4C4G | 40G | 20G | 40G |
k8s-master03 | 192.168.200.63 | Rocky9.4 | 4C4G | 40G | 20G | 40G |
k8s-node01 | 192.168.200.64 | Rocky9.4 | 4C4G | 40G | / | 40G |
k8s-node02 | 192.168.200.65 | Rocky9.4 | 4C4G | 40G | / | 40G |
VIP | 192.168.200.100 | / | / | / | / | / |
2.2 Disk mounting
[root@k8s-master01 ~]# fdisk -l|grep "Disk /dev/nvme0n"
Disk /dev/nvme0n1: 40 GiB, 42949672960 bytes, 83886080 sectors
Disk /dev/nvme0n2: 20 GiB, 21474836480 bytes, 41943040 sectors
Disk /dev/nvme0n3: 40 GiB, 42949672960 bytes, 83886080 sectors
# Create the etcd directory and the data directory
[root@k8s-master01 ~]# mkdir -p /var/lib/etcd /data
# Create the partitions
[root@k8s-master01 ~]# fdisk /dev/nvme0n2
[root@k8s-master01 ~]# fdisk /dev/nvme0n3
# Format the partitions
[root@k8s-master01 ~]# mkfs.xfs /dev/nvme0n2p1
[root@k8s-master01 ~]# mkfs.xfs /dev/nvme0n3p1
# Look up the UUIDs
[root@k8s-master01 ~]# blkid /dev/nvme0n2p1
/dev/nvme0n2p1: UUID="fe42cf86-59e1-4f02-9612-9942536f23ca" TYPE="xfs" PARTUUID="71ff24c2-01"
[root@k8s-master01 ~]# blkid /dev/nvme0n3p1
/dev/nvme0n3p1: UUID="f1cbe99b-71f8-48d9-822a-62fb6b608c38" TYPE="xfs" PARTUUID="9ab6883d-01"
# Mount at boot via /etc/fstab
[root@k8s-master01 ~]# vim /etc/fstab
[root@k8s-master01 ~]# tail -2 /etc/fstab
UUID="fe42cf86-59e1-4f02-9612-9942536f23ca" /var/lib/etcd xfs defaults 0 0
UUID="f1cbe99b-71f8-48d9-822a-62fb6b608c38" /data xfs defaults 0 0
# Mount the disks
[root@k8s-master01 ~]# systemctl daemon-reload
[root@k8s-master01 ~]# mount -a
[root@k8s-master01 ~]# df -hT | grep /dev/nvme0n
/dev/nvme0n1p1 xfs 960M 330M 631M 35% /boot
/dev/nvme0n2p1 xfs 20G 175M 20G 1% /var/lib/etcd
/dev/nvme0n3p1 xfs 40G 318M 40G 1% /data
# Create the kubelet/containers directories on the data disk and symlink them into /var/lib
[root@k8s-master01 ~]# mkdir -p /data/kubelet /data/containers
[root@k8s-master01 ~]# ln -s /data/kubelet/ /var/lib/
[root@k8s-master01 ~]# ln -s /data/containers/ /var/lib/
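As a quick sanity check (not part of the original steps), confirm that the symlinks resolve to the data disk before any component starts writing to these paths:
# Both links should point at directories under /data
ls -ld /var/lib/kubelet /var/lib/containers
df -h /var/lib/etcd /data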
2.3 Basic environment configuration
2.3.1 Configure hosts (all nodes)
[root@k8s-master01 ~]# vim /etc/hosts
[root@k8s-master01 ~]# tail -5 /etc/hosts
192.168.200.61 k8s-master01
192.168.200.62 k8s-master02
192.168.200.63 k8s-master03
192.168.200.64 k8s-node01
192.168.200.65 k8s-node02
2.3.2 Configure the Aliyun mirror repos (all nodes)
[root@k8s-master01 ~]# sed -e 's|^mirrorlist=|#mirrorlist=|g' -e 's|^#baseurl=http://dl.rockylinux.org/$contentdir|baseurl=https://mirrors.aliyun.com/rockylinux|g' -i.bak /etc/yum.repos.d/*.repo
[root@k8s-master01 ~]# dnf makecache
2.3.3 Disable firewalld, SELinux, dnsmasq and swap; enable rsyslog (all nodes)
# Disable the firewall
[root@k8s-master01 ~]# systemctl disable --now firewalld
[root@k8s-master01 ~]# systemctl disable --now dnsmasq
# Disable SELinux
[root@k8s-master01 ~]# setenforce 0
[root@k8s-master01 ~]# sed -i 's#SELINUX=enforcing#SELINUX=disabled#g' /etc/sysconfig/selinux
[root@k8s-master01 ~]# sed -i 's#SELINUX=enforcing#SELINUX=disabled#g' /etc/selinux/config
# Disable the swap partition
[root@k8s-master01 ~]# swapoff -a && sysctl -w vm.swappiness=0
[root@k8s-master01 ~]# sed -ri '/^[^#]*swap/s@^@#@' /etc/fstab
2.3.4 Time synchronization (all nodes)
# Install ntpdate
[root@k8s-master01 ~]# dnf install epel-release -y
[root@k8s-master01 ~]# dnf config-manager --set-enabled epel
[root@k8s-master01 ~]# dnf install ntpsec -y
# Sync the time and set the Asia/Shanghai time zone
[root@k8s-master01 ~]# ln -sf /usr/share/zoneinfo/Asia/Shanghai /etc/localtime
[root@k8s-master01 ~]# echo 'Asia/Shanghai' >/etc/timezone
[root@k8s-master01 ~]# ntpdate time2.aliyun.com
# Add a crontab entry to keep the time in sync
[root@k8s-master01 ~]# crontab -e
[root@k8s-master01 ~]# crontab -l
*/5 * * * * /usr/sbin/ntpdate time2.aliyun.com
2.3.5 Configure limits (all nodes)
[root@k8s-master01 ~]# ulimit -SHn 65535
[root@k8s-master01 ~]# vim /etc/security/limits.conf
[root@k8s-master01 ~]# tail -6 /etc/security/limits.conf
* soft nofile 65536
* hard nofile 131072
* soft nproc 65535
* hard nproc 655350
* soft memlock unlimited
* hard memlock unlimited
2.3.6 Upgrade the system (all nodes)
[root@k8s-master01 ~]# yum update -y
2.3.7 Configure passwordless SSH (Master01 node)
# Generate the key pair
[root@k8s-master01 ~]# ssh-keygen -t rsa
# Distribute the public key
[root@k8s-master01 ~]# for i in k8s-master01 k8s-master02 k8s-master03 k8s-node01 k8s-node02;do ssh-copy-id -i .ssh/id_rsa.pub $i;done
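To confirm the keys were distributed correctly, a simple loop such as the following (not in the original text) should print every hostname without prompting for a password:
for i in k8s-master01 k8s-master02 k8s-master03 k8s-node01 k8s-node02; do ssh -o BatchMode=yes $i hostname; done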
2.4 Kernel configuration (all nodes)
2.4.1 Install ipvsadm
[root@k8s-master01 ~]# yum install ipvsadm ipset sysstat conntrack libseccomp -y
2.4.2 Configure the IPVS modules
# Load the IPVS modules:
[root@k8s-master01 ~]# modprobe -- ip_vs
[root@k8s-master01 ~]# modprobe -- ip_vs_rr
[root@k8s-master01 ~]# modprobe -- ip_vs_wrr
[root@k8s-master01 ~]# modprobe -- ip_vs_sh
[root@k8s-master01 ~]# modprobe -- nf_conntrack
# Create ipvs.conf so the modules are loaded automatically at boot
[root@k8s-master01 ~]# vim /etc/modules-load.d/ipvs.conf
[root@k8s-master01 ~]# cat /etc/modules-load.d/ipvs.conf
ip_vs
ip_vs_lc
ip_vs_wlc
ip_vs_rr
ip_vs_wrr
ip_vs_lblc
ip_vs_lblcr
ip_vs_dh
ip_vs_sh
ip_vs_fo
ip_vs_nq
ip_vs_sed
ip_vs_ftp
ip_vs_sh
nf_conntrack
ip_tables
ip_set
xt_set
ipt_set
ipt_rpfilter
ipt_REJECT
ipip
# Any errors reported here can be ignored
[root@k8s-master01 ~]# systemctl enable --now systemd-modules-load.service
2.4.3 Kernel tuning
cat <<EOF > /etc/sysctl.d/k8s.conf
net.ipv4.ip_forward = 1
net.bridge.bridge-nf-call-iptables = 1
net.bridge.bridge-nf-call-ip6tables = 1
fs.may_detach_mounts = 1
net.ipv4.conf.all.route_localnet = 1
vm.overcommit_memory=1
vm.panic_on_oom=0
fs.inotify.max_user_watches=89100
fs.file-max=52706963
fs.nr_open=52706963
net.netfilter.nf_conntrack_max=2310720
net.ipv4.tcp_keepalive_time = 600
net.ipv4.tcp_keepalive_probes = 3
net.ipv4.tcp_keepalive_intvl =15
net.ipv4.tcp_max_tw_buckets = 36000
net.ipv4.tcp_tw_reuse = 1
net.ipv4.tcp_max_orphans = 327680
net.ipv4.tcp_orphan_retries = 3
net.ipv4.tcp_syncookies = 1
net.ipv4.tcp_max_syn_backlog = 16384
net.ipv4.ip_conntrack_max = 65536
net.ipv4.tcp_max_syn_backlog = 16384
net.ipv4.tcp_timestamps = 0
net.core.somaxconn = 16384
EOF
# Apply the kernel parameters
[root@k8s-master01 ~]# sysctl --system
2.4.4 Reboot
[root@k8s-master01 ~]# reboot
# After the reboot, check that the kernel modules were loaded automatically
[root@k8s-master01 ~]# lsmod | grep --color=auto -e ip_vs -e nf_conntrack
ip_vs_ftp 12288 0
nf_nat 65536 1 ip_vs_ftp
ip_vs_sed 12288 0
ip_vs_nq 12288 0
ip_vs_fo 12288 0
ip_vs_sh 12288 0
ip_vs_dh 12288 0
ip_vs_lblcr 12288 0
ip_vs_lblc 12288 0
ip_vs_wrr 12288 0
ip_vs_rr 12288 0
ip_vs_wlc 12288 0
ip_vs_lc 12288 0
ip_vs 237568 25 ip_vs_wlc,ip_vs_rr,ip_vs_dh,ip_vs_lblcr,ip_vs_sh,ip_vs_fo,ip_vs_nq,ip_vs_lblc,ip_vs_wrr,ip_vs_lc,ip_vs_sed,ip_vs_ftp
nf_conntrack 229376 2 nf_nat,ip_vs
nf_defrag_ipv6 24576 2 nf_conntrack,ip_vs
nf_defrag_ipv4 12288 1 nf_conntrack
libcrc32c 12288 4 nf_conntrack,nf_nat,xfs,ip_vs
3. High-availability components (Master nodes)
On public clouds, use the provider's load balancer (for example Alibaba Cloud SLB/NLB or Tencent Cloud ELB) instead of HAProxy and KeepAlived, because most public clouds do not support keepalived.
3.1 Install HAProxy and KeepAlived
[root@k8s-master01 ~]# yum install keepalived haproxy -y
3.2 Configure HAProxy
[root@k8s-master01 ~]# vim /etc/haproxy/haproxy.cfg
[root@k8s-master01 ~]# cat /etc/haproxy/haproxy.cfg
global
  maxconn 2000
  ulimit-n 16384
  log 127.0.0.1 local0 err
  stats timeout 30s

defaults
  log global
  mode http
  option httplog
  timeout connect 5000
  timeout client 50000
  timeout server 50000
  timeout http-request 15s
  timeout http-keep-alive 15s

frontend monitor-in
  bind *:33305
  mode http
  option httplog
  monitor-uri /monitor

frontend k8s-master
  bind 0.0.0.0:16443
  bind 127.0.0.1:16443
  mode tcp
  option tcplog
  tcp-request inspect-delay 5s
  default_backend k8s-master

backend k8s-master
  mode tcp
  option tcplog
  option tcp-check
  balance roundrobin
  default-server inter 10s downinter 5s rise 2 fall 2 slowstart 60s maxconn 250 maxqueue 256 weight 100
  server k8s-master01 192.168.200.61:6443 check
  server k8s-master02 192.168.200.62:6443 check
  server k8s-master03 192.168.200.63:6443 check
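The HAProxy configuration is identical on all three master nodes, so one option (a sketch, not part of the original steps) is to edit it once on master01 and push it to the others over the passwordless SSH set up in section 2.3.7:
for i in k8s-master02 k8s-master03; do scp /etc/haproxy/haproxy.cfg $i:/etc/haproxy/haproxy.cfg; done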
3.3 Configure KeepAlived
1. k8s-master01 node
[root@k8s-master01 ~]# vim /etc/keepalived/keepalived.conf
[root@k8s-master01 ~]# cat /etc/keepalived/keepalived.conf
! Configuration File for keepalived
global_defs {
    router_id LVS_DEVEL
    script_user root
    enable_script_security
}
vrrp_script chk_apiserver {
    script "/etc/keepalived/check_apiserver.sh"
    interval 5
    weight -5
    fall 2
    rise 1
}
vrrp_instance VI_1 {
    state MASTER
    interface ens160
    mcast_src_ip 192.168.200.61
    virtual_router_id 51
    priority 101
    advert_int 2
    authentication {
        auth_type PASS
        auth_pass K8SHA_KA_AUTH
    }
    virtual_ipaddress {
        192.168.200.100
    }
    track_script {
        chk_apiserver
    }
}
2. k8s-master02 node
[root@k8s-master02 ~]# vim /etc/keepalived/keepalived.conf
[root@k8s-master02 ~]# cat /etc/keepalived/keepalived.conf
! Configuration File for keepalived
global_defs {
    router_id LVS_DEVEL
    script_user root
    enable_script_security
}
vrrp_script chk_apiserver {
    script "/etc/keepalived/check_apiserver.sh"
    interval 5
    weight -5
    fall 2
    rise 1
}
vrrp_instance VI_1 {
    state BACKUP
    interface ens160
    mcast_src_ip 192.168.200.62
    virtual_router_id 51
    priority 100
    advert_int 2
    authentication {
        auth_type PASS
        auth_pass K8SHA_KA_AUTH
    }
    virtual_ipaddress {
        192.168.200.100
    }
    track_script {
        chk_apiserver
    }
}
3. k8s-master03 node
[root@k8s-master03 ~]# vim /etc/keepalived/keepalived.conf
[root@k8s-master03 ~]# cat /etc/keepalived/keepalived.conf
! Configuration File for keepalived
global_defs {
    router_id LVS_DEVEL
    script_user root
    enable_script_security
}
vrrp_script chk_apiserver {
    script "/etc/keepalived/check_apiserver.sh"
    interval 5
    weight -5
    fall 2
    rise 1
}
vrrp_instance VI_1 {
    state BACKUP
    interface ens160
    mcast_src_ip 192.168.200.63
    virtual_router_id 51
    priority 100
    advert_int 2
    authentication {
        auth_type PASS
        auth_pass K8SHA_KA_AUTH
    }
    virtual_ipaddress {
        192.168.200.100
    }
    track_script {
        chk_apiserver
    }
}
3.4 Configure the KeepAlived health-check script
# Create the script
[root@k8s-master01 ~]# vim /etc/keepalived/check_apiserver.sh
[root@k8s-master01 ~]# cat /etc/keepalived/check_apiserver.sh
#!/bin/bash
err=0
for k in $(seq 1 3)
do
    check_code=$(pgrep haproxy)
    if [[ $check_code == "" ]]; then
        err=$(expr $err + 1)
        sleep 1
        continue
    else
        err=0
        break
    fi
done

if [[ $err != "0" ]]; then
    echo "systemctl stop keepalived"
    /usr/bin/systemctl stop keepalived
    exit 1
else
    exit 0
fi
# Make the script executable:
[root@k8s-master01 ~]# chmod +x /etc/keepalived/check_apiserver.sh
# Start haproxy and keepalived on all master nodes:
[root@k8s-master01 ~]# systemctl daemon-reload
[root@k8s-master01 ~]# systemctl enable --now haproxy
[root@k8s-master01 ~]# systemctl enable --now keepalived
3.5 Verify that the KeepAlived VIP works
[root@k8s-master01 ~]# ping -c2 192.168.200.100
PING 192.168.200.100 (192.168.200.100) 56(84) bytes of data.
64 bytes from 192.168.200.100: icmp_seq=1 ttl=64 time=0.200 ms
64 bytes from 192.168.200.100: icmp_seq=2 ttl=64 time=0.072 ms

--- 192.168.200.100 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1042ms
rtt min/avg/max/mdev = 0.072/0.136/0.200/0.064 ms
[root@k8s-master01 ~]# echo|telnet 192.168.200.100 16443
Trying 192.168.200.100...
Connected to 192.168.200.100.
Escape character is '^]'.
Connection closed by foreign host.
- If the VIP cannot be pinged and telnet fails, troubleshoot as follows:
- Confirm that the VIP is correct
- On all nodes, firewalld must be disabled and inactive: systemctl status firewalld
- On all nodes, SELinux must be disabled: getenforce
- On the master nodes, check the haproxy and keepalived status: systemctl status keepalived haproxy
- On the master nodes, check the listening ports: netstat -lntp
- If none of the above reveals a problem, confirm whether the machines are:
- public cloud instances
- private cloud instances (OpenStack and similar)
Most public clouds do not support keepalived, and private clouds may impose similar restrictions; check with your private cloud administrator.
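If you want to run those checks in one pass, a rough helper like the following (hypothetical; adjust the node list to your environment) collects the firewall, SELinux, service and port information from master01:
for i in k8s-master01 k8s-master02 k8s-master03; do
  echo "== $i =="
  ssh $i "systemctl is-active firewalld; getenforce; systemctl is-active haproxy keepalived; ss -lntp | grep 16443"
done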
4. Runtime installation (all nodes)
4.1 Configure the repository
[root@k8s-master01 ~]# yum install wget jq psmisc vim net-tools telnet yum-utils device-mapper-persistent-data lvm2 git -y
[root@k8s-master01 ~]# yum-config-manager --add-repo https://mirrors.aliyun.com/docker-ce/linux/centos/docker-ce.repo
4.2 Install docker-ce
[root@k8s-master01 ~]# yum install docker-ce containerd -y
4.3 Configure the kernel modules required by Containerd
[root@k8s-master01 ~]# cat <<EOF | sudo tee /etc/modules-load.d/containerd.conf
overlay
br_netfilter
EOF
4.4 Load the modules:
[root@k8s-master01 ~]# modprobe -- overlay
[root@k8s-master01 ~]# modprobe -- br_netfilter
4.5 Configure the kernel parameters required by Containerd:
[root@k8s-master01 ~]# cat <<EOF | sudo tee /etc/sysctl.d/99-kubernetes-cri.conf
net.bridge.bridge-nf-call-iptables = 1
net.ipv4.ip_forward = 1
net.bridge.bridge-nf-call-ip6tables = 1
EOF
# Apply the kernel parameters:
[root@k8s-master01 ~]# sysctl --system
4.6 Generate the Containerd configuration file:
[root@k8s-master01 ~]# mkdir -p /etc/containerd
[root@k8s-master01 ~]# containerd config default | tee /etc/containerd/config.toml
Change Containerd's cgroup driver and the pause image:
[root@k8s-master01 ~]# sed -i 's#SystemdCgroup = false#SystemdCgroup = true#g' /etc/containerd/config.toml
[root@k8s-master01 ~]# sed -i 's#k8s.gcr.io/pause#registry.cn-hangzhou.aliyuncs.com/google_containers/pause#g' /etc/containerd/config.toml
[root@k8s-master01 ~]# sed -i 's#registry.gcr.io/pause#registry.cn-hangzhou.aliyuncs.com/google_containers/pause#g' /etc/containerd/config.toml
[root@k8s-master01 ~]# sed -i 's#registry.k8s.io/pause#registry.cn-hangzhou.aliyuncs.com/google_containers/pause#g' /etc/containerd/config.toml
Start Containerd and enable it at boot:
[root@k8s-master01 ~]# systemctl daemon-reload
[root@k8s-master01 ~]# systemctl enable --now containerd
Configure the runtime endpoint used by the crictl client (optional):
[root@k8s-master01 ~]# cat > /etc/crictl.yaml <<EOF
runtime-endpoint: unix:///run/containerd/containerd.sock
image-endpoint: unix:///run/containerd/containerd.sock
timeout: 10
debug: false
EOF
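With the endpoint configured, crictl should be able to talk to containerd; a quick check (not in the original text):
crictl info | head
crictl ps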
5. Install the Kubernetes components (all nodes)
[root@k8s-master01 ~]# cat <<EOF | tee /etc/yum.repos.d/kubernetes.repo
[kubernetes]
name=Kubernetes
baseurl=https://mirrors.aliyun.com/kubernetes-new/core/stable/v1.33/rpm/
enabled=1
gpgcheck=1
gpgkey=https://mirrors.aliyun.com/kubernetes-new/core/stable/v1.33/rpm/repodata/repomd.xml.key
EOF
Install the latest 1.33 kubeadm, kubelet and kubectl on all nodes:
[root@k8s-master01 ~]# yum install kubeadm-1.33* kubelet-1.33* kubectl-1.33* -y
Enable kubelet at boot on all nodes (the cluster has not been initialized yet, so kubelet has no configuration file and will not start; this can be ignored for now):
[root@k8s-master01 ~]# systemctl daemon-reload
[root@k8s-master01 ~]# systemctl enable --now kubelet
6. Cluster initialization
6.1 Create the kubeadm config file (Master01 node)
[root@k8s-master01 ~]# vim kubeadm-config.yaml
[root@k8s-master01 ~]# cat kubeadm-config.yaml
apiVersion: kubeadm.k8s.io/v1beta4
bootstrapTokens:
- groups:
  - system:bootstrappers:kubeadm:default-node-token
  token: 7t2weq.bjbawausm0jaxury
  ttl: 24h0m0s
  usages:
  - signing
  - authentication
kind: InitConfiguration
localAPIEndpoint:
  advertiseAddress: 192.168.200.61
  bindPort: 6443
nodeRegistration:
  criSocket: unix:///var/run/containerd/containerd.sock
  imagePullPolicy: IfNotPresent
  imagePullSerial: true
  name: k8s-master01
  taints:
  - effect: NoSchedule
    key: node-role.kubernetes.io/control-plane
timeouts:
  controlPlaneComponentHealthCheck: 4m0s
  discovery: 5m0s
  etcdAPICall: 2m0s
  kubeletHealthCheck: 4m0s
  kubernetesAPICall: 1m0s
  tlsBootstrap: 5m0s
  upgradeManifests: 5m0s
---
apiServer:
  certSANs:
  - 192.168.200.100
apiVersion: kubeadm.k8s.io/v1beta4
caCertificateValidityPeriod: 876000h0m0s
certificateValidityPeriod: 876000h0m0s
certificatesDir: /etc/kubernetes/pki
clusterName: kubernetes
controlPlaneEndpoint: 192.168.200.100:16443
controllerManager: {}
dns: {}
encryptionAlgorithm: RSA-2048
etcd:
  local:
    dataDir: /var/lib/etcd
imageRepository: registry.cn-hangzhou.aliyuncs.com/google_containers
kind: ClusterConfiguration
kubernetesVersion: v1.33.5
networking:
  dnsDomain: cluster.local
  podSubnet: 172.16.0.0/16
  serviceSubnet: 10.96.0.0/16
proxy: {}
scheduler: {}
# Convert the config to the latest kubeadm format:
[root@k8s-master01 ~]# kubeadm config migrate --old-config kubeadm-config.yaml --new-config new.yaml
# Adjust the timeouts if needed
[root@k8s-master01 ~]# vim new.yaml
[root@k8s-master01 ~]# sed -n "22,23p" new.yaml
timeouts:
  controlPlaneComponentHealthCheck: 4m0s
6.2 Copy new.yaml to the other master nodes (Master01 node)
[root@k8s-master01 ~]# for i in k8s-master02 k8s-master03; do scp new.yaml $i:/root/; done
6.3 Pre-pull the images (all Master nodes; the other nodes do not need any configuration changes, not even the IP address)
[root@k8s-master01 ~]# kubeadm config images pull --config /root/new.yaml
6.4 Initialize the cluster (Master01 node)
Initialization generates the certificates and configuration files under /etc/kubernetes; afterwards the other Master nodes simply join Master01.
[root@k8s-master01 ~]# kubeadm init --config /root/new.yaml --upload-certs
...
Your Kubernetes control-plane has initialized successfully!

To start using your cluster, you need to run the following as a regular user:

  mkdir -p $HOME/.kube
  sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
  sudo chown $(id -u):$(id -g) $HOME/.kube/config

Alternatively, if you are the root user, you can run:

  export KUBECONFIG=/etc/kubernetes/admin.conf

You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
  https://kubernetes.io/docs/concepts/cluster-administration/addons/

You can now join any number of control-plane nodes running the following command on each as root:

  kubeadm join 192.168.200.100:16443 --token 7t2weq.bjbawausm0jaxury \
	--discovery-token-ca-cert-hash sha256:323b80f1fc4d058e265b1f1af904f5dea2b0931f9e82aae3e3879231f35a498d \
	--control-plane --certificate-key 981bb3fde1edb1f6e961f78343a033a1aeaf1da98e4b28b7804e1a8ca159dd87

Please note that the certificate-key gives access to cluster sensitive data, keep it secret!
As a safeguard, uploaded-certs will be deleted in two hours; If necessary, you can use
"kubeadm init phase upload-certs --upload-certs" to reload certs afterward.

Then you can join any number of worker nodes by running the following on each as root:

kubeadm join 192.168.200.100:16443 --token 7t2weq.bjbawausm0jaxury \
	--discovery-token-ca-cert-hash sha256:323b80f1fc4d058e265b1f1af904f5dea2b0931f9e82aae3e3879231f35a498d
6.5 Configure the environment variable used to access the cluster (Master01 node)
[root@k8s-master01 ~]# cat <<EOF >> /root/.bashrc
export KUBECONFIG=/etc/kubernetes/admin.conf
EOF
[root@k8s-master01 ~]# source /root/.bashrc
# NotReady at this point is expected and not a problem
[root@k8s-master01 ~]# kubectl get node
NAME STATUS ROLES AGE VERSION
k8s-master01 NotReady control-plane 106s v1.33.5
If other nodes (including machines outside the cluster) also need to run kubectl against this cluster, copy admin.conf to them and use it there.
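For example, a minimal sketch (the target host is hypothetical) for giving another machine kubectl access:
# Run on master01; k8s-node01 stands in for any machine that needs kubectl
ssh k8s-node01 mkdir -p /root/.kube
scp /etc/kubernetes/admin.conf k8s-node01:/root/.kube/config
# Then, on that machine:
kubectl get node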
6.6 If initialization fails, reset with the following command and initialize again (do not run it if nothing failed):
kubeadm reset -f; ipvsadm --clear; rm -rf ~/.kube
If initialization keeps failing after several attempts, inspect the system log: /var/log/messages on CentOS/Rocky Linux, /var/log/syslog on Ubuntu:
tail -f /var/log/messages | grep -v "not found"
- Common causes of failure:
- The Containerd configuration was modified incorrectly; re-check it against the "Generate the Containerd configuration file" section
- A new.yaml problem, for example forgetting to change port 16443 to 6443 for a non-HA cluster
- A new.yaml problem where the three network segments overlap, causing IP address conflicts
- The VIP is unreachable, so initialization cannot succeed; in that case the messages log contains VIP timeout errors
6.7 Highly available Masters (run the join command on master02 and master03)
To add the other Master nodes to the cluster, simply run the join command below.
Note: never run it on master01 again, and do not copy the command from this document verbatim; use the command that your own master01 initialization just printed.
[root@k8s-master02 ~]# kubeadm join 192.168.200.100:16443 --token 7t2weq.bjbawausm0jaxury \
--discovery-token-ca-cert-hash sha256:323b80f1fc4d058e265b1f1af904f5dea2b0931f9e82aae3e3879231f35a498d \
--control-plane --certificate-key 981bb3fde1edb1f6e961f78343a033a1aeaf1da98e4b28b7804e1a8ca159dd87
Check the current state (NotReady here is not a problem):
[root@k8s-master01 ~]# kubectl get node
NAME STATUS ROLES AGE VERSION
k8s-master01 NotReady control-plane 7m17s v1.33.5
k8s-master02 NotReady control-plane 78s v1.33.5
k8s-master03 NotReady control-plane 87s v1.33.5
6.8 Worker node configuration (run the join command on node01 and node02)
Worker nodes run the business workloads. In production, do not schedule anything other than system components on Master nodes; in test environments Master nodes may run Pods to save resources.
[root@k8s-node01 ~]# kubeadm join 192.168.200.100:16443 --token 7t2weq.bjbawausm0jaxury \
--discovery-token-ca-cert-hash sha256:323b80f1fc4d058e265b1f1af904f5dea2b0931f9e82aae3e3879231f35a498d
After all nodes have joined, check the cluster state (NotReady is not a problem):
[root@k8s-master01 ~]# kubectl get node
NAME STATUS ROLES AGE VERSION
k8s-master01 NotReady control-plane 10m v1.33.5
k8s-master02 NotReady control-plane 4m16s v1.33.5
k8s-master03 NotReady control-plane 4m25s v1.33.5
k8s-node01 NotReady <none> 71s v1.33.5
k8s-node02 NotReady <none> 57s v1.33.5
7. Installing Calico
7.1 Stop NetworkManager from managing Calico's network interfaces to avoid conflicts (all nodes)
[root@k8s-master01 ~]# cat >>/etc/NetworkManager/conf.d/calico.conf<<EOF
[keyfile]
unmanaged-devices=interface-name:cali*;interface-name:tunl*;interface-name:vxlan.calico;interface-name:vxlan-v6.calico;interface-name:wireguard.cali;interface-name:wg-v6.cali
EOF
[root@k8s-master01 ~]# systemctl daemon-reload
[root@k8s-master01 ~]# systemctl restart NetworkManager
7.2 Install Calico (master01 only; the .x does not need to be changed)
[root@k8s-master01 ~]# cd /root/;git clone https://gitee.com/dukuan/k8s-ha-install.git
[root@k8s-master01 ~]# cd /root/k8s-ha-install && git checkout manual-installation-v1.33.x && cd calico/
Look up the Pod network segment:
[root@k8s-master01 calico]# POD_SUBNET=`cat /etc/kubernetes/manifests/kube-controller-manager.yaml | grep cluster-cidr= | awk -F= '{print $NF}'`
Substitute it into the Calico manifest and install:
[root@k8s-master01 calico]# sed -i "s#POD_CIDR#${POD_SUBNET}#g" calico.yaml
[root@k8s-master01 calico]# kubectl apply -f calico.yaml
All nodes now become Ready:
[root@k8s-master01 calico]# kubectl get node
NAME STATUS ROLES AGE VERSION
k8s-master01 Ready control-plane 35m v1.33.5
k8s-master02 Ready control-plane 29m v1.33.5
k8s-master03 Ready control-plane 29m v1.33.5
k8s-node01 Ready <none> 26m v1.33.5
k8s-node02 Ready <none> 26m v1.33.5
Check the container (Pod) status:
[root@k8s-master01 calico]# kubectl get po -n kube-system
NAME READY STATUS RESTARTS AGE
calico-kube-controllers-8678987965-4j5bp 1/1 Running 0 16m
calico-node-92bnb 1/1 Running 0 16m
calico-node-9gqpm 1/1 Running 0 16m
calico-node-gdz59 1/1 Running 0 16m
calico-node-hrfkr 1/1 Running 0 16m
calico-node-tdgh8 1/1 Running 0 16m
coredns-746c97786-gz7hp 1/1 Running 0 45m
coredns-746c97786-sf7mw 1/1 Running 0 45m
etcd-k8s-master01 1/1 Running 0 45m
etcd-k8s-master02 1/1 Running 0 39m
etcd-k8s-master03 1/1 Running 0 39m
....
8. Metrics deployment (master01 node)
In recent Kubernetes versions, system resource metrics are collected by metrics-server, which reports memory, disk, CPU and network usage for nodes and Pods.
(Copy front-proxy-ca.crt from Master01 to all worker nodes)
[root@k8s-master01 calico]# scp /etc/kubernetes/pki/front-proxy-ca.crt k8s-node01:/etc/kubernetes/pki/front-proxy-ca.crt
[root@k8s-master01 calico]# scp /etc/kubernetes/pki/front-proxy-ca.crt k8s-node02:/etc/kubernetes/pki/front-proxy-ca.crt
Install metrics-server:
[root@k8s-master01 calico]# cd /root/k8s-ha-install/kubeadm-metrics-server
[root@k8s-master01 kubeadm-metrics-server]# kubectl create -f comp.yaml
Check the status:
[root@k8s-master01 kubeadm-metrics-server]# kubectl get po -n kube-system -l k8s-app=metrics-server
NAME READY STATUS RESTARTS AGE
metrics-server-7d9d8df576-zzq9j 1/1 Running 0 57s
Once the Pod is 1/1 Running, check node and Pod resource usage:
[root@k8s-master01 kubeadm-metrics-server]# kubectl top node
NAME CPU(cores) CPU(%) MEMORY(bytes) MEMORY(%)
k8s-master01 727m 18% 977Mi 27%
k8s-master02 636m 15% 945Mi 26%
k8s-master03 626m 15% 916Mi 26%
k8s-node01 300m 7% 550Mi 15%
k8s-node02 331m 8% 433Mi 12%
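Pod-level usage can be checked the same way, for example:
kubectl top po -A
kubectl top po -n kube-system -l k8s-app=metrics-server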
9. Dashboard deployment (master01 node)
9.1 Installation
The Dashboard displays the various resources in the cluster; it can also be used to view Pod logs in real time and run commands inside containers. Install it as follows:
[root@k8s-master01 kubeadm-metrics-server]# cd /root/k8s-ha-install/dashboard/
[root@k8s-master01 dashboard]# kubectl create -f .
9.2 Log in to the dashboard
Change the dashboard Service type to NodePort:
[root@k8s-master01 dashboard]# kubectl edit svc kubernetes-dashboard -n kubernetes-dashboard
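If you prefer a non-interactive change over kubectl edit, the same result can be achieved with a patch (an equivalent alternative, not the method used above):
kubectl patch svc kubernetes-dashboard -n kubernetes-dashboard -p '{"spec":{"type":"NodePort"}}'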
Check the port number:
[root@k8s-master01 dashboard]# kubectl get svc -n kubernetes-dashboard
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
dashboard-metrics-scraper ClusterIP 10.96.128.108 <none> 8000/TCP 9m40s
kubernetes-dashboard NodePort 10.96.119.44 <none> 443:32506/TCP 9m42s
Using your own port number, the dashboard can be reached via the IP of any host running kube-proxy plus that port, e.g. https://192.168.200.61:32506 (substitute your own IP address and port). Choose the token login method.
Create a login token:
[root@k8s-master01 dashboard]# kubectl create token admin-user -n kube-system
eyJhbGciOiJSUzI1NiIsImtpZCI6IjFsVVlxQWhNZ2RWVlRXRWNLX2VjZmNJZlhUbDNMazM0bzR3bWNMcmhoNkEifQ.eyJhdWQiOlsiaHR0cHM6Ly9rdWJlcm5ldGVzLmRlZmF1bHQuc3ZjLmNsdXN0ZXIubG9jYWwiXSwiZXhwIjoxNzYwMjczMTYxLCJpYXQiOjE3NjAyNjk1NjEsImlzcyI6Imh0dHBzOi8va3ViZXJuZXRlcy5kZWZhdWx0LnN2Yy5jbHVzdGVyLmxvY2FsIiwianRpIjoiMmVmYjhjZWYtMzg2Ni00ZGJhLWEzM2MtNGY4OWE2Mjk2MGFmIiwia3ViZXJuZXRlcy5pbyI6eyJuYW1lc3BhY2UiOiJrdWJlLXN5c3RlbSIsInNlcnZpY2VhY2NvdW50Ijp7Im5hbWUiOiJhZG1pbi11c2VyIiwidWlkIjoiZGY2ZWIwNTMtYzA3Ni00MDFjLWE0N2MtNjI5MTZiNjNkOTgyIn19LCJuYmYiOjE3NjAyNjk1NjEsInN1YiI6InN5c3RlbTpzZXJ2aWNlYWNjb3VudDprdWJlLXN5c3RlbTphZG1pbi11c2VyIn0.byzK70mYJaoCJL3sh1sxaVUi88Q24MPWkEe4PQ2yHIKRbuYPJh8PHkyXmmdRL6VVJd7k_927P5VJp_2e9ScMOcyqADSu44CkVwHGBI9C66hvJpTnAm4XwUmTrotc-5lDebTrbjLgUe7eD54CpIbng7FM0eg98lcyv4o-6Zto-cjMG_92s_oCC1W9DMVvPctd8_q3wZmY2v6hx8vFd95wbRrr4JxTtZlPoKAyisbUSATw2MUG8at82QpZzNoXIJGQf0DXEJxOxbU_DCJ6xemB8urgfHWT4L0tu1v35nL6_uaRXEKCfxQxrLzstfl_TwQzN09AE66-kNAv6VUUzDqP3Q
Paste the token into the token field and click Sign in to access the Dashboard.
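kubectl create token issues a short-lived token by default; if it expires, simply create a new one, optionally with a longer lifetime (the --duration flag is supported by recent kubectl versions):
kubectl create token admin-user -n kube-system --duration 24h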
10. Required configuration changes (master01 node)
Switch kube-proxy to IPVS mode. The IPVS setting was left commented out during cluster initialization, so change it manually:
[root@k8s-master01 dashboard]# kubectl edit cm kube-proxy -n kube-system
mode: ipvs
Roll the kube-proxy Pods:
[root@k8s-master01 dashboard]# kubectl patch daemonset kube-proxy -p "{\"spec\":{\"template\":{\"metadata\":{\"annotations\":{\"date\":\"`date +'%s'`\"}}}}}" -n kube-system
Verify the kube-proxy mode:
[root@k8s-master01 dashboard]# curl 127.0.0.1:10249/proxyMode
ipvs
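Once kube-proxy runs in IPVS mode, the virtual servers it programs can also be inspected directly with the ipvsadm tool installed in section 2.4.1, for example:
ipvsadm -ln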
11. Cluster maintenance
11.1 Taking a node offline
11.1.1 Offline procedure
If a node needs to be taken offline, the following steps remove it gracefully:
- 1. Add a taint to stop new Pods from being scheduled onto the node
- 2. Check whether the node runs any important services
- migrate important services to other nodes
- 3. Confirm whether the node is an ingress entry point
- port traffic
- 4. Use drain to evict the remaining Pods
- 5. Re-check the other services on the node
- base components and so on
- 6. Check for abnormal Pods
- any Pending Pods
- any Pods not in the Running state
- 7. Delete the node with kubectl delete
- 8. Decommission the node
- kubeadm reset -f
- systemctl disable --now kubelet
11.1.2 Performing the offline steps
Assume k8s-node02 is the node to be taken offline. First taint it so no new Pods are scheduled onto it:
[root@k8s-master01 ~]# kubectl taint node k8s-node02 offline=true:NoSchedule
Check for important services:
[root@k8s-master01 ~]# kubectl get po -A -owide | grep k8s-node02
kube-system calico-node-gdz59 1/1 Running 2 (61m ago) 6d17h 192.168.200.65 k8s-node02 <none> <none>
kube-system kube-proxy-d6pp4 1/1 Running 2 (61m ago) 6d16h 192.168.200.65 k8s-node02 <none> <none>
kube-system metrics-server-7d9d8df576-zzq9j 1/1 Running 6 (59m ago) 6d17h 172.16.58.199 k8s-node02 <none> <none>
kubernetes-dashboard dashboard-metrics-scraper-69b4796d9b-dmqd9 1/1 Running 2 (61m ago) 6d16h 172.16.58.201 k8s-node02 <none> <none>
kubernetes-dashboard kubernetes-dashboard-778584b9dd-kg2hc 1/1 Running 3 (61m ago) 6d16h 172.16.58.200 k8s-node02 <none> <none>
Assume dashboard-metrics-scraper, kubernetes-dashboard and metrics-server are important services; use rollout restart to reschedule them (if a Deployment has many replicas, you can also delete individual Pods instead of recreating everything at once):
[root@k8s-master01 ~]# kubectl rollout restart deploy dashboard-metrics-scraper -n kubernetes-dashboard
[root@k8s-master01 ~]# kubectl rollout restart deploy kubernetes-dashboard -n kubernetes-dashboard
[root@k8s-master01 ~]# kubectl rollout restart deploy metrics-server -n kube-system
# Check the Pods again:
[root@k8s-master01 ~]# kubectl get po -A -owide | grep k8s-node02
kube-system calico-node-gdz59 1/1 Running 2 (72m ago) 6d17h 192.168.200.65 k8s-node02 <none> <none>
kube-system kube-proxy-d6pp4 1/1 Running 2 (72m ago) 6d16h 192.168.200.65 k8s-node02 <none> <none>
Run the other checks as needed:
# Drain the node
[root@k8s-master01 ~]# kubectl drain k8s-node02 --ignore-daemonsets
# Check for Pending Pods (empty output means everything is fine):
[root@k8s-master01 ~]# kubectl get po -A | grep -i pending
# Check for Pods that are not Running (empty output means everything is fine):
[root@k8s-master01 ~]# kubectl get po -A | grep -Ev '1/1|2/2|3/3|NAMESPACE'
# Now delete the node:
[root@k8s-master01 ~]# kubectl delete node k8s-node02
# The node is now out of the cluster; handle the machine itself as needed:
[root@k8s-node02 ~]# kubeadm reset -f
[root@k8s-node02 ~]# systemctl disable --now kubelet
# Check the current nodes:
[root@k8s-master01 ~]# kubectl get node
NAME STATUS ROLES AGE VERSION
k8s-master01 Ready control-plane 6d18h v1.33.5
k8s-master02 Ready control-plane 6d18h v1.33.5
k8s-master03 Ready control-plane 6d18h v1.33.5
k8s-node01 Ready <none> 6d18h v1.33.5
11.2 Adding a node
11.2.1 Basic environment configuration; set the new node's hostname (see section 2.3)
11.2.2 Kernel configuration (see section 2.4)
11.2.3 Install Containerd (see chapter 4)
11.2.4 Install the Kubernetes components (see chapter 5)
11.2.5 Configure the repository on the new node (mind the version number):
# Copy the front-proxy certificate from a Master node:
[root@k8s-node02 ~]# mkdir -p /etc/kubernetes/pki/
[root@k8s-master01 ~]# scp /etc/kubernetes/pki/front-proxy-ca.crt 192.168.200.65:/etc/kubernetes/pki/
# Generate a new token on a Master node:
[root@k8s-master01 ~]# kubeadm token create --print-join-command
# Run the join command on the new node:
[root@k8s-node02 ~]# kubeadm join 192.168.200.100:16443 --token gtuckg.ahx37p3zq54jrgy3 --discovery-token-ca-cert-hash sha256:323b80f1fc4d058e265b1f1af904f5dea2b0931f9e82aae3e3879231f35a498d
# Check the node status from a Master node:
[root@k8s-master01 ~]# kubectl get node
NAME STATUS ROLES AGE VERSION
k8s-master01 Ready control-plane 6d18h v1.33.5
k8s-master02 Ready control-plane 6d18h v1.33.5
k8s-master03 Ready control-plane 6d18h v1.33.5
k8s-node01 Ready <none> 6d18h v1.33.5
k8s-node02 Ready <none> 27s v1.33.5
# Check from a Master node that the Pods on the new node are healthy:
[root@k8s-master01 ~]# kubectl get po -A -owide | grep k8s-node02
kube-system calico-node-lw4s5 1/1 Running 0 2m36s 192.168.200.65 k8s-node02 <none> <none>
kube-system kube-proxy-kd6sf 1/1 Running 0 2m36s 192.168.200.65 k8s-node02 <none> <none>
11.3 Cluster upgrade
11.3.1 Upgrade flow and caveats
Official documentation:
- Upgrade flow:
- upgrade the Master nodes
- put the worker nodes into maintenance
- upgrade the worker nodes
- Caveats:
- kubeadm cannot skip minor versions when upgrading
- back up first if possible
- keep swap disabled
# Assume the target version is 1.34; first configure the 1.34 repository (all nodes):
[root@k8s-master01 ~]# cat <<EOF | tee /etc/yum.repos.d/kubernetes.repo
[kubernetes]
name=Kubernetes
baseurl=https://mirrors.aliyun.com/kubernetes-new/core/stable/v1.34/rpm/
enabled=1
gpgcheck=1
gpgkey=https://mirrors.aliyun.com/kubernetes-new/core/stable/v1.34/rpm/repodata/repomd.xml.key
EOF
11.3.2 Upgrade the master nodes:
# The master nodes must be upgraded one at a time; start with Master01:
[root@k8s-master01 ~]# yum install -y kubeadm-'1.34*' kubelet-'1.34*' kubectl-'1.34*' --disableexcludes=kubernetes
# Check the versions:
[root@k8s-master01 ~]# kubeadm version
kubeadm version: &version.Info{Major:"1", Minor:"34", EmulationMajor:"", EmulationMinor:"", MinCompatibilityMajor:"", MinCompatibilityMinor:"", GitVersion:"v1.34.1", GitCommit:"93248f9ae092f571eb870b7664c534bfc7d00f03", GitTreeState:"clean", BuildDate:"2025-09-09T19:43:15Z", GoVersion:"go1.24.6", Compiler:"gc", Platform:"linux/amd64"}
[root@k8s-master01 ~]# kubectl version
Client Version: v1.34.1
Kustomize Version: v5.7.1
Server Version: v1.33.5
# Check the upgrade plan:
[root@k8s-master01 ~]# kubeadm upgrade plan
....
Upgrade to the latest stable version:
COMPONENT NODE CURRENT TARGET
kube-apiserver k8s-master01 v1.33.5 v1.34.1
kube-apiserver k8s-master02 v1.33.5 v1.34.1
kube-apiserver k8s-master03 v1.33.5 v1.34.1
kube-controller-manager k8s-master01 v1.33.5 v1.34.1
kube-controller-manager k8s-master02 v1.33.5 v1.34.1
kube-controller-manager k8s-master03 v1.33.5 v1.34.1
kube-scheduler k8s-master01 v1.33.5 v1.34.1
kube-scheduler k8s-master02 v1.33.5 v1.34.1
kube-scheduler k8s-master03 v1.33.5 v1.34.1
kube-proxy 1.33.5 v1.34.1
CoreDNS v1.12.0 v1.12.1
etcd k8s-master01 3.5.21-0 3.6.4-0
etcd k8s-master02 3.5.21-0 3.6.4-0
etcd k8s-master03 3.5.21-0 3.6.4-0

You can now apply the upgrade by executing the following command:

	kubeadm upgrade apply v1.34.1

_____________________________________________________________________
# Run the upgrade:
[root@k8s-master01 ~]# kubeadm upgrade apply v1.34.1
# Restart kubelet:
[root@k8s-master01 ~]# systemctl daemon-reload
[root@k8s-master01 ~]# systemctl restart kubelet
# Confirm the versions:
[root@k8s-master01 ~]# kubectl get node
NAME STATUS ROLES AGE VERSION
k8s-master01 Ready control-plane 6d19h v1.34.1
k8s-master02 Ready control-plane 6d18h v1.33.5
k8s-master03 Ready control-plane 6d18h v1.33.5
k8s-node01 Ready <none> 6d18h v1.33.5
k8s-node02 Ready <none> 30m v1.33.5
[root@k8s-master01 ~]# grep "image:" /etc/kubernetes/manifests/*.yaml
/etc/kubernetes/manifests/etcd.yaml: image: registry.cn-hangzhou.aliyuncs.com/google_containers/etcd:3.6.4-0
/etc/kubernetes/manifests/kube-apiserver.yaml: image: registry.cn-hangzhou.aliyuncs.com/google_containers/kube-apiserver:v1.34.1
/etc/kubernetes/manifests/kube-controller-manager.yaml: image: registry.cn-hangzhou.aliyuncs.com/google_containers/kube-controller-manager:v1.34.1
/etc/kubernetes/manifests/kube-scheduler.yaml: image: registry.cn-hangzhou.aliyuncs.com/google_containers/kube-scheduler:v1.34.1
11.3.3 Upgrade the other master nodes
# Next, upgrade the remaining master nodes. Install the packages first:
[root@k8s-master02 ~]# yum install -y kubeadm-'1.34*' kubelet-'1.34*' kubectl-'1.34*' --disableexcludes=kubernetes
# Upgrade the node:
[root@k8s-master02 ~]# kubeadm upgrade node
# Restart kubelet:
[root@k8s-master02 ~]# systemctl daemon-reload
[root@k8s-master02 ~]# systemctl restart kubelet
# Check the status:
[root@k8s-master01 ~]# kubectl get node
NAME STATUS ROLES AGE VERSION
k8s-master01 Ready control-plane 6d19h v1.34.1
k8s-master02 Ready control-plane 6d19h v1.34.1
k8s-master03 Ready control-plane 6d19h v1.34.1
k8s-node01 Ready <none> 6d19h v1.33.5
k8s-node02 Ready <none> 40m v1.33.5
11.3.4 Upgrade the worker nodes
Upgrading a worker node is straightforward: install the new kubelet and restart it. Put the node into maintenance first, using roughly the same steps as taking it offline (in a test environment you can restart directly without the maintenance step).
Assume k8s-node01 is the node to be put into maintenance. First taint it so no new Pods are scheduled onto it:
[root@k8s-master01 ~]# kubectl taint node k8s-node01 upgrade=true:NoSchedule
Check for important services:
[root@k8s-master01 ~]# kubectl get po -A -owide | grep k8s-node01
kube-system calico-kube-controllers-8678987965-4j5bp 1/1 Running 2 (7m41s ago) 6d18h 172.16.85.200 k8s-node01 <none> <none>
kube-system calico-node-tdgh8 1/1 Running 1 (70m ago) 6d18h 192.168.200.64 k8s-node01 <none> <none>
kube-system kube-proxy-278gx 1/1 Running 0 4m17s 192.168.200.64 k8s-node01 <none> <none>
kube-system metrics-server-74767fc66c-lv5w7 1/1 Running 0 63m 172.16.85.208 k8s-node01 <none> <none>
kubernetes-dashboard dashboard-metrics-scraper-5b47ccc9c7-45lds 1/1 Running 0 64m 172.16.85.204 k8s-node01 <none> <none>
kubernetes-dashboard kubernetes-dashboard-65fd974fd6-gfgpq 1/1 Running 0 62m 172.16.85.209 k8s-node01 <none> <none>
Assume dashboard-metrics-scraper, kubernetes-dashboard and metrics-server are important services; use rollout restart to reschedule them (if a Deployment has many replicas, you can also delete individual Pods instead of recreating everything at once):
[root@k8s-master01 ~]# kubectl rollout restart deploy dashboard-metrics-scraper -n kubernetes-dashboard
[root@k8s-master01 ~]# kubectl rollout restart deploy kubernetes-dashboard -n kubernetes-dashboard
[root@k8s-master01 ~]# kubectl rollout restart deploy metrics-server -n kube-system
# Check the Pods again:
[root@k8s-master01 ~]# kubectl get po -A -owide | grep k8s-node01
kube-system calico-kube-controllers-8678987965-4j5bp 1/1 Running 2 (10m ago) 6d18h 172.16.85.200 k8s-node01 <none> <none>
kube-system calico-node-tdgh8 1/1 Running 1 (73m ago) 6d18h 192.168.200.64 k8s-node01 <none> <none>
kube-system kube-proxy-278gx 1/1 Running 0 7m22s 192.168.200.64 k8s-node01 <none> <none>
Run the other checks as needed:
# Drain the node
[root@k8s-master01 ~]# kubectl drain k8s-node01 --ignore-daemonsets
# Check for Pending Pods (empty output means everything is fine):
[root@k8s-master01 ~]# kubectl get po -A | grep -i pending
# Check for Pods that are not Running (empty output means everything is fine):
[root@k8s-master01 ~]# kubectl get po -A | grep -Ev '1/1|2/2|3/3|NAMESPACE'
# Upgrade the worker node
[root@k8s-node01 ~]# yum install -y kubeadm-'1.34*' kubelet-'1.34*' kubectl-'1.34*' --disableexcludes=kubernetes
# Restart kubelet:
[root@k8s-node01 ~]# systemctl daemon-reload
[root@k8s-node01 ~]# systemctl restart kubelet
# After the upgrade, uncordon the node and remove the taint:
[root@k8s-master01 ~]# kubectl uncordon k8s-node01
[root@k8s-master01 ~]# kubectl taint node k8s-node01 upgrade-
# Check the status:
[root@k8s-master01 ~]# kubectl get node
NAME STATUS ROLES AGE VERSION
k8s-master01 Ready control-plane 6d19h v1.34.1
k8s-master02 Ready control-plane 6d19h v1.34.1
k8s-master03 Ready control-plane 6d19h v1.34.1
k8s-node01 Ready <none> 6d19h v1.34.1
k8s-node02 Ready <none> 65m v1.34.1
# Repeat the same steps on the remaining nodes
Appendix: what makes a cluster truly production-ready?
1. All nodes are healthy (every node reports Ready)
[root@k8s-master01 ~]# kubectl get node
NAME STATUS ROLES AGE VERSION
k8s-master01 Ready control-plane 6d19h v1.34.1
k8s-master02 Ready control-plane 6d19h v1.34.1
k8s-master03 Ready control-plane 6d19h v1.34.1
k8s-node01 Ready <none> 6d19h v1.34.1
k8s-node02 Ready <none> 65m v1.34.1
2. All Pods are healthy (every Pod is Running, the two numbers in READY match, and the RESTARTS count is not increasing)
[root@k8s-master01 ~]# kubectl get po -A
NAMESPACE NAME READY STATUS RESTARTS AGE
....
kube-system etcd-k8s-master01 1/1 Running 2 (23m ago) 6d17h
kube-system etcd-k8s-master02 1/1 Running 2 (23m ago) 6d17h
kube-system etcd-k8s-master03 1/1 Running 2 (23m ago) 6d17h
kube-system kube-scheduler-k8s-master01 1/1 Running 2 (23m ago) 6d17h
kube-system kube-scheduler-k8s-master02 1/1 Running 2 (23m ago) 6d17h
kube-system kube-scheduler-k8s-master03 1/1 Running 2 (23m ago) 6d17h
kube-system metrics-server-7d9d8df576-zzq9j 1/1 Running 6 (20m ago) 6d16h
....
3. The cluster network segments have no conflicts (Service network 10.96.x, node network 192.168.x, Pod network 172.16.x)
[root@k8s-master01 ~]# kubectl get svc -A
NAMESPACE NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
default kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 6d17h
kube-system kube-dns ClusterIP 10.96.0.10 <none> 53/UDP,53/TCP,9153/TCP 6d17h
kube-system metrics-server ClusterIP 10.96.87.203 <none> 443/TCP 6d16h
kubernetes-dashboard dashboard-metrics-scraper ClusterIP 10.96.128.108 <none> 8000/TCP 6d16h
kubernetes-dashboard kubernetes-dashboard NodePort 10.96.119.44 <none> 443:32506/TCP 6d16h
[root@k8s-master01 ~]# kubectl get node -owide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
k8s-master01 Ready control-plane 6d17h v1.33.5 192.168.200.61 <none> Rocky Linux 9.6 (Blue Onyx) 5.14.0-570.49.1.el9_6.x86_64 containerd://1.7.28
k8s-master02 Ready control-plane 6d17h v1.33.5 192.168.200.62 <none> Rocky Linux 9.6 (Blue Onyx) 5.14.0-570.49.1.el9_6.x86_64 containerd://1.7.28
k8s-master03 Ready control-plane 6d17h v1.33.5 192.168.200.63 <none> Rocky Linux 9.6 (Blue Onyx) 5.14.0-570.49.1.el9_6.x86_64 containerd://1.7.28
k8s-node01 Ready <none> 6d17h v1.33.5 192.168.200.64 <none> Rocky Linux 9.6 (Blue Onyx) 5.14.0-570.49.1.el9_6.x86_64 containerd://1.7.28
k8s-node02 Ready <none> 6d17h v1.33.5 192.168.200.65 <none> Rocky Linux 9.6 (Blue Onyx) 5.14.0-570.49.1.el9_6.x86_64 containerd://1.7.28
[root@k8s-master01 ~]# kubectl get po -A -owide | grep coredns
kube-system coredns-746c97786-gz7hp 1/1 Running 2 (30m ago) 6d17h 172.16.85.201 k8s-node01 <none> <none>
kube-system coredns-746c97786-sf7mw 1/1 Running 2 (30m ago) 6d17h 172.16.85.200 k8s-node01 <none> <none>
4. Resources can be created normally
kubectl create deploy cluster-test --image=registry.cn-beijing.aliyuncs.com/dotbalo/debug-tools -- sleep 3600
5. Pods must be able to resolve Services (within the same namespace and across namespaces)
a) nslookup kubernetes
b) nslookup kube-dns.kube-system
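These lookups have to be executed from inside a Pod; for example, using the cluster-test Deployment created above (assuming the debug-tools image ships nslookup):
kubectl exec -it deploy/cluster-test -- nslookup kubernetes
kubectl exec -it deploy/cluster-test -- nslookup kube-dns.kube-system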
6. Every node must be able to reach the kubernetes Service on port 443 and the kube-dns Service on port 53
[root@k8s-master02 ~]# curl https://10.96.0.1:443 -k
{"kind": "Status","apiVersion": "v1","metadata": {},"status": "Failure","message": "forbidden: User \"system:anonymous\" cannot get path \"/\"","reason": "Forbidden","details": {},"code": 403
}[root@k8s-node02 ~]# curl http://10.96.0.10:53 -k
curl: (52) Empty reply from server
7. Pod-to-Pod communication must work (within the same namespace and across namespaces)
8. Pod-to-Pod communication must work (on the same node and across nodes)
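A quick way to check this (a sketch; assuming the debug-tools image includes ping) is to ping another Pod's IP from the cluster-test Pod, picking target Pods on the same node and on a different node:
# Look up Pod IPs and the nodes they run on
kubectl get po -A -owide
# Ping a chosen Pod IP (replace 172.16.85.201 with a real Pod IP from the output above)
kubectl exec -it deploy/cluster-test -- ping -c 2 172.16.85.201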
Source of this post: https://edu.51cto.com/lecturer/11062970.html