23. Enterprise-Grade Kubernetes Architecture Design and Implementation


Contents
  • 23. Enterprise-Grade Kubernetes Architecture Design and Implementation
    • 1. Enterprise K8s architecture design
      • 1.1 Kubernetes cluster architecture: high availability
      • 1.2 Production resource sizing
      • 1.3 Production disk layout
      • 1.4 Cluster network segmentation
    • 2. Base environment setup
      • 2.1 Cluster plan (lab/test environment)
      • 2.2 Disk mounting
      • 2.3 Base environment configuration
        • 2.3.1 Configure hosts (all nodes)
        • 2.3.2 Configure the Aliyun mirror repos (all nodes)
        • 2.3.3 Disable firewalld, SELinux, dnsmasq, and swap; enable rsyslog (all nodes)
        • 2.3.4 Time synchronization (all nodes)
        • 2.3.5 Configure limits (all nodes)
        • 2.3.6 Update the system (all nodes)
        • 2.3.7 Configure passwordless SSH (Master01 node)
      • 2.4 Kernel configuration (all nodes)
        • 2.4.1 Install ipvsadm
        • 2.4.2 Configure IPVS modules
        • 2.4.3 Kernel tuning
        • 2.4.4 Reboot
    • 3. High-availability component installation (Master nodes)
      • 3.1 Install HAProxy and Keepalived
      • 3.2 Configure HAProxy
      • 3.3 Configure Keepalived
      • 3.4 Configure the Keepalived health-check script
      • 3.5 Verify the Keepalived VIP
    • 4. Runtime installation (all nodes)
      • 4.1 Configure the package repo
      • 4.2 Install docker-ce
      • 4.3 Configure the kernel modules containerd needs
      • 4.4 Load the modules
      • 4.5 Configure the sysctls containerd needs
      • 4.6 Generate the containerd configuration file
    • 5. Install the Kubernetes components (all nodes)
    • 6. Cluster initialization
      • 6.1 Create the kubeadm config file (Master01 node)
      • 6.2 Copy new.yaml to the other master nodes (Master01 node)
      • 6.3 Pre-pull the images (all Master nodes; the other masters need no config changes, not even IP addresses)
      • 6.4 Initialize the cluster (Master01 node)
      • 6.5 Configure environment variables for cluster access (Master01 node)
      • 6.6 If initialization fails, reset with the following command and initialize again (do not run it if nothing failed)
      • 6.7 Highly available masters (run the join command on master02 and master03)
      • 6.8 Worker node setup (run the join command on node01 and node02)
    • 7. Installing Calico
      • 7.1 Stop NetworkManager from managing Calico's interfaces to prevent conflicts (all nodes)
      • 7.2 Install Calico (run only on master01; the .x does not need to change)
    • 8. Metrics deployment (master01 node)
    • 9. Dashboard deployment (master01 node)
      • 9.1 Installation
      • 9.2 Logging in to the dashboard
    • 10. Required configuration changes (master01 node)
    • 11. Cluster maintenance
      • 11.1 Taking a node offline
        • 11.1.1 Offline procedure
        • 11.1.2 Performing the offline
      • 11.2 Adding a node
        • 11.2.1 Base environment setup; set the new node's hostname (see section 2.3)
        • 11.2.2 Kernel configuration (see section 2.4)
        • 11.2.3 Install containerd (see section 4)
        • 11.2.4 Install the Kubernetes components (see section 5)
        • 11.2.5 Configure the repo on the new node (mind the version number)
      • 11.3 Cluster upgrade
        • 11.3.1 Upgrade flow and caveats
        • 11.3.2 Upgrading the first master node
        • 11.3.3 Upgrading the other master nodes
        • 11.3.4 Upgrading the worker nodes
    • Appendix: what makes a truly production-ready cluster?

1. Enterprise K8s architecture design

1.1 Kubernetes cluster architecture: high availability

[Figure: high-availability Kubernetes cluster architecture]

1.2 Production resource sizing

| Workers | Min. workers | Worker spec | Control-plane nodes | Control-plane spec | Etcd node spec | Combined Master & Etcd |
|---|---|---|---|---|---|---|
| 0-100 | 3 | 8C32G / 16C64G | 3 | / | / | 8C32G + 128G SSD |
| 100-250 | 3 | 8C32G / 16C64G | 3 | / | / | 16C32G + 256G SSD |
| 250-500 | 3 | 8C32G / 16C64G | 3 | 16C32G+ | 8C32G + 512G SSD ×5 | / |

1.3 Production disk layout

| Node | Root partition (100G) | Etcd data disk (100G NVMe SSD) | Data disk (500G SSD) |
|---|---|---|---|
| Control-plane node | / | /var/lib/etcd | /data, /var/lib/kubelet, /var/lib/containers |
| Worker node | / | - | /data, /var/lib/kubelet, /var/lib/containers |

1.4 Cluster network segmentation

  • Node network: 192.168.181.0/24
  • Service network: 10.96.0.0/16
  • Pod network: 172.16.0.0/16
  • Reserved Service IPs:
    • CoreDNS Service IP: 10.96.0.10
    • APIServer Service IP: 10.96.0.1

2. Base environment setup

2.1 Cluster plan (lab/test environment)

| Hostname | IP | OS | Resources | System disk | Etcd disk | Data disk (can be an LVM logical volume in production) |
|---|---|---|---|---|---|---|
| k8s-master01 | 192.168.200.61 | Rocky 9.4 | 4C4G | 40G | 20G | 40G |
| k8s-master02 | 192.168.200.62 | Rocky 9.4 | 4C4G | 40G | 20G | 40G |
| k8s-master03 | 192.168.200.63 | Rocky 9.4 | 4C4G | 40G | 20G | 40G |
| k8s-node01 | 192.168.200.64 | Rocky 9.4 | 4C4G | 40G | / | 40G |
| k8s-node02 | 192.168.200.65 | Rocky 9.4 | 4C4G | 40G | / | 40G |
| VIP | 192.168.200.100 | / | / | / | / | / |

2.2 Disk mounting

[root@k8s-master01 ~]# fdisk -l|grep "Disk /dev/nvme0n"
Disk /dev/nvme0n1: 40 GiB, 42949672960 bytes, 83886080 sectors
Disk /dev/nvme0n2: 20 GiB, 21474836480 bytes, 41943040 sectors
Disk /dev/nvme0n3: 40 GiB, 42949672960 bytes, 83886080 sectors
# Create the etcd directory and the data directory
[root@k8s-master01 ~]# mkdir -p /var/lib/etcd /data
# Create the partitions
[root@k8s-master01 ~]# fdisk /dev/nvme0n2 
[root@k8s-master01 ~]# fdisk /dev/nvme0n3
# Format the partitions
[root@k8s-master01 ~]# mkfs.xfs /dev/nvme0n2p1
[root@k8s-master01 ~]# mkfs.xfs /dev/nvme0n3p1
# Look up the UUIDs
[root@k8s-master01 ~]# blkid /dev/nvme0n2p1 
/dev/nvme0n2p1: UUID="fe42cf86-59e1-4f02-9612-9942536f23ca" TYPE="xfs" PARTUUID="71ff24c2-01"
[root@k8s-master01 ~]# blkid /dev/nvme0n3p1
/dev/nvme0n3p1: UUID="f1cbe99b-71f8-48d9-822a-62fb6b608c38" TYPE="xfs" PARTUUID="9ab6883d-01"
# Configure mounting at boot
[root@k8s-master01 ~]# vim /etc/fstab 
[root@k8s-master01 ~]# tail -2 /etc/fstab 
UUID="fe42cf86-59e1-4f02-9612-9942536f23ca" /var/lib/etcd xfs defaults 0 0
UUID="f1cbe99b-71f8-48d9-822a-62fb6b608c38" /data xfs defaults 0 0
# Mount the disks
[root@k8s-master01 ~]# systemctl daemon-reload
[root@k8s-master01 ~]# mount -a
[root@k8s-master01 ~]# df -hT | grep /dev/nvme0n
/dev/nvme0n1p1      xfs       960M  330M  631M  35% /boot
/dev/nvme0n2p1      xfs        20G  175M   20G   1% /var/lib/etcd
/dev/nvme0n3p1      xfs        40G  318M   40G   1% /data
# Create the kubelet/container directories and symlink them into /var/lib
[root@k8s-master01 ~]# mkdir -p /data/kubelet /data/containers
[root@k8s-master01 ~]# ln -s /data/kubelet/ /var/lib/
[root@k8s-master01 ~]# ln -s /data/containers/ /var/lib/

2.3 Base environment configuration

2.3.1 Configure hosts (all nodes)

[root@k8s-master01 ~]# vim /etc/hosts 
[root@k8s-master01 ~]# tail -5 /etc/hosts
192.168.200.61 k8s-master01
192.168.200.62 k8s-master02
192.168.200.63 k8s-master03
192.168.200.64 k8s-node01
192.168.200.65 k8s-node02

2.3.2 Configure the Aliyun mirror repos (all nodes)

[root@k8s-master01 ~]# sed -e 's|^mirrorlist=|#mirrorlist=|g' -e 's|^#baseurl=http://dl.rockylinux.org/$contentdir|baseurl=https://mirrors.aliyun.com/rockylinux|g'  -i.bak /etc/yum.repos.d/*.repo
[root@k8s-master01 ~]# dnf makecache

2.3.3 Disable firewalld, SELinux, dnsmasq, and swap; enable rsyslog (all nodes)

# Disable the firewall
[root@k8s-master01 ~]# systemctl disable --now firewalld
[root@k8s-master01 ~]# systemctl disable --now dnsmasq
# Disable SELinux
[root@k8s-master01 ~]# setenforce 0
[root@k8s-master01 ~]# sed -i 's#SELINUX=enforcing#SELINUX=disabled#g' /etc/sysconfig/selinux
[root@k8s-master01 ~]# sed -i 's#SELINUX=enforcing#SELINUX=disabled#g' /etc/selinux/config
# Disable the swap partition
[root@k8s-master01 ~]# swapoff -a && sysctl -w vm.swappiness=0
[root@k8s-master01 ~]# sed -ri '/^[^#]*swap/s@^@#@' /etc/fstab

2.3.4 Time synchronization (all nodes)

# Install ntpdate
[root@k8s-master01 ~]# dnf install epel-release -y
[root@k8s-master01 ~]# dnf config-manager --set-enabled epel
[root@k8s-master01 ~]# dnf install ntpsec -y
# Sync the time and set the Shanghai time zone
[root@k8s-master01 ~]# ln -sf /usr/share/zoneinfo/Asia/Shanghai /etc/localtime
[root@k8s-master01 ~]# echo 'Asia/Shanghai' >/etc/timezone
[root@k8s-master01 ~]# ntpdate time2.aliyun.com
# Add a crontab entry
[root@k8s-master01 ~]# crontab -e
[root@k8s-master01 ~]# crontab -l
*/5 * * * * /usr/sbin/ntpdate time2.aliyun.com

2.3.5 Configure limits (all nodes)

[root@k8s-master01 ~]# ulimit -SHn 65535
[root@k8s-master01 ~]# vim /etc/security/limits.conf
[root@k8s-master01 ~]# tail -6 /etc/security/limits.conf
* soft nofile 65536
* hard nofile 131072
* soft nproc 65535
* hard nproc 655350
* soft memlock unlimited
* hard memlock unlimited

2.3.6 Update the system (all nodes)

[root@k8s-master01 ~]# yum update -y

2.3.7 Configure passwordless SSH (Master01 node)

# Generate a key pair
[root@k8s-master01 ~]# ssh-keygen -t rsa
# Distribute the public key
[root@k8s-master01 ~]# for i in k8s-master01 k8s-master02 k8s-master03 k8s-node01 k8s-node02;do ssh-copy-id -i .ssh/id_rsa.pub $i;done

2.4 Kernel configuration (all nodes)

2.4.1 Install ipvsadm

[root@k8s-master01 ~]# yum install ipvsadm ipset sysstat conntrack libseccomp -y

2.4.2 Configure IPVS modules

# Load the IPVS modules:
[root@k8s-master01 ~]# modprobe -- ip_vs
[root@k8s-master01 ~]# modprobe -- ip_vs_rr
[root@k8s-master01 ~]# modprobe -- ip_vs_wrr
[root@k8s-master01 ~]# modprobe -- ip_vs_sh
[root@k8s-master01 ~]# modprobe -- nf_conntrack
# Create ipvs.conf so the modules load automatically at boot
[root@k8s-master01 ~]# vim /etc/modules-load.d/ipvs.conf
[root@k8s-master01 ~]# cat /etc/modules-load.d/ipvs.conf
ip_vs
ip_vs_lc
ip_vs_wlc
ip_vs_rr
ip_vs_wrr
ip_vs_lblc
ip_vs_lblcr
ip_vs_dh
ip_vs_sh
ip_vs_fo
ip_vs_nq
ip_vs_sed
ip_vs_ftp
ip_vs_sh
nf_conntrack
ip_tables
ip_set
xt_set
ipt_set
ipt_rpfilter
ipt_REJECT
ipip
# Errors here can be ignored
[root@k8s-master01 ~]# systemctl enable --now systemd-modules-load.service

2.4.3 Kernel tuning

cat <<EOF > /etc/sysctl.d/k8s.conf
net.ipv4.ip_forward = 1
net.bridge.bridge-nf-call-iptables = 1
net.bridge.bridge-nf-call-ip6tables = 1
fs.may_detach_mounts = 1
net.ipv4.conf.all.route_localnet = 1
vm.overcommit_memory=1
vm.panic_on_oom=0
fs.inotify.max_user_watches=89100
fs.file-max=52706963
fs.nr_open=52706963
net.netfilter.nf_conntrack_max=2310720
net.ipv4.tcp_keepalive_time = 600
net.ipv4.tcp_keepalive_probes = 3
net.ipv4.tcp_keepalive_intvl = 15
net.ipv4.tcp_max_tw_buckets = 36000
net.ipv4.tcp_tw_reuse = 1
net.ipv4.tcp_max_orphans = 327680
net.ipv4.tcp_orphan_retries = 3
net.ipv4.tcp_syncookies = 1
net.ipv4.tcp_max_syn_backlog = 16384
net.ipv4.ip_conntrack_max = 65536
net.ipv4.tcp_max_syn_backlog = 16384
net.ipv4.tcp_timestamps = 0
net.core.somaxconn = 16384
EOF
# Apply the kernel settings
[root@k8s-master01 ~]# sysctl --system

2.4.4 Reboot

[root@k8s-master01 ~]# reboot
# After rebooting, check that the kernel modules loaded automatically
[root@k8s-master01 ~]# lsmod | grep --color=auto -e ip_vs -e nf_conntrack
ip_vs_ftp              12288  0
nf_nat                 65536  1 ip_vs_ftp
ip_vs_sed              12288  0
ip_vs_nq               12288  0
ip_vs_fo               12288  0
ip_vs_sh               12288  0
ip_vs_dh               12288  0
ip_vs_lblcr            12288  0
ip_vs_lblc             12288  0
ip_vs_wrr              12288  0
ip_vs_rr               12288  0
ip_vs_wlc              12288  0
ip_vs_lc               12288  0
ip_vs                 237568  25 ip_vs_wlc,ip_vs_rr,ip_vs_dh,ip_vs_lblcr,ip_vs_sh,ip_vs_fo,ip_vs_nq,ip_vs_lblc,ip_vs_wrr,ip_vs_lc,ip_vs_sed,ip_vs_ftp
nf_conntrack          229376  2 nf_nat,ip_vs
nf_defrag_ipv6         24576  2 nf_conntrack,ip_vs
nf_defrag_ipv4         12288  1 nf_conntrack
libcrc32c              12288  4 nf_conntrack,nf_nat,xfs,ip_vs

3. High-availability component installation (Master nodes)

On public clouds, replace HAProxy and Keepalived with the provider's own load balancer, such as Alibaba Cloud SLB/NLB or Tencent Cloud ELB, because most public clouds do not support keepalived.

3.1 Install HAProxy and Keepalived

[root@k8s-master01 ~]# yum install keepalived haproxy -y

3.2 Configure HAProxy

[root@k8s-master01 ~]# vim /etc/haproxy/haproxy.cfg
[root@k8s-master01 ~]# cat /etc/haproxy/haproxy.cfg
global
    maxconn  2000
    ulimit-n  16384
    log  127.0.0.1 local0 err
    stats timeout 30s

defaults
    log global
    mode  http
    option  httplog
    timeout connect 5000
    timeout client  50000
    timeout server  50000
    timeout http-request 15s
    timeout http-keep-alive 15s

frontend monitor-in
    bind *:33305
    mode http
    option httplog
    monitor-uri /monitor

frontend k8s-master
    bind 0.0.0.0:16443
    bind 127.0.0.1:16443
    mode tcp
    option tcplog
    tcp-request inspect-delay 5s
    default_backend k8s-master

backend k8s-master
    mode tcp
    option tcplog
    option tcp-check
    balance roundrobin
    default-server inter 10s downinter 5s rise 2 fall 2 slowstart 60s maxconn 250 maxqueue 256 weight 100
    server k8s-master01	192.168.200.61:6443  check
    server k8s-master02	192.168.200.62:6443  check
    server k8s-master03	192.168.200.63:6443  check
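
Besides the TCP frontend on 16443, this config exposes a plain-HTTP health endpoint on port 33305 (the monitor-in frontend). Once the service is running (it is started in section 3.4), a quick sanity check from any master node might look like this:

# Should return "200 OK" while HAProxy is up
[root@k8s-master01 ~]# curl -i http://127.0.0.1:33305/monitor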

3.3 Configure Keepalived

1. k8s-master01 node

[root@k8s-master01 ~]# vim /etc/keepalived/keepalived.conf
[root@k8s-master01 ~]# cat /etc/keepalived/keepalived.conf
! Configuration File for keepalived
global_defs {
    router_id LVS_DEVEL
    script_user root
    enable_script_security
}
vrrp_script chk_apiserver {
    script "/etc/keepalived/check_apiserver.sh"
    interval 5
    weight -5
    fall 2
    rise 1
}
vrrp_instance VI_1 {
    state MASTER
    interface ens160
    mcast_src_ip 192.168.200.61
    virtual_router_id 51
    priority 101
    advert_int 2
    authentication {
        auth_type PASS
        auth_pass K8SHA_KA_AUTH
    }
    virtual_ipaddress {
        192.168.200.100
    }
    track_script {
        chk_apiserver
    }
}

2. k8s-master02 node

[root@k8s-master02 ~]# vim /etc/keepalived/keepalived.conf
[root@k8s-master02 ~]# cat /etc/keepalived/keepalived.conf
! Configuration File for keepalived
global_defs {
    router_id LVS_DEVEL
    script_user root
    enable_script_security
}
vrrp_script chk_apiserver {
    script "/etc/keepalived/check_apiserver.sh"
    interval 5
    weight -5
    fall 2
    rise 1
}
vrrp_instance VI_1 {
    state BACKUP
    interface ens160
    mcast_src_ip 192.168.200.62
    virtual_router_id 51
    priority 100
    advert_int 2
    authentication {
        auth_type PASS
        auth_pass K8SHA_KA_AUTH
    }
    virtual_ipaddress {
        192.168.200.100
    }
    track_script {
        chk_apiserver
    }
}

3. k8s-master03 node

[root@k8s-master03 ~]# vim /etc/keepalived/keepalived.conf
[root@k8s-master03 ~]# cat /etc/keepalived/keepalived.conf
! Configuration File for keepalived
global_defs {
    router_id LVS_DEVEL
    script_user root
    enable_script_security
}
vrrp_script chk_apiserver {
    script "/etc/keepalived/check_apiserver.sh"
    interval 5
    weight -5
    fall 2
    rise 1
}
vrrp_instance VI_1 {
    state BACKUP
    interface ens160
    mcast_src_ip 192.168.200.63
    virtual_router_id 51
    priority 100
    advert_int 2
    authentication {
        auth_type PASS
        auth_pass K8SHA_KA_AUTH
    }
    virtual_ipaddress {
        192.168.200.100
    }
    track_script {
        chk_apiserver
    }
}
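
Only state, mcast_src_ip, and priority differ between the three configs: master01 is MASTER with priority 101, the other two are BACKUP with priority 100, so the VIP sits on master01 while its check script passes. Once keepalived is running (section 3.4), you can see which node currently holds the VIP; a small check assuming the ens160 interface used above:

# Prints the VIP only on the node that currently holds it
[root@k8s-master01 ~]# ip addr show ens160 | grep 192.168.200.100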

3.4 Configure the Keepalived health-check script

# The check script
[root@k8s-master01 ~]# vim /etc/keepalived/check_apiserver.sh 
[root@k8s-master01 ~]# cat /etc/keepalived/check_apiserver.sh 
#!/bin/bash
err=0
for k in $(seq 1 3)
do
    check_code=$(pgrep haproxy)
    if [[ $check_code == "" ]]; then
        err=$(expr $err + 1)
        sleep 1
        continue
    else
        err=0
        break
    fi
done

if [[ $err != "0" ]]; then
    echo "systemctl stop keepalived"
    /usr/bin/systemctl stop keepalived
    exit 1
else
    exit 0
fi
# Make the script executable:
[root@k8s-master01 ~]# chmod +x /etc/keepalived/check_apiserver.sh
# Start haproxy and keepalived on all master nodes:
[root@k8s-master01 ~]# systemctl daemon-reload
[root@k8s-master01 ~]# systemctl enable --now haproxy
[root@k8s-master01 ~]# systemctl enable --now keepalived

3.5 Verify the Keepalived VIP

[root@k8s-master01 ~]# ping -c2 192.168.200.100
PING 192.168.200.100 (192.168.200.100) 56(84) bytes of data.
64 bytes from 192.168.200.100: icmp_seq=1 ttl=64 time=0.200 ms
64 bytes from 192.168.200.100: icmp_seq=2 ttl=64 time=0.072 ms

--- 192.168.200.100 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1042ms
rtt min/avg/max/mdev = 0.072/0.136/0.200/0.064 ms
[root@k8s-master01 ~]# echo|telnet 192.168.200.100 16443
Trying 192.168.200.100...
Connected to 192.168.200.100.
Escape character is '^]'.
Connection closed by foreign host.
  • If the VIP answers neither ping nor telnet, troubleshoot as follows:
  1. Confirm the VIP address is correct
  2. On all nodes, check that the firewall is disabled/inactive: systemctl status firewalld
  3. On all nodes, check that SELinux is disabled: getenforce
  4. On the master nodes, check haproxy and keepalived: systemctl status keepalived haproxy
  5. On the master nodes, check the listening ports: netstat -lntp
  • If all of the above look fine, also confirm:
  1. whether the machines run on a public cloud
  2. whether the machines run on a private cloud (such as OpenStack)

As noted above, public clouds generally do not support keepalived, and private clouds may have similar restrictions; check with your private cloud administrator.

4. Runtime installation (all nodes)

4.1 Configure the package repo

[root@k8s-master01 ~]# yum install wget jq psmisc vim net-tools telnet yum-utils device-mapper-persistent-data lvm2 git -y
[root@k8s-master01 ~]# yum-config-manager --add-repo https://mirrors.aliyun.com/docker-ce/linux/centos/docker-ce.repo

4.2 Install docker-ce

[root@k8s-master01 ~]# yum install docker-ce containerd -y

4.3 Configure the kernel modules containerd needs

[root@k8s-master01 ~]# cat <<EOF | sudo tee /etc/modules-load.d/containerd.conf
overlay
br_netfilter
EOF

4.4 Load the modules:

[root@k8s-master01 ~]# modprobe -- overlay
[root@k8s-master01 ~]# modprobe -- br_netfilter

4.5 Configure the sysctls containerd needs:

[root@k8s-master01 ~]# cat <<EOF | sudo tee /etc/sysctl.d/99-kubernetes-cri.conf
net.bridge.bridge-nf-call-iptables  = 1
net.ipv4.ip_forward                 = 1
net.bridge.bridge-nf-call-ip6tables = 1
EOF
# Apply the settings:
[root@k8s-master01 ~]# sysctl --system

4.6 Generate the containerd configuration file:

[root@k8s-master01 ~]# mkdir -p /etc/containerd
[root@k8s-master01 ~]# containerd config default | tee /etc/containerd/config.toml

Change containerd's cgroup driver and pause-image settings:

[root@k8s-master01 ~]# sed -i 's#SystemdCgroup = false#SystemdCgroup = true#g' /etc/containerd/config.toml
[root@k8s-master01 ~]# sed -i 's#k8s.gcr.io/pause#registry.cn-hangzhou.aliyuncs.com/google_containers/pause#g' /etc/containerd/config.toml
[root@k8s-master01 ~]# sed -i 's#registry.gcr.io/pause#registry.cn-hangzhou.aliyuncs.com/google_containers/pause#g' /etc/containerd/config.toml
[root@k8s-master01 ~]# sed -i 's#registry.k8s.io/pause#registry.cn-hangzhou.aliyuncs.com/google_containers/pause#g' /etc/containerd/config.toml

Start containerd and enable it at boot:

[root@k8s-master01 ~]# systemctl daemon-reload
[root@k8s-master01 ~]# systemctl enable --now containerd

Configure the runtime endpoint for the crictl client (optional):

[root@k8s-master01 ~]# cat > /etc/crictl.yaml <<EOF
runtime-endpoint: unix:///run/containerd/containerd.sock
image-endpoint: unix:///run/containerd/containerd.sock
timeout: 10
debug: false
EOF
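
With this file in place, crictl talks to containerd without needing --runtime-endpoint on every invocation. For example, assuming containerd is already running:

# Show runtime status and list containers over the CRI socket
[root@k8s-master01 ~]# crictl info | head
[root@k8s-master01 ~]# crictl ps -a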

5. Install the Kubernetes components (all nodes)

[root@k8s-master01 ~]# cat <<EOF | tee /etc/yum.repos.d/kubernetes.repo
[kubernetes]
name=Kubernetes
baseurl=https://mirrors.aliyun.com/kubernetes-new/core/stable/v1.33/rpm/
enabled=1
gpgcheck=1
gpgkey=https://mirrors.aliyun.com/kubernetes-new/core/stable/v1.33/rpm/repodata/repomd.xml.key
EOF

Install the latest 1.33 kubeadm, kubelet, and kubectl on all nodes:

[root@k8s-master01 ~]# yum install kubeadm-1.33* kubelet-1.33* kubectl-1.33* -y

Enable kubelet at boot on all nodes (the cluster is not initialized yet, so there is no kubelet config file and kubelet cannot start; that is expected):

[root@k8s-master01 ~]# systemctl daemon-reload
[root@k8s-master01 ~]# systemctl enable --now kubelet

6. Cluster initialization

6.1 Create the kubeadm config file (Master01 node)

[root@k8s-master01 ~]# vim kubeadm-config.yaml
[root@k8s-master01 ~]# cat kubeadm-config.yaml 
apiVersion: kubeadm.k8s.io/v1beta4
bootstrapTokens:
- groups:
  - system:bootstrappers:kubeadm:default-node-token
  token: 7t2weq.bjbawausm0jaxury
  ttl: 24h0m0s
  usages:
  - signing
  - authentication
kind: InitConfiguration
localAPIEndpoint:
  advertiseAddress: 192.168.200.61
  bindPort: 6443
nodeRegistration:
  criSocket: unix:///var/run/containerd/containerd.sock
  imagePullPolicy: IfNotPresent
  imagePullSerial: true
  name: k8s-master01
  taints:
  - effect: NoSchedule
    key: node-role.kubernetes.io/control-plane
timeouts:
  controlPlaneComponentHealthCheck: 4m0s
  discovery: 5m0s
  etcdAPICall: 2m0s
  kubeletHealthCheck: 4m0s
  kubernetesAPICall: 1m0s
  tlsBootstrap: 5m0s
  upgradeManifests: 5m0s
---
apiServer:
  certSANs:
  - 192.168.200.100
apiVersion: kubeadm.k8s.io/v1beta4
caCertificateValidityPeriod: 876000h0m0s
certificateValidityPeriod: 876000h0m0s
certificatesDir: /etc/kubernetes/pki
clusterName: kubernetes
controlPlaneEndpoint: 192.168.200.100:16443
controllerManager: {}
dns: {}
encryptionAlgorithm: RSA-2048
etcd:
  local:
    dataDir: /var/lib/etcd
imageRepository: registry.cn-hangzhou.aliyuncs.com/google_containers
kind: ClusterConfiguration
kubernetesVersion: v1.33.5
networking:
  dnsDomain: cluster.local
  podSubnet: 172.16.0.0/16
  serviceSubnet: 10.96.0.0/16
proxy: {}
scheduler: {}
# Migrate the config to the current kubeadm schema:
[root@k8s-master01 ~]# kubeadm config migrate --old-config kubeadm-config.yaml --new-config new.yaml
# Adjust the timeouts if needed
[root@k8s-master01 ~]# vim new.yaml 
[root@k8s-master01 ~]# sed -n "22,23p" new.yaml 
timeouts:
  controlPlaneComponentHealthCheck: 4m0s
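
Before initializing, it can be worth statically validating the migrated file; recent kubeadm releases ship a validator subcommand (no cluster changes are made):

# Sanity-check new.yaml against the kubeadm API schema
[root@k8s-master01 ~]# kubeadm config validate --config /root/new.yaml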

6.2 Copy new.yaml to the other master nodes (Master01 node)

[root@k8s-master01 ~]# for i in k8s-master02 k8s-master03; do scp new.yaml $i:/root/; done

6.3 Pre-pull the images (all Master nodes; the other masters need no config changes, not even IP addresses)

[root@k8s-master01 ~]# kubeadm config images pull --config /root/new.yaml 

6.4 Initialize the cluster (Master01 node)

Initialization generates the certificates and config files under /etc/kubernetes; afterwards the other Master nodes just join Master01.

[root@k8s-master01 ~]# kubeadm init --config /root/new.yaml --upload-certs
...
Your Kubernetes control-plane has initialized successfully!

To start using your cluster, you need to run the following as a regular user:

  mkdir -p $HOME/.kube
  sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
  sudo chown $(id -u):$(id -g) $HOME/.kube/config

Alternatively, if you are the root user, you can run:

  export KUBECONFIG=/etc/kubernetes/admin.conf

You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
  https://kubernetes.io/docs/concepts/cluster-administration/addons/

You can now join any number of control-plane nodes running the following command on each as root:

  kubeadm join 192.168.200.100:16443 --token 7t2weq.bjbawausm0jaxury \
	--discovery-token-ca-cert-hash sha256:323b80f1fc4d058e265b1f1af904f5dea2b0931f9e82aae3e3879231f35a498d \
	--control-plane --certificate-key 981bb3fde1edb1f6e961f78343a033a1aeaf1da98e4b28b7804e1a8ca159dd87

Please note that the certificate-key gives access to cluster sensitive data, keep it secret!
As a safeguard, uploaded-certs will be deleted in two hours; If necessary, you can use
"kubeadm init phase upload-certs --upload-certs" to reload certs afterward.

Then you can join any number of worker nodes by running the following on each as root:

kubeadm join 192.168.200.100:16443 --token 7t2weq.bjbawausm0jaxury \
	--discovery-token-ca-cert-hash sha256:323b80f1fc4d058e265b1f1af904f5dea2b0931f9e82aae3e3879231f35a498d

6.5 Configure environment variables for cluster access (Master01 node)

[root@k8s-master01 ~]# cat <<EOF >> /root/.bashrc
export KUBECONFIG=/etc/kubernetes/admin.conf
EOF
[root@k8s-master01 ~]# source /root/.bashrc
# NotReady at this point is expected
[root@k8s-master01 ~]# kubectl get node
NAME           STATUS     ROLES           AGE    VERSION
k8s-master01   NotReady   control-plane   106s   v1.33.5

If other nodes (including machines outside the cluster) should be able to run kubectl against this cluster, copy admin.conf to them first and then use kubectl as usual.
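
A minimal sketch of that copy, assuming root SSH access to the target machine (k8s-node01 here is just an example):

[root@k8s-master01 ~]# scp /etc/kubernetes/admin.conf k8s-node01:/etc/kubernetes/admin.conf
[root@k8s-master01 ~]# ssh k8s-node01 "echo 'export KUBECONFIG=/etc/kubernetes/admin.conf' >> /root/.bashrc"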

6.6 If initialization fails, reset with the following command and initialize again (do not run it if nothing failed):

kubeadm reset -f; ipvsadm --clear; rm -rf ~/.kube

If initialization fails repeatedly, check the system log. On CentOS/Rocky Linux the path is /var/log/messages; on Ubuntu it is /var/log/syslog:

tail -f /var/log/messages | grep -v "not found"

  • Common causes of failure:
  1. The containerd config file was changed incorrectly; re-check it against the containerd installation section
  2. A new.yaml problem, e.g. forgetting to change port 16443 to 6443 for a non-HA cluster
  3. A new.yaml problem, e.g. the three network segments overlap and IP addresses conflict
  4. The VIP is unreachable, so initialization cannot succeed; the messages log will show VIP timeout errors

6.7 Highly available masters (run the join command on master02 and master03)

To add the other master nodes to the cluster, just run the join command shown below.

Note: never run it on master01 again, and do not copy the command from this document verbatim; use the command produced by your own master01 initialization.

[root@k8s-master02 ~]# kubeadm join 192.168.200.100:16443 --token 7t2weq.bjbawausm0jaxury \
	--discovery-token-ca-cert-hash sha256:323b80f1fc4d058e265b1f1af904f5dea2b0931f9e82aae3e3879231f35a498d \
	--control-plane --certificate-key 981bb3fde1edb1f6e961f78343a033a1aeaf1da98e4b28b7804e1a8ca159dd87

Check the current state (NotReady here does not matter):

[root@k8s-master01 ~]# kubectl get node
NAME           STATUS     ROLES           AGE     VERSION
k8s-master01   NotReady   control-plane   7m17s   v1.33.5
k8s-master02   NotReady   control-plane   78s     v1.33.5
k8s-master03   NotReady   control-plane   87s     v1.33.5

6.8 Worker node setup (run the join command on node01 and node02)

Worker nodes run the business workloads. In production, master nodes should not run Pods other than system components; in a test environment, allowing Pods on masters saves resources.

[root@k8s-node01 ~]# kubeadm join 192.168.200.100:16443 --token 7t2weq.bjbawausm0jaxury \
	--discovery-token-ca-cert-hash sha256:323b80f1fc4d058e265b1f1af904f5dea2b0931f9e82aae3e3879231f35a498d

After all nodes have joined, check the cluster state (NotReady is fine):

[root@k8s-master01 ~]# kubectl get node
NAME           STATUS     ROLES           AGE     VERSION
k8s-master01   NotReady   control-plane   10m     v1.33.5
k8s-master02   NotReady   control-plane   4m16s   v1.33.5
k8s-master03   NotReady   control-plane   4m25s   v1.33.5
k8s-node01     NotReady   <none>          71s     v1.33.5
k8s-node02     NotReady   <none>          57s     v1.33.5

7. Installing Calico

7.1 Stop NetworkManager from managing Calico's interfaces to prevent conflicts (all nodes)

[root@k8s-master01 ~]# cat >>/etc/NetworkManager/conf.d/calico.conf<<EOF
[keyfile]
unmanaged-devices=interface-name:cali*;interface-name:tunl*;interface-name:vxlan.calico;interface-name:vxlan-v6.calico;interface-name:wireguard.cali;interface-name:wg-v6.cali
EOF
[root@k8s-master01 ~]# systemctl daemon-reload
[root@k8s-master01 ~]# systemctl restart NetworkManager

7.2 Install Calico (run only on master01; the .x does not need to change)

[root@k8s-master01 ~]# cd /root/;git clone https://gitee.com/dukuan/k8s-ha-install.git
[root@k8s-master01 ~]# cd /root/k8s-ha-install && git checkout manual-installation-v1.33.x && cd calico/

Set the Pod network segment:

[root@k8s-master01 calico]# POD_SUBNET=`cat /etc/kubernetes/manifests/kube-controller-manager.yaml | grep cluster-cidr= | awk -F= '{print $NF}'`
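
Before substituting, you can confirm the variable picked up the expected CIDR; it should match the podSubnet from new.yaml:

[root@k8s-master01 calico]# echo ${POD_SUBNET}
172.16.0.0/16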

Substitute it into the Calico manifest and install:

[root@k8s-master01 calico]# sed -i "s#POD_CIDR#${POD_SUBNET}#g" calico.yaml
[root@k8s-master01 calico]# kubectl apply -f calico.yaml

All nodes now become Ready:

[root@k8s-master01 calico]# kubectl get node
NAME           STATUS   ROLES           AGE   VERSION
k8s-master01   Ready    control-plane   35m   v1.33.5
k8s-master02   Ready    control-plane   29m   v1.33.5
k8s-master03   Ready    control-plane   29m   v1.33.5
k8s-node01     Ready    <none>          26m   v1.33.5
k8s-node02     Ready    <none>          26m   v1.33.5

Check Pod and node status:

[root@k8s-master01 calico]# kubectl get po -n kube-system
NAME                                       READY   STATUS    RESTARTS      AGE
calico-kube-controllers-8678987965-4j5bp   1/1     Running   0             16m
calico-node-92bnb                          1/1     Running   0             16m
calico-node-9gqpm                          1/1     Running   0             16m
calico-node-gdz59                          1/1     Running   0             16m
calico-node-hrfkr                          1/1     Running   0             16m
calico-node-tdgh8                          1/1     Running   0             16m
coredns-746c97786-gz7hp                    1/1     Running   0             45m
coredns-746c97786-sf7mw                    1/1     Running   0             45m
etcd-k8s-master01                          1/1     Running   0             45m
etcd-k8s-master02                          1/1     Running   0             39m
etcd-k8s-master03                          1/1     Running   0             39m
....

8. Metrics deployment (master01 node)

Recent Kubernetes versions collect system resource metrics through metrics-server, which reports node and Pod CPU, memory, disk, and network usage.

(Copy front-proxy-ca.crt from Master01 to all worker nodes)

[root@k8s-master01 calico]# scp /etc/kubernetes/pki/front-proxy-ca.crt k8s-node01:/etc/kubernetes/pki/front-proxy-ca.crt
[root@k8s-master01 calico]# scp /etc/kubernetes/pki/front-proxy-ca.crt k8s-node02:/etc/kubernetes/pki/front-proxy-ca.crt

Install metrics-server:

[root@k8s-master01 calico]# cd /root/k8s-ha-install/kubeadm-metrics-server
[root@k8s-master01 kubeadm-metrics-server]# kubectl create -f comp.yaml

Check its status:

[root@k8s-master01 kubeadm-metrics-server]# kubectl get po -n kube-system -l k8s-app=metrics-server
NAME                              READY   STATUS    RESTARTS   AGE
metrics-server-7d9d8df576-zzq9j   1/1     Running   0          57s

Once the Pod reaches 1/1 Running, view node and Pod resource usage:

[root@k8s-master01 kubeadm-metrics-server]# kubectl top node
NAME           CPU(cores)   CPU(%)   MEMORY(bytes)   MEMORY(%)   
k8s-master01   727m         18%      977Mi           27%         
k8s-master02   636m         15%      945Mi           26%         
k8s-master03   626m         15%      916Mi           26%         
k8s-node01     300m         7%       550Mi           15%         
k8s-node02     331m         8%       433Mi           12% 
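
Pod-level usage works the same way, for example for the kube-system namespace (output will vary):

[root@k8s-master01 kubeadm-metrics-server]# kubectl top po -n kube-system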

9. Dashboard deployment (master01 node)

9.1 Installation

The Dashboard displays the cluster's resources; it can also tail Pod logs in real time and run commands inside containers. Install it as follows:

[root@k8s-master01 kubeadm-metrics-server]# cd /root/k8s-ha-install/dashboard/
[root@k8s-master01 dashboard]# kubectl  create -f .

9.2 Logging in to the dashboard

Change the dashboard Service to NodePort:

[root@k8s-master01 dashboard]# kubectl edit svc kubernetes-dashboard -n kubernetes-dashboard
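
If you prefer a non-interactive change, the same edit can be made with a patch; this sketch is equivalent to setting type: NodePort in the editor:

[root@k8s-master01 dashboard]# kubectl patch svc kubernetes-dashboard -n kubernetes-dashboard -p '{"spec":{"type":"NodePort"}}'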

Check the assigned port:

[root@k8s-master01 dashboard]# kubectl get svc -n kubernetes-dashboard
NAME                        TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)         AGE
dashboard-metrics-scraper   ClusterIP   10.96.128.108   <none>        8000/TCP        9m40s
kubernetes-dashboard        NodePort    10.96.119.44    <none>        443:32506/TCP   9m42s

Using your own instance's port, the Dashboard is reachable via any host that runs kube-proxy at https://<node-IP>:<port>, e.g. https://192.168.200.61:32506 (substitute your own IP address and port). Choose the token login method.

[Figure: Dashboard login page]

Create a login token:

[root@k8s-master01 dashboard]# kubectl create token admin-user -n kube-system
eyJhbGciOiJSUzI1NiIsImtpZCI6IjFsVVlxQWhNZ2RWVlRXRWNLX2VjZmNJZlhUbDNMazM0bzR3bWNMcmhoNkEifQ.eyJhdWQiOlsiaHR0cHM6Ly9rdWJlcm5ldGVzLmRlZmF1bHQuc3ZjLmNsdXN0ZXIubG9jYWwiXSwiZXhwIjoxNzYwMjczMTYxLCJpYXQiOjE3NjAyNjk1NjEsImlzcyI6Imh0dHBzOi8va3ViZXJuZXRlcy5kZWZhdWx0LnN2Yy5jbHVzdGVyLmxvY2FsIiwianRpIjoiMmVmYjhjZWYtMzg2Ni00ZGJhLWEzM2MtNGY4OWE2Mjk2MGFmIiwia3ViZXJuZXRlcy5pbyI6eyJuYW1lc3BhY2UiOiJrdWJlLXN5c3RlbSIsInNlcnZpY2VhY2NvdW50Ijp7Im5hbWUiOiJhZG1pbi11c2VyIiwidWlkIjoiZGY2ZWIwNTMtYzA3Ni00MDFjLWE0N2MtNjI5MTZiNjNkOTgyIn19LCJuYmYiOjE3NjAyNjk1NjEsInN1YiI6InN5c3RlbTpzZXJ2aWNlYWNjb3VudDprdWJlLXN5c3RlbTphZG1pbi11c2VyIn0.byzK70mYJaoCJL3sh1sxaVUi88Q24MPWkEe4PQ2yHIKRbuYPJh8PHkyXmmdRL6VVJd7k_927P5VJp_2e9ScMOcyqADSu44CkVwHGBI9C66hvJpTnAm4XwUmTrotc-5lDebTrbjLgUe7eD54CpIbng7FM0eg98lcyv4o-6Zto-cjMG_92s_oCC1W9DMVvPctd8_q3wZmY2v6hx8vFd95wbRrr4JxTtZlPoKAyisbUSATw2MUG8at82QpZzNoXIJGQf0DXEJxOxbU_DCJ6xemB8urgfHWT4L0tu1v35nL6_uaRXEKCfxQxrLzstfl_TwQzN09AE66-kNAv6VUUzDqP3Q

Paste the token into the login form and click Sign in to access the Dashboard.

[Figure: Dashboard overview]

10. Required configuration changes (master01 node)

Switch kube-proxy to IPVS mode. The ipvs setting was commented out when the cluster was initialized, so change it manually:

[root@k8s-master01 dashboard]# kubectl edit cm kube-proxy -n kube-system
# in the editor, set:
mode: ipvs

Roll the kube-proxy Pods so they pick up the change:

[root@k8s-master01 dashboard]# kubectl patch daemonset kube-proxy -p "{\"spec\":{\"template\":{\"metadata\":{\"annotations\":{\"date\":\"`date +'%s'`\"}}}}}" -n kube-system

Verify the kube-proxy mode:

[root@k8s-master01 dashboard]# curl 127.0.0.1:10249/proxyMode
ipvs
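
Since ipvsadm was installed back in section 2.4, you can also inspect the virtual-server table kube-proxy programs; a non-empty listing confirms IPVS is active:

# Lists IPVS virtual servers and their real-server backends
[root@k8s-master01 dashboard]# ipvsadm -Ln | head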

11. Cluster maintenance

11.1 Taking a node offline

11.1.1 Offline procedure

To take a node out of service smoothly, follow these steps:

  • 1. Taint the node so nothing new is scheduled there
  • 2. Check whether the node runs important services
    • Move important services to other nodes
  • 3. Confirm whether the node is an ingress entry point
    • Check port traffic
  • 4. Drain the node to put it into evicted state
  • 5. Re-check the other services on the node
    • Base components and the like
  • 6. Look for abnormal Pods
    • Any Pending Pods
    • Any Pods not in Running state
  • 7. Delete the node with kubectl delete
  • 8. Decommission the node
    • kubeadm reset -f
    • systemctl disable --now kubelet

11.1.2 Performing the offline

Suppose k8s-node02 is the node to take offline. First taint it so no new Pods are scheduled there:

[root@k8s-master01 ~]# kubectl taint node k8s-node02 offline=true:NoSchedule

Check for important services:

[root@k8s-master01 ~]# kubectl get po -A -owide | grep k8s-node02
kube-system            calico-node-gdz59                            1/1     Running   2 (61m ago)   6d17h   192.168.200.65   k8s-node02     <none>           <none>
kube-system            kube-proxy-d6pp4                             1/1     Running   2 (61m ago)   6d16h   192.168.200.65   k8s-node02     <none>           <none>
kube-system            metrics-server-7d9d8df576-zzq9j              1/1     Running   6 (59m ago)   6d17h   172.16.58.199    k8s-node02     <none>           <none>
kubernetes-dashboard   dashboard-metrics-scraper-69b4796d9b-dmqd9   1/1     Running   2 (61m ago)   6d16h   172.16.58.201    k8s-node02     <none>           <none>
kubernetes-dashboard   kubernetes-dashboard-778584b9dd-kg2hc        1/1     Running   3 (61m ago)   6d16h   172.16.58.200    k8s-node02     <none>           <none>

Suppose dashboard-metrics-scraper, kubernetes-dashboard, and metrics-server are important services. Use rollout restart to reschedule them (if a service has many replicas you can also delete its Pods directly, to avoid rebuilding everything at once):

[root@k8s-master01 ~]# kubectl rollout restart deploy dashboard-metrics-scraper -n kubernetes-dashboard
[root@k8s-master01 ~]# kubectl rollout restart deploy kubernetes-dashboard -n kubernetes-dashboard
[root@k8s-master01 ~]# kubectl rollout restart deploy metrics-server -n kube-system
# Check the Pods again:
[root@k8s-master01 ~]# kubectl get po -A -owide | grep k8s-node02
kube-system            calico-node-gdz59                            1/1     Running             2 (72m ago)   6d17h   192.168.200.65   k8s-node02     <none>           <none>
kube-system            kube-proxy-d6pp4                             1/1     Running             2 (72m ago)   6d16h   192.168.200.65   k8s-node02     <none>           <none>

Run the other checks as needed:

# Drain the node
[root@k8s-master01 ~]# kubectl drain k8s-node02 --ignore-daemonsets
# Check for Pending Pods (empty output means the Pods are fine):
[root@k8s-master01 ~]# kubectl get po -A | grep -i pending
# Check for non-Running Pods (empty output means the Pods are fine):
[root@k8s-master01 ~]# kubectl get po -A  | grep -Ev '1/1|2/2|3/3|NAMESPACE'
# Now delete the node:
[root@k8s-master01 ~]# kubectl delete node k8s-node02
# Decommission the node, then handle the machine as needed:
[root@k8s-node02 ~]# kubeadm reset -f
[root@k8s-node02 ~]# systemctl disable --now kubelet
# Check the remaining nodes:
[root@k8s-master01 ~]# kubectl get node
NAME           STATUS   ROLES           AGE     VERSION
k8s-master01   Ready    control-plane   6d18h   v1.33.5
k8s-master02   Ready    control-plane   6d18h   v1.33.5
k8s-master03   Ready    control-plane   6d18h   v1.33.5
k8s-node01     Ready    <none>          6d18h   v1.33.5

11.2 Adding a node

11.2.1 Base environment setup; set the new node's hostname (see section 2.3)

11.2.2 Kernel configuration (see section 2.4)

11.2.3 Install containerd (see section 4)

11.2.4 Install the Kubernetes components (see section 5)

11.2.5 Configure the repo on the new node (mind the version number):

# Copy the front-proxy cert from a master node:
[root@k8s-node02 ~]# mkdir -p  /etc/kubernetes/pki/
[root@k8s-master01 ~]# scp /etc/kubernetes/pki/front-proxy-ca.crt 192.168.200.65:/etc/kubernetes/pki/
# Generate a new token on a master node:
[root@k8s-master01 ~]# kubeadm token create --print-join-command
# Run the join command on the new node:
[root@k8s-node02 ~]# kubeadm join 192.168.200.100:16443 --token gtuckg.ahx37p3zq54jrgy3 --discovery-token-ca-cert-hash sha256:323b80f1fc4d058e265b1f1af904f5dea2b0931f9e82aae3e3879231f35a498d
# Check node status from a master node:
[root@k8s-master01 ~]# kubectl get node
NAME           STATUS   ROLES           AGE     VERSION
k8s-master01   Ready    control-plane   6d18h   v1.33.5
k8s-master02   Ready    control-plane   6d18h   v1.33.5
k8s-master03   Ready    control-plane   6d18h   v1.33.5
k8s-node01     Ready    <none>          6d18h   v1.33.5
k8s-node02     Ready    <none>          27s     v1.33.5
# Check from a master node that the Pods are healthy:
[root@k8s-master01 ~]# kubectl get po -A -owide | grep k8s-node02
kube-system            calico-node-lw4s5                            1/1     Running   0             2m36s   192.168.200.65   k8s-node02     <none>           <none>
kube-system            kube-proxy-kd6sf                             1/1     Running   0             2m36s   192.168.200.65   k8s-node02     <none>           <none>

11.3 Cluster upgrade

11.3.1 Upgrade flow and caveats

Official documentation:

  • Upgrade flow:
    • Upgrade the master nodes
    • Put the worker nodes into maintenance
    • Upgrade the worker nodes
  • Caveats:
    • kubeadm cannot skip minor versions when upgrading
    • Take a backup first if you can
    • Keep swap disabled

# To upgrade to 1.34, first configure the 1.34 repo (all nodes):
[root@k8s-master01 ~]# cat <<EOF | tee /etc/yum.repos.d/kubernetes.repo
[kubernetes]
name=Kubernetes
baseurl=https://mirrors.aliyun.com/kubernetes-new/core/stable/v1.34/rpm/
enabled=1
gpgcheck=1
gpgkey=https://mirrors.aliyun.com/kubernetes-new/core/stable/v1.34/rpm/repodata/repomd.xml.key
EOF

11.3.2 Upgrading the first master node:

# Upgrade the master nodes one at a time, starting with Master01:
[root@k8s-master01 ~]# yum install -y kubeadm-'1.34*' kubelet-'1.34*' kubectl-'1.34*' --disableexcludes=kubernetes
# Check the versions:
[root@k8s-master01 ~]# kubeadm version
kubeadm version: &version.Info{Major:"1", Minor:"34", EmulationMajor:"", EmulationMinor:"", MinCompatibilityMajor:"", MinCompatibilityMinor:"", GitVersion:"v1.34.1", GitCommit:"93248f9ae092f571eb870b7664c534bfc7d00f03", GitTreeState:"clean", BuildDate:"2025-09-09T19:43:15Z", GoVersion:"go1.24.6", Compiler:"gc", Platform:"linux/amd64"}
[root@k8s-master01 ~]# kubectl version
Client Version: v1.34.1
Kustomize Version: v5.7.1
Server Version: v1.33.5
# Check the upgrade plan:
[root@k8s-master01 ~]# kubeadm upgrade plan
....
Upgrade to the latest stable version:

COMPONENT                 NODE           CURRENT    TARGET
kube-apiserver            k8s-master01   v1.33.5    v1.34.1
kube-apiserver            k8s-master02   v1.33.5    v1.34.1
kube-apiserver            k8s-master03   v1.33.5    v1.34.1
kube-controller-manager   k8s-master01   v1.33.5    v1.34.1
kube-controller-manager   k8s-master02   v1.33.5    v1.34.1
kube-controller-manager   k8s-master03   v1.33.5    v1.34.1
kube-scheduler            k8s-master01   v1.33.5    v1.34.1
kube-scheduler            k8s-master02   v1.33.5    v1.34.1
kube-scheduler            k8s-master03   v1.33.5    v1.34.1
kube-proxy                               1.33.5     v1.34.1
CoreDNS                                  v1.12.0    v1.12.1
etcd                      k8s-master01   3.5.21-0   3.6.4-0
etcd                      k8s-master02   3.5.21-0   3.6.4-0
etcd                      k8s-master03   3.5.21-0   3.6.4-0

You can now apply the upgrade by executing the following command:

	kubeadm upgrade apply v1.34.1

_____________________________________________________________________

# Run the upgrade:
[root@k8s-master01 ~]# kubeadm upgrade apply v1.34.1
# Restart kubelet:
[root@k8s-master01 ~]# systemctl daemon-reload
[root@k8s-master01 ~]# systemctl restart kubelet
# Confirm the versions:
[root@k8s-master01 ~]# kubectl get node
NAME           STATUS   ROLES           AGE     VERSION
k8s-master01   Ready    control-plane   6d19h   v1.34.1
k8s-master02   Ready    control-plane   6d18h   v1.33.5
k8s-master03   Ready    control-plane   6d18h   v1.33.5
k8s-node01     Ready    <none>          6d18h   v1.33.5
k8s-node02     Ready    <none>          30m     v1.33.5
[root@k8s-master01 ~]# grep "image:" /etc/kubernetes/manifests/*.yaml
/etc/kubernetes/manifests/etcd.yaml:    image: registry.cn-hangzhou.aliyuncs.com/google_containers/etcd:3.6.4-0
/etc/kubernetes/manifests/kube-apiserver.yaml:    image: registry.cn-hangzhou.aliyuncs.com/google_containers/kube-apiserver:v1.34.1
/etc/kubernetes/manifests/kube-controller-manager.yaml:    image: registry.cn-hangzhou.aliyuncs.com/google_containers/kube-controller-manager:v1.34.1
/etc/kubernetes/manifests/kube-scheduler.yaml:    image: registry.cn-hangzhou.aliyuncs.com/google_containers/kube-scheduler:v1.34.1

11.3.3 Upgrading the other master nodes

# Next upgrade the other master nodes; install the packages first:
[root@k8s-master02 ~]# yum install -y kubeadm-'1.34*' kubelet-'1.34*' kubectl-'1.34*' --disableexcludes=kubernetes
# Upgrade the node:
[root@k8s-master02 ~]# kubeadm upgrade node
# Restart kubelet:
[root@k8s-master02 ~]# systemctl daemon-reload
[root@k8s-master02 ~]# systemctl restart kubelet
# Check the status:
[root@k8s-master01 ~]# kubectl get node
NAME           STATUS   ROLES           AGE     VERSION
k8s-master01   Ready    control-plane   6d19h   v1.34.1
k8s-master02   Ready    control-plane   6d19h   v1.34.1
k8s-master03   Ready    control-plane   6d19h   v1.34.1
k8s-node01     Ready    <none>          6d19h   v1.33.5
k8s-node02     Ready    <none>          40m     v1.33.5

11.3.4 Upgrading the worker nodes

Upgrading a worker is simple: install the new kubelet and restart it. Put the node into maintenance first, similar to the offline procedure (in a test environment you can restart directly without setting maintenance mode).

Suppose k8s-node01 is the node to be upgraded. First taint it so no new Pods are scheduled there:

[root@k8s-master01 ~]# kubectl taint node k8s-node01 upgrade=true:NoSchedule

Check for important services:

[root@k8s-master01 ~]# kubectl get po -A -owide | grep k8s-node01
kube-system            calico-kube-controllers-8678987965-4j5bp     1/1     Running   2 (7m41s ago)   6d18h   172.16.85.200    k8s-node01     <none>           <none>
kube-system            calico-node-tdgh8                            1/1     Running   1 (70m ago)     6d18h   192.168.200.64   k8s-node01     <none>           <none>
kube-system            kube-proxy-278gx                             1/1     Running   0               4m17s   192.168.200.64   k8s-node01     <none>           <none>
kube-system            metrics-server-74767fc66c-lv5w7              1/1     Running   0               63m     172.16.85.208    k8s-node01     <none>           <none>
kubernetes-dashboard   dashboard-metrics-scraper-5b47ccc9c7-45lds   1/1     Running   0               64m     172.16.85.204    k8s-node01     <none>           <none>
kubernetes-dashboard   kubernetes-dashboard-65fd974fd6-gfgpq        1/1     Running   0               62m     172.16.85.209    k8s-node01     <none>           <none>

Suppose dashboard-metrics-scraper, kubernetes-dashboard, and metrics-server are important services. Use rollout restart to reschedule them (if a service has many replicas you can also delete its Pods directly, to avoid rebuilding everything at once):

[root@k8s-master01 ~]# kubectl rollout restart deploy dashboard-metrics-scraper -n kubernetes-dashboard
[root@k8s-master01 ~]# kubectl rollout restart deploy kubernetes-dashboard -n kubernetes-dashboard
[root@k8s-master01 ~]# kubectl rollout restart deploy metrics-server -n kube-system
# Check the Pods again:
[root@k8s-master01 ~]# kubectl get po -A -owide | grep k8s-node01
kube-system            calico-kube-controllers-8678987965-4j5bp     1/1     Running            2 (10m ago)   6d18h   172.16.85.200    k8s-node01     <none>           <none>
kube-system            calico-node-tdgh8                            1/1     Running            1 (73m ago)   6d18h   192.168.200.64   k8s-node01     <none>           <none>
kube-system            kube-proxy-278gx                             1/1     Running            0             7m22s   192.168.200.64   k8s-node01     <none>           <none>

Run the other checks as needed:

# Drain the node
[root@k8s-master01 ~]# kubectl drain k8s-node01 --ignore-daemonsets
# Check for Pending Pods (empty output means the Pods are fine):
[root@k8s-master01 ~]# kubectl get po -A | grep -i pending
# Check for non-Running Pods (empty output means the Pods are fine):
[root@k8s-master01 ~]# kubectl get po -A  | grep -Ev '1/1|2/2|3/3|NAMESPACE'
# Upgrade the worker node
[root@k8s-node01 ~]# yum install -y kubeadm-'1.34*' kubelet-'1.34*' kubectl-'1.34*' --disableexcludes=kubernetes
# Restart kubelet:
[root@k8s-node01 ~]# systemctl daemon-reload
[root@k8s-node01 ~]# systemctl restart kubelet
# After the upgrade, uncordon the node and remove the taint:
[root@k8s-master01 ~]# kubectl uncordon k8s-node01
[root@k8s-master01 ~]# kubectl taint node k8s-node01 upgrade-
# Check the status:
[root@k8s-master01 ~]# kubectl get node
NAME           STATUS   ROLES           AGE     VERSION
k8s-master01   Ready    control-plane   6d19h   v1.34.1
k8s-master02   Ready    control-plane   6d19h   v1.34.1
k8s-master03   Ready    control-plane   6d19h   v1.34.1
k8s-node01     Ready    <none>          6d19h   v1.34.1
k8s-node02     Ready    <none>          65m     v1.34.1
# Repeat the same steps on the remaining nodes

Appendix: what makes a truly production-ready cluster?

1. All nodes healthy (every node's status is Ready)

[root@k8s-master01 ~]# kubectl get node
NAME           STATUS   ROLES           AGE     VERSION
k8s-master01   Ready    control-plane   6d19h   v1.34.1
k8s-master02   Ready    control-plane   6d19h   v1.34.1
k8s-master03   Ready    control-plane   6d19h   v1.34.1
k8s-node01     Ready    <none>          6d19h   v1.34.1
k8s-node02     Ready    <none>          65m     v1.34.1

2. All Pods healthy (every Pod is Running, the two numbers in READY match, and the RESTARTS count is not increasing)

[root@k8s-master01 ~]# kubectl get po -A
NAMESPACE              NAME                                         READY   STATUS    RESTARTS      AGE
....
kube-system            etcd-k8s-master01                            1/1     Running   2 (23m ago)   6d17h
kube-system            etcd-k8s-master02                            1/1     Running   2 (23m ago)   6d17h
kube-system            etcd-k8s-master03                            1/1     Running   2 (23m ago)   6d17h
kube-system            kube-scheduler-k8s-master01                  1/1     Running   2 (23m ago)   6d17h
kube-system            kube-scheduler-k8s-master02                  1/1     Running   2 (23m ago)   6d17h
kube-system            kube-scheduler-k8s-master03                  1/1     Running   2 (23m ago)   6d17h
kube-system            metrics-server-7d9d8df576-zzq9j              1/1     Running   6 (20m ago)   6d16h
....

3. No conflicts between the cluster network segments (svc network 10.96.x.x, node network 192.168.x.x, pod network 172.16.x.x)

[root@k8s-master01 ~]# kubectl get svc -A
NAMESPACE              NAME                        TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)                  AGE
default                kubernetes                  ClusterIP   10.96.0.1       <none>        443/TCP                  6d17h
kube-system            kube-dns                    ClusterIP   10.96.0.10      <none>        53/UDP,53/TCP,9153/TCP   6d17h
kube-system            metrics-server              ClusterIP   10.96.87.203    <none>        443/TCP                  6d16h
kubernetes-dashboard   dashboard-metrics-scraper   ClusterIP   10.96.128.108   <none>        8000/TCP                 6d16h
kubernetes-dashboard   kubernetes-dashboard        NodePort    10.96.119.44    <none>        443:32506/TCP            6d16h
[root@k8s-master01 ~]# kubectl get node -owide 
NAME           STATUS   ROLES           AGE     VERSION   INTERNAL-IP      EXTERNAL-IP   OS-IMAGE                      KERNEL-VERSION                 CONTAINER-RUNTIME
k8s-master01   Ready    control-plane   6d17h   v1.33.5   192.168.200.61   <none>        Rocky Linux 9.6 (Blue Onyx)   5.14.0-570.49.1.el9_6.x86_64   containerd://1.7.28
k8s-master02   Ready    control-plane   6d17h   v1.33.5   192.168.200.62   <none>        Rocky Linux 9.6 (Blue Onyx)   5.14.0-570.49.1.el9_6.x86_64   containerd://1.7.28
k8s-master03   Ready    control-plane   6d17h   v1.33.5   192.168.200.63   <none>        Rocky Linux 9.6 (Blue Onyx)   5.14.0-570.49.1.el9_6.x86_64   containerd://1.7.28
k8s-node01     Ready    <none>          6d17h   v1.33.5   192.168.200.64   <none>        Rocky Linux 9.6 (Blue Onyx)   5.14.0-570.49.1.el9_6.x86_64   containerd://1.7.28
k8s-node02     Ready    <none>          6d17h   v1.33.5   192.168.200.65   <none>        Rocky Linux 9.6 (Blue Onyx)   5.14.0-570.49.1.el9_6.x86_64   containerd://1.7.28
[root@k8s-master01 ~]# kubectl get po -A -owide | grep coredns
kube-system            coredns-746c97786-gz7hp                      1/1     Running   2 (30m ago)   6d17h   172.16.85.201    k8s-node01     <none>           <none>
kube-system            coredns-746c97786-sf7mw                      1/1     Running   2 (30m ago)   6d17h   172.16.85.200    k8s-node01     <none>           <none>

4. Resources can be created normally

kubectl create deploy cluster-test --image=registry.cn-beijing.aliyuncs.com/dotbalo/debug-tools -- sleep 3600
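
kubectl create deploy labels the Pods with app=cluster-test, so you can watch the test Pod come up like this:

# The Pod should reach Running within a minute or so
kubectl get po -l app=cluster-test -w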

5. Pods must be able to resolve Services (same namespace and across namespaces)

a) nslookup kubernetes
b) nslookup kube-dns.kube-system
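
These lookups are meant to run inside a Pod; a sketch using the cluster-test Deployment from item 4 (assuming its image ships nslookup):

# Same-namespace and cross-namespace Service resolution
kubectl exec -it deploy/cluster-test -- nslookup kubernetes
kubectl exec -it deploy/cluster-test -- nslookup kube-dns.kube-system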

6. Every node must be able to reach the kubernetes Service on 443 and the kube-dns Service on 53

[root@k8s-master02 ~]# curl https://10.96.0.1:443 -k
{"kind": "Status","apiVersion": "v1","metadata": {},"status": "Failure","message": "forbidden: User \"system:anonymous\" cannot get path \"/\"","reason": "Forbidden","details": {},"code": 403
}[root@k8s-node02 ~]# curl http://10.96.0.10:53 -k
curl: (52) Empty reply from server

7. Pods must be able to talk to each other (same namespace and across namespaces)
8. Pods must be able to talk to each other (same node and across nodes)
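
A sketch for both checks using the cluster-test Pod from item 4: pick a Pod IP in another namespace and on another node (for example one of the coredns Pods listed earlier) and ping it (assuming the image ships ping):

# Replace 172.16.85.201 with a real Pod IP from `kubectl get po -A -owide`
kubectl exec -it deploy/cluster-test -- ping -c2 172.16.85.201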


Source: https://edu.51cto.com/lecturer/11062970.html
