k8s api server

2025/10/20 6:49:17 / Article source: https://www.cnblogs.com/todaygood/p/19151686

kube-apiserver response times above 3 seconds are a critical performance problem that can undermine cluster stability and reliability. Common causes are high request load, resource contention, etcd problems, and misconfigured admission controllers.

Here is a systematic approach to diagnosing and resolving kube-apiserver latency.

  1. Monitor API server metrics
    Use a monitoring tool like Prometheus and Grafana to examine the API server's metrics. This is the first step to narrowing down the source of the problem.
    Request duration: Look at the apiserver_request_duration_seconds metric, segmented by verb (GET, LIST, POST), resource, group, and component.

High GET/LIST latency indicates potential issues with the underlying etcd storage or the volume of objects being requested.

High POST/PUT latency points to possible delays from admission webhooks or general write performance bottlenecks.

In-flight requests: Check apiserver_current_inflight_requests. A high number can indicate the API server is overloaded and struggling to keep up with the incoming request rate.
Request throttling: Look for apiserver_flowcontrol_rejected_requests_total. A non-zero and rising value indicates that API Priority and Fairness (APF) is rejecting requests, which suggests the API server is saturated.
API server logs: Check the kube-apiserver pod logs for any network-related errors, connection issues, or webhook failures.
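As a rough illustration of how a latency percentile is derived from the apiserver_request_duration_seconds histogram, the sketch below interpolates a quantile from cumulative bucket counts, similar in spirit to PromQL's histogram_quantile. The bucket values in LIST_BUCKETS are made-up sample data, and the function name is hypothetical.

```python
import math
from typing import List, Tuple


def histogram_quantile(q: float, buckets: List[Tuple[float, float]]) -> float:
    """Estimate the q-quantile from cumulative (upper_bound, count) buckets.

    The last bucket is expected to have an upper bound of +Inf.
    """
    buckets = sorted(buckets)
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0
    for bound, count in buckets:
        if count >= rank:
            if math.isinf(bound) or count == prev_count:
                # No finite upper bound (or an empty bucket): fall back to
                # the highest finite boundary seen so far.
                return prev_bound
            # Interpolate linearly inside the bucket.
            fraction = (rank - prev_count) / (count - prev_count)
            return prev_bound + (bound - prev_bound) * fraction
        prev_bound, prev_count = bound, count
    return prev_bound


# Hypothetical cumulative bucket counts for verb="LIST" (seconds, count).
LIST_BUCKETS = [(0.1, 50), (0.5, 80), (1.0, 90), (5.0, 99), (float("inf"), 100)]

p99 = histogram_quantile(0.99, LIST_BUCKETS)
if p99 > 3.0:
    print(f"LIST p99 is about {p99:.2f}s, above the 3s threshold")
```

In practice the same calculation is usually done in Prometheus itself; the sketch only shows what the percentile means when reading the dashboards.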

  2. Identify the source of high API server load

An overloaded API server is one of the most common causes of high latency.
Find noisy clients: Use the Kubernetes audit logs to identify which user agents, service accounts, or pods are making a high volume of requests. Managed Kubernetes services like AKS offer built-in diagnostics to identify noisy clients making excessive LIST calls.
Inspect API Priority and Fairness (APF): Review the APF metrics, such as apiserver_flowcontrol_current_inqueue_requests, to see if a particular priority level has a backlog of queued requests.
Identify inefficient requests: Check for clients making frequent, unoptimized LIST requests. Instead of polling with repeated LIST calls, applications should use watches, which are more efficient.
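One way to find noisy clients is to count requests per user, verb, and resource in the audit log. The sketch below assumes the log is in the JSON-lines format produced by the audit log backend and that events carry the stage, verb, user.username, and objectRef.resource fields; the helper names and sample events are hypothetical.

```python
import json
from collections import Counter
from typing import Iterable, List, Tuple

ClientKey = Tuple[str, str, str]


def top_clients(audit_lines: Iterable[str], n: int = 5) -> List[Tuple[ClientKey, int]]:
    """Return the n most frequent (username, verb, resource) combinations."""
    counts: Counter = Counter()
    for line in audit_lines:
        line = line.strip()
        if not line:
            continue
        event = json.loads(line)
        # A request can be logged at several stages; count it only once.
        if event.get("stage") != "ResponseComplete":
            continue
        key = (
            (event.get("user") or {}).get("username", "unknown"),
            event.get("verb", "unknown"),
            (event.get("objectRef") or {}).get("resource", "unknown"),
        )
        counts[key] += 1
    return counts.most_common(n)


def _event(user: str, verb: str, resource: str, stage: str = "ResponseComplete") -> str:
    """Build a minimal, made-up audit event for demonstration."""
    return json.dumps({
        "stage": stage,
        "verb": verb,
        "user": {"username": user},
        "objectRef": {"resource": resource},
    })


POLLER = "system:serviceaccount:demo:poller"  # hypothetical service account
SAMPLE_AUDIT = (
    [_event(POLLER, "list", "pods")] * 3
    + [_event(POLLER, "list", "pods", stage="RequestReceived")]
    + [_event("admin", "get", "nodes")]
)

for key, count in top_clients(SAMPLE_AUDIT):
    print(count, key)
```

A client that dominates the list with the list verb is a candidate for switching to watches.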

  3. Troubleshoot etcd performance
    The API server relies on etcd for all cluster state data. etcd latency directly impacts API server performance.
    Monitor etcd metrics: Check the etcd_request_duration_seconds metric to measure the latency of read and write requests to the database.
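    Check database size: A large number of objects in etcd can cause performance degradation. Check the etcd_db_total_size_in_bytes or apiserver_storage_db_total_size_in_bytes metric (the name depends on the Kubernetes version) to monitor size. Upstream etcd has a default storage quota of 2 GB, configurable with --quota-backend-bytes; managed services may set a different limit.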
    Defragment etcd: If the etcd database is fragmented, use etcdctl defrag to clean up storage.
    Clean up old resources: Identify and remove old, unused objects, such as completed jobs, to free up etcd space.
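A minimal sketch of the cleanup idea, assuming the Job objects have been exported as JSON (for example the items array from kubectl get jobs -A -o json). It only selects candidates and deletes nothing; the function name and sample data are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List


def finished_jobs_older_than(jobs: List[Dict], max_age: timedelta, now: datetime) -> List[str]:
    """Return namespace/name for Jobs that completed more than max_age ago."""
    stale = []
    for job in jobs:
        completion = job.get("status", {}).get("completionTime")
        if not completion:
            continue  # still running, or finished without a completion time
        finished_at = datetime.strptime(completion, "%Y-%m-%dT%H:%M:%SZ")
        finished_at = finished_at.replace(tzinfo=timezone.utc)
        if now - finished_at > max_age:
            meta = job["metadata"]
            stale.append(f'{meta["namespace"]}/{meta["name"]}')
    return stale


# Made-up sample data in the shape of exported Job objects.
SAMPLE_JOBS = [
    {"metadata": {"namespace": "batch", "name": "report-old"},
     "status": {"completionTime": "2025-09-01T00:00:00Z", "succeeded": 1}},
    {"metadata": {"namespace": "batch", "name": "report-new"},
     "status": {"completionTime": "2025-10-19T00:00:00Z", "succeeded": 1}},
    {"metadata": {"namespace": "batch", "name": "still-running"},
     "status": {"active": 1}},
]

NOW = datetime(2025, 10, 20, tzinfo=timezone.utc)
print(finished_jobs_older_than(SAMPLE_JOBS, timedelta(days=7), NOW))
```

For Jobs specifically, setting spec.ttlSecondsAfterFinished lets the cluster remove finished Jobs automatically, which may make a script like this unnecessary.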

  4. Investigate Admission Controller overhead
    Admission controllers can add latency, especially with multiple validating or mutating webhooks.

Check admission webhook latency: Monitor the apiserver_admission_webhook_admission_duration_seconds metric to identify any webhooks causing delays.

Look for deadlocks: Check the logs for errors related to webhook communication failures, such as "failed calling webhook" or timeout errors.

Tune webhooks: Optimize or disable any slow or unnecessary webhooks. In some cases, you may be able to use built-in ValidatingAdmissionPolicy instead of external webhooks.
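To spot the slowest webhook, the mean latency per webhook can be derived from the _sum and _count series of apiserver_admission_webhook_admission_duration_seconds. The sketch below parses Prometheus text output such as that returned by kubectl get --raw /metrics; it assumes each series carries a name label identifying the webhook, and the sample values are made up.

```python
import re
from collections import defaultdict
from typing import List, Tuple

METRIC = "apiserver_admission_webhook_admission_duration_seconds"
SAMPLE_RE = re.compile(r'^(\w+)\{(.*)\}\s+(\S+)$')
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def mean_webhook_latency(metrics_text: str) -> List[Tuple[str, float]]:
    """Return (webhook name, mean seconds per call), slowest first."""
    sums = defaultdict(float)
    counts = defaultdict(float)
    for line in metrics_text.splitlines():
        match = SAMPLE_RE.match(line.strip())
        if not match:
            continue  # comments, blank lines, unlabeled series
        metric, labels, value = match.groups()
        name = dict(LABEL_RE.findall(labels)).get("name", "unknown")
        if metric == METRIC + "_sum":
            sums[name] += float(value)
        elif metric == METRIC + "_count":
            counts[name] += float(value)
    means = [(name, sums[name] / counts[name]) for name in counts if counts[name] > 0]
    return sorted(means, key=lambda item: item[1], reverse=True)


# Made-up sample in Prometheus text format; webhook names are hypothetical.
SAMPLE_METRICS = """
# TYPE apiserver_admission_webhook_admission_duration_seconds histogram
apiserver_admission_webhook_admission_duration_seconds_sum{name="slow.example.com",operation="CREATE",type="validating"} 120.0
apiserver_admission_webhook_admission_duration_seconds_count{name="slow.example.com",operation="CREATE",type="validating"} 100
apiserver_admission_webhook_admission_duration_seconds_sum{name="fast.example.com",operation="CREATE",type="mutating"} 2.5
apiserver_admission_webhook_admission_duration_seconds_count{name="fast.example.com",operation="CREATE",type="mutating"} 100
"""

for name, seconds in mean_webhook_latency(SAMPLE_METRICS):
    print(f"{name}: {seconds:.3f}s per call")
```

The counters are cumulative since API server start, so comparing two snapshots gives a better picture of recent behavior than a single reading.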

  5. Check cluster resources and network

API server resources: Ensure the kube-apiserver pod has adequate CPU and memory requests and limits configured. A lack of resources will directly impact performance.
etcd cluster resources: For self-hosted etcd, ensure the nodes have sufficient resources, including fast SSD storage.
Network latency: Poor network connectivity between the API server and its clients, or between the API server and etcd, can introduce significant latency.
Test connectivity from the kube-apiserver pod to the etcd endpoints.
Test network latency from a client machine to the kube-apiserver.
Inspect CNI plugins for network issues.
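The connectivity tests above can be approximated by timing TCP connections. This is a sketch only: the function names are hypothetical, and the host and port in the usage note are placeholders (6443 for the API server and 2379 for etcd are common defaults, but a given cluster may differ).

```python
import socket
import statistics
import time
from typing import Callable, Dict


def tcp_connect_probe(host: str, port: int, timeout: float = 3.0) -> Callable[[], None]:
    """Build a probe that opens and closes one TCP connection."""
    def probe() -> None:
        with socket.create_connection((host, port), timeout=timeout):
            pass
    return probe


def measure(probe: Callable[[], None], samples: int = 5) -> Dict[str, float]:
    """Run the probe several times and summarize its duration in seconds."""
    durations = []
    for _ in range(samples):
        start = time.perf_counter()
        probe()
        durations.append(time.perf_counter() - start)
    return {
        "min": min(durations),
        "median": statistics.median(durations),
        "max": max(durations),
    }
```

For example, measure(tcp_connect_probe("10.0.0.10", 2379)) run from the control-plane node would time connections to a hypothetical etcd endpoint. A TCP connect measures only network round trips, not TLS or request handling, so it separates network latency from server-side slowness.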

  6. Address inefficient API calls
    Some API calls can be inherently slow, especially in large clusters.
    Unoptimized LIST requests: Large clusters with thousands of objects can cause LIST operations to become very slow as the API server retrieves and filters objects in memory. Kubernetes has implemented API Streaming to improve memory usage for large lists, but some calls can still be intensive.
    Large objects: A large average object size (e.g., in ConfigMaps or Secrets) can put pressure on both the API server and etcd. Consider splitting large objects or moving data into a different storage backend.
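To find candidates for splitting, objects can be ranked by their serialized size. The sketch below assumes the objects were exported as JSON (for example the items array from kubectl get configmaps -A -o json); the function name and sample data are hypothetical. The JSON size is only an approximation of what etcd stores, since the API server may use a different encoding.

```python
import json
from typing import Dict, List, Tuple


def largest_objects(items: List[Dict], n: int = 3) -> List[Tuple[str, int]]:
    """Return the n largest objects as (kind/namespace/name, bytes)."""
    sized = []
    for item in items:
        meta = item.get("metadata", {})
        label = "/".join([
            item.get("kind", "Unknown"),
            meta.get("namespace", ""),
            meta.get("name", ""),
        ])
        # Compact JSON encoding as a rough proxy for stored size.
        size = len(json.dumps(item, separators=(",", ":")).encode("utf-8"))
        sized.append((label, size))
    return sorted(sized, key=lambda pair: pair[1], reverse=True)[:n]


# Made-up sample objects.
SAMPLE_ITEMS = [
    {"kind": "ConfigMap", "metadata": {"namespace": "app", "name": "small"},
     "data": {"key": "value"}},
    {"kind": "ConfigMap", "metadata": {"namespace": "app", "name": "huge"},
     "data": {"blob": "x" * 100_000}},
]

for label, size in largest_objects(SAMPLE_ITEMS):
    print(f"{size:>8} bytes  {label}")
```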
