Kubernetes (K8s) Study Notes (9): Logging and Monitoring

K8s Logging and Monitoring

  • I. Log and Monitor
    • 1.1 Log
      • 1.1.1 Container level
      • 1.1.2 Pod level
      • 1.1.3 Component/service level
      • 1.1.4 LogPilot + ES + Kibana
    • 1.2 Monitor
      • 1.2.1 Prometheus overview
      • 1.2.2 Prometheus architecture
      • 1.2.3 Prometheus basics
      • 1.2.4 Metrics collection
      • 1.2.5 Prometheus + Grafana
  • II. Troubleshooting
    • 2.1 Master
    • 2.2 Worker
    • 2.3 Addons
    • 2.4 System-level troubleshooting
    • 2.5 Pod troubleshooting

I. Log and Monitor

1.1 Log

1.1.1 Container level

Viewing logs with docker commands:

docker ps                     # find the container ID
docker logs <container-id>    # view that container's logs

Viewing logs with kubectl:

kubectl logs -f <pod-name> -c <container-name>
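
Two standard kubectl logs options that help when the log volume is large (these are stock kubectl flags, nothing project-specific):

kubectl logs --tail=100 <pod-name> -c <container-name>   # only the last 100 lines
kubectl logs --since=1h <pod-name>                       # only entries from the last hour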

1.1.2 Pod level

kubectl describe pod springboot-demo-68b89b96b6-sl8bq

Besides Pod details, kubectl describe can also inspect other resources such as Node, RC, Service, and Namespace.
Note: to target a specific namespace, add -n <namespace>.
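
For example (the resource names are placeholders taken from elsewhere in this post):

kubectl describe node worker01
kubectl describe svc kibana -n kube-system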


1.1.3 Component/service level

For example kube-apiserver, kube-scheduler, kubelet, kube-proxy, kube-controller-manager, and so on.

These can be inspected with journalctl:

journalctl -u kubelet
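
A few generally useful journalctl variations (standard systemd options):

journalctl -u kubelet -f             # follow new entries as they arrive
journalctl -u kubelet --since today  # only today's entries
journalctl -u kubelet -r             # newest entries first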

1.1.4 LogPilot + ES + Kibana

Project page: https://github.com/AliyunContainerService/log-pilot


Deploy log-pilot

(1) Create log-pilot.yaml

---
apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
  name: log-pilot
  namespace: kube-system
  labels:
    k8s-app: log-pilot
    kubernetes.io/cluster-service: "true"
spec:
  template:
    metadata:
      labels:
        k8s-app: log-es
        kubernetes.io/cluster-service: "true"
        version: v1.22
    spec:
      tolerations:
      - key: node-role.kubernetes.io/master
        effect: NoSchedule
      containers:
      - name: log-pilot
        image: registry.cn-hangzhou.aliyuncs.com/log-monitor/log-pilot:0.9-filebeat
        resources:
          limits:
            memory: 200Mi
          requests:
            cpu: 100m
            memory: 200Mi
        env:
        - name: "FILEBEAT_OUTPUT"
          value: "elasticsearch"
        - name: "ELASTICSEARCH_HOST"
          value: "elasticsearch-api"
        - name: "ELASTICSEARCH_PORT"
          value: "9200"
        - name: "ELASTICSEARCH_USER"
          value: "elastic"
        - name: "ELASTICSEARCH_PASSWORD"
          value: "changeme"
        volumeMounts:
        - name: sock
          mountPath: /var/run/docker.sock
        - name: root
          mountPath: /host
          readOnly: true
        - name: varlib
          mountPath: /var/lib/filebeat
        - name: varlog
          mountPath: /var/log/filebeat
        securityContext:
          capabilities:
            add:
            - SYS_ADMIN
      terminationGracePeriodSeconds: 30
      volumes:
      - name: sock
        hostPath:
          path: /var/run/docker.sock
      - name: root
        hostPath:
          path: /
      - name: varlib
        hostPath:
          path: /var/lib/filebeat
          type: DirectoryOrCreate
      - name: varlog
        hostPath:
          path: /var/log/filebeat
          type: DirectoryOrCreate

Command:

kubectl apply -f log-pilot.yaml

(2) Check the Pod and DaemonSet status

kubectl get pods -n kube-system
kubectl get pods -n kube-system -o wide | grep log
kubectl get ds -n kube-system
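
Once the DaemonSet is running, log-pilot decides which logs to collect from `aliyun_logs_*` environment variables declared on application containers (this is log-pilot's documented convention). A minimal hypothetical example — the value `stdout` ships the container's standard output to the configured Elasticsearch under an index named `demo`:

apiVersion: v1
kind: Pod
metadata:
  name: log-demo
spec:
  containers:
  - name: demo
    image: nginx:1.17
    env:
    - name: aliyun_logs_demo   # "demo" becomes the index name; a file path instead of "stdout" collects file logs
      value: "stdout"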

Deploy elasticsearch

(1) Create elasticsearch.yaml

---
apiVersion: v1
kind: Service
metadata:
  name: elasticsearch-api
  namespace: kube-system
  labels:
    name: elasticsearch
spec:
  selector:
    app: es
  ports:
  - name: transport
    port: 9200
    protocol: TCP
---
apiVersion: v1
kind: Service
metadata:
  name: elasticsearch-discovery
  namespace: kube-system
  labels:
    name: elasticsearch
spec:
  selector:
    app: es
  ports:
  - name: transport
    port: 9300
    protocol: TCP
---
apiVersion: apps/v1beta1
kind: StatefulSet
metadata:
  name: elasticsearch
  namespace: kube-system
  labels:
    kubernetes.io/cluster-service: "true"
spec:
  replicas: 3
  serviceName: "elasticsearch-service"
  selector:
    matchLabels:
      app: es
  template:
    metadata:
      labels:
        app: es
    spec:
      tolerations:
      - effect: NoSchedule
        key: node-role.kubernetes.io/master
      initContainers:
      - name: init-sysctl
        image: busybox:1.27
        command:
        - sysctl
        - -w
        - vm.max_map_count=262144
        securityContext:
          privileged: true
      containers:
      - name: elasticsearch
        image: registry.cn-hangzhou.aliyuncs.com/log-monitor/elasticsearch:v5.5.1
        ports:
        - containerPort: 9200
          protocol: TCP
        - containerPort: 9300
          name: transport        # named so the livenessProbe below can reference it
          protocol: TCP
        securityContext:
          capabilities:
            add:
            - IPC_LOCK
            - SYS_RESOURCE
        resources:
          limits:
            memory: 4000Mi
          requests:
            cpu: 100m
            memory: 2000Mi
        env:
        - name: "http.host"
          value: "0.0.0.0"
        - name: "network.host"
          value: "_eth0_"
        - name: "cluster.name"
          value: "docker-cluster"
        - name: "bootstrap.memory_lock"
          value: "false"
        - name: "discovery.zen.ping.unicast.hosts"
          value: "elasticsearch-discovery"
        - name: "discovery.zen.ping.unicast.hosts.resolve_timeout"
          value: "10s"
        - name: "discovery.zen.ping_timeout"
          value: "6s"
        - name: "discovery.zen.minimum_master_nodes"
          value: "2"
        - name: "discovery.zen.fd.ping_interval"
          value: "2s"
        - name: "discovery.zen.no_master_block"
          value: "write"
        - name: "gateway.expected_nodes"
          value: "2"
        - name: "gateway.expected_master_nodes"
          value: "1"
        - name: "transport.tcp.connect_timeout"
          value: "60s"
        - name: "ES_JAVA_OPTS"
          value: "-Xms2g -Xmx2g"
        livenessProbe:
          tcpSocket:
            port: transport
          initialDelaySeconds: 20
          periodSeconds: 10
        volumeMounts:
        - name: es-data
          mountPath: /data
      terminationGracePeriodSeconds: 30
      volumes:
      - name: es-data
        hostPath:
          path: /es-data

Commands:

kubectl apply -f elasticsearch.yaml
kubectl get pods -n kube-system
kubectl get pods -n kube-system -o wide | grep ela

(2) Check the Services in kube-system

kubectl get svc -n kube-system
NAME                      TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)
elasticsearch-api         ClusterIP   10.106.65.2      <none>        9200/TCP
elasticsearch-discovery   ClusterIP   10.101.117.180   <none>        9300/TCP
kube-dns                  ClusterIP   10.96.0.10       <none>

(3) Check the StatefulSet in kube-system

kubectl get statefulset -n kube-system
NAME READY AGE
elasticsearch 3/3 106s

Deploy kibana

(1) Create kibana.yaml

Kibana is what users access from outside the cluster, so a Service and an Ingress are configured for it here.
Prerequisite: an Ingress controller must be available, e.g. the NGINX Ingress Controller.

---
# Deployment
apiVersion: apps/v1beta1
kind: Deployment
metadata:
  name: kibana
  namespace: kube-system
  labels:
    component: kibana
spec:
  replicas: 1
  selector:
    matchLabels:
      component: kibana
  template:
    metadata:
      labels:
        component: kibana
    spec:
      containers:
      - name: kibana
        image: registry.cn-hangzhou.aliyuncs.com/log-monitor/kibana:v5.5.1
        env:
        - name: CLUSTER_NAME
          value: docker-cluster
        - name: ELASTICSEARCH_URL
          value: http://elasticsearch-api:9200/
        resources:
          limits:
            cpu: 1000m
          requests:
            cpu: 100m
        ports:
        - containerPort: 5601
          name: http
---
# Service
apiVersion: v1
kind: Service
metadata:
  name: kibana
  namespace: kube-system
  labels:
    component: kibana
spec:
  selector:
    component: kibana
  ports:
  - name: http
    port: 80
    targetPort: http
---
# Ingress
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: kibana
  namespace: kube-system
spec:
  rules:
  - host: log.k8s.itcrazy2016.com
    http:
      paths:
      - path: /
        backend:
          serviceName: kibana
          servicePort: 80

Command:

kubectl apply -f kibana.yaml

(2) Check the Pod and Deployment status

kubectl get pods -n kube-system | grep ki
kubectl get deploy -n kube-system

(3) Configure the domain name required by the Ingress

Open the hosts file on the Windows machine and add an entry; the hostname must match the host rule in the Ingress above:

# note: this is worker01's IP (a node where the Ingress controller is listening)
121.41.10.126 log.k8s.itcrazy2016.com

(4) Open log.k8s.itcrazy2016.com in the Windows browser



1.2 Monitor

1.2.1 Prometheus overview

Official site: https://prometheus.io/
GitHub: https://github.com/kubernetes/kubernetes/tree/master/cluster/addons/prometheus


1.2.2 Prometheus architecture

(figure: the Prometheus architecture diagram — the server scrapes targets/exporters, stores the time series locally, evaluates rules for Alertmanager, and serves PromQL queries to clients such as Grafana)


1.2.3 Prometheus basics

  • Supports both pull and push styles of metric ingestion
  • Supports Kubernetes service discovery
  • Provides PromQL, its own query language (see the sketch below)
  • A time series is identified by a metric name plus a set of key/value labels
  • Four metric data types: Counter, Gauge, Histogram, and Summary
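
As a small taste of PromQL — assuming the node-exporter (>= 0.16, deployed later in this post) is being scraped:

# average per-second idle CPU time over the last 5 minutes, per instance
avg by (instance) (irate(node_cpu_seconds_total{mode="idle"}[5m]))

# turned into a "CPU usage %" expression (the alert rule further down uses exactly this shape)
100 - avg by (instance) (irate(node_cpu_seconds_total{mode="idle"}[5m])) * 100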

1.2.4 Metrics collection

1.2.4.1 Server (node) metrics

Collected via the NodeExporter: https://github.com/prometheus/node_exporter
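
A quick sanity check against a running node-exporter (it serves plain-text metrics on port 9100 by default):

curl -s http://localhost:9100/metrics | head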


1.2.4.2 Component metrics

ETCD:https://ip:2379/metrics
APIServer:https://ip:6443/metrics
ControllerManager:https://ip:10252/metrics
Scheduler:https://ip:10251/metrics
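
A hedged sketch of checking the apiserver endpoint with a ServiceAccount token — this uses the prometheus ServiceAccount created in section 1.2.5 below; any token with metrics access works. (Pre-1.24 clusters auto-create a token Secret per ServiceAccount, and on some cluster versions the scheduler and controller-manager endpoints are plain HTTP instead of HTTPS.)

TOKEN=$(kubectl -n ns-monitor get secret \
  $(kubectl -n ns-monitor get sa prometheus -o jsonpath='{.secrets[0].name}') \
  -o jsonpath='{.data.token}' | base64 -d)
curl -sk -H "Authorization: Bearer $TOKEN" https://<master-ip>:6443/metrics | head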


1.2.4.3 Container metrics

Container-level metrics come from cAdvisor, which is built into the kubelet (see the kubernetes-cadvisor scrape job in the Prometheus configuration below).
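
The same data can be pulled through the apiserver proxy, which is exactly the path the Prometheus scrape config below uses (the node name is a placeholder):

kubectl get --raw /api/v1/nodes/<node-name>/proxy/metrics/cadvisor | head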


1.2.5 Prometheus + Grafana

Create a prometheus directory on the master and copy in the YAML files from the course materials (网盘/课堂源码/*.yaml):

namespace.yaml
node-exporter.yaml
prometheus.yaml
grafana.yaml
ingress.yaml

(1) Create the ns-monitor namespace

namespace.yaml

apiVersion: v1
kind: Namespace
metadata:
  name: ns-monitor
  labels:
    name: ns-monitor

Commands:

kubectl apply -f namespace.yaml
kubectl get namespace

(2) Create the node-exporter

node-exporter.yaml

kind: DaemonSet
apiVersion: apps/v1beta2
metadata:
  labels:
    app: node-exporter
  name: node-exporter
  namespace: ns-monitor
spec:
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: node-exporter
  template:
    metadata:
      labels:
        app: node-exporter
    spec:
      containers:
      - name: node-exporter
        image: prom/node-exporter:v0.16.0
        ports:
        - containerPort: 9100
          protocol: TCP
          name: http
      hostNetwork: true
      hostPID: true
      tolerations:
      - effect: NoSchedule
        operator: Exists
---
kind: Service
apiVersion: v1
metadata:
  labels:
    app: node-exporter
  name: node-exporter-service
  namespace: ns-monitor
spec:
  ports:
  - name: http
    port: 9100
    nodePort: 31672
    protocol: TCP
  type: NodePort
  selector:
    app: node-exporter

Commands:

kubectl apply -f node-exporter.yaml
kubectl get pod -n ns-monitor
kubectl get svc -n ns-monitor
kubectl get ds -n ns-monitor
From a Windows browser, visit any cluster node IP, e.g. http://121.41.10.126:31672, to check the result  # note: plain HTTP, not HTTPS

(3) Deploy the Prometheus Pod (including RBAC objects, ConfigMaps, PV/PVC, Deployment, and Service)

Create prometheus.yaml:

Note: remember to change the IP in prometheus.yaml to your master's IP, and adjust the NFS path (both are used by the PersistentVolume).

---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRole
metadata:
  name: prometheus
rules:
- apiGroups: [""] # "" indicates the core API group
  resources:
  - nodes
  - nodes/proxy
  - services
  - endpoints
  - pods
  verbs:
  - get
  - watch
  - list
- apiGroups:
  - extensions
  resources:
  - ingresses
  verbs:
  - get
  - watch
  - list
- nonResourceURLs: ["/metrics"]
  verbs:
  - get
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus
  namespace: ns-monitor
  labels:
    app: prometheus
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRoleBinding
metadata:
  name: prometheus
subjects:
- kind: ServiceAccount
  name: prometheus
  namespace: ns-monitor
roleRef:
  kind: ClusterRole
  name: prometheus
  apiGroup: rbac.authorization.k8s.io
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-conf
  namespace: ns-monitor
  labels:
    app: prometheus
data:
  prometheus.yml: |-
    # my global config
    global:
      scrape_interval:     15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
      evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
      # scrape_timeout is set to the global default (10s).

    # Alertmanager configuration
    alerting:
      alertmanagers:
      - static_configs:
        - targets:
          # - alertmanager:9093

    # Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
    rule_files:
      - "/etc/prometheus/rules/*.rule" # load the rules mounted from the prometheus-rules ConfigMap below
      # - "first_rules.yml"
      # - "second_rules.yml"

    scrape_configs:
      # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
      - job_name: 'prometheus'
        # metrics_path defaults to '/metrics'; scheme defaults to 'http'.
        static_configs:
        - targets: ['localhost:9090']

      - job_name: 'grafana'
        static_configs:
        - targets:
          - 'grafana-service.ns-monitor:3000'

      # Scrape the apiservers via the default/kubernetes service endpoints.
      - job_name: 'kubernetes-apiservers'
        kubernetes_sd_configs:
        - role: endpoints
        # Scrape over https using the in-cluster CA and ServiceAccount token; discovery
        # auth is automatic when Prometheus runs inside the cluster.
        scheme: https
        tls_config:
          ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
          # If your certificates are self-signed or use a different CA, certificate
          # verification can be disabled (only in a controlled environment) with:
          # insecure_skip_verify: true
        bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
        # Keep only the default/kubernetes service endpoints for the https port.
        relabel_configs:
        - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
          action: keep
          regex: default;kubernetes;https

      # Scrape config for nodes (kubelet). The scrape is proxied through the apiserver
      # rather than connecting to each node directly, so it also works if Prometheus
      # cannot reach the nodes (e.g. because of firewalling).
      - job_name: 'kubernetes-nodes'
        scheme: https
        tls_config:
          ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
        kubernetes_sd_configs:
        - role: node
        relabel_configs:
        - action: labelmap
          regex: __meta_kubernetes_node_label_(.+)
        - target_label: __address__
          replacement: kubernetes.default.svc:443
        - source_labels: [__meta_kubernetes_node_name]
          regex: (.+)
          target_label: __metrics_path__
          replacement: /api/v1/nodes/${1}/proxy/metrics

      # Scrape config for Kubelet cAdvisor. Required for Kubernetes 1.7.3 and later,
      # where cAdvisor metrics (names beginning with 'container_') were removed from
      # the kubelet metrics endpoint; this job scrapes the cAdvisor endpoint instead.
      - job_name: 'kubernetes-cadvisor'
        scheme: https
        tls_config:
          ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
        kubernetes_sd_configs:
        - role: node
        relabel_configs:
        - action: labelmap
          regex: __meta_kubernetes_node_label_(.+)
        - target_label: __address__
          replacement: kubernetes.default.svc:443
        - source_labels: [__meta_kubernetes_node_name]
          regex: (.+)
          target_label: __metrics_path__
          replacement: /api/v1/nodes/${1}/proxy/metrics/cadvisor

      # Scrape config for service endpoints, configured via annotations on the Service:
      #   prometheus.io/scrape: only scrape services annotated with 'true'
      #   prometheus.io/scheme: set to 'https' if the metrics endpoint is secured
      #   prometheus.io/path:   override if the metrics path is not /metrics
      #   prometheus.io/port:   override if metrics are exposed on a non-service port
      - job_name: 'kubernetes-service-endpoints'
        kubernetes_sd_configs:
        - role: endpoints
        relabel_configs:
        - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
          action: keep
          regex: true
        - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scheme]
          action: replace
          target_label: __scheme__
          regex: (https?)
        - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
          action: replace
          target_label: __metrics_path__
          regex: (.+)
        - source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
          action: replace
          target_label: __address__
          regex: ([^:]+)(?::\d+)?;(\d+)
          replacement: $1:$2
        - action: labelmap
          regex: __meta_kubernetes_service_label_(.+)
        - source_labels: [__meta_kubernetes_namespace]
          action: replace
          target_label: kubernetes_namespace
        - source_labels: [__meta_kubernetes_service_name]
          action: replace
          target_label: kubernetes_name

      # Example scrape config for probing services via the Blackbox Exporter
      # (prometheus.io/probe: only probe services annotated with 'true').
      - job_name: 'kubernetes-services'
        metrics_path: /probe
        params:
          module: [http_2xx]
        kubernetes_sd_configs:
        - role: service
        relabel_configs:
        - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_probe]
          action: keep
          regex: true
        - source_labels: [__address__]
          target_label: __param_target
        - target_label: __address__
          replacement: blackbox-exporter.example.com:9115
        - source_labels: [__param_target]
          target_label: instance
        - action: labelmap
          regex: __meta_kubernetes_service_label_(.+)
        - source_labels: [__meta_kubernetes_namespace]
          target_label: kubernetes_namespace
        - source_labels: [__meta_kubernetes_service_name]
          target_label: kubernetes_name

      # Example scrape config for probing ingresses via the Blackbox Exporter.
      - job_name: 'kubernetes-ingresses'
        metrics_path: /probe
        params:
          module: [http_2xx]
        kubernetes_sd_configs:
        - role: ingress
        relabel_configs:
        - source_labels: [__meta_kubernetes_ingress_annotation_prometheus_io_probe]
          action: keep
          regex: true
        - source_labels: [__meta_kubernetes_ingress_scheme,__address__,__meta_kubernetes_ingress_path]
          regex: (.+);(.+);(.+)
          replacement: ${1}://${2}${3}
          target_label: __param_target
        - target_label: __address__
          replacement: blackbox-exporter.example.com:9115
        - source_labels: [__param_target]
          target_label: instance
        - action: labelmap
          regex: __meta_kubernetes_ingress_label_(.+)
        - source_labels: [__meta_kubernetes_namespace]
          target_label: kubernetes_namespace
        - source_labels: [__meta_kubernetes_ingress_name]
          target_label: kubernetes_name

      # Scrape config for pods, configured via prometheus.io/scrape|path|port
      # annotations on the Pod.
      - job_name: 'kubernetes-pods'
        kubernetes_sd_configs:
        - role: pod
        relabel_configs:
        - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
          action: keep
          regex: true
        - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
          action: replace
          target_label: __metrics_path__
          regex: (.+)
        - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
          action: replace
          regex: ([^:]+)(?::\d+)?;(\d+)
          replacement: $1:$2
          target_label: __address__
        - action: labelmap
          regex: __meta_kubernetes_pod_label_(.+)
        - source_labels: [__meta_kubernetes_namespace]
          action: replace
          target_label: kubernetes_namespace
        - source_labels: [__meta_kubernetes_pod_name]
          action: replace
          target_label: kubernetes_pod_name
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-rules
  namespace: ns-monitor
  labels:
    app: prometheus
data:
  cpu-usage.rule: |
    groups:
      - name: NodeCPUUsage
        rules:
          - alert: NodeCPUUsage
            # note: node-exporter >= 0.16 renamed node_cpu to node_cpu_seconds_total
            expr: (100 - (avg by (instance) (irate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)) > 75
            for: 2m
            labels:
              severity: "page"
            annotations:
              summary: "{{$labels.instance}}: High CPU usage detected"
              description: "{{$labels.instance}}: CPU usage is above 75% (current value is: {{ $value }})"
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: "prometheus-data-pv"
  labels:
    name: prometheus-data-pv
    release: stable
spec:
  capacity:
    storage: 5Gi
  accessModes:
  - ReadWriteOnce
  persistentVolumeReclaimPolicy: Recycle
  nfs:
    path: /nfs/data/prometheus
    server: 121.41.10.13
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: prometheus-data-pvc
  namespace: ns-monitor
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi
  selector:
    matchLabels:
      name: prometheus-data-pv
      release: stable
---
kind: Deployment
apiVersion: apps/v1beta2
metadata:
  labels:
    app: prometheus
  name: prometheus
  namespace: ns-monitor
spec:
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: prometheus
  template:
    metadata:
      labels:
        app: prometheus
    spec:
      serviceAccountName: prometheus
      securityContext:
        runAsUser: 0
      containers:
      - name: prometheus
        image: prom/prometheus:latest
        imagePullPolicy: IfNotPresent
        volumeMounts:
        - mountPath: /prometheus
          name: prometheus-data-volume
        - mountPath: /etc/prometheus/prometheus.yml
          name: prometheus-conf-volume
          subPath: prometheus.yml
        - mountPath: /etc/prometheus/rules
          name: prometheus-rules-volume
        ports:
        - containerPort: 9090
          protocol: TCP
      volumes:
      - name: prometheus-data-volume
        persistentVolumeClaim:
          claimName: prometheus-data-pvc
      - name: prometheus-conf-volume
        configMap:
          name: prometheus-conf
      - name: prometheus-rules-volume
        configMap:
          name: prometheus-rules
      tolerations:
      - key: node-role.kubernetes.io/master
        effect: NoSchedule
---
kind: Service
apiVersion: v1
metadata:
  annotations:
    prometheus.io/scrape: 'true'
  labels:
    app: prometheus
  name: prometheus-service
  namespace: ns-monitor
spec:
  ports:
  - port: 9090
    targetPort: 9090
  selector:
    app: prometheus
  type: NodePort

Commands:

kubectl apply -f prometheus.yaml
kubectl get pod -n ns-monitor
kubectl get svc -n ns-monitor
From a Windows browser, open the Prometheus UI through the NodePort of prometheus-service, e.g. http://121.41.10.126:32405/graph
(the NodePort is assigned randomly — read the actual port from kubectl get svc -n ns-monitor)

(4) Deploy Grafana

Create grafana.yaml:


apiVersion: v1
kind: PersistentVolume
metadata:
  name: "grafana-data-pv"
  labels:
    name: grafana-data-pv
    release: stable
spec:
  capacity:
    storage: 5Gi
  accessModes:
  - ReadWriteOnce
  persistentVolumeReclaimPolicy: Recycle
  nfs:
    path: /nfs/data/grafana
    server: 121.41.10.13
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: grafana-data-pvc
  namespace: ns-monitor
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi
  selector:
    matchLabels:
      name: grafana-data-pv
      release: stable
---
kind: Deployment
apiVersion: apps/v1beta2
metadata:
  labels:
    app: grafana
  name: grafana
  namespace: ns-monitor
spec:
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: grafana
  template:
    metadata:
      labels:
        app: grafana
    spec:
      securityContext:
        runAsUser: 0
      containers:
      - name: grafana
        image: grafana/grafana:latest
        imagePullPolicy: IfNotPresent
        env:
        - name: GF_AUTH_BASIC_ENABLED
          value: "true"
        - name: GF_AUTH_ANONYMOUS_ENABLED
          value: "false"
        readinessProbe:
          httpGet:
            path: /login
            port: 3000
        volumeMounts:
        - mountPath: /var/lib/grafana
          name: grafana-data-volume
        ports:
        - containerPort: 3000
          protocol: TCP
      volumes:
      - name: grafana-data-volume
        persistentVolumeClaim:
          claimName: grafana-data-pvc
---
kind: Service
apiVersion: v1
metadata:
  labels:
    app: grafana
  name: grafana-service
  namespace: ns-monitor
spec:
  ports:
  - port: 3000
    targetPort: 3000
  selector:
    app: grafana
  type: NodePort

Commands:

kubectl apply -f grafana.yaml
kubectl get pod -n ns-monitor
kubectl get svc -n ns-monitor
From a Windows browser, open the Grafana login page through the NodePort of grafana-service, e.g. http://121.41.10.126:32727 (default username/password: admin/admin)

(5) Add domain-based access [a dashboard without a domain name has no soul]

Create ingress.yaml

Prerequisite: an Ingress controller and name resolution for the chosen host are already configured.
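
The original ingress.yaml content did not survive in this post; a minimal sketch following the same pattern as the Kibana Ingress above might look like this — the resource name and host below are placeholders, so substitute your own domain:

apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: grafana-ingress            # hypothetical name
  namespace: ns-monitor
spec:
  rules:
  - host: monitor.yourdomain.com   # placeholder: must match your DNS/hosts entry
    http:
      paths:
      - path: /
        backend:
          serviceName: grafana-service
          servicePort: 3000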

Commands:

kubectl apply -f ingress.yaml
kubectl get ingress -n ns-monitor
kubectl describe ingress -n ns-monitor

(6) Then access Grafana directly through the domain name.



II. Troubleshooting

2.1 Master

The components on the master together form the control plane.

01 If the apiserver fails
   the entire K8s cluster becomes unusable, because the apiserver is the brain of the cluster.
02 If etcd fails
   the apiserver can no longer read or write cluster state, and kubelets cannot report the status of their nodes.
03 If the scheduler or controller manager fails
   Deployments, Pods, Services, and so on can no longer be scheduled or reconciled properly.

Solution: monitor the components so they are restarted automatically on failure, or build a highly available multi-master control plane.


2.2 Worker

If a worker node goes down, or the kubelet service on it fails, the Pods on that node cannot run properly.


2.3 Addons

When DNS or a network plugin such as Calico fails, Pods in the cluster cannot communicate normally and service names can no longer be resolved. A quick check, as sketched below, is to look at the addon Pods in kube-system and run a DNS lookup from a throwaway Pod.
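
A hedged triage sketch (busybox:1.28 is commonly used here because nslookup is broken in many later busybox images):

kubectl get pods -n kube-system | grep -E 'dns|calico'
kubectl run -it --rm dns-test --image=busybox:1.28 --restart=Never -- nslookup kubernetes.default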


2.4 System-level troubleshooting

Check node status:

kubectl get nodes
kubectl describe node <node-name>

Check the logs of the master and worker components:

journalctl -u kube-apiserver
journalctl -u kube-scheduler
journalctl -u kubelet
journalctl -u kube-proxy
...

(On kubeadm-installed clusters the control-plane components run as static Pods rather than systemd units; inspect them with kubectl logs -n kube-system <pod-name> instead.)

2.5 Pod troubleshooting

The Pod is the smallest unit of operation in K8s, and the most important one; troubleshooting other resources follows the same approach as for Pods.

(1) Check how the Pods are running

kubectl get pods -n <namespace>

(2) Look at the Pod's detailed description to locate the problem

kubectl describe pod <pod-name> -n <namespace>

(3) Check whether the Pod's YAML definition is correct

kubectl get pod <pod-name> -o yaml

(4) Check the Pod's logs

kubectl logs ...
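
For a Pod that keeps restarting (see CrashLoopBackOff below), the logs of the previous container instance are usually what you want:

kubectl logs <pod-name> -n <namespace> --previous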

Problems a Pod may run into, and how to approach them (a few generic triage commands follow the list):

01 Pending
   The Pod has not been scheduled onto a node yet; describe it for details. Typical causes: insufficient resources, a required port already in use, etc.

02 Waiting / ContainerCreating
   Possibly an image pull failure, a network plugin problem (e.g. Calico), or an issue with the container itself — check the Pod's YAML and the Dockerfile.

03 ImagePullBackOff
   The image pull failed: the image may not exist, or there is no permission to pull it.

04 CrashLoopBackOff
   The Pod started successfully before, but keeps failing and being restarted.

05 Error
   Something the Pod depends on is missing — a ConfigMap, a PV, RBAC permissions, etc. — and needs to be created or granted.

06 Terminating
   The Pod is in the process of shutting down.

07 Unknown
   K8s has lost contact with the Pod (typically the node or its kubelet is unreachable).
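
A generic triage sequence that covers most of the states above (names are placeholders):

kubectl get pod <pod-name> -n <namespace> -o wide             # node, restart count, Pod IP
kubectl describe pod <pod-name> -n <namespace>                # the Events section usually names the cause
kubectl get events -n <namespace> --sort-by='.lastTimestamp'  # recent events, newest last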
