`
lykops
  • 浏览: 89280 次
  • 性别: Icon_minigender_1
  • 来自: 深圳
文章分类
社区版块
存档分类
最新评论

kubernetes监控--Prometheus

 
阅读更多

本文基于kubernetes 1.5.2版本编写

kube-state-metrics

kubectl create ns monitoring
kubectl create sa -n monitoring kube-state-metrics

cat << EOF > kube-state-metrics.yaml
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: kube-state-metrics
  namespace: monitoring
spec:
  replicas: 1
  template:
    metadata:
      labels:
        app: kube-state-metrics
    spec:
      serviceAccountName: kube-state-metrics
      containers:
      - name: kube-state-metrics
        image: quay.io/coreos/kube-state-metrics
        ports:
        - containerPort: 8080
EOF
kubectl create -f kube-state-metrics.yaml

cat << EOF > kube-state-metrics-svc.yaml
apiVersion: v1
kind: Service
metadata:
  annotations:
    prometheus.io/scrape: 'true'
  name: kube-state-metrics
  namespace: monitoring
  labels:
    app: kube-state-metrics
spec:
  ports:
  - name: kube-state-metrics
    port: 8080
    protocol: TCP
  selector:
    app: kube-state-metrics
EOF
kubectl create -f kube-state-metrics-svc.yaml

prom-node-exporter

cat << EOF > prom-node-exporter.yaml 
apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
  name: prometheus-node-exporter
  namespace: monitoring
  labels:
    app: prometheus
    component: node-exporter
spec:
  template:
    metadata:
      name: prometheus-node-exporter
      labels:
        app: prometheus
        component: node-exporter
    spec:
      containers:
      - image: docker.io/prom/node-exporter:v0.14.0
        name: prometheus-node-exporter
        ports:
        - name: prom-node-exp
          #^ must be an IANA_SVC_NAME (at most 15 characters, ..)
          containerPort: 9100
          hostPort: 9100
      hostNetwork: true
      hostPID: true
EOF
kubectl create -f prom-node-exporter.yaml 


cat << EOF > prom-node-exporter-svc.yaml
apiVersion: v1
kind: Service
metadata:
  annotations:
    prometheus.io/scrape: 'true'
  name: prometheus-node-exporter
  namespace: monitoring
  labels:
    app: prometheus
    component: node-exporter
spec:
  #clusterIP: None
  ports:
    - name: prometheus-node-exporter
      port: 9100
      protocol: TCP
  selector:
    app: prometheus
    component: node-exporter
  type: ClusterIP
EOF
kubectl create -f prom-node-exporter-svc.yaml 

node-directory-size-metrics

cat node-directory-size-metrics.yaml 
apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
  name: node-directory-size-metrics
  namespace: monitoring
  annotations:
    description: |
      This `DaemonSet` provides metrics in Prometheus format about disk usage on the nodes.
      The container `read-du` reads in sizes of all directories below /mnt and writes that to `/tmp/metrics`. It only reports directories larger then `100M` for now.
      The other container `caddy` just hands out the contents of that file on request via `http` on `/metrics` at port `9102` which are the defaults for Prometheus.
      These are scheduled on every node in the Kubernetes cluster.
      To choose directories from the node to check, just mount them on the `read-du` container below `/mnt`.
spec:
  template:
    metadata:
      labels:
        app: node-directory-size-metrics
      annotations:
        prometheus.io/scrape: 'true'
        prometheus.io/port: '9102'
        description: |
          This `Pod` provides metrics in Prometheus format about disk usage on the node.
          The container `read-du` reads in sizes of all directories below /mnt and writes that to `/tmp/metrics`. It only reports directories larger then `100M` for now.
          The other container `caddy` just hands out the contents of that file on request on `/metrics` at port `9102` which are the defaults for Prometheus.
          This `Pod` is scheduled on every node in the Kubernetes cluster.
          To choose directories from the node to check just mount them on `read-du` below `/mnt`.
    spec:
      containers:
      - name: read-du
        image: giantswarm/tiny-tools
        imagePullPolicy: Always
        # FIXME threshold via env var
        # The
        command:
        - fish
        - --command
        - |
          touch /tmp/metrics-temp
          while true
            for directory in (du --bytes --separate-dirs --threshold=100M /mnt)
              echo $directory | read size path
              echo "node_directory_size_bytes{path=\"$path\"} $size" \
                >> /tmp/metrics-temp
            end
            mv /tmp/metrics-temp /tmp/metrics
            sleep 300
          end
        volumeMounts:
        - name: host-fs-var
          mountPath: /mnt/var
          readOnly: true
        - name: metrics
          mountPath: /tmp
      - name: caddy
        image: dockermuenster/caddy:latest
        command:
        - "caddy"
        - "-port=9102"
        - "-root=/var/www"
        ports:
        - containerPort: 9102
        volumeMounts:
        - name: metrics
          mountPath: /var/www
      volumes:
      - name: host-fs-var
        hostPath:
          path: /var
      - name: metrics
        emptyDir:
          medium: Memory
kubectl create -f node-directory-size-metrics.yaml

prometheus

cat prometheus-configmap.yaml 
apiVersion: v1
kind: ConfigMap
metadata:
  creationTimestamp: null
  name: prometheus-core
  namespace: monitoring
data:
  prometheus.yaml: |
    global:
      scrape_interval: 30s
      scrape_timeout: 30s
      evaluation_interval: 30s
    rule_files:
      - "/etc/prometheus-rules/*.rules"
    scrape_configs:

      # https://github.com/prometheus/prometheus/blob/master/documentation/examples/prometheus-kubernetes.yml#L37
      - job_name: 'kubernetes-nodes'
        tls_config:
          ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
        kubernetes_sd_configs:
          - role: node
        relabel_configs:
          - source_labels: [__address__]
            regex: '(.*):10250'
            replacement: '${1}:10255'
            target_label: __address__

      # https://github.com/prometheus/prometheus/blob/master/documentation/examples/prometheus-kubernetes.yml#L79
      - job_name: 'kubernetes-endpoints'
        kubernetes_sd_configs:
          - role: endpoints
        relabel_configs:
          - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
            action: keep
            regex: true
          - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scheme]
            action: replace
            target_label: __scheme__
            regex: (https?)
          - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
            action: replace
            target_label: __metrics_path__
            regex: (.+)
          - source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
            action: replace
            target_label: __address__
            regex: (.+)(?::\d+);(\d+)
            replacement: $1:$2
          - action: labelmap
            regex: __meta_kubernetes_service_label_(.+)
          - source_labels: [__meta_kubernetes_namespace]
            action: replace
            target_label: kubernetes_namespace
          - source_labels: [__meta_kubernetes_service_name]
            action: replace
            target_label: kubernetes_name

      # https://github.com/prometheus/prometheus/blob/master/documentation/examples/prometheus-kubernetes.yml#L119
      - job_name: 'kubernetes-services'
        metrics_path: /probe
        params:
          module: [http_2xx]
        kubernetes_sd_configs:
          - role: service
        relabel_configs:
          - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_probe]
            action: keep
            regex: true
          - source_labels: [__address__]
            target_label: __param_target
          - target_label: __address__
            replacement: blackbox
          - source_labels: [__param_target]
            target_label: instance
          - action: labelmap
            regex: __meta_kubernetes_service_label_(.+)
          - source_labels: [__meta_kubernetes_namespace]
            target_label: kubernetes_namespace
          - source_labels: [__meta_kubernetes_service_name]
            target_label: kubernetes_name

      # https://github.com/prometheus/prometheus/blob/master/documentation/examples/prometheus-kubernetes.yml#L156
      - job_name: 'kubernetes-pods'
        kubernetes_sd_configs:
          - role: pod
        relabel_configs:
          - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
            action: keep
            regex: true
          - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
            action: replace
            target_label: __metrics_path__
            regex: (.+)
          - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
            action: replace
            regex: (.+):(?:\d+);(\d+)
            replacement: ${1}:${2}
            target_label: __address__
          - action: labelmap
            regex: __meta_kubernetes_pod_label_(.+)
          - source_labels: [__meta_kubernetes_namespace]
            action: replace
            target_label: kubernetes_namespace
          - source_labels: [__meta_kubernetes_pod_name]
            action: replace
            target_label: kubernetes_pod_name
          - source_labels: [__meta_kubernetes_pod_container_port_number]
            action: keep
            regex: 9\d{3}
kubectl create -f prometheus-configmap.yaml 


cat prometheus-rules.yaml 
apiVersion: v1
data:
  cpu-usage.rules: |
    ALERT NodeCPUUsage
      IF (100 - (avg by (instance) (irate(node_cpu{name="node-exporter",mode="idle"}[5m])) * 100)) > 75
      FOR 2m
      LABELS {
        severity="page"
      }
      ANNOTATIONS {
        SUMMARY = "{{$labels.instance}}: High CPU usage detected",
        DESCRIPTION = "{{$labels.instance}}: CPU usage is above 75% (current value is: {{ $value }})"
      }
  instance-availability.rules: |
    ALERT InstanceDown
      IF up == 0
      FOR 1m
      LABELS { severity = "page" }
      ANNOTATIONS {
        summary = "Instance {{ $labels.instance }} down",
        description = "{{ $labels.instance }} of job {{ $labels.job }} has been down for more than 1 minute.",
      }
  low-disk-space.rules: |
    ALERT NodeLowRootDisk
      IF ((node_filesystem_size{mountpoint="/root-disk"} - node_filesystem_free{mountpoint="/root-disk"} ) / node_filesystem_size{mountpoint="/root-disk"} * 100) > 75
      FOR 2m
      LABELS {
        severity="page"
      }
      ANNOTATIONS {
        SUMMARY = "{{$labels.instance}}: Low root disk space",
        DESCRIPTION = "{{$labels.instance}}: Root disk usage is above 75% (current value is: {{ $value }})"
      }

    ALERT NodeLowDataDisk
      IF ((node_filesystem_size{mountpoint="/data-disk"} - node_filesystem_free{mountpoint="/data-disk"} ) / node_filesystem_size{mountpoint="/data-disk"} * 100) > 75
      FOR 2m
      LABELS {
        severity="page"
      }
      ANNOTATIONS {
        SUMMARY = "{{$labels.instance}}: Low data disk space",
        DESCRIPTION = "{{$labels.instance}}: Data disk usage is above 75% (current value is: {{ $value }})"
      }
  mem-usage.rules: |
    ALERT NodeSwapUsage
      IF (((node_memory_SwapTotal-node_memory_SwapFree)/node_memory_SwapTotal)*100) > 75
      FOR 2m
      LABELS {
        severity="page"
      }
      ANNOTATIONS {
        SUMMARY = "{{$labels.instance}}: Swap usage detected",
        DESCRIPTION = "{{$labels.instance}}: Swap usage usage is above 75% (current value is: {{ $value }})"
      }

    ALERT NodeMemoryUsage
      IF (((node_memory_MemTotal-node_memory_MemFree-node_memory_Cached)/(node_memory_MemTotal)*100)) > 75
      FOR 2m
      LABELS {
        severity="page"
      }
      ANNOTATIONS {
        SUMMARY = "{{$labels.instance}}: High memory usage detected",
        DESCRIPTION = "{{$labels.instance}}: Memory usage is above 75% (current value is: {{ $value }})"
      }
kind: ConfigMap
metadata:
  creationTimestamp: null
  name: prometheus-rules
  namespace: monitoring
kubectl create -f prometheus-rules.yaml

kubectl create sa prometheus-k8s -n monitoring


cat << EOF > prometheus-core-deploy.yaml 
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: prometheus-core
  namespace: monitoring
  labels:
    app: prometheus
    component: core
spec:
  replicas: 1
  template:
    metadata:
      name: prometheus-main
      labels:
        app: prometheus
        component: core
    spec:
      serviceAccountName: prometheus-k8s
      containers:
      - name: prometheus
        image: prom/prometheus:v1.7.1
        args:
          - '-storage.local.retention=12h'
          - '-storage.local.memory-chunks=500000'
          - '-config.file=/etc/prometheus/prometheus.yaml'
          - '-alertmanager.url=http://alertmanager:9093/'
        ports:
        - name: webui
          containerPort: 9090
        resources:
          requests:
            cpu: 500m
            memory: 500M
          limits:
            cpu: 500m
            memory: 500M
        volumeMounts:
        - name: config-volume
          mountPath: /etc/prometheus
        - name: rules-volume
          mountPath: /etc/prometheus-rules
      volumes:
      - name: config-volume
        configMap:
          name: prometheus-core
      - name: rules-volume
        configMap:
          name: prometheus-rules
EOF
kubectl create -f prometheus-core-deploy.yaml 


cat << EOF > prometheus-core-service.yaml 
apiVersion: v1
kind: Service
metadata:
  name: prometheus
  namespace: monitoring
  labels:
    app: prometheus
    component: core
  annotations:
    prometheus.io/scrape: 'true'
spec:
  type: NodePort
  ports:
    - port: 9090
      protocol: TCP
      name: webui
  selector:
    app: prometheus
    component: core
EOF
kubectl create -f prometheus-core-service.yaml

grafana

cat << EOF > grafana-core-deployment.yaml 
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: grafana-core
  namespace: monitoring
  labels:
    app: grafana
    component: core
spec:
  replicas: 1
  template:
    metadata:
      labels:
        app: grafana
        component: core
    spec:
      containers:
        - image: docker.io/grafana/grafana:latest
          name: grafana-core
          # env:
          resources:
            # keep request = limit to keep this container in guaranteed class
            limits:
              cpu: 100m
              memory: 100Mi
            requests:
              cpu: 100m
              memory: 100Mi
          ports:
            - name: grafana
              containerPort: 3000
          env:
            # This variable is required to setup templates in Grafana.
              # The following env variables are required to make Grafana accessible via
              # the kubernetes api-server proxy. On production clusters, we recommend
              # removing these env variables, setup auth for grafana, and expose the grafana
              # service using a LoadBalancer or a public IP.
            - name: GF_AUTH_BASIC_ENABLED
              value: "false"
            - name: GF_AUTH_ANONYMOUS_ENABLED
              value: "true"
            - name: GF_AUTH_ANONYMOUS_ORG_ROLE
              value: Admin
            # - name: GF_SERVER_ROOT_URL
            #   value: /api/v1/proxy/namespaces/monitoring/services/grafana/
          volumeMounts:
          - name: grafana-persistent-storage
            mountPath: /var
      volumes:
      - name: grafana-persistent-storage
        hostPath:
          emptyDir: {}
          #path: /grafanaData
EOF
kubectl create -f grafana-core-deployment.yaml 


cat << EOF > grafana-core-service.yaml 
apiVersion: v1
kind: Service
metadata:
  name: grafana
  namespace: monitoring
  labels:
    app: grafana
    component: core
  # annotations:
  #   prometheus.io/scrape: 'true'
spec:
  type: NodePort
  ports:
    - port: 3000
      nodePort: 31000
  selector:
    app: grafana
    component: core
EOF
kubectl create -f grafana-core-service.yaml 

grafana模板

{
  "annotations": {
    "list": []
  },
  "editable": true,
  "gnetId": null,
  "graphTooltip": 0,
  "hideControls": false,
  "id": 21,
  "links": [],
  "refresh": false,
  "rows": [
    {
      "collapse": false,
      "height": 282,
      "panels": [
        {
          "aliasColors": {},
          "bars": false,
          "dashLength": 10,
          "dashes": false,
          "datasource": null,
          "fill": 0,
          "height": "",
          "hideTimeOverride": false,
          "id": 1,
          "legend": {
            "alignAsTable": true,
            "avg": false,
            "current": true,
            "hideEmpty": false,
            "hideZero": false,
            "max": true,
            "min": false,
            "rightSide": true,
            "show": true,
            "sideWidth": null,
            "total": false,
            "values": true
          },
          "lines": true,
          "linewidth": 2,
          "links": [],
          "minSpan": null,
          "nullPointMode": "null",
          "percentage": false,
          "pointradius": 5,
          "points": false,
          "renderer": "flot",
          "repeat": null,
          "seriesOverrides": [],
          "spaceLength": 10,
          "span": 6,
          "stack": false,
          "steppedLine": false,
          "targets": [
            {
              "expr": "sum(container_memory_usage_bytes{pod_name=\"$pod\", namespace=\"$namespace\"}) by (namespace,pod_name)",
              "format": "time_series",
              "interval": "30s",
              "intervalFactor": 1,
              "legendFormat": "total",
              "refId": "B",
              "step": 30
            },
            {
              "expr": "sum(container_memory_rss{namespace=\"$namespace\",pod_name=\"$pod\"}) by (namespace,pod_name)",
              "format": "time_series",
              "hide": false,
              "interval": "30s",
              "intervalFactor": 1,
              "legendFormat": "rss",
              "metric": "",
              "refId": "A",
              "step": 30
            },
            {
              "expr": "sum(container_memory_cache{namespace=\"$namespace\",pod_name=\"$pod\"}) by (namespace,pod_name)",
              "format": "time_series",
              "interval": "30s",
              "intervalFactor": 1,
              "legendFormat": "cache",
              "refId": "D",
              "step": 30
            },
            {
              "expr": "sum(container_memory_swap{namespace=\"$namespace\",pod_name=\"$pod\"}) by (namespace,pod_name)",
              "format": "time_series",
              "interval": "30s",
              "intervalFactor": 1,
              "legendFormat": "swap",
              "refId": "C",
              "step": 30
            },
            {
              "expr": "sum(container_memory_failures_total{namespace=\"$namespace\",pod_name=\"$pod\"}) by (namespace,pod_name)",
              "format": "time_series",
              "hide": true,
              "intervalFactor": 2,
              "legendFormat": "failures_total",
              "refId": "E",
              "step": 20
            },
            {
              "expr": "sum(container_memory_failcnt{namespace=\"$namespace\",pod_name=\"$pod\"}) by (namespace,pod_name)",
              "format": "time_series",
              "hide": true,
              "intervalFactor": 2,
              "legendFormat": "failcnt",
              "refId": "F",
              "step": 20
            },
            {
              "expr": "sum(container_memory_working_set_bytes{namespace=\"$namespace\",pod_name=\"$pod\"}) by (namespace,pod_name)",
              "format": "time_series",
              "hide": true,
              "intervalFactor": 2,
              "legendFormat": "working_set",
              "refId": "G",
              "step": 20
            }
          ],
          "thresholds": [],
          "timeFrom": null,
          "timeShift": null,
          "title": "内存使用量",
          "tooltip": {
            "shared": false,
            "sort": 0,
            "value_type": "individual"
          },
          "transparent": false,
          "type": "graph",
          "xaxis": {
            "buckets": null,
            "mode": "time",
            "name": null,
            "show": true,
            "values": [
              "total"
            ]
          },
          "yaxes": [
            {
              "format": "bytes",
              "label": null,
              "logBase": 1,
              "max": null,
              "min": null,
              "show": true
            },
            {
              "format": "bytes",
              "label": null,
              "logBase": 1,
              "max": null,
              "min": null,
              "show": true
            }
          ]
        },
        {
          "aliasColors": {},
          "bars": false,
          "dashLength": 10,
          "dashes": false,
          "datasource": null,
          "decimals": null,
          "description": "获取CPU资源使用情况,判断方式现在时刻和一分钟前的数据进行对比。",
          "fill": 0,
          "id": 2,
          "legend": {
            "alignAsTable": true,
            "avg": false,
            "current": true,
            "hideEmpty": false,
            "hideZero": false,
            "max": true,
            "min": false,
            "rightSide": true,
            "show": true,
            "sideWidth": null,
            "total": false,
            "values": true
          },
          "lines": true,
          "linewidth": 2,
          "links": [],
          "minSpan": null,
          "nullPointMode": "null",
          "percentage": false,
          "pointradius": 5,
          "points": false,
          "renderer": "flot",
          "repeat": null,
          "seriesOverrides": [],
          "spaceLength": 10,
          "span": 6,
          "stack": true,
          "steppedLine": false,
          "targets": [
            {
              "expr": "sum(rate(container_cpu_system_seconds_total{namespace=\"$namespace\",pod_name=\"$pod\"}[1m])) by (namespace,pod_name)",
              "format": "time_series",
              "hide": false,
              "interval": "30s",
              "intervalFactor": 1,
              "legendFormat": "system",
              "metric": "",
              "refId": "A",
              "step": 30
            },
            {
              "expr": "sum(rate(container_cpu_user_seconds_total{namespace=\"$namespace\",pod_name=\"$pod\"}[1m])) by (namespace,pod_name)",
              "format": "time_series",
              "interval": "30s",
              "intervalFactor": 1,
              "legendFormat": "user",
              "refId": "C",
              "step": 30
            },
            {
              "expr": "sum(rate(container_cpu_usage_seconds_total{namespace=\"$namespace\",pod_name=\"$pod\"}[1m])) by (namespace,pod_name)",
              "format": "time_series",
              "interval": "30s",
              "intervalFactor": 1,
              "legendFormat": "total",
              "refId": "B",
              "step": 30
            }
          ],
          "thresholds": [],
          "timeFrom": null,
          "timeShift": null,
          "title": "CPU使用量(核)",
          "tooltip": {
            "shared": true,
            "sort": 0,
            "value_type": "individual"
          },
          "transparent": false,
          "type": "graph",
          "xaxis": {
            "buckets": null,
            "mode": "time",
            "name": null,
            "show": true,
            "values": [
              "total"
            ]
          },
          "yaxes": [
            {
              "format": "short",
              "label": "",
              "logBase": 1,
              "max": null,
              "min": null,
              "show": true
            },
            {
              "format": "short",
              "label": null,
              "logBase": 1,
              "max": null,
              "min": null,
              "show": true
            }
          ]
        }
      ],
      "repeat": null,
      "repeatIteration": null,
      "repeatRowId": null,
      "showTitle": false,
      "title": "所有pod",
      "titleSize": "h6"
    },
    {
      "collapse": false,
      "height": 199,
      "panels": [
        {
          "aliasColors": {},
          "bars": false,
          "dashLength": 10,
          "dashes": false,
          "datasource": null,
          "fill": 1,
          "id": 6,
          "legend": {
            "alignAsTable": true,
            "avg": false,
            "current": false,
            "max": false,
            "min": false,
            "rightSide": true,
            "show": true,
            "total": false,
            "values": false
          },
          "lines": true,
          "linewidth": 1,
          "links": [],
          "nullPointMode": "null",
          "percentage": false,
          "pointradius": 5,
          "points": false,
          "renderer": "flot",
          "seriesOverrides": [],
          "spaceLength": 10,
          "span": 6,
          "stack": false,
          "steppedLine": false,
          "targets": [
            {
              "expr": "sum(rate(container_network_receive_packets_total{namespace=\"$namespace\",pod_name=\"$pod\"}[1m])) by (namespace,pod_name)",
              "format": "time_series",
              "interval": "1m",
              "intervalFactor": 2,
              "legendFormat": "in",
              "refId": "A",
              "step": 120
            },
            {
              "expr": "sum(rate(container_network_transmit_packets_total{namespace=\"$namespace\",pod_name=\"$pod\"}[1m])) by (namespace,pod_name)",
              "format": "time_series",
              "interval": "1m",
              "intervalFactor": 2,
              "legendFormat": "out",
              "refId": "B",
              "step": 120
            },
            {
              "expr": "sum(rate(container_network_receive_errors_total{namespace=\"$namespace\",pod_name=\"$pod\"}[1m])) by (namespace,pod_name) + sum(rate(container_network_transmit_errors_total{namespace=\"$namespace\",pod_name=\"$pod\"}[1m])) by (namespace,pod_name) + sum(rate(container_network_receive_packets_dropped_total{namespace=\"$namespace\",pod_name=\"$pod\"}[1m])) by (namespace,pod_name) + sum(rate(container_network_transmit_packets_dropped_total{namespace=\"$namespace\",pod_name=\"$pod\"}[1m])) by (namespace,pod_name)",
              "format": "time_series",
              "interval": "1m",
              "intervalFactor": 1,
              "legendFormat": "error",
              "refId": "C",
              "step": 60
            }
          ],
          "thresholds": [],
          "timeFrom": null,
          "timeShift": null,
          "title": "数据包",
          "tooltip": {
            "shared": true,
            "sort": 0,
            "value_type": "individual"
          },
          "type": "graph",
          "xaxis": {
            "buckets": null,
            "mode": "time",
            "name": null,
            "show": true,
            "values": []
          },
          "yaxes": [
            {
              "format": "pps",
              "label": null,
              "logBase": 1,
              "max": null,
              "min": null,
              "show": true
            },
            {
              "format": "pps",
              "label": null,
              "logBase": 1,
              "max": null,
              "min": null,
              "show": true
            }
          ]
        },
        {
          "aliasColors": {},
          "bars": false,
          "dashLength": 10,
          "dashes": false,
          "datasource": null,
          "fill": 0,
          "id": 5,
          "legend": {
            "alignAsTable": true,
            "avg": false,
            "current": false,
            "max": false,
            "min": false,
            "rightSide": true,
            "show": true,
            "total": false,
            "values": false
          },
          "lines": true,
          "linewidth": 2,
          "links": [],
          "nullPointMode": "null",
          "percentage": false,
          "pointradius": 5,
          "points": false,
          "renderer": "flot",
          "seriesOverrides": [],
          "spaceLength": 10,
          "span": 6,
          "stack": false,
          "steppedLine": false,
          "targets": [
            {
              "expr": "sum(rate(container_network_receive_bytes_total{pod_name=\"$pod\", namespace=\"$namespace\"}[1m])*1) by (namespace,container_name)",
              "format": "time_series",
              "interval": "1m",
              "intervalFactor": 1,
              "legendFormat": "in",
              "metric": "",
              "refId": "A",
              "step": 60
            },
            {
              "expr": "sum(rate(container_network_transmit_bytes_total{pod_name=\"$pod\", namespace=\"$namespace\"}[1m])*1) by (namespace,container_name)",
              "format": "time_series",
              "interval": "1m",
              "intervalFactor": 1,
              "legendFormat": "out",
              "refId": "B",
              "step": 60
            }
          ],
          "thresholds": [],
          "timeFrom": null,
          "timeShift": null,
          "title": "网络流量",
          "tooltip": {
            "shared": true,
            "sort": 0,
            "value_type": "individual"
          },
          "type": "graph",
          "xaxis": {
            "buckets": null,
            "mode": "time",
            "name": null,
            "show": true,
            "values": []
          },
          "yaxes": [
            {
              "format": "bps",
              "label": null,
              "logBase": 1,
              "max": null,
              "min": null,
              "show": true
            },
            {
              "format": "bps",
              "label": null,
              "logBase": 1,
              "max": null,
              "min": null,
              "show": true
            }
          ]
        }
      ],
      "repeat": null,
      "repeatIteration": null,
      "repeatRowId": null,
      "showTitle": false,
      "title": "Dashboard Row",
      "titleSize": "h6"
    },
    {
      "collapse": false,
      "height": 163,
      "panels": [
        {
          "aliasColors": {},
          "bars": false,
          "dashLength": 10,
          "dashes": false,
          "datasource": null,
          "fill": 0,
          "hideTimeOverride": false,
          "id": 3,
          "legend": {
            "alignAsTable": true,
            "avg": false,
            "current": false,
            "max": false,
            "min": false,
            "rightSide": true,
            "show": true,
            "sortDesc": true,
            "total": false,
            "values": false
          },
          "lines": true,
          "linewidth": 2,
          "links": [],
          "nullPointMode": "null",
          "percentage": false,
          "pointradius": 5,
          "points": false,
          "renderer": "flot",
          "repeat": null,
          "seriesOverrides": [],
          "spaceLength": 10,
          "span": 6,
          "stack": false,
          "steppedLine": false,
          "targets": [
            {
              "expr": "sum(container_fs_limit_bytes{namespace=\"$namespace\",pod_name=\"$pod\"}) by (namespace,pod_name)",
              "format": "time_series",
              "intervalFactor": 2,
              "legendFormat": "total",
              "refId": "A",
              "step": 4
            },
            {
              "expr": "sum(container_fs_usage_bytes{namespace=\"$namespace\",pod_name=\"$pod\"}) by (namespace,pod_name)",
              "format": "time_series",
              "intervalFactor": 2,
              "legendFormat": "usage",
              "refId": "B",
              "step": 4
            }
          ],
          "thresholds": [],
          "timeFrom": null,
          "timeShift": null,
          "title": "硬盘使用量",
          "tooltip": {
            "shared": false,
            "sort": 0,
            "value_type": "individual"
          },
          "type": "graph",
          "xaxis": {
            "buckets": null,
            "mode": "time",
            "name": null,
            "show": true,
            "values": []
          },
          "yaxes": [
            {
              "format": "decbytes",
              "label": null,
              "logBase": 1,
              "max": null,
              "min": null,
              "show": true
            },
            {
              "format": "short",
              "label": null,
              "logBase": 1,
              "max": null,
              "min": null,
              "show": true
            }
          ]
        },
        {
          "aliasColors": {},
          "bars": false,
          "dashLength": 10,
          "dashes": false,
          "datasource": null,
          "fill": 0,
          "id": 4,
          "legend": {
            "alignAsTable": true,
            "avg": false,
            "current": false,
            "max": false,
            "min": false,
            "rightSide": true,
            "show": true,
            "total": false,
            "values": false
          },
          "lines": true,
          "linewidth": 2,
          "links": [],
          "nullPointMode": "null",
          "percentage": false,
          "pointradius": 5,
          "points": false,
          "renderer": "flot",
          "seriesOverrides": [],
          "spaceLength": 10,
          "span": 6,
          "stack": false,
          "steppedLine": false,
          "targets": [
            {
              "expr": "sum(container_fs_reads_total{namespace=\"$namespace\",pod_name=\"$pod\"}) by (namespace,pod_name)",
              "format": "time_series",
              "interval": "30s",
              "intervalFactor": 1,
              "legendFormat": "read",
              "refId": "A",
              "step": 30
            },
            {
              "expr": "sum(container_fs_writes_total{namespace=\"$namespace\",pod_name=\"$pod\"}) by (namespace,pod_name)",
              "format": "time_series",
              "interval": "30s",
              "intervalFactor": 1,
              "legendFormat": "write",
              "refId": "B",
              "step": 30
            },
            {
              "expr": "sum(container_fs_io_current{namespace=\"$namespace\",pod_name=\"$pod\"}) by (namespace,pod_name)",
              "format": "time_series",
              "intervalFactor": 2,
              "legendFormat": "current",
              "refId": "C",
              "step": 4
            }
          ],
          "thresholds": [],
          "timeFrom": null,
          "timeShift": null,
          "title": "IO",
          "tooltip": {
            "shared": true,
            "sort": 0,
            "value_type": "individual"
          },
          "type": "graph",
          "xaxis": {
            "buckets": null,
            "mode": "time",
            "name": null,
            "show": true,
            "values": []
          },
          "yaxes": [
            {
              "format": "short",
              "label": null,
              "logBase": 1,
              "max": null,
              "min": null,
              "show": true
            },
            {
              "format": "short",
              "label": null,
              "logBase": 1,
              "max": null,
              "min": null,
              "show": true
            }
          ]
        }
      ],
      "repeat": null,
      "repeatIteration": null,
      "repeatRowId": null,
      "showTitle": false,
      "title": "Dashboard Row",
      "titleSize": "h6"
    }
  ],
  "schemaVersion": 14,
  "style": "dark",
  "tags": [],
  "templating": {
    "list": [
      {
        "allValue": null,
        "current": {
          "selected": true,
          "text": "monitoring",
          "value": "monitoring"
        },
        "datasource": "prometheus",
        "hide": 0,
        "includeAll": false,
        "label": "namespace",
        "multi": false,
        "name": "namespace",
        "options": [],
        "query": "label_values(kube_pod_info{namespace=~\".+\"},namespace)",
        "refresh": 1,
        "regex": "",
        "sort": 0,
        "tagValuesQuery": "",
        "tags": [],
        "tagsQuery": "",
        "type": "query",
        "useTags": false
      },
      {
        "allValue": null,
        "current": {
          "selected": true,
          "text": "kube-state-metrics-2802505745",
          "value": "kube-state-metrics-2802505745"
        },
        "datasource": "prometheus",
        "hide": 0,
        "includeAll": false,
        "label": "创建者",
        "multi": false,
        "name": "created_by_name",
        "options": [],
        "query": "label_values(kube_pod_info{created_by_name=~\".+\",namespace=\"$namespace\"} ,created_by_name)",
        "refresh": 1,
        "regex": "",
        "sort": 0,
        "tagValuesQuery": "",
        "tags": [],
        "tagsQuery": "",
        "type": "query",
        "useTags": false
      },
      {
        "allValue": null,
        "current": {
          "selected": true,
          "text": "kube-state-metrics-2802505745-k24zk",
          "value": "kube-state-metrics-2802505745-k24zk"
        },
        "datasource": "prometheus",
        "hide": 0,
        "includeAll": false,
        "label": "pod",
        "multi": false,
        "name": "pod",
        "options": [],
        "query": "label_values(kube_pod_info{created_by_name=\"$created_by_name\",namespace=\"$namespace\",pod=~\".+\"} ,pod)",
        "refresh": 1,
        "regex": "",
        "sort": 0,
        "tagValuesQuery": "",
        "tags": [],
        "tagsQuery": "",
        "type": "query",
        "useTags": false
      }
    ]
  },
  "time": {
    "from": "now-30m",
    "to": "now"
  },
  "timepicker": {
    "refresh_intervals": [
      "5s",
      "10s",
      "30s",
      "1m",
      "5m",
      "15m",
      "30m",
      "1h",
      "2h",
      "1d"
    ],
    "time_options": [
      "5m",
      "15m",
      "1h",
      "6h",
      "12h",
      "24h",
      "2d",
      "7d",
      "30d"
    ]
  },
  "timezone": "",
  "title": "基于资源对象对pod进行监控",
  "version": 5
}
分享到:
评论

相关推荐

    kubernetes-server-linux-amd64-v1.15.4.tar.gz

    7. **监控和日志**:Kubernetes 1.15提供了基本的监控指标,但通常需要集成Prometheus、Grafana等工具来获取详细的性能数据。日志管理同样重要,可利用Elasticsearch和Kibana进行收集和分析。 8. **应用部署**:...

    基于kube-prometheus-stack部署监控K8S告警系统资源合集

    `kube-prometheus-stack` 是一个流行的 Kubernetes 监控解决方案,它结合了 Prometheus 和 Grafana 的强大功能,提供了全面的 K8S 集群监控和告警能力。本资源合集将详细介绍如何基于 `kube-prometheus-stack` 部署...

    kube-prometheus-0.8.0.tar.gz

    【标题】"kube-prometheus-0.8.0.tar.gz" 是一个压缩包,其中包含了 Kubernetes 集群监控解决方案 kube-prometheus 的0.8.0版本。这个工具集是 Prometheus 和 Kubernetes 监控生态的重要组成部分,它允许用户在 ...

    kube-prometheus-release-0.12部署yaml文件

    总的来说,`kube-prometheus-release-0.12`提供了一套完整的解决方案,用于在Kubernetes 1.24和1.25版本上快速部署和配置Prometheus和Grafana监控。通过这个方案,你可以深入洞察你的集群,及时发现和解决问题,从而...

    kubernetes-server-linux-amd64.tar.gz

    此外,Kubernetes社区还发展了一大批周边工具和服务,如Helm包管理器、Prometheus监控系统、Fluentd日志收集、Istio服务网格等,共同构成了强大的云原生生态。学习和掌握Kubernetes对于现代云基础设施的管理和运维至...

    带你快速了解kubernetes部署prometheus监控kube-prometheus-stack-62.6.0.tgz

    带你快速了解kubernetes部署prometheus监控kube-prometheus-stack-62.6.0.tgz

    kubernetes-dashboard-v1.8.3 yaml文件

    Kubernetes Dashboard 是 Kubernetes 集群的一个重要组件,它提供了一个图形化的用户界面,使得集群...在实际工作中,结合 Dashboard 与其他工具,如 Helm、Prometheus 等,可以构建出强大的集群管理和监控解决方案。

    k8s-prometheus-配置文件.zip

    - `prometheus-k8s-service-monitor.yaml`:这个文件可能包含了 Kubernetes ServiceMonitor 资源定义,ServiceMonitor 是一个 CRD(Custom Resource Definition),用于告诉 Prometheus 如何发现和拉取 Kubernetes ...

    prometheus-webhook-dingtalk-1.4.0告警插件一键部署

    "prometheus-webhook-dingtalk-1.4.0" 是一个针对 Prometheus 的告警插件,专门用于将告警信息推送至 DingTalk(钉钉)平台,使得运维团队可以及时收到警告并进行响应。 一键部署这个插件,意味着用户可以通过简单...

    grafana-kubernetes-app-31da38a.zip

    Grafana Kubernetes App可监控Kubernetes集群的性能。它包括4个仪表板,即集群,节点,Pod/容器和部署。它允许自动部署所需的Prometheus导出器,并使用默认的scrape配置与您的集群内Prometheus部署一起使用。收集的...

    Prometheus+Grafana监控Kubernetes-配套yaml20200524.zip

    Kubernetes监控是确保集群稳定运行的关键。Prometheus可以监控以下关键领域: 1. 节点健康:监控每个节点的CPU、内存、磁盘空间和网络资源。 2. Pod状态:跟踪Pod的启动、重启、运行时间和资源消耗。 3. 应用性能:...

    Prometheus+Grafana监控Kubernetes-配套yaml.zip

    Prometheus+Grafana监控Kubernetes-配套yaml.zip grafana.yaml、Kubernetes Pod Resources.json、namespace.yaml、node-exporter.yaml、prometheus.yaml

    带你快速了解kubernetes部署prometheus监控prometheus-62.6.0.tar 

    在容器化的大潮下,结合Kubernetes与Prometheus的使用变得越来越普遍,特别是在服务网格化、微服务架构越来越流行的今天,高效的监控与预警系统变得至关重要。 为了在Kubernetes环境中部署Prometheus,首先需要理解...

    prometheus-3.2.1.linux-amd64安装包

    由于Prometheus提供了跨平台的支持,因此在不同操作系统上安装时会有不同的安装包,而这里的文件名“prometheus-3.2.1.linux-amd64”明确指出了这是一个适用于64位Linux系统的安装包。 Prometheus的工作方式是通过...

    prometheus-2.54.1.linux-amd64.tar

    在Kubernetes中,Prometheus与ServiceMonitor资源配合使用,可以动态发现并监控Kubernetes中的服务。 由于其灵活、可扩展的特性,Prometheus已经成为云原生监控解决方案的核心组件。它在运维工程师和开发人员中非常...

    K8S主机Prometheus监控blackbox-exporter(kubernetes_sd_files)资源清单及镜像文件

    首先,从GitHub仓库下载最新的blackbox-exporter资源清单文件(yaml),或者你可以使用提供的`prometheus-blackbox-exporter`压缩包。资源清单通常包含一个Deployment和一个Service,定义了blackbox-exporter的实例...

    kubernetes-server-linux-amd64 .tar.gz

    值得注意的是,Kubernetes强调可扩展性和模块化设计,允许用户根据自身需求定制和集成各种工具,如Ingress Controller(提供外部访问服务)、Prometheus(监控与报警)、Jaeger(追踪)等,构建出强大的微服务架构。...

    kubernetes-in-action-master.zip

    5. **监控和日志**:讲解如何集成Prometheus、Grafana等工具进行Kubernetes集群的性能监控,以及如何设置日志收集和分析。 6. **安全性与认证**:讨论Kubernetes的认证和授权机制,如ServiceAccount、RBAC(Role-...

Global site tag (gtag.js) - Google Analytics