- 浏览: 86107 次
- 性别:
- 来自: 深圳
文章分类
最新评论
kubernetes监控--Prometheus
本文基于kubernetes 1.5.2版本编写
kube-state-metrics
kubectl create ns monitoring
kubectl create sa -n monitoring kube-state-metrics
cat << EOF > kube-state-metrics.yaml
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
name: kube-state-metrics
namespace: monitoring
spec:
replicas: 1
template:
metadata:
labels:
app: kube-state-metrics
spec:
serviceAccountName: kube-state-metrics
containers:
- name: kube-state-metrics
image: quay.io/coreos/kube-state-metrics
ports:
- containerPort: 8080
EOF
kubectl create -f kube-state-metrics.yaml
cat << EOF > kube-state-metrics-svc.yaml
apiVersion: v1
kind: Service
metadata:
annotations:
prometheus.io/scrape: 'true'
name: kube-state-metrics
namespace: monitoring
labels:
app: kube-state-metrics
spec:
ports:
- name: kube-state-metrics
port: 8080
protocol: TCP
selector:
app: kube-state-metrics
EOF
kubectl create -f kube-state-metrics-svc.yaml
prom-node-exporter
cat << EOF > prom-node-exporter.yaml
apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
name: prometheus-node-exporter
namespace: monitoring
labels:
app: prometheus
component: node-exporter
spec:
template:
metadata:
name: prometheus-node-exporter
labels:
app: prometheus
component: node-exporter
spec:
containers:
- image: docker.io/prom/node-exporter:v0.14.0
name: prometheus-node-exporter
ports:
- name: prom-node-exp
#^ must be an IANA_SVC_NAME (at most 15 characters, ..)
containerPort: 9100
hostPort: 9100
hostNetwork: true
hostPID: true
EOF
kubectl create -f prom-node-exporter.yaml
cat << EOF > prom-node-exporter-svc.yaml
apiVersion: v1
kind: Service
metadata:
annotations:
prometheus.io/scrape: 'true'
name: prometheus-node-exporter
namespace: monitoring
labels:
app: prometheus
component: node-exporter
spec:
#clusterIP: None
ports:
- name: prometheus-node-exporter
port: 9100
protocol: TCP
selector:
app: prometheus
component: node-exporter
type: ClusterIP
EOF
kubectl create -f prom-node-exporter-svc.yaml
node-directory-size-metrics
cat node-directory-size-metrics.yaml
apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
name: node-directory-size-metrics
namespace: monitoring
annotations:
description: |
This `DaemonSet` provides metrics in Prometheus format about disk usage on the nodes.
The container `read-du` reads in sizes of all directories below /mnt and writes that to `/tmp/metrics`. It only reports directories larger then `100M` for now.
The other container `caddy` just hands out the contents of that file on request via `http` on `/metrics` at port `9102` which are the defaults for Prometheus.
These are scheduled on every node in the Kubernetes cluster.
To choose directories from the node to check, just mount them on the `read-du` container below `/mnt`.
spec:
template:
metadata:
labels:
app: node-directory-size-metrics
annotations:
prometheus.io/scrape: 'true'
prometheus.io/port: '9102'
description: |
This `Pod` provides metrics in Prometheus format about disk usage on the node.
The container `read-du` reads in sizes of all directories below /mnt and writes that to `/tmp/metrics`. It only reports directories larger then `100M` for now.
The other container `caddy` just hands out the contents of that file on request on `/metrics` at port `9102` which are the defaults for Prometheus.
This `Pod` is scheduled on every node in the Kubernetes cluster.
To choose directories from the node to check just mount them on `read-du` below `/mnt`.
spec:
containers:
- name: read-du
image: giantswarm/tiny-tools
imagePullPolicy: Always
# FIXME threshold via env var
# The
command:
- fish
- --command
- |
touch /tmp/metrics-temp
while true
for directory in (du --bytes --separate-dirs --threshold=100M /mnt)
echo $directory | read size path
echo "node_directory_size_bytes{path=\"$path\"} $size" \
>> /tmp/metrics-temp
end
mv /tmp/metrics-temp /tmp/metrics
sleep 300
end
volumeMounts:
- name: host-fs-var
mountPath: /mnt/var
readOnly: true
- name: metrics
mountPath: /tmp
- name: caddy
image: dockermuenster/caddy:latest
command:
- "caddy"
- "-port=9102"
- "-root=/var/www"
ports:
- containerPort: 9102
volumeMounts:
- name: metrics
mountPath: /var/www
volumes:
- name: host-fs-var
hostPath:
path: /var
- name: metrics
emptyDir:
medium: Memory
kubectl create -f node-directory-size-metrics.yaml
prometheus
cat prometheus-configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
creationTimestamp: null
name: prometheus-core
namespace: monitoring
data:
prometheus.yaml: |
global:
scrape_interval: 30s
scrape_timeout: 30s
evaluation_interval: 30s
rule_files:
- "/etc/prometheus-rules/*.rules"
scrape_configs:
# https://github.com/prometheus/prometheus/blob/master/documentation/examples/prometheus-kubernetes.yml#L37
- job_name: 'kubernetes-nodes'
tls_config:
ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
kubernetes_sd_configs:
- role: node
relabel_configs:
- source_labels: [__address__]
regex: '(.*):10250'
replacement: '${1}:10255'
target_label: __address__
# https://github.com/prometheus/prometheus/blob/master/documentation/examples/prometheus-kubernetes.yml#L79
- job_name: 'kubernetes-endpoints'
kubernetes_sd_configs:
- role: endpoints
relabel_configs:
- source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
action: keep
regex: true
- source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scheme]
action: replace
target_label: __scheme__
regex: (https?)
- source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
action: replace
target_label: __metrics_path__
regex: (.+)
- source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
action: replace
target_label: __address__
regex: (.+)(?::\d+);(\d+)
replacement: $1:$2
- action: labelmap
regex: __meta_kubernetes_service_label_(.+)
- source_labels: [__meta_kubernetes_namespace]
action: replace
target_label: kubernetes_namespace
- source_labels: [__meta_kubernetes_service_name]
action: replace
target_label: kubernetes_name
# https://github.com/prometheus/prometheus/blob/master/documentation/examples/prometheus-kubernetes.yml#L119
- job_name: 'kubernetes-services'
metrics_path: /probe
params:
module: [http_2xx]
kubernetes_sd_configs:
- role: service
relabel_configs:
- source_labels: [__meta_kubernetes_service_annotation_prometheus_io_probe]
action: keep
regex: true
- source_labels: [__address__]
target_label: __param_target
- target_label: __address__
replacement: blackbox
- source_labels: [__param_target]
target_label: instance
- action: labelmap
regex: __meta_kubernetes_service_label_(.+)
- source_labels: [__meta_kubernetes_namespace]
target_label: kubernetes_namespace
- source_labels: [__meta_kubernetes_service_name]
target_label: kubernetes_name
# https://github.com/prometheus/prometheus/blob/master/documentation/examples/prometheus-kubernetes.yml#L156
- job_name: 'kubernetes-pods'
kubernetes_sd_configs:
- role: pod
relabel_configs:
- source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
action: keep
regex: true
- source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
action: replace
target_label: __metrics_path__
regex: (.+)
- source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
action: replace
regex: (.+):(?:\d+);(\d+)
replacement: ${1}:${2}
target_label: __address__
- action: labelmap
regex: __meta_kubernetes_pod_label_(.+)
- source_labels: [__meta_kubernetes_namespace]
action: replace
target_label: kubernetes_namespace
- source_labels: [__meta_kubernetes_pod_name]
action: replace
target_label: kubernetes_pod_name
- source_labels: [__meta_kubernetes_pod_container_port_number]
action: keep
regex: 9\d{3}
kubectl create -f prometheus-configmap.yaml
cat prometheus-rules.yaml
apiVersion: v1
data:
cpu-usage.rules: |
ALERT NodeCPUUsage
IF (100 - (avg by (instance) (irate(node_cpu{name="node-exporter",mode="idle"}[5m])) * 100)) > 75
FOR 2m
LABELS {
severity="page"
}
ANNOTATIONS {
SUMMARY = "{{$labels.instance}}: High CPU usage detected",
DESCRIPTION = "{{$labels.instance}}: CPU usage is above 75% (current value is: {{ $value }})"
}
instance-availability.rules: |
ALERT InstanceDown
IF up == 0
FOR 1m
LABELS { severity = "page" }
ANNOTATIONS {
summary = "Instance {{ $labels.instance }} down",
description = "{{ $labels.instance }} of job {{ $labels.job }} has been down for more than 1 minute.",
}
low-disk-space.rules: |
ALERT NodeLowRootDisk
IF ((node_filesystem_size{mountpoint="/root-disk"} - node_filesystem_free{mountpoint="/root-disk"} ) / node_filesystem_size{mountpoint="/root-disk"} * 100) > 75
FOR 2m
LABELS {
severity="page"
}
ANNOTATIONS {
SUMMARY = "{{$labels.instance}}: Low root disk space",
DESCRIPTION = "{{$labels.instance}}: Root disk usage is above 75% (current value is: {{ $value }})"
}
ALERT NodeLowDataDisk
IF ((node_filesystem_size{mountpoint="/data-disk"} - node_filesystem_free{mountpoint="/data-disk"} ) / node_filesystem_size{mountpoint="/data-disk"} * 100) > 75
FOR 2m
LABELS {
severity="page"
}
ANNOTATIONS {
SUMMARY = "{{$labels.instance}}: Low data disk space",
DESCRIPTION = "{{$labels.instance}}: Data disk usage is above 75% (current value is: {{ $value }})"
}
mem-usage.rules: |
ALERT NodeSwapUsage
IF (((node_memory_SwapTotal-node_memory_SwapFree)/node_memory_SwapTotal)*100) > 75
FOR 2m
LABELS {
severity="page"
}
ANNOTATIONS {
SUMMARY = "{{$labels.instance}}: Swap usage detected",
DESCRIPTION = "{{$labels.instance}}: Swap usage usage is above 75% (current value is: {{ $value }})"
}
ALERT NodeMemoryUsage
IF (((node_memory_MemTotal-node_memory_MemFree-node_memory_Cached)/(node_memory_MemTotal)*100)) > 75
FOR 2m
LABELS {
severity="page"
}
ANNOTATIONS {
SUMMARY = "{{$labels.instance}}: High memory usage detected",
DESCRIPTION = "{{$labels.instance}}: Memory usage is above 75% (current value is: {{ $value }})"
}
kind: ConfigMap
metadata:
creationTimestamp: null
name: prometheus-rules
namespace: monitoring
kubectl create -f prometheus-rules.yaml
kubectl create sa prometheus-k8s -n monitoring
cat << EOF > prometheus-core-deploy.yaml
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
name: prometheus-core
namespace: monitoring
labels:
app: prometheus
component: core
spec:
replicas: 1
template:
metadata:
name: prometheus-main
labels:
app: prometheus
component: core
spec:
serviceAccountName: prometheus-k8s
containers:
- name: prometheus
image: prom/prometheus:v1.7.1
args:
- '-storage.local.retention=12h'
- '-storage.local.memory-chunks=500000'
- '-config.file=/etc/prometheus/prometheus.yaml'
- '-alertmanager.url=http://alertmanager:9093/'
ports:
- name: webui
containerPort: 9090
resources:
requests:
cpu: 500m
memory: 500M
limits:
cpu: 500m
memory: 500M
volumeMounts:
- name: config-volume
mountPath: /etc/prometheus
- name: rules-volume
mountPath: /etc/prometheus-rules
volumes:
- name: config-volume
configMap:
name: prometheus-core
- name: rules-volume
configMap:
name: prometheus-rules
EOF
kubectl create -f prometheus-core-deploy.yaml
cat << EOF > prometheus-core-service.yaml
apiVersion: v1
kind: Service
metadata:
name: prometheus
namespace: monitoring
labels:
app: prometheus
component: core
annotations:
prometheus.io/scrape: 'true'
spec:
type: NodePort
ports:
- port: 9090
protocol: TCP
name: webui
selector:
app: prometheus
component: core
EOF
kubectl create -f prometheus-core-service.yaml
grafana
cat << EOF > grafana-core-deployment.yaml
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
name: grafana-core
namespace: monitoring
labels:
app: grafana
component: core
spec:
replicas: 1
template:
metadata:
labels:
app: grafana
component: core
spec:
containers:
- image: docker.io/grafana/grafana:latest
name: grafana-core
# env:
resources:
# keep request = limit to keep this container in guaranteed class
limits:
cpu: 100m
memory: 100Mi
requests:
cpu: 100m
memory: 100Mi
ports:
- name: grafana
containerPort: 3000
env:
# This variable is required to setup templates in Grafana.
# The following env variables are required to make Grafana accessible via
# the kubernetes api-server proxy. On production clusters, we recommend
# removing these env variables, setup auth for grafana, and expose the grafana
# service using a LoadBalancer or a public IP.
- name: GF_AUTH_BASIC_ENABLED
value: "false"
- name: GF_AUTH_ANONYMOUS_ENABLED
value: "true"
- name: GF_AUTH_ANONYMOUS_ORG_ROLE
value: Admin
# - name: GF_SERVER_ROOT_URL
# value: /api/v1/proxy/namespaces/monitoring/services/grafana/
volumeMounts:
- name: grafana-persistent-storage
mountPath: /var
volumes:
- name: grafana-persistent-storage
hostPath:
emptyDir: {}
#path: /grafanaData
EOF
kubectl create -f grafana-core-deployment.yaml
cat << EOF > grafana-core-service.yaml
apiVersion: v1
kind: Service
metadata:
name: grafana
namespace: monitoring
labels:
app: grafana
component: core
# annotations:
# prometheus.io/scrape: 'true'
spec:
type: NodePort
ports:
- port: 3000
nodePort: 31000
selector:
app: grafana
component: core
EOF
kubectl create -f grafana-core-service.yaml
grafana模板
{
"annotations": {
"list": []
},
"editable": true,
"gnetId": null,
"graphTooltip": 0,
"hideControls": false,
"id": 21,
"links": [],
"refresh": false,
"rows": [
{
"collapse": false,
"height": 282,
"panels": [
{
"aliasColors": {},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": null,
"fill": 0,
"height": "",
"hideTimeOverride": false,
"id": 1,
"legend": {
"alignAsTable": true,
"avg": false,
"current": true,
"hideEmpty": false,
"hideZero": false,
"max": true,
"min": false,
"rightSide": true,
"show": true,
"sideWidth": null,
"total": false,
"values": true
},
"lines": true,
"linewidth": 2,
"links": [],
"minSpan": null,
"nullPointMode": "null",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"repeat": null,
"seriesOverrides": [],
"spaceLength": 10,
"span": 6,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "sum(container_memory_usage_bytes{pod_name=\"$pod\", namespace=\"$namespace\"}) by (namespace,pod_name)",
"format": "time_series",
"interval": "30s",
"intervalFactor": 1,
"legendFormat": "total",
"refId": "B",
"step": 30
},
{
"expr": "sum(container_memory_rss{namespace=\"$namespace\",pod_name=\"$pod\"}) by (namespace,pod_name)",
"format": "time_series",
"hide": false,
"interval": "30s",
"intervalFactor": 1,
"legendFormat": "rss",
"metric": "",
"refId": "A",
"step": 30
},
{
"expr": "sum(container_memory_cache{namespace=\"$namespace\",pod_name=\"$pod\"}) by (namespace,pod_name)",
"format": "time_series",
"interval": "30s",
"intervalFactor": 1,
"legendFormat": "cache",
"refId": "D",
"step": 30
},
{
"expr": "sum(container_memory_swap{namespace=\"$namespace\",pod_name=\"$pod\"}) by (namespace,pod_name)",
"format": "time_series",
"interval": "30s",
"intervalFactor": 1,
"legendFormat": "swap",
"refId": "C",
"step": 30
},
{
"expr": "sum(container_memory_failures_total{namespace=\"$namespace\",pod_name=\"$pod\"}) by (namespace,pod_name)",
"format": "time_series",
"hide": true,
"intervalFactor": 2,
"legendFormat": "failures_total",
"refId": "E",
"step": 20
},
{
"expr": "sum(container_memory_failcnt{namespace=\"$namespace\",pod_name=\"$pod\"}) by (namespace,pod_name)",
"format": "time_series",
"hide": true,
"intervalFactor": 2,
"legendFormat": "failcnt",
"refId": "F",
"step": 20
},
{
"expr": "sum(container_memory_working_set_bytes{namespace=\"$namespace\",pod_name=\"$pod\"}) by (namespace,pod_name)",
"format": "time_series",
"hide": true,
"intervalFactor": 2,
"legendFormat": "working_set",
"refId": "G",
"step": 20
}
],
"thresholds": [],
"timeFrom": null,
"timeShift": null,
"title": "内存使用量",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"transparent": false,
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
"total"
]
},
"yaxes": [
{
"format": "bytes",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
},
{
"format": "bytes",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
}
]
},
{
"aliasColors": {},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": null,
"decimals": null,
"description": "获取CPU资源使用情况,判断方式现在时刻和一分钟前的数据进行对比。",
"fill": 0,
"id": 2,
"legend": {
"alignAsTable": true,
"avg": false,
"current": true,
"hideEmpty": false,
"hideZero": false,
"max": true,
"min": false,
"rightSide": true,
"show": true,
"sideWidth": null,
"total": false,
"values": true
},
"lines": true,
"linewidth": 2,
"links": [],
"minSpan": null,
"nullPointMode": "null",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"repeat": null,
"seriesOverrides": [],
"spaceLength": 10,
"span": 6,
"stack": true,
"steppedLine": false,
"targets": [
{
"expr": "sum(rate(container_cpu_system_seconds_total{namespace=\"$namespace\",pod_name=\"$pod\"}[1m])) by (namespace,pod_name)",
"format": "time_series",
"hide": false,
"interval": "30s",
"intervalFactor": 1,
"legendFormat": "system",
"metric": "",
"refId": "A",
"step": 30
},
{
"expr": "sum(rate(container_cpu_user_seconds_total{namespace=\"$namespace\",pod_name=\"$pod\"}[1m])) by (namespace,pod_name)",
"format": "time_series",
"interval": "30s",
"intervalFactor": 1,
"legendFormat": "user",
"refId": "C",
"step": 30
},
{
"expr": "sum(rate(container_cpu_usage_seconds_total{namespace=\"$namespace\",pod_name=\"$pod\"}[1m])) by (namespace,pod_name)",
"format": "time_series",
"interval": "30s",
"intervalFactor": 1,
"legendFormat": "total",
"refId": "B",
"step": 30
}
],
"thresholds": [],
"timeFrom": null,
"timeShift": null,
"title": "CPU使用量(核)",
"tooltip": {
"shared": true,
"sort": 0,
"value_type": "individual"
},
"transparent": false,
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
"total"
]
},
"yaxes": [
{
"format": "short",
"label": "",
"logBase": 1,
"max": null,
"min": null,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": false,
"title": "所有pod",
"titleSize": "h6"
},
{
"collapse": false,
"height": 199,
"panels": [
{
"aliasColors": {},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": null,
"fill": 1,
"id": 6,
"legend": {
"alignAsTable": true,
"avg": false,
"current": false,
"max": false,
"min": false,
"rightSide": true,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [],
"nullPointMode": "null",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [],
"spaceLength": 10,
"span": 6,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "sum(rate(container_network_receive_packets_total{namespace=\"$namespace\",pod_name=\"$pod\"}[1m])) by (namespace,pod_name)",
"format": "time_series",
"interval": "1m",
"intervalFactor": 2,
"legendFormat": "in",
"refId": "A",
"step": 120
},
{
"expr": "sum(rate(container_network_transmit_packets_total{namespace=\"$namespace\",pod_name=\"$pod\"}[1m])) by (namespace,pod_name)",
"format": "time_series",
"interval": "1m",
"intervalFactor": 2,
"legendFormat": "out",
"refId": "B",
"step": 120
},
{
"expr": "sum(rate(container_network_receive_errors_total{namespace=\"$namespace\",pod_name=\"$pod\"}[1m])) by (namespace,pod_name) + sum(rate(container_network_transmit_errors_total{namespace=\"$namespace\",pod_name=\"$pod\"}[1m])) by (namespace,pod_name) + sum(rate(container_network_receive_packets_dropped_total{namespace=\"$namespace\",pod_name=\"$pod\"}[1m])) by (namespace,pod_name) + sum(rate(container_network_transmit_packets_dropped_total{namespace=\"$namespace\",pod_name=\"$pod\"}[1m])) by (namespace,pod_name)",
"format": "time_series",
"interval": "1m",
"intervalFactor": 1,
"legendFormat": "error",
"refId": "C",
"step": 60
}
],
"thresholds": [],
"timeFrom": null,
"timeShift": null,
"title": "数据包",
"tooltip": {
"shared": true,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": []
},
"yaxes": [
{
"format": "pps",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
},
{
"format": "pps",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
}
]
},
{
"aliasColors": {},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": null,
"fill": 0,
"id": 5,
"legend": {
"alignAsTable": true,
"avg": false,
"current": false,
"max": false,
"min": false,
"rightSide": true,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 2,
"links": [],
"nullPointMode": "null",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [],
"spaceLength": 10,
"span": 6,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "sum(rate(container_network_receive_bytes_total{pod_name=\"$pod\", namespace=\"$namespace\"}[1m])*1) by (namespace,container_name)",
"format": "time_series",
"interval": "1m",
"intervalFactor": 1,
"legendFormat": "in",
"metric": "",
"refId": "A",
"step": 60
},
{
"expr": "sum(rate(container_network_transmit_bytes_total{pod_name=\"$pod\", namespace=\"$namespace\"}[1m])*1) by (namespace,container_name)",
"format": "time_series",
"interval": "1m",
"intervalFactor": 1,
"legendFormat": "out",
"refId": "B",
"step": 60
}
],
"thresholds": [],
"timeFrom": null,
"timeShift": null,
"title": "网络流量",
"tooltip": {
"shared": true,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": []
},
"yaxes": [
{
"format": "bps",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
},
{
"format": "bps",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": false,
"title": "Dashboard Row",
"titleSize": "h6"
},
{
"collapse": false,
"height": 163,
"panels": [
{
"aliasColors": {},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": null,
"fill": 0,
"hideTimeOverride": false,
"id": 3,
"legend": {
"alignAsTable": true,
"avg": false,
"current": false,
"max": false,
"min": false,
"rightSide": true,
"show": true,
"sortDesc": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 2,
"links": [],
"nullPointMode": "null",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"repeat": null,
"seriesOverrides": [],
"spaceLength": 10,
"span": 6,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "sum(container_fs_limit_bytes{namespace=\"$namespace\",pod_name=\"$pod\"}) by (namespace,pod_name)",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "total",
"refId": "A",
"step": 4
},
{
"expr": "sum(container_fs_usage_bytes{namespace=\"$namespace\",pod_name=\"$pod\"}) by (namespace,pod_name)",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "usage",
"refId": "B",
"step": 4
}
],
"thresholds": [],
"timeFrom": null,
"timeShift": null,
"title": "硬盘使用量",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": []
},
"yaxes": [
{
"format": "decbytes",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
}
]
},
{
"aliasColors": {},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": null,
"fill": 0,
"id": 4,
"legend": {
"alignAsTable": true,
"avg": false,
"current": false,
"max": false,
"min": false,
"rightSide": true,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 2,
"links": [],
"nullPointMode": "null",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [],
"spaceLength": 10,
"span": 6,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "sum(container_fs_reads_total{namespace=\"$namespace\",pod_name=\"$pod\"}) by (namespace,pod_name)",
"format": "time_series",
"interval": "30s",
"intervalFactor": 1,
"legendFormat": "read",
"refId": "A",
"step": 30
},
{
"expr": "sum(container_fs_writes_total{namespace=\"$namespace\",pod_name=\"$pod\"}) by (namespace,pod_name)",
"format": "time_series",
"interval": "30s",
"intervalFactor": 1,
"legendFormat": "write",
"refId": "B",
"step": 30
},
{
"expr": "sum(container_fs_io_current{namespace=\"$namespace\",pod_name=\"$pod\"}) by (namespace,pod_name)",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "current",
"refId": "C",
"step": 4
}
],
"thresholds": [],
"timeFrom": null,
"timeShift": null,
"title": "IO",
"tooltip": {
"shared": true,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": []
},
"yaxes": [
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": false,
"title": "Dashboard Row",
"titleSize": "h6"
}
],
"schemaVersion": 14,
"style": "dark",
"tags": [],
"templating": {
"list": [
{
"allValue": null,
"current": {
"selected": true,
"text": "monitoring",
"value": "monitoring"
},
"datasource": "prometheus",
"hide": 0,
"includeAll": false,
"label": "namespace",
"multi": false,
"name": "namespace",
"options": [],
"query": "label_values(kube_pod_info{namespace=~\".+\"},namespace)",
"refresh": 1,
"regex": "",
"sort": 0,
"tagValuesQuery": "",
"tags": [],
"tagsQuery": "",
"type": "query",
"useTags": false
},
{
"allValue": null,
"current": {
"selected": true,
"text": "kube-state-metrics-2802505745",
"value": "kube-state-metrics-2802505745"
},
"datasource": "prometheus",
"hide": 0,
"includeAll": false,
"label": "创建者",
"multi": false,
"name": "created_by_name",
"options": [],
"query": "label_values(kube_pod_info{created_by_name=~\".+\",namespace=\"$namespace\"} ,created_by_name)",
"refresh": 1,
"regex": "",
"sort": 0,
"tagValuesQuery": "",
"tags": [],
"tagsQuery": "",
"type": "query",
"useTags": false
},
{
"allValue": null,
"current": {
"selected": true,
"text": "kube-state-metrics-2802505745-k24zk",
"value": "kube-state-metrics-2802505745-k24zk"
},
"datasource": "prometheus",
"hide": 0,
"includeAll": false,
"label": "pod",
"multi": false,
"name": "pod",
"options": [],
"query": "label_values(kube_pod_info{created_by_name=\"$created_by_name\",namespace=\"$namespace\",pod=~\".+\"} ,pod)",
"refresh": 1,
"regex": "",
"sort": 0,
"tagValuesQuery": "",
"tags": [],
"tagsQuery": "",
"type": "query",
"useTags": false
}
]
},
"time": {
"from": "now-30m",
"to": "now"
},
"timepicker": {
"refresh_intervals": [
"5s",
"10s",
"30s",
"1m",
"5m",
"15m",
"30m",
"1h",
"2h",
"1d"
],
"time_options": [
"5m",
"15m",
"1h",
"6h",
"12h",
"24h",
"2d",
"7d",
"30d"
]
},
"timezone": "",
"title": "基于资源对象对pod进行监控",
"version": 5
}
相关推荐
7. **监控和日志**:Kubernetes 1.15提供了基本的监控指标,但通常需要集成Prometheus、Grafana等工具来获取详细的性能数据。日志管理同样重要,可利用Elasticsearch和Kibana进行收集和分析。 8. **应用部署**:...
`kube-prometheus-stack` 是一个流行的 Kubernetes 监控解决方案,它结合了 Prometheus 和 Grafana 的强大功能,提供了全面的 K8S 集群监控和告警能力。本资源合集将详细介绍如何基于 `kube-prometheus-stack` 部署...
总的来说,`kube-prometheus-release-0.12`提供了一套完整的解决方案,用于在Kubernetes 1.24和1.25版本上快速部署和配置Prometheus和Grafana监控。通过这个方案,你可以深入洞察你的集群,及时发现和解决问题,从而...
此外,Kubernetes社区还发展了一大批周边工具和服务,如Helm包管理器、Prometheus监控系统、Fluentd日志收集、Istio服务网格等,共同构成了强大的云原生生态。学习和掌握Kubernetes对于现代云基础设施的管理和运维至...
带你快速了解kubernetes部署prometheus监控kube-prometheus-stack-62.6.0.tgz
Kubernetes Dashboard 是 Kubernetes 集群的一个重要组件,它提供了一个图形化的用户界面,使得集群...在实际工作中,结合 Dashboard 与其他工具,如 Helm、Prometheus 等,可以构建出强大的集群管理和监控解决方案。
- `prometheus-k8s-service-monitor.yaml`:这个文件可能包含了 Kubernetes ServiceMonitor 资源定义,ServiceMonitor 是一个 CRD(Custom Resource Definition),用于告诉 Prometheus 如何发现和拉取 Kubernetes ...
【标题】"kube-prometheus-0.8.0.tar.gz" 是一个压缩包,其中包含了 Kubernetes 集群监控解决方案 kube-prometheus 的0.8.0版本。这个工具集是 Prometheus 和 Kubernetes 监控生态的重要组成部分,它允许用户在 ...
"prometheus-webhook-dingtalk-1.4.0" 是一个针对 Prometheus 的告警插件,专门用于将告警信息推送至 DingTalk(钉钉)平台,使得运维团队可以及时收到警告并进行响应。 一键部署这个插件,意味着用户可以通过简单...
带你快速了解kubernetes部署prometheus监控prometheus-62.6.0.tar
Grafana Kubernetes App可监控Kubernetes集群的性能。它包括4个仪表板,即集群,节点,Pod/容器和部署。它允许自动部署所需的Prometheus导出器,并使用默认的scrape配置与您的集群内Prometheus部署一起使用。收集的...
Kubernetes监控是确保集群稳定运行的关键。Prometheus可以监控以下关键领域: 1. 节点健康:监控每个节点的CPU、内存、磁盘空间和网络资源。 2. Pod状态:跟踪Pod的启动、重启、运行时间和资源消耗。 3. 应用性能:...
Prometheus+Grafana监控Kubernetes-配套yaml.zip grafana.yaml、Kubernetes Pod Resources.json、namespace.yaml、node-exporter.yaml、prometheus.yaml
kubernetes部署prometheus监控nginx-exporter-1.3.0-debian-12-r2.tar
首先,从GitHub仓库下载最新的blackbox-exporter资源清单文件(yaml),或者你可以使用提供的`prometheus-blackbox-exporter`压缩包。资源清单通常包含一个Deployment和一个Service,定义了blackbox-exporter的实例...
带你快速了解kubernetes部署prometheus监控kube-state-metrics-2.13.0.tar
值得注意的是,Kubernetes强调可扩展性和模块化设计,允许用户根据自身需求定制和集成各种工具,如Ingress Controller(提供外部访问服务)、Prometheus(监控与报警)、Jaeger(追踪)等,构建出强大的微服务架构。...
5. **监控和日志**:讲解如何集成Prometheus、Grafana等工具进行Kubernetes集群的性能监控,以及如何设置日志收集和分析。 6. **安全性与认证**:讨论Kubernetes的认证和授权机制,如ServiceAccount、RBAC(Role-...