This document is published at https://mrdoc.fun
consul_exporter monitoring
# 1. Download node_exporter

Download it from the official site: <https://prometheus.io/download/#node_exporter>

```bash
mkdir -p /opt/agent
cd /opt/agent
wget https://github.com/prometheus/node_exporter/releases/download/v1.1.2/node_exporter-1.1.2.linux-amd64.tar.gz
# Unpack the archive; the binary is inside node_exporter-1.1.2.linux-amd64/
tar -xzf node_exporter-1.1.2.linux-amd64.tar.gz
```

# 2. Manage it with systemd

## 2.1 Create the service user and group

```bash
useradd -M -s /sbin/nologin prometheus
```

## 2.2 Create node-exporter.service

```bash
# vim /etc/systemd/system/node-exporter.service
[Unit]
Description=node-export service agent by jonnyan404
Requires=network-online.target
After=network-online.target

[Service]
User=prometheus
Group=prometheus
Restart=on-failure
# Replace /path/to with the directory that holds the node_exporter binary.
# --collector.tcpstat enables node_tcp_connection_states, which the rules in step 4 use.
ExecStart=/path/to/node_exporter --collector.tcpstat
ExecReload=/bin/kill -HUP $MAINPID
KillMode=process
TimeoutStopSec=5

[Install]
WantedBy=multi-user.target
```

## 2.3 Enable at boot and start

```bash
systemctl daemon-reload
systemctl enable node-exporter.service
systemctl start node-exporter.service
```

## 2.4 View the logs

```bash
journalctl -u node-exporter.service
```

# 3. Configure the auto-discovered host list

Target files for file_sd_configs can be written in YAML or JSON; YAML is used here.

- YAML format

```yaml
# vim /opt/jonnyan404/prometheus/target/linux.yml (choose any file name)
- targets: ['192.168.1.220:9100']
  labels:
    app: 'app1'
    env: 'game1'
    region: 'us-west-2'
- targets: ['192.168.1.221:9100']
  labels:
    app: 'app2'
    env: 'game2'
    region: 'ap-southeast-1'
```

- JSON format

```json
[
  {
    "targets": ["192.168.1.221:29090"],
    "labels": {
      "app": "app1",
      "env": "game1",
      "region": "us-west-2"
    }
  },
  {
    "targets": ["192.168.1.222:29090"],
    "labels": {
      "app": "app2",
      "env": "game2",
      "region": "ap-southeast-1"
    }
  }
]
```

# 4. Configure alert rules

- vim /opt/jonnyan404/prometheus/rules/node-exporter-record.yml

```yaml
groups:
  - name: node_exporter-record
    rules:
      - expr: up
        record: node_exporter:up
        labels:
          desc: "Whether the node is online: 1 = online, 0 = offline"
          unit: " "
          job: "aws_ec2"
      - expr: time() - node_boot_time_seconds{} * on(instance) group_left(nodename) (node_uname_info)
        record: node_exporter:node_uptime
        labels:
          desc: "Node uptime"
          unit: "s"
          job: "aws_ec2"
      ##############################################
      # cpu #
      - expr: (1 - avg by (environment,instance) (irate(node_cpu_seconds_total{job="aws_ec2",mode="idle"}[5m]))) * 100 * on(instance) group_left(nodename) (node_uname_info)
        record: node_exporter:cpu:total:percent
        labels:
          desc: "Total CPU usage percentage of the node"
          unit: "%"
          job: "aws_ec2"
      - expr: (avg by (environment,instance) (irate(node_cpu_seconds_total{job="aws_ec2",mode="idle"}[5m]))) * 100 * on(instance) group_left(nodename) (node_uname_info)
        record: node_exporter:cpu:idle:percent
        labels:
          desc: "CPU idle percentage of the node"
          unit: "%"
          job: "aws_ec2"
      - expr: (avg by (environment,instance) (irate(node_cpu_seconds_total{job="aws_ec2",mode="iowait"}[5m]))) * 100 * on(instance) group_left(nodename) (node_uname_info)
        record: node_exporter:cpu:iowait:percent
        labels:
          desc: "CPU iowait percentage of the node"
          unit: "%"
          job: "aws_ec2"
      - expr: (avg by (environment,instance) (irate(node_cpu_seconds_total{job="aws_ec2",mode="system"}[5m]))) * 100 * on(instance) group_left(nodename) (node_uname_info)
        record: node_exporter:cpu:system:percent
        labels:
          desc: "CPU system percentage of the node"
          unit: "%"
          job: "aws_ec2"
      - expr: (avg by (environment,instance) (irate(node_cpu_seconds_total{job="aws_ec2",mode="user"}[5m]))) * 100 * on(instance) group_left(nodename) (node_uname_info)
        record: node_exporter:cpu:user:percent
        labels:
          desc: "CPU user percentage of the node"
          unit: "%"
          job: "aws_ec2"
      - expr: (avg by (environment,instance) (irate(node_cpu_seconds_total{job="aws_ec2",mode=~"softirq|nice|irq|steal"}[5m]))) * 100 * on(instance) group_left(nodename) (node_uname_info)
        record: node_exporter:cpu:other:percent
        labels:
          desc: "CPU percentage of the node spent in other modes"
          unit: "%"
          job: "aws_ec2"
      ##############################################
      # memory #
      - expr: node_memory_MemTotal_bytes{job="aws_ec2"} * on(instance) group_left(nodename) (node_uname_info)
        record: node_exporter:memory:total
        labels:
          desc: "Total memory of the node"
          unit: byte
          job: "aws_ec2"
      - expr: node_memory_MemFree_bytes{job="aws_ec2"} * on(instance) group_left(nodename) (node_uname_info)
        record: node_exporter:memory:free
        labels:
          desc: "Free memory of the node"
          unit: byte
          job: "aws_ec2"
      # The subtraction is parenthesized because * binds tighter than - in PromQL.
      - expr: (node_memory_MemTotal_bytes{job="aws_ec2"} - node_memory_MemFree_bytes{job="aws_ec2"}) * on(instance) group_left(nodename) (node_uname_info)
        record: node_exporter:memory:used
        labels:
          desc: "Used memory of the node"
          unit: byte
          job: "aws_ec2"
      - expr: (node_memory_MemTotal_bytes{job="aws_ec2"} - node_memory_MemAvailable_bytes{job="aws_ec2"}) * on(instance) group_left(nodename) (node_uname_info)
        record: node_exporter:memory:actualused
        labels:
          desc: "Memory actually used by user processes on the node"
          unit: byte
          job: "aws_ec2"
      - expr: (1 - (node_memory_MemAvailable_bytes{job="aws_ec2"} / (node_memory_MemTotal_bytes{job="aws_ec2"}))) * 100 * on(instance) group_left(nodename) (node_uname_info)
        record: node_exporter:memory:used:percent
        labels:
          desc: "Memory usage percentage of the node"
          unit: "%"
          job: "aws_ec2"
      - expr: ((node_memory_MemAvailable_bytes{job="aws_ec2"} / (node_memory_MemTotal_bytes{job="aws_ec2"}))) * 100 * on(instance) group_left(nodename) (node_uname_info)
        record: node_exporter:memory:free:percent
        labels:
          desc: "Available memory percentage of the node"
          unit: "%"
          job: "aws_ec2"
      ##############################################
      # load #
      - expr: sum by (instance) (node_load1{job="aws_ec2"}) * on(instance) group_left(nodename) (node_uname_info)
        record: node_exporter:load:load1
        labels:
          desc: "1-minute system load"
          unit: " "
          job: "aws_ec2"
      - expr: sum by (instance) (node_load5{job="aws_ec2"}) * on(instance) group_left(nodename) (node_uname_info)
        record: node_exporter:load:load5
        labels:
          desc: "5-minute system load"
          unit: " "
          job: "aws_ec2"
      - expr: sum by (instance) (node_load15{job="aws_ec2"}) * on(instance) group_left(nodename) (node_uname_info)
        record: node_exporter:load:load15
        labels:
          desc: "15-minute system load"
          unit: " "
          job: "aws_ec2"
      ##############################################
      # disk #
      - expr: node_filesystem_size_bytes{job="aws_ec2",fstype=~"ext4|xfs"} * on(instance) group_left(nodename) (node_uname_info)
        record: node_exporter:disk:usage:total
        labels:
          desc: "Total disk size of the node"
          unit: byte
          job: "aws_ec2"
      - expr: node_filesystem_avail_bytes{job="aws_ec2",fstype=~"ext4|xfs"} * on(instance) group_left(nodename) (node_uname_info)
        record: node_exporter:disk:usage:free
        labels:
          desc: "Free disk space of the node"
          unit: byte
          job: "aws_ec2"
      - expr: (node_filesystem_size_bytes{job="aws_ec2",fstype=~"ext4|xfs"} - node_filesystem_avail_bytes{job="aws_ec2",fstype=~"ext4|xfs"}) * on(instance) group_left(nodename) (node_uname_info)
        record: node_exporter:disk:usage:used
        labels:
          desc: "Used disk space of the node"
          unit: byte
          job: "aws_ec2"
      - expr: (1 - node_filesystem_avail_bytes{job="aws_ec2",fstype=~"ext4|xfs"} / node_filesystem_size_bytes{job="aws_ec2",fstype=~"ext4|xfs"}) * 100 * on(instance) group_left(nodename) (node_uname_info)
        record: node_exporter:disk:used:percent
        labels:
          desc: "Disk usage percentage of the node"
          unit: "%"
          job: "aws_ec2"
      - expr: irate(node_disk_reads_completed_total{job="aws_ec2"}[1m]) * on(instance) group_left(nodename) (node_uname_info)
        record: node_exporter:disk:read:count:rate
        labels:
          desc: "Disk read operations per second on the node"
          unit: "ops/s"
          job: "aws_ec2"
      - expr: irate(node_disk_writes_completed_total{job="aws_ec2"}[1m]) * on(instance) group_left(nodename) (node_uname_info)
        record: node_exporter:disk:write:count:rate
        labels:
          desc: "Disk write operations per second on the node"
          unit: "ops/s"
          job: "aws_ec2"
      # The read rule uses node_disk_read_bytes_total and the write rule uses
      # node_disk_written_bytes_total (the two metrics were swapped originally).
      - expr: (irate(node_disk_read_bytes_total{job="aws_ec2"}[1m])) / 1024 / 1024 * on(instance) group_left(nodename) (node_uname_info)
        record: node_exporter:disk:read:mb:rate
        labels:
          desc: "Device read throughput of the node in MB/s"
          unit: "MB/s"
          job: "aws_ec2"
      - expr: (irate(node_disk_written_bytes_total{job="aws_ec2"}[1m])) / 1024 / 1024 * on(instance) group_left(nodename) (node_uname_info)
        record: node_exporter:disk:write:mb:rate
        labels:
          desc: "Device write throughput of the node in MB/s"
          unit: "MB/s"
          job: "aws_ec2"
      ##############################################
      # filesystem #
      - expr: (1 - node_filesystem_files_free{job="aws_ec2",fstype=~"ext4|xfs"} / node_filesystem_files{job="aws_ec2",fstype=~"ext4|xfs"}) * 100 * on(instance) group_left(nodename) (node_uname_info)
        record: node_exporter:filesystem:used:percent
        labels:
          desc: "Inode usage percentage of the node"
          unit: "%"
          job: "aws_ec2"
      ##############################################
      # filefd #
      - expr: node_filefd_allocated{job="aws_ec2"} * on(instance) group_left(nodename) (node_uname_info)
        record: node_exporter:filefd_allocated:count
        labels:
          desc: "Number of open file descriptors on the node"
          unit: "count"
          job: "aws_ec2"
      - expr: node_filefd_allocated{job="aws_ec2"} / node_filefd_maximum{job="aws_ec2"} * 100 * on(instance) group_left(nodename) (node_uname_info)
        record: node_exporter:filefd_allocated:percent
        labels:
          desc: "Percentage of file descriptors in use on the node"
          unit: "%"
          job: "aws_ec2"
      ##############################################
      # network #
      # The byte counters are multiplied by 8 so the value matches the bit/s unit.
      - expr: avg by (environment,instance,device) (irate(node_network_receive_bytes_total{device=~"eth0|eth1|ens33|ens37"}[1m])) * 8 * on(instance) group_left(nodename) (node_uname_info)
        record: node_exporter:network:netin:bit:rate
        labels:
          desc: "Bits received per second on each NIC of the node"
          unit: "bit/s"
          job: "aws_ec2"
      - expr: avg by (environment,instance,device) (irate(node_network_transmit_bytes_total{device=~"eth0|eth1|ens33|ens37"}[1m])) * 8 * on(instance) group_left(nodename) (node_uname_info)
        record: node_exporter:network:netout:bit:rate
        labels:
          desc: "Bits sent per second on each NIC of the node"
          unit: "bit/s"
          job: "aws_ec2"
      - expr: avg by (environment,instance,device) (irate(node_network_receive_packets_total{device=~"eth0|eth1|ens33|ens37"}[1m])) * on(instance) group_left(nodename) (node_uname_info)
        record: node_exporter:network:netin:packet:rate
        labels:
          desc: "Packets received per second on each NIC of the node"
          unit: "packets/s"
          job: "aws_ec2"
      - expr: avg by (environment,instance,device) (irate(node_network_transmit_packets_total{device=~"eth0|eth1|ens33|ens37"}[1m])) * on(instance) group_left(nodename) (node_uname_info)
        record: node_exporter:network:netout:packet:rate
        labels:
          desc: "Packets sent per second on each NIC of the node"
          unit: "packets/s"
          job: "aws_ec2"
      - expr: avg by (environment,instance,device) (irate(node_network_receive_errs_total{device=~"eth0|eth1|ens33|ens37"}[1m])) * on(instance) group_left(nodename) (node_uname_info)
        record: node_exporter:network:netin:error:rate
        labels:
          desc: "Receive errors per second detected by the device driver"
          unit: "packets/s"
          job: "aws_ec2"
      - expr: avg by (environment,instance,device) (irate(node_network_transmit_errs_total{device=~"eth0|eth1|ens33|ens37"}[1m])) * on(instance) group_left(nodename) (node_uname_info)
        record: node_exporter:network:netout:error:rate
        labels:
          desc: "Transmit errors per second detected by the device driver"
          unit: "packets/s"
          job: "aws_ec2"
      - expr: node_tcp_connection_states{job="aws_ec2", state="established"} * on(instance) group_left(nodename) (node_uname_info)
        record: node_exporter:network:tcp:established:count
        labels:
          desc: "Number of TCP connections in the established state"
          unit: "count"
          job: "aws_ec2"
      - expr: node_tcp_connection_states{job="aws_ec2", state="time_wait"} * on(instance) group_left(nodename) (node_uname_info)
        record: node_exporter:network:tcp:timewait:count
        labels:
          desc: "Number of TCP connections in the time_wait state"
          unit: "count"
          job: "aws_ec2"
      - expr: sum by (environment,instance) (node_tcp_connection_states{job="aws_ec2"}) * on(instance) group_left(nodename) (node_uname_info)
        record: node_exporter:network:tcp:total:count
        labels:
          desc: "Total number of TCP connections on the node"
          unit: "count"
          job: "aws_ec2"
      ##############################################
      # process #
      # "zoom" in the record name stands for zombie processes (state="Z").
      - expr: node_processes_state{state="Z"} * on(instance) group_left(nodename) (node_uname_info)
        record: node_exporter:process:zoom:total:count
        labels:
          desc: "Number of zombie processes on the node"
          unit: "count"
          job: "aws_ec2"
      ##############################################
      # other #
      - expr: abs(node_timex_offset_seconds{job="aws_ec2"}) * on(instance) group_left(nodename) (node_uname_info)
        record: node_exporter:time:offset
        labels:
          desc: "Clock offset of the node"
          unit: "s"
          job: "aws_ec2"
      ##############################################
      #
      - expr: count by (instance) ( count by (instance,cpu) (node_cpu_seconds_total{ mode='system'}) ) * on(instance) group_left(nodename) (node_uname_info)
        record: node_exporter:cpu:count
```

- vim /opt/jonnyan404/prometheus/rules/node-exporter-alert.yml

```yaml
# node-exporter-alert-rules.yml
# Defines the alert rules.
# Each expr is written against the record names defined in the previous rules file.
# When an alert defined here fires, it is passed to alertmanager, which extracts
# the data it needs from the payload and sends it to the recipients.
groups:
  - name: node-alert
    rules:
      - alert: node-down
        expr: node_exporter:up == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "instance: {{ $labels.instance }} is down"
          grafana: "http://x.x.x.x:3000/d/9CWBz0bik/zhu-ji-jian-kong?orgId=1&var-node={{ $labels.instance }}"
      - alert: prometheus-alertmanager-not-connected
        expr: prometheus_notifications_alertmanagers_discovered < 1
        for: 0m
        labels:
          severity: critical
        annotations:
          summary: Prometheus not connected to alertmanager
          description: "Prometheus cannot connect the alertmanager\n VALUE = {{ $value }}\n LABELS = {{ $labels }}"
      - alert: alertmanager-notification-failing
        expr: rate(alertmanager_notifications_failed_total[1m]) > 0
        for: 0m
        labels:
          severity: critical
        annotations:
          summary: Prometheus AlertManager notification failing
          description: "Alertmanager is failing sending notifications\n VALUE = {{ $value }}\n LABELS = {{ $labels }}"
      - alert: node-cpu-high
        expr: node_exporter:cpu:total:percent > 80
        for: 3m
        labels:
          severity: info
        annotations:
          summary: "instance: {{ $labels.instance }} CPU usage is above {{ $value }}{{ $labels.unit }}"
          grafana: "http://x.x.x.x:3000/d/9CWBz0bik/zhu-ji-jian-kong?orgId=1&var-node={{ $labels.instance }}"
      - alert: node-cpu-iowait-high
        expr: node_exporter:cpu:iowait:percent >= 12
        for: 3m
        labels:
          severity: info
        annotations:
          summary: "instance: {{ $labels.instance }} CPU iowait is above {{ $value }}{{ $labels.unit }}"
          grafana: "http://x.x.x.x:3000/d/9CWBz0bik/zhu-ji-jian-kong?orgId=1&var-node={{ $labels.instance }}"
      - alert: node-load-load1-high
        # on(instance) is needed because the two records carry different label sets
        # (node_exporter:cpu:count has no desc/unit/job labels).
        expr: node_exporter:load:load1 > on(instance) (node_exporter:cpu:count * 1.2)
        for: 3m
        labels:
          severity: info
        annotations:
          summary: "instance: {{ $labels.instance }} load1 is above {{ $value }}{{ $labels.unit }}"
          grafana: "http://x.x.x.x:3000/d/9CWBz0bik/zhu-ji-jian-kong?orgId=1&var-node={{ $labels.instance }}"
      - alert: node-memory-high
        expr: node_exporter:memory:used:percent > 85
        for: 3m
        labels:
          severity: info
        annotations:
          summary: "Memory usage is above {{ $value }}{{ $labels.unit }}"
          grafana: "http://x.x.x.x:3000/d/9CWBz0bik/zhu-ji-jian-kong?orgId=1&var-node={{ $labels.instance }}"
      - alert: node-disk-high
        expr: node_exporter:disk:used:percent > 80
        for: 3m
        labels:
          severity: info
        annotations:
          summary: "{{ $labels.device }}:{{ $labels.mountpoint }} usage is above {{ $value }}{{ $labels.unit }}"
          grafana: "http://x.x.x.x:3000/d/9CWBz0bik/zhu-ji-jian-kong?orgId=1&var-node={{ $labels.instance }}"
      - alert: node-disk-read-count-high
        expr: node_exporter:disk:read:count:rate > 3000
        for: 2m
        labels:
          severity: info
        annotations:
          summary: "instance: {{ $labels.instance }} read IOPS is above {{ $value }}{{ $labels.unit }}"
          grafana: "http://x.x.x.x:3000/d/9CWBz0bik/zhu-ji-jian-kong?orgId=1&var-node={{ $labels.instance }}"
      - alert: node-disk-write-count-high
        expr: node_exporter:disk:write:count:rate > 3000
        for: 2m
        labels:
          severity: info
        annotations:
          summary: "instance: {{ $labels.instance }} write IOPS is above {{ $value }}{{ $labels.unit }}"
          grafana: "http://x.x.x.x:3000/d/9CWBz0bik/zhu-ji-jian-kong?orgId=1&var-node={{ $labels.instance }}"
      - alert: node-disk-read-mb-high
        expr: node_exporter:disk:read:mb:rate > 60
        for: 2m
        labels:
          severity: info
        annotations:
          summary: "instance: {{ $labels.instance }} read throughput is above {{ $value }}{{ $labels.unit }}"
          grafana: "http://x.x.x.x:3000/d/9CWBz0bik/zhu-ji-jian-kong?orgId=1&var-node={{ $labels.instance }}"
      - alert: node-disk-write-mb-high
        expr: node_exporter:disk:write:mb:rate > 60
        for: 2m
        labels:
          severity: info
        annotations:
          summary: "instance: {{ $labels.instance }} write throughput is above {{ $value }}{{ $labels.unit }}"
          grafana: "http://x.x.x.x:3000/d/9CWBz0bik/zhu-ji-jian-kong?orgId=1&var-node={{ $labels.instance }}"
      - alert: node-filefd-allocated-percent-high
        expr: node_exporter:filefd_allocated:percent > 80
        for: 10m
        labels:
          severity: info
        annotations:
          summary: "instance: {{ $labels.instance }} open file descriptors are above {{ $value }}{{ $labels.unit }}"
          grafana: "http://x.x.x.x:3000/d/9CWBz0bik/zhu-ji-jian-kong?orgId=1&var-node={{ $labels.instance }}"
      - alert: node-network-netin-error-rate-high
        expr: node_exporter:network:netin:error:rate > 4
        for: 1m
        labels:
          severity: info
        annotations:
          summary: "instance: {{ $labels.instance }} inbound packet error rate is above {{ $value }}{{ $labels.unit }}"
          grafana: "http://x.x.x.x:3000/d/9CWBz0bik/zhu-ji-jian-kong?orgId=1&var-node={{ $labels.instance }}"
      - alert: node-network-netin-packet-rate-high
        expr: node_exporter:network:netin:packet:rate > 35000
        for: 1m
        labels:
          severity: info
        annotations:
          summary: "instance: {{ $labels.instance }} inbound packet rate is above {{ $value }}{{ $labels.unit }}"
          grafana: "http://x.x.x.x:3000/d/9CWBz0bik/zhu-ji-jian-kong?orgId=1&var-node={{ $labels.instance }}"
      - alert: node-network-netout-packet-rate-high
        expr: node_exporter:network:netout:packet:rate > 35000
        for: 1m
        labels:
          severity: info
        annotations:
          summary: "instance: {{ $labels.instance }} outbound packet rate is above {{ $value }}{{ $labels.unit }}"
          grafana: "http://x.x.x.x:3000/d/9CWBz0bik/zhu-ji-jian-kong?orgId=1&var-node={{ $labels.instance }}"
      - alert: node-network-tcp-total-count-high
        expr: node_exporter:network:tcp:total:count > 40000
        for: 1m
        labels:
          severity: info
        annotations:
          summary: "instance: {{ $labels.instance }} TCP connection count is above {{ $value }}{{ $labels.unit }}"
          grafana: "http://x.x.x.x:3000/d/9CWBz0bik/zhu-ji-jian-kong?orgId=1&var-node={{ $labels.instance }}"
      - alert: node-process-zoom-total-count-high
        expr: node_exporter:process:zoom:total:count > 10
        for: 10m
        labels:
          severity: info
        annotations:
          summary: "instance: {{ $labels.instance }} zombie process count is above {{ $value }}{{ $labels.unit }}"
          grafana: "http://x.x.x.x:3000/d/9CWBz0bik/zhu-ji-jian-kong?orgId=1&var-node={{ $labels.instance }}"
      - alert: node-time-offset-high
        expr: node_exporter:time:offset > 0.03
        for: 2m
        labels:
          severity: info
        annotations:
          summary: "instance: {{ $labels.instance }} {{ $labels.desc }} {{ $value }}{{ $labels.unit }}"
          grafana: "http://x.x.x.x:3000/d/9CWBz0bik/zhu-ji-jian-kong?orgId=1&var-node={{ $labels.instance }}"
      - alert: disk-space-low
        expr: node_exporter:disk:used:percent > 80
        for: 2m
        labels:
          severity: warn
        annotations:
          summary: "instance: {{ $labels.instance }} disk usage has exceeded {{ $value }}{{ $labels.unit }}"
          grafana: "http://x.x.x.x:3000/d/9CWBz0bik/zhu-ji-jian-kong?orgId=1&var-node={{ $labels.instance }}"
```

# 5. Restart prometheus so the rules take effect

```bash
docker restart prometheus
```
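The target files from step 3 and the rule files from step 4 only take effect if `prometheus.yml` references them, which the steps above do not show. The fragment below is a minimal sketch of that wiring, not the author's actual configuration. The job name `aws_ec2` is taken from the `job="aws_ec2"` selectors in the recording rules, and the directories are the ones used in steps 3 and 4. Because step 5 restarts a docker container, it is an assumption here that these directories are mounted into the container at the same paths; the `refresh_interval` value is also only an example.

```yaml
# prometheus.yml (fragment, illustrative)

# Load the recording and alerting rule files from step 4.
rule_files:
  - /opt/jonnyan404/prometheus/rules/*.yml

scrape_configs:
  # The job name must match the job="aws_ec2" selector used in the rules.
  - job_name: aws_ec2
    file_sd_configs:
      # Target files from step 3; YAML and JSON files are both accepted.
      - files:
          - /opt/jonnyan404/prometheus/target/*.yml
        # How often the files are re-read (assumed value).
        refresh_interval: 1m
```

With file-based discovery, edits to the target files are picked up without a restart, whereas changes to the rule files need the restart from step 5 or a configuration reload.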
Jonny
May 20, 2021, 10:55 a.m.