
Prometheus Technical Sharing: Prometheus Functions and Calculation Formulas Explained

2022/12/28 | Prometheus technical resources | Tags: Prometheus functions, Prometheus technical sharing, Prometheus calculation formulas

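Compared with Zabbix, one of Prometheus's strengths is that it lets you combine many functions and formulas to derive exactly the data you need. These formulas are also what most people find hardest about it. CPU usage is a good example: Zabbix provides it out of the box, whereas in Prometheus you have to compute it yourself with a formula.

To measure CPU usage, node_exporter collects the cumulative time the CPU has spent in each of its eight common modes. CPU usage is then (sum of CPU time in all non-idle modes) / (sum of CPU time in all modes). To get the usage for one particular minute, you also need to understand the Counter data type: because a counter only ever increases, you first take its increment over the interval of interest and then plug that increment into the formula.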

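I. Common functions

Prometheus provides a large number of functions for different kinds of data. One useful rule of thumb: whenever you are working with a counter, wrap it in rate() or increase() before doing anything else. Some of the most commonly used functions follow.

1. rate

rate() is designed specifically for counters. It returns the average per-second increase of the counter over the given time range.

Example: the average per-second receive traffic on eth0 over the last 1m

rate(node_network_receive_bytes_total{device="eth0"}[1m])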
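
To make "average per-second increase" concrete, here is a minimal Python sketch of the idea behind rate(). It is a simplification, not Prometheus's implementation: the real function also extrapolates to the edges of the time range, which this sketch skips. The function name and the sample values are hypothetical.

```python
def simple_rate(samples):
    """Average per-second increase of a counter over the sampled window.

    samples: list of (timestamp_seconds, counter_value), oldest first.
    Simplified: no extrapolation to the window edges.
    """
    if len(samples) < 2:
        return None  # at least two points are needed
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        # A drop means the counter was reset; count up from zero again.
        increase += cur - prev if cur >= prev else cur
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed


# Hypothetical eth0 receive-byte samples, scraped every 15 s for one minute.
eth0_rx = [(0, 1000), (15, 2500), (30, 4000), (45, 5500), (60, 7000)]
print(simple_rate(eth0_rx))  # 100.0 bytes per second
```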

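2. increase

increase() returns the total increase of a counter over a time range, whereas rate() returns its average per-second increase over that range.

How do you choose between the two?

For fine-grained sampling, for example a 1m range, rate() is recommended.

For coarse-grained sampling, for example a 5m, 10m or even longer range, increase() is recommended.

Example: the increase in receive traffic on eth0 over the last 1m

increase(node_network_receive_bytes_total{device="eth0"}[1m])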
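
The two functions describe the same change in different units: the increase is the rate multiplied by the length of the range in seconds. The Python sketch below illustrates that relationship on hypothetical samples. Like any simplified model, it ignores the extrapolation Prometheus applies at the range boundaries.

```python
def simple_increase(samples):
    """Total increase of a counter across the samples (reset-aware)."""
    total = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        # After a reset the counter restarts from zero.
        total += cur - prev if cur >= prev else cur
    return total


# Hypothetical eth0 receive-byte samples over a 60 s window.
eth0_rx = [(0, 1000), (15, 2500), (30, 4000), (45, 5500), (60, 7000)]
window_seconds = eth0_rx[-1][0] - eth0_rx[0][0]

increase = simple_increase(eth0_rx)   # bytes received in the window
rate = increase / window_seconds      # bytes per second
print(increase, rate)                 # 6000.0 100.0
```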

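3. sum

sum() adds values together. Note that a bare sum() adds up the values from every monitored server, so when you want to look at a single server or a group of servers you need to split the result.

Common ways to split:

1. by (instance)
2. by (cluster_name): cluster_name is a custom label, not a standard one. It lets you manually group servers by function and display each group separately.

Example: the sum, over all hosts, of the average per-second receive traffic on eth0 over the last 1m

sum(rate(node_network_receive_bytes_total{device="eth0"}[1m]))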
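
The difference between a bare sum and a sum split by a label can be sketched in Python as a group-by. The host names, label values and rates below are invented for illustration, and sum_by is a hypothetical helper, not a Prometheus API.

```python
from collections import defaultdict


def sum_by(series, label):
    """Add up values, keeping one total per distinct value of `label`."""
    totals = defaultdict(float)
    for labels, value in series:
        totals[labels[label]] += value
    return dict(totals)


# Hypothetical per-host receive rates (bytes/s), each with its label set.
rates = [
    ({"instance": "host-a:9100", "cluster_name": "web"}, 100.0),
    ({"instance": "host-b:9100", "cluster_name": "web"}, 250.0),
    ({"instance": "host-c:9100", "cluster_name": "db"}, 50.0),
]

print(sum(value for _, value in rates))  # bare sum: 400.0
print(sum_by(rates, "cluster_name"))     # {'web': 350.0, 'db': 50.0}
print(sum_by(rates, "instance"))         # one total per host
```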

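4. topk

topk() returns the x highest values, like a "top 3" ranking. It is useful when you have many servers and want to find the three with the highest value for some key.

Usage with a gauge:

topk(3,key)

Usage with a counter:

topk(3,rate(key[1m]))

Note: the output of this function is not well suited to graphing.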
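
As a rough illustration, selecting the top k series at one point in time is equivalent to the following Python sketch (host names and values are made up). Because the selection is redone at every evaluation time, the set of hosts in the result can change from one point to the next, which is one reason the output tends to graph poorly.

```python
import heapq


def topk(k, series):
    """Return the k (name, value) pairs with the largest values."""
    return heapq.nlargest(k, series, key=lambda item: item[1])


# Hypothetical gauge values for four hosts at one instant.
load = [("host-a", 0.7), ("host-b", 2.4), ("host-c", 1.1), ("host-d", 3.0)]
print(topk(3, load))  # [('host-d', 3.0), ('host-b', 2.4), ('host-c', 1.1)]
```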

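5. count

count() counts how many time series match an expression, for example how many series for some key, in current or historical data, have a value above or below a threshold.

Example:

count(node_netstat_Tcp_CurrEstab > 50)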
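
The example first filters the series by the comparison and then counts what is left. A Python sketch of the same two steps, on invented connection counts, looks like this. One difference worth keeping in mind: when nothing passes the filter, the PromQL expression returns an empty result rather than 0, while this sketch returns 0.

```python
def count_above(series, threshold):
    """Count the series whose current value is strictly above threshold."""
    return sum(1 for _, value in series if value > threshold)


# Hypothetical established-TCP-connection counts per host.
established = [("host-a", 12), ("host-b", 87), ("host-c", 51), ("host-d", 50)]
print(count_above(established, 50))  # 2 (host-b and host-c)
```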

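6. irate

irate(v range-vector) calculates the per-second instant rate of increase of the time series in the range vector. It is based on the last two data points. Breaks in monotonicity (such as counter resets caused by a target restart) are adjusted for automatically.

Example: the per-second rate of HTTP requests, looking back up to 5m

irate(http_requests_total{job="linux-01"}[5m])

irate should only be used when graphing volatile, fast-moving counters. Use rate for alerts and for slow-moving counters, because brief changes in the rate can reset the FOR clause, and graphs that consist entirely of rare spikes are hard to read.

Note that when combining irate() with an aggregation operator (for example sum()) or a function that aggregates over time (any function ending in _over_time), always take irate() first and then aggregate. Otherwise irate() cannot detect counter resets when the target restarts.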
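
The sketch below shows, in simplified Python, why irate reacts to spikes much more strongly than an average over the whole range: it looks only at the last two samples. The function name and the request counts are hypothetical.

```python
def simple_irate(samples):
    """Per-second rate computed from the last two samples only.

    samples: list of (timestamp_seconds, counter_value), oldest first.
    """
    if len(samples) < 2:
        return None
    (t_prev, v_prev), (t_last, v_last) = samples[-2], samples[-1]
    # A lower last value means the counter was reset and restarted from zero.
    delta = v_last - v_prev if v_last >= v_prev else v_last
    return delta / (t_last - t_prev)


# Hypothetical request counter: steady traffic, then a burst in the last 15 s.
requests = [(0, 100), (15, 130), (30, 160), (45, 190), (60, 490)]

instant = simple_irate(requests)                    # last two points only
average = (requests[-1][1] - requests[0][1]) / 60   # whole window
print(instant, average)  # 20.0 6.5
```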

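II. Calculating CPU usage

1. CPU modes

A CPU is time-shared among several modes. You can see them with the familiar top command:

us: time spent running user processes
sy: time spent running kernel processes
ni: time spent running user processes whose priority has been changed (niced)
id: idle time
wa: time spent waiting for I/O
hi: time spent servicing hardware interrupts
si: time spent servicing software interrupts
st: time used by the hypervisor (steal time)

Added together, these make up the total CPU time.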

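2. CPU time

Among the metrics scraped from node_exporter, the main CPU-related one is node_cpu_seconds_total. You can list all metrics like this:

curl http://localhost:9100/metrics

The response contains all kinds of monitoring data; only the CPU-related part is shown here.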

# HELP node_cpu_seconds_total Seconds the cpus spent in each mode.
# TYPE node_cpu_seconds_total counter
node_cpu_seconds_total{cpu="0",mode="idle"} 26659.41
node_cpu_seconds_total{cpu="0",mode="iowait"} 4.79
node_cpu_seconds_total{cpu="0",mode="irq"} 0
node_cpu_seconds_total{cpu="0",mode="nice"} 0
node_cpu_seconds_total{cpu="0",mode="softirq"} 2.69
node_cpu_seconds_total{cpu="0",mode="steal"} 0
node_cpu_seconds_total{cpu="0",mode="system"} 31.65
node_cpu_seconds_total{cpu="0",mode="user"} 8.67
node_cpu_seconds_total{cpu="1",mode="idle"} 26634.43
node_cpu_seconds_total{cpu="1",mode="iowait"} 54.14
node_cpu_seconds_total{cpu="1",mode="irq"} 0
node_cpu_seconds_total{cpu="1",mode="nice"} 0.02
node_cpu_seconds_total{cpu="1",mode="softirq"} 1.23
node_cpu_seconds_total{cpu="1",mode="steal"} 0
node_cpu_seconds_total{cpu="1",mode="system"} 34.07
node_cpu_seconds_total{cpu="1",mode="user"} 9
node_cpu_seconds_total{cpu="2",mode="idle"} 26629.89
node_cpu_seconds_total{cpu="2",mode="iowait"} 6.57
node_cpu_seconds_total{cpu="2",mode="irq"} 0
node_cpu_seconds_total{cpu="2",mode="nice"} 0
node_cpu_seconds_total{cpu="2",mode="softirq"} 1.95
node_cpu_seconds_total{cpu="2",mode="steal"} 0
node_cpu_seconds_total{cpu="2",mode="system"} 24.66
node_cpu_seconds_total{cpu="2",mode="user"} 7.2
node_cpu_seconds_total{cpu="3",mode="idle"} 26699.96
node_cpu_seconds_total{cpu="3",mode="iowait"} 5.72
node_cpu_seconds_total{cpu="3",mode="irq"} 0
node_cpu_seconds_total{cpu="3",mode="nice"} 0.01
node_cpu_seconds_total{cpu="3",mode="softirq"} 1.27
node_cpu_seconds_total{cpu="3",mode="steal"} 0
node_cpu_seconds_total{cpu="3",mode="system"} 22.32
node_cpu_seconds_total{cpu="3",mode="user"} 7.33

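Each line above is the time, in seconds, that one CPU core has spent in one mode. Adding up the times of all modes for a single core gives approximately the number of seconds the system has been running since boot, as reported by uptime. For example:

node_cpu_seconds_total{cpu="0",mode="idle"} 26659.41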
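
To check the "sum of all modes" claim by hand, the following Python sketch parses the cpu 0 lines from the output above and adds them up. The regular expression is a minimal assumption that only handles the cpu and mode labels in this exact order; it is not a general parser for the metrics format.

```python
import re
from collections import defaultdict

# The cpu 0 lines from the node_exporter output shown above.
METRICS_TEXT = """\
# TYPE node_cpu_seconds_total counter
node_cpu_seconds_total{cpu="0",mode="idle"} 26659.41
node_cpu_seconds_total{cpu="0",mode="iowait"} 4.79
node_cpu_seconds_total{cpu="0",mode="irq"} 0
node_cpu_seconds_total{cpu="0",mode="nice"} 0
node_cpu_seconds_total{cpu="0",mode="softirq"} 2.69
node_cpu_seconds_total{cpu="0",mode="steal"} 0
node_cpu_seconds_total{cpu="0",mode="system"} 31.65
node_cpu_seconds_total{cpu="0",mode="user"} 8.67
"""

SAMPLE = re.compile(r'node_cpu_seconds_total\{cpu="(\d+)",mode="(\w+)"\} (\S+)')


def seconds_per_cpu(text):
    """Sum the seconds of every mode, keyed by CPU core."""
    totals = defaultdict(float)
    for match in SAMPLE.finditer(text):
        cpu, _mode, value = match.groups()
        totals[cpu] += float(value)
    return dict(totals)


print(round(seconds_per_cpu(METRICS_TEXT)["0"], 2))  # 26707.21
```

About 26707 seconds is roughly 7.4 hours of uptime for this host.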

3. Deriving the CPU usage formula

1) Time cpu0 spent idle in the last 5 minutes:

increase(node_cpu_seconds_total{cpu="0",mode="idle"}[5m])

increase gives the increment, so this expression is the current value of node_cpu_seconds_total minus its value 5 minutes ago, that is, the CPU time spent in the idle mode during those 5 minutes.

2) Fraction of the last 5 minutes that cpu0 spent idle:

sum(increase(node_cpu_seconds_total{cpu="0",mode="idle"}[5m])) / sum(increase(node_cpu_seconds_total{cpu="0"}[5m]))

Both sides are wrapped in sum() because a division between two vectors only pairs series with identical labels; without it, the idle series would be divided only by itself.

3) Fraction of the last 5 minutes that all CPUs of one host spent idle:

sum(increase(node_cpu_seconds_total{mode="idle"}[5m])) / sum(increase(node_cpu_seconds_total[5m]))

4) If Prometheus monitors several hosts, sum per host:

sum(increase(node_cpu_seconds_total{mode="idle"}[5m])) by (instance) / sum(increase(node_cpu_seconds_total[5m])) by (instance)

5) CPU usage = 1 - CPU idle fraction:

100 * (1 - sum(increase(node_cpu_seconds_total{mode="idle"}[5m])) by (instance) / sum(increase(node_cpu_seconds_total[5m])) by (instance))

6) With irate(), the formula can be simplified to:

100 - (avg(irate(node_cpu_seconds_total{mode="idle"}[5m])) by (instance) * 100)
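
The arithmetic behind steps 1) to 5) can be followed on two snapshots of the counters taken 5 minutes apart. The Python sketch below uses invented per-mode totals for one host (already summed across cores, and with only four modes for brevity); it illustrates the formula and is not how Prometheus evaluates the query.

```python
def cpu_usage_percent(before, after):
    """CPU usage in percent between two snapshots of per-mode CPU seconds."""
    idle_delta = after["idle"] - before["idle"]
    total_delta = sum(after[mode] - before[mode] for mode in after)
    return 100 * (1 - idle_delta / total_delta)


# Hypothetical snapshots of one host, 5 minutes (300 CPU-seconds) apart.
before = {"idle": 1000.0, "user": 100.0, "system": 50.0, "iowait": 10.0}
after = {"idle": 1240.0, "user": 140.0, "system": 65.0, "iowait": 15.0}

# Idle grew by 240 s out of 300 s in total, so the CPU was busy 20% of the time.
print(round(cpu_usage_percent(before, after), 2))  # 20.0
```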

III. Common formulas

1. CPU usage

100 - (avg(irate(node_cpu_seconds_total{mode="idle"}[5m])) by (instance) * 100)

2. Free memory percentage

(node_memory_MemFree_bytes+node_memory_Cached_bytes+node_memory_Buffers_bytes) / node_memory_MemTotal_bytes * 100

3. Memory usage

100 - (node_memory_MemFree_bytes+node_memory_Cached_bytes+node_memory_Buffers_bytes) / node_memory_MemTotal_bytes * 100
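
As a quick sanity check of the memory formula, here is the same arithmetic in Python on made-up values for a host with 8 GiB of RAM. The dictionary keys are shortened stand-ins for the node_memory_*_bytes metrics.

```python
GIB = 1024 ** 3


def memory_usage_percent(mem):
    """Memory usage in percent, treating free, cached and buffers as reclaimable."""
    reclaimable = mem["MemFree"] + mem["Cached"] + mem["Buffers"]
    return 100 - reclaimable / mem["MemTotal"] * 100


# Hypothetical values in bytes.
mem = {
    "MemTotal": 8 * GIB,
    "MemFree": 1 * GIB,
    "Cached": 2 * GIB,
    "Buffers": 1 * GIB,
}
print(memory_usage_percent(mem))  # 50.0
```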

4. Disk usage

100 - (node_filesystem_free_bytes{mountpoint="/",fstype=~"ext4|xfs"} / node_filesystem_size_bytes{mountpoint="/

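I'm Lele. Follow the 尊龙时凯 community to keep up with Prometheus; we focus on researching and sharing Zabbix and Prometheus techniques. For more open-source content, watch for upcoming articles or consult the 尊龙时凯 technical documentation. If you have Prometheus questions, you can also post them in the 尊龙时凯 community, or join the community's Q&A QQ group (617295020) to exchange notes on open-source technology.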



