Prometheus Cloud-Native Monitoring: Operations and Development in Practice

2.5　Quick Installation and Startup of Prometheus

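Having covered the core concepts of Prometheus, this section shows how to quickly install and start Prometheus on a Mac. For the Docker and Prometheus Operator installation methods, readers can consult the official documentation.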

The official Prometheus download page, https://prometheus.io/download/, provides standalone binary tarballs. The main components include prometheus, alertmanager, blackbox_exporter, consul_exporter, graphite_exporter, haproxy_exporter, memcached_exporter, mysqld_exporter, node_exporter, pushgateway, and statsd_exporter.

The Prometheus components mainly support the Darwin, Linux, and Windows operating systems. This section uses the Darwin build to install and start Prometheus on a Mac.

As shown in Figure 2-7, choose the Darwin build of the latest release when downloading on a Mac.

Figure 2-7　Prometheus download page

After the download completes, extract the archive and start Prometheus with the following commands.

tar xvfz prometheus-2.16.0.darwin-amd64.tar.gz
cd prometheus-2.16.0.darwin-amd64
./prometheus --config.file=prometheus.yml
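The archive name above follows the pattern prometheus-<version>.<os>-<arch>.tar.gz, and the archive unpacks into a directory with the same name minus the .tar.gz suffix, which is the directory to cd into. A minimal Python sketch of that naming follows; the helper names tarball_name and extracted_dir are hypothetical and only illustrate the convention.

```python
def tarball_name(version: str, os_name: str = "darwin", arch: str = "amd64") -> str:
    """Build the release archive name, e.g. prometheus-2.16.0.darwin-amd64.tar.gz."""
    return f"prometheus-{version}.{os_name}-{arch}.tar.gz"


def extracted_dir(tarball: str) -> str:
    """Return the directory the archive unpacks into (archive name minus '.tar.gz')."""
    suffix = ".tar.gz"
    if not tarball.endswith(suffix):
        raise ValueError(f"not a .tar.gz archive: {tarball}")
    return tarball[: -len(suffix)]


if __name__ == "__main__":
    name = tarball_name("2.16.0")
    print(name)                  # file passed to tar
    print(extracted_dir(name))   # directory passed to cd
```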

Once started, Prometheus listens on port 9090. Visiting localhost:9090/metrics returns monitoring data about the state of Prometheus Server itself, such as the following.

# HELP go_gc_duration_seconds A summary of the GC invocation durations.
# TYPE go_gc_duration_seconds summary
go_gc_duration_seconds{quantile="0"} 1.2661e-05
go_gc_duration_seconds{quantile="0.25"} 1.8689e-05
go_gc_duration_seconds{quantile="0.5"} 3.466e-05
go_gc_duration_seconds{quantile="0.75"} 0.000183214
go_gc_duration_seconds{quantile="1"} 0.00082742
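To make the structure of this output concrete, the following Python sketch parses such lines into a metric name, a label dictionary, and a numeric value. It is a simplified illustration, not a complete parser for the exposition format: it assumes label values contain no escaped quotes, and the function name parse_samples is hypothetical.

```python
import re

SAMPLE = """\
# HELP go_gc_duration_seconds A summary of the GC invocation durations.
# TYPE go_gc_duration_seconds summary
go_gc_duration_seconds{quantile="0"} 1.2661e-05
go_gc_duration_seconds{quantile="0.25"} 1.8689e-05
go_gc_duration_seconds{quantile="0.5"} 3.466e-05
go_gc_duration_seconds{quantile="0.75"} 0.000183214
go_gc_duration_seconds{quantile="1"} 0.00082742
"""

# Metric name, optional {labels}, then the sample value
LINE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="([^"]*)"')


def parse_samples(text):
    """Return a list of (metric_name, labels_dict, float_value); comment lines are skipped."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):   # HELP/TYPE lines and blanks
            continue
        match = LINE_RE.match(line)
        if match is None:
            continue
        name, raw_labels, value = match.groups()
        labels = dict(LABEL_RE.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples


if __name__ == "__main__":
    for name, labels, value in parse_samples(SAMPLE):
        print(name, labels, value)
```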

localhost:9090/graph is the default Prometheus query interface; a screenshot is shown in Figure 2-8.

Figure 2-8　Prometheus Graph page

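PromQL expressions can be entered on the Graph page. Entering "up", for example, shows the health of the targets of each monitored Job: 1 means healthy and 0 means unhealthy. Suppose scrape_configs in prometheus.yml is configured as follows (the service on port 8080 has not been started):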

scrape_configs:
  - job_name: 'prometheus'
    static_configs:
    - targets: ['localhost:9090']

  - job_name: 'springboot-demo'
    metrics_path: '/actuator/prometheus'
    static_configs:
    - targets: ['localhost:8080']

The Graph page then shows the result in Figure 2-9: the Prometheus service is healthy, while the service on port 8080 is not.

Figure 2-9　Entering the "up" expression on the Prometheus Graph page
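The same "up" expression can also be evaluated through the Prometheus HTTP API at /api/v1/query. The Python sketch below filters such a response for unhealthy targets. The response body is a hand-written example in the API's JSON shape for the two jobs above (the timestamps are made up), and the function name unhealthy_targets is hypothetical.

```python
import json

# Hand-written example in the shape returned by GET /api/v1/query?query=up
SAMPLE_RESPONSE = json.dumps({
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {"metric": {"__name__": "up", "instance": "localhost:9090", "job": "prometheus"},
             "value": [1582526501.0, "1"]},
            {"metric": {"__name__": "up", "instance": "localhost:8080", "job": "springboot-demo"},
             "value": [1582526501.0, "0"]},
        ],
    },
})


def unhealthy_targets(response_text):
    """Return (job, instance) pairs whose `up` sample is 0."""
    body = json.loads(response_text)
    if body.get("status") != "success":
        raise ValueError("query failed")
    down = []
    for series in body["data"]["result"]:
        _timestamp, value = series["value"]   # the value arrives as a string
        if float(value) == 0.0:
            labels = series["metric"]
            down.append((labels.get("job"), labels.get("instance")))
    return down


if __name__ == "__main__":
    print(unhealthy_targets(SAMPLE_RESPONSE))
```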

The web UI also shows alert-related information. The Status menu shown in Figure 2-9 provides pages such as Runtime & Build Information, Command-Line Flags, Configuration, Rules, Targets, and Service Discovery.

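Figure 2-10 shows the Command-Line Flags information. Most of these values are configurable in practice and should be adjusted to the actual deployment environment. Some of the options are described in Table 2-2^[1]; for more on storage-related parameters, see Chapter 10.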

Figure 2-10　Prometheus command-line flags information

Table 2-2　Configurable Prometheus options and descriptions

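Among the parameters shown in Figure 2-10, web.enable-lifecycle is the one most commonly used when deploying Spring Boot microservices. It lets Prometheus reload its configuration dynamically through a web endpoint.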

./prometheus --config.file=prometheus.yml --web.enable-lifecycle

With this flag enabled, Prometheus no longer needs to be stopped and restarted whenever prometheus.yml changes. Instead, the configuration can be reloaded with the following command.

curl -X POST http://localhost:9090/-/reload
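The reload call can also be issued from a script. The following is a minimal Python sketch using only the standard library; the function names build_reload_request and reload_config are hypothetical, and the default base URL assumes the local instance from this section started with --web.enable-lifecycle.

```python
import urllib.request


def build_reload_request(base_url="http://localhost:9090"):
    """Build (but do not send) the POST request for the /-/reload endpoint."""
    url = base_url.rstrip("/") + "/-/reload"
    return urllib.request.Request(url, data=b"", method="POST")


def reload_config(base_url="http://localhost:9090", timeout=5.0):
    """Send the reload request and return the HTTP status code.

    Needs a running Prometheus started with --web.enable-lifecycle;
    urlopen raises an exception on connection errors and non-2xx responses.
    """
    request = build_reload_request(base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


if __name__ == "__main__":
    request = build_reload_request()
    print(request.get_method(), request.full_url)
```

Separating request construction from sending keeps the first function usable without a running server, for example in a dry run.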

Note

Prometheus can also be served under an external URL behind a proxy by using --web.external-url, and parameters such as --web.route-prefix allow more precise control, for example:

./prometheus --web.external-url http://localhost:19090/prometheus/
./prometheus --web.external-url http://localhost:19090/prometheus/ --web.route-prefix=/

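If the Prometheus process in the terminal is stopped with Ctrl+Z or a similar method (Ctrl+Z suspends the process rather than terminating it, so it keeps holding port 9090), restarting Prometheus fails with the following error.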

level=error ts=2020-02-24T06:41:41.749Z caller=main.go:727 err="error starting web server: listen tcp 0.0.0.0:9090: bind: address already in use"

To resolve this, first run lsof -i tcp:9090 to see what is occupying port 9090.

prometheus-2.16.0.darwin-amd64 lsof -i tcp:9090
COMMAND    PID    USER  FD   TYPE             DEVICE SIZE/OFF NODE NAME
prometheu 3411 charles  10u  IPv4 0xfedca3ddafcff0cb  0t0  TCP localhost:57498->localhost:websm (ESTABLISHED)
prometheu 3411 charles  11u  IPv6 0xfedca3dd8f43c783  0t0  TCP *:websm (LISTEN)
prometheu 3411 charles  12u  IPv6 0xfedca3dd8f43c1c3  0t0  TCP localhost:websm->localhost:57498 (ESTABLISHED)
prometheu 3411 charles  13u  IPv6 0xfedca3dd8f43b083  0t0  TCP localhost:57513->localhost:websm (ESTABLISHED)
prometheu 3411 charles  39u  IPv6 0xfedca3dd8f43cd43  0t0  TCP localhost:websm->localhost:57513 (ESTABLISHED)

As shown above, running sudo kill -9 3411 kills the existing Prometheus process, after which ./prometheus --config.file=prometheus.yml starts Prometheus again.
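Picking the PID out of the lsof output can also be scripted. The Python sketch below extracts the PID of the process that is listening on the port from output in the format shown above. The sample text is an abbreviated copy of that output with each record on one line, and the function name listening_pids is hypothetical.

```python
SAMPLE_LSOF = """\
COMMAND    PID    USER  FD   TYPE             DEVICE SIZE/OFF NODE NAME
prometheu 3411 charles  10u  IPv4 0xfedca3ddafcff0cb  0t0  TCP localhost:57498->localhost:websm (ESTABLISHED)
prometheu 3411 charles  11u  IPv6 0xfedca3dd8f43c783  0t0  TCP *:websm (LISTEN)
prometheu 3411 charles  12u  IPv6 0xfedca3dd8f43c1c3  0t0  TCP localhost:websm->localhost:57498 (ESTABLISHED)
"""


def listening_pids(lsof_output):
    """Return the sorted, de-duplicated PIDs of rows whose NAME column ends in (LISTEN)."""
    pids = set()
    for line in lsof_output.splitlines()[1:]:   # skip the header row
        fields = line.split(None, 8)            # NAME (9th column) may contain spaces
        if len(fields) == 9 and fields[8].endswith("(LISTEN)"):
            pids.add(int(fields[1]))            # PID is the 2nd column
    return sorted(pids)


if __name__ == "__main__":
    print(listening_pids(SAMPLE_LSOF))
```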

Note

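When the terminal is closed or Ctrl+C is pressed, the Prometheus service shuts down. On Linux, you can run nohup ./prometheus & to keep it running in the background. Stopping, restarting, or checking the status of the process then still requires a combination of other Linux commands. For convenience, Prometheus can be registered as a system service that starts at boot. On CentOS Linux release 7, for example, daemons are managed with systemctl: add a unit file named prometheus.service under the /usr/lib/systemd/system directory. A reference configuration follows.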

# vi /usr/lib/systemd/system/prometheus.service
[Unit]
Description=Prometheus server daemon
After=network.target

[Service]
Type=simple
User=root
Group=root
ExecStart=/data/prometheus/prometheus \
  --config.file "/data/prometheus/prometheus.yml" \
  --web.listen-address "0.0.0.0:9090"
Restart=on-failure
[Install]
WantedBy=multi-user.target

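Prometheus can also be shut down over HTTP, in the same way as the reload. Since Prometheus 2.0, this works only when --web.enable-lifecycle is enabled; the shutdown is then triggered with the following command.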

curl -X POST http://localhost:9090/-/quit

Besides the shutdown messages for each module that appear in Prometheus's own log, the command returns the following result.

Requesting termination... Goodbye!

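Note that alerts and their thresholds are configured in Prometheus, not in Alertmanager. The prometheus.yml file below is annotated with comments explaining the meaning and function of each part.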

# The global block holds global settings; each Job in scrape_configs can override them individually
global:
  scrape_interval: 15s      # Interval between scrapes of a target; set to 15s here (default: 1m). Typical values are 10 to 60s
  evaluation_interval: 15s  # Interval at which Prometheus evaluates rules; set to 15s here (default: 1m)
  # scrape_timeout          # Timeout for scraping a target (default: 10s)
  # external_labels         # External labels added to any time series or alert when communicating with external systems

# Alerting block. A relabel step is also applied before Prometheus Server sends alerts to Alertmanager;
# alert_relabel_configs can be configured under the alerting block
alerting:
  alertmanagers:
    - static_configs:    # Static Alertmanager addresses; service discovery can be used instead
      - targets:         # Multiple addresses can be listed
        - localhost:9093

# Custom Prometheus rules fall into two categories: recording rules and alerting rules
rule_files:
  - "alertmanager_rules.yml"
  - "prometheus_rules.yml"

scrape_configs:
  # The Job name matters: Prometheus adds it as a label to every time series it scrapes
  - job_name: 'prometheus'
    # metrics_path defaults to '/metrics'; it is the HTTP path to scrape and can be customized
    # scheme defaults to 'http'; it is the protocol used when scraping time series data
    # params: URL parameters for the scrape request; can be customized
    static_configs:             # Static configuration
    - targets: ['localhost:9090']

  - job_name: 'springboot-demo'             # Second job: a Spring Boot microservice
    metrics_path: '/actuator/prometheus'
    static_configs:
    - targets: ['localhost:8080']

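As the comments show, scrape_configs configures how data is collected from targets. Where its settings coincide with those in global, the job-level values override the global ones. The parameters for each scrape configuration are described in Table 2-3.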

Table 2-3　scrape_configs parameters and descriptions
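The override relationship between global and a job in scrape_configs can be sketched as a lookup order: the job's own value, then the global value, then the built-in default. The Python sketch below is a simplified model using plain dictionaries. It does not reproduce Prometheus's own configuration loading or validation, and the function name effective_job_settings is hypothetical.

```python
# Built-in defaults as stated in the configuration comments
DEFAULTS = {
    "scrape_interval": "1m",
    "scrape_timeout": "10s",
    "metrics_path": "/metrics",
    "scheme": "http",
}

# Settings that exist in both the global block and a scrape config
SHARED_WITH_GLOBAL = ("scrape_interval", "scrape_timeout")


def effective_job_settings(global_cfg, job_cfg):
    """Resolve settings for one job: job value, else global value, else default."""
    resolved = dict(DEFAULTS)
    for key in SHARED_WITH_GLOBAL:
        if key in global_cfg:
            resolved[key] = global_cfg[key]
    for key in DEFAULTS:
        if key in job_cfg:          # job-level values win
            resolved[key] = job_cfg[key]
    return resolved


if __name__ == "__main__":
    global_cfg = {"scrape_interval": "15s", "evaluation_interval": "15s"}
    job_cfg = {"job_name": "springboot-demo", "metrics_path": "/actuator/prometheus"}
    print(effective_job_settings(global_cfg, job_cfg))
```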

[1] Run ./prometheus -h to view the help text.