Automatically monitoring OpenStack VMs by auto-registering node-exporter with Consul + Prometheus

 3 years ago
source link: https://studygolang.com/articles/29330

1. The problem

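At work, the VMs in our OpenStack cluster need monitoring of their basic performance metrics. Installing node_exporter by hand every time a VM boots and then editing prometheus.yml would be a nightmare for lazy programmers like us, so we set out to design a monitoring scheme based on Prometheus + Consul.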

2. The solution

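1. Bake node_exporter into the image so that it is always deployed automatically.
2. Write a small program that registers node_exporter with Consul automatically, and bake it into the image alongside node_exporter.
3. Configure Prometheus to discover node_exporter.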

3. Deploying the Consul cluster

3.1 Cluster plan

System      Hostname     IP
CentOS 7.7  compute-7-1  172.16.100.71
CentOS 7.7  compute-7-2  172.16.100.72
CentOS 7.7  compute-7-3  172.16.100.73

3.1 Download and install Consul yourself

Consul v1.7.2

3.1.1 Generate the master token

$ curl \
    --request PUT \
    http://172.16.100.71:8500/v1/acl/bootstrap
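
The bootstrap call answers with a JSON document that contains the token. As a rough sketch (not part of the original setup), the helper below pulls the secret out of such a response body. The function name `extract_master_token` is hypothetical, and the `SecretID` / `ID` field names are an assumption based on Consul's ACL bootstrap response format.

```python
import json


def extract_master_token(response_body: str) -> str:
    """Return the token secret from an ACL bootstrap response body."""
    doc = json.loads(response_body)
    # Assumption: newer responses carry the secret in "SecretID",
    # while "ID" is the legacy field with the same value.
    token = doc.get("SecretID") or doc.get("ID")
    if not token:
        raise ValueError("no token found in bootstrap response")
    return token


if __name__ == "__main__":
    sample = '{"ID": "8dc1eb67-1f5f-4e10-ad9d-5e58b047647c"}'
    print(extract_master_token(sample))
```

The returned value is what goes into `acl.tokens.master` in the next step.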

3.1.2 Configure the master token obtained above

compute-7-1:

{
    "bootstrap_expect": 1,
    "datacenter": "sibat_consul",
    "primary_datacenter":"sibat_consul",
    "data_dir": "/data/consul",
    "start_join":[
        "172.16.100.72",
        "172.16.100.73"
    ],
    "retry_join":[
        "172.16.100.72",
        "172.16.100.73"
    ],
    "connect":{
        "enabled": true
    },
    "server": true,
    "client_addr": "0.0.0.0",
    "ui": true,
    "node_name": "compute-7-1",
    "bind_addr": "172.16.100.71",
    "advertise_addr": "172.16.100.71",
    "enable_script_checks": false,
    "enable_local_script_checks": true,
    "log_file": "/var/log",
    "log_rotate_bytes": 300000000,
    "log_rotate_duration": "360h",
    "log_level": "info",
    "encrypt": "gEjZMbDxnA5UDS5DJRI3Nn5KvOwdVa46jneHK0gFDa8=",
    "acl": {
        "enabled": true,
        "default_policy": "deny",
        "enable_token_persistence": true,
        "tokens": {
            "master": "8dc1eb67-1f5f-4e10-ad9d-5e58b047647c"
        }
    }
}

compute-7-2:

{
    "datacenter": "sibat_consul",
    "primary_datacenter":"sibat_consul",
    "data_dir": "/data/consul",
    "connect":{
        "enabled": true
    },
    "server": true,
    "client_addr": "0.0.0.0",
    "ui": true,
    "node_name": "compute-7-2",
    "bind_addr": "172.16.100.72",
    "advertise_addr": "172.16.100.72",
    "enable_script_checks": false,
    "enable_local_script_checks": true,
    "log_file": "/var/log",
    "log_rotate_bytes": 300000000,
    "log_rotate_duration": "360h",
    "log_level": "info",
    "acl_datacenter": "sibat_consul",
    "encrypt": "gEjZMbDxnA5UDS5DJRI3Nn5KvOwdVa46jneHK0gFDa8=",
    "acl": {
        "enabled": true,
        "default_policy": "deny",
        "enable_token_persistence": true,
        "tokens": {
            "master": "8dc1eb67-1f5f-4e10-ad9d-5e58b047647c"
        }
    }
}

compute-7-3:

{
    "datacenter": "sibat_consul",
    "primary_datacenter":"sibat_consul",
    "data_dir": "/data/consul",
    "connect":{
        "enabled": true
    },
    "server": true,
    "client_addr": "0.0.0.0",
    "ui": true,
    "node_name": "compute-7-3",
    "bind_addr": "172.16.100.73",
    "advertise_addr": "172.16.100.73",
    "enable_script_checks": false,
    "enable_local_script_checks": true,
    "log_file": "/var/log",
    "log_rotate_bytes": 300000000,
    "log_rotate_duration": "360h",
    "log_level": "info",
    "acl_datacenter": "sibat_consul",
    "encrypt": "gEjZMbDxnA5UDS5DJRI3Nn5KvOwdVa46jneHK0gFDa8=",
    "acl": {
        "enabled": true,
        "default_policy": "deny",
        "enable_token_persistence": true,
        "tokens": {
            "master": "8dc1eb67-1f5f-4e10-ad9d-5e58b047647c"
        }
    }
}
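
The three listings differ mainly in `node_name`, the bind/advertise address, and the bootstrap/join keys that appear only on compute-7-1. A minimal sketch of rendering them from one template follows; it covers only a subset of the keys shown above, and the names `NODES` and `render_config` are hypothetical.

```python
import json

NODES = {
    "compute-7-1": "172.16.100.71",
    "compute-7-2": "172.16.100.72",
    "compute-7-3": "172.16.100.73",
}


def render_config(node_name: str, master_token: str, first_node: str = "compute-7-1") -> dict:
    """Build a per-node config dict; only node-specific keys vary."""
    ip = NODES[node_name]
    config = {
        "datacenter": "sibat_consul",
        "primary_datacenter": "sibat_consul",
        "data_dir": "/data/consul",
        "server": True,
        "node_name": node_name,
        "bind_addr": ip,
        "advertise_addr": ip,
        "acl": {
            "enabled": True,
            "default_policy": "deny",
            "enable_token_persistence": True,
            "tokens": {"master": master_token},
        },
    }
    if node_name == first_node:
        # As in the listings, only the first node carries the bootstrap and join keys.
        peers = [addr for name, addr in NODES.items() if name != node_name]
        config.update({"bootstrap_expect": 1, "start_join": peers, "retry_join": peers})
    return config


if __name__ == "__main__":
    for name in NODES:
        print(json.dumps(render_config(name, "<master-token>"), indent=4))
```

Generating the files this way keeps the shared settings in one place, so a change such as a new token only has to be made once.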

Start Consul on all three nodes.

3.1.3 Run on all three nodes

$ sudo useradd consul
$ sudo vim /usr/lib/systemd/system/consul.service
[Unit]
Description=Consul agent
Documentation=https://www.consul.io/docs/

[Service]
User=consul
Group=consul
ExecStart=/usr/bin/consul agent -config-file /etc/consul.d/consul_config.json
KillMode=process
Restart=on-failure
LimitNOFILE=65536

[Install]
WantedBy=multi-user.target
$ sudo systemctl daemon-reload

3.1.4 Run on compute-7-2 and compute-7-3

$ sudo systemctl restart consul && sudo systemctl enable consul

3.1.5 Run on compute-7-1

$ sudo systemctl restart consul && sudo systemctl enable consul

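After startup, the server logs show permission-related errors. According to the official documentation, this is because no agent token has been configured, so an agent token must be created as well: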

$ curl \
    --request PUT \
    --header "X-Consul-Token: 8dc1eb67-1f5f-4e10-ad9d-5e58b047647c" \
    --data \
    '{
    "Name": "Agent Token",
    "Type": "client",
    "Rules": "node \"\" { policy = \"write\" } service \"\" { policy = \"read\" }" }' \
    http://172.16.100.71:8500/v1/acl/create

3.1.6 Configure the agent token obtained above

compute-7-1:

{
    "bootstrap_expect": 1,
    "datacenter": "sibat_consul",
    "primary_datacenter":"sibat_consul",
    "data_dir": "/data/consul",
    "start_join":[
        "172.16.100.72",
        "172.16.100.73"
    ],
    "retry_join":[
        "172.16.100.72",
        "172.16.100.73"
    ],
    "connect":{
        "enabled": true
    },
    "server": true,
    "client_addr": "0.0.0.0",
    "ui": true,
    "node_name": "compute-7-1",
    "bind_addr": "172.16.100.71",
    "advertise_addr": "172.16.100.71",
    "enable_script_checks": false,
    "enable_local_script_checks": true,
    "log_file": "/var/log",
    "log_rotate_bytes": 300000000,
    "log_rotate_duration": "360h",
    "log_level": "info",
    "encrypt": "gEjZMbDxnA5UDS5DJRI3Nn5KvOwdVa46jneHK0gFDa8=",
    "acl": {
        "enabled": true,
        "default_policy": "deny",
        "enable_token_persistence": true,
        "tokens": {
            "master": "8dc1eb67-1f5f-4e10-ad9d-5e58b047647c",
            "agent": "883efc94-0c59-c46f-67cf-4644ac4adad2"
        }
    }
}

compute-7-2:

{
    "datacenter": "sibat_consul",
    "primary_datacenter":"sibat_consul",
    "data_dir": "/data/consul",
    "connect":{
        "enabled": true
    },
    "server": true,
    "client_addr": "0.0.0.0",
    "ui": true,
    "node_name": "compute-7-2",
    "bind_addr": "172.16.100.72",
    "advertise_addr": "172.16.100.72",
    "enable_script_checks": false,
    "enable_local_script_checks": true,
    "log_file": "/var/log",
    "log_rotate_bytes": 300000000,
    "log_rotate_duration": "360h",
    "log_level": "info",
    "acl_datacenter": "sibat_consul",
    "encrypt": "gEjZMbDxnA5UDS5DJRI3Nn5KvOwdVa46jneHK0gFDa8=",
    "acl": {
        "enabled": true,
        "default_policy": "deny",
        "enable_token_persistence": true,
        "tokens": {
            "master": "8dc1eb67-1f5f-4e10-ad9d-5e58b047647c",
            "agent": "883efc94-0c59-c46f-67cf-4644ac4adad2"
        }
    }
}

compute-7-3:

{
    "datacenter": "sibat_consul",
    "primary_datacenter":"sibat_consul",
    "data_dir": "/data/consul",
    "connect":{
        "enabled": true
    },
    "server": true,
    "client_addr": "0.0.0.0",
    "ui": true,
    "node_name": "compute-7-3",
    "bind_addr": "172.16.100.73",
    "advertise_addr": "172.16.100.73",
    "enable_script_checks": false,
    "enable_local_script_checks": true,
    "log_file": "/var/log",
    "log_rotate_bytes": 300000000,
    "log_rotate_duration": "360h",
    "log_level": "info",
    "acl_datacenter": "sibat_consul",
    "encrypt": "gEjZMbDxnA5UDS5DJRI3Nn5KvOwdVa46jneHK0gFDa8=",
    "acl": {
        "enabled": true,
        "default_policy": "deny",
        "enable_token_persistence": true,
        "tokens": {
            "master": "8dc1eb67-1f5f-4e10-ad9d-5e58b047647c",
            "agent": "883efc94-0c59-c46f-67cf-4644ac4adad2"
        }
    }
}
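
The only change from the earlier listings is the extra `agent` entry under `acl.tokens`, so the update lends itself to scripting. The following is a small sketch under that assumption; the helper name `add_agent_token` is hypothetical, and reading or writing the actual config file is left out.

```python
import copy
import json


def add_agent_token(config: dict, agent_token: str) -> dict:
    """Return a copy of a Consul config dict with acl.tokens.agent set."""
    updated = copy.deepcopy(config)  # leave the caller's dict untouched
    tokens = updated.setdefault("acl", {}).setdefault("tokens", {})
    tokens["agent"] = agent_token
    return updated


if __name__ == "__main__":
    before = {"acl": {"enabled": True, "tokens": {"master": "<master-token>"}}}
    print(json.dumps(add_agent_token(before, "<agent-token>"), indent=4))
```

Working on a copy means the original dict can still be compared against the result before anything is written back to disk.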

3.1.7 Run on compute-7-2 and compute-7-3

$ sudo systemctl restart consul && sudo systemctl enable consul

3.1.8 Run on compute-7-1

$ sudo systemctl restart consul && sudo systemctl enable consul

Once the cluster has stabilized, the UI is available at http://172.16.100.71:8500
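
One way to judge whether the cluster has stabilized is to compare the Raft peer list against the cluster plan. The sketch below assumes the body comes from Consul's `/v1/status/peers` endpoint, which returns a JSON array of `ip:port` strings; the helper name `missing_peers` is hypothetical and fetching the body is left out.

```python
import json

EXPECTED_SERVERS = {"172.16.100.71", "172.16.100.72", "172.16.100.73"}


def missing_peers(peers_body: str, expected=EXPECTED_SERVERS) -> set:
    """Return expected server IPs that are absent from a peers response body."""
    peers = json.loads(peers_body)  # e.g. ["172.16.100.71:8300", ...]
    seen = {peer.rsplit(":", 1)[0] for peer in peers}
    return set(expected) - seen


if __name__ == "__main__":
    sample = '["172.16.100.71:8300", "172.16.100.72:8300"]'
    print(sorted(missing_peers(sample)))
```

An empty result means all three planned servers are present in the peer set.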

4. Integrating Prometheus

$ sudo vim /etc/prometheus/prometheus.yml
...
  - job_name: 'OpenStack-vms'
    consul_sd_configs:
      - server: "172.16.100.71:8500"
        token: '8dc1eb67-1f5f-4e10-ad9d-5e58b047647c'
        services: []
      - server: "172.16.100.72:8500"
        token: '8dc1eb67-1f5f-4e10-ad9d-5e58b047647c'
        services: []
      - server: "172.16.100.73:8500"
        token: '8dc1eb67-1f5f-4e10-ad9d-5e58b047647c'
        services: []
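    relabel_configs:
      # keep only targets whose Consul tags contain OpenStack-vms
      - source_labels: [__meta_consul_tags]
        regex: ".*OpenStack-vms.*"
        action: keep
      # keep ignores replacement/target_label, so env is set by a separate rule
      - target_label: env
        replacement: OpenStack-vms
      # copy Consul service metadata onto the target as labels
      - regex: __meta_consul_service_metadata_(.+)
        action: labelmap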
...
$ sudo systemctl restart prometheus
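
To make the effect of the keep and labelmap rules concrete, here is a small simulation of them in Python. It is an illustration only, not Prometheus code: the function name `relabel` and the metadata key `project` are hypothetical, and it relies on Prometheus anchoring relabel regexes and wrapping `__meta_consul_tags` in the tag separator.

```python
import re

TAG_PATTERN = re.compile(r".*OpenStack-vms.*")
META_PATTERN = re.compile(r"__meta_consul_service_metadata_(.+)")


def relabel(labels: dict):
    """Return the relabelled target, or None if the keep rule drops it."""
    # keep: Prometheus anchors the regex, hence fullmatch.
    if not TAG_PATTERN.fullmatch(labels.get("__meta_consul_tags", "")):
        return None
    result = dict(labels)
    # labelmap: copy each matching label to a new label named by the capture group.
    for name, value in labels.items():
        match = META_PATTERN.fullmatch(name)
        if match:
            result[match.group(1)] = value
    return result


if __name__ == "__main__":
    target = {
        "__meta_consul_tags": ",OpenStack-vms,node-exporter,",
        "__meta_consul_service_metadata_project": "demo",  # hypothetical metadata key
    }
    print(relabel(target))
```

A service registered without the OpenStack-vms tag is dropped by the first rule, so the tags set at registration time have to match this pattern.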

After the restart, the job_name configured above appears in the Prometheus UI.

[Screenshot: the job in the Prometheus UI]

5. Automatic VM registration

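Problem: none of the stock components offers a clean way to register automatically. At first I registered with curl from a shell script in rc.local, but some VMs still ended up unregistered. I also found that Consul is not a strongly consistent registry: the same service ID can sometimes be registered on different nodes at once.

[Screenshot: the same service ID registered on different nodes]

So I wrote a small program in Go that registers node_exporter automatically. It is started at boot by systemd, and it uses an algorithm to avoid duplicate registration and to balance registrations across the nodes.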
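
The article does not spell out the program's algorithm, so the following is only an illustration of one possible approach, not the author's method: hash the service ID to pick a single Consul server deterministically, so that the same ID always goes to the same server and IDs spread across all three. All function names, the ID scheme, and the example VM address are hypothetical; the payload fields follow Consul's agent service registration API, and the check values mirror the configuration file shown further down.

```python
import hashlib
import json

CONSUL_SERVERS = ["172.16.100.71:8500", "172.16.100.72:8500", "172.16.100.73:8500"]


def pick_server(service_id: str, servers=CONSUL_SERVERS) -> str:
    """Map a service ID to one server deterministically via a stable hash."""
    digest = hashlib.sha256(service_id.encode("utf-8")).digest()
    return servers[int.from_bytes(digest[:8], "big") % len(servers)]


def registration_payload(vm_ip: str, port: int = 9100) -> dict:
    """Build a JSON body for /v1/agent/service/register."""
    return {
        "ID": f"node-exporter-{vm_ip}",  # hypothetical ID scheme: one ID per VM
        "Name": "node-exporter",
        "Tags": ["node-exporter"],
        "Address": vm_ip,
        "Port": port,
        "Check": {
            "TCP": f"{vm_ip}:{port}",
            "Interval": "5s",
            "Timeout": "5s",
            "DeregisterCriticalServiceAfter": "5s",
        },
    }


if __name__ == "__main__":
    body = registration_payload("10.0.0.15")  # hypothetical VM address
    print(pick_server(body["ID"]))
    print(json.dumps(body, indent=2))
```

Because the choice depends only on the service ID, a VM that reboots and registers again lands on the same server instead of creating a second entry elsewhere.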

$ wget https://github.com/FrankenFuncc/consul-registy-service/releases/download/202006161758/consulR.zip
$ unzip consulR.zip
$ wget https://github.com/prometheus/node_exporter/releases/download/v1.0.0/node_exporter-1.0.0.linux-amd64.tar.gz
$ tar -zxvf node_exporter-1.0.0.linux-amd64.tar.gz -C /usr/local/
$ mv /usr/local/node_exporter-1.0.0.linux-amd64 /usr/local/node_exporter

Installing node_exporter and enabling it at boot

$ vim 
[Unit]
Description=node_exporter: the monitoring system
Documentation=http://prometheus.io/docs/

[Service]
ExecStart=/usr/local/node_exporter/node_exporter
Restart=always
StartLimitInterval=0
RestartSec=10

[Install]
WantedBy=multi-user.target
$ systemctl daemon-reload && systemctl start node_exporter && systemctl enable node_exporter
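
A quick way to confirm that the exporter is serving data is to read one metric from its text output on port 9100. The parser below is a minimal sketch that handles only unlabelled samples; the helper name `parse_metric` is hypothetical, and fetching `/metrics` (for example with `urllib.request.urlopen`) is left out so the example runs without a live exporter.

```python
def parse_metric(text: str, name: str):
    """Return the first sample value for an unlabelled metric, or None."""
    for line in text.splitlines():
        if line.startswith("#") or not line.strip():
            continue  # skip HELP/TYPE comments and blank lines
        parts = line.split()
        if parts[0] == name and len(parts) >= 2:
            return float(parts[1])
    return None


if __name__ == "__main__":
    sample = "# HELP node_load1 1m load average.\n# TYPE node_load1 gauge\nnode_load1 0.42\n"
    print(parse_metric(sample, "node_load1"))
```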

Installing the Consul registration program and enabling it at boot

$ vim /etc/consul/consul.yaml
System:
  ServiceName: consul-registy-service
  ListenAddress: 0.0.0.0
  Port: 9984
  # This IP and port are used to look up the IP address of the outbound network interface
  FindAddress: 8.8.8.8:80
Logs:
  LogFilePath: /data/consul/consul.log
  LogLevel: info
Consul:
  Address: 172.16.100.71:8500,172.16.100.72:8500,172.16.100.73:8500
  Token: 8dc1eb67-1f5f-4e10-ad9d-5e58b047647c
  CheckTimeout: 5s
  CheckInterval: 5s
  CheckDeregisterCriticalServiceAfter: true
  CheckDeregisterCriticalServiceAfterTime: 5s
Service:
  Tag: node-exporter
  # If Address is empty, the outbound interface IP is looked up via FindAddress by default
  Address:
  Port: 9100
$ vim /usr/lib/systemd/system/consul.service 
[Unit]
Description=Consul
After=network-online.target

[Service]
User=nobody
ExecStart=/usr/local/consul --confpath=/etc/consul/consul.yaml
Restart=on-failure
RestartSec=1

[Install]
WantedBy=multi-user.target
$ systemctl daemon-reload && systemctl start consul && systemctl enable consul
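
The `FindAddress` setting suggests a common trick for discovering the outbound interface IP: connect a UDP socket to a remote address and read back the local address the kernel chose. Whether the registration program works exactly this way is an assumption; the sketch below only shows the technique, with a hypothetical function name.

```python
import socket


def outbound_ip(find_address: str = "8.8.8.8:80") -> str:
    """Return the local IP the kernel would use to reach find_address."""
    host, port = find_address.rsplit(":", 1)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        # connect() on a UDP socket sends no packets; it only selects a route.
        sock.connect((host, int(port)))
        return sock.getsockname()[0]


if __name__ == "__main__":
    # Loopback target so the example also runs on a machine without a default route.
    print(outbound_ip("127.0.0.1:80"))
```

Since no traffic is sent, the target only has to be routable, which is why a public address such as 8.8.8.8:80 works as a probe.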

Once the image has been created, VMs launched from it are discovered by Prometheus automatically.
