

Telegraf和Grafana监控多平台上的SQL Server
source link: http://www.cnblogs.com/Joe-T/p/12973362.html
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.

问题
SQL Server在很多企业中部署在多个平台上(Windows,Linux和Container),需要一种能支持多平台的解决方案用于收集和展示相关的监控指标。
我选择企业中比较流行的监控展示工具Grafana和监控指标收集工具Telegraf进行实现。这也是为了方便与企业中已经在存在监控平台进行整合和对接。

如上图所示,Telegraf部署在SQL所在host,收集数据发送给时序数据库Influxdb存储,然后Grafana用于展示数据。
解决方案
-
安装和配置InfluxDB
我将InfluxDB和Grafana安装在同一台CentOS主机上,生产环境中最好是分开。
# 下载1.8的stable version后进行安装 wget https://dl.influxdata.com/influxdb/releases/influxdb-1.8.0.x86_64.rpm chmod 755 influxdb-1.8.0.x86_64.rpm yum localinstall influxdb-1.8.0.x86_64.rpm # 启动并设置自启动 systemctl start influxdb systemctl enable influxdb # 8086用于客户端的HTTP连接,8088用于CLI调用RPC进行备份和还原操作 firewall-cmd --zone=public --add-port=8086/tcp --permanent firewall-cmd --zone=public --add-port=8088/tcp --permanent firewall-cmd --reload # 连接到influxdb并创建用户 fluxdb > CREATE USER admin WITH PASSWORD '<password>' WITH ALL PRIVILEGES # 启用http用户验证,修改influxdb.conf中http section中auth-enabled = true vim /etc/influxdb/influxdb.conf systemctl restart influxdb # 创建用于存储监控数据的数据库,保存6个月的数据 influx -username 'admin' -password '<password>' > CREATE DATABASE telegraf > CREATE RETENTION POLICY telegraf_6m ON telegraf DURATION 180d REPLICATION 1 DEFAULT > SHOW DATABASES
-
安装和配置Grafana
# 下载并安装Grafana wget https://dl.grafana.com/oss/release/grafana-7.0.1-1.x86_64.rpm chmod 775 grafana-7.0.1-1.x86_64.rpm yum localinstall grafana-7.0.1-1.x86_64.rpm # 设置自启动 systemctl start grafana-server.service systemctl enable grafana-server.service # 允许Grafana默认的端口3000 firewall-cmd --zone=public --add-port=3000/tcp --permanent firewall-cmd --reload
然后在Browser中访问http:// :3000,第一次访问时默登录认账号和密码都为admin,登录后会提示修改密码。
-
在客户端主机安装和配置Telegraf
所谓客户端,就是SQL所在主机
Telegraf连接到SQL,需要一个login,具有 VIEW SERVER STATE and VIEW ANY DEFINITION的权限,所以在每个被监控的实例上都需要创建之。
USE master; GO CREATE LOGIN [telegraf] WITH PASSWORD = N'1qaz@WSX'; GO GRANT VIEW SERVER STATE TO [telegraf]; GO GRANT VIEW ANY DEFINITION TO [telegraf]; GO
-
Telegraf on Linux
wget https://dl.influxdata.com/telegraf/releases/telegraf-1.14.3-1.x86_64.rpm sudo yum localinstall telegraf-1.14.3-1.x86_64.rpm
安装完成后,先要修改Telegraf的配置文件,再启动。在配置文件中主要配置两个部分:inputs和outputs。 inputs表示监控数据从哪里来,outputs表示监控要发送到哪里去。
打开/etc/telegraf/telegraf.conf,找到[[outputs.influxdb]]部分,所有配置项默认都被注释了。我们需要删除注释并配置一些项。主要是Influxdb的地址,用户名、密码和数据库名等。
[[outputs.influxdb]] ## The full HTTP or UDP URL for your InfluxDB instance. ## ## Multiple URLs can be specified for a single cluster, only ONE of the ## urls will be written to each interval. # urls = ["unix:///var/run/influxdb.sock"] # urls = ["udp://127.0.0.1:8089"] urls = ["http://172.17.2.4:8086"] ## The target database for metrics; will be created as needed. ## For UDP url endpoint database needs to be configured on server side. database = "telegraf" ## The value of this tag will be used to determine the database. If this ## tag is not set the 'database' option is used as the default. # database_tag = "" ## If true, the 'database_tag' will not be included in the written metric. # exclude_database_tag = false ## If true, no CREATE DATABASE queries will be sent. Set to true when using ## Telegraf with a user without permissions to create databases or when the ## database already exists. skip_database_creation = true ## Name of existing retention policy to write to. Empty string writes to ## the default retention policy. Only takes effect when using HTTP. retention_policy = "" ## The value of this tag will be used to determine the retention policy. If this ## tag is not set the 'retention_policy' option is used as the default. # retention_policy_tag = "" ## If true, the 'retention_policy_tag' will not be included in the written metric. # exclude_retention_policy_tag = false ## Write consistency (clusters only), can be: "any", "one", "quorum", "all". ## Only takes effect when using HTTP. write_consistency = "any" ## Timeout for HTTP messages. timeout = "5s" ## HTTP Basic Auth username = "admin" password = "<password>"
- 找到[[inputs.sqlserver]]部分,取消相关配置项的注释,servers部分连接到本地实例。
Telegraf默认的Plugin中包括了对SQL Server的实现, 这个Plugin还包括了对Azure SQL PaaS的实现
# # Read metrics from Microsoft SQL Server [[inputs.sqlserver]] # ## Specify instances to monitor with a list of connection strings. # ## All connection parameters are optional. # ## By default, the host is localhost, listening on default port, TCP 1433. # ## for Windows, the user is the currently running AD user (SSO). # ## See https://github.com/denisenkom/go-mssqldb for detailed connection # ## parameters, in particular, tls connections can be created like so: # ## "encrypt=true;certificate=<cert>;hostNameInCertificate=<SqlServer host fqdn>" servers = [ "Server=localhost;Port=1433;User Id=telegraf;Password=<yourPassword>;app name=telegraf;log=1;" ] # # ## Optional parameter, setting this to 2 will use a new version # ## of the collection queries that break compatibility with the original # ## dashboards. query_version = 2 # # ## If you are using AzureDB, setting this to true will gather resource utilization metrics # # azuredb = false # # ## Possible queries: # ## - PerformanceCounters # ## - WaitStatsCategorized # ## - DatabaseIO # ## - DatabaseProperties # ## - CPUHistory # ## - DatabaseSize # ## - DatabaseStats # ## - MemoryClerk # ## - VolumeSpace # ## - PerformanceMetrics # ## - Schedulers # ## - AzureDBResourceStats # ## - AzureDBResourceGovernance # ## - SqlRequests # ## - ServerProperties # ## A list of queries to include. If not specified, all the above listed queries are used. # # include_query = [] # # ## A list of queries to explicitly ignore. # exclude_query = [ 'Schedulers' , 'SqlRequests']
启动Telegraf之后,可以看到时已经加载的inputs和收集间隔
[root@SQL19N1 log]# systemctl status telegraf ● telegraf.service - The plugin-driven server agent for reporting metrics into InfluxDB Loaded: loaded (/usr/lib/systemd/system/telegraf.service; enabled; vendor preset: disabled) Active: active (running) since Tue 2020-05-26 14:19:07 UTC; 19min ago Docs: https://github.com/influxdata/telegraf Main PID: 12359 (telegraf) CGroup: /system.slice/telegraf.service └─12359 /usr/bin/telegraf -config /etc/telegraf/telegraf.conf -config-directory /etc/telegraf/telegraf.d May 26 14:19:07 SQL19N1 systemd[1]: Started The plugin-driven server agent for reporting metrics into InfluxDB. May 26 14:19:07 SQL19N1 telegraf[12359]: 2020-05-26T14:19:07Z I! Starting Telegraf 1.14.3 May 26 14:19:07 SQL19N1 telegraf[12359]: 2020-05-26T14:19:07Z I! Loaded inputs: system cpu disk diskio kernel mem processes swap sqlserver May 26 14:19:07 SQL19N1 telegraf[12359]: 2020-05-26T14:19:07Z I! Loaded aggregators: May 26 14:19:07 SQL19N1 telegraf[12359]: 2020-05-26T14:19:07Z I! Loaded processors: May 26 14:19:07 SQL19N1 telegraf[12359]: 2020-05-26T14:19:07Z I! Loaded outputs: influxdb May 26 14:19:07 SQL19N1 telegraf[12359]: 2020-05-26T14:19:07Z I! Tags enabled: host=SQL19N1 May 26 14:19:07 SQL19N1 telegraf[12359]: 2020-05-26T14:19:07Z I! [agent] Config: Interval:20s, Quiet:false, Hostname:"SQL19N1", Flush Interval:10s
-
Telegraf on Windows
以管理员身份执行如下PowerShell命令
# 下载软件 wget https://dl.influxdata.com/telegraf/releases/telegraf-1.14.3_windows_amd64.zip · -OutFile "c:\temp\telegraf-1.14.3_windows_amd64.zip" # 解压缩到C:\Program Files\Telegraf Expand-Archive "c:\temp\telegraf-1.14.3_windows_amd64.zip", "C:\Program Files" # 将telegraf安装为windows服务 C:\"Program Files"\Telegraf\telegraf.exe --service install
修改telegraf.conf中outputs.influxdb和添加inputs.sqlserver部分,这些内容和在Linux上的配置一样,就不赘述了。
conf修改完成后,可以先测试一下telegraf是否能正常启动,没问题的话就启动telegraf服务。
# 测试 C:\"Program Files"\Telegraf\telegraf.exe --config C:\"Program Files"\Telegraf\telegraf.conf --test # 启动服务 C:\"Program Files"\Telegraf\telegraf.exe --service start
-
配置Grafana的数据源和Dashboard
登录Grafana后,在左侧的Configuration->Data Source中配置InfluxDB数据源,填写地址、账号、密码并设置为默认数据源,如下图

Dashboard,可以自己创建,也可以在采用公开社区的(感谢热心无私的大佬们)。这里,我采用 SQL Servers by Jonathan Rioux 。这个Dashboard中使用的Piechart不是Grafana预置的,所以还需要安装:
# Grafana所在Host安装,重启服务生效 grafana-cli plugins install grafana-piechart-panel systemctl restart grafana-server.service
然后在Grafana界面,选择左侧的Dashboard->Import->填入Dashboard ID->Import,如下图:

配置完成后的,可以看这个Dashboard提供的信息还比较丰富的,您也可以根据自己的需要修改和添加相关内容.



总结
-
实际情况中,自带的数据收集和报表不能完全满足业务需求,自定义的数据收集和自定义的Dashboard,也是非常容易实现的,下次再写
-
如果已经在使用Zabbix了,Grafana可以直接对接到Zabbix的数据输出。
-
Telegraf能非常好的支持Cloud环境,下次说说对Azure SQL PaaS的监控
-
本文内容仅代表个人观点,与任何公司和组织无关
Recommend
-
168
技术成就梦想51CTO-中国领先的IT技术网站 您浏览的内容找不到或已被删除
-
37
前一段时间自家养的几只猫经常出问题,由于没有有效的监控预警手段,以至于问题出现或者许久一段时间才会被通知到。凌晨一点这个锅可谁都不想背,为此基于目前的情况搭建了以下这么一套监控预警系统。
-
31
-
54
最近在做大数据监控 平台的 方案 调研 ,做了一些开源解决方案的尝试,今天分享一下基于Telegraf+InfluxDB+Grafana的监控平台整体部署过程。文章开始会简单介绍下 TICK 技术栈,接下来就是本...
-
42
双11未至,电子烟遭遇多个电商平台的下架,近期,多个政府部门发布了与电子烟有关的敦促、禁令和相关通知。11月7日,阿里巴巴宣布配合监管,关闭平台上电子烟店铺、电子烟产品下架,并将禁止电子烟相关产品的广告投放。此前,京东表示已屏蔽并逐步下架电
-
11
概述:Telegraf收集信息,influxdb时序数据库存储数据,grafana平台展示数据,并进行监控告警,组成一个自动化运维监控平台。 一、influxdb InfluxDB是一个由InfluxData开发的开源时序型数据。它由Go写成,着力于高...
-
11
多平台下录屏方式 随便记录下,想得起来的多少写多少: Windows: GDI 全屏:fullscreen desktop BitBlt 速度是 20ms / 帧 GDI 窗口:Win7+可以录制游戏和非游戏,XP以前只能录制普通界面,截不到游戏窗口 ...
-
4
Monitor Linux System with Grafana and TelegrafIn this article, we’re going to look at how to Monitor a Linux System with Grafana and Telegraf. Telegraf metrics will be stored on InfluxDB, then we can visualize them on Grafana using a system d...
-
8
Monitor Apache Kafka with Telegraf and Grafana in Azure T...
-
9
<?xml encoding="utf-8" ??>InfluxDB is a time-series database, which is often used to store performance metrics. When combined with Telegraf collection and Grafana visualization services, it makes an easy...
About Joyk
Aggregate valuable and interesting links.
Joyk means Joy of geeK