System Analysis with sysdig

source link: https://syaning.github.io/2020/12/14/sysdig/

Dec 14, 2020

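What is sysdig?

sysdig is a Linux system diagnostics tool with native support for containers. It can be thought of as covering the functionality of strace + tcpdump + htop + iftop + lsof + … and a range of similar system tools.

Compared with those tools, sysdig has two main advantages:

  • It is powerful, and its usage and output format are uniform, so there is no need to run different commands and convert between different output formats
  • It is cloud native, and can monitor containers and k8s clusters

Installation

$ curl -s https://s3.amazonaws.com/download.draios.com/stable/install-sysdig | sudo bash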

sysdig output

Run without arguments, sysdig prints every event it captures:

# Requires root privileges
$ sudo sysdig
34378 12:02:36.269753803 2 echo (7896) > close fd=3(/usr/lib/locale/locale-archive)
34379 12:02:36.269754164 2 echo (7896) < close res=0
34380 12:02:36.269781699 2 echo (7896) > fstat fd=1(/dev/pts/3)
34381 12:02:36.269783882 2 echo (7896) < fstat res=0
34382 12:02:36.269784970 2 echo (7896) > mmap
34383 12:02:36.269786575 2 echo (7896) < mmap
34384 12:02:36.269827674 2 echo (7896) > write fd=1(/dev/pts/3) size=12
34385 12:02:36.269839477 2 echo (7896) < write res=12 data=hello world.
34386 12:02:36.269843986 2 echo (7896) > close fd=1(/dev/pts/3)
34387 12:02:36.269844466 2 echo (7896) < close res=0
34388 12:02:36.269844816 2 echo (7896) > munmap
34389 12:02:36.269850803 2 echo (7896) < munmap
34390 12:02:36.269851915 2 echo (7896) > close fd=2(/dev/pts/3)
34391 12:02:36.269852314 2 echo (7896) < close res=0

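The default output format is:

*%evt.num %evt.time %evt.cpu %proc.name (%thread.tid) %evt.dir %evt.type %evt.args
  • evt.num is an incrementing event number
  • evt.time is the time at which the event occurred
  • evt.cpu is the number of the CPU on which the event was captured
  • proc.name is the process name
  • thread.tid is the thread ID
  • evt.dir is the event direction: > for an enter event, < for an exit event
  • evt.type is the event type, for example read, open, or write
  • evt.args is the list of event arguments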
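
As a rough illustration of the layout above, the following Python sketch splits one default-format line into named fields. The regular expression and the parse_event helper are assumptions made for this example and are not part of sysdig; the sample line is copied from the output shown earlier.

```python
import re

# Pattern for the default layout described above:
# evt.num evt.time evt.cpu proc.name (thread.tid) evt.dir evt.type evt.args
LINE_RE = re.compile(
    r"^(?P<num>\d+) (?P<time>\S+) (?P<cpu>\d+) (?P<proc>.+?) "
    r"\((?P<tid>-?\d+)\) (?P<dir>[<>]) (?P<type>\S+) ?(?P<args>.*)$"
)

def parse_event(line):
    """Split one line of default sysdig output into named fields, or return None."""
    m = LINE_RE.match(line.rstrip())
    if m is None:
        return None
    event = m.groupdict()
    # Numeric columns are converted so they can be compared and sorted.
    for key in ("num", "cpu", "tid"):
        event[key] = int(event[key])
    return event

sample = "34384 12:02:36.269827674 2 echo (7896) > write fd=1(/dev/pts/3) size=12"
print(parse_event(sample))
```

Events without arguments, such as the mmap lines, come back with an empty args string.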

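Writing output to a file

By default sysdig writes to the terminal. The output can instead be saved to a file, which sysdig can later load for analysis.

# Write sysdig output to a file
$ sudo sysdig -w dump.scap

# For rotation, see https://sysdig.com/blog/sysdig-continuous-capture-with-file-rotation/
# For example, limit each file to 1M and keep the 5 most recent files
$ sudo sysdig -C 1 -W 5 -w dump.scap

# Read the file back
$ sysdig -r dump.scap

See Sysdig Continuous Capture with File Rotation for more examples of writing output to files.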
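
When captures are started from a script, the command line can be assembled programmatically. The sketch below is only an illustration: capture_command is a hypothetical helper, and it uses nothing beyond the -C, -W and -w flags shown above.

```python
import shlex

def capture_command(path, size_mb=None, keep=None):
    """Build the argv for a sysdig capture, optionally with file rotation."""
    argv = ["sudo", "sysdig"]
    if size_mb is not None:
        argv += ["-C", str(size_mb)]  # start a new file after this size
    if keep is not None:
        argv += ["-W", str(keep)]     # keep at most this many files
    argv += ["-w", path]
    return argv

# Reproduces the rotation example above as a single shell-quoted string.
print(shlex.join(capture_command("dump.scap", size_mb=1, keep=5)))
```

The resulting list can be passed directly to subprocess.run, which avoids shell quoting problems when the path contains spaces.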

Formatting output

sysdig -j produces JSON output, for example:

$ sudo sysdig -j
{"evt.cpu":1,"evt.dir":">","evt.info":"next=0 pgft_maj=0 pgft_min=1385 vm_size=229324 vm_rss=15784 vm_swap=0 ","evt.num":7,"evt.outputtime":1607781847452944277,"evt.type":"switch","proc.name":"sysdig","thread.tid":14550}
{"evt.cpu":0,"evt.dir":">","evt.info":"interval=4000000(0.004s) ","evt.num":8,"evt.outputtime":1607781847453039462,"evt.type":"nanosleep","proc.name":"falco","thread.tid":12073}
{"evt.cpu":0,"evt.dir":">","evt.info":"next=0 pgft_maj=70 pgft_min=3664 vm_size=407148 vm_rss=36156 vm_swap=4596 ","evt.num":9,"evt.outputtime":1607781847453046982,"evt.type":"switch","proc.name":"falco","thread.tid":12073}
{"evt.cpu":1,"evt.dir":">","evt.info":"next=14550(sysdig) pgft_maj=0 pgft_min=0 vm_size=0 vm_rss=0 vm_swap=0 ","evt.num":10,"evt.outputtime":1607781847453471349,"evt.type":"switch","proc.name":null,"thread.tid":0}
{"evt.cpu":1,"evt.dir":">","evt.info":"next=14549(head) pgft_maj=0 pgft_min=1389 vm_size=229324 vm_rss=15784 vm_swap=0 ","evt.num":24,"evt.outputtime":1607781847453669133,"evt.type":"switch","proc.name":"sysdig","thread.tid":14550}
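
JSON output is convenient to post-process. The Python sketch below counts events per type, assuming one JSON object per line as in the sample above. The count_event_types helper is hypothetical, and the sample records are shortened copies of the lines shown above.

```python
import json
from collections import Counter

def count_event_types(lines):
    """Count events per evt.type, reading one JSON object per line."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines between records
        event = json.loads(line)
        counts[event["evt.type"]] += 1
    return counts

sample = [
    '{"evt.cpu":1,"evt.dir":">","evt.num":7,"evt.type":"switch","proc.name":"sysdig","thread.tid":14550}',
    '{"evt.cpu":0,"evt.dir":">","evt.num":8,"evt.type":"nanosleep","proc.name":"falco","thread.tid":12073}',
    '{"evt.cpu":1,"evt.dir":">","evt.num":10,"evt.type":"switch","proc.name":null,"thread.tid":0}',
]
print(count_event_types(sample).most_common())
```

Note that proc.name can be null, as in the third record, so code that groups by process name should handle None.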

sysdig -p <format> sets a custom output format, for example:

$ sudo sysdig -p"user:%user.name dir:%evt.arg.path"
user:messagebus dir:/usr/share/dbus-1/system-services/org.freedesktop.nm_dispatcher.service
user:messagebus dir:/usr/share/dbus-1/system-services/org.freedesktop.Avahi.service
user:messagebus dir:/usr/share/dbus-1/system-services/org.freedesktop.UPower.service
user:messagebus dir:/usr/share/dbus-1/system-services/fi.w1.wpa_supplicant1.service
user:messagebus dir:/usr/share/dbus-1/system-services/org.freedesktop.ModemManager1.service
user:messagebus dir:/usr/share/dbus-1/system-services/org.freedesktop.RealtimeKit1.service
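  • Each field name must be prefixed with %
  • By default a line is printed only when every field in the format exists for the event. To allow missing fields, put a * at the start of the format, as in *%user.name; missing fields are then printed as <NA>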
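
The two rules above can be modelled in a few lines of Python. This is a simplified imitation for illustration only, not sysdig's implementation: the render function and the dict-based event are assumptions of this sketch.

```python
import re

# A field reference is % followed by a dotted name, e.g. %evt.arg.path
FIELD_RE = re.compile(r"%([A-Za-z0-9_.]+)")

def render(fmt, event):
    """Return the formatted line, or None when the line would be suppressed."""
    lenient = fmt.startswith("*")
    if lenient:
        fmt = fmt[1:]
    names = FIELD_RE.findall(fmt)
    # Strict mode: one missing field suppresses the whole line.
    if not lenient and any(event.get(name) is None for name in names):
        return None

    def fill(match):
        value = event.get(match.group(1))
        return "<NA>" if value is None else str(value)

    return FIELD_RE.sub(fill, fmt)

event = {"user.name": "messagebus"}  # this event has no evt.arg.path
print(render("user:%user.name dir:%evt.arg.path", event))
print(render("*user:%user.name dir:%evt.arg.path", event))
```

The first call returns None because evt.arg.path is missing; the second, with the leading *, still produces a line and fills the gap with <NA>.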

csysdig

Running sudo csysdig opens an interactive, curses-based view similar to what the top command shows.

[Screenshot: csysdig interface]

Filter

sysdig provides a powerful filtering mechanism for selecting events. For example:

$ sudo sysdig proc.name=cat

$ sudo sysdig proc.name=cat and evt.type=read

Supported comparison operators: =, !=, <, <=, >, >=, contains, icontains, in, exists

Supported logical operators: and, or, not
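
To make the operator semantics concrete, here is a small Python model that evaluates a single comparison against an event represented as a dict. It is a simplified sketch and not how sysdig evaluates filters internally; the check helper, the subset of operators, and the choice to treat a missing field as a non-match are all assumptions of this example.

```python
# Simplified models of some of the comparison operators listed above.
OPS = {
    "=": lambda actual, value: actual == value,
    "!=": lambda actual, value: actual != value,
    "contains": lambda actual, value: value in actual,
    "icontains": lambda actual, value: value.lower() in actual.lower(),
    "in": lambda actual, values: actual in values,
}

def check(event, field, op, value=None):
    """Evaluate one comparison against an event dict of field name -> string."""
    actual = event.get(field)
    if op == "exists":
        return actual is not None
    if actual is None:
        return False  # assumption: a missing field never matches
    return OPS[op](actual, value)

events = [
    {"proc.name": "cat", "evt.type": "read"},
    {"proc.name": "cat", "evt.type": "open"},
    {"proc.name": "nginx", "evt.type": "read"},
]

# Python equivalent of the filter: proc.name=cat and evt.type=read
selected = [
    e for e in events
    if check(e, "proc.name", "=", "cat") and check(e, "evt.type", "=", "read")
]
print(selected)
```

The logical operators and, or and not map directly onto Python's own, which is why the combined filter reads almost the same in both forms.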

sysdig -l lists all supported filter fields, which are grouped into classes including:

  • process
  • group
  • syslog
  • container
  • fdlist
  • mesos
  • evtin

Chisels 

A sysdig chisel is a script written in Lua that analyzes the event stream. sysdig -cl lists the available chisels, which fall into the following categories:

  • Application
  • CPU Usage
  • Errors
  • Performance
  • Security
  • System State
  • Tracers

sysdig -i <chiselname> shows information about a chisel, and sysdig -c <chiselname> [args] runs one. For example:

$ sysdig -i ps
Category: System State
----------------------
ps              List (and optionally filter) the machine processes.

List the running processes, with an output that is similar to the one of ps. Ou
tput is at a point in time; adjust this in the filter. It defaults to time of e
vt.num=0
Args:
[filter] filter - A sysdig-like filter expression that allows r
                estricting the FD list. For example 'fd.name contains /etc' sho
                ws all the processes that have files open under /etc.

$ sudo sysdig -c ps proc.name=bash
TID     PID     USER      VIRT       RES        FDLIMIT   CMD                 
16201   16201   admin     29.26M     5.61M      1024      bash
20480   20480   admin     29.26M     5.59M      1024      bash

Cloud-native support

sysdig works well in cloud-native environments. Its filters include the container and k8s classes, for example:

$ sudo sysdig container.name=nginx
1 23:12:55.568063000 0 container:eda15d667287 (-1) > container json={"container":{"Mounts":[],"cpu_period":100000,"cpu_quota":0,"cpu_shares":1024,"cpuset_cpu_count":0,"created_time":1607784243,"env":[],"full_id":"eda15d667287f94c26fde54f725f63b348f932455f2af80fdf6a3ae3eb70a04f","id":"eda15d667287","image":"nginx:1.17-alpine","imagedigest":"sha256:763e7f0188e378fef0c761854552c70bbd817555dc4de029681a2e972e25e30e","imageid":"89ec9da682137d6b18ab8244ca263b6771067f251562f884c7510c8f1e5ac910","imagerepo":"nginx","imagetag":"1.17-alpine","ip":"172.17.0.2","is_pod_sandbox":false,"labels":{"maintainer":"NGINX Docker Maintainers <[email protected]>"},"lookup_state":1,"memory_limit":0,"metadata_deadline":0,"name":"nginx","port_mappings":[{"ContainerPort":80,"HostIp":0,"HostPort":8080}],"privileged":false,"swap_limit":0,"type":0}}

94652 23:13:04.245968331 1 nginx (14766) < epoll_pwait 
94653 23:13:04.245981938 1 nginx (14766) > accept flags=0 
94654 23:13:04.245987609 1 nginx (14766) < accept fd=7(<4t>172.17.0.1:47566->172.17.0.2:80) tuple=172.17.0.1:47566->172.17.0.2:80 queuepct=0 queuelen=0 queuemax=511 
94655 23:13:04.245994319 1 nginx (14766) > epoll_ctl 
94656 23:13:04.245996892 1 nginx (14766) < epoll_ctl 
94657 23:13:04.245998144 1 nginx (14766) > epoll_pwait 
94658 23:13:04.246001866 1 nginx (14766) > switch next=15088(curl) pgft_maj=0 pgft_min=153 vm_size=6420 vm_rss=1868 vm_swap=0 
94681 23:13:04.246077541 1 nginx (14766) < epoll_pwait 
94682 23:13:04.246081494 1 nginx (14766) > recvfrom fd=7(<4t>172.17.0.1:47566->172.17.0.2:80) size=1024 
94683 23:13:04.246084084 1 nginx (14766) < recvfrom res=78 data=GET / HTTP/1.1..Host: localhost:8080..User-Agent: curl/7.58.0..Accept: */*.... tuple=NULL 
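
The accept and recvfrom events above carry the connection as a src->dst tuple inside their arguments. As an illustration, this Python sketch extracts it; the connection helper is hypothetical and, as a simplifying assumption, only handles IPv4 addresses written as in the output above.

```python
import re

# Matches an IPv4 tuple such as 172.17.0.1:47566->172.17.0.2:80
TUPLE_RE = re.compile(
    r"(?P<src_ip>\d+\.\d+\.\d+\.\d+):(?P<src_port>\d+)->"
    r"(?P<dst_ip>\d+\.\d+\.\d+\.\d+):(?P<dst_port>\d+)"
)

def connection(args):
    """Return (src_ip, src_port, dst_ip, dst_port) from event args, or None."""
    m = TUPLE_RE.search(args)
    if m is None:
        return None
    return (m["src_ip"], int(m["src_port"]), m["dst_ip"], int(m["dst_port"]))

args = "fd=7(<4t>172.17.0.1:47566->172.17.0.2:80) size=1024"
print(connection(args))
```

Events whose arguments contain no tuple, such as the epoll_ctl lines, yield None.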

sysdig also ships with container-aware chisels, for example:

$ sysdig -cl | grep container
topcontainers_cpu
                Top containers by CPU usage
topcontainers_error
                Top containers by number of errors
topcontainers_file
                Top containers by R+W disk bytes
topcontainers_net
                Top containers by network I/O
lscontainers    List the running containers

$ sudo sysdig -c lscontainers
container.type container.image container.name      container.id 
-------------- --------------- ------------------- ------------ 
docker         nginx:1.17-alpi nginx               eda15d667287

$ sudo sysdig -c topcontainers_cpu
CPU%                container.name      
--------------------------------------------------------------------------------
10.90%              host
0.00%               nginx

See Let there be light – Sysdig adds container visibility for a fuller introduction.

