5

容器内的Linux诊断工具0x.tools - 扣钉日记

 2 years ago
source link: https://www.cnblogs.com/codelogs/p/16242999.html
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.

原创:扣钉日记(微信公众号ID:codelogs),欢迎分享,转载请保留出处。

简介#

Linux上有大量的问题诊断工具,如perf、bcc等,但这些诊断工具,虽然功能强大,但却需要很高的权限才可以使用。

而0x.tools这个工具提供了一个很好的思路,通过采样/proc目录来诊断问题,对被测量程序几乎无性能影响,且只要与目标进程拥有同等级的权限,即可正常使用。

不要小看这个权限区别,在互联网大厂,开发同学一般只能获取到一个受限于容器内的shell环境,想要获取机器的root权限几乎是不可能的。

安装#

# 下载源码
$ git clone https://github.com/tanelpoder/0xtools.git

# 安装编译器
$ yum install -y make gcc

# 编译并安装程序
$ make && make install

实际上0x.tools里的工具大多数是脚本,如psn工具是python脚本,因此直接将代码clone下来,然后执行bin/psn也是可以的。

psn工具#

psn工具用来观测系统中当前活跃的线程正在做什么,如线程在做什么系统调用、写什么文件、阻塞在哪个内核函数下?

查看活跃线程#

[tanel@linux01 ~]$ psn

Linux Process Snapper v0.18 by Tanel Poder [https://0x.tools]
Sampling /proc/stat for 5 seconds... finished.


=== Active Threads ================================================

 samples | avg_threads | comm             | state                  
-------------------------------------------------------------------
   10628 |     3542.67 | (kworker/*:*)    | Disk (Uninterruptible) 
      37 |       12.33 | (oracle_*_l)     | Running (ON CPU)       
      17 |        5.67 | (oracle_*_l)     | Disk (Uninterruptible) 
       2 |        0.67 | (xcapture)       | Running (ON CPU)       
       1 |        0.33 | (ora_lg*_xe)     | Disk (Uninterruptible) 
       1 |        0.33 | (ora_lgwr_lin*)  | Disk (Uninterruptible) 
       1 |        0.33 | (ora_lgwr_lin*c) | Disk (Uninterruptible) 


samples: 3 (expected: 100)
total processes: 10470, threads: 11530
runtime: 6.13, measure time: 6.03

如上,默认情况下,psn采样/proc目录下每个线程的/proc/$pid/stat文件,采样5秒钟,将R(正在运行)或D(不可中断休眠)状态的线程的数据记录下来,并做汇总。

由于R或D状态的线程都是活跃线程,被采样到的次数越多,则越说明这些线程运行得更慢或更频繁。

查看线程读写文件#

[tanel@linux01 ~]$ sudo psn -G syscall,filenamesum

Linux Process Snapper v0.18 by Tanel Poder [https://0x.tools]
Sampling /proc/syscall, stat for 5 seconds... finished.


=== Active Threads =======================================================================================================

 samples | avg_threads | comm            | state                  | syscall         | filenamesum                         
--------------------------------------------------------------------------------------------------------------------------
    2027 |      506.75 | (kworker/*:*)   | Disk (Uninterruptible) | [kernel_thread] |                                     
    1963 |      490.75 | (oracle_*_l)    | Disk (Uninterruptible) | pread64         | /data/oracle/LIN*C/soe_bigfile.dbf
      87 |       21.75 | (oracle_*_l)    | Running (ON CPU)       | [running]       |                                     
      13 |        3.25 | (kworker/*:*)   | Running (ON CPU)       | [running]       |                                     
       4 |        1.00 | (oracle_*_l)    | Running (ON CPU)       | read            | socket:[*]                          
       2 |        0.50 | (collectl)      | Running (ON CPU)       | [running]       |                                     
       1 |        0.25 | (java)          | Running (ON CPU)       | futex           |                                     
       1 |        0.25 | (ora_ckpt_xe)   | Disk (Uninterruptible) | pread64         | /data/oracle/XE/control*.ctl        
       1 |        0.25 | (ora_m*_linprd) | Running (ON CPU)       | [running]       |                                     
       1 |        0.25 | (ora_m*_lintes) | Running (ON CPU)       | [running]       |                                     

通过-G可以指定需要查看的列,syscall表示线程正在执行的系统调用,filenamesum表示正在读写的文件,一般来说,线程处于D状态时在做文件io操作,如果D状态线程频繁出现,那么我们肯定想知道线程正在读写哪个文件。

查看线程的内核栈#

[tanel@linux01 ~]$ sudo psn -p -G syscall,wchan,kstack

Linux Process Snapper v0.18 by Tanel Poder [https://0x.tools]
Sampling /proc/wchan, stack, syscall, stat for 5 seconds... finished.


=== Active Threads =======================================================================================================================================================================================================================================================================================================================================================================================

 samples | avg_threads | comm          | state                  | syscall         | wchan                        | kstack                                                                                                                                                                                                                                                                                 
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
     281 |      140.50 | (kworker/*:*) | Disk (Uninterruptible) | [kernel_thread] | blkdev_issue_flush           | ret_from_fork_nospec_begin()->kthread()->worker_thread()->process_one_work()->dio_aio_complete_work()->dio_complete()->generic_write_sync()->xfs_file_fsync()->xfs_blkdev_issue_flush()->blkdev_issue_flush()                                                                          
     211 |      105.50 | (kworker/*:*) | Disk (Uninterruptible) | [kernel_thread] | call_rwsem_down_read_failed  | ret_from_fork_nospec_begin()->kthread()->worker_thread()->process_one_work()->dio_aio_complete_work()->dio_complete()->generic_write_sync()->xfs_file_fsync()->xfs_ilock()->call_rwsem_down_read_failed()                                                                            
     169 |       84.50 | (oracle_*_li) | Disk (Uninterruptible) | pread64         | call_rwsem_down_write_failed | system_call_fastpath()->SyS_pread64()->vfs_read()->do_sync_read()->xfs_file_aio_read()->xfs_file_dio_aio_read()->touch_atime()->update_time()->xfs_vn_update_time()->xfs_ilock()->call_rwsem_down_write_failed()                                                                       
      64 |       32.00 | (kworker/*:*) | Disk (Uninterruptible) | [kernel_thread] | xfs_log_force_lsn            | ret_from_fork_nospec_begin()->kthread()->worker_thread()->process_one_work()->dio_aio_complete_work()->dio_complete()->generic_write_sync()->xfs_file_fsync()->xfs_log_force_lsn()                                                                                                     
      24 |       12.00 | (oracle_*_li) | Disk (Uninterruptible) | pread64         | call_rwsem_down_read_failed  | system_call_fastpath()->SyS_pread64()->vfs_read()->do_sync_read()->xfs_file_aio_read()->xfs_file_dio_aio_read()->__blockdev_direct_IO()->do_blockdev_direct_IO()->xfs_get_blocks_direct()->__xfs_get_blocks()->xfs_ilock_data_map_shared()->xfs_ilock()->call_rwsem_down_read_failed() 
       5 |        2.50 | (oracle_*_li) | Disk (Uninterruptible) | pread64         | do_blockdev_direct_IO        | system_call_fastpath()->SyS_pread64()->vfs_read()->do_sync_read()->xfs_file_aio_read()->xfs_file_dio_aio_read()->__blockdev_direct_IO()->do_blockdev_direct_IO()                                                                                                                       
       3 |        1.50 | (oracle_*_li) | Running (ON CPU)       | [running]       | 0                            | system_call_fastpath()->SyS_pread64()->vfs_read()->do_sync_read()->xfs_file_aio_read()->xfs_file_dio_aio_read()->__blockdev_direct_IO()->do_blockdev_direct_IO()                                                                                                                       
       2 |        1.00 | (kworker/*:*) | Disk (Uninterruptible) | [kernel_thread] | call_rwsem_down_write_failed | ret_from_fork_nospec_begin()->kthread()->worker_thread()->process_one_work()->dio_aio_complete_work()->dio_complete()->xfs_end_io_direct_write()->xfs_iomap_write_unwritten()->xfs_ilock()->call_rwsem_down_write_failed()                                                             
       2 |        1.00 | (kworker/*:*) | Running (ON CPU)       | [running]       | 0                            | ret_from_fork_nospec_begin()->kthread()->worker_thread()->process_one_work()->dio_aio_complete_work()->dio_complete()->generic_write_sync()->xfs_file_fsync()->xfs_blkdev_issue_flush()->blkdev_issue_flush()                                                                          
       2 |        1.00 | (oracle_*_li) | Disk (Uninterruptible) | io_submit       | call_rwsem_down_write_failed | system_call_fastpath()->SyS_io_submit()->do_io_submit()->xfs_file_aio_read()->xfs_file_dio_aio_read()->touch_atime()->update_time()->xfs_vn_update_time()->xfs_ilock()->call_rwsem_down_write_failed()                                                                                 
       1 |        0.50 | (java)        | Running (ON CPU)       | futex           | futex_wait_queue_me          | system_call_fastpath()->SyS_futex()->do_futex()->futex_wait()->futex_wait_queue_me()                                                                                                                                                                                                   
       1 |        0.50 | (ksoftirqd/*) | Running (ON CPU)       | [running]       | 0                            | ret_from_fork_nospec_begin()->kthread()->smpboot_thread_fn()                                                                                                                                                                                                                           
       1 |        0.50 | (kworker/*:*) | Disk (Uninterruptible) | [kernel_thread] | worker_thread                | ret_from_fork_nospec_begin()->kthread()->worker_thread()                                                                                                                                                                                                                               
       1 |        0.50 | (kworker/*:*) | Disk (Uninterruptible) | [kernel_thread] | worker_thread                | ret_from_fork_nospec_begin()->kthread()->worker_thread()->process_one_work()->dio_aio_complete_work()->dio_complete()->generic_write_sync()->xfs_file_fsync()->xfs_blkdev_issue_flush()->blkdev_issue_flush()                                                                          
       1 |        0.50 | (ora_lg*_xe)  | Disk (Uninterruptible) | io_submit       | inode_dio_wait               | system_call_fastpath()->SyS_io_submit()->do_io_submit()->xfs_file_aio_write()->xfs_file_dio_aio_write()->inode_dio_wait()                                                                                                                                                              
       1 |        0.50 | (oracle_*_li) | Disk (Uninterruptible) | [running]       | 0                            | -                                                                                                                                                                                                                                                                                      

同理,通过wchan字段可以查看线程阻塞在什么内核方法上,而kstack字段则可以查看线程阻塞时的内核调用栈是什么。

psn的原理#

其实psn和ps命令一样,是通过遍历/proc目录来获取线程信息的,如下:
state:取自/proc/$pid/stat文件。
syscall:取自/proc/$pid/syscall文件。
wchan:取自/proc/$pid/wchan文件。
kstack:取自/proc/$pid/stack文件。
与perf、bcc等工具的区别是,读取这些文件只需要与进程同等级的权限即可,不需要使用root账号。

其它工具#

除了psn外,0x.tools里面还有一些其它工具,如xcapture、schedlat等,这里就不一一介绍了,感兴趣可以访问 https://0x.tools/ 查看。

另外,由于psn是通过遍历/proc目录实现的,因此我们也可自己编写脚本来实现同样的功能,如下:

active_thread_kstack(){
  # 打印当前系统活跃java线程的内核栈
  ps h -Lo pid,tid,s,pcpu,comm,wchan:32,min_flt,maj_flt -C java|grep '[RD] '| awk '
    BEGIN{
        syscall_files["/usr/include/asm/unistd_64.h"]=1;
        syscall_files["/usr/include/x86_64-linux-gnu/asm/unistd_64.h"]=1;
        syscall_files["/usr/include/asm-x86_64/unistd.h"]=1;
        for(tfile in syscall_files){
            cmd="test -f "tfile
            if(system(cmd)==0){
                hfile=tfile;
                break;
            }
        }
        if(hfile){
            while (getline <hfile){
                if($0 ~ /__NR_/){
                    syscall_map[$3]=gensub(/__NR_/,"","g",$2)
                }
            }
        }else{
            syscall_str="0:read 1:write 2:open 3:close 4:stat 5:fstat 6:lstat 7:poll 8:lseek 9:mmap 10:mprotect 11:munmap 12:brk 13:rt_sigaction 14:rt_sigprocmask 15:rt_sigreturn 16:ioctl 17:pread64 18:pwrite64 19:readv 20:writev 21:access 22:pipe 23:select 24:sched_yield 25:mremap 26:msync 27:mincore 28:madvise 29:shmget 30:shmat 31:shmctl 32:dup 33:dup2 34:pause 35:nanosleep 36:getitimer 37:alarm 38:setitimer 39:getpid 40:sendfile 41:socket 42:connect 43:accept 44:sendto 45:recvfrom 46:sendmsg 47:recvmsg 48:shutdown 49:bind 50:listen 51:getsockname 52:getpeername 53:socketpair 54:setsockopt 55:getsockopt 56:clone 57:fork 58:vfork 59:execve 60:exit 61:wait4 62:kill 63:uname 64:semget 65:semop 66:semctl 67:shmdt 68:msgget 69:msgsnd 70:msgrcv 71:msgctl 72:fcntl 73:flock 74:fsync 75:fdatasync 76:truncate 77:ftruncate 78:getdents 79:getcwd 80:chdir 81:fchdir 82:rename 83:mkdir 84:rmdir 85:creat 86:link 87:unlink 88:symlink 89:readlink 90:chmod 91:fchmod 92:chown 93:fchown 94:lchown 95:umask 96:gettimeofday 97:getrlimit 98:getrusage 99:sysinfo 100:times 101:ptrace 102:getuid 103:syslog 104:getgid 105:setuid 106:setgid 107:geteuid 108:getegid 109:setpgid 110:getppid 111:getpgrp 112:setsid 113:setreuid 114:setregid 115:getgroups 116:setgroups 117:setresuid 118:getresuid 119:setresgid 120:getresgid 121:getpgid 122:setfsuid 123:setfsgid 124:getsid 125:capget 126:capset 127:rt_sigpending 128:rt_sigtimedwait 129:rt_sigqueueinfo 130:rt_sigsuspend 131:sigaltstack 132:utime 133:mknod 134:uselib 135:personality 136:ustat 137:statfs 138:fstatfs 139:sysfs 140:getpriority 141:setpriority 142:sched_setparam 143:sched_getparam 144:sched_setscheduler 145:sched_getscheduler 146:sched_get_priority_max 147:sched_get_priority_min 148:sched_rr_get_interval 149:mlock 150:munlock 151:mlockall 152:munlockall 153:vhangup 154:modify_ldt 155:pivot_root 156:_sysctl 157:prctl 158:arch_prctl 159:adjtimex 160:setrlimit 161:chroot 162:sync 163:acct 164:settimeofday 165:mount 166:umount2 167:swapon 168:swapoff 169:reboot 170:sethostname 171:setdomainname 172:iopl 173:ioperm 174:create_module 175:init_module 176:delete_module 177:get_kernel_syms 178:query_module 179:quotactl 180:nfsservctl 181:getpmsg 182:putpmsg 183:afs_syscall 184:tuxcall 185:security 186:gettid 187:readahead 188:setxattr 189:lsetxattr 190:fsetxattr 191:getxattr 192:lgetxattr 193:fgetxattr 194:listxattr 195:llistxattr 196:flistxattr 197:removexattr 198:lremovexattr 199:fremovexattr 200:tkill 201:time 202:futex 203:sched_setaffinity 204:sched_getaffinity 205:set_thread_area 206:io_setup 207:io_destroy 208:io_getevents 209:io_submit 210:io_cancel 211:get_thread_area 212:lookup_dcookie 213:epoll_create 214:epoll_ctl_old 215:epoll_wait_old 216:remap_file_pages 217:getdents64 218:set_tid_address 219:restart_syscall 220:semtimedop 221:fadvise64 222:timer_create 223:timer_settime 224:timer_gettime 225:timer_getoverrun 226:timer_delete 227:clock_settime 228:clock_gettime 229:clock_getres 230:clock_nanosleep 231:exit_group 232:epoll_wait 233:epoll_ctl 234:tgkill 235:utimes 236:vserver 237:mbind 238:set_mempolicy 239:get_mempolicy 240:mq_open 241:mq_unlink 242:mq_timedsend 243:mq_timedreceive 244:mq_notify 245:mq_getsetattr 246:kexec_load 247:waitid 248:add_key 249:request_key 250:keyctl 251:ioprio_set 252:ioprio_get 253:inotify_init 254:inotify_add_watch 255:inotify_rm_watch 256:migrate_pages 257:openat 258:mkdirat 259:mknodat 260:fchownat 261:futimesat 262:newfstatat 263:unlinkat 264:renameat 265:linkat 266:symlinkat 267:readlinkat 268:fchmodat 269:faccessat 270:pselect6 271:ppoll 272:unshare 273:set_robust_list 274:get_robust_list 275:splice 276:tee 277:sync_file_range 278:vmsplice 279:move_pages 280:utimensat 281:epoll_pwait 282:signalfd 283:timerfd_create 284:eventfd 285:fallocate 286:timerfd_settime 287:timerfd_gettime 288:accept4 289:signalfd4 290:eventfd2 291:epoll_create1 292:dup3 293:pipe2 294:inotify_init1 295:preadv 296:pwritev 297:rt_tgsigqueueinfo 298:perf_event_open 299:recvmmsg 300:fanotify_init 301:fanotify_mark 302:prlimit64 303:name_to_handle_at 304:open_by_handle_at 305:clock_adjtime 306:syncfs 307:sendmmsg 308:setns 309:getcpu 310:process_vm_readv 311:process_vm_writev 312:kcmp 313:finit_module 314:sched_setattr 315:sched_getattr 316:renameat2 317:seccomp 318:getrandom 319:memfd_create 320:kexec_file_load 321:bpf 322:execveat 323:userfaultfd 324:membarrier 325:mlock2 326:copy_file_range 327:preadv2 328:pwritev2 329:pkey_mprotect 330:pkey_alloc 331:pkey_free 332:statx 333:io_pgetevents 334:rseq 424:pidfd_send_signal 425:io_uring_setup 426:io_uring_enter 427:io_uring_register 428:open_tree 429:move_mount 430:fsopen 431:fsconfig 432:fsmount 433:fspick 434:pidfd_open 435:clone3"
            split(syscall_str, syscall_arr, " ")
            for(i in syscall_arr){
                split(syscall_arr[i], idname_arr, ":")
                syscall_map[idname_arr[1]]=idname_arr[2]
            }
        }

        syscall_with_fd_map["read"]=1;
        syscall_with_fd_map["write"]=1;
        syscall_with_fd_map["pread64"]=1;
        syscall_with_fd_map["pwrite64"]=1;
        syscall_with_fd_map["fsync"]=1;
        syscall_with_fd_map["fdatasync"]=1;
        syscall_with_fd_map["recvfrom"]=1;
        syscall_with_fd_map["sendto"]=1;
        syscall_with_fd_map["recvmsg"]=1;
        syscall_with_fd_map["sendmsg"]=1;
        syscall_with_fd_map["epoll_wait"]=1;
        syscall_with_fd_map["ioctl"]=1;
        syscall_with_fd_map["accept"]=1;
        syscall_with_fd_map["accept4"]=1;
        special_fd_map["0"]="(stdin)";
        special_fd_map["1"]="(stdout)";
        special_fd_map["2"]="(stderr)";
    }
    {
        RS="^$";
        getline wchan <("/proc/"$1"/task/"$2"/wchan");
        close("/proc/"$1"/task/"$2"/wchan");
        getline stack <("/proc/"$1"/task/"$2"/stack");
        close("/proc/"$1"/task/"$2"/stack");
        getline syscall <("/proc/"$1"/task/"$2"/syscall");
        close("/proc/"$1"/task/"$2"/syscall");
        split(syscall, syscall_arr, /\s+/);
        syscall_id=syscall_arr[1]
        syscall_name=syscall_map[syscall_id];
        if(syscall_name in syscall_with_fd_map){
            fd=strtonum(syscall_arr[2])
            cmd="readlink /proc/"$1"/fd/"fd;
            cmd|getline filename;
            close(cmd);
            if(fd in special_fd_map){
                filename=filename special_fd_map[syscall_id]
            }
        }
        printf "pid:%s,tid:%s,stat:%s,pcpu:%s,comm:%s,wchan:%s,min_flt:%s,maj_flt:%s,syscall:%s,filename:%s\n",$1,$2,$3,$4,$5,wchan,$7,$8,syscall_name,filename;
        print stack;
        RS="\n"
    }'
}

这样,我们不用安装0x.tools,就也能得到类似于psn命令的功能了!

往期内容#

Linux命令拾遗-入门篇
Linux命令拾遗-文本处理篇
Linux命令拾遗-软件资源观测
mysql的timestamp会存在时区问题?
真正理解可重复读事务隔离级别
字符编码解惑


About Joyk


Aggregate valuable and interesting links.
Joyk means Joy of geeK