

Python / PyTorch 性能调优指南 | damaoooo's blog
source link: https://damaoooo.github.io/2023/09/15/PythonPyTorch%20%E8%B0%83%E4%BC%98%E6%8C%87%E5%8D%97/
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.

在炼丹的时候,总是会遇到预处理/训练的速度慢的情况,尤其是遇到大型数据集,一个不小心就是两三天,要是代码还有bug,跑了两天之后发现又得重来,那就更痛苦了,这里介绍一下如何使用profiler找到程序的瓶颈
简单易用的 Line Profiler
Line Profiler还是带有一点侵入性的,需要你手动更改代码
首先最简单的,安装
pip install line_profiler
安装好之后,在程序里插入@profile就行了(有些编辑器会提示找不到符号),如下
from line_profiler import profile
@profile
def calculation():
x = 1
for i in range(10000):
x += i
return x
print(calculation())
然后不要直接运行,而是运行
kernprof -l <filename>.py
python -m line_profiler <filename.py>.lprof
运行kernprof
会生成一些profile文件,这个不是ASCII文件,打不开的,需要用line_profiler来打印输出
$ python -m line_profiler .\exp.py.lprof
Timer unit: 1e-06 s
Total time: 0.0036551 s
File: ./exp.py
Function: calculation at line 4
Line # Hits Time Per Hit % Time Line Contents
==============================================================
4 @profile
5 def calculation():
6 1 0.4 0.4 0.0 x = 1
7 10001 1724.8 0.2 47.2 for i in range(10000):
8 10000 1929.7 0.2 52.8 x += i
9 1 0.2 0.2 0.0 return x
是不是很直观呢,但是由于每次都必须手动在要跟踪的的函数里面输入@profile
,这是十分不方便的
cProfile - 更加详细
cProfile就更加详细了,甚至可以做到C级别的profile
有侵入式和非侵入式两种用法
如果想要使用非侵入式
python -m cProfile -o myLog.profile <myScript.py> arg1 arg2
其中-o
是profile文件保存名字,不想要也可以不指定,结果如下
5 function calls in 0.001 seconds
Ordered by: internal time
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.001 0.001 0.001 0.001 exp.py:2(calculation)
1 0.000 0.000 0.000 0.000 {built-in method builtins.print}
1 0.000 0.000 0.001 0.001 {built-in method builtins.exec}
1 0.000 0.000 0.001 0.001 exp.py:1(<module>)
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
如果你想要看看火焰图,就需要导出profile文件了
flameprof pipeline.prof > pipeline.svg
下面的图是call graph,也比较直观
gprof2dot -f pstats myLog.profile -o callingGraph.dot
py-spy 超级方便
生成火焰图,即便是使用cProfile
,都要两行命令,如何使用一行命令就生成火焰图呢,py-spy
也是可以非侵入式的
py-spy record -o profile.svg -- python myprogram.py
torch.profiler
PyTorch自带一个profiler,可以对神经网络的计算部分进行profile,查找瓶颈,但是我发现我的瓶颈根本不在GPU,在单核CPU上,太菜了
from torch.profiler import profile, record_function, ProfilerActivity
with profile(activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA], with_stack=True, record_shapes=True) as prof:
with torch.no_grad():
start_time = time.time()
for i in range(100):
s = model(dummy_graph)
end_time = time.time()
print("CUDA FP32 time: ", (end_time - start_time) / 10)
print(prof.key_averages(group_by_stack_n=5).table(sort_by="cpu_time_total",row_limit=10))
print(prof.key_averages( group_by_stack_n=5).table(sort_by="cuda_time_total", row_limit=10))
prof.export_stacks("./profiler_stacks.txt", "self_cuda_time_total")
Recommend
About Joyk
Aggregate valuable and interesting links.
Joyk means Joy of geeK