
Running the mlperf-inference v3.0 DLRM multi-GPU test

source link: https://wu-kan.cn/2023/07/07/mlperf-inference-dlrm/


07 Jul 2023


CC BY 4.0 (except where otherwise noted, and excluding reposted articles)

If this blog post helped you, consider buying me a coffee~

I gave MLPerf a try and found the documentation more than a little rough; it took me several days to get the multi-GPU run working, so here is a record of the best practices I settled on.

Clone the source code.

mkdir -p $HOME/mlcommons
cd $HOME/mlcommons
git clone --recurse-submodules --depth=1 https://github.com/mlcommons/training.git
git clone --recurse-submodules -b v3.0 --depth=1 https://github.com/mlcommons/inference.git

Download the pretrained model.

mkdir -p $HOME/mlcommons/model
cd $HOME/mlcommons/model
wget https://dlrm.s3-us-west-1.amazonaws.com/models/tb00_40M.pt # this one is huge, about 90 GB
# wget https://dlrm.s3-us-west-1.amazonaws.com/models/tb0875_10M.pt
cp tb00_40M.pt dlrm_terabyte.pytorch
# cp tb0875_10M.pt dlrm_kaggle.pytorch
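The terabyte checkpoint is large enough that a truncated download is easy to miss. Below is a minimal sketch of a size check; the `check_size` helper name and the 90 GB threshold are my own (based on the note above), not part of the MLPerf tooling.

```shell
# check_size: warn when a file is missing or smaller than an expected minimum.
# Illustrative helper, not part of the MLPerf scripts.
check_size() {
  f=$1; min_bytes=$2
  # try GNU stat first, then BSD stat
  sz=$(stat -c %s "$f" 2>/dev/null || stat -f %z "$f" 2>/dev/null)
  if [ -z "$sz" ] || [ "$sz" -lt "$min_bytes" ]; then
    echo "WARNING: $f is missing or only ${sz:-0} bytes; download may be incomplete"
    return 1
  fi
  echo "OK: $f ($sz bytes)"
}
# usage after the wget above:
# check_size tb00_40M.pt $((90 * 1024 * 1024 * 1024))
```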

Only Python 3.7 is supported.

spack load [email protected] cuda cmake
spack load py-pip ^ [email protected]
cd $HOME/mlcommons/inference/loadgen
python3 -m pip install --prefix $HOME/mlcommons/software-python-3.7 .
python3 -m pip install --prefix $HOME/mlcommons/software-python-3.7 torch torchvision scikit-learn numpy pydot torchviz protobuf tqdm onnxruntime onnx opencv-python

Generate the data. Since this is only an inference test, synthetically generated data is good enough.

spack load [email protected] cuda cmake
spack load py-mlperf-logging ^ [email protected]
spack load py-pip ^ [email protected]

export PYTHONPATH=$HOME/mlcommons/software-python-3.7/lib/python3.7/site-packages:$PYTHONPATH
rm -rf $HOME/mlcommons/fake_criteo
cd $HOME/mlcommons/inference/recommendation/dlrm/pytorch/tools
./make_fake_criteo.sh terabyte
mv ./fake_criteo $HOME/mlcommons
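Before pointing the benchmark at the generated directory, a quick sanity check can save a failed run. The exact filenames depend on what make_fake_criteo.sh produces, so this sketch (the `check_dataset_dir` helper name is mine) only verifies that the directory exists and is non-empty.

```shell
# check_dataset_dir: fail early if the dataset directory is missing or empty.
# Illustrative helper, not part of the MLPerf scripts.
check_dataset_dir() {
  d=$1
  [ -d "$d" ] || { echo "missing dir: $d"; return 1; }
  [ -n "$(ls -A "$d")" ] || { echo "empty dir: $d"; return 1; }
  echo "dataset dir OK: $d"
}
# usage:
# check_dataset_dir $HOME/mlcommons/fake_criteo
```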

Then you can start the run!

spack load [email protected] cuda cmake
spack load py-mlperf-logging ^ [email protected]
spack load py-pip ^ [email protected]

export DATA_DIR=$HOME/mlcommons/fake_criteo
export MODEL_DIR=$HOME/mlcommons/model
export DLRM_DIR=$HOME/mlcommons/training/recommendation/dlrm
export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7,8 # adjust to the GPUs on your machine!
export PYTHONPATH=$HOME/mlcommons/software-python-3.7/lib/python3.7/site-packages:$PYTHONPATH

# python3 -c 'import sys; print(sys.path); import mlperf_loadgen;'

cd $HOME/mlcommons/inference/recommendation/dlrm/pytorch

./run_local.sh pytorch dlrm terabyte gpu --scenario Offline --max-ind-range=40000000 --samples-to-aggregate-quantile-file=./tools/dist_quantile.txt --max-batchsize=2048 --samples-per-query-offline=204800 --accuracy --mlperf-bin-loader
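The CUDA_VISIBLE_DEVICES list above is hardcoded. One way to derive it from the GPUs actually present is to join the indices that `nvidia-smi --query-gpu=index --format=csv,noheader` prints, one per line, into a comma-separated list; the `join_indices` helper name below is my own.

```shell
# join_indices: turn one-index-per-line input into the comma-separated
# list that CUDA_VISIBLE_DEVICES expects. Illustrative helper.
join_indices() {
  paste -sd, -
}
# usage on a machine with NVIDIA GPUs:
# export CUDA_VISIBLE_DEVICES=$(nvidia-smi --query-gpu=index --format=csv,noheader | join_indices)
```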


