A simple scheduler for running commands on multi-GPU workstations/servers

simple_gpu_scheduler

A simple scheduler to run your commands on individual GPUs. Following the KISS principle, this script simply accepts commands via stdin and executes them on a specific GPU by setting the CUDA_VISIBLE_DEVICES variable.

The commands read are executed using the login shell, so redirections (>), pipes (|), and all other kinds of shell magic can be used.
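
The mechanism described above can be sketched in a few lines of Python. This is only an illustration, not the package's actual source: the function name run_on_gpu is hypothetical, and the sketch hands the command to /bin/sh through subprocess rather than to the login shell.

```python
import os
import subprocess


def run_on_gpu(command: str, gpu: str) -> subprocess.CompletedProcess:
    """Run one shell command so that it only sees the given GPU.

    Hypothetical helper for illustration; not part of simple_gpu_scheduler.
    """
    # Copy the current environment and restrict the visible CUDA devices.
    env = os.environ.copy()
    env["CUDA_VISIBLE_DEVICES"] = gpu
    # shell=True passes the string to the shell, so pipes and
    # redirections inside the command keep working.
    return subprocess.run(
        command, shell=True, env=env, capture_output=True, text=True
    )


if __name__ == "__main__":
    result = run_on_gpu("echo $CUDA_VISIBLE_DEVICES", "1")
    print(result.stdout.strip())
```

Because the variable is set only in the child's environment, the parent process and any other running jobs are unaffected.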

Installation

The package can be installed from PyPI:

$ pip install simple_gpu_scheduler

Simple Example

Suppose you have a file gpu_commands.txt with commands that you would like to execute on the GPUs 0, 1 and 2 in parallel:

$ cat gpu_commands.txt
python train_model.py --lr 0.001 --output run_1
python train_model.py --lr 0.0005 --output run_2
python train_model.py --lr 0.0001 --output run_3

Then you can do so by simply feeding the file to the simple_gpu_scheduler script on stdin:

$ simple_gpu_scheduler --gpus 0 1 2 < gpu_commands.txt
Processing command `python train_model.py --lr 0.001 --output run_1` on gpu 2
Processing command `python train_model.py --lr 0.0005 --output run_2` on gpu 1
Processing command `python train_model.py --lr 0.0001 --output run_3` on gpu 0
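
One way to picture how commands end up on different GPUs is one worker per GPU, each pulling the next command from a shared queue as soon as it is free. The sketch below makes that assumption explicit; the function name schedule and its return value are hypothetical, and it works on a finite list, whereas the real tool reads commands from stdin.

```python
import os
import subprocess
import threading
from queue import Empty, Queue


def schedule(commands, gpus):
    """Run commands on the given GPUs, one command per GPU at a time.

    Illustrative sketch only. Returns (gpu, command, returncode) tuples
    in the order in which the commands finished.
    """
    jobs = Queue()
    for command in commands:
        jobs.put(command)

    finished = []
    lock = threading.Lock()

    def worker(gpu):
        while True:
            try:
                command = jobs.get_nowait()
            except Empty:
                return  # no work left for this GPU
            print(f"Processing command `{command}` on gpu {gpu}")
            env = dict(os.environ, CUDA_VISIBLE_DEVICES=str(gpu))
            completed = subprocess.run(command, shell=True, env=env)
            with lock:
                finished.append((str(gpu), command, completed.returncode))

    # One worker thread per GPU; each runs its commands sequentially.
    threads = [threading.Thread(target=worker, args=(gpu,)) for gpu in gpus]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return finished
```

With more commands than GPUs, the remaining commands simply wait in the queue until a worker becomes free, which is why the GPU a given command lands on is not fixed in advance.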

Hyperparameter search

To make the scheduler easy to use in the common scenario of hyperparameter search, the package includes a convenience script, simple_hypersearch. Its output can be piped directly into simple_gpu_scheduler or appended to the "queue file" (see Simple scheduler for jobs).

Grid of all possible parameter configurations in random order:

simple_hypersearch "python3 train_dnn.py --lr {lr} --batch_size {bs}" -p lr 0.001 0.0005 0.0001 -p bs 32 64 128 | simple_gpu_scheduler --gpus 0,1,2

5 uniformly sampled parameter configurations:

simple_hypersearch "python3 train_dnn.py --lr {lr} --batch_size {bs}" --n-samples 5 -p lr 0.001 0.0005 0.0001 -p bs 32 64 128 | simple_gpu_scheduler --gpus 0,1,2
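
To make the two modes concrete, here is a minimal sketch of how such command lists could be generated. It is not the implementation of simple_hypersearch: the function generate_commands and its seed argument are hypothetical, and the sampling mode is assumed to draw distinct configurations from the full grid.

```python
import itertools
import random


def generate_commands(template, params, n_samples=None, seed=None):
    """Fill a command template with parameter configurations.

    Illustrative sketch only. `params` maps a placeholder name to its
    list of candidate values (kept as strings to preserve formatting).
    """
    rng = random.Random(seed)
    names = list(params)
    # Cartesian product of all value lists gives the full grid.
    grid = [
        dict(zip(names, values))
        for values in itertools.product(*(params[name] for name in names))
    ]
    if n_samples is None:
        rng.shuffle(grid)  # whole grid, random order
    else:
        # Assumption: sample without replacement, capped at the grid size.
        grid = rng.sample(grid, min(n_samples, len(grid)))
    return [template.format(**config) for config in grid]


if __name__ == "__main__":
    template = "python3 train_dnn.py --lr {lr} --batch_size {bs}"
    params = {"lr": ["0.001", "0.0005", "0.0001"], "bs": ["32", "64", "128"]}
    for command in generate_commands(template, params, n_samples=5):
        print(command)
```

Each printed line is a complete shell command, so the output has the same one-command-per-line shape that simple_gpu_scheduler reads from stdin.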

Simple scheduler for jobs

Combined with some basic command line tools, one can set up a very basic scheduler which waits for new jobs to be "submitted" and executes them in order of submission.

Set up and start the scheduler in the background or in a separate persistent session (for example, using tmux):

touch gpu.queue
tail -f -n 0 gpu.queue | simple_gpu_scheduler --gpus 0,1,2

The command tail -f -n 0 follows the end of the gpu.queue file. Anything written to gpu.queue before the command was started is therefore not passed to simple_gpu_scheduler.

Then submitting commands boils down to appending text to the gpu.queue file:

echo "my_command_with | and stuff > logfile" >> gpu.queue
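
Jobs can be submitted from a Python script in the same way, since submission is nothing more than appending a line to the file. The helper below is a hypothetical sketch (the name submit and the default path are assumptions); it rejects commands containing newlines because each line of the queue file is treated as one job.

```python
def submit(command: str, queue_path: str = "gpu.queue") -> None:
    """Append one job to the queue file followed by the scheduler.

    Hypothetical helper for illustration; not part of the package.
    """
    if "\n" in command:
        # Every line in the queue file is read as a separate command.
        raise ValueError("a job must fit on a single line")
    # Mode "a" appends, mirroring `echo ... >> gpu.queue`.
    with open(queue_path, "a", encoding="utf-8") as queue_file:
        queue_file.write(command + "\n")


if __name__ == "__main__":
    submit("my_command_with | and stuff > logfile")
```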

TODO

Multi-line jobs (we might then need a submission script after all)
Stop, but let commands finish when receiving a defined signal
Tests would be nice. So far the project is still very small, but if it grows, tests should be added
