

Instructions to build a singularity container with popular data science and chem...
source link: https://gist.github.com/rohitfarmer/af486e346900534274cdc1153a764c90

Building a Singularity Container for Machine Learning, Data Science, & Chemistry
Learning Objectives
- Build a Linux-based Singularity container.
  - First build a writable sandbox with essential elements.
  - Inspect the container.
  - Install additional software.
  - Convert the sandbox to a read-only SquashFS container image.
- Install software & packages from multiple sources.
  - Using the apt-get package management system.
  - Compiling from source code.
  - Using Python pip.
  - Using the install.packages() function in R.
- Software highlight.
  - Jupyter notebook.
  - Tensorflow GPU version.
  - OpenMPI.
  - Popular data science packages in Python and R.
  - Chemistry/chemoinformatics software: RDKit, OpenBabel, Pybel, & Mordred.
- Test the container.
  - Test the GPU version of Tensorflow.
Core Container Build
First we will build a writable Singularity sandbox with the essential software, languages, and development libraries. To build the sandbox, copy the recipe below into a text file named container.def and then execute:
sudo singularity build --sandbox container/ container.def
Recipe/Definition File
BootStrap: docker
From: ubuntu:bionic
%labels
APPLICATION_NAME Data Science and Chemistry
AUTHOR_NAME Rohit Farmer
AUTHOR_EMAIL [email protected]
YEAR 2021
%help
Container for data science and chemistry with packages from Python 3 & R 3.6.
It also includes CUDA and MPI for Tensorflow GPU and parallel processing respectively.
%environment
# Export the variables so that processes started in the container inherit them.
export PATH=/bin:/sbin:/usr/bin:/usr/sbin:/usr/local/bin:/usr/local/sbin
export RDBASE=/usr/local/share/rdkit
CUDA=/usr/local/cuda/lib64:/usr/local/cuda-10.1/lib64:/usr/local/cuda-10.2/lib64
export LD_LIBRARY_PATH=/.singularity.d/libs:$RDBASE/lib:$CUDA
export PYTHONPATH=modules:$RDBASE:/usr/local/share/rdkit/rdkit:/usr/local/lib/python3.6/dist-packages/
# Set system locale
export LANG=C.UTF-8 LC_ALL=C.UTF-8
%post
# Change to tmp directory to download temporary files.
cd /tmp
# Install essential software, languages and libraries.
apt-get -qq -y update
export DEBIAN_FRONTEND=noninteractive
apt-get -qq install -y --no-install-recommends tzdata apt-utils
ln -fs /usr/share/zoneinfo/America/New_York /etc/localtime
dpkg-reconfigure --frontend noninteractive tzdata
apt-get -qq -y update
apt-get -qq install -y --no-install-recommends \
autoconf \
automake \
build-essential \
bzip2 \
ca-certificates \
cmake \
gcc \
g++ \
gfortran \
git \
gnupg2 \
libtool \
libjpeg-dev \
libpng-dev \
libtiff-dev \
libatlas-base-dev \
libxml2-dev \
zlib1g-dev \
libcairo2-dev \
libeigen3-dev \
libcupti-dev \
libpcre3-dev \
libssl-dev \
libcurl4-openssl-dev \
libboost-all-dev \
libboost-dev \
libboost-system-dev \
libboost-thread-dev \
libboost-serialization-dev \
libboost-regex-dev \
libgtk2.0-dev \
libreadline-dev \
libbz2-dev \
liblzma-dev \
libpcre++-dev \
libpango1.0-dev \
libmariadb-client-lgpl-dev \
libopenblas-dev \
liblapack-dev \
libxt-dev \
neovim \
openjdk-8-jdk \
python \
python-pip \
python-dev \
python3-dev \
python3-pip \
python3-wheel \
swig \
texlive \
texlive-fonts-extra \
texinfo \
vim \
wget \
xvfb \
xauth \
xfonts-base \
zip
export LANG=C.UTF-8 LC_ALL=C.UTF-8
# Add NVIDIA package repositories.
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/cuda-repo-ubuntu1804_10.1.243-1_amd64.deb
apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/7fa2af80.pub
dpkg -i cuda-repo-ubuntu1804_10.1.243-1_amd64.deb
apt-get update
wget http://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64/nvidia-machine-learning-repo-ubuntu1804_1.0.0-1_amd64.deb
apt-get -qq install -y --no-install-recommends ./nvidia-machine-learning-repo-ubuntu1804_1.0.0-1_amd64.deb
apt-get update
# Install NVIDIA driver (optional)
# apt-get install --no-install-recommends nvidia-driver-430
# Install development and runtime libraries.
apt-get install -y --no-install-recommends \
cuda-10-1 \
libcudnn7=7.6.4.38-1+cuda10.1 \
libcudnn7-dev=7.6.4.38-1+cuda10.1
# Install TensorRT. Requires that libcudnn7 is installed above.
apt-get install -y --no-install-recommends libnvinfer6=6.0.1-1+cuda10.1 \
libnvinfer-dev=6.0.1-1+cuda10.1 \
libnvinfer-plugin6=6.0.1-1+cuda10.1
# Update python pip.
python3 -m pip --no-cache-dir install --upgrade pip
python3 -m pip --no-cache-dir install setuptools --upgrade
python -m pip --no-cache-dir install setuptools --upgrade
# Install R 3.6.
echo "deb https://cloud.r-project.org/bin/linux/ubuntu bionic-cran35/" >> /etc/apt/sources.list
apt-key adv --keyserver keyserver.ubuntu.com --recv-keys E298A3A825C0D65DFD57CBB651716619E084DAB9
apt-get update
apt-get install -y --no-install-recommends r-base
apt-get install -y --no-install-recommends r-base-dev
# Install Jupyter notebook with Python and R support.
python3 -m pip --no-cache-dir install jupyter
R --quiet --slave -e 'install.packages(c("IRkernel"), repos="https://cloud.r-project.org/")'
# Install MPI (match the version with the cluster).
mkdir -p /tmp/mpi
cd /tmp/mpi
wget -O openmpi-2.1.0.tar.bz2 https://www.open-mpi.org/software/ompi/v2.1/downloads/openmpi-2.1.0.tar.bz2
tar -xjf openmpi-2.1.0.tar.bz2
cd openmpi-2.1.0
./configure --prefix=/usr/local --with-cuda
make -j $(nproc)
make install
ldconfig
# Cleanup
apt-get -qq clean
rm -rf /var/lib/apt/lists/*
rm -rf /tmp/mpi
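Because the build takes a long time, it can help to confirm that the definition file contains the header and sections used above before starting. The following is a minimal sketch run on the host, not part of the original recipe; the function names and the EXPECTED_SECTIONS list are illustrative assumptions, and the parsing is deliberately simple.

```python
import re
from pathlib import Path

# Sections used by the recipe in this article. Other recipes may use more
# (for example %runscript or %files), so this set is an assumption.
EXPECTED_SECTIONS = ("labels", "help", "environment", "post")


def summarize_definition(text):
    """Return (header, sections) parsed from Singularity definition text.

    header maps lowercase header keys (e.g. "bootstrap", "from") to values;
    sections lists the %section names in order of appearance.
    """
    header = {}
    sections = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        match = re.match(r"^%(\w+)", line)
        if match:
            sections.append(match.group(1).lower())
        elif not sections and ":" in line:
            # Header lines such as "BootStrap: docker" precede any section.
            key, value = line.split(":", 1)
            header[key.strip().lower()] = value.strip()
    return header, sections


def missing_sections(text, expected=EXPECTED_SECTIONS):
    """List the expected section names that the definition text lacks."""
    _, sections = summarize_definition(text)
    return [name for name in expected if name not in sections]


if __name__ == "__main__":
    path = Path("container.def")
    if path.exists():
        content = path.read_text()
        header, sections = summarize_definition(content)
        print("Header:", header)
        print("Sections:", sections)
        print("Missing:", missing_sections(content))
    else:
        print("container.def not found in the current directory.")
```

An empty "Missing" list only means the section markers are present; it does not validate the commands inside them.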
Inspect Container
To get a list of the labels defined for the container:
singularity inspect --labels container/
To print the container's help section:
singularity inspect --helpfile container/
To show the container's environment:
singularity inspect --environment container/
To retrieve the definition file used to build the container:
singularity inspect --deffile container/
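The four inspect commands can also be run in one pass from the host. This is a small sketch under the assumption that the singularity executable is on PATH and that the sandbox directory is container/; the helper names are hypothetical, and only the flags listed above are used.

```python
import shutil
import subprocess

# Flags taken from the inspect commands in this section.
INSPECT_FLAGS = ("--labels", "--helpfile", "--environment", "--deffile")


def inspect_commands(target="container/"):
    """Build one `singularity inspect` command per flag for the target."""
    return [["singularity", "inspect", flag, target] for flag in INSPECT_FLAGS]


def run_inspections(target="container/"):
    """Run each inspect command and return {flag: output}.

    Returns an empty dict if the singularity executable is not on PATH.
    """
    if shutil.which("singularity") is None:
        return {}
    results = {}
    for cmd in inspect_commands(target):
        done = subprocess.run(cmd, capture_output=True, text=True)
        # Keep stderr when the command fails so the reason is visible.
        results[cmd[2]] = done.stdout if done.returncode == 0 else done.stderr
    return results


if __name__ == "__main__":
    outputs = run_inspections()
    if not outputs:
        print("singularity not found on PATH; nothing inspected.")
    for flag, text in outputs.items():
        print(f"== {flag} ==\n{text}")
```

Passing "container.sif" as the target should work the same way once the image has been built.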
Install Data Science and Chemistry Packages
Once the core writable sandbox is built, we will install the additional data science and chemistry packages. To do that, execute:
sudo singularity shell --writable container/
Then execute the following lines in the shell environment.
# Install Python packages.
python3 -m pip --no-cache-dir install numpy pandas h5py pyarrow scikit-learn statsmodels matplotlib seaborn plotly
# Install Tensorflow.
python3 -m pip --no-cache-dir install tensorflow==2.2.0
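# Install R packages.
# install.packages() has no version argument, so remotes::install_version() is used to pin versions.
R --quiet --slave -e 'install.packages("remotes", repos="https://cloud.r-project.org/")'
R --quiet --slave -e 'remotes::install_version("tidyverse", version = "1.3.0", repos="https://cloud.r-project.org/")'
R --quiet --slave -e 'remotes::install_version("tidymodels", version = "0.1.0", repos="https://cloud.r-project.org/")'
R --quiet --slave -e 'install.packages(c("lme4", "glmnet", "yaml", "jsonlite", "rlang"), repos="https://cloud.r-project.org/")'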
# Install RDKit
export RDBASE=/usr/local/share/rdkit
export LD_LIBRARY_PATH="$RDBASE/lib:$LD_LIBRARY_PATH"
export PYTHONPATH="$RDBASE:$PYTHONPATH"
mkdir -p /tmp/rdkit
cd /tmp/rdkit
wget https://github.com/rdkit/rdkit/archive/2020_03_3.tar.gz
tar zxf 2020_03_3.tar.gz
mv rdkit-2020_03_3 $RDBASE
mkdir $RDBASE/build
cd $RDBASE/build
cmake -DPYTHON_EXECUTABLE=/usr/bin/python3 ..
make -j $(nproc)
make install
ln -s /usr/local/share/rdkit/rdkit /usr/local/lib/python3.6/dist-packages/
# Install OpenBabel.
apt-get -qq -y update
apt-get -qq install -y --no-install-recommends openbabel python-openbabel
# Install Mordred Molecular Descriptor Calculator.
python3 -m pip --no-cache-dir install mordred
# Cleanup
rm -rf /tmp/rdkit
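After the installs finish, a quick way to see which Python 3 packages are visible is to look up each module without importing it. The sketch below is an illustration rather than part of the original instructions; the module list is an assumption derived from the packages named above (the import name for scikit-learn is sklearn), and the OpenBabel bindings are left out because the python-openbabel apt package follows the Python 2 naming convention.

```python
import importlib.util

# Import names for the Python 3 packages installed above.
MODULES = (
    "numpy", "pandas", "h5py", "pyarrow", "sklearn", "statsmodels",
    "matplotlib", "seaborn", "plotly", "tensorflow", "rdkit", "mordred",
)


def check_modules(names=MODULES):
    """Return {module_name: bool} telling whether each module can be located.

    find_spec only locates the module; it does not import it, so this stays
    fast even for large packages such as tensorflow.
    """
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, found in check_modules().items():
        print(f"{name:12s} {'ok' if found else 'MISSING'}")
```

Saved under a hypothetical name such as check_imports.py, it can be run inside the sandbox shell with python3 check_imports.py. A module reported as found may still fail at import time if a shared library is missing, so this is a first check only.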
Convert a Writable Sandbox to a Read Only Compressed Container
Once you are satisfied that all the required packages are installed, you can convert the writable sandbox to a read-only SquashFS image. SquashFS is a compressed read-only file system for Linux.
sudo singularity build container.sif container/
Install Kernel Specs for Jupyter Notebook for R
Kernel specs are installed from outside the container in the host's home environment.
singularity exec container.sif R --quiet --slave -e 'IRkernel::installspec()'
NOTE: You only have to install the kernelspec once per host.
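To confirm that the kernelspec landed in the host's home directory, the per-user kernel directory can be listed. This is a minimal sketch that assumes Jupyter's default Linux location, ~/.local/share/jupyter/kernels; a JUPYTER_DATA_DIR setting or a prefix install would change that path, and the function name is hypothetical.

```python
import json
from pathlib import Path

# Default per-user kernel directory on Linux (an assumption; see lead-in).
DEFAULT_KERNEL_DIR = Path.home() / ".local" / "share" / "jupyter" / "kernels"


def list_kernelspecs(kernel_dir=DEFAULT_KERNEL_DIR):
    """Return {kernel_name: display_name} for each kernel.json found."""
    kernel_dir = Path(kernel_dir)
    specs = {}
    if not kernel_dir.is_dir():
        return specs
    for spec_file in sorted(kernel_dir.glob("*/kernel.json")):
        data = json.loads(spec_file.read_text())
        # The directory name is the kernel name Jupyter uses.
        specs[spec_file.parent.name] = data.get("display_name", "")
    return specs


if __name__ == "__main__":
    found = list_kernelspecs()
    if not found:
        print("No user kernelspecs found in", DEFAULT_KERNEL_DIR)
    for name, display in found.items():
        print(f"{name}: {display}")
```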
Test Script(s)
Tensorflow GPU
import tensorflow as tf
tf.debugging.set_log_device_placement(True)
gpus = tf.config.list_physical_devices('GPU')
if gpus:
    with tf.device('/GPU:0'):
        tf.random.set_seed(123)
        a = tf.random.normal([10000,20000], 0, 1, tf.float32, seed=1)
        b = tf.random.normal([20000,10000], 0, 1, tf.float32, seed=1)
        c = tf.matmul(a, b)
    print(c)
else:
    print("No GPUs found.")
print("Num GPUs:", len(gpus))
To execute the script (saved as tf_gpu.py):
singularity exec --nv container.sif python3 tf_gpu.py
To monitor NVIDIA GPU usage:
nvidia-smi
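For a scriptable view of GPU usage while the test runs, nvidia-smi can print selected fields as CSV. The sketch below is an illustration, not part of the original article; it assumes nvidia-smi is on the host PATH, and the choice of fields and the dictionary keys are assumptions. Values are kept as strings because some GPUs report unsupported fields as text.

```python
import shutil
import subprocess

# Query a few per-GPU fields as headerless, unit-free CSV.
QUERY = [
    "nvidia-smi",
    "--query-gpu=name,utilization.gpu,memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_gpu_csv(text):
    """Parse the CSV rows printed by QUERY into a list of dicts."""
    gpus = []
    for line in text.strip().splitlines():
        fields = [field.strip() for field in line.split(",")]
        if len(fields) != 4:
            continue  # skip lines that do not match the expected layout
        name, util, used, total = fields
        gpus.append({
            "name": name,
            "util_percent": util,
            "memory_used_mib": used,
            "memory_total_mib": total,
        })
    return gpus


if __name__ == "__main__":
    if shutil.which("nvidia-smi") is None:
        print("nvidia-smi not found on PATH.")
    else:
        done = subprocess.run(QUERY, capture_output=True, text=True)
        for gpu in parse_gpu_csv(done.stdout):
            print(gpu)
```

Running it in a second terminal while tf_gpu.py executes should show the utilization and memory figures rise for the GPU in use.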