34

GitHub - IntelLabs/hpat

 6 years ago
source link: https://github.com/IntelLabs/hpat
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.

Intel® Scalable Dataframe Compiler

Numba* Extension For Pandas* Operations Compilation

Intel® Scalable Dataframe Compiler (Intel® SDC) is an extension of Numba* that enables compilation of Pandas* operations. It automatically vectorizes and parallelizes the code by leveraging modern hardware instructions and by utilizing all available cores.

Intel® SDC documentation can be found here.

For maximum performance and stability, please use numba from intel/label/beta channel.

Installing Binary Packages (conda and wheel)

Intel® SDC is available on the Anaconda Cloud intel/label/beta channel. Distribution includes Intel® SDC for Python 3.6 and Python 3.7 for Windows and Linux platforms.

Intel® SDC conda package can be installed using the steps below:

> conda create -n sdc-env python=<3.7 or 3.6> -c anaconda -c conda-forge
> conda activate sdc-env
> conda install sdc -c intel/label/beta -c intel -c defaults -c conda-forge --override-channels

Intel® SDC wheel package can be installed using the steps below:

> conda create -n sdc-env python=<3.7 or 3.6> pip -c anaconda -c conda-forge
> conda activate sdc-env
> pip install --index-url https://pypi.anaconda.org/intel/label/beta/simple --extra-index-url https://pypi.anaconda.org/intel/simple --extra-index-url https://pypi.org/simple sdc

Building Intel® SDC from Source on Linux

We use Anaconda distribution of Python for setting up Intel® SDC build environment.

If you do not have conda, we recommend using Miniconda3:

wget https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh -O miniconda.sh
chmod +x miniconda.sh
./miniconda.sh -b
export PATH=$HOME/miniconda3/bin:$PATH

For maximum performance and stability, please use numba from intel/label/beta channel.

It is possible to build Intel® SDC via conda-build or setuptools. Follow one of the cases below to install Intel® SDC and its dependencies on Linux.

Building on Linux with conda-build

PYVER=<3.6 or 3.7>
NUMPYVER=<1.16 or 1.17>
conda create -n conda-build-env python=$PYVER conda-build
source activate conda-build-env
git clone https://github.com/IntelPython/sdc.git
cd sdc
conda build --python $PYVER --numpy $NUMPYVER --output-folder=<output_folder> -c intel/label/beta -c defaults -c intel -c conda-forge --override-channels conda-recipe

Building on Linux with setuptools

export PYVER=<3.6 or 3.7>
export NUMPYVER=<1.16 or 1.17>
conda create -n sdc-env -q -y -c intel/label/beta -c defaults -c intel -c conda-forge python=$PYVER numpy=$NUMPYVER tbb-devel tbb4py numba=0.54.1 pandas=1.3.4 pyarrow=4.0.1 gcc_linux-64 gxx_linux-64
source activate sdc-env
git clone https://github.com/IntelPython/sdc.git
cd sdc
python setup.py install

In case of issues, reinstalling in a new conda environment is recommended.

Building Intel® SDC from Source on Windows

Building Intel® SDC on Windows requires Build Tools for Visual Studio 2019 (with component MSVC v140 - VS 2015 C++ build tools (v14.00)):

It is possible to build Intel® SDC via conda-build or setuptools. Follow one of the cases below to install Intel® SDC and its dependencies on Windows.

Building on Windows with conda-build

set PYVER=<3.6 or 3.7>
set NUMPYVER=<1.16 or 1.17>
conda create -n conda-build-env -q -y python=%PYVER% conda-build conda-verify vc vs2015_runtime vs2015_win-64
conda activate conda-build-env
git clone https://github.com/IntelPython/sdc.git
cd sdc
conda build --python %PYVER% --numpy %NUMPYVER% --output-folder=<output_folder> -c intel/label/beta -c defaults -c intel -c conda-forge --override-channels conda-recipe

Building on Windows with setuptools

set PYVER=<3.6 or 3.7>
set NUMPYVER=<1.16 or 1.17>
conda create -n sdc-env -c intel/label/beta -c defaults -c intel -c conda-forge python=%PYVER% numpy=%NUMPYVER% tbb-devel tbb4py numba=0.54.1 pandas=1.3.4 pyarrow=4.0.1
conda activate sdc-env
set INCLUDE=%INCLUDE%;%CONDA_PREFIX%\Library\include
set LIB=%LIB%;%CONDA_PREFIX%\Library\lib
git clone https://github.com/IntelPython/sdc.git
cd sdc
python setup.py install

Troubleshooting Windows Build

  • If the cl compiler throws the error fatal error LNK1158: cannot run 'rc.exe', add Windows Kits to your PATH (e.g. C:\Program Files (x86)\Windows Kits\8.0\bin\x86).
  • Some errors can be mitigated by set DISTUTILS_USE_SDK=1.
  • For setting up Visual Studio, one might need go to registry at HKEY_LOCAL_MACHINE\SOFTWARE\WOW6432Node\Microsoft\VisualStudio\SxS\VS7, and add a string value named 14.0 whose data is C:\Program Files (x86)\Microsoft Visual Studio 14.0\.
  • Sometimes if the conda version or visual studio version being used are not latest then building Intel® SDC can throw some vague error about a keyword used in a file. So make sure you are using the latest versions.

Building documentation

Building Intel® SDC User's Guide documentation requires pre-installed Intel® SDC package along with compatible Pandas* version as well as Sphinx* 2.2.1 or later.

Intel® SDC documentation includes Intel® SDC examples output which is pasted to functions description in the API Reference.

Use pip to install Sphinx* and extensions:

pip install sphinx sphinxcontrib-programoutput

Currently the build precedure is based on make located at ./sdc/docs/ folder. While it is not generally required we recommended that you clean up the system from previous documentaiton build by running:

make clean

To build HTML documentation you will need to run:

make html

The built documentation will be located in the ./sdc/docs/build/html directory. To preview the documentation open index.html file.

More information about building and adding documentation can be found here.

Running unit tests

python sdc/tests/gen_test_data.py
python -m unittest

References

Intel® SDC follows ideas and initial code base of High-Performance Analytics Toolkit (HPAT). These academic papers describe ideas and methods behind HPAT:


About Joyk


Aggregate valuable and interesting links.
Joyk means Joy of geeK