

GitHub - nok/sklearn-porter: Transpile trained scikit-learn estimators to C, Jav...
source link: https://github.com/nok/sklearn-porter
sklearn-porter
Transpile trained scikit-learn estimators to C, Java, JavaScript and others.
It's recommended for limited embedded systems and critical applications where performance matters most.
Machine learning algorithms
Programming languages: Java *, JS, C, Go, PHP, Ruby

Classifier
svm.SVC: ✓, ✓ ᴵ ✓ ✓ ✓ ✓
svm.NuSVC: ✓, ✓ ᴵ ✓ ✓ ✓ ✓
svm.LinearSVC: ✓, ✓ ᴵ ✓ ✓ ✓ ✓ ✓
tree.DecisionTreeClassifier: ✓, ✓ ᴱ, ✓ ᴵ ✓, ✓ ᴱ ✓, ✓ ᴱ ✓, ✓ ᴱ ✓, ✓ ᴱ ✓, ✓ ᴱ
ensemble.RandomForestClassifier: ✓ ᴱ, ✓ ᴵ ✓ ᴱ ✓ ᴱ ✓ ᴱ ✓ ᴱ ✓ ᴱ
ensemble.ExtraTreesClassifier: ✓ ᴱ, ✓ ᴵ ✓ ᴱ ✓ ᴱ ✓ ᴱ ✓ ᴱ
ensemble.AdaBoostClassifier: ✓ ᴱ, ✓ ᴵ ✓ ᴱ, ✓ ᴵ ✓ ᴱ
neighbors.KNeighborsClassifier: ✓, ✓ ᴵ ✓, ✓ ᴵ
naive_bayes.GaussianNB: ✓, ✓ ᴵ ✓
naive_bayes.BernoulliNB: ✓, ✓ ᴵ ✓
neural_network.MLPClassifier: ✓, ✓ ᴵ ✓, ✓ ᴵ

Regressor
neural_network.MLPRegressor: ✓

✓ = is full-featured, ᴱ = with embedded model data, ᴵ = with imported model data, * = default language
Installation
$ pip install sklearn-porter
If you want the latest changes, you can install the module from the master branch:
$ pip uninstall -y sklearn-porter
$ pip install --no-cache-dir https://github.com/nok/sklearn-porter/zipball/master
Usage
Export
The following example demonstrates how you can transpile a decision tree estimator to Java:
from sklearn import tree
from sklearn.datasets import load_iris
from sklearn_porter import Porter

# load data and train the classifier:
samples = load_iris()
X, y = samples.data, samples.target
clf = tree.DecisionTreeClassifier()
clf.fit(X, y)

# export the estimator as Java source code:
porter = Porter(clf, language='java')
output = porter.export(embed_data=True)
print(output)
The exported result matches the official human-readable version of the decision tree.
Integrity
You should always check and compute the integrity between the original and the transpiled estimator:
# ...
porter = Porter(clf, language='java')

# accuracy:
integrity = porter.integrity_score(X)
print(integrity)  # 1.0
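As a rough illustration of what such a score expresses, the sketch below computes the fraction of samples on which two lists of predictions agree. It assumes that the integrity score is this kind of agreement ratio, which the 1.0 in the example suggests but the text does not state. `agreement_ratio` and the sample labels are hypothetical and not part of sklearn-porter.

```python
def agreement_ratio(original, transpiled):
    """Return the share of samples with identical predictions (0.0 to 1.0)."""
    if len(original) != len(transpiled):
        raise ValueError("both prediction lists must have the same length")
    if len(original) == 0:
        raise ValueError("at least one prediction is required")
    # Count the positions where both estimators predict the same label.
    matches = sum(1 for a, b in zip(original, transpiled) if a == b)
    return matches / len(original)


# Hypothetical class labels from the Python estimator and its transpiled version.
y_python = [0, 1, 2, 2, 1]
y_ported = [0, 1, 2, 1, 1]
print(agreement_ratio(y_python, y_ported))  # 4 of 5 labels match
```

A ratio below 1.0 would mean that the transpiled estimator disagrees with the original on at least one sample.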
Prediction
You can compute the prediction(s) in the target programming language:
# ...
porter = Porter(clf, language='java')

# prediction(s):
Y_java = porter.predict(X)
y_java = porter.predict(X[0])
y_java = porter.predict([1., 2., 3., 4.])
Notebooks
You can run and test all notebooks by starting a Jupyter notebook server locally:
$ make open.examples
$ make stop.examples
Command-line interface
The porter can also be used on the command line, either with python -m sklearn_porter [-h] or, after installing an executable, with porter [-h] directly:
$ make link
$ porter [-h] --input <PICKLE_FILE> [--output <DEST_DIR>] \
[--class_name <CLASS_NAME>] [--method_name <METHOD_NAME>] \
[--c] [--java] [--js] [--go] [--php] [--ruby] \
[--export] [--checksum] [--data] [--pipe]
The following example shows how you can save a trained estimator to the pickle format:
import joblib

# ...
# save the trained estimator:
joblib.dump(clf, 'estimator.pkl', compress=0)
After that the estimator can be transpiled to JavaScript by using the following command:
$ porter -i estimator.pkl --js
The target programming language is changeable on the fly:
$ porter -i estimator.pkl --c
$ porter -i estimator.pkl --java
$ porter -i estimator.pkl --php
$ porter -i estimator.pkl --go
$ porter -i estimator.pkl --ruby
For further processing, the argument --pipe can be used to pass the result on:
$ porter -i estimator.pkl --js --pipe > estimator.js
For instance the result can be minified by using UglifyJS:
$ porter -i estimator.pkl --js --pipe | uglifyjs --compress -o estimator.min.js
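The same pipeline can be driven from a Python script. The following is a minimal sketch, assuming that `--input` is the long form of `-i` and that `--pipe` writes the transpiled source to standard output, as the shell redirect above suggests. `build_porter_command` and `transpile_to_file` are hypothetical helper names, not part of sklearn-porter.

```python
import subprocess

# Language flags taken from the usage synopsis of the command-line interface.
SUPPORTED_FLAGS = {"c", "java", "js", "go", "php", "ruby"}


def build_porter_command(pickle_file, language="js", pipe=True):
    """Assemble the argument list for the `porter` executable."""
    if language not in SUPPORTED_FLAGS:
        raise ValueError(f"unsupported language: {language}")
    command = ["porter", "--input", pickle_file, f"--{language}"]
    if pipe:
        command.append("--pipe")
    return command


def transpile_to_file(pickle_file, destination, language="js"):
    """Run porter and write the piped source code to `destination`.

    Assumes the `porter` executable is installed and on the PATH.
    """
    result = subprocess.run(
        build_porter_command(pickle_file, language),
        check=True, capture_output=True, text=True,
    )
    with open(destination, "w", encoding="utf-8") as handle:
        handle.write(result.stdout)


print(build_porter_command("estimator.pkl"))
# ['porter', '--input', 'estimator.pkl', '--js', '--pipe']
```

Passing the arguments as a list avoids shell quoting issues with file names that contain spaces.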
Development
Environment
You have to install required modules for broader development:
$ make install.environment  # conda environment (optional)
$ make install.requirements.development  # pip requirements
Independently, the following compilers and interpreters are required to cover all tests:
Name      Version    Command
GCC       >=4.2      gcc --version
Java      >=1.6      java -version
PHP       >=5.6      php --version
Ruby      >=2.4.1    ruby --version
Go        >=1.7.4    go version
Node.js   >=6        node --version
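Before running the full test suite, it can help to check which of these tools are installed at all. The sketch below only looks up each executable on the PATH and does not compare version numbers; `find_missing_tools` is a hypothetical helper and not part of the project's Makefile.

```python
import shutil

# Version commands as listed above, keyed by tool name.
VERSION_COMMANDS = {
    "GCC": ["gcc", "--version"],
    "Java": ["java", "-version"],
    "PHP": ["php", "--version"],
    "Ruby": ["ruby", "--version"],
    "Go": ["go", "version"],
    "Node.js": ["node", "--version"],
}


def find_missing_tools(commands=VERSION_COMMANDS):
    """Return the names of tools whose executable is not on the PATH."""
    return [
        name for name, argv in commands.items()
        if shutil.which(argv[0]) is None
    ]


missing = find_missing_tools()
print("missing:", ", ".join(missing) if missing else "none")
```

Tests for a language whose compiler or interpreter is reported as missing cannot be expected to pass.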
Testing
The tests cover module functions as well as matching predictions of transpiled estimators. Start all tests with:
$ make test
The test files follow the naming pattern '[Algorithm][Language]Test.py', so a subset of tests can be selected by file name:
$ pytest tests -v -o python_files='RandomForest*Test.py'
$ pytest tests -v -o python_files='*JavaTest.py'
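To see which files such a pattern would select, the shell-style matching can be tried out with the standard library. This is only a sketch: the file names below are hypothetical examples of the naming pattern, not a listing of the actual tests directory, and `select` is an illustrative helper.

```python
from fnmatch import fnmatchcase

# Hypothetical file names that follow '[Algorithm][Language]Test.py'.
test_files = [
    "RandomForestClassifierJavaTest.py",
    "RandomForestClassifierJSTest.py",
    "DecisionTreeClassifierJavaTest.py",
    "SVCRubyTest.py",
]


def select(pattern, names=test_files):
    """Return the names matched by a case-sensitive shell-style pattern."""
    return [name for name in names if fnmatchcase(name, pattern)]


print(select("RandomForest*Test.py"))  # both RandomForest files
print(select("*JavaTest.py"))          # both Java files
```

The first pattern picks all languages for one algorithm, the second picks all algorithms for one language.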
While you are developing new features or fixes, you can reduce the test duration by changing the number of tests:
$ N_RANDOM_FEATURE_SETS=5 N_EXISTING_FEATURE_SETS=10 \
pytest tests -v -o python_files='*JavaTest.py'
Quality
Ensuring code quality is highly recommended; Pylint is used for that. Start the linter with:
$ make lint
Citation
If you use this implementation in your work, please add a reference/citation to the paper. You can use the following BibTeX entry:
@unpublished{skpodamo,
author = {Darius Morawiec},
title = {sklearn-porter},
note = {Transpile trained scikit-learn estimators to C, Java, JavaScript and others},
url = {https://github.com/nok/sklearn-porter}
}
License
The module is Open Source Software released under the MIT license.
Questions?
Don't be shy and feel free to contact me on Twitter or Gitter.