README.md

Chainlearn

Mini module with some syntax sugar utilities for pandas and sklearn. It basically lets you turn this:

import seaborn as sns
from matplotlib import pyplot as plt
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE 
from sklearn.cluster import KMeans
 
iris = sns.load_dataset('iris').drop('species', axis=1)
 
pca = PCA(n_components=3)
tsne = TSNE(n_components=2)

kmeans = KMeans(n_clusters=2)

cluster_labels = kmeans.fit_predict(iris)

transformed = tsne.fit_transform(pca.fit_transform(iris))

plt.scatter(transformed[:, 0], transformed[:, 1], c=cluster_labels)

Into a chainlearn pipeline that tries to look like a "tidyverse" version:

import seaborn as sns
from matplotlib import pyplot as plt  # needed for plt.get_cmap below
import chainlearn

iris = sns.load_dataset('iris')

(iris
 .drop('species', axis=1)
 .PCA(n_components=3)
 .TSNE(n_components=2)
 .assign(
     cluster=lambda df: df.KMeans(n_clusters=2)
 )
 .plot
 .scatter(
     x=0,
     y=1,
     c='cluster',
     cmap=plt.get_cmap('viridis')
 )
);

This is achieved by attaching some sklearn model and preprocessing classes to the pandas DataFrame and Series classes, and trying to guess what methods should be called.
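To give a rough idea of the mechanism, here is a minimal sketch of how a class could be attached to DataFrame as a method. This is not chainlearn's actual code: attach_transformer is a hypothetical helper, and it only handles the fit_transform case, whereas the real module has to guess between several methods.

```python
import pandas as pd
from sklearn.decomposition import PCA


def attach_transformer(cls):
    """Hypothetical helper: expose `cls` as a DataFrame method named after the class."""

    def method(self, **params):
        model = cls(**params)
        # Simplification: always call fit_transform. A fuller version would
        # have to pick among fit_predict, predict, transform, and so on.
        result = model.fit_transform(self)
        # Wrap the array back into a DataFrame so the chain can continue.
        return pd.DataFrame(result, index=self.index)

    setattr(pd.DataFrame, cls.__name__, method)


attach_transformer(PCA)

df = pd.DataFrame({
    'a': [1.0, 2.0, 3.0, 4.0],
    'b': [2.0, 1.0, 4.0, 3.0],
    'c': [0.5, 0.1, 0.9, 0.3],
})
reduced = df.PCA(n_components=2)  # columns are named 0 and 1
```

The integer column names are why the examples here refer to columns as x=0, y=1.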

You can also do supervised learning, regression, and so on:

(iris
 .assign(
     species=lambda df: df['species'].LabelEncoder()
 )
 .RandomForestClassifier(
     n_estimators=100,
     target='species'
 )
 .rename(columns={0: 'label'})
 .plot
 .hist()
)

Check out the examples notebook...

Other stuff you can do

Additionally, there are a couple of methods you can call to shorten some tasks.

Explain

Calling explain at the end of your chainlearn pipeline returns whatever the model offers to explain itself. For linear models this is the coefficients, while for ensemble models it is the feature importances (in sklearn computed as mean decrease in impurity for most models).

(iris
 .assign(
     species=lambda df: df['species'].LabelEncoder()
 )
 .Lasso(alpha=0.01, target='species')
 .explain()
 .plot
 .bar()
);
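For reference, this is roughly what those two kinds of explanation look like in plain sklearn. It is only a sketch of the underlying attributes (coef_ and feature_importances_), not of how explain is implemented; the use of load_iris and the random_state value are assumptions made to keep it self-contained.

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import Lasso

data = load_iris(as_frame=True)
X, y = data.data, data.target

# Linear model: one coefficient per feature.
lasso = Lasso(alpha=0.01).fit(X, y)
coefs = pd.Series(lasso.coef_, index=X.columns)

# Ensemble model: impurity-based feature importances, normalized to sum to 1.
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
importances = pd.Series(forest.feature_importances_, index=X.columns)
```

Either Series can then be plotted with .plot.bar(), just like the chained version.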

I may add some SHAP value calculations in the near future.

Cross-validate

There is also a cross_validate method that performs cross-validation and returns the scores.

(iris
 .assign(
     species=lambda df: df['species'].LabelEncoder()
 )
 .RandomForestClassifier(
     n_estimators=100,
     target='species'
 )
 .cross_validate(folds=5, scoring='f1_macro')
 .plot
 .hist()
);
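A plain sklearn counterpart could look like the sketch below. It assumes that folds=5 corresponds to sklearn's cv=5, which is a guess based on the parameter name; load_iris and random_state are also only there to make the snippet self-contained.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)

model = RandomForestClassifier(n_estimators=100, random_state=0)

# One macro-averaged F1 score per fold (5 folds, assumed to match folds=5).
scores = cross_val_score(model, X, y, cv=5, scoring='f1_macro')
```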

Attaching your own models

If you have your own module with models that follow the sklearn API (i.e. have fit and/or fit_predict, fit_transform, transform, predict methods) you can attach them to DataFrames and Series:

import mymodels # Contains a MyModel class with a fit_transform method
from chainlearn import attach
attach(mymodels)

(iris
 .MyModel(params=params)
 .plot
 .scatter(x=0, y=1)
);
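As an illustration of what such a module might contain, here is a sketch of a MyModel class that follows the sklearn conventions. The class body is entirely hypothetical (a simple column standardizer with a made-up with_std parameter); the only thing that matters is that it has fit, transform and fit_transform methods.

```python
import numpy as np


class MyModel:
    """Hypothetical transformer: centers each column and scales it to unit variance."""

    def __init__(self, with_std=True):
        self.with_std = with_std

    def fit(self, X, y=None):
        X = np.asarray(X, dtype=float)
        self.mean_ = X.mean(axis=0)
        std = X.std(axis=0)
        # Leave constant columns unscaled to avoid dividing by zero.
        self.scale_ = np.where(std > 0, std, 1.0) if self.with_std else np.ones(X.shape[1])
        return self

    def transform(self, X):
        return (np.asarray(X, dtype=float) - self.mean_) / self.scale_

    def fit_transform(self, X, y=None):
        return self.fit(X, y).transform(X)


out = MyModel().fit_transform([[1.0, 2.0], [3.0, 6.0]])
```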

Install

Run pip install chainlearn, or install locally by cloning the repo, changing to the repo directory, and running pip install -e .

GitHub - dimenwarper/chainlearn: Mini module with syntax sugar for pandas/sklear...
