
Monotonicity constraints in machine learning


In practical machine learning and data science tasks, an ML model is often used to quantify a global, semantically meaningful relationship between two or more values. For example, a hotel chain might want to use ML to optimize its pricing strategy and use a model to estimate the likelihood of a room being booked at a given price and day of the week. For a relationship like this, the assumption is that, all other things being equal, a cheaper price is preferred by users, so demand is higher at a lower price. However, upon building the model, the data scientist may discover that it behaves unexpectedly: for example, the model predicts that on Tuesdays clients would rather pay $110 than $100 for a room! The reason is that while there is an expected monotonic relationship between price and the likelihood of booking, the model is unable to (fully) capture it, due to noise and confounds in the data.

Too often, such constraints are ignored by practitioners, especially when non-linear models such as random forests, gradient boosted trees or neural networks are used. And while monotonicity constraints have been a topic of academic research for a long time (see, for example, this survey paper on monotonicity constraints for tree-based methods), library support has been lacking, making the problem hard to tackle for practitioners.

Luckily, in recent years there has been a lot of progress in various ML libraries to allow setting monotonicity constraints on models, including in LightGBM and XGBoost, two of the most popular libraries for gradient boosted trees. Monotonicity constraints have also been built into TensorFlow Lattice, a library that implements a novel method for creating interpolated lookup tables.

Monotonicity constraints in LightGBM and XGBoost

For tree-based methods (decision trees, random forests, gradient boosted trees), monotonicity can be enforced during the model learning phase by not creating splits on monotonic features that would break the monotonicity constraint.
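To make the idea concrete, here is a simplified sketch of how a candidate split could be checked against an increasing or decreasing constraint. This is only an illustration of the principle, not LightGBM's actual implementation (which additionally propagates bounds down to descendant nodes):

import numpy as np

def split_respects_monotonicity(y_left, y_right, constraint):
    # y_left / y_right: target values falling into the left (lower feature
    # values) and right (higher feature values) child of a candidate split.
    # constraint: +1 for increasing, -1 for decreasing, 0 for unconstrained.
    left_value, right_value = np.mean(y_left), np.mean(y_right)
    if constraint == 1:
        return left_value <= right_value   # left child must not predict more
    if constraint == -1:
        return left_value >= right_value   # left child must not predict less
    return True                            # unconstrained feature: any split is allowed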

In the following example, let’s train two models using LightGBM on a toy dataset where we know the relationship between X and Y to be monotonic (but noisy) and compare the default and the monotone model.

import numpy as np

size = 100
x = np.linspace(0, 10, size)
# Noisy but monotonically increasing relationship between x and y
y = x**2 + 10 - (20 * np.random.random(size))

[Figure: scatter plot of the toy data, x against the noisy y]

Let’s fit a gradient boosted model on this data, setting min_child_samples to 5.

import lightgbm as lgb
overfit_model = lgb.LGBMRegressor(silent=False, min_child_samples=5)
overfit_model.fit(x.reshape(-1,1), y)

# Predictions from the model on the same (training) inputs
prediction = overfit_model.predict(x.reshape(-1,1))

The model will slightly overfit (due to the small min_child_samples), which we can see by plotting the values of X against the predicted values of Y: the red line is not monotonic as we’d like it to be.
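The plotting code is not shown in the original, but the figure can be reproduced with something along these lines (a matplotlib sketch, assuming the x, y and prediction arrays from above):

import matplotlib.pyplot as plt

plt.scatter(x, y, s=10, label="noisy training data")            # the toy data points
plt.plot(x, prediction, color="red", label="default model")     # non-monotonic fit
plt.legend()
plt.show()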

[Figure: toy data with the default model’s predictions in red; the fitted line is not monotonic]

Since we know that the relationship between X and Y should be monotonic, we can set this constraint when specifying the model.

monotone_model = lgb.LGBMRegressor(min_child_samples=5, 
                                   monotone_constraints="1")
monotone_model.fit(x.reshape(-1,1), y)

The parameter monotone_constraints="1" states that the output should be monotonically increasing with respect to the first feature (which in our case happens to be the only feature). After training the monotone model, we can see that the predicted relationship is now monotone.

[Figure: toy data with the monotone model’s predictions; the fitted line never decreases]
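Besides eyeballing the plot, we can check this programmatically (a small addition: since x is sorted, consecutive predictions must be non-decreasing):

pred_monotone = monotone_model.predict(x.reshape(-1,1))
# Every consecutive difference should be >= 0 for a monotonically increasing fit
print(np.all(np.diff(pred_monotone) >= 0))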

And if we check the model performance, we can see that the monotonicity constraint not only provides a more natural fit, but the model also generalizes better (as expected). Measuring the mean squared error on new test data, we see that the error is smaller for the monotone model.

from sklearn.metrics import mean_squared_error as mse

# A large, fresh test set drawn from the same underlying relationship
size = 1000000
x = np.linspace(0, 10, size)
y = x**2 - 10 + (20 * np.random.random(size))

print("Default model mse", mse(y, overfit_model.predict(x.reshape(-1,1))))
print("Monotone model mse", mse(y, monotone_model.predict(x.reshape(-1,1))))

Default model mse 37.61501106522855

Monotone model mse 32.283051723268265
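XGBoost supports the same kind of constraint through its monotone_constraints parameter. A minimal sketch of an equivalent model (the parenthesized string format follows the XGBoost documentation, with one value per feature, and may differ between versions; x_train and y_train stand for the 100-point training arrays generated at the start of the example):

import xgboost as xgb

# "(1)" means increasing in the single feature; e.g. "(1,-1,0)" for three features
xgb_monotone_model = xgb.XGBRegressor(monotone_constraints="(1)")
xgb_monotone_model.fit(x_train.reshape(-1,1), y_train)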

Other methods for enforcing monotonicity

Tree-based methods are not the only option for enforcing monotonicity constraints. One recent development in the field is TensorFlow Lattice, which implements lattice-based models: essentially interpolated look-up tables that can approximate arbitrary input-output relationships in the data and that can optionally be constrained to be monotonic. There is a thorough tutorial on it in the TensorFlow Lattice GitHub repository.
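As a rough illustration, here is a minimal single-feature sketch, assuming the Keras layer API described in the TensorFlow Lattice documentation (a full model would typically feed several calibrated features into a tfl.layers.Lattice layer; x_train and y_train again stand for the toy training arrays from above):

import numpy as np
import tensorflow as tf
import tensorflow_lattice as tfl

model = tf.keras.Sequential([
    # Piecewise-linear calibration of the single input, constrained to be increasing
    tfl.layers.PWLCalibration(
        input_keypoints=np.linspace(0.0, 10.0, num=20),
        monotonicity="increasing"),
])
model.compile(loss="mse", optimizer=tf.keras.optimizers.Adam(0.01))
model.fit(x_train.reshape(-1, 1), y_train.reshape(-1, 1), epochs=100, verbose=0)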

If a curve (a set of data points) is already given, a monotonic spline can be fit to it, for example using the splinefun function in R.
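A comparable option in Python is scikit-learn's isotonic regression, a different but related technique that fits a free-form non-decreasing step function to the data. A minimal sketch on the toy data above:

from sklearn.isotonic import IsotonicRegression

# Fit a non-decreasing function of x to y; "clip" keeps predictions
# outside the training range at the boundary values
iso = IsotonicRegression(increasing=True, out_of_bounds="clip")
y_fit = iso.fit_transform(x, y)   # monotone fit evaluated at the training points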

