Wei Hao Khoong

M.Sc. Statistics,
B.Sc. (Hons) Applied Mathematics,
National University of Singapore


DEBoost

A Python Library for Weighted Distance Ensembling in Machine Learning

Installation

Requirements

$ pip install -r requirements.txt

DEBoost from PyPI

$ pip install deboost

Usage & Examples

Regression with Default Models for Ensemble

By default, DEBoostRegressor uses the parameters method='regression', mode='mean' and sdhw=True. In other words, it ensembles the predictions using the mean and assigns higher weights to model predictions that have a smaller spatial/statistical distance to all other model predictions. DEBoostClassifier behaves similarly, with method='classification'. Calling DEBoostRegressor() is therefore equivalent to calling DEBoostRegressor(method='regression', mode='mean', sdhw=True). An example is shown below.

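from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split
from deboost import DEBoostRegressor

# load_boston was removed in scikit-learn 1.2, so the diabetes dataset is used here instead.
X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
rgr = DEBoostRegressor()  # same as method='regression', mode='mean', sdhw=True
rgr.fit(X_train, y_train)
y_pred = rgr.predict(X_test)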
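To make the weighting idea concrete, here is a minimal NumPy sketch of one way a distance-weighted ensemble could be computed. It is an illustration under stated assumptions, not DEBoost's internal code: the function name distance_weighted_ensemble, the use of Euclidean distance between prediction vectors, and the inverse-distance weights are all hypothetical choices. Each model's weight depends on its total distance to the other models' predictions, and sdhw switches between favoring closer (sdhw=True) and farther (sdhw=False) predictions.

```python
import numpy as np


def distance_weighted_ensemble(preds, sdhw=True, eps=1e-12):
    """Hypothetical sketch of a distance-weighted ensemble (not DEBoost's code).

    preds: array-like of shape (n_models, n_samples), one row per model.
    sdhw=True  -> a smaller total distance to the other models gives a higher weight.
    sdhw=False -> a larger total distance to the other models gives a higher weight.
    Returns (combined_predictions, weights).
    """
    preds = np.asarray(preds, dtype=float)
    # Assumption: Euclidean distance between every pair of prediction vectors.
    diffs = preds[:, None, :] - preds[None, :, :]
    dist = np.sqrt((diffs ** 2).sum(axis=-1))
    # Total distance from each model to all other models.
    total = dist.sum(axis=1)
    if sdhw:
        raw = 1.0 / (total + eps)  # closer to the others -> larger weight
    else:
        raw = total + eps  # farther from the others -> larger weight
    weights = raw / raw.sum()  # normalize so the weights sum to 1
    # Weighted average of the model predictions, sample by sample.
    return weights @ preds, weights
```

For example, with predictions [[1, 1], [1, 1], [4, 4]] from three models, sdhw=True gives weights of about [0.4, 0.4, 0.2], so the outlying third model contributes least; sdhw=False gives about [0.25, 0.25, 0.5].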

Users can switch to other built-in distance metrics by changing mode='mean' to any of: 'median', 'dist_euclid', 'dist_cosine', 'dist_jaccard', 'dist_chebyshev', 'dist_correlation', 'dist_cityblock', 'dist_canberra', 'dist_braycurtis', 'dist_hamming', 'dist_battacharyya'. For details on these distance metrics, refer to the manuscript listed in the Reference section below. To use the second weighted-ensemble method, which assigns higher weights to model predictions with larger spatial/statistical distances, set sdhw=False instead.
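The distance-based modes compare the models' prediction vectors using a chosen metric. The sketch below assumes nothing about DEBoost's internals: it maps a few of the listed mode strings to plain NumPy formulas (the DISTANCES dictionary and the prediction_distance_matrix function are hypothetical) and builds the model-by-model distance matrix that a weighting step could use. In practice, scipy.spatial.distance provides the same metrics under names such as 'euclidean', 'cityblock', 'chebyshev', 'cosine' and 'braycurtis'.

```python
import numpy as np

# Hypothetical mapping from a few of DEBoost's mode strings to NumPy formulas.
# DEBoost's own implementation may compute these differently.
DISTANCES = {
    "dist_euclid": lambda u, v: np.sqrt(np.sum((u - v) ** 2)),
    "dist_cityblock": lambda u, v: np.sum(np.abs(u - v)),
    "dist_chebyshev": lambda u, v: np.max(np.abs(u - v)),
    "dist_cosine": lambda u, v: 1.0 - (u @ v) / (np.linalg.norm(u) * np.linalg.norm(v)),
    "dist_braycurtis": lambda u, v: np.sum(np.abs(u - v)) / np.sum(np.abs(u + v)),
}


def prediction_distance_matrix(preds, mode="dist_euclid"):
    """Return an (n_models, n_models) matrix of distances between prediction vectors."""
    if mode not in DISTANCES:
        raise ValueError(f"mode not supported in this sketch: {mode!r}")
    preds = np.asarray(preds, dtype=float)
    metric = DISTANCES[mode]
    n = len(preds)
    out = np.zeros((n, n))
    # The metrics are symmetric, so only the upper triangle is computed.
    for i in range(n):
        for j in range(i + 1, n):
            out[i, j] = out[j, i] = metric(preds[i], preds[j])
    return out
```

Summing each row of this matrix gives every model's total distance to the others, which is the quantity a weighting step would turn into ensemble weights.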

Using Custom Models Instead of Built-ins

The default regression models are Ridge, Lasso, Elastic Net, AdaBoost Regressor, Gradient Boosting Regressor, Random Forest Regressor, Support Vector Machine Regressor, LightGBM Regressor and XGBoost Regressor. For classification, the default models are AdaBoost Classifier, Gradient Boosting Classifier, Gaussian Naive Bayes, K-Nearest Neighbors Classifier, Logistic Regression, Random Forest Classifier, Support Vector Machine Classifier, Decision Tree Classifier, LightGBM Classifier and XGBoost Classifier. All of these models use their default parameters.

To use custom models, users must first ensure that each model provides at least a predict method, as Scikit-learn models do. Suppose the user wants to ensemble two regression models, Lasso and Ridge, each tuned with GridSearchCV from Scikit-learn. They can then add them to DEBoostRegressor as follows:

import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Ridge, Lasso
from sklearn.model_selection import GridSearchCV, KFold, train_test_split
from deboost import DEBoostRegressor

# load_boston was removed in scikit-learn 1.2, so the diabetes dataset is used here instead.
X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Two base models, each tuned by grid search over its regularization strength alpha.
model = Ridge()
model2 = Lasso()
cv = KFold(n_splits=5, shuffle=True, random_state=42)
grid = {'alpha': [0.0003, 0.001, 0.003, 0.01, 0.03]}
grid2 = {'alpha': np.linspace(0.0001, 0.1, 112)}
gs = GridSearchCV(model, grid, n_jobs=-1, cv=cv, verbose=0)
gs2 = GridSearchCV(model2, grid2, n_jobs=-1, cv=cv, verbose=0)

# Use the two tuned estimators instead of the built-in models.
rgr = DEBoostRegressor()
rgr.models = [gs, gs2]
rgr.fit(X_train, y_train)
y_pred = rgr.predict(X_test)

Alternatively, users can fit gs and gs2 first, add them to rgr.models, and then perform predictions.
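Because custom models only need to follow the Scikit-learn interface, a hand-written class can also serve as a model. The sketch below defines a hypothetical ConstantMeanRegressor with fit and predict methods, plus a hypothetical is_usable_model helper that checks for a callable predict attribute. Whether DEBoost performs a similar check internally is an assumption; the sketch only illustrates the interface requirement stated above.

```python
import numpy as np


class ConstantMeanRegressor:
    """Hypothetical minimal custom model that always predicts the training-target mean.

    It follows the Scikit-learn convention: fit(X, y) returns self, and
    predict(X) returns one prediction per row of X.
    """

    def fit(self, X, y):
        self.mean_ = float(np.mean(y))
        return self

    def predict(self, X):
        return np.full(len(X), self.mean_)


def is_usable_model(model):
    """Hypothetical duck-typing check: the model must expose a callable predict."""
    return callable(getattr(model, "predict", None))
```

A pre-fitted instance such as ConstantMeanRegressor().fit(X_train, y_train) could then be placed in rgr.models alongside gs and gs2, following the alternative just described.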

License

MIT License

Copyright (c) 2020 Khoong Wei Hao

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

Reference

The preprint can be found at https://www.preprints.org/manuscript/202005.0354/v1.
