Bagging and Boosting Algorithms

Shaily Jain
May 11, 2021

We are about to discuss pure beauty, so stay with me for this.

Bagging algorithms:

  • Bagging meta-estimator
  • Random forest

Boosting algorithms:

  • AdaBoost
  • GBM(Gradient Boosting)
  • XGBoost
  • Light GBM
  • CatBoost

Bagging meta-estimator

  1. Random subsets are created from the original dataset (bootstrapping).
  2. Each subset includes all the features; only the observations are sampled.
  3. A user-specified base estimator is fitted on each of these smaller sets.
  4. Predictions from each model are combined to get the final result (see the sketch below).
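
Below is a minimal sketch of these four steps using scikit-learn's BaggingClassifier. The synthetic dataset, the hyperparameters, and the `estimator` argument name (called `base_estimator` in scikit-learn versions before 1.2) are my own assumptions for illustration.

```python
# Sketch of the bagging meta-estimator; dataset and settings are illustrative.
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Steps 1-3: bootstrap subsets of the rows, keep every feature,
# and fit a user-specified base estimator on each subset.
bagging = BaggingClassifier(
    estimator=DecisionTreeClassifier(),  # the user-specified base estimator
    n_estimators=50,                     # number of bootstrapped subsets/models
    bootstrap=True,                      # sample observations with replacement
    max_features=1.0,                    # every subset keeps all the features
    random_state=42,
)
bagging.fit(X_train, y_train)

# Step 4: the individual models' predictions are combined into the final result.
print("Test accuracy:", bagging.score(X_test, y_test))
```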

Random Forest

  1. Random subsets are created from the original dataset (bootstrapping).
  2. At each node in each decision tree, only a random subset of features is considered when deciding the best split.
  3. A decision tree model is fitted on each of the subsets.
  4. The final prediction is calculated by averaging the predictions from all decision trees (or by majority vote for classification), as in the sketch below.
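
A minimal sketch of these steps with scikit-learn's RandomForestRegressor; the synthetic regression data and the hyperparameters are assumptions for illustration.

```python
# Sketch of a random forest for regression; dataset and settings are illustrative.
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=1000, n_features=20, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

forest = RandomForestRegressor(
    n_estimators=200,     # number of bootstrapped decision trees (steps 1 and 3)
    max_features="sqrt",  # random subset of features considered at each split (step 2)
    random_state=0,
)
forest.fit(X_train, y_train)

# Step 4: the forest averages the predictions of its individual trees.
print("R^2 on test data:", forest.score(X_test, y_test))
```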

AdaBoost

  1. Initially, all observations in the dataset are given equal weights.
  2. A model is built on a subset of data.
  3. Using this model, predictions are made on the whole dataset.
  4. Errors are calculated by comparing the predictions and actual values.
  5. While creating the next model, higher weights are given to the data points which were predicted incorrectly.
  6. The weights are determined from the error value: the higher the error, the more weight is assigned to the observation.
  7. This process is repeated until the error stops improving or the maximum number of estimators is reached (see the sketch below).
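
A minimal sketch of AdaBoost with scikit-learn; the dataset, the depth-1 tree ("stump") used as the weak learner, and the hyperparameters are assumptions (in scikit-learn versions before 1.2 the weak learner is passed as `base_estimator` instead of `estimator`).

```python
# Sketch of AdaBoost; dataset and settings are illustrative.
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

ada = AdaBoostClassifier(
    estimator=DecisionTreeClassifier(max_depth=1),  # weak learner refit each round
    n_estimators=100,   # upper limit on the number of boosting rounds (step 7)
    learning_rate=0.5,  # shrinks each model's contribution
    random_state=1,
)
# fit() internally reweights the misclassified observations between rounds (steps 4-6).
ada.fit(X_train, y_train)

print("Test accuracy:", ada.score(X_test, y_test))
```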

Gradient Boosting

Gradient boosting has three elements:

  • A weak learner to make predictions.
    These are the base models, for example regression trees. They are typically kept weak by restricting the number of leaf nodes, splits, layers, or nodes.
  • A loss function to be optimized.
    It must be differentiable. For example, regression may use squared error and classification may use logarithmic loss.
  • An additive model that adds weak learners to minimize the loss function.
    We used gradient descent to find the coefficients of linear regression; here it serves the same purpose of minimizing the error after prediction, where the error is computed from y_pred (the combined prediction of the base models fitted so far) and y_actual.
    The only catch is that, since the base models are decision trees, kNN, and so on, we need to adapt gradient descent into functional gradient descent, which can solve the optimization problem even for non-linear relationships.
    We add one tree at a time, just as gradient descent gets one step closer to the optimum at each step (see the sketch after this list).
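
To make the three elements concrete, here is a from-scratch sketch of gradient boosting for regression with squared-error loss, where the negative gradient of the loss is simply the residual y_actual - y_pred; the synthetic data and hyperparameters are assumptions for illustration.

```python
# From-scratch sketch of gradient boosting for regression (squared-error loss).
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(500, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.2, size=500)

n_trees, learning_rate = 100, 0.1
prediction = np.full_like(y, y.mean())  # start from a constant model
trees = []

for _ in range(n_trees):
    # For squared error, the negative gradient of the loss with respect to the
    # current prediction is just the residual y_actual - y_pred.
    residuals = y - prediction
    # Weak learner: a shallow regression tree fitted to the negative gradient.
    tree = DecisionTreeRegressor(max_depth=2)
    tree.fit(X, residuals)
    # Additive model: one functional gradient-descent step per tree.
    prediction += learning_rate * tree.predict(X)
    trees.append(tree)

def predict(X_new):
    """Combine the constant start value with all boosted trees."""
    out = np.full(len(X_new), y.mean())
    for tree in trees:
        out += learning_rate * tree.predict(X_new)
    return out

print("Training MSE:", np.mean((y - predict(X)) ** 2))
```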

To learn more about Gradient Boosting, look out for my other article. Trust me, you won't regret learning this.

XGBoost

A regularized gradient boosting technique that reduces overfitting and speeds up training. It is very similar to GBM, but better. Refer to my full article on XGBoost and its math.
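
A minimal sketch with the xgboost library; the dataset and the regularization settings are assumptions, chosen to show the knobs XGBoost adds on top of plain GBM.

```python
# Sketch of XGBoost with explicit regularization; dataset and settings are illustrative.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=7)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=7)

model = XGBClassifier(
    n_estimators=300,
    learning_rate=0.1,
    max_depth=4,
    reg_lambda=1.0,   # L2 regularization on leaf weights
    reg_alpha=0.0,    # L1 regularization on leaf weights
    subsample=0.8,    # row subsampling, further reduces overfitting
)
model.fit(X_train, y_train)

print("Test accuracy:", model.score(X_test, y_test))
```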

Light GBM

It works like Light..FAST.

Others like XGBoost grow trees level-wise, while LightGBM grows them leaf-wise. The overfitting that can result is dealt with by restricting the maximum depth.
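
A minimal sketch with the lightgbm library; the dataset and hyperparameters are assumptions, chosen to show leaf-wise growth kept in check by a depth restriction.

```python
# Sketch of LightGBM; dataset and settings are illustrative.
from lightgbm import LGBMClassifier
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=5000, n_features=30, random_state=3)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=3)

model = LGBMClassifier(
    n_estimators=300,
    num_leaves=31,    # leaf-wise growth: caps the number of leaves per tree
    max_depth=6,      # depth restriction to limit the overfitting leaf-wise growth can cause
    learning_rate=0.1,
)
model.fit(X_train, y_train)

print("Test accuracy:", model.score(X_test, y_test))
```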

CATBoost

Mainly used when there are many categorical features with high cardinality, where one-hot encoding becomes impractical because it greatly increases the dimensionality and makes the dataset hard to work with.

CatBoost can deal with categorical variables automatically and does not require the extensive data preprocessing that other machine learning algorithms do.
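
A minimal sketch with the catboost library on a toy frame with a categorical column; the data and settings are assumptions. The point is the cat_features argument, which lets CatBoost encode the column internally instead of you one-hot encoding it.

```python
# Sketch of CatBoost handling a categorical column natively; toy data for illustration.
import pandas as pd
from catboost import CatBoostClassifier

# Toy dataset: "city" is categorical and would blow up under one-hot encoding
# if it had many distinct values.
df = pd.DataFrame({
    "city": ["delhi", "mumbai", "pune", "delhi", "chennai", "pune"] * 50,
    "age": [25, 31, 40, 22, 35, 28] * 50,
    "bought": [1, 0, 1, 0, 1, 0] * 50,
})
X, y = df[["city", "age"]], df["bought"]

model = CatBoostClassifier(iterations=200, verbose=False)
# cat_features tells CatBoost which columns to encode internally,
# so no manual one-hot encoding is needed.
model.fit(X, y, cat_features=["city"])

print("Training accuracy:", model.score(X, y))
```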

MY SOCIAL SPACE

Instagram https://www.instagram.com/codatalicious/

LinkedIn https://www.linkedin.com/in/shaily-jain-6a991a143/

Medium https://codatalicious.medium.com/

YouTube https://www.youtube.com/channel/UCKowKGUpXPxarbEHAA7G4MA

