Long story short: in this post we are dealing with predictions made by combining multiple models.

Common Ensemble Methods:

Voting or Averaging:

Majority Voting:

  • Classify each test observation with every base model.
  • Pick the prediction made by the majority of the models. If no prediction reaches a majority, the ensemble method fails to give a stable prediction.
  • This can be made adaptive by giving weights to specific models, effectively counting their votes multiple times. This is called Weighted Voting.

Averaging: The above voting technique can be replicated for regression by taking the average of the predictions made by all base models. This can also be extended by giving heavier weights to particular models (weighted averaging).
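For concreteness, here is a minimal sketch of majority voting and weighted voting with scikit-learn's VotingClassifier; the breast-cancer dataset and the base models are illustrative assumptions, not choices from this post.

```python
# Minimal voting sketch (illustrative dataset and base models).
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

base_models = [
    ("lr", LogisticRegression(max_iter=5000)),
    ("knn", KNeighborsClassifier()),
    ("dt", DecisionTreeClassifier(random_state=42)),
]

# Hard voting: each model casts one vote and the majority class wins.
majority = VotingClassifier(estimators=base_models, voting="hard")
majority.fit(X_train, y_train)
print("Majority voting accuracy:", majority.score(X_test, y_test))

# Weighted voting: some models get more say via the weights argument.
weighted = VotingClassifier(estimators=base_models, voting="soft", weights=[2, 1, 1])
weighted.fit(X_train, y_train)
print("Weighted voting accuracy:", weighted.score(X_test, y_test))
```

For regression, scikit-learn's VotingRegressor averages the base models' predictions in the same way.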

Stacking

  • Also known as stacked generalization, this is an ensemble method where the base models are combined using another machine learning algorithm.
  • The basic idea is to train the base models on the training dataset and then use their predictions to generate a new dataset. This new dataset is used as input for the combiner (meta) machine learning algorithm.
  • Note that the base models can be of different types, hence stacking is heterogeneous.

We can understand the process in the following steps:

1. We split the data into two parts, a training set and a test set. The training data is further split into K folds, just like K-fold cross-validation.
2. A base model (e.g. k-NN) is fitted on K-1 folds and predictions are made for the Kth fold.
3. This process is iterated until every fold has been predicted.
4. The base model is then fitted on the whole training set to calculate its performance on the test set.
5. We repeat the last three steps for the other base models (e.g. SVM, decision tree, neural network, etc.).
6. The out-of-fold predictions from the training set are used as features for the second-level model.
7. The second-level model is used to make predictions on the test set.

The outputs from the base models that are used as input to the meta-model may be real values in the case of regression, and probability values, probability-like values, or class labels in the case of classification.
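The seven steps above can be sketched with scikit-learn's StackingClassifier, which handles the K-fold out-of-fold prediction step internally (cv=5 below); the dataset and base models are illustrative assumptions.

```python
# Minimal stacking sketch: heterogeneous base models plus a logistic
# regression meta-model trained on their out-of-fold predictions.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Level-0 (base) models can be of different types
base_models = [
    ("knn", KNeighborsClassifier()),
    ("svm", SVC(probability=True)),
    ("dt", DecisionTreeClassifier(random_state=42)),
]

# Level-1 (meta) model combines the base models' predictions
stack = StackingClassifier(
    estimators=base_models,
    final_estimator=LogisticRegression(max_iter=5000),
    cv=5,  # K-fold split used to build the out-of-fold training features
)
stack.fit(X_train, y_train)
print("Stacking accuracy:", stack.score(X_test, y_test))
```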

Blending

  • Blending is a close cousin of stacking: instead of out-of-fold predictions, the base models are fitted on one part of the training data, and their predictions on a held-out validation set are used to train the meta-model.
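A rough sketch of that workflow, under the usual train/validation split formulation (the dataset, base models, and split sizes here are illustrative assumptions):

```python
# Minimal blending sketch: base models fit on one split, meta-model
# trained on their predictions for a held-out validation split.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
# Carve a validation set out of the training data for the meta-model
X_fit, X_val, y_fit, y_val = train_test_split(X_train, y_train, test_size=0.3, random_state=42)

base_models = [KNeighborsClassifier(), DecisionTreeClassifier(random_state=42)]
for model in base_models:
    model.fit(X_fit, y_fit)

# Base-model probabilities on the validation and test sets become new features
val_features = np.column_stack([m.predict_proba(X_val)[:, 1] for m in base_models])
test_features = np.column_stack([m.predict_proba(X_test)[:, 1] for m in base_models])

meta_model = LogisticRegression()
meta_model.fit(val_features, y_val)
print("Blending accuracy:", meta_model.score(test_features, y_test))
```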

Bagging (Bootstrap Aggregating)

  • Create multiple models using the same algorithm on random sub-samples of the dataset, drawn from the original dataset with the bootstrap sampling method (sampling with replacement).
  • The second step in bagging is aggregating the generated models. Well-known methods such as voting and averaging are used for this purpose.
  • In bagging, each sub-sample can be generated independently of the others, so generation and training can be done in parallel.
  • Ex. the Random Forest algorithm uses the bagging technique with some differences: it adds random feature selection, and its base algorithm is a decision tree.
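A minimal bagging sketch with scikit-learn; the dataset and hyperparameters are illustrative. BaggingClassifier uses a decision tree as its default base model, and RandomForestClassifier adds random feature selection on top of the same idea.

```python
# Minimal bagging sketch: bootstrap samples, one tree per sample,
# predictions aggregated by voting; training is parallelised with n_jobs.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Plain bagging of decision trees (the default base estimator)
bagging = BaggingClassifier(n_estimators=100, bootstrap=True, n_jobs=-1, random_state=42)
bagging.fit(X_train, y_train)
print("Bagging accuracy:", bagging.score(X_test, y_test))

# Random Forest = bagging + random feature selection at each split
forest = RandomForestClassifier(n_estimators=100, n_jobs=-1, random_state=42)
forest.fit(X_train, y_train)
print("Random Forest accuracy:", forest.score(X_test, y_test))
```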

Boosting

  • Convert weak models (substantial error rate, but better than random guessing) into a strong model.
  • Boosting incrementally builds an ensemble by training each model on the same dataset, but with the weights of the instances adjusted according to the errors of the last prediction. The main idea is to force the models to focus on the instances that are hard to predict.
  • Unlike bagging, boosting is a sequential method, so you cannot use parallel operations here.
  • Ex. AdaBoost is an incremental method that uses decision trees as its base model. Its ‘adjusting dataset’ step is different from the generic one described above, and its ‘combining models’ step uses weighted voting.
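A minimal AdaBoost sketch with scikit-learn; the dataset and hyperparameters are illustrative. The default base model is a depth-1 decision tree (a "stump"), trained sequentially on re-weighted instances and combined by weighted voting.

```python
# Minimal AdaBoost sketch: sequentially trained stumps, misclassified
# instances get higher weight, final prediction is a weighted vote.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

ada = AdaBoostClassifier(n_estimators=100, learning_rate=0.5, random_state=42)
ada.fit(X_train, y_train)
print("AdaBoost accuracy:", ada.score(X_test, y_test))
```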

I am covering some commonly used Bagging and Boosting techniques in another post. Watch out for them.

Advantages:

  • Higher predictive accuracy
  • Lower variance and lower bias
  • Deeper understanding of the data
  • More stable and robust predictions
  • Statistical: when the training data is small relative to the size of the hypothesis space we want to search, averaging reduces the risk of choosing the wrong classifier.
  • Computational: individual models can get stuck in local optima; averaging can get to a better overall prediction.
  • Representational: individual learners may not span a large enough space to find the real solution; combining them enables search in an expanded space.

Disadvantages:

  • Lower interpretability
  • Ensembles cost more to create, train, and deploy
  • Selection of the ensemble models and their parameters is an art
  • There is a risk of overfitting due to the large number of parameters
