XGBoost: The Math

Shaily Jain · 3 min read · May 11, 2021

Extreme Gradient Boosting: Nasty Math

XGBoost optimizes an objective built around a loss function that must be prespecified, as below.
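As a sketch, assuming the standard XGBoost formulation with n training points and K trees, the regularized objective is:

$$\mathrm{obj}(\theta) = \sum_{i=1}^{n} l\big(y_i, \hat{y}_i\big) + \sum_{k=1}^{K} \Omega(f_k), \qquad \Omega(f) = \gamma T + \tfrac{1}{2}\,\lambda \sum_{j=1}^{T} w_j^{2},$$

where l is a differentiable convex loss, f_k is the k-th tree, T is the number of leaves of a tree, w_j are its leaf scores, and gamma and lambda control the regularization strength.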

Now the objective is to minimize this by gradient descent, which means the model is grown additively, one tree per boosting round.
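As a sketch of that step, assuming the usual notation: at round t the prediction of instance i is updated with a new tree f_t,

$$\hat{y}_i^{(t)} = \hat{y}_i^{(t-1)} + f_t(x_i),$$

and the objective at round t is approximated by its second-order Taylor expansion around the previous prediction,

$$\mathrm{obj}^{(t)} \approx \sum_{i=1}^{n}\Big[\, l\big(y_i, \hat{y}_i^{(t-1)}\big) + g_i\, f_t(x_i) + \tfrac{1}{2}\, h_i\, f_t^{2}(x_i)\Big] + \Omega(f_t),$$

where g_i and h_i are the first and second derivatives of the loss with respect to the previous prediction,

$$g_i = \partial_{\hat{y}^{(t-1)}}\, l\big(y_i, \hat{y}^{(t-1)}\big), \qquad h_i = \partial^{2}_{\hat{y}^{(t-1)}}\, l\big(y_i, \hat{y}^{(t-1)}\big).$$

The term l(y_i, ŷ_i^(t-1)) is constant at round t and can be dropped.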

Here each f_k is the k-th tree, which is not necessarily a linear function of the features. We need to express everything in terms of w_j, the score of leaf j.
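A sketch of that rewrite, assuming the usual notation where q(x_i) is the leaf that instance i falls into, I_j = { i : q(x_i) = j }, G_j is the sum of g_i over I_j, and H_j is the sum of h_i over I_j:

$$\mathrm{obj}^{(t)} = \sum_{j=1}^{T}\Big[\, G_j\, w_j + \tfrac{1}{2}\,\big(H_j + \lambda\big)\, w_j^{2}\Big] + \gamma T,$$

which is a quadratic in each w_j, so the optimal leaf score and the corresponding objective value are

$$w_j^{*} = -\frac{G_j}{H_j + \lambda}, \qquad \mathrm{obj}^{*} = -\tfrac{1}{2}\sum_{j=1}^{T}\frac{G_j^{2}}{H_j + \lambda} + \gamma T.$$

Splitting a leaf into left and right children with statistics (G_L, H_L) and (G_R, H_R) therefore changes the objective by

$$\mathrm{Gain} = \tfrac{1}{2}\left[\frac{G_L^{2}}{H_L + \lambda} + \frac{G_R^{2}}{H_R + \lambda} - \frac{(G_L + G_R)^{2}}{H_L + H_R + \lambda}\right] - \gamma,$$

and this gain is the quantity behind the missing-value routing and tree pruning discussed below.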

Let us see how XGBoost compares favorably with other boosting implementations:

1. Regularization:

  • The standard GBM implementation has no regularization, whereas XGBoost penalizes model complexity through the gamma and lambda terms in the objective above.
  • This regularization helps XGBoost reduce overfitting.

2. Parallel Processing:

  • XGBoost implements parallel processing (split finding is parallelized within each tree) and is faster than GBM.
  • XGBoost can also run on distributed platforms such as Hadoop.

3. High Flexibility:

  • XGBoost allows users to define custom optimization objectives and evaluation criteria, adding a whole new dimension to the model.

4. Handling Missing Values:

  • XGBoost has a built-in routine for handling missing values: at every split, instances with missing values are sent to the left or right child, whichever direction yields the higher gain, so missing values always follow the branch that maximizes gain.

5. Tree Pruning:

  • XGBoost grows splits up to the specified max_depth and then prunes the tree backwards, removing splits that do not yield a positive gain.

6. Built-in Cross-Validation:

  • XGBoost can run cross-validation at each boosting iteration, so the optimal number of boosting rounds can be found in a single run (see the sketch below).
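A minimal sketch in Python tying several of these points together, assuming the native API of the xgboost package; the synthetic dataset, the parameter values, and the helper squared_error_obj are illustrative assumptions, not part of the original post:

```python
import numpy as np
import xgboost as xgb

# Synthetic regression data with ~5% missing values (illustrative assumption).
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))
y = 2.0 * X[:, 0] + rng.normal(scale=0.1, size=500)
X[rng.random(X.shape) < 0.05] = np.nan   # NaNs are routed per split to the side with maximum gain

dtrain = xgb.DMatrix(X, label=y)         # DMatrix treats NaN as missing by default

params = {
    "objective": "reg:squarederror",
    "max_depth": 4,      # grow splits up to this depth, then prune backwards
    "eta": 0.1,          # learning rate (shrinkage)
    "gamma": 1.0,        # minimum gain needed to keep a split (pruning)
    "lambda": 1.0,       # L2 regularization on leaf weights
    "alpha": 0.0,        # L1 regularization on leaf weights
    "nthread": 4,        # parallel split finding
}

# Built-in cross-validation: evaluate every boosting round and stop early,
# giving the optimal number of rounds in a single run.
cv_results = xgb.cv(
    params,
    dtrain,
    num_boost_round=500,
    nfold=5,
    metrics="rmse",
    early_stopping_rounds=20,
    seed=0,
)
best_rounds = len(cv_results)
print(f"optimal boosting rounds: {best_rounds}")

# Custom objective (high flexibility): squared error expressed via its
# gradient and Hessian, matching the g_i / h_i terms above.
def squared_error_obj(preds, dtrain):
    grad = preds - dtrain.get_label()    # first derivative of 0.5 * (pred - y)^2
    hess = np.ones_like(preds)           # second derivative
    return grad, hess

custom_params = {k: v for k, v in params.items() if k != "objective"}
booster = xgb.train(custom_params, dtrain, num_boost_round=best_rounds, obj=squared_error_obj)
```

Here gamma and lambda map onto the gamma and lambda terms of the objective above, while alpha adds an L1 penalty on the leaf weights.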

MY SOCIAL SPACE

Instagram https://www.instagram.com/codatalicious/

LinkedIn https://www.linkedin.com/in/shaily-jain-6a991a143/

Medium https://codatalicious.medium.com/

YouTube https://www.youtube.com/channel/UCKowKGUpXPxarbEHAA7G4MA

