Turning scary-looking, unmanageable data into a small, manageable number of dimensions while retaining the properties of the original data.

Yes, this is what we are doing here.

The essence is to look for columns that add little or no new information to what the data set already says. It might be performed after data cleaning…
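To make the idea of "columns that add no new information" concrete, here is a minimal sketch using only the Python standard library. It flags two kinds of columns: ones that never change, and ones that almost perfectly track a column we have already kept. The function name `redundant_columns`, the dict-of-lists table layout, the toy data, and both thresholds are assumptions made for illustration. This is only one simple way to spot uninformative columns, not a full dimensionality reduction method.

```python
from math import sqrt
from statistics import mean, pvariance


def pearson(xs, ys):
    """Pearson correlation of two equal-length, non-constant numeric sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def redundant_columns(table, var_threshold=0.0, corr_threshold=0.95):
    """Return names of columns that carry little or no new information.

    table: dict mapping column name -> list of numbers (hypothetical layout).
    Both thresholds are illustrative assumptions, not recommended defaults.
    """
    dropped, kept = [], []
    for name, values in table.items():
        # A constant (or near-constant) column says nothing about how rows differ.
        if pvariance(values) <= var_threshold:
            dropped.append(name)
            continue
        # A column almost perfectly correlated with one we already kept
        # mostly repeats that column's information.
        if any(abs(pearson(values, table[k])) >= corr_threshold for k in kept):
            dropped.append(name)
            continue
        kept.append(name)
    return dropped


# Toy data, invented for this example.
data = {
    "height_cm": [150.0, 160.0, 170.0, 180.0],
    "height_in": [59.1, 63.0, 66.9, 70.9],  # same information in another unit
    "country": [1, 1, 1, 1],                # constant, so no information
    "weight_kg": [70.0, 52.0, 81.0, 64.0],
}
result = redundant_columns(data)
print(result)
```

On this toy table the sketch flags `height_in` (it duplicates `height_cm`) and `country` (it never varies), while `height_cm` and `weight_kg` stay.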

One step closer to our goal of knowing everything.

When we learn about the mean, median, mode, range, variance, and skewness of the data, we are essentially talking about Descriptive Statistics. It is undoubtedly the first and best thing you can do when your mailbox is hit with data from your manager.
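As a quick illustration, the measures named above can be computed with nothing but Python's built-in `statistics` module. The helper name `describe` and the sample numbers are made up for this sketch. The variance here is the population variance, and the skewness is the simple population (moment-based) version, since the standard library has no skewness function of its own. Other tools may use slightly different formulas.

```python
from statistics import mean, median, mode, pvariance


def describe(values):
    """Return basic descriptive statistics for a list of numbers."""
    n = len(values)
    mu = mean(values)
    var = pvariance(values, mu)  # population variance
    # Population skewness: third central moment divided by std ** 3.
    if var > 0:
        skew = (sum((v - mu) ** 3 for v in values) / n) / var ** 1.5
    else:
        skew = 0.0  # a constant sample has no asymmetry to measure
    return {
        "mean": mu,
        "median": median(values),
        "mode": mode(values),
        "range": max(values) - min(values),
        "variance": var,
        "skewness": skew,
    }


# Sample values invented for this example.
sample = [2, 3, 3, 5, 7, 10]
stats = describe(sample)
for key, value in stats.items():
    print(f"{key}: {value}")
```

A positive skewness, as this sample gives, means the data has a longer tail on the right: a few large values pull the mean above the median.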


We are about to discuss pure beauty, so stay with me for this.

Bagging algorithms:

Boosting algorithms:

Bagging meta-estimator

As the name says, it is Extreme Gradient Boosting.

Having properties:

