Dimensionality Reduction

Shaily Jain
3 min readMay 14, 2021

Turning scary-looking, unwieldy data into a manageable, low-dimensional form while retaining the properties of the original data.

Yes, this is what we are doing here.

The essence is to look for columns that add little or no new information to what the data set already says. Dimensionality reduction is typically performed after data cleaning and data scaling and before training a predictive model, although it is often also done after modelling, purely for visualization.

We start with the methods, the most common techniques, and the uses of dimensionality reduction.

Methods

  1. Feature Selection
    - Missing Value (drop variables with a large proportion of missing values)
    - Low Variance Filter (low within-column variance implies little information)
    - High Correlation (highly correlated variables essentially carry the same information)
    - Random Forests (select the variables that appear in splits most often across the base trees in which they were candidates)
    - Backward/Forward Feature Elimination/Selection (best for small feature sets; time-consuming)
  2. Matrix Factorization
    - Factor Analysis (highly correlated variables are grouped together, and each group is represented by a factor)
    - Principal Component Analysis, including kernel and probabilistic variants (linear data; finds uncorrelated components that explain as much of the data variance as possible)
    - Independent Component Analysis (decomposes data into statistically independent components)
    - Singular Value Decomposition (SVD)
  3. Projection Based/Manifold Learning
    - Kohonen Self-Organizing Map (SOM)
    - Sammon Mapping
    - Isomap Embedding (strongly non-linear data)
    - Locally Linear Embedding (LLE)
    - Modified Locally Linear Embedding
    - UMAP (shorter run time than t-SNE)
    - Multidimensional Scaling (MDS)
    - t-distributed Stochastic Neighbor Embedding (t-SNE) (strongly non-linear data)
  4. AutoEncoders
  5. Supervised
    Linear Discriminant Analysis (LDA), Canonical Correlation Analysis, Partial Least Squares
  6. Unsupervised
    Latent Semantic Indexing (LSI), plus the manifold-learning, PCA, and ICA methods above
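The filter-based feature selection ideas above (missing values, low variance, high correlation) can be sketched in a few lines of pandas. This is a minimal illustration, not a library API; the thresholds and the tiny example frame are assumptions chosen to show each filter firing.

```python
import numpy as np
import pandas as pd

def filter_features(df, missing_thresh=0.5, var_thresh=1e-3, corr_thresh=0.95):
    """Drop columns by missing-value ratio, low variance, and high correlation."""
    # 1. Missing-value filter: drop columns with too large a share of NaNs.
    df = df[df.columns[df.isna().mean() <= missing_thresh]]

    # 2. Low-variance filter: near-constant numeric columns carry little information.
    num = df.select_dtypes(include=np.number)
    df = df.drop(columns=num.columns[num.var() < var_thresh])

    # 3. High-correlation filter: of each highly correlated pair, keep only one.
    corr = df.select_dtypes(include=np.number).corr().abs()
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    to_drop = [c for c in upper.columns if (upper[c] > corr_thresh).any()]
    return df.drop(columns=to_drop)

df = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0],
    "b": [2.0, 4.0, 6.0, 8.0],            # perfectly correlated with "a"
    "c": [1.0, 1.0, 1.0, 1.0],            # zero variance
    "d": [np.nan, np.nan, np.nan, 4.0],   # 75% missing
})
print(filter_features(df).columns.tolist())  # -> ['a']
```

Only "a" survives: "d" fails the missing-value check, "c" the variance check, and "b" is dropped as redundant with "a".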

Uses of Dimensionality Reduction

  1. Data visualization (project an n-D data set to 2-D using the first few components of the new feature set)
  2. In applied machine learning, to simplify a classification or regression data set so that a predictive model fits better and the curse of dimensionality is mitigated
  3. Reduced computation and training time
  4. Dealing with multicollinearity among features
  5. Noise removal
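Use #1 can be sketched with scikit-learn's PCA on the classic Iris data set: scale the features, keep the first two principal components, and you have 2-D coordinates ready to plot. A minimal sketch, with the number of components chosen purely for visualization:

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Scale first: PCA is sensitive to feature scales.
X = load_iris().data                      # 150 samples, 4 features
X_scaled = StandardScaler().fit_transform(X)

# Keep the first two principal components for a 2-D visualization.
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X_scaled)

print(X_2d.shape)                         # -> (150, 2)
print(pca.explained_variance_ratio_)      # share of variance per component
```

For Iris, the first two components retain the bulk of the total variance, which is why a 2-D scatter plot of `X_2d` still separates the classes well.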

MY SOCIAL SPACE

Instagram https://www.instagram.com/codatalicious/

LinkedIn https://www.linkedin.com/in/shaily-jain-6a991a143/

Medium https://codatalicious.medium.com/

YouTube https://www.youtube.com/channel/UCKowKGUpXPxarbEHAA7G4MA
