Dimensionality Reduction

Shaily Jain
3 min readMay 14, 2021

Turning scary-looking, unwieldy data into a manageable, low-dimensional form while retaining the properties of the original data.

Yes, this is what we are doing here.

The essence is to look for columns that add little or no new information to what the data set already says. Dimensionality reduction is typically performed after data cleaning and data scaling and before training a predictive model, although it is often also done after modelling, purely for visualization.

We start with the methods, the most common techniques, and the uses of dimensionality reduction.

Methods

  1. Feature Selection
    - Missing Value (drop variables with a large proportion of missing values)
    - Low Variance Filter (low within-column variance implies little information)
    - High Correlation (highly correlated variables essentially carry the same information)
    - Random Forests (select the variables that appear in splits most often across the base trees in which they were candidates)
    - Backward/Forward Feature Elimination/Selection (best for small feature sets; time-consuming)
  2. Matrix Factorization
    - Factor Analysis (highly correlated variables are grouped together, and each group is represented by a factor)
    - Principal Component Analysis, including kernel and probabilistic variants (linear data; finds uncorrelated components that explain as much of the data variance as possible)
    - Independent Component Analysis (decomposes data into statistically independent components)
    - Singular Value Decomposition (SVD)
  3. Projection Based/Manifold Learning
    - Kohonen Self-Organizing Map (SOM)
    - Sammon Mapping
    - Isomap Embedding (strongly non-linear data)
    - Locally Linear Embedding (LLE)
    - Modified Locally Linear Embedding
    - UMAP (shorter run time than t-SNE)
    - Multidimensional Scaling (MDS)
    - t-distributed Stochastic Neighbor Embedding (t-SNE) (strongly non-linear data)
  4. AutoEncoders
  5. Supervised
    Linear Discriminant Analysis (LDA), Canonical Correlation Analysis, Partial Least Squares
  6. Unsupervised
    Latent Semantic Indexing (LSI), plus the manifold-learning, PCA, and ICA methods above
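The filter-based feature selection ideas above (missing values, low variance, high correlation) can be sketched in a few lines of pandas. This is a minimal illustration, not a library API; the thresholds and the tiny example frame are assumptions chosen to show each filter firing.

```python
import numpy as np
import pandas as pd

def filter_features(df, missing_thresh=0.5, var_thresh=1e-3, corr_thresh=0.95):
    """Drop columns by missing-value ratio, low variance, and high correlation."""
    # 1. Missing-value filter: drop columns with too large a share of NaNs.
    df = df[df.columns[df.isna().mean() <= missing_thresh]]

    # 2. Low-variance filter: near-constant numeric columns carry little information.
    num = df.select_dtypes(include=np.number)
    df = df.drop(columns=num.columns[num.var() < var_thresh])

    # 3. High-correlation filter: of each highly correlated pair, keep only one.
    corr = df.select_dtypes(include=np.number).corr().abs()
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    to_drop = [c for c in upper.columns if (upper[c] > corr_thresh).any()]
    return df.drop(columns=to_drop)

df = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0],
    "b": [2.0, 4.0, 6.0, 8.0],            # perfectly correlated with "a"
    "c": [1.0, 1.0, 1.0, 1.0],            # zero variance
    "d": [np.nan, np.nan, np.nan, 4.0],   # 75% missing
})
print(filter_features(df).columns.tolist())  # -> ['a']
```

Only "a" survives: "d" fails the missing-value check, "c" the variance check, and "b" is dropped as redundant with "a".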

Uses of Dimensionality Reduction

  1. Data visualization (project an n-D data set to 2-D using the first few components of the new feature set)
  2. In applied machine learning, to simplify a classification or regression data set so that a predictive model fits better and the curse of dimensionality is mitigated
  3. Reduced computation and training time
  4. Dealing with multicollinearity among features
  5. Noise removal
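Use #1 can be sketched with scikit-learn's PCA on the classic Iris data set: scale the features, keep the first two principal components, and you have 2-D coordinates ready to plot. A minimal sketch, with the number of components chosen purely for visualization:

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Scale first: PCA is sensitive to feature scales.
X = load_iris().data                      # 150 samples, 4 features
X_scaled = StandardScaler().fit_transform(X)

# Keep the first two principal components for a 2-D visualization.
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X_scaled)

print(X_2d.shape)                         # -> (150, 2)
print(pca.explained_variance_ratio_)      # share of variance per component
```

For Iris, the first two components retain the bulk of the total variance, which is why a 2-D scatter plot of `X_2d` still separates the classes well.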

MY SOCIAL SPACE

Instagram https://www.instagram.com/codatalicious/

LinkedIn https://www.linkedin.com/in/shaily-jain-6a991a143/

Medium https://codatalicious.medium.com/

YouTube https://www.youtube.com/channel/UCKowKGUpXPxarbEHAA7G4MA
