# Multicollinearity

I have heard the term multiple times but was never really sure what it means.

Let's hear it once again:

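Multicollinearity is the presence of strong correlation among the independent (predictor) variables in the modelled data, so that one predictor can be approximately expressed as a linear combination of the others.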
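To make the definition concrete, here is a minimal sketch using NumPy (my choice of tool, not from any source above). It builds three hypothetical predictors: `x2` is `x1` plus a little noise, and `x3` is generated independently. Only the `x1`/`x2` pair should show a correlation close to 1.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + 0.1 * rng.normal(size=n)   # x2 is almost a copy of x1
x3 = rng.normal(size=n)              # x3 is unrelated to x1

# rowvar=False: each column is a variable, each row an observation
corr = np.corrcoef(np.column_stack([x1, x2, x3]), rowvar=False)
print(np.round(corr, 2))
```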

Problems:

- It can make the estimated regression coefficients inaccurate and unstable.
- It inflates the standard errors of the regression coefficients, which reduces the power of any t-tests on them.
- It can produce misleading results and p-values, and it adds redundancy to the model, making it less efficient and less reliable.
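A rough way to see the inflated standard errors is to fit ordinary least squares twice: once with an uncorrelated second predictor and once with a nearly collinear one. This is a sketch under my own assumptions: the helper `ols_standard_errors` is hypothetical, the true coefficients (1, 2, 3) are made up, and the standard errors use the textbook formula $\hat\sigma^2 (X^\top X)^{-1}$.

```python
import numpy as np

def ols_standard_errors(X, y):
    """Fit OLS with an intercept and return (coefficients, standard errors)."""
    X = np.column_stack([np.ones(len(y)), X])       # add intercept column
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    n, p = X.shape
    sigma2 = resid @ resid / (n - p)                # unbiased residual variance
    cov = sigma2 * np.linalg.inv(X.T @ X)           # covariance of the estimates
    return beta, np.sqrt(np.diag(cov))

rng = np.random.default_rng(1)
n = 200
x1 = rng.normal(size=n)
x2_indep = rng.normal(size=n)                   # uncorrelated with x1
x2_coll = x1 + 0.05 * rng.normal(size=n)        # nearly collinear with x1
noise = rng.normal(size=n)

y_indep = 1.0 + 2.0 * x1 + 3.0 * x2_indep + noise
y_coll = 1.0 + 2.0 * x1 + 3.0 * x2_coll + noise

_, se_indep = ols_standard_errors(np.column_stack([x1, x2_indep]), y_indep)
_, se_coll = ols_standard_errors(np.column_stack([x1, x2_coll]), y_coll)
print("SE of x1 coefficient, independent predictors:", se_indep[1])
print("SE of x1 coefficient, collinear predictors:  ", se_coll[1])
```

The data-generating process and noise are identical in both fits; only the relationship between `x1` and the second predictor changes, so the jump in the standard error of the `x1` coefficient comes from the collinearity alone.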

Sources:

- *Data collection*: it can be the result of errors in how the data were collected.
- *Over-defined model or model specification/choice*: over-defining means having more variables than observations, or including undesirable terms such as an interaction together with the main effects of the same variables.
- *Outliers*: removing extreme variable values before regression can reduce multicollinearity.
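Two of these sources can be sketched in a few lines of NumPy. The setup is my own illustration, not from the resources: (1) an interaction term `x * z` built from positive-valued variables is noticeably correlated with its main effect `x`; (2) with more variables than observations (a made-up 5 x 8 design matrix) the columns cannot all be linearly independent, so the matrix rank is below the number of columns.

```python
import numpy as np

rng = np.random.default_rng(2)

# 1) Interaction term vs. its main effect, both from positive-valued variables
n = 500
x = rng.uniform(1, 10, size=n)
z = rng.uniform(1, 10, size=n)
xz = x * z
r_x_xz = np.corrcoef(x, xz)[0, 1]
print("corr(x, x*z):", round(r_x_xz, 2))

# 2) More variables than observations: 5 rows, 8 columns.
#    The rank can be at most 5, so some column is an exact linear
#    combination of the others (perfect multicollinearity).
X_wide = rng.normal(size=(5, 8))
rank = np.linalg.matrix_rank(X_wide)
print("rank:", rank, "of", X_wide.shape[1], "columns")
```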

Detection:

- *Pairwise scatter plots*: investigate the independent variables for correlation in pairwise scatter plots.
- *Variance Inflation Factor (VIF)*: a score of 10 or more indicates high collinearity.
- *Eigenvalues of the correlation matrix* close to zero indicate multicollinearity. Rather than judging the eigenvalues by their raw numerical size, use the condition number: the larger the condition number, the stronger the multicollinearity.
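A minimal NumPy sketch of the last two checks, under a few assumptions of mine: `vif` and `condition_number` are hypothetical helpers, VIF uses the standard formula $\text{VIF}_j = 1/(1 - R_j^2)$ where $R_j^2$ comes from regressing predictor $j$ on the others (with an intercept), and the condition number is taken as $\sqrt{\lambda_{\max}/\lambda_{\min}}$ of the correlation matrix (some texts use the ratio without the square root).

```python
import numpy as np

def vif(X):
    """Variance inflation factor for each column of X (n x p)."""
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        target = X[:, j]
        # regress column j on an intercept plus all the other columns
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ beta
        r2 = 1.0 - resid @ resid / np.sum((target - target.mean()) ** 2)
        out[j] = 1.0 / (1.0 - r2)
    return out

def condition_number(X):
    """sqrt(largest / smallest eigenvalue) of the correlation matrix of X."""
    eig = np.linalg.eigvalsh(np.corrcoef(X, rowvar=False))
    return np.sqrt(eig.max() / eig.min())

rng = np.random.default_rng(3)
n = 300
x1 = rng.normal(size=n)
x2 = x1 + 0.1 * rng.normal(size=n)   # nearly collinear with x1
x3 = rng.normal(size=n)
X = np.column_stack([x1, x2, x3])

vifs = vif(X)
eigenvalues = np.linalg.eigvalsh(np.corrcoef(X, rowvar=False))
cn_coll = condition_number(X)
cn_indep = condition_number(rng.normal(size=(n, 3)))  # three unrelated predictors

print("VIF:", np.round(vifs, 1))
print("eigenvalues:", np.round(eigenvalues, 4))
print("condition number (collinear):", round(cn_coll, 1))
print("condition number (independent):", round(cn_indep, 1))
```

Here `x1` and `x2` should both exceed the VIF threshold of 10, `x3` should stay near 1, and one eigenvalue of the correlation matrix should sit close to zero, which pushes the condition number up.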

Correction:

- Collecting data from an appropriate subpopulation.
- Proper variable selection, or regularization methods such as ridge regression.
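To see what regularization buys, here is a rough simulation sketch (my own setup, not taken from the linked resource). The hypothetical `ridge` helper uses the closed form $(X^\top X + \lambda I)^{-1} X^\top y$ on centered data, so $\lambda = 0$ is plain OLS. Over many simulated datasets with two nearly collinear predictors, the ridge estimate of the `x1` coefficient should vary far less than the OLS one. The value $\lambda = 10$ is arbitrary.

```python
import numpy as np

def ridge(X, y, lam):
    """Ridge estimate (X'X + lam*I)^-1 X'y on centered data (intercept not penalized)."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    p = X.shape[1]
    return np.linalg.solve(Xc.T @ Xc + lam * np.eye(p), Xc.T @ yc)

rng = np.random.default_rng(4)
n, reps = 100, 500
ols_b1, ridge_b1 = [], []
for _ in range(reps):
    x1 = rng.normal(size=n)
    x2 = x1 + 0.05 * rng.normal(size=n)      # nearly collinear with x1
    X = np.column_stack([x1, x2])
    y = 2.0 * x1 + 3.0 * x2 + rng.normal(size=n)
    ols_b1.append(ridge(X, y, lam=0.0)[0])   # lam = 0 gives ordinary least squares
    ridge_b1.append(ridge(X, y, lam=10.0)[0])

print("spread of x1 coefficient, OLS:  ", np.std(ols_b1))
print("spread of x1 coefficient, ridge:", np.std(ridge_b1))
```

The trade-off is bias: ridge tends to split the combined effect between the two collinear predictors instead of recovering 2 and 3 exactly, in exchange for much more stable estimates.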

**Resources:**

https://corporatefinanceinstitute.com/resources/knowledge/other/ridge/