Discriminant Analysis: Linear and Quadratic
Logistic regression and multinomial regression are called discriminative learning algorithms: they learn p(y|x) directly.
Naive Bayes and linear/quadratic discriminant analysis are called generative learning algorithms: they model p(x|y) and p(y), then use Bayes' rule to derive p(y|x).
Discriminant Analysis
Discriminant analysis models the distribution of X in each of the classes separately, then uses Bayes' theorem to flip the conditional probabilities and obtain P(Y|X). The approach can use a variety of distributions for each class; the techniques discussed here focus on normal (Gaussian) distributions.
Linear Discriminant Analysis:
With linear discriminant analysis, we assume the covariance matrix Σ is the same for all response classes.
For p (the number of independent variables) = 1:
Recall the pdf for the Gaussian distribution:

$$ f_k(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(x-\mu_k)^2}{2\sigma^2}\right) $$

Then, by Bayes' theorem,

$$ p_k(x) = P(Y=k \mid X=x) = \frac{\pi_k f_k(x)}{\sum_{l=1}^{K} \pi_l f_l(x)} $$

where $\pi_k = P(Y=k)$.

Simplify by taking logs. Since the objective is to maximize over k, remove all constants (terms that do not depend on k) to obtain the discriminant score

$$ \delta_k(x) = x \cdot \frac{\mu_k}{\sigma^2} - \frac{\mu_k^2}{2\sigma^2} + \log \pi_k $$

Assign x to the class with the largest discriminant score.
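A minimal sketch of this one-predictor case in Python (the function and variable names are illustrative; the parameters are just the class means, a pooled variance, and the class proportions estimated from training data):

```python
import numpy as np

def fit_lda_1d(x, y):
    """Estimate 1-D LDA parameters: class means, pooled variance, priors."""
    classes = np.unique(y)
    mu = np.array([x[y == k].mean() for k in classes])
    pi = np.array([np.mean(y == k) for k in classes])
    # Pooled (shared) variance across all classes
    n, K = len(x), len(classes)
    sigma2 = sum(((x[y == k] - x[y == k].mean()) ** 2).sum()
                 for k in classes) / (n - K)
    return classes, mu, sigma2, pi

def predict_lda_1d(x_new, classes, mu, sigma2, pi):
    """Assign each point to the class with the largest score
    delta_k(x) = x * mu_k / sigma^2 - mu_k^2 / (2 sigma^2) + log(pi_k)."""
    scores = np.outer(x_new, mu / sigma2) - mu**2 / (2 * sigma2) + np.log(pi)
    return classes[np.argmax(scores, axis=1)]
```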
For p > 1:
The pdf for the multivariate Gaussian distribution:

$$ f_k(x) = \frac{1}{(2\pi)^{p/2} |\Sigma|^{1/2}} \exp\left(-\frac{1}{2}(x-\mu_k)^\top \Sigma^{-1} (x-\mu_k)\right) $$

The discriminant function is

$$ \delta_k(x) = x^\top \Sigma^{-1} \mu_k - \frac{1}{2}\mu_k^\top \Sigma^{-1} \mu_k + \log \pi_k $$

This method assumes that the covariance matrix Σ is the same for each class. Estimate the model parameters (the μ_k, the pooled Σ, and the π_k) from the training data, then compute the class posterior probabilities with the discriminant function.
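In practice this estimate-then-score recipe is available off the shelf; a quick sketch with scikit-learn's LinearDiscriminantAnalysis (the toy data below is illustrative):

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Toy training data: two roughly Gaussian classes in p = 2 dimensions
rng = np.random.default_rng(0)
X = np.vstack([rng.normal([0, 0], 1.0, (50, 2)),
               rng.normal([3, 3], 1.0, (50, 2))])
y = np.array([0] * 50 + [1] * 50)

lda = LinearDiscriminantAnalysis()
lda.fit(X, y)                           # estimates mu_k, pooled Sigma, pi_k
print(lda.predict([[1.5, 1.5]]))        # class with the largest discriminant score
print(lda.predict_proba([[1.5, 1.5]]))  # posterior probabilities P(Y=k | x)
```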
Pictorially, the LDA decision boundaries (where two discriminant scores tie) are straight lines, or hyperplanes for p > 1, since each δ_k(x) is linear in x.
Note that the discriminant functions, when subtracted for two classes, give a linear function w · x + a, so the decision boundary is where w · x + a = 0. The posterior probability then translates to the logistic/sigmoid function:

$$ P(Y=1 \mid X=x) = \frac{1}{1 + e^{-(w \cdot x + a)}} $$

The effect of w · x + a is to scale and translate the logistic function in x-space; it is a linear transformation.
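A small sketch of that two-class reduction, deriving w and a directly from the LDA parameters (the names are illustrative):

```python
import numpy as np

def two_class_boundary(mu1, mu2, Sigma, pi1, pi2):
    """w and a such that delta_1(x) - delta_2(x) = w . x + a,
    derived from the LDA discriminant with shared covariance Sigma."""
    Sinv = np.linalg.inv(Sigma)
    w = Sinv @ (mu1 - mu2)
    a = -0.5 * (mu1 @ Sinv @ mu1 - mu2 @ Sinv @ mu2) + np.log(pi1 / pi2)
    return w, a

def posterior_class1(x, w, a):
    """P(Y = 1 | x) as a sigmoid of the linear score w . x + a."""
    return 1.0 / (1.0 + np.exp(-(w @ x + a)))
```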
Quadratic Discriminant Analysis
In quadratic discriminant analysis, we do not assume that the covariance matrix Σk is the same for each class; each class gets its own Σk.
This changes the discriminant function to

$$ \delta_k(x) = -\frac{1}{2}\log|\Sigma_k| - \frac{1}{2}(x-\mu_k)^\top \Sigma_k^{-1} (x-\mu_k) + \log \pi_k $$
Pictorially, the QDA decision boundaries are quadratic curves (conic sections), since δ_k(x) is quadratic in x.
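A direct sketch of the QDA score for one class (a per-class covariance replaces the pooled one; names are illustrative):

```python
import numpy as np

def qda_score(x, mu_k, Sigma_k, pi_k):
    """QDA discriminant delta_k(x) for one class: unlike LDA, each class
    keeps its own covariance Sigma_k, so the score is quadratic in x."""
    diff = x - mu_k
    _, logdet = np.linalg.slogdet(Sigma_k)
    return (-0.5 * logdet
            - 0.5 * diff @ np.linalg.inv(Sigma_k) @ diff
            + np.log(pi_k))

# Predict by computing qda_score for every class and taking the argmax.
```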
Note: implementations of LDA typically return coefficients as output. For each observation, multiplying the coefficients by the variable values (and adding the intercept) gives the linear score used by the decision rule; with only two class labels, the sign of that score decides the class.
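For instance, with scikit-learn (a sketch; the assertion simply checks that the sign rule matches predict on this toy data):

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (40, 3)), rng.normal(2, 1, (40, 3))])
y = np.array([0] * 40 + [1] * 40)

lda = LinearDiscriminantAnalysis().fit(X, y)
score = X @ lda.coef_.ravel() + lda.intercept_  # coef * variable values + intercept
pred = (score > 0).astype(int)                  # two classes: the sign decides
assert (pred == lda.predict(X)).all()
```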
See the NAIVE BAYES notes for another generative algorithm for classification.
Logistic Regression vs. Discriminant Analysis vs. Naive Bayes
Best to use Logistic Regression:
- More robust to deviations from modeling assumptions (non-Gaussian features)
Best to use Discriminant Analysis:
- When the assumption that the features are Gaussian can be made
- More efficient than logistic regression when the assumptions are correct
- Works better than logistic regression when the classes are well-separated (logistic regression estimates can be unstable there)
- Popular when there are more than two response classes, since it also provides a low-dimensional view of the data
Best to use Naive Bayes:
- When it is reasonable to assume the features are independent (conditional on the response)
- Despite strong assumptions, works well on many problems
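All three are drop-in classifiers in scikit-learn, so the trade-offs above are easy to probe empirically; a quick sketch on synthetic data (illustrative only):

```python
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB

X, y = make_classification(n_samples=300, n_features=5, random_state=0)

for name, model in [("logistic", LogisticRegression(max_iter=1000)),
                    ("lda", LinearDiscriminantAnalysis()),
                    ("naive bayes", GaussianNB())]:
    acc = cross_val_score(model, X, y, cv=5).mean()
    print(f"{name}: {acc:.3f}")  # 5-fold cross-validated accuracy
```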
Resources:
https://people.eecs.berkeley.edu/~jrs/189/lec/07.pdf
http://jennguyen1.github.io/nhuyhoa/statistics/Discriminant-Analysis-Naive-Bayes.html