Linear Methods of Classification
Below I have collected a few linear methods of classification. If the discriminant function, or the posterior P(y=K|X=x), is linear in x, the result is a linear decision boundary, hence a linear classification method.
There are two broad classes of methods for determining the parameters of a linear classifier, namely generative and discriminative models. All of the linear classifiers below can be converted into non-linear algorithms operating on a transformed input space phi(x) using the kernel trick.
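As a concrete illustration, here is a minimal NumPy sketch (the weights, bias, and feature map are invented for demonstration): a linear classifier predicts with sign(wTx + b), and a feature map phi(x) makes the same linear rule trace a non-linear boundary in the original input space.

```python
import numpy as np

# A linear classifier: the decision boundary w^T x + b = 0 is a hyperplane.
w = np.array([1.0, -2.0])   # hypothetical learned weights
b = 0.5                     # hypothetical learned bias

def predict_linear(x):
    return np.sign(w @ x + b)

# Feature map phi(x): the same linear rule applied in phi-space yields a
# boundary that is non-linear in the original input space.
def phi(x):
    return np.array([x[0], x[1], x[0] * x[1], x[0] ** 2, x[1] ** 2])

w_phi = np.array([0.0, 0.0, 1.0, 1.0, -1.0])  # hypothetical weights in phi-space

def predict_in_phi_space(x):
    return np.sign(w_phi @ phi(x))

print(predict_linear(np.array([2.0, 1.0])))       # linear boundary in x
print(predict_in_phi_space(np.array([2.0, 1.0]))) # linear in phi(x), non-linear in x
```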
Class 1: Generative models (need P(x|K) and P(K) to determine the posterior distribution); good for handling missing data. A sketch follows this list.
- Linear Discriminant Analysis (LDA) — assumes Gaussian class-conditional density models with equal covariance (if the covariances are unequal, this becomes Quadratic Discriminant Analysis)
- Naive Bayes Classifier with multinomial or multivariate Bernoulli event models
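To make the generative class concrete, here is a minimal scikit-learn sketch (the toy data is invented for illustration) fitting LDA and, for these continuous features, the Gaussian event-model variant of Naive Bayes:

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.naive_bayes import GaussianNB

# Toy 2-class data: two Gaussian blobs with (roughly) shared covariance.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=0.0, scale=1.0, size=(50, 2)),
               rng.normal(loc=3.0, scale=1.0, size=(50, 2))])
y = np.array([0] * 50 + [1] * 50)

# LDA: fits class means and a shared covariance, then classifies via the
# posterior P(K|x); the resulting decision boundary is linear in x.
lda = LinearDiscriminantAnalysis().fit(X, y)

# Naive Bayes: assumes the features are conditionally independent given the class.
nb = GaussianNB().fit(X, y)

x_new = np.array([[1.5, 1.5]])
print(lda.predict_proba(x_new))
print(nb.predict_proba(x_new))
```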
Class 2: Discriminative models (attempt to maximise the quality of the output on the training set); typically yield higher accuracy. A sketch follows this list.
- Logistic Regression (Softmax Regression for multiclass classification) — maximum-likelihood estimation of the regression parameters, assuming that the observed training set was generated by a binomial model that depends on the output of the classifier.
- Perceptron — an algorithm that attempts to fix all errors encountered in the training set
- Fisher’s Linear Discriminant Analysis — an algorithm (distinct from the ‘LDA’ above) that maximises the ratio of between-class scatter to within-class scatter, without any other assumptions. It is in essence a method of dimensionality reduction for binary classification
- Support Vector Machine — an algorithm that maximises the margin between the decision hyperplane and the examples in the training set
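Here is a minimal scikit-learn sketch (toy data again invented for illustration) fitting three of the discriminative classifiers above; each one learns a linear decision boundary wTx + b = 0, differing only in how it sets w and b:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression, Perceptron
from sklearn.svm import LinearSVC

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 1.0, size=(50, 2)),
               rng.normal(3.0, 1.0, size=(50, 2))])
y = np.array([0] * 50 + [1] * 50)

models = {
    "logistic regression": LogisticRegression(),  # MLE under a binomial model
    "perceptron": Perceptron(),                   # mistake-driven updates
    "linear SVM": LinearSVC(),                    # maximises the margin
}
for name, model in models.items():
    model.fit(X, y)
    # Each model exposes its learned hyperplane via coef_ (w) and intercept_ (b).
    print(name, model.coef_, model.intercept_)
```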
Training a linear classifier generally reduces to an optimisation problem of the form

arg min_w  R(w) + C Σi L(yi, wTxi)

which a learning algorithm solves using techniques like stochastic gradient descent, L-BFGS, coordinate descent, and Newton methods (a toy sketch follows the definitions below), where
- w is a vector of classifier parameters,
- L(yi, wTxi) is a loss function that measures the discrepancy between the classifier’s prediction and the true output yi for the i’th training example,
- R(w) is a regularisation function that prevents the parameters from getting too large (which would cause overfitting); if R(w) and the loss are convex, the above is a convex optimisation problem,
- C is a scalar constant (set by the user of the learning algorithm) that controls the balance between the regularisation and the loss function.
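As a sketch of how such an objective can be minimised, here is a toy NumPy implementation of stochastic gradient descent on the L2-regularised logistic loss (the data, step size, and value of C are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)
X = np.vstack([rng.normal(-1.0, 1.0, size=(50, 2)),
               rng.normal(1.0, 1.0, size=(50, 2))])
y = np.array([-1] * 50 + [1] * 50)   # labels in {-1, +1}

C = 1.0      # balance between the loss term and the regularisation
lr = 0.1     # step size
w = np.zeros(2)

# Objective: R(w) + C * sum_i L(yi, w^T xi), with
# R(w) = 0.5 * ||w||^2 and logistic loss L(y, s) = log(1 + exp(-y * s)).
for epoch in range(100):
    for i in rng.permutation(len(y)):
        s = w @ X[i]
        # Per-example gradient: the regulariser's gradient w is spread
        # across the examples; dL/dw = -y * x / (1 + exp(y * s)).
        grad = w / len(y) + C * (-y[i] * X[i]) / (1.0 + np.exp(y[i] * s))
        w -= lr * grad

print("learned w:", w)
```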