Linear Methods of Classification

Shaily jain
2 min read · Feb 12, 2021


Below I have accumulated a few linear methods of classification. If the discriminant function, or P(y = K | X = x), is linear in x, the result is a linear decision boundary, and hence a linear classification method.

There are two broad classes of methods for determining the parameters of a linear classifier, namely generative and discriminative models. All of the linear classifiers below can be converted into non-linear algorithms operating on a transformed input space phi(x) using the kernel trick, as the sketch below shows.
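
To make the kernel-trick remark concrete, here is a minimal sketch using scikit-learn (my choice of library; the post itself names none). A linear classifier fails on circularly separable data, but the same classifier trained on approximate kernel features phi(x) does not:

```python
# Minimal sketch (assumes scikit-learn): the same linear classifier becomes
# non-linear when trained on kernelised features phi(x).
from sklearn.datasets import make_circles
from sklearn.kernel_approximation import Nystroem
from sklearn.linear_model import SGDClassifier
from sklearn.pipeline import make_pipeline

X, y = make_circles(noise=0.1, factor=0.3, random_state=0)  # not linearly separable

linear = SGDClassifier(loss="hinge", random_state=0).fit(X, y)
kernelised = make_pipeline(
    Nystroem(kernel="rbf", n_components=50, random_state=0),  # approximates phi(x)
    SGDClassifier(loss="hinge", random_state=0),
).fit(X, y)

print("linear in x:      ", linear.score(X, y))
print("linear in phi(x): ", kernelised.score(X, y))
```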

Figure: a generative model estimates the X|y density (left), while a discriminative model estimates the y|X density directly (right).

Class 1: Generative Models (need P(x|K) and P(K) to determine the posterior distribution); good at handling missing data. A short sketch follows the list.

  • Linear Discriminant Analysis (LDA): assumes Gaussian class-conditional density models with equal covariance matrices (with unequal covariances this becomes Quadratic Discriminant Analysis)
  • Naive Bayes Classifier with multinomial or multivariate Bernoulli event models
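
A minimal sketch of these generative classifiers, again assuming scikit-learn; the Gaussian Naive Bayes variant appears here because the iris features are continuous, whereas the multinomial and Bernoulli event models mentioned above suit count and binary data:

```python
# Minimal sketch (assumes scikit-learn): generative classifiers fit the
# class-conditional density P(x|K) and prior P(K), then apply Bayes' rule.
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import (LinearDiscriminantAnalysis,
                                           QuadraticDiscriminantAnalysis)
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)

lda = LinearDiscriminantAnalysis().fit(X, y)     # shared covariance -> linear boundary
qda = QuadraticDiscriminantAnalysis().fit(X, y)  # per-class covariance -> quadratic boundary
nb  = GaussianNB().fit(X, y)                     # conditionally independent features

print(lda.predict_proba(X[:1]))  # posterior P(K|x) recovered from the fitted densities
```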

Class 2: Discriminative Models (attempt to maximise the quality of the output on the training set); typically yield higher accuracy. A short sketch follows the list.

  • Logistic Regression (softmax regression for multiclass classification): maximum-likelihood estimation of the regression parameters, assuming the observed training set was generated by a binomial model that depends on the output of the classifier.
  • Perceptron: an algorithm that iteratively attempts to fix all errors encountered in the training set.
  • Fisher's Linear Discriminant Analysis: an algorithm (distinct from 'LDA' above) that maximises the ratio of between-class scatter to within-class scatter, without any other assumptions. It is in essence a dimensionality-reduction method for binary classification.
  • Support Vector Machine: an algorithm that maximises the margin between the decision hyperplane and the examples in the training set.
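
Here is a minimal sketch of the discriminative classifiers listed above, again assuming scikit-learn (the synthetic dataset is only for illustration); Fisher's LDA is available there too, as the transform step of LinearDiscriminantAnalysis:

```python
# Minimal sketch (assumes scikit-learn): discriminative classifiers are fit by
# directly optimising a measure of output quality on the training set.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression, Perceptron
from sklearn.svm import LinearSVC

X, y = make_classification(n_samples=200, n_features=5, random_state=0)

models = {
    "logistic regression": LogisticRegression(),
    "perceptron":          Perceptron(),
    "linear SVM":          LinearSVC(),
}
for name, model in models.items():
    print(name, model.fit(X, y).score(X, y))  # training accuracy
```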

Training a linear classifier generally reduces to an optimisation problem, which a learning algorithm solves using techniques such as stochastic gradient descent, L-BFGS, coordinate descent, or Newton methods.
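
In scikit-learn (assumed here), several of these techniques can be selected directly; for instance, logistic regression exposes L-BFGS, Newton and stochastic-gradient-style solvers, among others, for the same objective:

```python
# Minimal sketch (assumes scikit-learn): one objective, several optimisers.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(random_state=0)

for solver in ["lbfgs", "newton-cg", "liblinear", "saga"]:
    clf = LogisticRegression(solver=solver).fit(X, y)
    print(solver, clf.score(X, y))  # all solvers reach similar accuracy
```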

Optimisation Problem

In each case the learning algorithm solves a problem of the form

min over w of  R(w) + C Σi L(yi, wTxi)

where

  • w is a vector of classifier parameters,
  • L(yi, wTxi) is a loss function that measures the discrepancy between the classifier's prediction and the true output yi for the i-th training example,
  • R(w) is a regularisation function that prevents the parameters from getting too large (which would cause overfitting); if R and the loss L are both convex, the problem above is convex,
  • C is a scalar constant (set by the user of the learning algorithm) that controls the balance between the regularisation and the loss function.
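
As a final minimal sketch (NumPy only; the weights and data here are arbitrary), the objective is cheap to evaluate directly, shown with an L2 regulariser and the hinge loss used by SVMs:

```python
# Minimal sketch (NumPy): evaluating R(w) + C * sum_i L(yi, wTxi)
# with an L2 regulariser R and the hinge loss L.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))                 # training inputs x_i
y = np.sign(X @ np.array([1.0, -2.0, 0.5]))   # labels in {-1, +1}
w = rng.normal(size=3)                        # candidate classifier parameters w
C = 1.0                                       # loss/regularisation trade-off

R = 0.5 * w @ w                               # R(w): L2 regulariser
L = np.maximum(0.0, 1.0 - y * (X @ w))        # L(yi, wTxi): hinge loss per example
print(R + C * L.sum())                        # the objective to be minimised
```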
