### Linear Discriminant Analysis (LDA)

Linear discriminant analysis (LDA) looks for a linear boundary that best separates (discriminates between) the classes in your data; the examples here use a binary response. To build an LDA model, you provide training data in which the class of each data point is known and supplied as the response variable *(Response)*. Once the model is built, you can use it to predict the class of observations whose response is unknown.

The `lda()` function in the MASS package comes in handy for this, and the sections that follow show how to build an LDA model with it. Before fitting, split your dataset into training and test data so that the prediction accuracy of the model can be validated.
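As a minimal sketch of such a split, the following R code holds out about 30% of the rows as test data. The example data (two species from the built-in `iris` dataset, renamed to `Response`, `Pred1`, and `Pred2`), the 70/30 ratio, and the seed are assumptions made for illustration; substitute your own data frame.

```r
# Assumed example data: two iris species act as the binary Response,
# and two petal measurements act as Pred1 and Pred2.
keep <- iris$Species != "setosa"
dat <- data.frame(
  Response = droplevels(iris$Species[keep]),
  Pred1    = iris$Petal.Length[keep],
  Pred2    = iris$Petal.Width[keep]
)

set.seed(100)                                   # arbitrary seed, for a reproducible split
train_rows   <- sample(nrow(dat), size = round(0.7 * nrow(dat)))
trainingData <- dat[train_rows, ]               # about 70% of the rows
testData     <- dat[-train_rows, ]              # the remaining rows
```

A purely random split can leave the classes unevenly represented in a small dataset; sampling within each class is one alternative.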

### Build the model on training data

`library(MASS)` *# load the package*

`lda_mod <- lda(Response ~ Pred1 + Pred2, data = trainingData)` *# fit the LDA model; note: the response is a binary factor variable*
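To see what the fitted object holds, here is a self-contained sketch that fits the same kind of model to stand-in data and inspects a few of its components. The data (two `iris` species renamed to `Response`, `Pred1`, and `Pred2`) are an assumption for illustration, and all rows are used as training data to keep the sketch short.

```r
library(MASS)

# Assumed stand-in data: a binary factor Response and two numeric predictors.
keep <- iris$Species != "setosa"
trainingData <- data.frame(
  Response = droplevels(iris$Species[keep]),
  Pred1    = iris$Petal.Length[keep],
  Pred2    = iris$Petal.Width[keep]
)

lda_mod <- lda(Response ~ Pred1 + Pred2, data = trainingData)

lda_mod$prior    # prior class probabilities (class proportions by default)
lda_mod$means    # mean of each predictor within each class
lda_mod$scaling  # coefficients of the linear discriminant (LD1)
```

With two classes there is a single discriminant, so `lda_mod$scaling` has one column.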

### Predict on test data

The model (`lda_mod`) is now built using the training data. Let's use it to predict on the test data and see how well it does.

`predicted <- predict(lda_mod, testData)`

*# lda_mod is the model and testData is the new data to which the LDA model is applied.*

`names(predicted)` *# display the contents of 'predicted'*

`# [1] "class" "posterior" "x"`

`prediction_response <- predicted$class` *# prediction_response contains the predicted classes*
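The other two components of `predicted` are often useful as well. The sketch below, run on assumed stand-in data (two `iris` species renamed to `Response`, `Pred1`, and `Pred2`, with an arbitrary 70/30 split), places the posterior class probabilities and the discriminant scores next to the predicted class.

```r
library(MASS)

# Assumed stand-in data and split, for illustration only.
keep <- iris$Species != "setosa"
dat <- data.frame(
  Response = droplevels(iris$Species[keep]),
  Pred1    = iris$Petal.Length[keep],
  Pred2    = iris$Petal.Width[keep]
)
set.seed(100)
train_rows   <- sample(nrow(dat), size = round(0.7 * nrow(dat)))
trainingData <- dat[train_rows, ]
testData     <- dat[-train_rows, ]

lda_mod   <- lda(Response ~ Pred1 + Pred2, data = trainingData)
predicted <- predict(lda_mod, testData)

# One row per test observation: predicted class, posterior probability
# of each class, and the score on the linear discriminant (column LD1).
head(data.frame(
  class = predicted$class,
  round(predicted$posterior, 3),
  predicted$x
))
```

Posterior probabilities close to 0.5 flag observations that lie near the decision boundary.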

### Cross Validation

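Use the `CV = TRUE` option in the `lda()` function to generate jackknifed (leave-one-out) predictions.

`lda_cv <- lda(Response ~ Pred1 + Pred2, data = trainingData, CV = TRUE)`

*# jackknife (leave-one-out) predictions*

With `CV = TRUE`, `lda()` returns a list holding the leave-one-out predicted classes (`lda_cv$class`) and posterior probabilities (`lda_cv$posterior`) for the training observations, rather than a fitted model for use with `predict()`.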
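A short sketch of how the leave-one-out output can be used, again on assumed stand-in data (two `iris` species renamed to `Response`, `Pred1`, and `Pred2`). The names `lda_cv` and `loo_error` are chosen here for illustration.

```r
library(MASS)

# Assumed stand-in data, for illustration only.
keep <- iris$Species != "setosa"
trainingData <- data.frame(
  Response = droplevels(iris$Species[keep]),
  Pred1    = iris$Petal.Length[keep],
  Pred2    = iris$Petal.Width[keep]
)

# With CV = TRUE the result is a list of leave-one-out predictions.
lda_cv <- lda(Response ~ Pred1 + Pred2, data = trainingData, CV = TRUE)

head(lda_cv$class)      # leave-one-out predicted class for each training row
head(lda_cv$posterior)  # leave-one-out posterior probabilities

# Leave-one-out estimate of the misclassification rate.
loo_error <- mean(lda_cv$class != trainingData$Response)
loo_error
```

Because each observation is predicted by a model fitted without it, this error rate is usually a less optimistic estimate than the error measured on the training data itself.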

### Create the Confusion Matrix

`test_response <- testData$Response` *# test_response is the actual binary response variable in testData*

`table(prediction_response, test_response)` *# confusion matrix: predicted vs. actual classes*

`mean(prediction_response != test_response)` *# misclassification error*
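Put together, the fit, predict, and evaluate steps might look like the following sketch. It runs on assumed stand-in data (two `iris` species renamed to `Response`, `Pred1`, and `Pred2`, with an arbitrary 70/30 split); `conf_mat`, `error`, and `accuracy` are illustrative names.

```r
library(MASS)

# Assumed stand-in data and split, for illustration only.
keep <- iris$Species != "setosa"
dat <- data.frame(
  Response = droplevels(iris$Species[keep]),
  Pred1    = iris$Petal.Length[keep],
  Pred2    = iris$Petal.Width[keep]
)
set.seed(100)
train_rows   <- sample(nrow(dat), size = round(0.7 * nrow(dat)))
trainingData <- dat[train_rows, ]
testData     <- dat[-train_rows, ]

lda_mod             <- lda(Response ~ Pred1 + Pred2, data = trainingData)
prediction_response <- predict(lda_mod, testData)$class
test_response       <- testData$Response

# Rows are predicted classes, columns are actual classes.
conf_mat <- table(Predicted = prediction_response, Actual = test_response)
conf_mat

error    <- mean(prediction_response != test_response)  # misclassification error
accuracy <- sum(diag(conf_mat)) / sum(conf_mat)         # share of correct predictions
c(error = error, accuracy = accuracy)
```

The diagonal of the table counts correct predictions, so the accuracy and the misclassification error sum to 1.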

### How to Implement QDA?

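Performing quadratic discriminant analysis (QDA) is almost the same as performing LDA, except that the model is fitted with the `qda()` function. In the call that follows, `CV = TRUE` again requests leave-one-out predictions.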

`qda_mod <- qda(Response ~ Pred1 + Pred2, data = trainingData, CV = TRUE)`
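To predict on held-out data instead, one option is to fit without `CV = TRUE` so that `qda()` returns a model object that `predict()` accepts. The sketch below assumes stand-in data (two `iris` species renamed to `Response`, `Pred1`, and `Pred2`, with an arbitrary 70/30 split); `qda_fit` and `qda_pred` are illustrative names.

```r
library(MASS)

# Assumed stand-in data and split, for illustration only.
keep <- iris$Species != "setosa"
dat <- data.frame(
  Response = droplevels(iris$Species[keep]),
  Pred1    = iris$Petal.Length[keep],
  Pred2    = iris$Petal.Width[keep]
)
set.seed(100)
train_rows   <- sample(nrow(dat), size = round(0.7 * nrow(dat)))
trainingData <- dat[train_rows, ]
testData     <- dat[-train_rows, ]

# Fit QDA on the training data (no CV argument, so a model is returned).
qda_fit  <- qda(Response ~ Pred1 + Pred2, data = trainingData)
qda_pred <- predict(qda_fit, testData)

table(Predicted = qda_pred$class, Actual = testData$Response)  # confusion matrix
mean(qda_pred$class != testData$Response)                      # misclassification error
```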