How to use linear and quadratic discriminant analysis for binary classification with R?

Linear Discriminant Analysis (LDA)

Linear discriminant analysis (LDA) finds a linear decision boundary that best separates (discriminates) the two classes in your data, under the assumption that each class is normally distributed with a common covariance matrix. To build the LDA model, you provide training data in which the class of each data point is known and supplied as the response variable (Response). Once the model is built, you can use it to predict the class of observations whose response is unknown.

The lda() function in the MASS package comes in handy for building an LDA model, as the code in the next section shows. Before using it, split your dataset into training and test data so that the prediction accuracy of the model can be validated.
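The split itself is not shown in the original code, so here is a minimal sketch of one way to do it. It assumes a two-class subset of the built-in iris data stands in for your own dataset; the 70/30 ratio, the seed, and the name inputData are illustrative choices rather than requirements.

```r
# Sketch: a reproducible 70/30 train/test split.
# Assumption: two iris species stand in for a dataset with a binary Response.
set.seed(100)  # fix the random seed so the split is reproducible

inputData <- droplevels(subset(iris, Species != "setosa"))  # 100 rows, 2 classes
names(inputData)[names(inputData) == "Species"] <- "Response"

# Randomly pick 70% of the row positions for training
trainingIndex <- sample(seq_len(nrow(inputData)), size = floor(0.7 * nrow(inputData)))
trainingData <- inputData[trainingIndex, ]   # rows used to fit the model
testData     <- inputData[-trainingIndex, ]  # held-out rows used for validation
```

Because the test rows are never seen during fitting, the error measured on them is a fairer estimate of how the model behaves on new data.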

Build the model on training data

library(MASS) # load the package
# Fit the LDA model
lda_mod <- lda(Response ~ Pred1 + Pred2, data = trainingData) # note: Response is a binary factor variable
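To see what a fitted model contains, the sketch below fits lda() on a runnable stand-in and inspects a few of its components. It assumes two iris species play the role of the binary response and two petal measurements play the role of Pred1 and Pred2; the names twoClass and fit are illustrative.

```r
library(MASS)

# Assumption: two iris species stand in for a binary Response
twoClass <- droplevels(subset(iris, Species != "setosa"))

fit <- lda(Species ~ Petal.Length + Petal.Width, data = twoClass)

fit$prior    # class proportions, used as prior probabilities by default
fit$means    # per-class mean of each predictor
fit$scaling  # coefficients of the linear discriminant (a single LD1 for two classes)
```

With two classes there is only one discriminant direction, so the scaling matrix has a single column.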

Predict on test data

The model (lda_mod) is now built using the training data. Let's use it to predict on the test data and see how well it does.

predicted <- predict(lda_mod, testData) # lda_mod is the model and testData is the new data on which LDA model is applied.
names(predicted) # display contents of 'predicted'
# [1] "class" "posterior" "x"
prediction_response <- predicted$class # prediction_response contains the needed predictions
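The other two components of the prediction can be useful as well. The sketch below looks at posterior and x, and shows how the posterior probabilities could drive a stricter decision rule than the default. It assumes the same iris stand-in as a binary problem; the 0.8 cutoff and the name strict_class are hypothetical choices for illustration.

```r
library(MASS)

# Assumption: two iris species stand in for a binary Response
twoClass <- droplevels(subset(iris, Species != "setosa"))
fit <- lda(Species ~ Petal.Length + Petal.Width, data = twoClass)

# Predicting on the fitting data here, purely to illustrate the output structure
pred <- predict(fit, twoClass)

head(pred$posterior)  # one column of class probabilities per class
head(pred$x)          # discriminant scores (LD1)

# Hypothetical stricter rule: call "virginica" only when its posterior exceeds 0.8
strict_class <- ifelse(pred$posterior[, "virginica"] > 0.8, "virginica", "versicolor")
table(strict = strict_class, default = pred$class)
```

Raising the cutoff trades missed positives for fewer false alarms, which may matter when the two kinds of error have different costs.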

Cross Validation

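Use the CV = TRUE option in the lda() function to generate jackknifed (leave-one-out) predictions.

lda_mod <- lda(Response ~ Pred1 + Pred2, data = trainingData, CV = TRUE) # jackknife (leave-one-out)

Note that with CV = TRUE, lda() does not return a fitted model. It returns a list holding the leave-one-out predicted classes (class) and posterior probabilities (posterior) for the training observations, so the result cannot be passed to predict(). Refit without CV = TRUE if you need a model for new data.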
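As a rough illustration of how the leave-one-out output can be used, the sketch below computes a confusion matrix and an error rate directly from it. It assumes two iris species stand in for the binary response; the name lda_cv is illustrative.

```r
library(MASS)

# Assumption: two iris species stand in for a binary Response
twoClass <- droplevels(subset(iris, Species != "setosa"))

# With CV = TRUE the result is a plain list, not an "lda" model object
lda_cv <- lda(Species ~ Petal.Length + Petal.Width, data = twoClass, CV = TRUE)
names(lda_cv)  # includes "class" and "posterior"

# Leave-one-out confusion matrix and misclassification rate
table(predicted = lda_cv$class, actual = twoClass$Species)
mean(lda_cv$class != twoClass$Species)
```

Each prediction here comes from a model fitted without that observation, so the error rate is less optimistic than one computed by predicting on the training data itself.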

Create the confusion matrix

test_response <- testData$Response # the actual binary response variable in testData
table(prediction_response, test_response) # confusion matrix
mean(prediction_response != test_response) # misclassification error
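For a version that runs end to end, the sketch below repeats the whole workflow on a stand-in dataset and derives the accuracy from the confusion matrix. It assumes two iris species replace the binary response and a 70/30 split; the names idx, fit, conf_mat, accuracy, and misclass are illustrative.

```r
library(MASS)
set.seed(100)  # reproducible split

# Assumption: two iris species stand in for a binary Response
twoClass <- droplevels(subset(iris, Species != "setosa"))
idx <- sample(seq_len(nrow(twoClass)), size = 70)
trainingData <- twoClass[idx, ]
testData     <- twoClass[-idx, ]

fit <- lda(Species ~ Petal.Length + Petal.Width, data = trainingData)
prediction_response <- predict(fit, testData)$class
test_response <- testData$Species

# Rows are predictions, columns are the actual classes
conf_mat <- table(predicted = prediction_response, actual = test_response)
accuracy <- sum(diag(conf_mat)) / sum(conf_mat)          # share on the diagonal
misclass <- mean(prediction_response != test_response)   # share off the diagonal
accuracy + misclass  # the two rates sum to 1
```

The diagonal of the table counts correct predictions, so accuracy and misclassification error are two views of the same matrix.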

How to implement QDA?

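The method for performing QDA is almost the same as for LDA, except that the model is fitted with the qda() function. Unlike LDA, QDA estimates a separate covariance matrix for each class, which gives a quadratic rather than a linear decision boundary. As with lda(), setting CV = TRUE returns leave-one-out predictions instead of a fitted model.

qda_mod <- qda(Response ~ Pred1 + Pred2, data = trainingData, CV = TRUE)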
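One way to choose between the two methods is to compare their error on the same held-out data. The sketch below does this on a stand-in dataset; it assumes two iris species replace the binary response, and the names lda_fit, qda_fit, lda_err, and qda_err are illustrative. Results on a single small split are only indicative.

```r
library(MASS)
set.seed(100)  # reproducible split

# Assumption: two iris species stand in for a binary Response
twoClass <- droplevels(subset(iris, Species != "setosa"))
idx <- sample(seq_len(nrow(twoClass)), size = 70)
trainingData <- twoClass[idx, ]
testData     <- twoClass[-idx, ]

# Fit both models without CV so that predict() can be used on new data
lda_fit <- lda(Species ~ Petal.Length + Petal.Width, data = trainingData)
qda_fit <- qda(Species ~ Petal.Length + Petal.Width, data = trainingData)

# Misclassification error of each model on the same test set
lda_err <- mean(predict(lda_fit, testData)$class != testData$Species)
qda_err <- mean(predict(qda_fit, testData)$class != testData$Species)
c(LDA = lda_err, QDA = qda_err)
```

QDA has more parameters to estimate, so it tends to need more data per class; when the class covariances are similar, LDA is often the safer choice.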

[Image: Linear and Quadratic Discriminant Analysis]
