# Outlier Detection Methods

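## What you need

The DMwR package, which provides `lofactor()` for the LOF method below. The quartile method and box plots need only base R.

## Quartile Method (the simplest way)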

The quartiles of a ranked set of data values are the three points that divide the data set into four equal groups, each comprising a quarter of the data.

The first quartile (Q1) is the middle value between the smallest value and the median of the data set.

The third quartile (Q3) is the middle value between the median and the largest value of the data set.

The interquartile range (IQR) is the difference between the third and first quartiles: IQR = Q3 - Q1.

A data point n is flagged as an outlier if

`n > Q3 + 1.5 * IQR`

or

`n < Q1 - 1.5 * IQR`
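A minimal base-R sketch of this rule, assuming `x` is a numeric vector. `iqr_outliers` is a hypothetical helper name, and `quantile()` uses R's default quartile definition (type 7), so the fences can differ slightly from quartiles worked out by hand.

```r
# Flag outliers with the 1.5 * IQR rule.
# `iqr_outliers` is a hypothetical helper used for illustration.
iqr_outliers <- function(x, k = 1.5) {
  q <- quantile(x, probs = c(0.25, 0.75), na.rm = TRUE, names = FALSE)
  iqr <- q[2] - q[1]        # same value as IQR(x, na.rm = TRUE)
  lower <- q[1] - k * iqr   # Q1 - 1.5 * IQR
  upper <- q[2] + k * iqr   # Q3 + 1.5 * IQR
  x < lower | x > upper     # TRUE where a value lies outside the fences
}

x <- c(10, 12, 11, 13, 12, 11, 14, 12, 95)
x[iqr_outliers(x)]  # 95
```

Here Q1 = 11 and Q3 = 13, so the fences are 8 and 16 and only 95 falls outside them.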

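The same check can be done in R with a box-and-whisker plot:

`boxplot(data, range = 1.5)`

The bottom of the box is the first quartile and the top is the third quartile.

The whiskers (the lines extending from the box) reach the most extreme data points that lie within `range` times the IQR of the box; any point beyond them is drawn individually as an outlier. The default `range = 1.5` matches the 1.5 * IQR rule above (R draws the box from Tukey's hinges, which can differ slightly from other quartile definitions), and `range = 0` extends the whiskers to the data extremes.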
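The numbers behind the plot can also be read directly with base R's `boxplot.stats()`, whose `coef` argument plays the role of `boxplot()`'s `range`. A short sketch on a made-up vector:

```r
x <- c(10, 12, 11, 13, 12, 11, 14, 12, 95)

# boxplot.stats() returns the numbers a box plot is drawn from;
# its `coef` argument plays the role of boxplot()'s `range`.
s <- boxplot.stats(x, coef = 1.5)
s$stats  # 10 11 12 13 14: lower whisker, lower hinge, median, upper hinge, upper whisker
s$out    # 95: points beyond the whiskers, plotted individually

# Draw the box plot itself; the point in s$out appears as a separate circle.
boxplot(x, range = 1.5)
```

Reading `s$out` is handy when you want the outlier values themselves rather than a picture.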

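## LOF (Local Outlier Factor)

LOF finds outliers from their local neighbourhoods, more specifically from local densities: each point's density is compared with the densities of its k nearest neighbours, and a point lying in a much sparser region than its neighbours gets a high score.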

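To calculate the local outlier factor scores, load the package with `library(DMwR)` and run

`outlier.scores <- lofactor(data, k)`

where `data` holds the numeric variables and `k` is the number of neighbours used to estimate local density.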
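To show what `lofactor()` computes, here is a minimal base-R sketch of LOF. It is not DMwR's implementation: `lof_sketch` is a hypothetical name, distances are Euclidean, and ties at the k-th neighbour are ignored. It follows the usual steps: k-distance, reachability distance, local reachability density, then the ratio of the neighbours' densities to the point's own.

```r
# Minimal Local Outlier Factor sketch in base R (illustration only).
# `lof_sketch` is a hypothetical helper, not DMwR's lofactor();
# it uses Euclidean distance and ignores ties at the k-th neighbour.
lof_sketch <- function(data, k) {
  d <- as.matrix(dist(data))   # pairwise Euclidean distances
  n <- nrow(d)
  stopifnot(k >= 1, k < n)
  diag(d) <- Inf               # a point is not its own neighbour

  # Row i holds the indices of the k nearest neighbours of point i
  nn <- matrix(0L, nrow = n, ncol = k)
  for (i in seq_len(n)) nn[i, ] <- order(d[i, ])[1:k]

  # k-distance: distance from each point to its k-th nearest neighbour
  k_dist <- d[cbind(seq_len(n), nn[, k])]

  # Local reachability density: 1 / mean reachability distance,
  # where reach-dist(p, o) = max(k-distance(o), d(p, o))
  lrd <- vapply(seq_len(n), function(p) {
    o <- nn[p, ]
    1 / mean(pmax(k_dist[o], d[p, o]))
  }, numeric(1))

  # LOF: mean density of the neighbours divided by the point's own density
  vapply(seq_len(n), function(p) mean(lrd[nn[p, ]]) / lrd[p], numeric(1))
}

set.seed(1)
pts <- rbind(matrix(rnorm(40), ncol = 2), c(6, 6))  # 20 clustered points + 1 isolated point
scores <- lof_sketch(pts, k = 5)
which.max(scores)  # 21: the isolated point has the highest score
```

Scores near 1 mean a point is about as dense as its neighbours; clearly larger scores mark outlier candidates. Exact duplicate points give zero distances and infinite densities, which this sketch does not handle.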

To plot the density of the scores:

`plot(density(outlier.scores))`

To find the data points with the highest outlier scores (the higher the score, the more likely the point is an outlier), sort the scores in decreasing order.

For example, the row indices of the top 5 scores:

`outliers <- order(outlier.scores, decreasing = TRUE)[1:5]`

## Winsorization

Winsorization replaces extreme values with the nearest values that are not outliers, instead of removing them, so the number of observations stays the same.
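A minimal sketch, assuming the outliers are the points outside the 1.5 * IQR fences from the quartile method: each one is replaced by the most extreme value that lies inside the fences. `winsorize_iqr` is a hypothetical helper; another common variant caps values at fixed percentiles such as the 5th and 95th.

```r
# Winsorize: pull values outside the 1.5 * IQR fences back to the
# nearest non-outlying value. `winsorize_iqr` is a hypothetical helper.
winsorize_iqr <- function(x, k = 1.5) {
  q <- quantile(x, probs = c(0.25, 0.75), na.rm = TRUE, names = FALSE)
  iqr <- q[2] - q[1]
  inside <- x >= q[1] - k * iqr & x <= q[2] + k * iqr  # non-outliers
  lo <- min(x[inside], na.rm = TRUE)  # smallest non-outlying value
  hi <- max(x[inside], na.rm = TRUE)  # largest non-outlying value
  pmin(pmax(x, lo), hi)               # cap both tails
}

x <- c(10, 12, 11, 13, 12, 11, 14, 12, 95)
winsorize_iqr(x)  # 10 12 11 13 12 11 14 12 14
```

The extreme value 95 becomes 14, the largest value inside the fences, and the vector keeps its original length.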