Outlier Detection Methods

WHAT YOU NEED

The DMwR package, which provides the lofactor function used in the LOF section.
SIMPLEST WAY

Quartile Method

The quartiles of a ranked set of data values are the three points that divide the data set into four equal groups, each comprising a quarter of the data.

The first quartile (Q1) is the middle value between the smallest value and the median of the data set.

The third quartile (Q3) is the middle value between the median and the highest value of the data set.

The interquartile range (IQR) is the difference between the third and first quartiles: IQR = Q3 - Q1.

A value n is treated as an outlier if

n > Q3 + 1.5*IQR

or

n < Q1 - 1.5*IQR
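
The rule above can be sketched in base R as follows. The sample vector x is made up for illustration. Note that quantile() interpolates between data points by default, so its Q1 and Q3 can differ slightly from the "middle value" definition given earlier; for this data both approaches flag the same point.

```r
# Hypothetical sample data: nine ordinary values and one extreme value
x <- c(2, 4, 5, 6, 7, 8, 9, 10, 12, 50)

q1  <- quantile(x, 0.25, names = FALSE)  # first quartile
q3  <- quantile(x, 0.75, names = FALSE)  # third quartile
iqr <- q3 - q1                           # interquartile range, same as IQR(x)

lower <- q1 - 1.5 * iqr                  # lower fence
upper <- q3 + 1.5 * iqr                  # upper fence

# Keep only the values that fall outside the fences
outliers <- x[x < lower | x > upper]
print(outliers)                          # 50
```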

 

R's box-and-whisker plot applies the same rule graphically:

boxplot(data, range = 1.5)

The bottom of the box is the first quartile and the top is the third quartile.

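The whiskers (the dashed lines) extend to the most extreme data point that lies no more than range times the IQR from the box. The default range is 1.5, and points beyond the whiskers are drawn individually as outliers.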
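
As a small illustration of how the range value changes what is flagged, the sketch below calls boxplot() with plot = FALSE so that it only returns the computed statistics; drop that argument to draw the plot. The vector x is hypothetical sample data.

```r
# Hypothetical sample data with one moderate and one extreme value
x <- c(2, 4, 5, 6, 7, 8, 9, 10, 12, 20, 50)

# range = 1.5 (the default): whiskers reach at most 1.5 * IQR from the box
b15 <- boxplot(x, range = 1.5, plot = FALSE)
print(b15$out)   # points beyond the whiskers: 20 50

# range = 3: longer whiskers, so fewer points are flagged
b3 <- boxplot(x, range = 3, plot = FALSE)
print(b3$out)    # 50
```

A larger range makes the rule more conservative: 20 is flagged at 1.5 but falls inside the whiskers at 3.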

LOF (LOCAL OUTLIER FACTOR)

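Outliers are found by comparing the local density around each point with the local densities around its nearest neighbours. A point whose density is much lower than that of its neighbours gets a high score and is a likely outlier.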
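
To make the idea of local density concrete, here is a minimal base R sketch of how such a score can be computed. It is not the DMwR implementation: lof_sketch is a hypothetical helper written for this illustration, it breaks ties at the k-th distance by row order, and it does not handle duplicate points. The test data (a 5 x 5 grid plus one isolated point) is also made up.

```r
# Simplified LOF sketch (assumption: no duplicate rows, ties broken by row order)
lof_sketch <- function(data, k) {
  d <- as.matrix(dist(data))   # pairwise Euclidean distances
  diag(d) <- Inf               # a point is not its own neighbour
  n <- nrow(d)

  # nn[i, ] holds the indices of the k nearest neighbours of point i
  nn <- t(matrix(apply(d, 1, function(row) order(row)[seq_len(k)]), nrow = k))

  # Distance from each point to its k-th nearest neighbour
  kdist <- d[cbind(seq_len(n), nn[, k])]

  # Local reachability density: inverse of the mean reachability distance
  lrd <- vapply(seq_len(n), function(i) {
    reach <- pmax(kdist[nn[i, ]], d[i, nn[i, ]])
    1 / mean(reach)
  }, numeric(1))

  # Score: mean density of the neighbours divided by the point's own density
  vapply(seq_len(n), function(i) mean(lrd[nn[i, ]]) / lrd[i], numeric(1))
}

# A 5 x 5 grid of points plus one isolated point at (10, 10) in row 26
pts <- rbind(expand.grid(x = 0:4, y = 0:4), data.frame(x = 10, y = 10))
scores <- lof_sketch(pts, k = 3)
print(which.max(scores))       # 26, the isolated point
```

Points inside the grid get scores close to 1 because their density matches that of their neighbours, while the isolated point scores far higher.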

To calculate the local outlier factor scores:

outlier.scores <- lofactor(data, k)

where k is the number of neighbours used to estimate the local density.

To plot the density of the scores:

plot(density(outlier.scores))

To find the data points with the highest outlier scores (the greater the score, the greater the chance that the point is an outlier), sort the scores in decreasing order.

For example, the row indices of the five points with the highest scores:

outliers <- order(outlier.scores, decreasing = TRUE)[1:5]

WINSORIZATION

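Winsorization replaces the extreme values with the nearest values that are not outliers, instead of removing them. The size of the data set stays the same, but the extreme values no longer dominate statistics such as the mean.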
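
One way to sketch this in base R is to reuse the 1.5*IQR rule to decide which values are outliers and then pull them in to the nearest non-outlying value. The helper winsorize_iqr and the sample vector are hypothetical; the choice of fences is an assumption, and percentile limits such as the 5th and 95th are another common choice. Missing values are not handled.

```r
# Hypothetical helper: cap values outside the IQR fences at the
# nearest value that lies inside the fences
winsorize_iqr <- function(x, coef = 1.5) {
  q <- quantile(x, c(0.25, 0.75), names = FALSE)
  iqr <- q[2] - q[1]
  lower <- q[1] - coef * iqr
  upper <- q[2] + coef * iqr

  inside <- x[x >= lower & x <= upper]      # the non-outlying values
  pmin(pmax(x, min(inside)), max(inside))   # clamp to their range
}

x <- c(2, 4, 5, 6, 7, 8, 9, 10, 12, 50)
w <- winsorize_iqr(x)
print(w)   # the 50 is replaced by 12, the largest non-outlier
```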
