"Breakout events in a time series can reveal unusual activities that has happened in the past as well as possible forthcoming level shifts and other anomalous behavior in the near future. It helps understand the time series better and can probably tell you where to look, while revealing valuable insights that you have so far been overlooking . ."

### Installation and setup

For this analysis, we are going to use 3 packages that offer facilities to detect breakouts:

- AnomalyDetection (available on Twitter’s GitHub page)
- changepoint
- strucchange

While the *AnomalyDetection* package has its own mechanism for plotting, we will use the *‘autoplot’* function in *ggplot2* along with the *ggfortify* package, which enables *autoplot* to draw time series graphs. The data for this analysis is the annual Australian air passenger series for the years 1970 to 2009, available as the *‘ausair’* time series in the *‘fpp’* package.

```r
# install.packages(c("devtools", "changepoint", "strucchange", "ggplot2", "fpp"))
devtools::install_github("twitter/AnomalyDetection")  # install Twitter's AnomalyDetection
devtools::install_github("sinhrks/ggfortify")

library(AnomalyDetection)
library(changepoint)
library(strucchange)
library(ggplot2)
library(fpp)        # for 'ausair' data
library(ggfortify)  # enable time series in autoplot
```

### Anomaly Detection

```r
myTS <- ausair      # initialise data
myPeriod <- "year"  # set the period

ymth <- paste(start(myTS), collapse = "/")
startDate <- as.Date(paste(ymth, "1", sep = "/"), format = "%Y/%m/%d")  # start date
eymth <- paste(end(myTS), collapse = "/")
endDate <- as.Date(paste(eymth, "1", sep = "/"), format = "%Y/%m/%d")   # end date

Dates <- seq.Date(startDate, endDate, by = myPeriod)  # create the dates
Dates <- as.POSIXct(Dates, tz = "UTC")                 # convert to POSIXct
myData <- data.frame(Dates, myTS = as.numeric(myTS))   # cast as a data.frame

# perform anomaly detection and plot
res <- AnomalyDetectionTs(myData, max_anoms = 0.2, direction = "both", plot = TRUE)
res$plot   # series with the flagged points highlighted
res$anoms  # table of flagged points
```

What happened in the code above? Our aim is to prepare the data in the format required by the *AnomalyDetectionTs* function. Its first argument is a data frame with time stamps in the first column and the values of the time series in the second column. Since the original ‘ausair’ data is a time series object, the steps above convert it into that data frame format (1st column contains time stamps, 2nd contains the data values) before *AnomalyDetectionTs* is applied.
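The same conversion can be sketched more generally with base R's `time()`, which returns each observation's position in fractional years. The helper `ts_to_df()` below is hypothetical (it is not part of AnomalyDetection or fpp) and assumes a regular series with frequency 1, 4 or 12; the toy series only stands in for `ausair`.

```r
# Hypothetical helper: turn a regular ts (frequency 1, 4 or 12) into the
# two-column data frame (POSIXct time stamps, numeric values) that
# AnomalyDetectionTs() expects.
ts_to_df <- function(x) {
  tt  <- as.numeric(time(x))          # fractional years: 1970, 1970.25, ...
  yr  <- floor(tt + 1e-8)             # guard against values like 1970.9999999
  mon <- round((tt - yr) * 12) + 1    # month the period starts in (1 for annual data)
  stamps <- as.POSIXct(sprintf("%d-%02d-01", as.integer(yr), as.integer(mon)),
                       tz = "UTC")
  data.frame(timestamp = stamps, value = as.numeric(x))
}

toy <- ts(c(10, 12, 11, 15), start = 1970)  # toy annual series, not ausair
ts_to_df(toy)
```

Compared with building the dates by pasting `start()` and `end()`, reading them from `time()` keeps each time stamp tied to its observation, so the two columns cannot drift out of step.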

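The key arguments to *AnomalyDetectionTs* are *‘max_anoms’*, the maximum share of data points that may be flagged, given as a fraction (0.2 means 20%), and *‘direction’* (‘pos’, ‘neg’ or ‘both’), which sets whether anomalies are sought above the series, below it, or on both sides. Under the hood, the package uses the Seasonal Hybrid ESD (S-H-ESD) test.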
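To make *‘max_anoms’* and *‘direction’* concrete, here is a simplified sketch of the idea behind hybrid ESD: a generalized ESD test that scores points by their distance from the median, scaled by the MAD, instead of using the mean and standard deviation. This is an illustration under stated assumptions, not the package's code. It skips the seasonal decomposition step that S-H-ESD applies first, and `hybrid_esd()` is a hypothetical name. Here `max_anoms` caps the number of candidates at `floor(n * max_anoms)`, and `direction` decides whether large positive deviations, large negative ones, or both are scored.

```r
# Simplified, non-seasonal hybrid ESD sketch (median/MAD instead of mean/sd).
# hybrid_esd() is a hypothetical helper, not AnomalyDetection's API.
hybrid_esd <- function(x, max_anoms = 0.1, alpha = 0.05,
                       direction = c("both", "pos", "neg")) {
  direction <- match.arg(direction)
  n <- length(x)
  k <- floor(n * max_anoms)        # upper bound on the number of anomalies
  idx <- seq_len(n)                # indices still in the sample
  removed <- integer(0)            # candidates, in order of removal
  num_anoms <- 0
  for (i in seq_len(k)) {
    sigma <- mad(x[idx])           # robust (scaled MAD) spread estimate
    if (sigma == 0) break
    dev <- x[idx] - median(x[idx])
    score <- switch(direction,
                    both = abs(dev),
                    pos  = dev,
                    neg  = -dev) / sigma
    j <- which.max(score)
    removed <- c(removed, idx[j])
    idx <- idx[-j]
    m <- n - i + 1                 # sample size before this removal
    p <- if (direction == "both") 1 - alpha / (2 * m) else 1 - alpha / m
    t <- qt(p, df = m - 2)
    lambda <- (m - 1) * t / sqrt((m - 2 + t^2) * m)  # ESD critical value
    if (score[j] > lambda) num_anoms <- i            # largest i that passes
  }
  removed[seq_len(num_anoms)]
}

x <- c(10, 11, 9, 10, 12, 10, 11, 9, 10, 50, 10, 11, 9, 10, -30)  # toy data
hybrid_esd(x, max_anoms = 0.2, direction = "both")  # positions 10 and 15
hybrid_esd(x, max_anoms = 0.2, direction = "pos")   # position 10 only
```

For the ausair run, 40 annual observations with `max_anoms = 0.2` give a cap of floor(40 * 0.2) = 8 candidates. Assuming the package rounds the same way, this is consistent with the eight rows in the output.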

The function returns a time series plot that highlights the flagged points, along with an $anoms component that lists them. These points seem to say more about where the series is heading than about past events: they all lie on the rising stretch at the end of the series (2002 to 2009), where a breakout appears to have lifted the series to a new level. My guess is that, since direction='both' is set, a series with low-lying points driving sharp moves in the negative direction would have those flagged as breakouts as well.

```
$anoms
   timestamp    anoms
1 2002-01-01 39.02158
2 2003-01-01 41.38643
3 2004-01-01 41.59655
4 2005-01-01 44.65732
5 2006-01-01 46.95177
6 2007-01-01 48.72884
7 2008-01-01 51.48843
8 2009-01-01 50.02697
```

### Changepoint detection

```r
autoplot(cpt.meanvar(myTS), size = 1.5, colour = "firebrick") +
  labs(x = "Date", y = "Total Annual Air Passengers", title = "AusAir - Changepoint") +  # add labels
  theme(plot.title = element_text(size = 20, face = "bold", vjust = 2),  # style the axis and title text
        axis.title.x = element_text(size = 15, vjust = 0.5),
        axis.title.y = element_text(size = 15, vjust = 0.5),
        axis.text.x = element_text(size = 15, vjust = 0.5),
        axis.text.y = element_text(size = 15, vjust = 0.5),
        plot.margin = unit(c(10, 10, 0, 0), "mm"))  # adjust plot margin
```

In the code above, the *‘cpt.meanvar’* function detects change points: points where the mean and/or variance of the series shifts, so that what follows stands out from the rest of the series. With its default settings (method *‘AMOC’*, "at most one change") it looks for a single such point. The main engine of the code is the first line containing *‘autoplot’*; everything from the second line onwards only styles the graph.
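The core of an at-most-one-change search in mean and variance can be sketched in base R. The sketch tries every split point, compares the Normal log-likelihood of two segments (each with its own mean and variance) against a single segment, and accepts the best split only if the gain exceeds a penalty. The helper `amoc_meanvar()` is hypothetical, and the penalty `3 * log(n)` is an assumption for illustration. changepoint's default MBIC penalty and its exact cost function may differ.

```r
# Hypothetical helper: single change in mean and variance (AMOC) under a
# Normal model, found by exhaustive search over split points.
amoc_meanvar <- function(x, penalty = 3 * log(length(x)), minseglen = 2) {
  n <- length(x)
  # Segment cost: n_s * log(MLE variance), i.e. -2 * log-likelihood up to
  # constants that are the same with and without a split.
  seg_cost <- function(s) {
    v <- mean((s - mean(s))^2)
    length(s) * log(max(v, 1e-12))   # floor avoids log(0) on flat segments
  }
  taus <- minseglen:(n - minseglen)  # candidate last index of segment 1
  costs <- sapply(taus, function(k) seg_cost(x[1:k]) + seg_cost(x[(k + 1):n]))
  best <- which.min(costs)
  stat <- seg_cost(x) - costs[best]  # likelihood-ratio style gain of the best split
  list(cpt = if (stat > penalty) taus[best] else NA_integer_, stat = stat)
}

x <- c(rep(c(1, 2), 10), rep(c(10, 14), 10))  # toy series: shift after obs 20
amoc_meanvar(x)$cpt                            # 20
amoc_meanvar(rep(c(1, 2), 20))$cpt             # NA: gain stays below the penalty
```

The penalty is what keeps the search honest. Almost any split lowers the cost slightly, so without a penalty every series would report a change point.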

### Strucchange

```r
bpts <- breakpoints(myTS ~ 1)  # get the breakpoints

autoplot(bpts, ts.colour = "firebrick", size = 1.5, cpt.linetype = "solid") +
  labs(x = "Date", y = "Total Annual Air Passengers", title = "AusAir - Strucchange") +
  theme(plot.title = element_text(size = 20, face = "bold", vjust = 2),
        axis.title.x = element_text(size = 15, vjust = 0.5),
        axis.title.y = element_text(size = 15, vjust = 0.5),
        axis.text.x = element_text(size = 15, vjust = 0.5),
        axis.text.y = element_text(size = 15, vjust = 0.5),
        plot.margin = unit(c(10, 10, 0, 0), "mm"))
```

The *breakpoints* function uses a linear-regression-based approach: it partitions the time series into segments, each with its own regression fit (with the formula `myTS ~ 1`, just a constant level per segment). For a given number of breaks, the optimal breakpoints are computed by dynamic programming based on Bellman's principle of optimality. The main computational effort is computing a triangular RSS matrix, which gives the residual sum of squares for a segment starting at observation i and ending at observation i′, with i < i′. Each breakpoint is reported as the index of the last observation in a segment.
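The dynamic-programming idea can be sketched for the intercept-only model `y ~ 1`, where a segment's RSS is just the sum of squared deviations from its mean. This is a simplified illustration under stated assumptions. The helpers `rss_matrix()` and `optimal_breaks()` are hypothetical, the caller fixes the number of breaks, and the RSS matrix is filled by brute force. strucchange, by contrast, handles general regressors, enforces a minimum segment size `h` (15% of the observations by default) and reports the number of breaks chosen by BIC.

```r
# Hypothetical helpers for the intercept-only model y ~ 1.
# rss[i, j] = residual sum of squares of the segment y[i..j] (i <= j);
# only the upper triangle is used, the rest stays Inf.
rss_matrix <- function(y, h) {
  n <- length(y)
  rss <- matrix(Inf, n, n)
  for (i in seq_len(n)) {
    for (j in i:n) {
      if (j - i + 1 >= h) {          # enforce a minimum segment length h
        s <- y[i:j]
        rss[i, j] <- sum((s - mean(s))^2)
      }
    }
  }
  rss
}

# Bellman recursion: cost[m + 1, j] = smallest total RSS of y[1..j] split into
# m + 1 segments; last[m + 1, j] = end of the previous segment in that split.
optimal_breaks <- function(y, breaks, h = 2) {
  n <- length(y)
  rss <- rss_matrix(y, h)
  cost <- matrix(Inf, breaks + 1, n)
  last <- matrix(NA_integer_, breaks + 1, n)
  cost[1, ] <- rss[1, ]              # no break: a single segment y[1..j]
  for (m in seq_len(breaks)) {
    for (j in seq_len(n)) {
      for (b in seq_len(j - 1)) {    # b = last observation of segment m
        cand <- cost[m, b] + rss[b + 1, j]
        if (cand < cost[m + 1, j]) {
          cost[m + 1, j] <- cand
          last[m + 1, j] <- b
        }
      }
    }
  }
  bp <- integer(breaks)              # backtrack from the end of the series
  j <- n
  for (m in rev(seq_len(breaks))) {
    j <- last[m + 1, j]
    bp[m] <- j
  }
  list(breakpoints = bp, rss = cost[breaks + 1, n])
}

y <- c(rep(1, 5), rep(5, 5), rep(2, 5))   # toy series with two level shifts
optimal_breaks(y, breaks = 2)$breakpoints  # 5 10
```

As in strucchange's output, each reported breakpoint is the last observation of a segment.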

```
Optimal 5-segment partition:

Call:
breakpoints.formula(formula = myTS ~ 1)

Breakpoints at observation number:
9 21 27 33

Corresponding to breakdates:
1978 1990 1996 2002
```
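The breakdates are simply the time stamps of those observation numbers. For an annual series starting in 1970, observation k falls in year 1970 + k - 1, and base R's `time()` gives the same lookup. strucchange's `breakdates()` performs this conversion for a fitted `breakpoints` object. The series below is a placeholder with ausair's start year and length, not the real data.

```r
x <- ts(seq_len(40), start = 1970)  # placeholder: 40 annual observations from 1970
obs <- c(9, 21, 27, 33)             # breakpoints reported by breakpoints()
time(x)[obs]                        # 1978 1990 1996 2002
1970 + obs - 1                      # same arithmetic by hand
```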

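The three methods discussed here approach breakouts in very different ways. *AnomalyDetection* flags individual anomalous points, *changepoint* locates shifts in mean and variance, and *strucchange* partitions the series into segments with separate regression fits. Which method to use depends on the specific objectives of your problem.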