"Cubist is a rule-based machine learning method for regression, known to perform surprisingly well in challenging conditions. It has been tested rigorously for consistency and stability over time and often delivers strong predictive accuracy."

Here is an excerpt from the Cubist package documentation:

Cubist is a rule–based model that is an extension of Quinlan’s M5 model tree. A tree is grown where the terminal leaves contain linear regression models. These models are based on the predictors used in previous splits. Also, there are intermediate linear models at each step of the tree. A prediction is made using the linear regression model at the terminal node of the tree, but is “smoothed” by taking into account the prediction from the linear model in the previous node of the tree (which also occurs recursively up the tree). The tree is reduced to a set of rules, which initially are paths from the top of the tree to the bottom. Rules are eliminated via pruning and/or combined for simplification.
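
To make the "smoothing" step concrete, here is a minimal base-R sketch of the variance-based mixing rule that Kuhn and Johnson's *Applied Predictive Modeling* describes for Cubist. The smoothed prediction is `a * child + (1 - a) * parent`, where the weight `a` is computed from the residuals of the child (current node) model, `e_k`, and of the parent model, `e_p`. The helper names `cubist_mix_weight` and `smooth_prediction` and the toy residuals are hypothetical illustrations, not part of the Cubist package, whose internal code may differ in detail.

```r
# Hypothetical helper: weight `a` placed on the child (current node) model
# when blending it with the parent model, computed from their residuals:
#   a = (Var(e_p) - Cov(e_k, e_p)) / Var(e_p - e_k)
cubist_mix_weight <- function(e_k, e_p) {
  (var(e_p) - cov(e_k, e_p)) / var(e_p - e_k)
}

# Hypothetical helper: smoothed prediction a * child + (1 - a) * parent
smooth_prediction <- function(y_child, y_parent, a) {
  a * y_child + (1 - a) * y_parent
}

# Toy residuals with equal spread and zero correlation,
# so neither model should be favoured
e_child  <- c(1, -1,  1, -1)
e_parent <- c(1,  1, -1, -1)

a <- cubist_mix_weight(e_child, e_parent)
print(a)  # 0.5

print(smooth_prediction(y_child = 22, y_parent = 26, a = a))  # 24
```

When the parent's residuals are larger than the child's, `a` moves above 0.5 and the blend leans toward the child model, which is the intuition behind this weighting.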

#### Step 1: Prepare data and build the model

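```r
# Load the Cubist package and the Boston Housing data from mlbench
library(Cubist)
library(mlbench)

data(BostonHousing)

# chas is stored as a factor with levels "0" and "1"; convert it to numeric 0/1
BostonHousing$chas <- as.numeric(BostonHousing$chas) - 1

# Random 80/20 train/test split (seed fixed for reproducibility)
set.seed(1)
inTrain <- sample(1:nrow(BostonHousing), floor(.8 * nrow(BostonHousing)))

# Column 14 is the outcome, medv (median home value in $1000s);
# the remaining 13 columns are predictors
trainingPredictors <- BostonHousing[ inTrain, -14]
testPredictors     <- BostonHousing[-inTrain, -14]

trainingOutcome <- BostonHousing$medv[ inTrain]
testOutcome     <- BostonHousing$medv[-inTrain]

# Fit the Cubist model on the training data
modelTree <- cubist(x = trainingPredictors, y = trainingOutcome)

# Print a short overview, then the full rule set and attribute usage
modelTree
summary(modelTree)
```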
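
The rules printed by `summary(modelTree)` have the form "if these conditions hold, then the outcome is given by this linear model". As a rough illustration of how such a rule set turns into a prediction, the base-R sketch below encodes two made-up rules over hypothetical predictors `x1` and `x2` and averages the predictions of every rule whose conditions a case satisfies, which is how Cubist is described as combining overlapping rules. The rules, coefficients, `default` argument and the `predict_rules` function are all hypothetical and are not taken from `modelTree`; how the real model handles a case that no rule covers is not modelled here.

```r
# Two hypothetical rules: a condition (returns TRUE/FALSE for a case)
# plus a linear model given by an intercept and named coefficients
rules <- list(
  list(cond = function(d) d$x1 <= 5,
       intercept = 10, coefs = c(x1 = 2, x2 = 0.5)),
  list(cond = function(d) d$x2 > 3,
       intercept = 4,  coefs = c(x1 = 1, x2 = 1))
)

# Hypothetical predictor: evaluate each rule whose condition holds for the
# case and average their linear-model predictions; `default` is returned
# only if no rule applies
predict_rules <- function(rules, case, default = NA_real_) {
  preds <- numeric(0)
  for (r in rules) {
    if (r$cond(case)) {
      x <- unlist(case[names(r$coefs)])
      preds <- c(preds, r$intercept + sum(r$coefs * x))
    }
  }
  if (length(preds) == 0) return(default)
  mean(preds)
}

case_a <- list(x1 = 2, x2 = 4)  # satisfies both rules
case_b <- list(x1 = 3, x2 = 1)  # satisfies only the first rule

predict_rules(rules, case_a)  # (16 + 10) / 2 = 13
predict_rules(rules, case_b)  # 10 + 2 * 3 + 0.5 * 1 = 16.5
```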

#### Step 2: Predict on the test data

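```r
# Predict the held-out 20% of the data
mtPred <- predict(modelTree, testPredictors)

# Test set RMSE: 3.337924
sqrt(mean((mtPred - testOutcome)^2))

# Test set R^2, computed as the squared correlation between predicted
# and observed values: 85%
cor(mtPred, testOutcome)^2
```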
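
The two one-liners above compute the test-set RMSE and the squared Pearson correlation between predictions and observed values. The small base-R sketch below wraps them in hypothetical helpers (`rmse`, `r2_cor`) and, for comparison, adds `r2_det`, the coefficient of determination 1 - SSE/SST. The two R^2 definitions are different quantities: squared correlation ignores a constant bias in the predictions, while 1 - SSE/SST penalises it, as the toy example shows.

```r
# Root mean squared error between predictions and observed outcomes
rmse <- function(pred, obs) sqrt(mean((pred - obs)^2))

# Squared Pearson correlation, as used in Step 2
r2_cor <- function(pred, obs) cor(pred, obs)^2

# Coefficient of determination: 1 - SSE / SST
r2_det <- function(pred, obs) {
  1 - sum((obs - pred)^2) / sum((obs - mean(obs))^2)
}

obs  <- c(10, 20, 30, 40)
pred <- obs + 5  # perfectly correlated with obs, but biased by +5

rmse(pred, obs)    # 5
r2_cor(pred, obs)  # 1 (up to floating-point rounding)
r2_det(pred, obs)  # 1 - 100 / 500 = 0.8
```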