Cubist

"Cubist is a machine learning procedure, known to perform surprisingly well in challenging conditions. It has been tested rigorously for consistency and stability over time, often providing better accuracy."

Here is an excerpt from the Cubist documentation:

Cubist is a rule-based model that is an extension of Quinlan's M5 model tree. A tree is grown where the terminal leaves contain linear regression models. These models are based on the predictors used in previous splits. Also, there are intermediate linear models at each step of the tree. A prediction is made using the linear regression model at the terminal node of the tree, but is "smoothed" by taking into account the prediction from the linear model in the previous node of the tree (which also occurs recursively up the tree). The tree is reduced to a set of rules, which initially are paths from the top of the tree to the bottom. Rules are eliminated via pruning and/or combined for simplification.
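
The excerpt does not spell out the smoothing formula Cubist uses internally. As a rough illustration of the idea, the sketch below shows M5-style smoothing (from Quinlan's earlier model tree), where a leaf's prediction is blended with its parent node's prediction; the function name smooth_prediction and the constant k = 15 are illustrative assumptions, not Cubist internals.

# Illustrative only: M5-style smoothing of a leaf prediction with the
# prediction of its parent node. 'n' is the number of training cases
# reaching the leaf and 'k' is a smoothing constant (15 in M5).
smooth_prediction <- function(p_leaf, p_parent, n, k = 15) {
  (n * p_leaf + k * p_parent) / (n + k)
}

# Example: a leaf predicting 24.1 from 40 training cases, parent predicting 22.5
smooth_prediction(p_leaf = 24.1, p_parent = 22.5, n = 40)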

Step 1: Prepare data and build the model

# Load the Cubist package and the Boston housing data from mlbench
library(Cubist)
library(mlbench)
data(BostonHousing)

# Convert the factor 'chas' to a 0/1 numeric indicator
BostonHousing$chas <- as.numeric(BostonHousing$chas) - 1

# Hold out 20% of the rows as a test set; column 14 is the outcome (medv)
set.seed(1)
inTrain <- sample(1:nrow(BostonHousing), floor(0.8 * nrow(BostonHousing)))
trainingPredictors <- BostonHousing[inTrain, -14]
testPredictors <- BostonHousing[-inTrain, -14]
trainingOutcome <- BostonHousing$medv[inTrain]
testOutcome <- BostonHousing$medv[-inTrain]

# Fit the rule-based model and inspect its rules
modelTree <- cubist(x = trainingPredictors, y = trainingOutcome)
modelTree
summary(modelTree)
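
The call above fits a single rule-based model. The cubist() function also accepts a committees argument, which builds a boosting-like ensemble whose later members correct the errors of earlier ones, with predictions averaged across members. A minimal sketch, reusing the training objects from Step 1; the choice of 5 committees is arbitrary and only for illustration.

# Fit a committee model (an ensemble of rule-based models)
committeeModel <- cubist(x = trainingPredictors, y = trainingOutcome,
                         committees = 5)
summary(committeeModel)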

Step 2: Predict on the test data

# Predict on the held-out rows and evaluate the fit
mtPred <- predict(modelTree, testPredictors)
sqrt(mean((mtPred - testOutcome)^2)) # Test set RMSE: 3.337924
cor(mtPred, testOutcome)^2 # Test set R^2: 0.85
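
Cubist can also adjust the rule-based prediction with an instance-based correction: predict() for Cubist models takes a neighbors argument (values 1 to 9 enable the adjustment). A minimal sketch, reusing modelTree from Step 1; the choice of 5 neighbors here is arbitrary, for illustration only.

# Predict again with a nearest-neighbor adjustment and re-evaluate
mtPredAdj <- predict(modelTree, testPredictors, neighbors = 5)
sqrt(mean((mtPredAdj - testOutcome)^2)) # Test set RMSE with neighbor adjustment
cor(mtPredAdj, testOutcome)^2 # Test set R^2 with neighbor adjustment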
