What is Beta Regression and how to apply it with R?

Beta regression is commonly used to model response variables that take values strictly between 0 and 1, typically when each value of the Y variable is a proportion or rate, such as the share of individuals in a subset of the total population. The model assumes that Y follows a beta distribution. Because the variance of a beta-distributed variable depends on its mean, the model naturally accommodates the heteroskedasticity that proportion data usually show.
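To make the heteroskedasticity point concrete, here is a minimal base R sketch that simulates beta-distributed values under the mean-precision parameterization (mean mu, precision phi) used in beta regression. The helper names rbeta_mu_phi and beta_var, and the chosen values of mu and phi, are illustrative assumptions and not part of the original example.

```r
# Draw from a beta distribution given mean mu and precision phi
# (hypothetical helper: shape1 = mu * phi, shape2 = (1 - mu) * phi)
rbeta_mu_phi <- function(n, mu, phi) {
  rbeta(n, shape1 = mu * phi, shape2 = (1 - mu) * phi)
}

# Theoretical variance under this parameterization
beta_var <- function(mu, phi) mu * (1 - mu) / (1 + phi)

set.seed(42)
phi <- 20                    # assumed precision, identical for every group
mus <- c(0.05, 0.25, 0.50)   # assumed group means

sim_var  <- sapply(mus, function(m) var(rbeta_mu_phi(1e5, m, phi)))
theo_var <- beta_var(mus, phi)

# The variance changes with the mean even though phi is held constant
print(data.frame(mu = mus, simulated = sim_var, theoretical = theo_var))
```

With the precision held fixed, the variance is smallest near the boundaries and largest at a mean of 0.5, which is exactly the pattern an ordinary linear model with constant error variance cannot represent.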

Some examples of Y variables where beta regression would be appropriate:

1. From GasolineYield data: Proportion of crude oil converted to gasoline after distillation and fractionation
2. Proportion of individuals infected with ‘xyz’ when exposed to various levels of artificial preservative agent 898D.
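Proportions built from counts, as in the second example, can be exactly 0 or 1, while the classical beta model assumes values strictly inside (0, 1). One commonly used workaround is to squeeze the values slightly towards 0.5 with (y * (n - 1) + 0.5) / n, where n is the sample size. The sketch below applies it in base R; the counts and the helper name squeeze are made up for illustration.

```r
# Hypothetical counts: infected individuals out of those exposed, per group
infected <- c(0, 3, 7, 12, 20)
exposed  <- c(20, 20, 20, 20, 20)
prop <- infected / exposed   # contains an exact 0 and an exact 1

# Squeeze towards 0.5 so every value lies strictly inside (0, 1)
squeeze <- function(y) {
  n <- length(y)             # sample size
  (y * (n - 1) + 0.5) / n
}

prop_adj <- squeeze(prop)
print(prop_adj)
```

The adjustment keeps the ordering of the observations and only matters noticeably at the boundaries; whether it is appropriate depends on the data at hand.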

Example: Gasoline Yield

The code below fits a beta regression to the GasolineYield data from the betareg package, holding out the last row as a one-observation test set.

library(betareg)
data("GasolineYield", package = "betareg")  # load the example data
inputData <- GasolineYield  # plug in your own data here
trainingIndex <- seq_len(nrow(inputData) - 1)  # row indices of training data (all rows except the last)
trainingData <- inputData[trainingIndex, ]  # training data
testData <- inputData[-trainingIndex, ]  # test data (the held-out last row)
betaMod <- betareg(yield ~ batch + temp, data = trainingData)  # fit the model; substitute your own variable names
summary(betaMod)  # model summary
predict(betaMod, testData)  # predict on test data (0.19 vs actual 0.18)

# Output of summary() for the same model fitted to the full GasolineYield data
Call:
betareg(formula = yield ~ batch + temp, data = GasolineYield)

Standardized weighted residuals 2:
    Min      1Q  Median      3Q     Max 
-2.8750 -0.8149  0.1601  0.8384  2.0483 

Coefficients (mean model with logit link):
              Estimate Std. Error z value Pr(>|z|)    
(Intercept) -6.1595710  0.1823247 -33.784  < 2e-16 ***
batch1       1.7277289  0.1012294  17.067  < 2e-16 ***
batch2       1.3225969  0.1179020  11.218  < 2e-16 ***
batch3       1.5723099  0.1161045  13.542  < 2e-16 ***
batch4       1.0597141  0.1023598  10.353  < 2e-16 ***
batch5       1.1337518  0.1035232  10.952  < 2e-16 ***
batch6       1.0401618  0.1060365   9.809  < 2e-16 ***
batch7       0.5436922  0.1091275   4.982 6.29e-07 ***
batch8       0.4959007  0.1089257   4.553 5.30e-06 ***
batch9       0.3857930  0.1185933   3.253  0.00114 ** 
temp         0.0109669  0.0004126  26.577  < 2e-16 ***

Phi coefficients (precision model with identity link):
      Estimate Std. Error z value Pr(>|z|)    
(phi)    440.3      110.0   4.002 6.29e-05 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 

Type of estimator: ML (maximum likelihood)
Log-likelihood:  84.8 on 12 Df
Pseudo R-squared: 0.9617
Number of iterations: 51 (BFGS) + 3 (Fisher scoring)
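Because the mean model uses a logit link, the coefficients act on the log-odds scale rather than directly on the yield. As a rough sketch of how to read them, the base R snippet below plugs a few estimates from the summary output above into the inverse logit by hand. The temperature of 300 is an arbitrary illustrative value, and the variable names are hypothetical.

```r
# Coefficients copied from the summary output above
b0       <- -6.1595710   # intercept (the batch level omitted from the table is the reference)
b_batch1 <-  1.7277289   # shift for batch 1 on the logit scale
b_temp   <-  0.0109669   # effect of temp on the logit scale
phi      <-  440.3       # precision estimate

temp_new <- 300          # assumed temperature, for illustration only

eta  <- b0 + b_batch1 + b_temp * temp_new   # linear predictor (log-odds)
mu   <- plogis(eta)                         # inverse logit gives the expected yield
sd_y <- sqrt(mu * (1 - mu) / (1 + phi))     # standard deviation implied by mu and phi

odds_ratio_temp <- exp(b_temp)  # multiplicative change in the odds per unit of temp

print(c(mu = mu, sd = sd_y, odds_ratio_temp = odds_ratio_temp))
```

In practice predict() does the same calculation for you; working through it once by hand mainly helps to see that a one-unit increase in temp multiplies the odds of yield by exp(0.0109669), and that a large phi implies little scatter around the fitted mean.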

http://cran.r-project.org/web/packages/betareg/vignettes/betareg.pdf
