# What is Beta Regression and how to apply it with R?

Beta regression is commonly used to model a response variable that takes values strictly between 0 and 1, typically a proportion or rate, such as the share of individuals in a population who have some characteristic. The model assumes that the response follows a beta distribution. Because the variance of a beta-distributed variable depends on its mean, beta regression accommodates the heteroskedasticity that proportion data usually show.
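For reference, here is a sketch of the mean-precision form of the beta distribution that beta regression is usually built on. The symbols $\mu$ (mean), $\phi$ (precision), $x_i$ (covariates) and $\beta$ (coefficients) are notation introduced here for illustration; they do not appear in the text above.

```latex
\begin{align}
f(y;\mu,\phi) &= \frac{\Gamma(\phi)}{\Gamma(\mu\phi)\,\Gamma\big((1-\mu)\phi\big)}\,
                 y^{\mu\phi-1}(1-y)^{(1-\mu)\phi-1}, \qquad 0 < y < 1 \\
\mathrm{E}(y) &= \mu, \qquad \mathrm{Var}(y) = \frac{\mu(1-\mu)}{1+\phi} \\
\log\frac{\mu_i}{1-\mu_i} &= x_i^{\top}\beta \qquad \text{(logit link for the mean)}
\end{align}
```

The variance shrinks as $\mu$ approaches 0 or 1 and as $\phi$ grows, so the spread of the response changes with its mean instead of being constant.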

### Some examples of Y variables where beta regression would be appropriate

1. From the GasolineYield data: proportion of crude oil converted to gasoline after distillation and fractionation.
2. Proportion of individuals infected with ‘xyz’ when exposed to various levels of artificial preservative agent 898D.

### Example: Gasoline Yield

The example below fits a beta regression to the GasolineYield data from the betareg package.

```
library(betareg)
data("GasolineYield", package = "betareg")  # initialize data
inputData <- GasolineYield  # plug in your data here
trainingIndex <- c(1:(nrow(inputData)-1))  # row indices of training data (all but the last row)
trainingData <- inputData[trainingIndex, ]  # training data
testData <- inputData[-trainingIndex, ]  # test data
betaMod <- betareg(yield ~ batch + temp, data = trainingData)  # train model; adjust variable names for your data
summary(betaMod)  # model summary
predict(betaMod, testData)  # predict on test data (0.19 vs actual 0.18)
```
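To make the hold-out check explicit, the sketch below puts the predicted and actual yield for the held-out row side by side. It assumes the betareg package is installed; the names `comparison`, `predicted` and `linkPred` are illustrative and not part of the original example.

```r
library(betareg)
data("GasolineYield", package = "betareg")

# Same split as above: every row except the last is used for training
trainingIndex <- seq_len(nrow(GasolineYield) - 1)
trainingData <- GasolineYield[trainingIndex, ]
testData <- GasolineYield[-trainingIndex, ]

betaMod <- betareg(yield ~ batch + temp, data = trainingData)

# Predictions on the response scale (a proportion) and on the logit (link) scale
predicted <- predict(betaMod, newdata = testData, type = "response")
linkPred <- predict(betaMod, newdata = testData, type = "link")

# Side-by-side comparison for the held-out row
comparison <- data.frame(
  actual = testData$yield,
  predicted = predicted,
  abs_error = abs(testData$yield - predicted)
)
print(comparison)
```

A single held-out row gives only a rough sanity check; with more data, a larger test set or cross-validation would say more about predictive accuracy.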

```
# summary(betaMod)
# Note: the Call line shows this output comes from a fit on the full GasolineYield data
Call:
betareg(formula = yield ~ batch + temp, data = GasolineYield)

Standardized weighted residuals 2:
    Min      1Q  Median      3Q     Max
-2.8750 -0.8149  0.1601  0.8384  2.0483

Coefficients (mean model with logit link):
              Estimate Std. Error z value Pr(>|z|)
(Intercept) -6.1595710  0.1823247 -33.784  < 2e-16 ***
batch1       1.7277289  0.1012294  17.067  < 2e-16 ***
batch2       1.3225969  0.1179020  11.218  < 2e-16 ***
batch3       1.5723099  0.1161045  13.542  < 2e-16 ***
batch4       1.0597141  0.1023598  10.353  < 2e-16 ***
batch5       1.1337518  0.1035232  10.952  < 2e-16 ***
batch6       1.0401618  0.1060365   9.809  < 2e-16 ***
batch7       0.5436922  0.1091275   4.982 6.29e-07 ***
batch8       0.4959007  0.1089257   4.553 5.30e-06 ***
batch9       0.3857930  0.1185933   3.253  0.00114 **
temp         0.0109669  0.0004126  26.577  < 2e-16 ***

Phi coefficients (precision model with identity link):
      Estimate Std. Error z value Pr(>|z|)
(phi)    440.3      110.0   4.002 6.29e-05 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Type of estimator: ML (maximum likelihood)
Log-likelihood:  84.8 on 12 Df
Pseudo R-squared: 0.9617
Number of iterations: 51 (BFGS) + 3 (Fisher scoring)
```
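The coefficients in the summary are on the logit scale, so they need to be transformed back before they can be read as proportions. The sketch below does this by hand in base R, using the estimates printed above. The input values (batch 1, `temp` of 400) are assumptions chosen only for illustration, and the variable names are hypothetical.

```r
# Estimates copied from the summary output above (mean model, logit scale)
intercept <- -6.1595710
b_batch1 <- 1.7277289
b_temp <- 0.0109669
phi <- 440.3  # precision estimate from the summary

# Illustrative input (an assumption, not a row from the data): batch 1, temp = 400
temp_value <- 400

# Linear predictor on the logit scale
eta <- intercept + b_batch1 + b_temp * temp_value

# The inverse logit maps the linear predictor back to a proportion in (0, 1)
mu_hat <- plogis(eta)

# Implied variance of the response under the mean-precision beta model
var_hat <- mu_hat * (1 - mu_hat) / (1 + phi)

# Multiplicative change in the odds of conversion for a 10-unit rise in temp
odds_ratio_10 <- exp(b_temp * 10)

print(c(eta = eta, mu_hat = mu_hat, sd_hat = sqrt(var_hat), odds_ratio_10 = odds_ratio_10))
```

Working on the odds scale keeps the interpretation of `temp` constant across batches, whereas its effect on the proportion itself depends on where the linear predictor sits on the logistic curve.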

http://cran.r-project.org/web/packages/betareg/vignettes/betareg.pdf