How to test a regression model for heteroscedasticity and if present, how to correct it?

Homoscedasticity, the absence of heteroscedasticity, is one of the assumptions of linear regression: the variance of the residuals in the fitted model should stay constant and, in particular, should not increase as the fitted values increase. We are about to learn how to test for the presence of heteroscedasticity and, if found, what measures can be taken to correct it.

Let’s begin by building a simple linear regression model that tries to explain ‘dist’ from the built-in ‘cars’ dataset. The predictor in this case is ‘speed’, and the linear model object that is built is tested for heteroscedasticity using the ‘bptest()’ function from the ‘lmtest’ package. Along with this, we also use ‘gvlma()’, which tests whether the assumptions of linear regression hold for an ‘lm’ object.

Build the initial model and test for heteroscedasticity

# initialize the packages
library(e1071)
library(caret) # for box-cox transformation
library(lmtest) # bptest for testing heteroscedasticity
library(gvlma) # global validation of linear model assumptions

Let’s now build the model and perform bptest on the object created.
lmMod <- lm(dist ~ speed, data=cars) # initial model
bptest(lmMod) # Breusch-Pagan test

        studentized Breusch-Pagan test

data:  lmMod
BP = 3.2149, df = 1, p-value = 0.07297

The null hypothesis of the Breusch-Pagan test is that the variance of the residuals is constant. A p-value below 0.05 would let us reject that null and conclude that heteroscedasticity is present. Here the p-value is 0.073, so at the 5% level the test by itself gives only weak, borderline evidence against constant variance.
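If you prefer to read that decision rule straight from code, here is a small sketch using the bptest object fitted above:

bp <- bptest(lmMod)   # studentized Breusch-Pagan test
bp$p.value            # 0.07297
bp$p.value < 0.05     # FALSE: we cannot reject constant variance at the 5% level

The picture is clearer when we run a global validation of linear model assumptions (gvlma) on the lm object, which checks heteroscedasticity along with the other assumptions.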

gvlma(lmMod) # validate if the assumptions of linear regression hold true

Call:
 gvlma(x = lmMod) 

                    Value  p-value                   Decision
Global Stat        15.801 0.003298 Assumptions NOT satisfied!
Skewness            6.528 0.010621 Assumptions NOT satisfied!
Kurtosis            1.661 0.197449    Assumptions acceptable.
Link Function       2.329 0.126998    Assumptions acceptable.
Heteroscedasticity  5.283 0.021530 Assumptions NOT satisfied!

Graphical interpretation

As suspected, gvlma reports that the heteroscedasticity assumption is NOT satisfied. Now let’s plot the model diagnostics and see how this looks as well.
par(mfrow=c(2,2)) # 4 charts in 1 panel
plot(lmMod)

Original linear regression model with heteroscedasticity

The graph in the top-left of the panel reveals a bit of a ‘pattern’ in the residuals: the curvy smoothing line suggests the variance is not completely random. So the errors are definitely not so ‘innocent’. They appear to know something (if only a little) about the dependent variable, so much so that the errors themselves could be used as a predictor when modelling the ‘dist’ variable. In other words, the errors (read: residuals) are not completely random and potentially contain some information (read: variance) that could in turn be used to predict the dependent variable. This is actually one of the ways we could overcome the problem of heteroscedasticity.

How to correct for the presence of heteroscedasticity?

  • By re-building the model with the created error term (lmMod$residuals) as one of the predictors
  • By transforming the dependent variable

Using the Box-Cox transformation

For this example, we will proceed with the second method to fix the model and predict ‘dist’. The transformation we will use in this case is the Box-Cox transformation.
Once the optimal ‘lambda’ value is found, we will use it to transform the dependent variable and build a model with the newly transformed variable as the dependent.

distBCMod <- BoxCoxTrans(cars$dist)
print(distBCMod)

Box-Cox Transformation

50 data points used to estimate Lambda

Input data summary:
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
   2.00   26.00   36.00   42.98   56.00  120.00 

Largest/Smallest: 60 
Sample Skewness: 0.759 

Estimated Lambda: 0.5

As seen in the last line, the estimated value of lambda is 0.5. So, how is this lambda helpful, and how is the transformation actually applied? Simple: each data point in the dependent variable is raised to the power of ‘lambda’, then 1 is subtracted from that number, and finally the result is divided by ‘lambda’ itself.

transformed var = (y^lambda - 1) / lambda
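For instance, a quick manual check of this formula (assuming the lambda of 0.5 estimated above) reproduces the values that BoxCoxTrans gives:

lambda <- 0.5                                   # estimated lambda from the output above
dist_manual <- (cars$dist^lambda - 1) / lambda  # apply the transformation by hand
head(dist_manual)                               # 0.8284271 4.3245553 2.0000000 ...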
But we don’t have to do this manually; it can be done using the ‘BoxCoxTrans’ function from the caret package, as you just saw. Now let’s append this new transformed variable to the cars data and re-build the model.

cars <- cbind(cars, dist_new=predict(distBCMod, cars$dist)) # append the transformed variable to cars
head(cars) # view the top 6 rows

  speed dist  dist_new
1     4    2 0.8284271
2     4   10 4.3245553
3     7    4 2.0000000
4     7   22 7.3808315
5     8   16 6.0000000
6     9   10 4.3245553

Build the new regression model and test for heteroscedasticity

We will use the dist_new variable above to build the regression model.
lmMod_bc <- lm(dist_new ~ speed, data=cars)
bptest(lmMod_bc)

        studentized Breusch-Pagan test

data:  lmMod_bc
BP = 0.011192, df = 1, p-value = 0.9157

gvlma(lmMod_bc) # checking assumptions

Call:
lm(formula = dist_new ~ speed, data = cars)

Coefficients:
(Intercept)        speed  
     0.5541       0.6448  

ASSESSMENT OF THE LINEAR MODEL ASSUMPTIONS
USING THE GLOBAL TEST ON 4 DEGREES-OF-FREEDOM:
Level of Significance =  0.05 

Call:
 gvlma(x = lmMod_bc) 

                     Value p-value                Decision
Global Stat        3.03592 0.55183 Assumptions acceptable.
Skewness           2.81284 0.09351 Assumptions acceptable.
Kurtosis           0.04915 0.82455 Assumptions acceptable.
Link Function      0.09269 0.76078 Assumptions acceptable.
Heteroscedasticity 0.08124 0.77563 Assumptions acceptable.

From the above output, as well as from the bptest() result (p-value = 0.9157), we can verify that the heteroscedasticity problem has effectively been resolved. Let’s verify graphically as well.
plot(lmMod_bc) # diagnostic plots for the transformed model
Transformed linear regression model - no heteroscedasticity

In the top-left plot, the residuals now appear random and the smoothing line is much straighter, which implies that the variance of the residuals no longer changes as the fitted values increase. In other words, the heteroscedasticity has been removed. Case resolved!

NOTE: An important point to note is that the model you have just built uses a transformed dependent variable. Since this new variable is on a different scale compared to the original ‘dist’, any predictions made with it need to be back-transformed by reversing the Box-Cox transformation, i.e. by multiplying by lambda, adding 1, and then raising the result to the power of 1/lambda. I will leave this as a fun exercise to try out on your own 🙂
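If you want to check your answer afterwards, here is one rough sketch of that back-transformation, assuming the lambda of 0.5 and the lmMod_bc model from above:

lambda <- 0.5                                  # estimated lambda from BoxCoxTrans
pred_bc <- predict(lmMod_bc, newdata=cars)     # predictions on the Box-Cox scale
pred_dist <- (lambda * pred_bc + 1)^(1/lambda) # back to the original scale of dist
head(pred_dist)                                # predictions in the original units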
