Dirichlet Regression

"Dirichlet regression can be used for modelling 'compositional data', when the dependent-Y variable is practically a sum total of contribution from multiple Y components."

Dirichlet regression can be used to predict the ratios in which a sum total X (a demand, forecast or estimate) is distributed among the component Ys. In practice, this is a case where there are multiple dependent Y variables and one predictor variable X, whose total is split among the Ys.

A few possible real-world examples:

1. The dependent Y variable to be predicted, the total demand for a product of a multinational organization, is the sum of the demand for that product across the organization's factories. We are interested in both the total demand and the factory-wise split.

2. The demand for a product is the sum of the demand for four different variants of the same product.

In either case, the dependent Y variables, which are the contributions from each component, should be converted to fractions summing up to 1. It is the job of DirichReg() to predict these fractions when the sum total (X) is known.
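
As a minimal sketch of that conversion, assume a hypothetical data frame of raw demand for three factories (the column names and numbers below are made up for illustration). Dividing each row by its total gives fractions that sum to 1, and multiplying the fractions by the total recovers the component amounts:

```r
# Hypothetical raw demand per factory (illustrative numbers only)
demand <- data.frame(
  factory_a = c(120, 80, 150),
  factory_b = c(60, 40, 90),
  factory_c = c(20, 80, 60)
)

# Total demand per row, i.e. the sum total across components
total <- rowSums(demand)

# Divide each row by its total to get the component fractions
fractions <- demand / total

# Each row of fractions should now sum to 1
print(rowSums(fractions))

# Going back: fractions times the total recovers the component amounts
recovered <- fractions * total
print(recovered)
```

DR_data() should also normalize rows that do not already sum to 1 (with a warning), but doing the conversion explicitly keeps the step visible.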

The code shown below can model, predict and visualize multiple Y variables.

Step 1: Prepare the data

Prepare the training and test samples, then build the Dirichlet regression data object for the Ys with DR_data().

library(DirichletReg)
set.seed(100)  # make the random split reproducible (any seed will do)
inputData <- ArcticLake  # plug in your data here
train <- sample(seq_len(nrow(inputData)), round(0.7 * nrow(inputData)))  # row indices of the 70% training sample
inputData_train <- inputData[train, ]  # training data
inputData_test <- inputData[-train, ]  # test data
inputData$Y <- DR_data(inputData[, 1:3])  # prepare the Y's (columns 1 to 3 hold the components)
inputData_train$Y <- DR_data(inputData_train[, 1:3])
inputData_test$Y <- DR_data(inputData_test[, 1:3])

Step 2: Train the model

# Train the models. Modify the predictors as needed.
res1 <- DirichReg(Y ~ depth + I(depth^2), inputData_train)  # common parametrization; modify the predictors and input data here
res2 <- DirichReg(Y ~ depth + I(depth^2) | depth, inputData_train, model = "alternative")  # alternative parametrization
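
The two calls above use different parametrizations. Going by the DirichletReg documentation, the default ("common") models each Dirichlet parameter alpha directly, while model="alternative" models the expected proportions (mu) and a precision (phi) separately, with the part of the formula after | describing the precision. The base R sketch below uses made-up numbers and does not call DirichReg(); it only illustrates how the two sets of quantities relate (alpha = mu * phi):

```r
# Hypothetical Dirichlet parameters for three components (made-up values)
alpha <- c(sand = 2, silt = 5, clay = 3)

# Precision: the sum of the alphas
phi <- sum(alpha)

# Expected proportions: each alpha divided by the precision
mu <- alpha / phi

print(mu)   # expected share of each component
print(phi)  # a larger phi means less spread around mu

# Converting back from (mu, phi) to alpha
alpha_back <- mu * phi
print(alpha_back)
```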

Step 3: Fit the training data and forecast

# Predict on the training data: fitted values
predict(res1)  # model 1 fit
predict(res2)  # model 2 fit
resid(res1)  # residuals of model 1
# Predict on the test data (forecast)
predicted_res1 <- predict(res1, inputData_test)  # model 1
predicted_res2 <- predict(res2, inputData_test)  # model 2
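
One simple way to judge the forecast is to compare the predicted and observed proportions directly. The sketch below assumes two hypothetical matrices, predicted and actual, with one row per test observation and one column per component (in the code above these would correspond roughly to predicted_res1 and inputData_test[, 1:3]). The numbers are made up, and mean absolute error is only one possible choice of metric:

```r
# Hypothetical observed proportions for three test rows (made-up values)
actual <- matrix(c(0.70, 0.20, 0.10,
                   0.40, 0.40, 0.20,
                   0.10, 0.50, 0.40),
                 ncol = 3, byrow = TRUE,
                 dimnames = list(NULL, c("sand", "silt", "clay")))

# Hypothetical predicted proportions for the same rows
predicted <- matrix(c(0.60, 0.30, 0.10,
                      0.50, 0.30, 0.20,
                      0.20, 0.50, 0.30),
                    ncol = 3, byrow = TRUE,
                    dimnames = list(NULL, c("sand", "silt", "clay")))

# Mean absolute error per component
mae_by_component <- colMeans(abs(predicted - actual))

# Overall mean absolute error across all rows and components
mae_overall <- mean(abs(predicted - actual))

print(mae_by_component)
print(mae_overall)
```

Running the same comparison for both models gives a quick, if rough, indication of which one forecasts the split better on held-out data.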

Step 4: Visualize results

# Plot
plot(DR_data(predicted_res2))  # plot the model 2 predictions for the test data
plot(DR_data(inputData_test$Y))  # plot the actual test data
# additional plots

Figure: A Dirichlet plot in R
