Question 1 Create a multiple regression forecasting model that includes a trend
Question
Question 1
Create a multiple regression forecasting model that includes a trend component and daily seasonality dummy variables. Use Saturday as your base day. Using this estimated regression equation, make a forecast for all 28 days for which you were given data. What is the RMSE over those forecasts? (Report to 3 digits to the right of the decimal.)
Day  Week  Sales Volume (US$ 000s)
Sun  1     9
Mon  1     17
Tue  1     15
Wed  1     19
Thu  1     14
Fri  1     16
Sat  1     8
Sun  2     13
Mon  2     19
Tue  2     15
Wed  2     22
Thu  2     15
Fri  2     20
Sat  2     13
Sun  3     14
Mon  3     20
Tue  3     18
Wed  3     23
Thu  3     19
Fri  3     23
Sat  3     14
Sun  4     18
Mon  4     21
Tue  4     23
Wed  4     24
Thu  4     19
Fri  4     23
Sat  4     15
Explanation / Answer
We will analyse this in R, the open-source statistical package.
The complete R snippet is as follows:
###
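# read the data into an R data frame (the code below expects columns Day, Week and Sales)
# forward slashes are used in the path because "\U" is an invalid escape in an R string
data.df <- read.csv("C:/Users/586645/Downloads/Chegg/salesday.csv", header = TRUE)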
str(data.df)
# create the day-of-week dummy variables; Saturday gets no dummy, so it is the base day
data.df$sun<- ifelse(data.df$Day=="Sun",1,0)
data.df$mon<- ifelse(data.df$Day=="Mon",1,0)
data.df$tue<- ifelse(data.df$Day=="Tue",1,0)
data.df$wed<- ifelse(data.df$Day=="Wed",1,0)
data.df$thu<- ifelse(data.df$Day=="Thu",1,0)
data.df$fri<- ifelse(data.df$Day=="Fri",1,0)
# drop the day variable
data.df <- data.df[,-1]
# perform regression: Week is the trend term, the six dummies capture daily seasonality
fit <- lm(Sales~., data=data.df)
# summary
summary(fit)
# create the prediction data set: predictors only (column 2, Sales, is dropped)
newdata<- data.df[,-2]
# use the predict function to get the predicted sales values
p<- predict(fit,newdata = newdata)
# rmse is calculated as
sqrt( mean( (data.df$Sales-p)^2 , na.rm = TRUE ) )
###
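The snippet above depends on a local CSV file that is not part of the question. As a minimal sketch, assuming the data are typed in directly from the question's table, the same model can be fitted without the file. The names `sales.df`, `fit2`, `pred` and `rmse` are introduced here for illustration only, and `relevel()` is swapped in for the hand-built dummies; it makes Saturday the reference level, which gives the same fit.

```r
# Sales (US$ 000s) typed in from the question's table: Sun..Sat for weeks 1 to 4
days  <- c("Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat")
sales <- c( 9, 17, 15, 19, 14, 16,  8,
           13, 19, 15, 22, 15, 20, 13,
           14, 20, 18, 23, 19, 23, 14,
           18, 21, 23, 24, 19, 23, 15)

sales.df <- data.frame(
  Day   = factor(rep(days, times = 4), levels = days),
  Week  = rep(1:4, each = 7),
  Sales = sales
)

# Make Saturday the reference level so lm() creates dummies for the other six days
sales.df$Day <- relevel(sales.df$Day, ref = "Sat")

# Trend (Week) plus daily seasonality (Day), the same structure as the model above
fit2 <- lm(Sales ~ Week + Day, data = sales.df)

# In-sample forecasts for all 28 days and their RMSE
pred <- fitted(fit2)
rmse <- sqrt(mean((sales.df$Sales - pred)^2))
round(rmse, 3)
```

Letting `lm()` expand the factor avoids six near-identical `ifelse()` lines, at the cost of coefficient names such as `DaySun` instead of `sun`.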
The results are:
> summary(fit)
Call:
lm(formula = Sales ~ ., data = data.df)
Residuals:
     Min       1Q   Median       3Q      Max
-1.68571 -0.84643  0.06429  0.84643  2.05714
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   7.1786     0.8107   8.855 2.35e-08 ***
Week          2.1286     0.2093  10.169 2.39e-09 ***
sun           1.0000     0.8757   1.142    0.267
mon           6.7500     0.8757   7.708 2.06e-07 ***
tue           5.2500     0.8757   5.995 7.32e-06 ***
wed           9.5000     0.8757  10.849 7.90e-10 ***
thu           4.2500     0.8757   4.853 9.63e-05 ***
fri           8.0000     0.8757   9.136 1.41e-08 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 1.238 on 20 degrees of freedom
Multiple R-squared: 0.9368, Adjusted R-squared: 0.9146
F-statistic: 42.32 on 7 and 20 DF, p-value: 1.252e-10
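Written out, the coefficient table gives the estimated regression equation used for the forecasts. The block below is only a restatement of the estimates above, with two forecasts worked by hand from the rounded coefficients as a check; the symbol \hat{y} for the forecast is notation added here.

```latex
\hat{y} = 7.1786 + 2.1286\,\text{Week} + 1.0000\,\text{sun} + 6.7500\,\text{mon}
        + 5.2500\,\text{tue} + 9.5000\,\text{wed} + 4.2500\,\text{thu} + 8.0000\,\text{fri}

% Sunday, week 1 (sun = 1, all other dummies 0); actual sales = 9
\hat{y} = 7.1786 + 2.1286(1) + 1.0000 = 10.3072, \qquad 9 - 10.3072 = -1.3072

% Saturday, week 4 (base day, all dummies 0); actual sales = 15
\hat{y} = 7.1786 + 2.1286(4) = 15.6930, \qquad 15 - 15.6930 = -0.6930
```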
The RMSE is calculated as:
> sqrt( mean( (data.df$Sales-p)^2 , na.rm = TRUE ) )
[1] 1.046617
Reported to three digits to the right of the decimal, the RMSE is 1.047.
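The RMSE is smaller than the residual standard error of 1.238 reported by `summary(fit)` because the two divide the same sum of squared residuals by different counts: 28 observations for the RMSE, 20 degrees of freedom for the residual standard error. A short sketch of that relationship, using only the figures printed above (the symbol s for the residual standard error is notation added here):

```latex
\text{RMSE} = \sqrt{\frac{1}{28}\sum_{t=1}^{28}\left(y_t - \hat{y}_t\right)^2},
\qquad
s = \sqrt{\frac{1}{20}\sum_{t=1}^{28}\left(y_t - \hat{y}_t\right)^2} = 1.238

\text{RMSE} = s\,\sqrt{\frac{20}{28}} \approx 1.238 \times 0.8452 \approx 1.046
```

This agrees with the computed 1.046617 up to the rounding of the residual standard error.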