Question S1 - Lack of Fit Test & Polynomial Models
Question
To promote safe driving habits and to better protect its customers, an insurance company offers a discount of between 3% and 12% on renewal insurance premiums to customers who have completed a defensive driving course. The following table of data shows the number of customers who have applied for the discount at various discount levels, over a period of 12 months, where Y = number of renewing customers applying for discount (in 100) and X = discount in %. A quadratic regression model of Y on X is fitted to the data and the results are summarized in the following table.

[Data table, only partially recoverable]
Y (12 months): 9.70, 20.50, 21.12, 20.40, 22.98, 22.00, 36.00, 36.10, 34.50, 44.50, 46.50, 53.00
Surviving fragment of the remaining column(s): 10, 12, 12, 12, 12

a. Set up the appropriate hypotheses to test the overall significance of the quadratic model. Compute the F-value. Conduct a lack-of-fit test for the quadratic model.
b. Suppose that a straight-line model of Y on X is considered. Set up the appropriate hypotheses to test the overall significance of the straight-line model. Compute the F-value. Conduct a lack-of-fit test for the straight-line model.
c. Calculate the total percent of variation explained by the straight-line model and the quadratic model.
d. Set up the appropriate hypotheses to test the significance of the X² term in the quadratic model when the X term is already included in the model.
e. Based on the above results, which one is the better model? Give details to support your reasoning.

Explanation / Answer
Solution:
a. H0: The square of the discount in % (X²) does not affect the number of renewing customers applying for the discount (Y, in 100).
Ha: The square of the discount in % (X²) affects the number of renewing customers applying for the discount (Y, in 100).
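The ANOVA results for this model are as follows (the ANOVA table appears at the end of this answer). Note that this F-test assesses the overall significance of the regression; a formal lack-of-fit test would additionally split SS (Residual) into pure-error and lack-of-fit components using replicate observations at the same discount level.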
In the result summary, SS (Regression) = sum of squares of regression, the sum of squared differences between the predicted values and the average value of the dependent variable.
SS (Residual) = sum of squares of residual (error), the sum of squared differences between the predicted and the actual values of the dependent variable.
Dividing SS (Regression) and SS (Residual) by their corresponding degrees of freedom, k and n-k-1 respectively, gives MS (Regression) and MS (Residual). Here, k = number of independent variables (1 in this case) and n = total number of observations (12 in this case).
F observed is the ratio MS (Regression)/MS (Residual) = 1743.788/14.3366 = 121.63, and
F critical = F(0.05; k, n-k-1) = F(0.05; 1, 10) = 4.96.
Since F observed > F critical, the null hypothesis is rejected and the regression is statistically significant. The significance value (6.43E-07) is also far below 0.05 (the desired level of significance), which confirms the rejection. We conclude that "Square of Discount in % (X²) affects number of renewing customers applying for discount (Y)".
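The arithmetic in part a can be checked with a few lines of Python. This is only a sketch: the sums of squares are copied from the ANOVA table in this answer, the critical value 4.96 is taken as quoted rather than recomputed, and the variable names are chosen here for illustration.

```python
# ANOVA quantities copied from the summary table for the model in part a.
ss_regression = 1743.788
ss_residual = 143.3663
n = 12  # number of observations
k = 1   # number of predictors in the fitted model, as stated in the answer

df_regression = k
df_residual = n - k - 1

# Mean squares are sums of squares divided by their degrees of freedom.
ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual
f_observed = ms_regression / ms_residual

# Critical value F(0.05; 1, 10) as quoted in the answer (assumed, not recomputed).
f_critical = 4.96

print(f"MS(Regression) = {ms_regression:.3f}")
print(f"MS(Residual)   = {ms_residual:.5f}")
print(f"F observed     = {f_observed:.2f}")
print("Reject H0" if f_observed > f_critical else "Fail to reject H0")
```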
b. Analogous to the above solution, we fit the model with X (instead of X²) as the predictor of Y.
H0: Discount in % (X) does not affect the number of renewing customers applying for the discount (Y, in 100).
Ha: Discount in % (X) affects the number of renewing customers applying for the discount (Y, in 100).
The test is conducted in the same way as in part a.
The computations follow the same steps as above, and the interpretation is as follows:
Since F observed > F critical, the null hypothesis is rejected and the regression is statistically significant. The significance value is also less than 0.05 (the desired level of significance), which confirms the rejection. We conclude that "Discount in % (X) affects number of renewing customers applying for discount (Y)".
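The answer does not print the F-value for the straight-line model, but it can be derived from the sums of squares quoted elsewhere in this answer (SS (Regression) = 1698.78 in part c, SS (Total) = 1887.154 in the ANOVA table). The sketch below assumes those two figures and the same degrees of freedom as in part a (k = 1, n = 12).

```python
# Sums of squares for the straight-line model, taken from this answer.
ss_regression = 1698.78   # quoted in part c
ss_total = 1887.154       # from the ANOVA table; the same Y data in both models
n, k = 12, 1

# Residual SS is whatever the regression does not explain.
ss_residual = ss_total - ss_regression
ms_regression = ss_regression / k
ms_residual = ss_residual / (n - k - 1)
f_observed = ms_regression / ms_residual

f_critical = 4.96  # F(0.05; 1, 10), as quoted in part a

print(f"SS(Residual) = {ss_residual:.3f}")
print(f"F observed   = {f_observed:.2f}")
print("Reject H0" if f_observed > f_critical else "Fail to reject H0")
```

Under these assumptions the F-value comes out near 90, well above the critical value, which agrees with the conclusion stated for part b.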
c. The R squared value, or coefficient of determination, is the proportion of the total variance in Y that is explained by the model.
Thus, R squared = SS (Regression)/SS (Total).
For the linear model, R squared = 1698.78/1887.15 (from part b) = 0.900. Thus, the straight-line model explains 90% of the variance in Y.
For the quadratic model, R squared = 1743.788/1887.154 (from part a) = 0.924. Thus, the quadratic model explains 92.4% of the variance in Y.
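The two R squared values can be reproduced directly from the quoted sums of squares. The helper name `r_squared` below is introduced only for illustration.

```python
def r_squared(ss_regression: float, ss_total: float) -> float:
    """Proportion of the total variation in Y explained by the regression."""
    return ss_regression / ss_total


ss_total = 1887.154  # SS(Total) from the ANOVA table

r2_linear = r_squared(1698.78, ss_total)      # straight-line model
r2_quadratic = r_squared(1743.788, ss_total)  # model from part a

print(f"Linear:    R squared = {r2_linear:.3f}")
print(f"Quadratic: R squared = {r2_quadratic:.3f}")
```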
d. Test of significance for the independent variables:
Now the model uses both X and X² to explain the variation in Y. After fitting the model, the coefficients are as follows:
From the above, when X and X² are fitted together, their individual effects are not significant, as the P-values are much greater than 0.05 (the desired level of significance).
However, the interpretation would be that for a one-unit change in X², keeping the other variables constant, Y increases by 0.216 units.
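One standard way to frame the test in part d is the extra-sum-of-squares (partial) F-test, which compares the model containing X and X² against the model containing X alone. The sketch below shows the calculation only. The answer does not report the sums of squares for the model with both terms, so the numbers in the example call are placeholders, not results from this data set, and the function name is hypothetical.

```python
def partial_f(ssr_full: float, ssr_reduced: float,
              sse_full: float, n: int, p_full: int, q: int = 1) -> float:
    """Partial F statistic for q terms added to a reduced model.

    ssr_full    : regression SS of the full model (X and X^2)
    ssr_reduced : regression SS of the reduced model (X only)
    sse_full    : residual SS of the full model
    n           : number of observations
    p_full      : number of predictors in the full model
    q           : number of added terms being tested
    """
    extra_ss = ssr_full - ssr_reduced          # SS gained by the added terms
    mse_full = sse_full / (n - p_full - 1)     # residual mean square, full model
    return (extra_ss / q) / mse_full


# Placeholder inputs for illustration only (NOT taken from the insurance data).
example_f = partial_f(ssr_full=90.0, ssr_reduced=80.0,
                      sse_full=20.0, n=12, p_full=2)
print(f"Partial F (placeholder inputs) = {example_f:.2f}")
```

For a single added term, the partial F statistic equals the square of that term's t statistic, so it leads to the same conclusion as the coefficient P-value discussed above. With n = 12 and two predictors, the statistic would be compared against F(0.05; 1, 9).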
e. Based on the above models, the summaries for the linear and quadratic models are as follows:
LINEAR MODEL:
QUADRATIC MODEL:
CONCLUSION:
1. R squared for the quadratic model (92.4%) is higher than R squared for the linear model (90%). Thus, the quadratic model is preferred.
2. The standard error for the quadratic model (3.79) is lower than the standard error for the linear model (4.34). Thus, the quadratic model is preferred.
3. The standard error of the slope coefficient of X² is lower than that of X. Thus, the quadratic model is preferred, since the coefficients of X and X² were each significant when fitted on their own (parts a and b).
FINAL CONCLUSION: Based on the above 3 points, the quadratic model outperforms both the linear model and the mixed (linear + quadratic) model.
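The standard errors cited in point 2 can be recovered as the square root of MS (Residual) for each model. This sketch assumes the sums of squares quoted in this answer and k = 1 predictor in each model; the function name is illustrative.

```python
import math


def standard_error(ss_residual: float, n: int, k: int) -> float:
    """Standard error of the regression: square root of MS(Residual)."""
    return math.sqrt(ss_residual / (n - k - 1))


ss_total = 1887.154  # SS(Total), shared by both models
n, k = 12, 1

# Residual SS = total SS minus regression SS for each model.
se_linear = standard_error(ss_total - 1698.78, n, k)
se_quadratic = standard_error(ss_total - 1743.788, n, k)

print(f"Linear model:    standard error = {se_linear:.2f}")
print(f"Quadratic model: standard error = {se_quadratic:.2f}")
```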
ANOVA
Source       df   SS          MS          F observed   Significance F   F critical
Regression    1   1743.788    1743.788    121.631613   6.43461E-07      4.964603
Residual     10   143.3663    14.33663
Total        11   1887.154