Question
3. This is data from R. Weintraub (1962), "The Birth Rate and Economic Development: An Empirical Study", Econometrica, Vol. 30, No. 4, pp. 812-817. The data set contains birth rates, per capita income, proportion of population in farming, and infant mortality during the early 1950s for 29 nations:
Nation          Birthrate  Income  Farm   IMR
Mexico               45.7     118  0.61   87.8
Ecuador              45.3      44  0.53  115.8
Colombia             38.6     158  0.53  106.8
Ceylon               37.2      81  0.53   71.6
Puerto Rico          35       374  0.37   60.2
Chile                34       187  0.3   118.7
Canada               28.3     993  0.19   33.7
United States        24.7    1723  0.12   27.2
Argentina            24.7     287  0.2    62
New Zealand          24.4     970  0.19   24.9
Australia            22.7     885  0.12   22.9
Hungary              22.3     200  0.53   65.7
Netherlands          21.7     575  0.14   21.6
Finland              21.6     688  0.34   32.4
Philippines          21.3      48  0.69  108.7
Ireland              21.2     572  0.49   38.6
Japan                20.8     239  0.42   46.7
Spain                20.3     244  0.48   56.5
France               18.9     472  0.25   44.4
Greece               18.8     134  0.52   47.4
Norway               18.6     633  0.19   21.7
Italy                18       295  0.44   55.7
Denmark              17.6     906  0.24   27.1
Switzerland          17      1045  0.16   28.5
Belgium              16.7     775  0.1    41.6
West Germany         15.9     619  0.15   44.6
England              15.3     901  0.05   26.1
Sweden               15       910  0.24   18.7
Austria              14.8     556  0.22   49.1
(a) Choose the best subset of predictors (Income, Farm, and IMR) using the forward selection algorithm. To fit a model, select 'Stat' > 'Regression' > 'Regression' > 'Fit Regression Model'. Select 'Birthrate' as the response, and enter the predictors for each model in the 'Continuous predictors' field. For each step of the algorithm, state which predictors are in the model you are fitting, and include the model summary for each model fit in that step. (For example, the first model fit in the first step of the algorithm would be the one with the single predictor Income.) After completing the algorithm, create a table listing the model you chose at each step and its adjusted R² (R²_adj). Based on this table, explain which subset of predictors you should choose under the forward selection algorithm.
(b) Repeat part (a) using the backward selection (elimination) algorithm.
Explanation / Answer
a) Forward selection
The simplest data-driven model building approach is called forward selection.
In this approach, one adds variables to the model one at a time.
At each step, each variable that is not already in the model is tested for inclusion in the model.
The most significant of these variables is added to the model, so long as its p-value is below some pre-set level.
It is customary to set this value above the conventional .05 level at say .10 or .15, because of the exploratory nature of this method (see below).
Thus we begin with a model including the variable that is most significant in the initial analysis, and continue adding variables until none of the remaining variables are "significant" when added to the model.
Note that this multiple use of hypothesis testing means that the real type I error rate for a variable (i.e. the chance of including it in the model given it isn't really necessary), does not equal the critical level we choose.
In fact, because of the sequential, data-driven nature of the procedure, it is essentially impossible to control error rates, and this procedure must be viewed as exploratory.
Once we reduce the set of potential predictors to a reasonable number, we can examine all possible models and choose the “best” according to some criterion.
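A minimal sketch of this "examine all possible models" step: with k candidate predictors there are 2^k - 1 non-empty subsets to fit and score. The predictor names below come from the data set above; the scoring itself is left abstract.

```python
from itertools import combinations

# Enumerate every non-empty subset of the candidate predictors.
# Each subset would then be fit and scored by a criterion such as
# R^2_adj, AIC, BIC, or Cp (discussed below).
preds = ["Income", "Farm", "IMR"]
subsets = [list(c) for r in range(1, len(preds) + 1)
           for c in combinations(preds, r)]
print(len(subsets))   # 2^3 - 1 = 7 candidate models
for s in subsets:
    print(s)
```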
Say we have k predictors x1, . . . , xk and we want to find a good subset of predictors that predict the data well.
There are several useful criteria to help choose a subset of predictors.
Adjusted R²

"Regular" R² measures how well the model predicts the data that built it. It is possible to have a model with R² = 1 (it predicts the data that built it perfectly) but lousy out-of-sample prediction. The adjusted R², denoted R²_a, "fixes" R² so that it measures how well the model will predict data not used to build it. For a candidate model with p - 1 predictors (p parameters, intercept included),

R²_a = 1 - ((n - 1)/(n - p)) * SSE_p/SSTO = 1 - MSE_p/s²_y.

Maximizing R²_a is equivalent to choosing the model with the smallest MSE_p. Unlike "regular" R², R²_a may decrease when irrelevant variables are added (it can even be negative!): R²_a penalizes the model for being too complex.

Problem: R²_a is greater for a "bigger" model whenever the F-statistic comparing the bigger model to the smaller one is greater than 1. We usually want F-statistics a lot bigger than 1 before adding new predictors, so R²_a is too liberal a criterion.
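The adjusted R² formula can be sketched in a few lines; the SSE and SSTO values below are invented purely for illustration.

```python
# Minimal sketch of the adjusted R^2 formula above. The SSE/SSTO numbers
# are invented; p counts parameters (intercept included), so a model with
# p - 1 predictors has p parameters.
def adjusted_r2(sse, ssto, n, p):
    """R^2_a = 1 - ((n - 1)/(n - p)) * SSE_p/SSTO."""
    return 1.0 - (n - 1) / (n - p) * (sse / ssto)

n, p = 29, 3              # 29 nations, intercept + 2 predictors
sse, ssto = 800.0, 2000.0
r2 = 1 - sse / ssto       # "regular" R^2 = 0.6
r2a = adjusted_r2(sse, ssto, n, p)
print(round(r2, 4), round(r2a, 4))   # 0.6 0.5692
```

Note that R²_a is below the regular R² here, reflecting the complexity penalty.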
AIC

Choose the model with the smallest Akaike Information Criterion (AIC). For the normal error model,

AIC = n log(SSE_p) - n log(n) + 2p.

The term n log(SSE_p) - n log(n) equals C - 2 log L(β̂, σ̂²) from the normal model, where C is a constant; 2p is a "penalty" term for adding predictors.
Like R²_a, AIC favors models with small SSE but penalizes models with too many predictors p.

SBC (or BIC)

Models with a smaller Schwarz Bayesian Criterion (SBC) are estimated to predict better. SBC is also known as the Bayesian Information Criterion:

BIC = n log(SSE_p) - n log(n) + p log(n).

BIC is similar to AIC, but for n ≥ 8 the BIC "penalty term" is more severe.
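Both formulas can be checked numerically; the SSE values below are invented for illustration.

```python
import math

# Minimal sketch of the AIC and BIC formulas quoted above
# (normal error model); the SSE values are invented.
def aic(sse, n, p):
    return n * math.log(sse) - n * math.log(n) + 2 * p

def bic(sse, n, p):
    return n * math.log(sse) - n * math.log(n) + p * math.log(n)

n = 29
# Suppose adding a predictor (p: 2 -> 3) drops SSE from 900 to 850.
d_aic = aic(850.0, n, 3) - aic(900.0, n, 2)
d_bic = bic(850.0, n, 3) - bic(900.0, n, 2)
# For n >= 8, log(n) > 2, so BIC charges more per extra parameter than AIC.
print(d_bic > d_aic)   # True
```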
BIC chooses the model that "best predicts" the observed data according to asymptotic criteria.

Mallows' Cp

Let F be the full model with all k predictors and let R be a reduced model with p - 1 predictors to be compared to the full model. Mallows' Cp is

Cp = SSE(R)/MSE(F) - n + 2p.

Cp measures the bias in the reduced regression model relative to the full model containing all k candidate predictors. The full model is assumed to provide an unbiased estimate σ̂² = MSE(x1, ..., xk); the predictors must be in the "correct form" and important interactions included. If a reduced model is unbiased, E(Ŷ_i) = μ_i, then E(Cp) ≈ p (pp. 357–359).
The full model always has Cp = k + 1. If Cp ≈ p, then the reduced model predicts about as well as the full model; if Cp < p, then the reduced model is estimated to predict better than the full model.
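The Cp = k + 1 identity for the full model can be checked numerically; the MSE value below is invented for illustration.

```python
# Minimal numerical check of Mallows' Cp as defined above.
def mallows_cp(sse_reduced, mse_full, n, p):
    """Cp = SSE(R)/MSE(F) - n + 2p."""
    return sse_reduced / mse_full - n + 2 * p

n, k = 29, 4                          # n nations, k candidate predictors
mse_full = 30.0                       # full-model MSE (invented)
sse_full = mse_full * (n - (k + 1))   # SSE(F) = MSE(F) * (n - k - 1)
# The full model (p = k + 1 parameters) always lands exactly at Cp = k + 1:
print(mallows_cp(sse_full, mse_full, n, k + 1))   # 5.0
```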
In practice, just choose the model with the smallest Cp.

Which criterion to use?

R²_a, AIC, BIC, and Cp may give different "best" models, or they may agree. The ultimate goal is to find a model that balances: a good fit to the data, low bias, and parsimony.
All else being equal, the simpler model is often easier to interpret and work with. Christensen (1996) recommends Cp and notes the similarity between Cp and AIC.
b) Variable selection is intended to select the "best" subset of predictors.
1. We want to explain the data in the simplest way — redundant predictors should be removed.
The principle of Occam’s Razor states that among several plausible explanations for a phenomenon, the simplest is best. Applied to regression analysis, this implies that the smallest model that fits the data is best.
2. Unnecessary predictors will add noise to the estimation of other quantities that we are interested in.
Degrees of freedom will be wasted.
3. Collinearity is caused by having too many variables trying to do the same job.
4. Cost: if the model is to be used for prediction, we can save time and/or money by not measuring redundant predictors.
Prior to variable selection:
1. Identify outliers and influential points; consider excluding them, at least temporarily.
2. Add in any transformations of the variables that seem appropriate.
Stepwise Procedures

Backward Elimination
This is the simplest of all variable selection procedures and can be easily implemented without special software. In situations where there is a complex hierarchy, backward elimination can be run manually while taking account of what variables are eligible for removal.
1. Start with all the predictors in the model
2. Remove the predictor with the highest p-value greater than α_crit.
3. Refit the model and go to step 2.
4. Stop when all p-values are less than α_crit.
The cut-off α_crit is sometimes called the "p-to-remove" and does not have to be 5%. If prediction performance is the goal, then a 15–20% cut-off may work best, although methods designed more directly for optimal prediction should be preferred.
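The four steps above can be sketched on synthetic data. This is a hedged sketch, not the Minitab procedure: to stay within NumPy, predictors are dropped while their |t| statistic falls below t_crit, where t_crit = 1.7 roughly approximates a two-sided 10% "p-to-remove" at these degrees of freedom.

```python
import numpy as np

# Backward elimination sketch on synthetic data: two real predictors
# (columns 0 and 1) and one pure-noise column (column 2).
rng = np.random.default_rng(0)
n = 29
X = rng.normal(size=(n, 3))
y = 2.0 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(scale=0.5, size=n)

def t_stats(X, y):
    """OLS t-statistics for the slopes (intercept excluded)."""
    Xd = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)
    resid = y - Xd @ beta
    mse = resid @ resid / (len(y) - Xd.shape[1])
    se = np.sqrt(mse * np.diag(np.linalg.inv(Xd.T @ Xd)))
    return (beta / se)[1:]

active, t_crit = [0, 1, 2], 1.7     # step 1: start with all predictors
while active:
    t = np.abs(t_stats(X[:, active], y))
    worst = int(np.argmin(t))
    if t[worst] >= t_crit:
        break                       # step 4: everything left is "significant"
    active.pop(worst)               # step 2: drop the weakest, then refit (step 3)
print(sorted(active))               # the two real predictors should survive
```

With the real data, X would hold the Income, Farm, and IMR columns and y the Birthrate column from the table above.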
Forward Selection
This just reverses the backward method.
1. Start with no variables in the model.
2. For each predictor not in the model, check its p-value if it were added to the model. Choose the one with the lowest p-value less than α_crit.
3. Continue until no new predictors can be added.
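The forward steps can be sketched the same way, again with |t| > t_crit standing in for "p-value below the cut-off" (an approximation), and with R²_adj recorded at each step as part (a) asks.

```python
import numpy as np

# Forward selection sketch on synthetic data: two real predictors
# (columns 0 and 1) and one pure-noise column (column 2).
rng = np.random.default_rng(1)
n = 29
X = rng.normal(size=(n, 3))
y = 2.0 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(scale=0.5, size=n)

def fit(X, y):
    """Return (slope t-statistics, adjusted R^2) for an OLS fit."""
    Xd = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)
    resid = y - Xd @ beta
    mse = resid @ resid / (len(y) - Xd.shape[1])
    se = np.sqrt(mse * np.diag(np.linalg.inv(Xd.T @ Xd)))
    s2y = ((y - y.mean()) ** 2).sum() / (len(y) - 1)
    return (beta / se)[1:], 1 - mse / s2y

chosen, t_crit = [], 1.7            # step 1: start with no variables
while len(chosen) < X.shape[1]:
    # step 2: |t| of each remaining predictor when added to the current model
    trials = [(abs(fit(X[:, chosen + [j]], y)[0][-1]), j)
              for j in range(X.shape[1]) if j not in chosen]
    best_t, best_j = max(trials)
    if best_t < t_crit:
        break                       # step 3: no new predictor qualifies
    chosen.append(best_j)
    print("step", len(chosen), "model:", chosen,
          "R2_adj:", round(fit(X[:, chosen], y)[1], 3))
```

The printed lines form exactly the kind of step-by-step R²_adj table that part (a) requests.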
Apply these steps to the data above to obtain the answer.