You will need the R data file glen.rda which is a 597 by 3 matrix called glen.
Question
You will need the R data file glen.rda, which is a 597 by 3 matrix called glen. You can load this matrix directly into your R workspace using the command
load(url("http://users.stat.umn.edu/~gmeeden/classes/5201/datasets/glen.rda"))
For a recent year this contains information about 597 house sales in two zip codes in St Paul. A row gives y, the sale price of a home in thousands of dollars, x, the amount of taxes paid for the house in thousands of dollars and a zip code identifier.
To answer the following question assume you know both the sales price and tax amount for every house. In each case assume that we are estimating the population mean.
For a population with a variable of interest y, an auxiliary variable x that is correlated with y, and a design, you need to write a program that lets you compare the behavior of three estimators under repeated sampling from the design. The three estimators are: the Horvitz-Thompson (HT) estimator; the HT estimator with its weights adjusted so that they simultaneously add to the population size and are calibrated on x; and the estimator that assumes the design was srs with replacement, again adjusted so that its new weights sum to the population size and are calibrated on x. For each estimator you need to compute its average value and average absolute error over 500 samples taken using the design.
Apply your function to the population of house sales in glen.rda for three different designs: pps using x, pps using x in reverse order (in R, rev(x)), and simple random sampling without replacement. Take the sample size to be n = 30.
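A minimal sketch of the kind of program described above, under several assumptions that are not in the original: the fixed-size unequal-probability design is implemented as systematic pps sampling on a randomly permuted list, calibration uses the linear (chi-square distance) method, which can produce negative weights, and the population is synthetic. The names systematic_pps, calibrate and compare_estimators are hypothetical helpers, and the course may intend a different calibration method.

```r
# Systematic pps sampling on a randomly permuted list: a fixed-size design
# whose inclusion probabilities equal pik exactly (requires all pik <= 1).
systematic_pps <- function(pik) {
  N <- length(pik)
  n <- round(sum(pik))
  ord <- sample(N)                      # random order of the population
  cum <- cumsum(pik[ord])               # cumulative inclusion probabilities
  pts <- runif(1) + 0:(n - 1)           # n points spaced one unit apart
  idx <- pmin(findInterval(pts, cum) + 1, N)
  ord[idx]
}

# Linear calibration (an assumed choice): adjust starting weights d so that
# the new weights sum to N and reproduce the population total of x.
calibrate <- function(d, x, N, tx) {
  Z <- cbind(1, x)
  A <- crossprod(Z * d, Z)              # sum of d_i * z_i z_i'
  lambda <- solve(A, c(N, tx) - colSums(Z * d))
  d * (1 + drop(Z %*% lambda))
}

# Average value and average absolute error of the three estimators of the
# population mean over nsim samples drawn with inclusion probabilities pik.
compare_estimators <- function(y, x, pik, nsim = 500) {
  N <- length(y)
  n <- round(sum(pik))
  mu <- mean(y)
  tx <- sum(x)
  est <- matrix(NA_real_, nsim, 3,
                dimnames = list(NULL, c("HT", "HT_cal", "SRS_cal")))
  for (r in seq_len(nsim)) {
    s <- systematic_pps(pik)
    d <- 1 / pik[s]                                   # design (HT) weights
    w_cal <- calibrate(d, x[s], N, tx)                # calibrated HT weights
    w_srs <- calibrate(rep(N / n, n), x[s], N, tx)    # calibrated srs weights
    est[r, ] <- c(sum(d * y[s]), sum(w_cal * y[s]), sum(w_srs * y[s])) / N
  }
  rbind(mean = colMeans(est), abs_err = colMeans(abs(est - mu)))
}

# Synthetic stand-in population (hypothetical values, not the glen data).
# With the real data one might use y <- glen[, 1]; x <- glen[, 2],
# assuming the columns are in the order the question lists them.
set.seed(1)
N <- 597
n <- 30
x <- rgamma(N, shape = 4, scale = 0.5) + 0.5
y <- 60 * x + rnorm(N, sd = 20)

designs <- list(
  pps_x   = n * x / sum(x),        # pps using x
  pps_rev = n * rev(x) / sum(x),   # pps using rev(x)
  srs     = rep(n / N, N)          # srs without replacement
)
stopifnot(all(unlist(designs) <= 1))
results <- lapply(designs, function(pik) compare_estimators(y, x, pik))
results
```

Under the srs design the starting weights of the second and third estimators are both N/n, so those two columns coincide; they differ only under the two pps designs.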
Explanation / Answer
i:
The question does not say which variable to use, so we work with sales. Results for the tax variable are also given; the method is exactly the same.
True variance of the sample mean: $\operatorname{var}(\bar{X}) = \sigma^2/n$
We calculate the variance of the population and divide it by the sample size.
True variance of the population: R calculates the variance with an N-1 divisor, but for a population we need the divisor N.
So we multiply R's variance by (N-1)/N to get the true population variance: 18407.81
The true variance of the sample mean for a sample of size 60 for sales is 18407.81/60 = 306.7968
Similarly, for the tax variable it is 4.927641/60 = 0.08212735
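The divisor correction and the division by the sample size can be sketched as follows. The function names pop_var and var_of_mean are hypothetical, and the toy vector is only there to show the correction at work; the last two lines simply redo the arithmetic quoted above.

```r
# Population variance: rescale R's var(), which divides by N - 1, to divide by N.
pop_var <- function(v) {
  N <- length(v)
  var(v) * (N - 1) / N
}

# Variance of the sample mean for a sample of size n, using sigma^2 / n.
var_of_mean <- function(v, n) pop_var(v) / n

toy <- c(2, 4, 4, 4, 5, 5, 7, 9)   # toy data with population variance 4
pop_var(toy)
var_of_mean(toy, 2)

# Arithmetic check of the figures quoted above.
18407.81 / 60   # 306.7968
4.927641 / 60   # 0.08212735
```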
ii:
We shall use PPS sampling: we divide our sample size of 60 in the ratio 300 : 200 : 97.
The allocated sample sizes are approximately 30.15, 20.10 and 9.75, which we round to 30, 20 and 10.
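The allocation above can be reproduced with a few lines of R. This is a sketch; the variable names are hypothetical.

```r
group_sizes <- c(300, 200, 97)   # group sizes used in the ratio above
n_total <- 60                    # total sample size

# Allocate the sample in proportion to group size.
alloc <- n_total * group_sizes / sum(group_sizes)
round(alloc, 3)   # 30.151 20.101  9.749
round(alloc)      # 30 20 10
```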
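The true variance of the sample mean, where N = n_1 + n_2 + n_3 = 60 denotes the total sample size, is
$\operatorname{var}\left(\frac{1}{N}\left(n_1\bar{Y}_1 + n_2\bar{Y}_2 + n_3\bar{Y}_3\right)\right) = \frac{1}{N^2}\left(n_1^2\frac{\sigma^2}{n_1} + n_2^2\frac{\sigma^2}{n_2} + n_3^2\frac{\sigma^2}{n_3}\right) = \frac{\sigma^2}{N}$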
So the true variance for sales is: 306.7968
Similarly, true variance for tax is : 0.08212735
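As a quick numerical check of this result for sales, the sum can be evaluated directly. This sketch assumes, as the derivation does, the same variance in every group, and uses the rounded allocation 30, 20, 10; the variable names are hypothetical.

```r
sigma2 <- 18407.81        # population variance of sales quoted above
n_h <- c(30, 20, 10)      # group sample sizes
n_tot <- sum(n_h)         # total sample size, 60

# (1 / n_tot^2) * sum over groups of n_h^2 * sigma2 / n_h
v <- sum(n_h^2 * sigma2 / n_h) / n_tot^2
v                              # 306.7968
all.equal(v, sigma2 / n_tot)   # TRUE
```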
iii:
The ratio estimator: $\frac{Y_i}{X_i}$
The variance of the ratio estimator is: 334.9781
We use simulations and take the average variance over the replications:
The R-code used here is:
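N <- nrow(glen)   # population size (597 rows)
nrep <- 5000      # number of replications (renamed so it does not mask base::rep)
est <- 0
for (i in 1:nrep) {
  samp <- sample(N, 60)   # simple random sample of 60 rows, without replacement
  data1 <- glen[samp, ]
  # glen is a matrix, so index by column rather than with $;
  # this assumes column 1 is the sale price and column 2 is the tax
  RE <- data1[, 1] / data1[, 2]
  est <- est + var(RE)    # accumulate the sample variance of the ratios
}
est / nrep   # average variance over the replications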