Question

Generate 12 data samples (x, y) such that x is uniformly distributed in the interval [0, 1], and y is normally distributed, y ~ N(0, 0.5). Consider modeling this data as y = f(x) + noise, using polynomials of degree 1, 2 and 6 to estimate the unknown f(x). Polynomial fitting using squared loss can be performed using the function POLYFIT in MATLAB. Report the fitting error (MSE) for each model, and show the estimated regression model graphically along with the data samples. Briefly discuss whether a small fitting error can be used as a good indicator of small prediction error for such polynomial models.

Explanation / Answer

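We can solve this with the open-source statistical package R instead of MATLAB: lm() fits each polynomial by least squares (squared loss), which is what POLYFIT does in MATLAB. The complete R snippet is as follows: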

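# generate the data: x ~ Uniform(0, 1) and y ~ N(0, 0.5),
# treating 0.5 as the standard deviation (it could also be read as the variance)
y <- rnorm(12, mean = 0, sd = 0.5)
x <- runif(12, min = 0, max = 1)

# fit polynomials of degree 1, 2 and 6 by least squares (squared loss).
# Inside an R formula, x^2 is the crossing operator and collapses to x,
# so powers must be written as I(x^2) or generated with poly(..., raw = TRUE).
fit.1 <- lm(y ~ x)
fit.2 <- lm(y ~ poly(x, 2, raw = TRUE))
fit.3 <- lm(y ~ poly(x, 6, raw = TRUE))

# summarise the models
summary(fit.1)
summary(fit.2)
summary(fit.3)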
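
The question also asks for the estimated models to be shown graphically along with the data. A minimal base-R sketch, assuming 0.5 is the standard deviation of y; the seed is arbitrary and only there for reproducibility. It refits the three polynomials with poly(..., raw = TRUE) and draws each one on a fine grid over [0, 1], with the true f(x) = 0 as a dashed line:

```r
# Minimal sketch: plot the 12 samples with fitted polynomials of degree 1, 2, 6.
# Assumption: the 0.5 in N(0, 0.5) is the standard deviation.
set.seed(42)                      # hypothetical seed, for reproducibility only
n <- 12
x <- runif(n, min = 0, max = 1)
y <- rnorm(n, mean = 0, sd = 0.5)

degrees <- c(1, 2, 6)
cols <- c("blue", "darkgreen", "red")
# least-squares fit of each degree, using raw powers x, x^2, ..., x^d
fits <- lapply(degrees, function(d) lm(y ~ poly(x, d, raw = TRUE)))

# evaluate every fitted polynomial on a fine grid over [0, 1] (200 x 3 matrix)
grid <- data.frame(x = seq(0, 1, length.out = 200))
curves <- sapply(fits, predict, newdata = grid)

plot(x, y, pch = 19, xlim = c(0, 1), ylim = range(c(y, curves)),
     xlab = "x", ylab = "y", main = "Polynomial fits of degree 1, 2 and 6")
matlines(grid$x, curves, lty = 1, lwd = 2, col = cols)
abline(h = 0, lty = 2)            # true f(x) = 0: y was drawn independently of x
legend("topright", legend = paste("degree", degrees),
       col = cols, lty = 1, lwd = 2)
```

With only 12 points, the degree-6 curve typically swings far from the data near the ends of [0, 1], which is the visual symptom of overfitting.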

##############################

The summary output is:

> summary(fit.1)

Call:
lm(formula = y ~ x)

Residuals:
     Min       1Q   Median       3Q      Max
-0.54552 -0.17887 -0.05162  0.16486  0.51864

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   0.2892     0.1989   1.454    0.177
x            -0.4539     0.3409  -1.332    0.212

Residual standard error: 0.3396 on 10 degrees of freedom
Multiple R-squared:  0.1506,  Adjusted R-squared:  0.0657
F-statistic: 1.774 on 1 and 10 DF, p-value: 0.2125
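
To report the fitting error as an MSE, which summary() does not print directly, it can be recovered from the residual standard error (RSE). With n = 12 samples and p = 2 estimated coefficients (n - p = 10 degrees of freedom), RSS = RSE^2 (n - p), so for fit.1:

```latex
\mathrm{MSE} = \frac{\mathrm{RSS}}{n} = \mathrm{RSE}^2\,\frac{n-p}{n}
             = 0.3396^2 \times \frac{10}{12} \approx 0.0961
```

Since the printed RSE is rounded, this is approximate; mean(residuals(fit.1)^2) returns the exact value.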

> summary(fit.2)

Call:
lm(formula = y ~ x + x^2)

Residuals:
     Min       1Q   Median       3Q      Max
-0.54552 -0.17887 -0.05162  0.16486  0.51864

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   0.2892     0.1989   1.454    0.177
x            -0.4539     0.3409  -1.332    0.212

Residual standard error: 0.3396 on 10 degrees of freedom
Multiple R-squared:  0.1506,  Adjusted R-squared:  0.0657
F-statistic: 1.774 on 1 and 10 DF, p-value: 0.2125
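
The fit.2 output above matches fit.1 apart from the Call line, because the formula y ~ x + x^2 never adds a squared term. A small sketch of why, on simulated data with a hypothetical seed: in an R model formula, ^ is the crossing operator, so x^2 expands to x, whereas I(x^2) forces arithmetic squaring:

```r
# Sketch: in an R model formula, ^ is the crossing operator, so x^2 expands
# to x (x crossed with itself) and adds no squared term.
set.seed(1)                          # hypothetical seed
x <- runif(12)
y <- rnorm(12, mean = 0, sd = 0.5)

as_written <- lm(y ~ x + x^2)        # formula as written in the summary call above
linear     <- lm(y ~ x)
quadratic  <- lm(y ~ x + I(x^2))     # I() makes ^ an arithmetic power

print(names(coef(as_written)))       # "(Intercept)" "x"
print(names(coef(quadratic)))        # "(Intercept)" "x" "I(x^2)"
print(all.equal(coef(as_written), coef(linear)))  # TRUE
```

The same applies to x^6, which is why the degree-6 summary is also identical to the linear one.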

> summary(fit.3)

Call:
lm(formula = y ~ x + x^2 + x^6)

Residuals:
     Min       1Q   Median       3Q      Max
-0.54552 -0.17887 -0.05162  0.16486  0.51864

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   0.2892     0.1989   1.454    0.177
x            -0.4539     0.3409  -1.332    0.212

Residual standard error: 0.3396 on 10 degrees of freedom
Multiple R-squared:  0.1506,  Adjusted R-squared:  0.0657
F-statistic: 1.774 on 1 and 10 DF, p-value: 0.2125
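
To report the fitting error (MSE) of each model with the higher powers actually included, a minimal sketch follows, assuming sd = 0.5 for y and an arbitrary seed. It is the R analogue of p = polyfit(x, y, d) followed by mean((polyval(p, x) - y).^2) in MATLAB:

```r
# Sketch: training MSE (RSS / n) of the degree-1, 2 and 6 least-squares fits.
# Assumptions: sd = 0.5 for y and an arbitrary seed.
set.seed(42)                      # hypothetical seed
n <- 12
x <- runif(n, min = 0, max = 1)
y <- rnorm(n, mean = 0, sd = 0.5)

degrees <- c(1, 2, 6)
train_mse <- sapply(degrees, function(d) {
  fit <- lm(y ~ poly(x, d, raw = TRUE))   # powers x, x^2, ..., x^d
  mean(residuals(fit)^2)                  # fitting error = mean squared residual
})
names(train_mse) <- paste0("degree_", degrees)
print(round(train_mse, 4))
```

Because the three models are nested and fitted by least squares, the training MSE can only stay the same or decrease as the degree grows.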

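None of the p-values is below 0.05, so none of the coefficients (and none of the overall F-tests) is statistically significant. This is expected: y was generated independently of x, so the true f(x) is simply 0. Note also that the three summaries are identical. In an R formula, x^2 and x^6 mean "x crossed with itself", which reduces to x, so fit.2 and fit.3 as originally written are the same linear model as fit.1; the degree-2 and degree-6 models need I(x^2), ..., I(x^6) or poly(x, 6, raw = TRUE).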
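
The question also asks whether a small fitting error indicates a small prediction error. A minimal simulation sketch, assuming sd = 0.5 for y; the seed, the fresh-sample size and the number of repetitions are arbitrary choices. It repeats the 12-sample experiment and scores each fitted polynomial both on its own training points and on a large fresh sample from the same distribution:

```r
# Sketch: compare fitting (training) MSE with prediction MSE on fresh data.
# Assumptions: sd = 0.5 for y; seed, test size and repetition count are arbitrary.
set.seed(123)                         # hypothetical seed
n_train <- 12
n_test  <- 1000                       # hypothetical size of the fresh sample
n_rep   <- 200                        # hypothetical number of repetitions
degrees <- c(1, 2, 6)

# one experiment: draw 12 training points and a large test set from the same
# distribution, then score each fitted polynomial on both
one_run <- function() {
  x <- runif(n_train); y <- rnorm(n_train, mean = 0, sd = 0.5)
  x_new <- runif(n_test); y_new <- rnorm(n_test, mean = 0, sd = 0.5)
  sapply(degrees, function(d) {
    fit  <- lm(y ~ poly(x, d, raw = TRUE))
    pred <- predict(fit, newdata = data.frame(x = x_new))
    c(train = mean(residuals(fit)^2), test = mean((y_new - pred)^2))
  })
}

results <- replicate(n_rep, one_run())    # array: 2 (train/test) x 3 degrees x n_rep
avg <- apply(results, c(1, 2), mean)      # average MSE over repetitions
colnames(avg) <- paste0("degree_", degrees)
print(round(avg, 3))
```

The training MSE falls as the degree grows, while the error on fresh data rises, most sharply for degree 6 (7 coefficients fitted to 12 points). Since the true f(x) is 0 here, the extra flexibility only fits noise, so a small fitting error is not by itself a good indicator of a small prediction error for such polynomial models.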