


Question

a statsTeachR resource

Lab 3: Introduction to linear regression

Cigarettes and carbon monoxide emissions

An abundance of research has been done to assess the direct health impacts of cigarette smoke. Studies have also investigated the effects that different cigarette brands have on the environment based on their chemical make-ups. While each chemical in cigarettes is considered hazardous to the smoker's health by the United States Surgeon General, in this lab we will be interested in seeing if there is an association between the amount of chemicals and the amount of carbon monoxide emitted into the environment.

This lab is due at 5pm on Thursday, October 5th. You should submit your assignment, in the form of a knitted RMarkdown PDF file, by uploading it to your personal Google Drive folder that is shared with the TAs and the instructor. While you may collaborate with other students on this assignment, you must write up your own code and answers to the questions. Absolutely no cutting and pasting of any portion of the answers. Please note that to generate the PDF file directly from RStudio, you will need to install LaTeX, a PDF typesetting program. This will allow the RMarkdown file to generate the PDF directly. This assignment, like the others, will be worth 50 points.

The data (5 points)

The data set presented here is taken from the 3rd edition of Statistics for Engineering and the Sciences by Mendenhall and Sincich (1992) and is a subset of the data produced by the Federal Trade Commission. This data was found through the American Statistical Association website, and a fuller description of the data can be found at http://www.amstat.org/publications/jse/datasets/cigarettes.txt. Let's load the data and look at a summary of the variables. Be sure to install the package RCurl in order to obtain the data from the internet.

cigs <- read.table("https://ww2.amstat.org/publications/jse/datasets/cigarettes.dat.txt")
names(cigs)

Explanation / Answer

Exercise 1

Use a scatter plot, with the predictor (tar) on the x-axis and the response (CO) on the y-axis.

Plot using the command -

plot(cigs$tar, cigs$CO)

Check whether the points fall roughly along a straight line.

To check the reliability of tar in predicting CO, we can check the mean squared error or the R^2 value of the following linear model -

tar_model = lm(CO ~ tar, data = cigs)

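summary(tar_model)

Read R^2 from the "Multiple R-squared" line of the summary output. The closer it is to 1, the more of the variation in CO is explained by tar. As a rough rule of thumb, a value above about 0.8 suggests that tar is a reliable predictor of CO.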
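
As a minimal sketch of how R^2 and the mean squared error can be pulled out of a fitted model, the block below uses a small made-up data frame named toy (a hypothetical stand-in for cigs, with the same tar and CO column names assumed above). The numbers are illustrative only and are not the lab data.

```r
# Hypothetical stand-in for cigs: made-up values, NOT the FTC data
toy <- data.frame(
  tar = c(1, 4, 8, 12, 15, 17, 23, 30),
  CO  = c(1.5, 5.0, 9.5, 12.0, 14.5, 16.0, 21.0, 23.5)
)

# Fit CO as a linear function of tar
toy_model <- lm(CO ~ tar, data = toy)

# summary() computes the usual regression diagnostics
fit_summary <- summary(toy_model)
r2  <- fit_summary$r.squared         # share of the variance in CO explained by tar
mse <- mean(residuals(toy_model)^2)  # in-sample mean squared error

print(coef(toy_model))  # intercept and slope
print(r2)
print(mse)
```

Swapping toy for cigs should give the corresponding values for the lab data, provided the columns are named tar and CO.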

Exercise 2

Form - linear or non-linear (quadratic, logarithmic, cubic, etc.), judged from the scatter plot.

Direction - positive if cor(cigs$CO, cigs$tar) > 0, negative if it is < 0.

Strength - the closer the absolute value of cor(cigs$CO, cigs$tar) is to 1, the stronger the linear relationship.
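
The direction and strength checks above can be sketched as follows. The data frame toy is a hypothetical stand-in for cigs with made-up values, and the 0.8 and 0.5 cut-offs used for the strength labels are an assumed convention, not a fixed rule.

```r
# Hypothetical stand-in for cigs: made-up values, NOT the FTC data
toy <- data.frame(
  tar = c(1, 4, 8, 12, 15, 17, 23, 30),
  CO  = c(1.5, 5.0, 9.5, 12.0, 14.5, 16.0, 21.0, 23.5)
)

# Pearson correlation, always between -1 and 1
r <- cor(toy$CO, toy$tar)

# Direction comes from the sign of r
direction <- if (r > 0) "positive" else if (r < 0) "negative" else "none"

# Strength comes from |r|; these cut-offs are an assumption for illustration
strength <- if (abs(r) >= 0.8) "strong" else if (abs(r) >= 0.5) "moderate" else "weak"

cat("r =", round(r, 3), "->", strength, direction, "linear association\n")
```

Note that cor() only measures linear association, so the form still has to be judged from the scatter plot. For a regression with a single predictor, r^2 equals the R^2 reported by summary() of the fitted model.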