Question
data :https://harlanhappydog.github.io/STAT306/docs/newbie.txt
Download the "newbie" data set from the website. It studies the relationship between whether internet users belong to the "Newbie" category (that is those that have been on the Internet for less than a year), and a set of demographic indicators. These demographic indicators include age, gender, household income, sexual preference, education, occupation and marital status. 1500 observations are included in this data set.
Read in this data and use the first 1200 observations to fit a logistic regression model for the response "Newbie" against all the other variables. Then apply this model to the remaining observations to obtain predicted probabilities. Classify a case as Newbie if its predicted probability exceeds 0.6; otherwise classify it as non-Newbie.
Explanation / Answer
Solution: All analysis is performed in R (RStudio).
# load required packages
library(dplyr)   # for select()
library(caret)   # for dummyVars()

# read the data directly from the course website
newbie <- read.csv("https://harlanhappydog.github.io/STAT306/docs/newbie.txt",
                   header = TRUE, stringsAsFactors = TRUE)
View(newbie)
dim(newbie)
str(newbie)
colSums(is.na(newbie))   # check for missing values

# drop the response before creating dummy variables
newbie_pre <- select(newbie, -Newbie)
head(newbie_pre)

# create dummy variables with the caret package
new_dmy <- dummyVars(~ ., data = newbie_pre, fullRank = TRUE)
new_trans <- data.frame(predict(new_dmy, newdata = newbie_pre))
View(new_trans)
dim(new_trans)
new_trans$Newbie <- newbie$Newbie
names(new_trans)

# train/test split: first 1200 observations for training
train <- new_trans[1:1200, ]
test  <- new_trans[1201:1500, ]

# fit a logistic regression model on the first 1200 observations
newlr <- glm(Newbie ~ ., data = train, family = binomial)
summary(newlr)

# predict on the remaining 300 observations
bie_pre <- predict(newlr, type = 'response', newdata = test)
head(bie_pre)

# confusion table using a 0.6 probability cutoff
table(test$Newbie, bie_pre > 0.6)
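The misclassification rate is the proportion of off-diagonal counts in a confusion table like the one above. A minimal self-contained sketch, using simulated labels and probabilities (since the actual counts depend on the downloaded data):

```r
set.seed(1)

# simulated stand-ins for the hold-out labels and predicted probabilities
actual <- factor(sample(c("no", "yes"), 300, replace = TRUE))
prob   <- runif(300)

# classify as Newbie when the predicted probability exceeds 0.6
pred <- prob > 0.6
tab  <- table(actual, pred)

# misclassification rate = off-diagonal count / total
misclass <- 1 - sum(diag(tab)) / sum(tab)
misclass
```

With the real data, `actual` would be `test$Newbie` and `prob` would be `bie_pre`.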
Q1: How many parameters are there in the logistic model?
31 slope coefficients + 1 intercept = 32 parameters.
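The parameter count can be read off directly from the fitted model object. A quick sketch on a toy logistic fit (the data here are invented for illustration); note that a factor with k levels contributes k - 1 dummy coefficients:

```r
set.seed(42)

# toy data: one numeric predictor and one 3-level factor
d <- data.frame(y = rbinom(50, 1, 0.5),
                x = rnorm(50),
                g = factor(sample(letters[1:3], 50, replace = TRUE)))
fit <- glm(y ~ x + g, data = d, family = binomial)

# number of parameters = intercept + slope coefficients
length(coef(fit))   # 1 (intercept) + 1 (x) + 2 (g dummies) = 4
```

Applying `length(coef(newlr))` to the fitted model gives the count without hand-tallying dummy variables.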
Q2: What is the misclassification rate on the hold-out set? (Compute it from the confusion table above: off-diagonal count divided by 300.)
Q3: What is the AIC of this model on the training data? (It is reported in the summary() output of the fitted glm.)
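The AIC is computed on the training fit and can be extracted with `AIC()`. A sketch on a toy model (invented data); for a binomial glm with a 0/1 response the residual deviance equals -2 times the log-likelihood, so AIC = deviance + 2 × (number of parameters):

```r
set.seed(1)

# toy logistic fit
d <- data.frame(y = rbinom(100, 1, 0.5), x = rnorm(100))
fit <- glm(y ~ x, data = d, family = binomial)

AIC(fit)
fit$deviance + 2 * length(coef(fit))   # same value
```

For the actual model, `AIC(newlr)` returns the value printed at the bottom of `summary(newlr)`.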