Salmons Stores operates a national chain of women’s apparel stores. Five thousan
Question
Salmons Stores operates a national chain of women’s apparel stores. Five thousand copies of an expensive four-color sales catalog have been printed, and each catalog includes a coupon that provides a $50 discount on purchases of $200 or more. Salmons would like to send the catalogs only to customers who have the highest probability of using the coupon. The file Salmons contains data from an earlier promotional campaign. For each of 500 Salmons customers, three variables are tracked: last year’s total spending at Salmons, whether they have a Salmons store credit card, and whether they used the promotional coupon they were sent.
Use logistic regression to classify observations as a promotion-responder or not by using Spending and Card as input variables and Coupon as the output variable.
- What is the classification rule, expressed as a mathematical equation relating the output to the inputs?
- Is the overall model significant, and which predictor is the most significant?
- Derive the confusion matrix and determine the accuracy of the model.
- Classify these two customers: Paul, who spent $8,000 at Salmons but does not have the Salmons credit card, and Jessica, who spent $3,000 and has the store credit card.
Customer  Spending  Card  Coupon
1         2291      1     0
2         3215      1     0
3         2135      1     0
4         3924      0     0
5         2528      1     0
6         2473      0     1
7         2384      0     0
8         7076      0     0
9         1182      1     1
10        3345      0     0
11        2140      1     0
12        3255      0     1
13        1512      0     0
14        2148      0     1
15        6737      0     0
16        6486      0     0
17        1307      0     0
18        3470      1     0
19        2936      0     0
20        6404      0     1
21        2229      0     0
22        2933      0     0
23        2118      0     0
24        2050      0     0
25        4998      0     1
26        1394      0     0
27        3993      1     1
28        2059      0     1
29        1677      0     0
30        2229      0     1
Explanation / Answer
The following is the R code
a<- read.csv("1.csv")
# Checking the structure
str(a)
# Checking NA values
sum(is.na(a))
# No NA values
# Splitting data into train and test
library(caret)
# p = 0.83 keeps about 83% of the 30 rows for training;
# the output below is based on 25 training rows, leaving 5 for testing
train_rows <- createDataPartition(a$Coupon, p = 0.83, list = FALSE)
train <- a[train_rows, ]
test <- a[-train_rows, ]
str(train)
# Logistic Regression
# Note: `Coupon ~ .` uses every other column as a predictor, including the Customer ID
LogReg <- glm(Coupon ~ ., data=train, family=binomial)
summary(LogReg)
# Predicting on train data
prob <- predict(LogReg, type="response")
pred_class <- ifelse(prob > 0.5, 1, 0)
table(train$Coupon,pred_class)
# Generating the confusion matrix on train data (rows = actual, columns = predicted)
conf.mat1 = table(train$Coupon,pred_class)
# Calculating the accuracy of the model on train data
accuracy1 = sum(diag(conf.mat1))/sum(conf.mat1);accuracy1
The following are the results
Call:
glm(formula = Coupon ~ ., family = binomial, data = train)
Deviance Residuals:
Min 1Q Median 3Q Max
-1.2882 -0.8412 -0.4756 0.9411 2.0875
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -3.1031903 1.7115928 -1.813 0.0698 .
Customer 0.1225229 0.0662046 1.851 0.0642 .
Spending 0.0001251 0.0002616 0.478 0.6326
Card -0.6479859 1.3395237 -0.484 0.6286
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 31.343 on 24 degrees of freedom
Residual deviance: 26.251 on 21 degrees of freedom
AIC: 34.251
Number of Fisher Scoring iterations: 5
> # Predicting on train data
> prob <- predict(LogReg, type="response")
> pred_class <- ifelse(prob > 0.5, 1, 0)
> table(train$Coupon,pred_class)
pred_class
0 1
0 15 2
1 3 5
> # Generating the confusion metric on train data
> conf.mat1 = table(train$Coupon,pred_class)
> # Calculating the accuracy of the model on train data
> accuracy1 = sum(diag(conf.mat1))/sum(conf.mat1);accuracy1
[1] 0.8
>
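The fitted classification rule, with p the estimated probability that Coupon = 1, is
log(p / (1 - p)) = -3.1032 + 0.1225 Customer + 0.0001251 Spending - 0.6480 Card
A customer is classified as a promotion-responder when p > 0.5, i.e. when the right-hand side is positive.
The overall model is not significant: the drop in deviance is 31.343 - 26.251 = 5.092 on 24 - 21 = 3 degrees of freedom, which is below the chi-square critical value of 6.251 at alpha = 0.1.
Individually, only the intercept (p = 0.0698) and Customer (p = 0.0642) are significant at alpha = 0.1; Spending (p = 0.6326) and Card (p = 0.6286) are not.
Customer is only a row identifier that entered the model through the formula Coupon ~ ., so it is not a meaningful predictor.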
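The overall significance can be checked with a likelihood-ratio test on the two deviances reported in the summary output. The following is a minimal sketch in base R; the variable names are illustrative and the numbers are copied from the output above. A p-value above 0.1 means the model as a whole is not significant at that level.

```r
# Deviances and degrees of freedom copied from the summary(LogReg) output
null_dev  <- 31.343; null_df  <- 24
resid_dev <- 26.251; resid_df <- 21

# Likelihood-ratio statistic: drop in deviance from the null model
lr_stat <- null_dev - resid_dev
lr_df   <- null_df - resid_df

# Upper-tail chi-square probability and the 10% critical value
p_value <- pchisq(lr_stat, df = lr_df, lower.tail = FALSE)
crit_10 <- qchisq(0.90, df = lr_df)

cat("LR statistic:", lr_stat, "on", lr_df, "df\n")
cat("p-value:", round(p_value, 3), " 10% critical value:", round(crit_10, 3), "\n")
```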
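The confusion matrix on the training data (rows = actual Coupon, columns = predicted class) is
   pred_class
     0  1
  0 15  2
  1  3  5
Accuracy = (15 + 5) / 25 = 0.8, or 80%, on the 25 training observations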
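The accuracy figure can be reproduced directly from the four counts in the confusion matrix. The sketch below rebuilds the matrix by hand (the names `conf_mat`, `sensitivity`, and `specificity` are illustrative, not from the original answer) and also reports how well each class is recovered, which accuracy alone hides.

```r
# Confusion matrix from the output: rows = actual Coupon, columns = predicted class
# matrix() fills column by column, so the first column is (15, 3)
conf_mat <- matrix(c(15, 3, 2, 5), nrow = 2,
                   dimnames = list(actual = c("0", "1"),
                                   predicted = c("0", "1")))

# Share of all training observations classified correctly
accuracy <- sum(diag(conf_mat)) / sum(conf_mat)

# Share of actual responders (Coupon = 1) that were predicted as responders
sensitivity <- conf_mat["1", "1"] / sum(conf_mat["1", ])

# Share of actual non-responders (Coupon = 0) that were predicted as non-responders
specificity <- conf_mat["0", "0"] / sum(conf_mat["0", ])

print(conf_mat)
cat("accuracy:", accuracy, " sensitivity:", sensitivity,
    " specificity:", round(specificity, 3), "\n")
```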
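Neither Spending nor Card is significant in this fit, and the only marginally significant term is the Customer ID, which Paul (Spending = 8000, Card = 0) and Jessica (Spending = 3000, Card = 1) do not have.
This model therefore gives no reliable basis for assigning either of them to a group; a model refit on Spending and Card alone is needed to classify them.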
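To classify Paul and Jessica from Spending and Card only, as the question asks, the model can be refit without the Customer ID. The following is a sketch under stated assumptions: it uses only the 30 rows listed in the question (the full file has 500 customers), fits on all of them without a train/test split, and the names `salmons`, `fit`, and `new_customers` are illustrative. With so few rows the resulting probabilities are indicative at best.

```r
# The 30 rows shown in the question (assumption: a sample of the full 500-row file)
salmons <- data.frame(
  Spending = c(2291, 3215, 2135, 3924, 2528, 2473, 2384, 7076, 1182, 3345,
               2140, 3255, 1512, 2148, 6737, 6486, 1307, 3470, 2936, 6404,
               2229, 2933, 2118, 2050, 4998, 1394, 3993, 2059, 1677, 2229),
  Card     = c(1, 1, 1, 0, 1, 0, 0, 0, 1, 0,
               1, 0, 0, 0, 0, 0, 0, 1, 0, 0,
               0, 0, 0, 0, 0, 0, 1, 0, 0, 0),
  Coupon   = c(0, 0, 0, 0, 0, 1, 0, 0, 1, 0,
               0, 1, 0, 1, 0, 0, 0, 0, 0, 1,
               0, 0, 0, 0, 1, 0, 1, 1, 0, 1)
)

# Logistic regression on the two requested inputs only (no Customer ID)
fit <- glm(Coupon ~ Spending + Card, data = salmons, family = binomial)

# The two customers to classify
new_customers <- data.frame(Spending = c(8000, 3000),
                            Card     = c(0, 1),
                            row.names = c("Paul", "Jessica"))

# Estimated probability of using the coupon, then the 0.5 cutoff used above
prob <- predict(fit, newdata = new_customers, type = "response")
pred_class <- ifelse(prob > 0.5, 1, 0)

print(round(prob, 3))
print(pred_class)
```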