You work for a bank as a business data analyst in the credit card risk-modeling
Question
You work for a bank as a business data analyst in the credit card risk-modeling department. Your bank recently conducted a bold experiment: over a short time interval three years ago, it quietly issued credit cards to all 600 people who applied, regardless of their credit risk.
After three years, 150, or 25%, of card recipients defaulted – they failed to pay back at least some of the money they owed. However, the bank collected very valuable proprietary data that it can now use to optimize its future card-issuing process.
The bank initially collected six pieces of data about each person.
Age
Years at current employer
Years at current address
Income over the past year
Current credit card debt, and
Current automobile debt
You are first asked to propose a binary classification model for default that uses only data from one or more of the above six inputs, and outputs a single “score.” The relative rank-ordering of scores will determine the model’s effectiveness. For convenience, you are asked to use a scale for your score that has a maximum < 3.5 and a minimum > -3.5.
Initially you are not told the bank’s best estimates of the cost per False Negative (accepted applicant who becomes a defaulting customer) and per False Positive (rejected applicant who would not have defaulted). Therefore, the best you can do is to design a model that maximizes the Area Under the ROC Curve, or AUC.
You are told that if your model is effective (“high enough” AUC – not defined) and “robust” (not defined, but in general means relatively little change in AUC across multiple sets of available data) that it may be adopted by the bank as a predictive model for default, to determine which future applicants will be issued credit cards.
First Binary Classification Model: You are first given a “training set” of 200 out of the 600 people in the experiment. Design your model on this set. Standardize your data first. You may combine the six inputs by adding them to or subtracting them from each other, taking simple ratios, etc. The only restriction is that your final “score” needs to be scaled so that the maximum is less than 3.5 and the minimum is greater than -3.5, so you can use the Excel “AUC Calculator” provided.
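The standardize-combine-rescale steps described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the numbers are made up (they are not the bank's data), the helper names `standardize` and `bounded` are hypothetical, and the particular combination (debt-to-income ratio minus years at employer) is one arbitrary example of a score, not a recommended model.

```python
import statistics

def standardize(values):
    """Return z-scores: (x - mean) / population standard deviation."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return [(x - mean) / sd for x in values]

def bounded(scores, limit=3.5):
    """Shrink scores, if needed, so every value lies strictly inside (-limit, limit).

    Multiplying by a positive constant keeps the rank order, so AUC is unchanged.
    """
    largest = max(abs(s) for s in scores)
    if largest < limit:
        return list(scores)
    factor = 0.99 * limit / largest
    return [s * factor for s in scores]

# Made-up illustrative values (assumption), one entry per applicant.
income = [25.0, 40.0, 60.0, 32.0, 150.0, 48.0]      # thousands per year
card_debt = [4.0, 1.0, 0.5, 6.0, 2.0, 9.0]          # thousands
years_employed = [1.0, 8.0, 15.0, 2.0, 20.0, 0.5]

# A simple ratio of two inputs, then z-scores of each ingredient.
z_debt_ratio = standardize([d / i for d, i in zip(card_debt, income)])
z_years = standardize(years_employed)

# Hypothetical score: higher means riskier.
raw = [a - b for a, b in zip(z_debt_ratio, z_years)]
score = bounded(raw)
```

Because `bounded` only multiplies by a positive constant, the relative rank-ordering of applicants, and therefore the AUC, is the same for `raw` and `score`.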
Question 1: What is your model? Give it as a function of two or more of the six inputs that outputs a single numerical score between -3.5 and 3.5 for each applicant.
Question 2: What is your model’s AUC on the Training Set?
Explanation / Answer
Question 1: What is your model? Give it as a function of two or more of the six inputs that outputs a single numerical score between -3.5 and 3.5 for each applicant.
Answer:
The logistic regression analysis uses an automatic stepwise procedure. It begins by selecting the strongest candidate predictor, then tests the remaining candidate predictors, one at a time, for inclusion in the model. At each step, we check whether a new candidate predictor improves the model significantly. We also check whether, once the new predictor is included, any predictor already in the model should stay or be removed. If a newly entered predictor does a better job of explaining default behavior, a predictor already in the model may be removed because it no longer uniquely explains enough. The procedure continues until all candidate predictors have been tested for inclusion and removal. When the analysis is finished, we have the following table that contains various statistics.
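A rough Python sketch of the selection loop described above is given here, with several simplifications that should be read as assumptions rather than as the procedure a statistics package runs. It does forward selection only (the removal check is omitted for brevity) and fits each candidate model by plain gradient ascent. A predictor is admitted when the likelihood-ratio statistic exceeds 3.841, the 5% critical value of a chi-square distribution with one degree of freedom. The data are synthetic and the column names are hypothetical.

```python
import math
import random

def sigmoid(t):
    # Numerically safe logistic function.
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)

def fit(data, labels, cols, steps=300, rate=0.5):
    """Fit intercept + weights for the given columns; return (weights, log-likelihood)."""
    X = [[1.0] + [row[j] for j in cols] for row in data]
    w = [0.0] * len(X[0])
    n = len(X)
    for _ in range(steps):
        grad = [0.0] * len(w)
        for x, y in zip(X, labels):
            err = y - sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
            for i, xi in enumerate(x):
                grad[i] += err * xi
        w = [wi + rate * g / n for wi, g in zip(w, grad)]
    ll = 0.0
    for x, y in zip(X, labels):
        p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
        p = min(max(p, 1e-12), 1.0 - 1e-12)
        ll += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return w, ll

def forward_stepwise(data, labels, names, threshold=3.841):
    """Add the best remaining predictor while the likelihood-ratio test passes."""
    chosen = []
    _, best_ll = fit(data, labels, chosen)          # intercept-only model
    while len(chosen) < len(names):
        trials = [(fit(data, labels, chosen + [j])[1], j)
                  for j in range(len(names)) if j not in chosen]
        ll, j = max(trials)
        if 2.0 * (ll - best_ll) < threshold:        # no significant improvement
            break
        chosen.append(j)
        best_ll = ll
    weights, _ = fit(data, labels, chosen)
    return [names[j] for j in chosen], weights

# Synthetic, already-standardized inputs (assumption): default depends on the
# first two columns only, so "age" is pure noise here.
random.seed(0)
names = ["debt_to_income", "years_employed", "age"]
data = [[random.gauss(0, 1) for _ in names] for _ in range(200)]
labels = [1 if random.random() < sigmoid(-1.2 + 1.5 * r[0] - 0.8 * r[1]) else 0
          for r in data]

selected, weights = forward_stepwise(data, labels, names)
```

The fitted linear predictor (intercept plus weighted sum of the selected standardized inputs) can serve as the score, as long as it is rescaled if necessary to stay inside the required range.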
Question 2: What is your model’s AUC on the Training Set?
Answer:
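One option is to split the training set into two parts, train-train and train-test, build your model on train-train only, and then compare the AUC on the full set and on each part. GetAUC is assumed to be a helper you define yourself, and Happy stands for whichever column holds the outcome (here, the default indicator):
aucAll = GetAUC(myModel, train, train$Happy)
aucTrainTrain = GetAUC(myModel, trainTrain, trainTrain$Happy)
aucTrainTest = GetAUC(myModel, trainTest, trainTest$Happy)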
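For readers without such a helper, here is a minimal Python sketch of the same comparison. It computes AUC directly from its rank-ordering definition: the probability that a randomly chosen defaulter scores higher than a randomly chosen non-defaulter, with ties counted as one half. The data are synthetic, the 140/60 split is an arbitrary choice, and the names `auc` and `auc_of` are hypothetical. The score is a stand-in for whatever model you built on the train-train part.

```python
import math
import random

def auc(scores, labels):
    """Fraction of (defaulter, non-defaulter) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

# Synthetic training set of 200 (assumption): each row is (score, defaulted).
random.seed(1)
rows = []
for _ in range(200):
    z = random.gauss(0, 1)                              # stand-in model score
    p_default = 1.0 / (1.0 + math.exp(-(z - 1.1)))      # higher score, more risk
    rows.append((z, 1 if random.random() < p_default else 0))

# Split into the part used to build the model and a held-out part.
random.shuffle(rows)
train_train, train_test = rows[:140], rows[140:]

def auc_of(subset):
    return auc([s for s, _ in subset], [y for _, y in subset])

auc_all = auc_of(rows)
auc_train_train = auc_of(train_train)
auc_train_test = auc_of(train_test)
```

A held-out AUC close to the train-train AUC is the kind of stability the question calls "robust", whereas a large drop suggests the model is fitted too closely to the rows it was built on.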