

Question

2) A large number of insurance records are to be examined to develop a model for predicting fraudulent claims. Of the claims in the historical database, 1% were judged to be fraudulent (class 1). A sample database is taken to develop a model, and oversampling is used to provide a balanced sample in light of the very low response rate. When applied to this sample database (total number of records, N = 800), the model correctly classifies 310 frauds and 270 non-frauds. It misses 90 frauds and incorrectly classifies 130 non-fraud records as frauds.

Assume the ratio of fraudulent to non-fraudulent records (positive vs. negative) in the original database is 1:99, and that the number of positive records is fixed at 400. For the original non-oversampled database, find:

a) the total number of records
b) the total number of negative records
c) the total number of false negative records
d) the total number of true negative records
e) the adjusted misclassification rate (error rate)
f) the adjusted positive response rate

Explanation / Answer

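Sample classification results: 310 true positives and 90 false negatives (400 frauds); 270 true negatives and 130 false positives (400 non-frauds).

a) total number of records in the original non-oversampled database: the 400 frauds are 1% of the data, so 400 / 0.01 = 40,000

b) total number of negative records in the non-oversampled database = 40,000 - 400 = 39,600 (each sampled non-fraud stands for 39,600 / 400 = 99 original non-frauds)

c) total number of false negative records in the original non-oversampled database = 90 (the frauds are kept fixed, so they are not rescaled)

d) total number of true negative records in the original non-oversampled database = 270 * 99 = 26,730 (likewise, false positives = 130 * 99 = 12,870)

e) adjusted misclassification rate (error rate) = ((90 + 12,870) / 40,000) * 100 = 32.4 % (the unadjusted sample rate is ((90 + 130) / 800) * 100 = 27.5 %)

f) adjusted positive response rate, reading it as the share of records the model classifies as fraud = ((310 + 12,870) / 40,000) * 100 = 32.95 %. The actual share of frauds is 1 % by construction.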
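
The adjustment can also be checked with a short script. This is a minimal sketch, assuming the reading in which the 400 sampled frauds are kept fixed and the non-frauds are scaled up to 99 per fraud. The function name `adjust_for_oversampling` and the key `predicted_positive_rate` are illustrative choices, not part of the question, and treating the "positive response rate" as the share of records classified as fraud is an assumption.

```python
from fractions import Fraction


def adjust_for_oversampling(tp, fn, fp, tn, neg_per_pos):
    """Rescale the negative class to neg_per_pos negatives per positive.

    Assumption: the positives (tp + fn) are kept fixed, and the model's
    behaviour on negatives is the same in the sample and in the full data.
    """
    positives = tp + fn
    sample_negatives = fp + tn
    negatives = positives * neg_per_pos
    # Each sampled negative represents this many negatives in the original data.
    scale = Fraction(negatives, sample_negatives)
    adj_fp = fp * scale
    adj_tn = tn * scale
    total = positives + negatives
    return {
        "total": total,
        "negatives": negatives,
        "false_negatives": fn,  # positives are not rescaled
        "true_negatives": adj_tn,
        "false_positives": adj_fp,
        "error_rate": (fn + adj_fp) / total,
        # Share of all records the model labels as fraud (assumed meaning of part f).
        "predicted_positive_rate": (tp + adj_fp) / total,
    }


result = adjust_for_oversampling(tp=310, fn=90, fp=130, tn=270, neg_per_pos=99)
for name, value in result.items():
    print(name, float(value))
```

With the counts from the question this gives 40,000 total records, 39,600 negatives, 90 false negatives, 26,730 true negatives, and an error rate of 0.324. Exact fractions are used so the rates are not affected by floating-point rounding until they are printed.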
