Part I 1. What does the ROC curve show? 2. What metric is used to determine simi
Question
Part I
1. What does the ROC curve show?
2. What metric is used to determine similarity?
3. Why is clustering considered an iterative process?
4. 80 people are tested for HIV. 28 test positive. 25 of those have the disease. 4 of those who tested negative have the disease. Fill out the confusion matrix.
5. Discuss the differing effect of Manhattan distance v. Euclidean distance in a regression problem.
6. How does Weka use a test set to validate a classification model?

Explanation / Answer
1. An ROC curve plots the true positive rate (Y axis) as a function of the false positive rate (X axis) as the classification threshold varies. The area under the ROC curve indicates how good the classifier is: the larger the area, the better the performance.
2. Similarity is usually measured as the inverse of a distance measure such as Euclidean or Manhattan distance: the smaller the distance between two points, the more similar they are.
3. In centroid-based clustering such as k-means, we first fix some random points as cluster centers, assign each point to its nearest center, and then adjust each center using the distances to the points in its group. After adjusting, we repeat the same steps until the centers stop changing. So it is an iterative process.
4. Of the 28 people who tested positive, 25 have the disease (true positives) and 3 do not (false positives). Of the 52 people who tested negative, 4 have the disease (false negatives) and 48 do not (true negatives).

n = 80        Predicted No   Predicted Yes
Actual No     48             3
Actual Yes    4              25

5. Manhattan distance is |x1 - x2| + |y1 - y2|. It tends to give better performance in higher-dimensional spaces: Euclidean distance squares each per-axis difference, so a few axes with large differences dominate and the others are effectively ignored. This effect is small in lower-dimensional spaces.
6. If we choose k-fold cross-validation, the data is split into k folds; in each round, k-1 folds are used for training and the remaining fold is used for testing. If we choose a percentage split instead, Weka randomly partitions the data according to the given percentage, trains on one part, and tests on the held-out remainder.
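For answer 1, here is a minimal sketch of how ROC points and the area under the curve could be computed by hand. The function names `roc_points` and `auc`, and the sample labels and scores, are made up for illustration; the sketch assumes binary labels (1 = positive) and distinct scores, and does not handle ties.

```python
def roc_points(labels, scores):
    """Return (false positive rate, true positive rate) pairs for a threshold sweep."""
    pos = sum(labels)
    neg = len(labels) - pos
    # Sort by descending score: lowering the threshold admits one example at a time.
    ranked = sorted(zip(scores, labels), reverse=True)
    tp = fp = 0
    points = [(0.0, 0.0)]
    for _, label in ranked:
        if label == 1:
            tp += 1
        else:
            fp += 1
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Hypothetical classifier output: higher score means "more likely positive".
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
curve = roc_points(labels, scores)
area = auc(curve)  # 8/9 here: 8 of the 9 positive/negative pairs are ranked correctly
```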
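The loop described in answer 3 can be sketched as follows. This is a rough illustration that assumes k-means on 2-D points; the answer itself does not name a specific algorithm. The function name `kmeans` and the sample data are hypothetical, and the starting centers are fixed rather than random so that the run is reproducible.

```python
def kmeans(points, centers, max_iter=100):
    """Alternate assignment and update steps until the centers stop moving."""
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest center.
        assign = [
            min(range(len(centers)),
                key=lambda c: (p[0] - centers[c][0]) ** 2 + (p[1] - centers[c][1]) ** 2)
            for p in points
        ]
        # Update step: move each center to the mean of its members.
        new_centers = []
        for c in range(len(centers)):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                new_centers.append((sum(m[0] for m in members) / len(members),
                                    sum(m[1] for m in members) / len(members)))
            else:
                new_centers.append(centers[c])  # leave an empty cluster's center in place
        if new_centers == centers:  # nothing moved, so the iteration has converged
            break
        centers = new_centers
    return centers, assign


# Two visibly separate groups; starting centers are chosen by hand (an assumption).
points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
final_centers, assignments = kmeans(points, [(0.0, 0.0), (10.0, 10.0)])
```

The second pass through the loop produces the same centers as the first, which is what triggers the stop condition.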
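The four cells of the confusion matrix in answer 4 follow from the numbers given in the question by simple subtraction. A short sketch of that arithmetic (variable names are chosen here for illustration), which also gives the single ROC point this test would contribute:

```python
# Numbers stated in the question.
total = 80
tested_positive = 28
positive_with_disease = 25   # tested positive and have the disease
negative_with_disease = 4    # tested negative but have the disease

tested_negative = total - tested_positive   # 52

tp = positive_with_disease                  # true positives
fp = tested_positive - tp                   # false positives
fn = negative_with_disease                  # false negatives
tn = tested_negative - fn                   # true negatives

# The four cells must account for everyone tested.
assert tp + fp + fn + tn == total

# Rows are actual classes, columns are predicted classes (No, Yes).
confusion = {
    "Actual No": (tn, fp),
    "Actual Yes": (fn, tp),
}

# Rates that would be plotted as one point on an ROC curve.
true_positive_rate = tp / (tp + fn)    # 25 / 29
false_positive_rate = fp / (fp + tn)   # 3 / 51
```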
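To make the point in answer 5 concrete, the sketch below compares how much of each distance comes from a single axis with a large difference. The two 10-dimensional points are invented for illustration: nine axes differ by 1 and one axis differs by 10.

```python
import math


def manhattan(a, b):
    """Sum of absolute per-axis differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def squared_euclidean(a, b):
    """Sum of squared per-axis differences (Euclidean distance before the square root)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def euclidean(a, b):
    return math.sqrt(squared_euclidean(a, b))


a = [0.0] * 10
b = [1.0] * 9 + [10.0]   # nine small differences and one large one

d_manhattan = manhattan(a, b)    # 9 * 1 + 10 = 19
d_euclidean = euclidean(a, b)    # sqrt(9 * 1 + 100) = sqrt(109)

# Share of each distance that is due to the one large axis.
large = abs(b[-1] - a[-1])
manhattan_share = large / d_manhattan                     # 10 / 19, about 0.53
euclidean_share = large ** 2 / squared_euclidean(a, b)    # 100 / 109, about 0.92
```

Under Manhattan distance the nine small axes still account for nearly half of the total, while under Euclidean distance the squared term leaves them with less than a tenth.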