here is the link of the database: http://www.siam.org/books/ot108 2. Do Problem
Question
Here is the link to the database:
http://www.siam.org/books/ot108
2. Do Problem 7.2 of Section 1.7.2 in the text. This problem requires a classifier of breast cancer patients. We use the Wisconsin Breast Cancer Database (WDBC), made publicly available by Wolberg, Street, and Mangasarian of the University of Wisconsin. A link to the database is made available on the webpage for this book, http://www.siam.org/books/ot108. There are two files: wdbc.data and wdbc.names. The file wdbc.names gives more details about the data, and you should read it to understand the context. The file wdbc.data gives 569 data vectors. Each data vector (in row form) has 32 components. The first component is the patient number, and the second is either "M" or "B" depending on whether the data is malignant or benign. You may manually change the entries "M" to "+1" and "B" to "-1". These entries are the indicators y. Elements 3 to 32 of each row i represent the observed values of the 30 features for the patient corresponding to that row i.
a. Use the first 500 data vectors for the training set of the SVM. Use MATLAB as the modeling language to formulate the problem as the non-separable case with the penalty parameter C = 1000, and use fmincon as the optimizer. Solve the problem and display the separating hyperplane classifier. Determine whether the data is indeed separable. Use the classifier that you have developed to predict whether the remaining 69 patients have cancer. Compare your prediction to the actual patients' medical status. Evaluate the accuracy (proportion of correct predictions), the sensitivity (proportion of positive diagnoses for patients with cancer), and the specificity (proportion of negative diagnoses for patients without cancer).
b.
Explanation / Answer
Using fmincon, we can solve the dual optimization problem of the support vector machine designed for this binary classification; the optimal multipliers then give the support vectors and the separating hyperplane.
The MATLAB code is:
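%% MATLAB SVM - CHEGG
% Siddy
%--------------------------
clear all
close all
% wdbc.txt is a copy of wdbc.data: comma-separated, no header row.
data = readtable('wdbc.txt', 'ReadVariableNames', false, 'Delimiter', ',');
data.Properties.VariableNames{1} = 'id';
data.Properties.VariableNames{2} = 'diag';
N = 500;                                  % number of training vectors
X = table2array(data(:,3:end));           % 569-by-30 feature matrix
% Indicators y: +1 for malignant ('M'), -1 for benign ('B').
Y = 2*strcmp(data.diag, 'M') - 1;
X_train = X(1:N,:);
Y_train = Y(1:N);
% Dual of the non-separable (soft-margin) SVM, solved for the multipliers a:
%   minimize qfun(a)  subject to  Y_train'*a = 0,  0 <= a <= C
a0 = eps * ones(N,1);
C = 1000;
% Raise the evaluation limits; the defaults are small for 500 variables.
opts = optimoptions('fmincon', 'MaxFunctionEvaluations', 1e6, 'MaxIterations', 1e4);
a = fmincon(@(a) qfun(a, X_train, Y_train), a0, [], [], Y_train', 0, zeros(N,1), C*ones(N,1), [], opts);
% Separating hyperplane w'*x + b = 0; a row x is classified by sign(x*wo + bo).
wo = X_train'*(a.*Y_train)                % no semicolon: display the result
% Average b over the margin support vectors (0 < a_i < C),
% for which y_i*(w'*x_i + b) = 1 holds exactly.
tol = 1e-6*C;
sv = (a > tol) & (a < C - tol);
bo = mean(Y_train(sv) - X_train(sv,:)*wo)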
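The objective function qfun, saved in its own file qfun.m, is:
function y = qfun(a,X,d)
% Negated dual objective: fmincon minimizes, so return -(sum(a) - 0.5*a'*Q*a).
[N,m] = size(X);
y = -ones(1,N)*a+0.5*a'*diag(d)*(X*X')*diag(d)*a;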
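For reference, the soft-margin dual problem that fmincon is asked to solve can be written out as below. This is a sketch in the notation of qfun (a for the multipliers, X for the training matrix, d for the +1/-1 labels); the shorthand D and the all-ones vector are my own additions, not taken from the handout.

```latex
\begin{aligned}
\min_{a \in \mathbb{R}^{N}} \quad
  & -\mathbf{1}^{T} a + \tfrac{1}{2}\, a^{T} D\, X X^{T} D\, a,
  \qquad D = \operatorname{diag}(d), \\
\text{subject to} \quad
  & d^{T} a = 0, \\
  & 0 \le a_i \le C, \quad i = 1, \dots, N, \\[4pt]
\text{with the hyperplane normal recovered as} \quad
  & w = X^{T} D\, a = \sum_{i=1}^{N} a_i\, d_i\, x_i .
\end{aligned}
```

The equality constraint corresponds to the Aeq and beq arguments of fmincon, and the box constraint to the lb and ub arguments.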
A concise explanation of how this optimization approach works is given in the SVM optimization handout by Yu Hen Hu. The data file is assumed to be already downloaded and saved as 'wdbc.txt'.
The basic steps for solving this problem are: import the data; formulate the objective function, which in the primal problem is built on the inner product <w, w> and in the dual becomes a quadratic form in the multipliers involving the Gram matrix X*X^T of the training matrix, returning a single scalar value; pass this function to fmincon from the Optimization Toolbox in MATLAB; and compute the parameters of the hyperplane (w and b) from the optimal multipliers.
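The question also asks for a separability check and for accuracy, sensitivity, and specificity on the held-out patients, which the steps above stop short of. The following is a minimal sketch of those computations, assuming a classifier of the form sign(x*w + b) and labels coded +1 (malignant) and -1 (benign). The variable names and all numbers are hypothetical toy values chosen for illustration, not results from the WDBC data.

```matlab
% Toy stand-ins (hypothetical values, NOT results from the WDBC data).
w = [1; -1];                        % hyperplane normal (role of wo)
b = 0;                              % hyperplane offset (role of bo)
X_tr = [2 0; 0 2; 3 1; 0 1];        % toy training features
y_tr = [1; -1; 1; -1];              % toy training labels (+1 / -1)
X_te = [2 0; 0 2; 2 1; 3 1; 1 0];   % toy held-out features
y_te = [1; -1; 1; 1; -1];           % toy held-out labels (+1 / -1)

% Separability check on the training set:
% slack xi_i = max(0, 1 - y_i*(w'*x_i + b)); all zero means every
% training point lies on the correct side with full margin.
slack = max(0, 1 - y_tr .* (X_tr*w + b));
is_separable = all(slack < 1e-6);

% Predict the held-out points with the sign of the decision value.
score = X_te*w + b;
y_hat = ones(size(score));
y_hat(score < 0) = -1;

% Confusion counts, with +1 (malignant) as the positive class.
TP = sum(y_hat ==  1 & y_te ==  1);
TN = sum(y_hat == -1 & y_te == -1);
FP = sum(y_hat ==  1 & y_te == -1);
FN = sum(y_hat == -1 & y_te ==  1);

accuracy    = (TP + TN) / numel(y_te);   % proportion of correct predictions
sensitivity = TP / (TP + FN);            % positives found among true positives
specificity = TN / (TN + FP);            % negatives found among true negatives

fprintf('training set separated with full margin: %d\n', is_separable);
fprintf('accuracy %.3f, sensitivity %.3f, specificity %.3f\n', ...
        accuracy, sensitivity, specificity);
```

With the real data, the toy arrays would be replaced by the first 500 rows (training) and the remaining 69 rows (held-out) of the feature matrix and label vector. Nonzero slack at C = 1000 suggests, but does not by itself prove, that the training data is not linearly separable, since it also depends on the optimizer having converged.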