Question

Using Python 3

Here is the starter code:

# ----------------------------------------------------------
#
# THIS CODE IS INCOMPLETE!
#
# ----------------------------------------------------------
import numpy as np
import numpy.linalg as la
import pandas as pd
import matplotlib.pyplot as plt

# Read in the data files

# Produce a Pandas histogram and plot (fill in appropriately)
plt.figure(0)
plt.title("a pretty histogram...")
plt.figure(1)
plt.title("a pretty plot...")
plt.xlabel("an axis")
plt.ylabel("an axis")

# Construct your A matrices
A_linear = np.zeros(some_shape)
A_quad = np.zeros(some_shape)

# Construct your b's
b = np.zeros(some_shape)

# Solve the least squares problem

# See how well your model (i.e. weights) does on the validate data set

# Plot a bar graph of the false-positives and false-negatives
bar_graph(fp_linear, fn_linear, fp_quad, fn_quad)

Breast Cancer Prediction Using Least Squares

For this problem, you will develop models using the least squares method to predict whether a tumor is malignant M (cancerous / deadly) or benign B (non-cancerous / safe). Models similar to these could help doctors determine if a person is at risk for having cancer and consequently detect and treat the cancer earlier.

A tumor is a mass of abnormal tissue. Malignant and benign tumors have different cell growth characteristics (see the image for an example). Some of the important tumor properties include the radius and the texture, among others. X-ray imaging and biopsies (examining a small sample of the tumor under a microscope) can be used to determine these characteristics.

[Image: benign tumor and malignant tumor]

You will be given a large data set containing hundreds of patients along with properties of their tumors. You will solve least squares problems with this data. You will then use the generated models to predict whether patients in another set have malignant M or benign B tumors.

We will be using the Python Data Analysis Library (Pandas) for importing the data and producing visualizations.

INPUT
labels: A list of strings which label the features in the data frame. You should include all these in your linear least-squares model.
subset_labels: A list of strings indicating the names of the features to include in the quadratic model.
bar_graph(fp_linear, fn_linear, fp_quad, fn_quad): A function to plot a bar graph of error statistics.

OUTPUT
b: A 1-d numpy array that is the right-hand side to your least-squares problem.
weights_linear: The solution weights vector for your linear model.
weights_quad: The solution weights vector for your quadratic model.
A_linear: The matrix (2-d numpy array) for your linear least-squares model.
A_quad: The matrix (2-d numpy array) for your quadratic least-squares model.
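To make the "quadratic model" concrete, here is a minimal sketch of one way such a matrix could be assembled: one column per feature, one per squared feature, and one per pairwise product. The helper name quadratic_design_matrix and the toy values are hypothetical and are not part of the assignment; the column order simply mirrors the one the answer below uses.

```python
from itertools import combinations

import numpy as np


def quadratic_design_matrix(table, names):
    """Stack each feature, each squared feature, and each pairwise product.

    `table` can be anything indexable by name (a dict of arrays or a
    pandas DataFrame); `names` plays the role of subset_labels.
    """
    linear = [np.asarray(table[n], dtype=float) for n in names]
    squares = [col ** 2 for col in linear]
    cross = [a * b for a, b in combinations(linear, 2)]
    return np.column_stack(linear + squares + cross)


# Made-up values for illustration only (not taken from the data files).
toy = {"x": [1.0, 2.0, 3.0], "y": [4.0, 5.0, 6.0], "z": [0.5, 1.5, 2.5]}
A_toy = quadratic_design_matrix(toy, ["x", "y", "z"])
print(A_toy.shape)  # 3 rows; 3 linear + 3 squared + 3 cross-term columns
```

With four features this produces 4 + 4 + 6 = 14 columns, which is why the answer allocates a 14-column A_quad.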

Explanation / Answer

import numpy as np
import numpy.linalg as la
import pandas as pd
import matplotlib.pyplot as plt

# Read in the data files
data_train = pd.read_csv("breast-cancer-train.dat", header=None, names=labels)
data_val = pd.read_csv("breast-cancer-validate.dat", header=None, names=labels)

plt.figure(0)
plt.title("Radius Mean Histogram")
data_train["radius (mean)"].hist()
plt.xlabel("Radius")
plt.ylabel("Patient Count")
plt.figure(1)
plt.title("Symmetry Mean Plot")
# Use integer patient indices on the x-axis (linspace(0, 300, num=300) gives non-integer spacing)
plt.scatter(np.arange(len(data_train)), data_train["symmetry (mean)"])
plt.xlabel("Patient Number")
plt.ylabel("Symmetry")

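# Construct your A matrices
# labels[1] holds the diagnosis (used for b below), so the feature columns start
# at labels[2]; the hard-coded shape assumes 300 training rows and 30 features.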
A_linear = np.zeros((300, 30))
for i in range(2,len(labels)):
    A_linear[:,i-2] = data_train[labels[i]]

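# Quadratic model: columns 0-3 are the subset features, 4-7 their squares, and
# 8-13 the six pairwise products (this layout assumes len(subset_labels) == 4).
A_quad = np.zeros((300, 14))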
for i in range(len(subset_labels)):
    A_quad[:,i] = data_train[subset_labels[i]]
    A_quad[:,i+len(subset_labels)] = (data_train[subset_labels[i]])**2
A_quad[:,8] = data_train[subset_labels[0]]*data_train[subset_labels[1]]
A_quad[:,9] = data_train[subset_labels[0]]*data_train[subset_labels[2]]
A_quad[:,10] = data_train[subset_labels[0]]*data_train[subset_labels[3]]
A_quad[:,11] = data_train[subset_labels[1]]*data_train[subset_labels[2]]
A_quad[:,12] = data_train[subset_labels[1]]*data_train[subset_labels[3]]
A_quad[:,13] = data_train[subset_labels[2]]*data_train[subset_labels[3]]

# Construct your b's: encode malignant ("M") as 1 and benign ("B") as -1
m_b = data_train[labels[1]]
b = np.zeros(300)
for i in range(300):
    if m_b[i]=="M":
        b[i] = 1
    elif m_b[i]=="B":
        b[i] = -1

# Solve the least squares problem
# Reduced SVD: A = U diag(sigma) V^T, so the least squares solution is
# x = V diag(1/sigma) U^T b (this assumes no singular value is zero).
U, sigma, VT = la.svd(A_linear, full_matrices=False)
weights_linear = VT.T @ ((U.T @ b) / sigma)

U, sigma, VT = la.svd(A_quad, full_matrices=False)
weights_quad = VT.T @ ((U.T @ b) / sigma)

# See how well your model (i.e. weights) does on the validate data set

# Rebuild the same matrices from the validation data (shapes assume 260 validation rows)
A_linear2 = np.zeros((260, 30))
for i in range(2,len(labels)):
    A_linear2[:,i-2] = data_val[labels[i]]

A_quad2 = np.zeros((260, 14))
for i in range(len(subset_labels)):
    A_quad2[:,i] = data_val[subset_labels[i]]
    A_quad2[:,i+len(subset_labels)] = (data_val[subset_labels[i]])**2
A_quad2[:,8] = data_val[subset_labels[0]]*data_val[subset_labels[1]]
A_quad2[:,9] = data_val[subset_labels[0]]*data_val[subset_labels[2]]
A_quad2[:,10] = data_val[subset_labels[0]]*data_val[subset_labels[3]]
A_quad2[:,11] = data_val[subset_labels[1]]*data_val[subset_labels[2]]
A_quad2[:,12] = data_val[subset_labels[1]]*data_val[subset_labels[3]]
A_quad2[:,13] = data_val[subset_labels[2]]*data_val[subset_labels[3]]

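# A positive score predicts malignant (b = 1); a negative score predicts benign (b = -1)
pl = A_linear2 @ weights_linear
pq = A_quad2 @ weights_quad
m_b = data_val[labels[1]]

# False positive: predicted M, actually B. False negative: predicted B, actually M.
fp_linear = 0
fn_linear = 0
fp_quad = 0
fn_quad = 0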
for i in range(len(pl)):
    if pl[i]>0 and m_b[i]=="B":
        fp_linear += 1
    elif pl[i]<0 and m_b[i]=="M":
        fn_linear += 1
    if pq[i]>0 and m_b[i]=="B":
        fp_quad += 1
    elif pq[i]<0 and m_b[i]=="M":
        fn_quad += 1

# Plot a bar graph of the false-positives and false-negatives
bar_graph(fp_linear, fn_linear, fp_quad, fn_quad)
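
As a sanity check on the SVD-based solve and the error counting above, the following sketch runs both on synthetic data and compares the weights with NumPy's built-in la.lstsq. The helper names lstsq_svd and count_errors and the random data are assumptions made for illustration; they are not part of the assignment or its data files.

```python
import numpy as np
import numpy.linalg as la


def lstsq_svd(A, b):
    """Least squares via the reduced SVD; assumes A has full column rank."""
    U, sigma, VT = la.svd(A, full_matrices=False)
    return VT.T @ ((U.T @ b) / sigma)


def count_errors(scores, truth):
    """Return (false positives, false negatives) for +1/-1 truth labels."""
    fp = int(np.sum((scores > 0) & (truth < 0)))
    fn = int(np.sum((scores < 0) & (truth > 0)))
    return fp, fn


# Synthetic stand-in data (hypothetical; not the breast cancer files).
rng = np.random.default_rng(0)
A_demo = rng.normal(size=(50, 4))
b_demo = np.where(A_demo @ np.array([1.0, -2.0, 0.5, 0.0]) > 0, 1.0, -1.0)

w_svd = lstsq_svd(A_demo, b_demo)
w_ref, *_ = la.lstsq(A_demo, b_demo, rcond=None)
print("SVD solution matches lstsq:", np.allclose(w_svd, w_ref))
print("false positives, false negatives:", count_errors(A_demo @ w_svd, b_demo))
```

The vectorized counts follow the same rule as the loop in the answer: a positive score on a benign case is a false positive, and a negative score on a malignant case is a false negative.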
