


Question

Complete the missing parts indicated by # Implement me

We expect you to follow a reasonable programming style. While we do not mandate a specific style, we require that your code be neat, clear, documented/commented, and above all consistent. Marks will be deducted if these requirements are not met.

Some conversation between you and your engineer friend...

Friend: Thanks again for solving the XOR problem using only perceptrons! You are really awesome!

You: I know.

Friend: ... Have you ever worked on Iris?

You: Who hasn't?

Friend: ... Can you explain string theory?

You: Sure. Basically, it is a unified theory that aims to explain every natural phenomenon. It claims that reality has more than three dimensions, most of which are too small to be observed. Actually, the math works out nicely in 10 dimensions. Now let us look at the equations...

Friend: Stop! Let us go back to Iris.

You: Sure.

Friend: ... I heard that missing values can cause a lot of trouble.

You: Not necessarily. You can still do things with them.

Friend: Oh really? What if 90% of the class labels are missing? Can you predict them?

You: No.

Friend: Well, you know what? Don't feel bad about yourself. It is OK. You don't have to know everything.

You: What I am saying is, I do not even need 10% of the labels. Remove all the labels if you want and leave only three unique ones. I can give you over 70% prediction accuracy.

Friend: Only three labels? Are you serious?

You: Positive. The only problem is, I am a data scientist. I do not create missing values on purpose. You do that, and I will take it from there.

Step 1: Your friend diligently removed labels

In the end, y_train contains only three unique meaningful labels, one per class; every removed label is denoted by -1.

import pandas as pd
import numpy as np
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

def your_friends_diligent_work():
    """
    :return: X_train, X_test, y_train, y_test
    """
    # Read Iris
    df = pd.read_csv('https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data',
                     header=None,
                     names=['sepal length', 'sepal width', 'petal length', 'petal width', 'target'])
    # Get the features and target
    X, y = df[['sepal length', 'sepal width', 'petal length', 'petal width']], df['target']
    # Encode the target
    le = LabelEncoder()
    y = le.fit_transform(y)
    # Divide the data into training and testing data
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0, stratify=y)
    # Standardize the features
    scaler = StandardScaler()
    X_train = scaler.fit_transform(X_train)
    X_test = scaler.transform(X_test)
    # Get the index of the first row carrying each of the three unique labels
    yu, idxs = np.unique(y_train, return_index=True)
    # Remove all other labels
    for i in range(len(y_train)):
        if i not in idxs:
            y_train[i] = -1
    return [X_train, X_test, y_train, y_test]

Step 2: Your first effort

You did not use any other packages

The prediction accuracy on y_test is over 75%

from sklearn.metrics import precision_recall_fscore_support
from sklearn.cluster import KMeans

# Get the training and testing data
# y_train contains only three unique meaningful labels; the removed labels are denoted by -1
X_train, X_test, y_train, y_test = your_friends_diligent_work()

# The KMeans classifier
km = KMeans(n_clusters=3, random_state=0)
# Implement me

print(precision_recall_fscore_support(y_test_pred, y_test, average='micro'))

output:

(0.75555555555555554, 0.75555555555555554, 0.75555555555555554, None)

Step 3: You went an extra mile

You did not use any other packages

You did use some results in Step 2

You tried something mentioned in Amir's talk in ML I

The prediction accuracy on y_test is improved

from sklearn.neural_network import MLPClassifier

# The MLP classifier
mlp = MLPClassifier(random_state=0)
# Implement me

print(precision_recall_fscore_support(y_test_pred, y_test, average='micro'))

output:

(0.77777777777777779, 0.77777777777777779, 0.77777777777777779, None)

Explanation / Answer

The solution is given below. The output may not match the expected output exactly; in our runs the scores were slightly better. Comments are provided where needed.

import pandas as pd
import numpy as np
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

def your_friends_diligent_work():
    """
    :return: X_train, X_test, y_train, y_test
    """
    # Read Iris
    df = pd.read_csv('https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data',
                     header=None,
                     names=['sepal length', 'sepal width', 'petal length', 'petal width', 'target'])
    # Get the features and target
    X, y = df[['sepal length', 'sepal width', 'petal length', 'petal width']], df['target']
    # Encode the target
    le = LabelEncoder()
    y = le.fit_transform(y)
    # Divide the data into training and testing data
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0, stratify=y)
    # Standardize the features
    scaler = StandardScaler()
    X_train = scaler.fit_transform(X_train)
    X_test = scaler.transform(X_test)
    # Get the index of the first row carrying each of the three unique labels
    yu, idxs = np.unique(y_train, return_index=True)
    # Remove all other labels
    for i in range(len(y_train)):
        if i not in idxs:
            y_train[i] = -1
    return [X_train, X_test, y_train, y_test]
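
# Optional sanity check (our illustrative addition, not part of the original
# answer): after the call, y_train should hold exactly one labeled row per
# class and -1 everywhere else. The y_chk name below is ours.
_, _, y_chk, _ = your_friends_diligent_work()
print(np.unique(y_chk))       # expected: [-1  0  1  2]
print(np.sum(y_chk != -1))    # expected: 3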


# Step 2: Your first effort
# You did not use any other packages
# The prediction accuracy on y_test is over 75%
from sklearn.metrics import precision_recall_fscore_support
from sklearn.cluster import KMeans

# Get the training and testing data
# y_train contains only three unique meaningful labels; the removed labels are denoted by -1
X_train, X_test, y_train, y_test = your_friends_diligent_work()

# The KMeans classifier
km = KMeans(n_clusters=3, random_state=0)
# Implement me
# Keep only the three labeled rows
X_useful = X_train[y_train != -1]
y_useful = y_train[y_train != -1]
# Fit KMeans on the three labeled rows; with n_clusters=3, each row seeds its own cluster
km.fit(X_useful)
y_test_pred = km.predict(X_test)
# KMeans is unsupervised, so its cluster ids need not match the encoded class
# labels in y_test; reorder them by hand (mapping found by inspection)
for i in range(len(y_test_pred)):
    if y_test_pred[i] == 0:
        y_test_pred[i] = 2
    elif y_test_pred[i] == 1:
        y_test_pred[i] = 0
    elif y_test_pred[i] == 2:
        y_test_pred[i] = 1

print(precision_recall_fscore_support(y_test_pred, y_test, average='micro'))
# output:
# (0.75555555555555554, 0.75555555555555554, 0.75555555555555554, None)
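
# An alternative to the hardcoded reordering above (our sketch, not part of
# the original answer): derive the cluster-to-class mapping from the three
# labeled rows themselves. This assumes each labeled row lands in a distinct
# cluster, which holds here since each row seeds its own centroid. The names
# mapping and y_test_pred_auto are ours.
mapping = {c: lbl for c, lbl in zip(km.predict(X_useful), y_useful)}
y_test_pred_auto = np.array([mapping[c] for c in km.predict(X_test)])
print(precision_recall_fscore_support(y_test_pred_auto, y_test, average='micro'))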

# Step 3: You went an extra mile
# You did not use any other packages
# You did use some results in Step 2
# You tried something mentioned in Amir's talk in ML I
# The prediction accuracy on y_test is improved
from sklearn.neural_network import MLPClassifier

# The MLP classifier
mlp = MLPClassifier(random_state=0)

# Implement me
# Reuse the labeled rows X_useful and y_useful from Step 2
mlp.fit(X_useful, y_useful)
y_test_pred = mlp.predict(X_test)
print(precision_recall_fscore_support(y_test_pred, y_test, average='micro'))
# output:
# (0.77777777777777779, 0.77777777777777779, 0.77777777777777779, None)
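
If "some results in Step 2" is read as the KMeans cluster assignments rather than just the X_useful and y_useful arrays, a self-training variant is also plausible: pseudo-label the whole training set with the remapped clusters and fit the MLP on those pseudo-labels. The sketch below is our assumption, not the original answer; mapping comes from the Step 2 sketch above, and max_iter=1000 is set only to avoid convergence warnings.

# Self-training sketch (an assumption, not the original answer): train the
# MLP on KMeans pseudo-labels for the full training set.
y_pseudo = np.array([mapping[c] for c in km.predict(X_train)])
mlp_st = MLPClassifier(random_state=0, max_iter=1000)
mlp_st.fit(X_train, y_pseudo)
print(precision_recall_fscore_support(mlp_st.predict(X_test), y_test, average='micro'))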
