Question

Cluster the following eight points into three clusters using the k-means clustering algorithm with Euclidean distance.

A1=(2,10), A2=(2,5), A3=(8,4), A4=(5,8),

A5=(7,5), A6=(6,4), A7=(1,2), A8=(4,9).

a) Create distance matrix by calculating Euclidean distance between each pair of points. (0.5 mark)

b) Suppose that the initial centers of the clusters are A1, A4 and A7. Run the k-means algorithm for one iteration only and show:

i. The new clusters (i.e. the examples belonging to each cluster) (1 mark)

ii. The centers of the new clusters (0.5 mark)

Show all your work.

Explanation / Answer

The question does not specify a language, so this answer uses Python with NumPy. The points are stored in an 8x2 matrix named X, where each row is one point, the first column holds the x-coordinate and the second column holds the y-coordinate. Further explanation is given in the comments in the code:

#code starts here

import numpy as np

#function to calculate the distance matrix
#entry [i][j] is the Euclidean distance between point i and point j
def calculate_dis(X):
    n = X.shape[0]  #number of points (8 here)
    distance = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            sqdis = (X[i][0] - X[j][0])**2 + (X[i][1] - X[j][1])**2
            distance[i][j] = np.sqrt(sqdis)
    return distance

#function to assign each point to its closest centroid
def findclosest(X, A1, A4, A7):
    #start with empty clusters; points are appended as rows
    a1 = np.empty((0, 2), int)
    a4 = np.empty((0, 2), int)
    a7 = np.empty((0, 2), int)
    for i in range(X.shape[0]):
        #calculate distance of point i from each of the centroids
        disa1 = np.sqrt((X[i][0] - A1[0])**2 + (X[i][1] - A1[1])**2)
        disa4 = np.sqrt((X[i][0] - A4[0])**2 + (X[i][1] - A4[1])**2)
        disa7 = np.sqrt((X[i][0] - A7[0])**2 + (X[i][1] - A7[1])**2)
        closest = min(disa1, disa4, disa7)
        #ties go to the first matching centroid (A1, then A4, then A7)
        if closest == disa1:
            a1 = np.vstack((a1, X[i]))
        elif closest == disa4:
            a4 = np.vstack((a4, X[i]))
        else:
            a7 = np.vstack((a7, X[i]))
    return a1, a4, a7

#function to calculate the new centroid (mean) of a cluster
def new_centroid(a):
    num_points = a.shape[0]
    #avoid division by zero; an empty cluster gives [0, 0]
    if num_points == 0:
        num_points = 1
    A = np.sum(a, axis=0) / num_points
    return A
  
#Initialize the points
X = np.array([[2,10], [2,5], [8,4], [5,8], [7,5], [6,4], [1,2], [4,9]])
distance = calculate_dis(X) #the distance matrix

#Initialize the centroids
A1 = X[0]
A4 = X[3]
A7 = X[6]
#K-means first step finding the closest centroids
#the three clusters will be stored as 3 matrices
#the matrices are named a1, a4 and a7

a1, a4, a7 = findclosest(X,A1,A4,A7)

#After the clusters have been formed, the centers of the new clusters are calculated
#Let the new centroids be A1new, A4new, A7new

A1new = new_centroid(a1)
A4new = new_centroid(a4)
A7new = new_centroid(a7)
#All the required variables can be printed as desired
print(distance)
print(a1)
print(a4)
print(a7)
print(A1new)
print(A4new)
print(A7new)

#code ends here
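
As a cross-check on the code above, one k-means iteration can also be sketched with NumPy broadcasting. This is only an illustrative alternative, not part of the original answer: the helper name kmeans_step is hypothetical, and it assumes that no cluster ends up empty and that ties, if any, go to the lowest-numbered center.

```python
import numpy as np

# Points A1..A8 from the question, one row per point.
X = np.array([[2, 10], [2, 5], [8, 4], [5, 8],
              [7, 5], [6, 4], [1, 2], [4, 9]], dtype=float)


def kmeans_step(X, centers):
    """One assignment + update step of k-means (hypothetical helper)."""
    # Distance from every point to every center, shape (n_points, k).
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    # Index of the nearest center for each point (first index wins on ties).
    labels = d.argmin(axis=1)
    # Mean of the points in each cluster; assumes no cluster is empty.
    new_centers = np.array([X[labels == k].mean(axis=0)
                            for k in range(len(centers))])
    return labels, new_centers


# Part (a): pairwise Euclidean distance matrix, shape (8, 8).
dist_matrix = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)

# Part (b): initial centers are A1, A4 and A7 (rows 0, 3 and 6).
labels, new_centers = kmeans_step(X, X[[0, 3, 6]])

print(np.round(dist_matrix, 2))
print(labels)
print(new_centers)
```

With the initial centers A1, A4 and A7, this gives the clusters {A1}, {A3, A4, A5, A6, A8} and {A2, A7}, with new centers (2, 10), (6, 6) and (1.5, 3.5). For example, A2=(2,5) lies at distance 5 from A1, about 4.24 from A4 and about 3.16 from A7, so it joins the A7 cluster.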
