Question
Cluster the following eight points into three clusters using the k-means clustering algorithm with Euclidean distance.
A1=(2,10), A2=(2,5), A3=(8,4), A4=(5,8),
A5=(7,5), A6=(6,4), A7=(1,2), A8=(4,9).
a) Create the distance matrix by calculating the Euclidean distance between each pair of points. (0.5 mark)
b) Suppose that the initial centers of the clusters are A1, A4 and A7. Run the k-means algorithm only once and show:
i. The new clusters (i.e. the examples belonging to each cluster) (1 mark)
ii. The centers of the new clusters (0.5 mark)
Show all your work.
Explanation / Answer
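Both parts can be worked by hand, as the question asks ("show all your work"). Every entry of the distance matrix comes from the Euclidean distance formula, and values are rounded to two decimal places. For part (b), each point joins the initial center (A1, A4 or A7) at the smallest distance, read straight from those three columns of the matrix, and each new center is the mean of the points in its cluster.

```latex
% Part (a): Euclidean distance and the full distance matrix
d(A_i, A_j) = \sqrt{(x_i - x_j)^2 + (y_i - y_j)^2},
\qquad \text{e.g. } d(A_1, A_2) = \sqrt{(2-2)^2 + (10-5)^2} = 5

\begin{array}{c|cccccccc}
    & A_1  & A_2  & A_3  & A_4  & A_5  & A_6  & A_7  & A_8  \\ \hline
A_1 & 0    & 5.00 & 8.49 & 3.61 & 7.07 & 7.21 & 8.06 & 2.24 \\
A_2 & 5.00 & 0    & 6.08 & 4.24 & 5.00 & 4.12 & 3.16 & 4.47 \\
A_3 & 8.49 & 6.08 & 0    & 5.00 & 1.41 & 2.00 & 7.28 & 6.40 \\
A_4 & 3.61 & 4.24 & 5.00 & 0    & 3.61 & 4.12 & 7.21 & 1.41 \\
A_5 & 7.07 & 5.00 & 1.41 & 3.61 & 0    & 1.41 & 6.71 & 5.00 \\
A_6 & 7.21 & 4.12 & 2.00 & 4.12 & 1.41 & 0    & 5.39 & 5.39 \\
A_7 & 8.06 & 3.16 & 7.28 & 7.21 & 6.71 & 5.39 & 0    & 7.62 \\
A_8 & 2.24 & 4.47 & 6.40 & 1.41 & 5.00 & 5.39 & 7.62 & 0
\end{array}

% Part (b)(i): assign each point to the nearest initial center
% c_1 = A_1 = (2,10), c_2 = A_4 = (5,8), c_3 = A_7 = (1,2)
\begin{array}{c|ccc|c}
    & d(\cdot, c_1) & d(\cdot, c_2) & d(\cdot, c_3) & \text{cluster} \\ \hline
A_1 & 0    & 3.61 & 8.06 & 1 \\
A_2 & 5.00 & 4.24 & 3.16 & 3 \\
A_3 & 8.49 & 5.00 & 7.28 & 2 \\
A_4 & 3.61 & 0    & 7.21 & 2 \\
A_5 & 7.07 & 3.61 & 6.71 & 2 \\
A_6 & 7.21 & 4.12 & 5.39 & 2 \\
A_7 & 8.06 & 7.21 & 0    & 3 \\
A_8 & 2.24 & 1.41 & 7.62 & 2
\end{array}

C_1 = \{A_1\}, \quad C_2 = \{A_3, A_4, A_5, A_6, A_8\}, \quad C_3 = \{A_2, A_7\}

% Part (b)(ii): new centers are the cluster means
c_1' = (2, 10), \quad
c_2' = \left(\tfrac{8+5+7+6+4}{5}, \tfrac{4+8+5+4+9}{5}\right) = (6, 6), \quad
c_3' = \left(\tfrac{2+1}{2}, \tfrac{5+2}{2}\right) = (1.5, 3.5)
```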
Since the language to be used isn't specified, I am going to use Python with NumPy. The points are stored in an 8x2 matrix named X: each row is one point, the first column holds the x-coordinate and the second column the y-coordinate. Further explanations are given in the comments in the code:
#code starts here
import numpy as np

# function to calculate the distance matrix
def calculate_dis(X):
    n = len(X)
    distance = np.zeros(shape=[n, n])
    for i in range(n):
        for j in range(n):
            # squared Euclidean distance between points i and j
            sqdis = (X[i][0] - X[j][0])**2 + (X[i][1] - X[j][1])**2
            distance[i][j] = np.sqrt(sqdis)
    return distance
# function to assign every point to its closest centroid
def findclosest(X, A1, A4, A7):
    # one empty (0, 2) matrix per cluster; points are stacked onto them as rows
    a1 = np.empty((0, 2), int)
    a4 = np.empty((0, 2), int)
    a7 = np.empty((0, 2), int)
    for i in range(len(X)):
        # calculate the distance of point i from each of the centroids
        disa1 = np.sqrt((X[i][0] - A1[0])**2 + (X[i][1] - A1[1])**2)
        disa4 = np.sqrt((X[i][0] - A4[0])**2 + (X[i][1] - A4[1])**2)
        disa7 = np.sqrt((X[i][0] - A7[0])**2 + (X[i][1] - A7[1])**2)
        closest = min(disa1, disa4, disa7)
        # ties go to the first centroid in the order A1, A4, A7
        if closest == disa1:
            a1 = np.vstack((a1, X[i]))
        elif closest == disa4:
            a4 = np.vstack((a4, X[i]))
        else:
            a7 = np.vstack((a7, X[i]))
    return a1, a4, a7
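# function to calculate the new centroid (the mean of the cluster's points)
def new_centroid(a):
    num_points = a.shape[0]
    if num_points == 0:
        # avoid division by zero; an empty cluster would get the centroid (0, 0)
        num_points = 1
    A = (1 / num_points) * np.sum(a, axis=0)
    return A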
# initialize the points
X = np.array([[2, 10], [2, 5], [8, 4], [5, 8], [7, 5], [6, 4], [1, 2], [4, 9]])
distance = calculate_dis(X)  # the distance matrix

# initialize the centroids
A1 = X[0]
A4 = X[3]
A7 = X[6]

# k-means first step: assign each point to its closest centroid
# the three clusters are stored as three matrices named a1, a4 and a7
a1, a4, a7 = findclosest(X, A1, A4, A7)

# after the clusters have been formed, the centers of the new clusters are calculated
# let the new centroids be A1new, A4new, A7new
A1new = new_centroid(a1)
A4new = new_centroid(a4)
A7new = new_centroid(a7)

# print all the required results
print(distance)
print(a1)
print(a4)
print(a7)
print(A1new)
print(A4new)
print(A7new)
#code ends here
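For comparison, here is a minimal vectorized sketch of the same single iteration, assuming only NumPy. The helper names `distance_matrix` and `kmeans_step` are hypothetical and not part of the original answer. NumPy broadcasting replaces the explicit loops, and `argmin` breaks ties toward the lowest-index center, which matches the A1, A4, A7 order used by `findclosest`. Keeping the old center when a cluster comes out empty is an assumption, since the question does not cover that case.

```python
import numpy as np

# The eight points A1..A8 as rows of an 8x2 array (x in column 0, y in column 1).
X = np.array([[2, 10], [2, 5], [8, 4], [5, 8],
              [7, 5], [6, 4], [1, 2], [4, 9]], dtype=float)


def distance_matrix(P):
    """Hypothetical helper: pairwise Euclidean distances via broadcasting.

    P[:, None, :] - P[None, :, :] has shape (n, n, 2); summing the squared
    differences over the last axis and taking the square root gives (n, n).
    """
    diff = P[:, None, :] - P[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def kmeans_step(P, centers):
    """Hypothetical helper: one k-means iteration.

    Assign each point to its nearest center, then move each center to the
    mean of its assigned points. Assumption: an empty cluster keeps its old center.
    """
    # (n, k) matrix of distances from every point to every center
    d = np.sqrt(((P[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1))
    labels = d.argmin(axis=1)  # ties go to the lowest-index center
    new_centers = centers.copy()
    for k in range(len(centers)):
        members = P[labels == k]
        if len(members) > 0:
            new_centers[k] = members.mean(axis=0)
    return labels, new_centers


D = distance_matrix(X)
initial = X[[0, 3, 6]]  # A1, A4, A7 as the starting centers
labels, centers = kmeans_step(X, initial)

print(np.round(D, 2))
for k in range(len(initial)):
    members = [f"A{i + 1}" for i in np.flatnonzero(labels == k)]
    print(f"cluster {k + 1}: {members}, center = {centers[k]}")
```

Running it should report the clusters {A1}, {A3, A4, A5, A6, A8} and {A2, A7} with new centers (2, 10), (6, 6) and (1.5, 3.5). Calling `kmeans_step` repeatedly with the returned centers until the labels stop changing gives the full k-means loop; the question only asks for one pass.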