
Question

Please perform k-means clustering for the GPA using an array of size 20.

Recommender systems offer predictions, suggestions, or opinions to help users assess and choose items. They have become very common in recent years and are applied in many settings, such as suggesting and predicting the most relevant books, movies, social tags, products, news, and articles for various types of users. Registering for the right course is a critical issue for a student: the student should consider the prerequisites of the course as well as his GPA and academic year. The goal is therefore to build a recommender system that helps with decision making in the course-registration process and predicts the expected grade. This can be done with various techniques and methodologies, the most important being data mining. Data mining is the process of searching large databases for specific patterns and knowledge and making predictions about outputs. K-means clustering is a simple data-mining technique used for cluster analysis: it divides observations into k clusters, where each observation belongs to the cluster with the closest mean. This project therefore aims to build a recommender system, based on a clustering technique, that predicts appropriate courses and expected grades for a Computer Department student, depending on the student's GPA, academic year, and passed courses.

Inputs: student GPA, student academic year, current-semester courses, and the number of credits (courses) to be registered (recommended).
Output: a list of courses and the expected grade for each course.

Explanation / Answer

The k-means algorithm is quite simple. But as you’ll see, some of the implementation details are a bit tricky.
The central concept in the k-means algorithm is the centroid. In data clustering, the centroid of a set of data tuples is the one tuple that's most representative of the group. The idea is best explained by example. Suppose you have three height-weight tuples:

[a] (61.0, 100.0)
[b] (64.0, 150.0)
[c] (70.0, 140.0)
Which tuple is most representative? One approach is to compute a mathematical average (mean) tuple, and then select as the centroid the tuple that is closest to that average tuple. So, in this case, the average tuple is:

[m] = ((61.0 + 64.0 + 70.0) / 3, (100.0 + 150.0 + 140.0) / 3)
= (195.0 / 3, 390.0 / 3)
= (65.0, 130.0)
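The mean tuple above can be computed with a short, self-contained C# sketch. The class and method names here are illustrative only, not part of the demo program:

```csharp
using System;

class MeanDemo
{
    // Component-wise mean of a set of equal-length tuples.
    public static double[] Mean(double[][] tuples)
    {
        double[] result = new double[tuples[0].Length];
        foreach (double[] t in tuples)
            for (int j = 0; j < t.Length; ++j)
                result[j] += t[j];
        for (int j = 0; j < result.Length; ++j)
            result[j] /= tuples.Length;
        return result;
    }

    static void Main()
    {
        double[][] tuples = {
            new double[] { 61.0, 100.0 },   // [a]
            new double[] { 64.0, 150.0 },   // [b]
            new double[] { 70.0, 140.0 }    // [c]
        };
        double[] m = Mean(tuples);
        Console.WriteLine("[m] = ({0}, {1})", m[0], m[1]);  // prints [m] = (65, 130)
    }
}
```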

There are several ways to define closest. The most common approach, and the one used in the demo program, is to use the Euclidean distance. In words, the Euclidean distance between two tuples is the square root of the sum of the squared differences between each component of the tuples. Again, an example is the best way to explain. The Euclidean distance between tuple (61.0, 100.0) and the average tuple (65.0, 130.0) is:

dist(m,a) = sqrt((65.0 - 61.0)^2 + (130.0 - 100.0)^2)
= sqrt(4.0^2 + 30.0^2)
= sqrt(16.0 + 900.0)
= sqrt(916.0)
= 30.27
Similarly:

dist(m,b) = sqrt((65.0 - 64.0)^2 + (130.0 - 150.0)^2)
= 20.02
dist(m,c) = sqrt((65.0 - 70.0)^2 + (130.0 - 140.0)^2)
= 11.18
Because the smallest of the three distances is the distance between the mean tuple [m] and tuple [c], the centroid of the three tuples is tuple [c]. You might wish to experiment with the demo program by using different definitions of the distance between two tuples to see how they affect the final clustering produced.
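The distance calculations and the centroid selection above can be sketched as a small self-contained C# program (the names are illustrative, not part of the demo program):

```csharp
using System;

class CentroidDemo
{
    // Euclidean distance: square root of the sum of squared component differences.
    public static double Distance(double[] a, double[] b)
    {
        double sum = 0.0;
        for (int j = 0; j < a.Length; ++j)
            sum += (a[j] - b[j]) * (a[j] - b[j]);
        return Math.Sqrt(sum);
    }

    // Index of the tuple closest to the mean tuple m; that tuple is the centroid.
    public static int Closest(double[][] tuples, double[] m)
    {
        int best = 0;
        double bestDist = Distance(m, tuples[0]);
        for (int i = 1; i < tuples.Length; ++i)
        {
            double d = Distance(m, tuples[i]);
            if (d < bestDist) { bestDist = d; best = i; }
        }
        return best;
    }

    static void Main()
    {
        double[][] tuples = {
            new double[] { 61.0, 100.0 },   // [a]
            new double[] { 64.0, 150.0 },   // [b]
            new double[] { 70.0, 140.0 }    // [c]
        };
        double[] m = { 65.0, 130.0 };       // the mean tuple computed above
        for (int i = 0; i < tuples.Length; ++i)
            Console.WriteLine("dist(m,{0}) = {1:F2}", (char)('a' + i), Distance(m, tuples[i]));
        Console.WriteLine("centroid index: " + Closest(tuples, m));  // 2, i.e. tuple [c]
    }
}
```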

With the notion of a cluster centroid established, the k-means algorithm is relatively simple. In pseudo-code:

assign each tuple to a randomly selected cluster
compute the centroid for each cluster
loop until no improvement or until maxCount
    assign each tuple to best cluster
    (the cluster with closest centroid to the tuple)
    update each cluster centroid
    (based on the new cluster assignments)
end loop
return clustering
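Applied directly to the GPAs the question asks about, the pseudo-code above can be sketched as a one-dimensional k-means over an array of size 20. The 20 GPA values here are made-up sample data, since the question supplies none:

```csharp
using System;

class GpaKMeans
{
    // 1-D k-means, following the pseudo-code: random initial assignment,
    // then alternate centroid updates and (re)assignments until stable.
    public static int[] Cluster(double[] gpas, int k, int maxCount, int seed)
    {
        Random rnd = new Random(seed);
        int[] clustering = new int[gpas.Length];
        for (int i = 0; i < gpas.Length; ++i)
            clustering[i] = rnd.Next(0, k);              // random initial cluster

        double[] centroids = new double[k];
        for (int count = 0; count < maxCount; ++count)
        {
            // update each cluster centroid (mean GPA of its members)
            double[] sums = new double[k];
            int[] sizes = new int[k];
            for (int i = 0; i < gpas.Length; ++i)
            {
                sums[clustering[i]] += gpas[i];
                sizes[clustering[i]]++;
            }
            for (int c = 0; c < k; ++c)
                if (sizes[c] > 0) centroids[c] = sums[c] / sizes[c];

            // assign each GPA to the cluster with the closest centroid
            bool changed = false;
            for (int i = 0; i < gpas.Length; ++i)
            {
                int best = 0;
                for (int c = 1; c < k; ++c)
                    if (Math.Abs(gpas[i] - centroids[c]) < Math.Abs(gpas[i] - centroids[best]))
                        best = c;
                if (best != clustering[i]) { clustering[i] = best; changed = true; }
            }
            if (!changed) break;                         // no improvement -> stop
        }
        return clustering;
    }

    static void Main()
    {
        // 20 made-up GPA values
        double[] gpas = { 3.9, 2.1, 3.5, 1.8, 2.9, 3.7, 2.0, 1.5, 3.2, 2.7,
                          3.8, 1.9, 2.5, 3.4, 1.7, 2.8, 3.6, 2.2, 1.6, 3.0 };
        int[] clustering = Cluster(gpas, 3, 30, 0);
        for (int i = 0; i < gpas.Length; ++i)
            Console.WriteLine("GPA {0:F1} -> cluster {1}", gpas[i], clustering[i]);
    }
}
```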

Program for k-means clustering using an array of size 20 (C#):
using System;

namespace ClusteringKMeans
{
    class ClusteringKMeansProgram
    {
        static void Main(string[] args)
        {
            try
            {
                Console.WriteLine(" Begin outlier data detection demo ");
                Console.WriteLine("Loading all (height-weight) data into memory");
                string[] attributes = new string[] { "Height", "Weight" };

                double[][] rawData = new double[20][];
                rawData[0] = new double[] { 65.0, 220.0 };
                rawData[1] = new double[] { 73.0, 160.0 };
                rawData[2] = new double[] { 59.0, 110.0 };
                rawData[3] = new double[] { 61.0, 120.0 };
                rawData[4] = new double[] { 75.0, 150.0 };
                rawData[5] = new double[] { 67.0, 240.0 };
                rawData[6] = new double[] { 68.0, 230.0 };
                rawData[7] = new double[] { 70.0, 220.0 };
                rawData[8] = new double[] { 62.0, 130.0 };
                rawData[9] = new double[] { 66.0, 210.0 };
                rawData[10] = new double[] { 77.0, 190.0 };
                rawData[11] = new double[] { 75.0, 180.0 };
                rawData[12] = new double[] { 74.0, 170.0 };
                rawData[13] = new double[] { 70.0, 210.0 };
                rawData[14] = new double[] { 61.0, 110.0 };
                rawData[15] = new double[] { 58.0, 100.0 };
                rawData[16] = new double[] { 66.0, 230.0 };
                rawData[17] = new double[] { 59.0, 120.0 };
                rawData[18] = new double[] { 68.0, 210.0 };
                rawData[19] = new double[] { 61.0, 130.0 };

                Console.WriteLine(" Raw data: ");
                ShowMatrix(rawData, rawData.Length, true);

                int numAttributes = attributes.Length;
                int numClusters = 3;
                int maxCount = 30;
                Console.WriteLine(" k = " + numClusters + " and maxCount = " + maxCount);

                int[] clustering = Cluster(rawData, numClusters, numAttributes, maxCount);
                Console.WriteLine(" Clustering complete");
                Console.WriteLine(" Clustering in internal format: ");
                ShowVector(clustering, true);
                Console.WriteLine(" Clustered data:");
                ShowClustering(rawData, numClusters, clustering, true);

                double[] outlier = Outlier(rawData, clustering, numClusters, 0);
                Console.WriteLine("Outlier for cluster 0 is:");
                ShowVector(outlier, true);
                Console.WriteLine(" End demo ");
            }
            catch (Exception ex)
            {
                Console.WriteLine(ex.Message);
            }
        }

        // Note: Main calls Cluster, ShowMatrix, ShowVector, ShowClustering,
        // and Outlier, which must also be defined in this class.
    }
}
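Main above calls five helper methods (Cluster, ShowMatrix, ShowVector, ShowClustering, and Outlier) that are not shown in the listing. A minimal, hypothetical sketch of those helpers is below; it is one way they might be implemented, not the original author's code. Pasting the methods into the ClusteringKMeansProgram class (or keeping them in this helper class) makes the program compile and run; Outlier is taken here to mean the tuple in a given cluster farthest from that cluster's centroid:

```csharp
using System;

// Hypothetical implementations of the helpers called by Main.
static class ClusteringHelpers
{
    public static double Distance(double[] a, double[] b)
    {
        double sum = 0.0;
        for (int j = 0; j < a.Length; ++j)
            sum += (a[j] - b[j]) * (a[j] - b[j]);
        return Math.Sqrt(sum);
    }

    // Mean tuple of each cluster; empty clusters keep a zero centroid.
    static double[][] Centroids(double[][] data, int[] clustering, int k)
    {
        double[][] means = new double[k][];
        int[] sizes = new int[k];
        for (int c = 0; c < k; ++c) means[c] = new double[data[0].Length];
        for (int i = 0; i < data.Length; ++i)
        {
            sizes[clustering[i]]++;
            for (int j = 0; j < data[i].Length; ++j)
                means[clustering[i]][j] += data[i][j];
        }
        for (int c = 0; c < k; ++c)
            for (int j = 0; j < means[c].Length; ++j)
                if (sizes[c] > 0) means[c][j] /= sizes[c];
        return means;
    }

    // The k-means loop from the pseudo-code; numAttributes is kept only
    // to match the call in Main. Fixed seed for repeatable runs.
    public static int[] Cluster(double[][] rawData, int numClusters, int numAttributes, int maxCount)
    {
        Random rnd = new Random(0);
        int[] clustering = new int[rawData.Length];
        for (int i = 0; i < clustering.Length; ++i)
            clustering[i] = rnd.Next(0, numClusters);   // random initial assignment
        for (int count = 0; count < maxCount; ++count)
        {
            double[][] means = Centroids(rawData, clustering, numClusters);
            bool changed = false;
            for (int i = 0; i < rawData.Length; ++i)
            {
                int best = 0;
                for (int c = 1; c < numClusters; ++c)
                    if (Distance(rawData[i], means[c]) < Distance(rawData[i], means[best]))
                        best = c;
                if (best != clustering[i]) { clustering[i] = best; changed = true; }
            }
            if (!changed) break;                        // no improvement
        }
        return clustering;
    }

    public static void ShowMatrix(double[][] matrix, int numRows, bool indices)
    {
        for (int i = 0; i < numRows; ++i)
        {
            if (indices) Console.Write("[" + i.ToString().PadLeft(2) + "]  ");
            for (int j = 0; j < matrix[i].Length; ++j)
                Console.Write(matrix[i][j].ToString("F1") + "  ");
            Console.WriteLine("");
        }
    }

    public static void ShowVector(int[] vector, bool newLine)
    {
        foreach (int v in vector) Console.Write(v + " ");
        if (newLine) Console.WriteLine("");
    }

    public static void ShowVector(double[] vector, bool newLine)
    {
        foreach (double v in vector) Console.Write(v.ToString("F1") + " ");
        if (newLine) Console.WriteLine("");
    }

    public static void ShowClustering(double[][] rawData, int numClusters, int[] clustering, bool newLine)
    {
        for (int c = 0; c < numClusters; ++c)
        {
            Console.WriteLine("Cluster " + c + ":");
            for (int i = 0; i < rawData.Length; ++i)
                if (clustering[i] == c) ShowVector(rawData[i], newLine);
        }
    }

    // Tuple in the given cluster farthest from that cluster's centroid.
    public static double[] Outlier(double[][] rawData, int[] clustering, int numClusters, int cluster)
    {
        double[][] means = Centroids(rawData, clustering, numClusters);
        double[] outlier = new double[rawData[0].Length];
        double maxDist = -1.0;
        for (int i = 0; i < rawData.Length; ++i)
        {
            if (clustering[i] != cluster) continue;
            double d = Distance(rawData[i], means[cluster]);
            if (d > maxDist) { maxDist = d; outlier = rawData[i]; }
        }
        return outlier;
    }
}
```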