As part of the quarterly reviews, the manager of a retail store analyzes the qua
ID: 3183454 • Letter: A
Question
As part of the quarterly reviews, the manager of a retail store analyzes the quality of customer service based on the periodic customer satisfaction ratings (on a scale of 1 to 10 with 1 = Poor and 10 = Excellent). To understand the level of service quality, which includes the waiting times of the customers in the checkout section, he collected data on 100 customers who visited the store; see the table below:
Apply k-means clustering using Wait time (min) as the variable with k = 3. Be sure to Normalize input data, and specify 50 iterations (or the maximum available in your version of XLMiner) and 10 random starts in Step 2 of the XLMiner k-Means Clustering procedure. Which created clusters have the lowest and highest waiting times?
Create one distinct data set for each of the three resulting clusters found. To do so, you may first add the columns Purchase Amount ($), Customer Age, and Customer Satisfaction Rating to the KMC_Clusters output obtained, and then sort the column Cluster ID.
For the observations composing the cluster which has the medium waiting time, apply hierarchical clustering with Ward’s method to form three sub-clusters using Purchase Amount, Customer Age, and Customer Satisfaction Rating as variables. Be sure to normalize input data in Step 2 of the XLMiner Hierarchical Clustering procedure. What are the numbers of customers in each of the three created sub-clusters, say A1, A2, and A3? What are the averages of the four variables characterizing A1, A2, and A3?
For the observations composing the cluster which has the low waiting time, apply hierarchical clustering with Ward’s method to form two sub-clusters using Purchase Amount, Customer Age, and Customer Satisfaction Rating as variables. Be sure to normalize input data in Step 2 of the XLMiner Hierarchical Clustering procedure. What are the numbers of customers in each of the two created sub-clusters, say B1 and B2? What are the averages of the four variables characterizing B1 and B2?
For the observations composing the cluster which has the high waiting time, apply hierarchical clustering with Ward’s method to form two sub-clusters using Purchase Amount, Customer Age, and Customer Satisfaction Rating as variables. Be sure to normalize input data in Step 2 of the XLMiner Hierarchical Clustering procedure. What are the numbers of customers in each of the two created sub-clusters, say C1 and C2? What are the averages of the four variables characterizing C1 and C2?
Customer Number Wait Time (min) Purchase Amount ($) Customer Age Customer Satisfaction Rating 1 2.3 436 42 7 2 2.8 408 33 6 3 3.2 432 38 5 4 3.4 431 40 5 5 3.4 456 29 6 6 4.2 537 46 4 7 3.2 456 42 5 8 1.4 430 40 8 9 6.4 663 24 3 10 7.8 839 37 4 11 6.5 659 52 5 12 9.8 836 43 2 13 5 543 56 4 14 1.8 419 35 8 15 6.1 700 39 6 16 3.4 432 44 7 17 7.8 845 33 5 18 2.8 467 42 6 19 1.2 425 46 8 20 9.5 848 50 4 21 8.2 808 55 3 22 7.6 674 35 3 23 5.4 547 52 4 24 6.7 691 38 5 25 9.6 847 53 4 26 11.4 826 48 2 27 2.1 426 52 7 28 5.6 535 32 7 29 3.7 521 43 8 30 4.9 513 44 6 31 6.4 645 53 5 32 9.3 846 52 4 33 10.6 730 51 3 34 6.5 786 53 3 35 5.4 523 46 5 36 7.6 654 36 6 37 3.2 443 48 7 38 2.4 409 54 8 39 1 400 39 6 40 0.2 418 51 7 41 2.4 498 30 6 42 5.7 532 32 5 43 6.4 663 44 7 44 6 681 39 8 45 3.7 543 54 5 46 8.7 800 51 5 47 6.9 673 45 5 48 9.8 856 43 4 49 10 756 44 4 50 9.5 854 43 6 51 6.3 672 50 6 52 7.4 698 47 7 53 2.3 434 43 7 54 4.6 544 40 4 55 4.9 523 53 6 56 5.7 546 55 6 57 7.4 676 42 8 58 6.8 662 36 6 59 9.6 1000 40 5 60 6.4 678 46 5 61 7.2 655 32 4 62 5.6 535 36 5 63 9.7 833 35 3 64 2.3 498 30 7 65 4.3 508 41 6 66 5.7 542 49 6 67 2.4 435 39 8 68 6.7 665 41 5 69 2.4 387 54 9 70 9.8 845 34 7 71 4.5 532 40 6 72 6.7 687 30 5 73 7.2 643 33 4 74 3.5 424 49 7 75 8.9 836 47 5 76 9.7 876 31 4 77 3.5 456 47 7 78 4.7 523 49 6 79 8.5 818 35 5 80 9.7 845 54 4 81 2.7 401 55 7 82 5.7 554 43 6 83 7.6 648 51 7 84 4.4 540 31 6 85 7.8 839 45 5 86 9.4 845 48 4 87 4.9 534 36 5 88 7.1 693 44 4 89 5.4 512 39 3 90 6.7 665 49 5 91 8.6 825 36 5 92 4.5 548 30 7 93 6.1 704 31 5 94 5.3 509 31 6 95 6.7 672 35 5 96 8.1 824 36 4 97 6.3 632 30 4 98 7.4 689 35 2 99 8.8 839 50 4 100 9.6 847 35 2Explanation / Answer
I am using R software to solve this problem.
At first, we need to import the data into R environment as below:
Data <- read.table("Data.txt",header = T,sep=" ")
colnames(Data) <- c('CustomerName','WaitTimeInMins','PurchaseAmount','CustomerAge','CustSatisfactionRating')
str(Data)
'data.frame': 100 obs. of 5 variables:
$ CustomerName : int 1 2 3 4 5 6 7 8 9 10 ...
$ WaitTimeInMins : num 2.3 2.8 3.2 3.4 3.4 4.2 3.2 1.4 6.4 7.8 ...
$ PurchaseAmount : int 436 408 432 431 456 537 456 430 663 839 ...
$ CustomerAge : int 42 33 38 40 29 46 42 40 24 37 ...
$ CustSatisfactionRating: int 7 6 5 5 6 4 5 8 3 4 ...
#Extract Waiting time into a vector
WaitingTime <- Data$WaitTimeInMins
#Normalize the Waiting Time using scale function
WaitingTime <- as.vector(scale(WaitingTime))
k-means clustering can be applied using kmeans function as below. We set the seed for the sake of reproducibility.
set.seed(1234)
kmeansClustering <- kmeans(WaitingTime, centers = 3, iter.max = 50, nstart = 10)
K-means clustering with 3 clusters of sizes 32, 45, 23
Cluster means:
[,1]
1 -1.177044
2 0.154462
3 1.335418
So first cluster has lowest waiting time and third cluster has the highest waiting time. And second cluster has the medium waiting time.
#Add cluster column to the original dataset
Data$Cluster <- kmeansClustering$cluster
KMC_Clusters <- Data
#Sorting the data based on Cluster number
KMC_Clusters <- KMC_Clusters[order(KMC_Clusters$Cluster),]
#Creating 3 new datasets based on low, medium and high waiting times
KMC_Clusters_LowWaiting <- KMC_Clusters[KMC_Clusters$Cluster==1,]
KMC_Clusters_MediumWaiting <- KMC_Clusters[KMC_Clusters$Cluster==2,]
KMC_Clusters_HighWaiting <- KMC_Clusters[KMC_Clusters$Cluster==3,]
#Apply hierarchical clustering using Ward method for medium waiting time observations
#Normalize the data
KMC_Clusters_MediumWaiting[c('PurchaseAmount','CustomerAge','CustSatisfactionRating')] <- data.frame(scale(KMC_Clusters_MediumWaiting[c('PurchaseAmount','CustomerAge','CustSatisfactionRating')]))
#Create dissimilarity matrix
d <- dist(KMC_Clusters_MediumWaiting[,c('PurchaseAmount','CustomerAge','CustSatisfactionRating')])
hc_medium <- hclust(d, method = "ward.D")
#Create 3 clusters
clusters <- cutree(hc_medium, k = 3)
clusters
1 2 3
8 22 15
So First cluster has 8 observations, Second has 22 and third has 15.
Check average within each cluster
aggregate(.~Cluster,KMC_Clusters_MediumWaiting[-1],mean)
Cluster WaitTimeInMins PurchaseAmount CustomerAge CustSatisfactionRating
1 1 6.862500 668.3750 31.25000 3.750000
2 2 6.890909 700.1818 43.40909 5.545455
3 3 5.326667 531.4000 43.53333 5.333333
#Apply hierarchical clustering using Ward method for low waiting time observations
#Normalize the data
KMC_Clusters_LowWaiting[c('PurchaseAmount','CustomerAge','CustSatisfactionRating')] <- data.frame(scale(KMC_Clusters_LowWaiting[c('PurchaseAmount','CustomerAge','CustSatisfactionRating')]))
#Create dissimilarity matrix
d <- dist(KMC_Clusters_LowWaiting[,c('PurchaseAmount','CustomerAge','CustSatisfactionRating')])
hc_low <- hclust(d, method = "ward.D")
plot(hc_low)
#Create 2 clusters
clusters <- cutree(hc_medium, k = 2)
table(clusters)
clusters
1 2
15 17
So cluster 1 has 15 observations and cluster 2 has 17 observations.
KMC_Clusters_LowWaiting <- KMC_Clusters[KMC_Clusters$Cluster==1,]
KMC_Clusters_LowWaiting$Cluster <- clusters
aggregate(.~Cluster,KMC_Clusters_LowWaiting[-1],mean)
Cluster WaitTimeInMins PurchaseAmount CustomerAge CustSatisfactionRating
1 1 2.320000 425.0000 46.60000 7.466667
2 2 3.435294 489.3529 38.11765 5.764706
#Apply hierarchical clustering using Ward method for low waiting time observations
#Normalize the data
KMC_Clusters_HighWaiting[c('PurchaseAmount','CustomerAge','CustSatisfactionRating')] <- data.frame(scale(KMC_Clusters_HighWaiting[c('PurchaseAmount','CustomerAge','CustSatisfactionRating')]))
#Create dissimilarity matrix
d <- dist(KMC_Clusters_HighWaiting[,c('PurchaseAmount','CustomerAge','CustSatisfactionRating')])
hc_high <- hclust(d, method = "ward.D")
plot(hc_high)
#Create 2 clusters
clusters <- cutree(hc_high, k = 2)
table(clusters)
clusters
1 2
14 9
So cluster 1 has 14 observations and cluster 2 has 9 observations.
KMC_Clusters_HighWaiting <- KMC_Clusters[KMC_Clusters$Cluster==3,]
KMC_Clusters_HighWaiting$Cluster <- clusters
aggregate(.~Cluster,KMC_Clusters_HighWaiting[-1],mean)
Cluster WaitTimeInMins PurchaseAmount CustomerAge CustSatisfactionRating
1 1 9.535714 822.0714 48.64286 3.571429
2 2 9.255556 859.0000 37.00000 4.777778
Related Questions
Navigate
Integrity-first tutoring: explanations and feedback only — we do not complete graded work. Learn more.