Question
R Programming Code: K-Means Algorithm
Due Date: April 20, 2018 at 10:00pm central standard time zone
Write complete, working R code that uses the K-means algorithm to do the following tasks.
Use the data set file that is given with this project and name it.
I have also attached my R code to this project for help. I keep getting a few error messages. Please review the data file and try to fix my code so that it runs without errors. Please explain what you did to fix the program. Thanks in advance for your help.
1) Read data to Mdata to get pure data set only
2) Show the head of the Mdata
3) Exclude top value of B1
4) Assign all variables B1, B2,...B6
5) Apply the same range to the variables.
6) Show the results of the standardization.
7) With the K-means method, analyze the data (K = 3 case).
8) Show the results after dividing the data set.
9) Find the size of each group.
Data file to use:
MyData2a
X B1 B2 B3 B4 B5 B6 B7
1 A1 8.6 72.7 88 401 1162 3910 604
2 A2 10.1 28.4 112 408 1159 2304 267
3 A3 8.1 28.9 80 278 1030 2305 195
4 A4 9.3 43.0 169 437 1908 4337 419
5 A5 11.3 44.9 343 521 1696 3384 762
6 A6 7.0 42.3 145 329 1792 4231 486
7 A7 4.6 23.8 192 205 1198 2758 447
8 A8 31.0 52.4 754 668 1728 4131 975
9 A9 4.9 56.9 124 241 1042 3090 272
10 A10 11.7 52.7 367 605 2221 4373 598
11 A11 11.2 43.9 214 319 1453 2984 430
12 A12 4.8 31.0 106 103 1339 3759 328
13 A13 1.8 12.5 42 179 956 2801 158
14 A14 3.2 20.0 21 178 1003 2800 181
15 A15 8.9 32.4 325 434 1180 2938 628
16 A16 6.0 25.9 90 186 887 2333 328
17 A17 4.4 32.9 80 252 1188 3008 258
18 A18 6.7 23.1 83 222 824 1740 193
19 A19 12.8 40.1 224 482 1461 3417 442
20 A20 3.6 29.7 193 331 1071 2189 906
21 A21 9.0 43.6 304 476 1296 2978 545
22 A22 2.0 14.8 28 102 803 2347 164
23 A23 11.3 67.4 301 424 1509 3378 800
24 A24 2.5 31.8 102 148 1004 2785 288
25 A25 9.2 29.2 170 370 1136 2500 439
26 A26 11.2 25.8 65 172 1076 1845 150
27 A27 2.9 17.3 20 118 783 3314 215
28 A28 8.1 26.4 88 354 1225 2423 208
29 A29 1.0 11.6 7 32 385 2049 120
30 A30 3.1 24.6 51 184 748 2677 168
31 A31 2.2 21.5 24 92 755 2208 228
32 A32 5.2 33.2 269 265 1071 2822 776
33 A33 11.5 46.9 130 538 1845 3712 343
34 A34 12.6 64.9 287 354 1604 3489 478
35 A35 10.7 30.5 514 431 1221 2924 637
36 A36 5.5 38.6 142 235 988 2574 376
37 A37 8.1 36.4 107 285 1787 3142 649
38 A38 6.6 51.1 206 286 1967 4163 402
39 A39 5.5 25.1 152 176 735 1654 354
40 A40 3.5 21.4 119 192 1294 2568 705
41 A41 8.6 41.3 99 525 1340 2846 277
42 A42 4.0 17.7 16 87 554 1939 99
43 A43 10.4 47.0 208 274 1325 2126 544
44 A44 13.5 51.6 240 354 2049 3987 714
45 A45 3.2 25.3 59 180 915 4074 223
46 A46 7.1 26.5 106 167 813 2522 219
47 A47 2.0 21.8 22 103 949 2697 181
48 A48 5.0 53.4 135 244 1861 4267 315
49 A49 3.1 20.1 73 162 783 2802 254
50 A50 5.9 18.9 41 99 625 1358 169
51 A51 5.3 21.9 22 243 817 3078 169
My Code and error messages:
> ## import the MyData2a set
> MyData2a <- read.csv(file.choose(), header = TRUE)
> # Display the pure data
> MyData2a
X B1 B2 B3 B4 B5 B6 B7
1 A1 8.6 72.7 88 401 1162 3910 604
2 A2 10.1 28.4 112 408 1159 2304 267
3 A3 8.1 28.9 80 278 1030 2305 195
4 A4 9.3 43.0 169 437 1908 4337 419
5 A5 11.3 44.9 343 521 1696 3384 762
6 A6 7.0 42.3 145 329 1792 4231 486
7 A7 4.6 23.8 192 205 1198 2758 447
8 A8 31.0 52.4 754 668 1728 4131 975
9 A9 4.9 56.9 124 241 1042 3090 272
10 A10 11.7 52.7 367 605 2221 4373 598
11 A11 11.2 43.9 214 319 1453 2984 430
12 A12 4.8 31.0 106 103 1339 3759 328
13 A13 1.8 12.5 42 179 956 2801 158
14 A14 3.2 20.0 21 178 1003 2800 181
15 A15 8.9 32.4 325 434 1180 2938 628
16 A16 6.0 25.9 90 186 887 2333 328
17 A17 4.4 32.9 80 252 1188 3008 258
18 A18 6.7 23.1 83 222 824 1740 193
19 A19 12.8 40.1 224 482 1461 3417 442
20 A20 3.6 29.7 193 331 1071 2189 906
21 A21 9.0 43.6 304 476 1296 2978 545
22 A22 2.0 14.8 28 102 803 2347 164
23 A23 11.3 67.4 301 424 1509 3378 800
24 A24 2.5 31.8 102 148 1004 2785 288
25 A25 9.2 29.2 170 370 1136 2500 439
26 A26 11.2 25.8 65 172 1076 1845 150
27 A27 2.9 17.3 20 118 783 3314 215
28 A28 8.1 26.4 88 354 1225 2423 208
29 A29 1.0 11.6 7 32 385 2049 120
30 A30 3.1 24.6 51 184 748 2677 168
31 A31 2.2 21.5 24 92 755 2208 228
32 A32 5.2 33.2 269 265 1071 2822 776
33 A33 11.5 46.9 130 538 1845 3712 343
34 A34 12.6 64.9 287 354 1604 3489 478
35 A35 10.7 30.5 514 431 1221 2924 637
36 A36 5.5 38.6 142 235 988 2574 376
37 A37 8.1 36.4 107 285 1787 3142 649
38 A38 6.6 51.1 206 286 1967 4163 402
39 A39 5.5 25.1 152 176 735 1654 354
40 A40 3.5 21.4 119 192 1294 2568 705
41 A41 8.6 41.3 99 525 1340 2846 277
42 A42 4.0 17.7 16 87 554 1939 99
43 A43 10.4 47.0 208 274 1325 2126 544
44 A44 13.5 51.6 240 354 2049 3987 714
45 A45 3.2 25.3 59 180 915 4074 223
46 A46 7.1 26.5 106 167 813 2522 219
47 A47 2.0 21.8 22 103 949 2697 181
48 A48 5.0 53.4 135 244 1861 4267 315
49 A49 3.1 20.1 73 162 783 2802 254
50 A50 5.9 18.9 41 99 625 1358 169
51 A51 5.3 21.9 22 243 817 3078 169
> # shows the heads
> head(MyData2a )
X B1 B2 B3 B4 B5 B6 B7
1 A1 8.6 72.7 88 401 1162 3910 604
2 A2 10.1 28.4 112 408 1159 2304 267
3 A3 8.1 28.9 80 278 1030 2305 195
4 A4 9.3 43.0 169 437 1908 4337 419
5 A5 11.3 44.9 343 521 1696 3384 762
6 A6 7.0 42.3 145 329 1792 4231 486
> ##Verifies number of columns
> ncol(MyData2a)
[1] 8
> ##Verifies names of columns
> names(MyData2a)
[1] "X" "B1" "B2" "B3" "B4" "B5" "B6" "B7"
> ## Excludes B1 column
> MyData2a.new <- MyData2a[-2]
> names(MyData2a.new)
[1] "X" "B2" "B3" "B4" "B5" "B6" "B7"
> ## New data excluding B1
> head(MyData2a.new )
X B2 B3 B4 B5 B6 B7
1 A1 72.7 88 401 1162 3910 604
2 A2 28.4 112 408 1159 2304 267
3 A3 28.9 80 278 1030 2305 195
4 A4 43.0 169 437 1908 4337 419
5 A5 44.9 343 521 1696 3384 762
6 A6 42.3 145 329 1792 4231 486
> #Exclude top value of B1
> MyData2a <- MyData2a [-1,]
> # first row has been removed and head displays remaining rows
> head(MyData2a)
X B1 B2 B3 B4 B5 B6 B7
2 A2 10.1 28.4 112 408 1159 2304 267
3 A3 8.1 28.9 80 278 1030 2305 195
4 A4 9.3 43.0 169 437 1908 4337 419
5 A5 11.3 44.9 343 521 1696 3384 762
6 A6 7.0 42.3 145 329 1792 4231 486
7 A7 4.6 23.8 192 205 1198 2758 447
> ## assign all variables B1, B2,...B6
> ## assigns column name as B1 B2 and so on for all
> ## the columns available in MyData2a
> colnames(MyData2a) <- c("B1", "B2","B3","B4","B5", "B6")
> ## standardization
> standardized <- scale(x, scale=TRUE)
Error in scale(x, scale = TRUE) : object 'x' not found
Note: I'm not sure why I'm getting this error.
When I trace back I get this:
> traceback()
1: scale(x, scale = TRUE)
> standardized
Error: object 'standardized' not found
> kmeans.MyData2a <- kmeans(standardized, 3)
Error in as.matrix(x) : object 'standardized' not found
> kmeans.data$size
Error: object 'kmeans.data' not found
>
Explanation / Answer
You are getting "Error in scale(x, scale = TRUE) : object 'x' not found" because no object named x exists in your session. Your data frame is called MyData2a, so that is the object scale() should be working on.
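As a small illustration, the sketch below reproduces the reported error and a second one that would likely appear right after it: scale() works on numeric data, so handing it the whole data frame, label column X included, also fails. The stand-in data frame df and the result names are made up for this example; the three rows are the first three rows of MyData2a.

```r
# Stand-in data frame built from the first three rows of MyData2a
# (df and the variable names below are illustrative, not from the original code).
df <- data.frame(
  X  = c("A1", "A2", "A3"),
  B1 = c(8.6, 10.1, 8.1),
  B2 = c(72.7, 28.4, 28.9)
)

# 1) Calling scale() on a name that was never assigned raises
#    "object 'x' not found" (assuming no x exists in the session).
missing_msg <- tryCatch(scale(x), error = function(e) conditionMessage(e))

# 2) Passing the whole data frame fails as well, because the
#    label column X is not numeric.
whole_frame <- tryCatch(scale(df), error = function(e) conditionMessage(e))

# 3) Keeping only the numeric columns works and returns a matrix
#    whose columns have mean 0 and standard deviation 1.
numeric_only <- scale(df[, c("B1", "B2")])

print(missing_msg)
print(whole_frame)
print(numeric_only)
```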
Because 'standardized <- scale(x, scale=TRUE)' threw an error and never finished, no object named standardized was created. That is why
" kmeans.MyData2a <- kmeans(standardized, 3)
kmeans.data$size"
fail as well. The last line has a second problem: the result of kmeans() is stored in kmeans.MyData2a, not kmeans.data, so it should read kmeans.MyData2a$size.
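Replacing x with MyData2a removes the first error, but two more changes are needed before the code runs cleanly. First, scale() accepts only numeric data, so pass it the numeric columns B1 to B6 rather than the whole data frame, which still contains the label column X. Second, drop the colnames(MyData2a) <- ... line: the columns already carry the right names, and assigning six names to an eight-column data frame relabels X as B1, B1 as B2, and so on.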
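Putting the fixes together, a corrected script could look roughly like the sketch below. It makes a few assumptions that are not confirmed by the assignment text: "exclude top value of B1" is read the way your code reads it (drop the first row), the clustering uses columns B1 to B6 only, and the first 12 rows of MyData2a are inlined so the example runs without the file. The set.seed() value and nstart = 25 are arbitrary choices to make the run repeatable.

```r
# First 12 rows of MyData2a, inlined so the sketch is self-contained.
# With the real file: MyData2a <- read.csv(file.choose(), header = TRUE)
raw <- "
X B1 B2 B3 B4 B5 B6 B7
A1 8.6 72.7 88 401 1162 3910 604
A2 10.1 28.4 112 408 1159 2304 267
A3 8.1 28.9 80 278 1030 2305 195
A4 9.3 43.0 169 437 1908 4337 419
A5 11.3 44.9 343 521 1696 3384 762
A6 7.0 42.3 145 329 1792 4231 486
A7 4.6 23.8 192 205 1198 2758 447
A8 31.0 52.4 754 668 1728 4131 975
A9 4.9 56.9 124 241 1042 3090 272
A10 11.7 52.7 367 605 2221 4373 598
A11 11.2 43.9 214 319 1453 2984 430
A12 4.8 31.0 106 103 1339 3759 328
"
MyData2a <- read.table(text = raw, header = TRUE, stringsAsFactors = FALSE)

# Task 2: show the head of the data.
print(head(MyData2a))

# Task 3 (assumed reading): drop the first row.
MyData2a <- MyData2a[-1, ]

# Task 4: keep the numeric variables B1..B6; use X as row labels.
vars <- MyData2a[, paste0("B", 1:6)]
rownames(vars) <- MyData2a$X

# Tasks 5 and 6: standardize each column (mean 0, sd 1) and show the result.
standardized <- scale(vars)
print(head(standardized))

# Task 7: K-means with K = 3. The seed only makes the run repeatable.
set.seed(1)
kmeans.MyData2a <- kmeans(standardized, centers = 3, nstart = 25)

# Task 8: show which observations fall in which group.
print(split(rownames(standardized), kmeans.MyData2a$cluster))

# Task 9: size of each group.
print(kmeans.MyData2a$size)
```

If "apply the same range" is meant literally, rescaling each column to the interval [0, 1] (subtract the column minimum, divide by the column range) would be an alternative to scale(); which one your instructor expects is worth checking.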