Use R studio to do the following exercise: Part 1: Using the mtcars data set Cre
Question
Use RStudio to do the following exercise:
Part 1:
Using the mtcars data set
Create a kmeans object from the first, second, and third columns
What is the size of each cluster?
What are the centers of each cluster?
What is the average disp, wt, and qsec of each cluster?
Describe each cluster in English
Part 2:
Find a data set with at least 4 columns of numeric data and a categorical column
Run several scatter plots of the data
Create a kmeans object from the numeric data, you can pick K to be whatever you want
Determine the size of each cluster
Determine the centers of each cluster
Compare the clusters to the categorical data column as we did with the iris$Species column
Part 3:
For your chosen data set - airquality
Describe what each row of data represents
Describe each of your columns used – give a one sentence description of the column
If you know it, describe how the data was generated
For the clusters
Describe the size and means of clusters
Give a one- or two-word description to each cluster – in other words, give each cluster a label or name
This is an exercise in turning your numeric data into something descriptive for non-statisticians
Explanation / Answer
# Read the data set "mtcars" into the variable "mcar"
# "C:/Users/N I T S/Documents/mtcars.csv" is the path of the file on my system; you should change it
# (mtcars also ships with R, so mcar <- mtcars works without any file)
mcar <- read.csv(file="C:/Users/N I T S/Documents/mtcars.csv", header=TRUE, sep=",")
Create a kmeans object from the first, second, and third columns:
You have not given the required number of clusters, so I assume it is 3. You should change it as per your requirement.
# kcls is the variable that stores the cluster information
# mtcars[, 1:3] ---> selects all rows and only columns 1 to 3 (mpg, cyl, disp)
# 3 ----> number of required clusters
> kcls <- kmeans(mtcars[, 1:3], 3)
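Because kmeans() picks its starting centers at random, the cluster numbers (and sometimes the sizes) can change from run to run. Here is a minimal sketch of one way to make the result repeatable, assuming the built-in mtcars data frame. The seed value 42 and nstart = 25 are arbitrary choices for illustration, not values given in the exercise, so the numbering may differ from the output shown in this answer.

```r
# mtcars ships with R, so no file needs to be read for this sketch
set.seed(42)                                  # arbitrary seed, only for repeatability
kcls <- kmeans(mtcars[, 1:3], centers = 3,    # columns 1 to 3: mpg, cyl, disp
               nstart = 25)                   # try 25 random starts, keep the best fit
kcls$size                                     # number of cars in each cluster
kcls$centers                                  # mean mpg, cyl, disp of each cluster
```

Running the same lines again with the same seed should give the same clusters, which makes the written description of each cluster easier to check.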
What is the size of each cluster?
Use this command to see the size of each cluster:
# kcls -----> name of your cluster variable
> kcls$size
8 16 8
Printing kcls itself starts with the line "K-means clustering with 3 clusters of sizes 8, 16, 8".
What are the centers of each cluster?
>kcls$centers
      mpg   cyl     disp
1 14.6000 8.000 399.1250
2 24.5000 4.625 122.2937
3 16.7625 7.500 279.1750
What is the average disp, wt, and qsec of each cluster?
For average qsec:
> mcar$cluster <- kcls$cluster
> kcls1<-mcar[mcar$cluster == 1,] # kcls1 store details of cluster 1
> mean(kcls1$qsec)
16.63
> kcls2<-mcar[mcar$cluster == 2,] # kcls2 store details of cluster 2
> mean(kcls2$qsec)
18.54312
> kcls3<-mcar[mcar$cluster == 3,] # kcls3 store details of cluster 3
> mean(kcls3$qsec)
17.67875
For average disp:
> mean(kcls1$disp)
399.125
> mean(kcls2$disp)
122.2938
> mean(kcls3$disp)
279.175
For average wt:
> mean(kcls1$wt)
4.2355
> mean(kcls2$wt)
2.518
> mean(kcls3$wt)
3.5975
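The per-cluster averages above can also be computed in one step instead of one mean() call per cluster and column. This is a sketch using aggregate() on the built-in mtcars data; the seed and nstart values are illustrative assumptions, so the cluster numbering may not match the output above.

```r
# Recreate the clustering so this block runs on its own
set.seed(42)                                   # arbitrary seed (assumption)
kcls <- kmeans(mtcars[, 1:3], centers = 3, nstart = 25)

# Mean disp, wt, and qsec for every cluster in a single call
cluster_means <- aggregate(mtcars[, c("disp", "wt", "qsec")],
                           by = list(cluster = kcls$cluster),
                           FUN = mean)
cluster_means
```

The disp column of this table should agree with the disp column of kcls$centers, because disp was one of the clustering variables; wt and qsec were not, so they only appear here.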
Part 3:
Describe what each row of data represents:
Ans: Each row represents one day's air quality readings in New York, taken between May 1, 1973 (a Tuesday) and September 30, 1973.
Describe each of your columns used:
Ans:
Format
A data frame with 153 observations on 6 variables.
Details
Daily readings of the following air quality values for May 1, 1973 (a Tuesday) to September 30, 1973.
Ozone: Mean ozone in parts per billion from 1300 to 1500 hours at Roosevelt Island
Solar.R: Solar radiation in Langleys in the frequency band 4000–7700 Angstroms from 0800 to 1200 hours at Central Park
Wind: Average wind speed in miles per hour at 0700 and 1000 hours at LaGuardia Airport
Temp: Maximum daily temperature in degrees Fahrenheit at LaGuardia Airport
If you know it, describe how the data was generated.
Ans:
The data were obtained from the New York State Department of Conservation (ozone data) and the National Weather Service (meteorological data).
Describe the size and means of clusters
Give a one- or two-word description to each cluster – in other words, give each cluster a label or name:
Ans: You have not given any details about the clustering parameters. You can solve this part the same way as Part 1.
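Following the Part 1 steps, the cluster sizes and means for airquality could be obtained roughly as below. This is only a sketch under several assumptions that the exercise does not specify: K = 3, the four measurement columns as input, rows with missing values dropped, columns standardized with scale() because they are in different units, and Month treated as the categorical column.

```r
# kmeans() stops with an error on NA values, so keep complete rows only
aq <- na.omit(airquality)
num_cols <- c("Ozone", "Solar.R", "Wind", "Temp")

set.seed(42)                                   # arbitrary seed (assumption)
aq_k <- kmeans(scale(aq[, num_cols]),          # standardize so no column dominates
               centers = 3, nstart = 25)       # K = 3 is an assumed choice

aq_k$size                                      # size of each cluster
# Cluster means in the original units, easier to describe in words
aq_means <- aggregate(aq[, num_cols],
                      by = list(cluster = aq_k$cluster), FUN = mean)
aq_means
# Compare the clusters to the Month column, as with iris$Species
table(cluster = aq_k$cluster, month = aq$Month)
```

The means table is the natural starting point for the one- or two-word labels: a cluster with high Temp and high Ozone, for example, could be named after those two properties.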
For Part 2:
The answer is the same as for Part 1.
Only plotting is needed.
Plots:
# Plot of wt against mpg
attach(mtcars)
plot(wt, mpg, main="Scatterplot Example",
xlab="Car Weight ", ylab="Miles Per Gallon ", pch=19)
# Plot of wt against disp (mtcars is already attached)
plot(wt, disp, main="Scatterplot Example",
xlab="Car Weight ", ylab="Displacement ", pch=19)
# Plot of drat against cyl
plot(drat, cyl, main="Scatterplot Example",
xlab="Rear Axle Ratio ", ylab="Number of Cylinders ", pch=19)
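Since Part 3 names airquality as the chosen data set, the same kind of scatter plots can be drawn for it. As a sketch, pairs() draws every pairwise scatter plot of the numeric columns in one call; the choice of these four columns and the plot title are assumptions made for illustration.

```r
# The four measurement columns of the built-in airquality data
aq_num <- airquality[, c("Ozone", "Solar.R", "Wind", "Temp")]

# One panel per pair of columns; rows with missing values are simply not drawn
pairs(aq_num, main = "airquality: numeric columns", pch = 19)
```

This avoids attach() entirely, because the data frame is passed to the plotting function directly.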