Question

Use RStudio to do the following exercise:

Part 1:

Using the mtcars data set

Create a kmeans object from the first, second, and third columns

What is the size of each cluster?

What are the centers of each cluster?

What is the average disp, wt, and qsec of each cluster?

Describe each cluster in English

Part 2:

Find a data set with at least 4 columns of numeric data and a categorical column

Run several scatter plots of the data

Create a kmeans object from the numeric data; you can pick K to be whatever you want

Determine the size of each cluster

Determine the centers of each cluster

Compare the clusters to the categorical data column as we did with the iris$Species column

Part 3:

For your chosen data set - airquality

Describe what each row of data represents

Describe each of your columns used – give a one sentence description of the column

If you know it, describe how the data was generated

For the clusters

Describe the size and means of clusters

Give a one- or two-word description to each cluster – in other words, give each cluster a label or name

This is an exercise in turning your numeric data into something descriptive for non-statisticians

Explanation / Answer

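# Read the data set "mtcars" into the variable "mcar"

# "C:/Users/N I T S/Documents/mtcars.csv" is the path of the file on my system; change it to match yours
# (use forward slashes in R paths; mtcars also ships with R as a built-in data frame, which the kmeans call below uses directly)

mcar <- read.csv(file="C:/Users/N I T S/Documents/mtcars.csv", header=TRUE, sep=",")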

Create a kmeans object from the first, second, and third columns:

You have not given the required number of clusters, so I assume it is 3. Change it to suit your requirements.

# kcls is the variable that stores the cluster information

# mtcars[, 1:3] ---> selects all rows and only columns 1 to 3 (mpg, cyl, disp)

# 3 ----> number of clusters required

>kcls <- kmeans(mtcars[, 1:3], 3)
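Note that kmeans picks its starting centers at random, so the cluster numbering (and sometimes the membership) can change between runs. A minimal sketch of a repeatable version follows; the seed value 42 and nstart = 25 are my own assumptions, not part of the exercise.

```r
# Sketch: repeatable k-means on the first three columns of mtcars (mpg, cyl, disp).
set.seed(42)                          # arbitrary seed (assumption), fixes the random starts
kcls <- kmeans(mtcars[, 1:3],         # built-in data set, all rows, columns 1 to 3
               centers = 3,           # number of clusters, assumed as in the answer above
               nstart = 25)           # try 25 random starts and keep the best (assumption)
print(kcls$size)                      # number of cars in each cluster
print(kcls$centers)                   # mean mpg, cyl, disp of each cluster
```

With a different seed the clusters may come out in a different order, so match them by their centers rather than by their number.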

What is the size of each cluster?

Use this command to see the size of each cluster:

# kcls -----> name of your cluster variable

>kcls$size

[1]  8 16  8

Printing kcls itself reports the same sizes in its first line:

K-means clustering with 3 clusters of sizes 8, 16, 8

What are the centers of each cluster?

>kcls$centers

      mpg   cyl     disp
1 14.6000 8.000 399.1250
2 24.5000 4.625 122.2937
3 16.7625 7.500 279.1750

What is the average disp, wt, and qsec of each cluster?

For average qsec:

> mcar$cluster <- kcls$cluster # add the cluster number of each car as a new column
> kcls1<-mcar[mcar$cluster == 1,] # kcls1 stores the rows of cluster 1

> mean(kcls1$qsec)
16.63

> kcls2<-mcar[mcar$cluster == 2,] # kcls2 stores the rows of cluster 2

> mean(kcls2$qsec)
18.54312

> kcls3<-mcar[mcar$cluster == 3,] # kcls3 stores the rows of cluster 3

> mean(kcls3$qsec)
17.67875

For average disp:

> mean(kcls1$disp)
399.125

> mean(kcls2$disp)
122.2938

> mean(kcls3$disp)
279.175

For average wt:

> mean(kcls1$wt)
4.2355

> mean(kcls2$wt)
2.518

> mean(kcls3$wt)
3.5975
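The nine mean() calls above can also be collapsed into one table. This is a sketch only: it uses the built-in mtcars instead of the CSV copy, and the seed, nstart, and the names cars and cluster_means are my own assumptions.

```r
# Sketch: average disp, wt and qsec of every cluster in one call.
set.seed(42)                                   # arbitrary seed (assumption)
kcls <- kmeans(mtcars[, 1:3], centers = 3, nstart = 25)

cars <- mtcars                                 # work on a copy of the built-in data
cars$cluster <- kcls$cluster                   # cluster number of each car

# One row per cluster, one column per averaged variable
cluster_means <- aggregate(cbind(disp, wt, qsec) ~ cluster,
                           data = cars, FUN = mean)
print(cluster_means)
```

Reading across a row of this table gives the raw material for the English description: for example, the cluster with the largest mean disp and wt is the group of heavy, large-engine cars.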

Part 3:

Describe what each row of data represents:

Ans: Each row represents one day's air quality readings in New York, for the days from May 1, 1973 (a Tuesday) to September 30, 1973.

Describe each of your columns used:

Ans:

Format

A data frame with 153 observations on 6 variables.

Details

Daily readings of the following air quality values for May 1, 1973 (a Tuesday) to September 30, 1973.

Ozone: Mean ozone in parts per billion from 1300 to 1500 hours at Roosevelt Island

Solar.R: Solar radiation in Langleys in the frequency band 4000–7700 Angstroms from 0800 to 1200 hours at Central Park

Wind: Average wind speed in miles per hour at 0700 and 1000 hours at LaGuardia Airport

Temp: Maximum daily temperature in degrees Fahrenheit at LaGuardia Airport

If you know it, describe how the data was generated.

Ans:

The data were obtained from the New York State Department of Conservation (ozone data) and the National Weather Service (meteorological data).

Describe the size and means of clusters

Give a one- or two-word description to each cluster – in other words, give each cluster a label or name:

Ans: You have not given any details about the clustering parameters. Using the approach from Part 1, you can solve this part yourself.
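As a starting point, here is a rough sketch of the Part 1 steps applied to airquality. Everything beyond the data set itself is an assumption on my side: K = 3, the seed, dropping incomplete rows with na.omit, scaling the columns before clustering, and treating Month as the categorical column.

```r
# Sketch: k-means on the four numeric measurements of airquality.
aq <- na.omit(airquality)                       # kmeans cannot handle NA values
num_cols <- c("Ozone", "Solar.R", "Wind", "Temp")

set.seed(42)                                    # arbitrary seed (assumption)
aq_k <- kmeans(scale(aq[, num_cols]),           # scale so no column dominates (assumption)
               centers = 3, nstart = 25)        # K = 3 is an arbitrary choice

print(aq_k$size)                                # size of each cluster

# Cluster means in the original units, easier to describe than scaled centers
aq_means <- aggregate(aq[, num_cols],
                      by = list(cluster = aq_k$cluster), FUN = mean)
print(aq_means)

# Compare clusters with Month, used here as the categorical column
print(table(cluster = aq_k$cluster, month = aq$Month))
```

The labels then come from the means: a cluster with high Temp and high Ozone could be called something like "hot, smoggy days", but the wording depends on the numbers you actually get.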

For Part 2:

The approach is the same as in Part 1.

Only the plotting is new.

Plots:

# Plot of wt against mpg

plot(mtcars$wt, mtcars$mpg, main="Scatterplot Example",
   xlab="Car Weight (1000 lbs)", ylab="Miles Per Gallon", pch=19)

# Plot of wt against disp

plot(mtcars$wt, mtcars$disp, main="Scatterplot Example",
   xlab="Car Weight (1000 lbs)", ylab="Displacement (cu. in.)", pch=19)

# Plot of drat against cyl

plot(mtcars$drat, mtcars$cyl, main="Scatterplot Example",
   xlab="Rear Axle Ratio", ylab="Number of Cylinders", pch=19)
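The question also asks to compare the clusters with a categorical column "as we did with the iris$Species column". I do not have the class example, so the following is only a sketch of how that comparison is commonly done; the seed, nstart, and the names iris_k and cmp are my assumptions.

```r
# Sketch: cluster the four numeric iris columns and compare with Species.
set.seed(42)                                    # arbitrary seed (assumption)
iris_k <- kmeans(iris[, 1:4], centers = 3, nstart = 25)

# Scatter plot coloured by cluster number
plot(iris$Petal.Length, iris$Petal.Width,
     col = iris_k$cluster, pch = 19,
     main = "Iris clusters",
     xlab = "Petal Length (cm)", ylab = "Petal Width (cm)")

# Cross-tabulate: rows are clusters, columns are the known species
cmp <- table(cluster = iris_k$cluster, species = iris$Species)
print(cmp)
```

A row of the table that is concentrated in one species column means that cluster lines up well with that category; rows spread over several columns show where the clusters and the categories disagree.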

Columns of the airquality data set:

[,1] Ozone    numeric  Ozone (ppb)
[,2] Solar.R  numeric  Solar R (lang)
[,3] Wind     numeric  Wind (mph)
[,4] Temp     numeric  Temperature (degrees F)
[,5] Month    numeric  Month (1--12)
[,6] Day      numeric  Day of month (1--31)