

Question

---
title: "R Notebook"
output: html_notebook
---
This is an [R Markdown](http://rmarkdown.rstudio.com) Notebook. When you execute code within the notebook, the results appear beneath the code.

## Homework 2 Review of Descriptive statistics and Correlation

### Problem 2
The dataset $NBAPlayers2015$ includes information on many variables for the players in the NBA (National Basketball Association) during the $2014 - 2015$ season. The dataset includes information for all players who averaged more than 24 minutes per game, and includes 182 players and 25 variables.

a) Import the dataset $NBAPlayers2015$ from the textbook website and do a preliminary analysis of its content. Additionally, identify which variables might be considered categorical and which ones numerical or quantitative.

```{r}
# Insert R code here

```

b) For the number of blocked shots during the season for each of the 182 players in the dataset compute the mean and standard deviation of the number of blocked shots. Compute also the summary for this dataset. Download the package $moments$ in order to compute the $skewness$ and $kurtosis$. Which set of summary statistics is more resistant to outliers and more appropriate if the data are heavily skewed?

```{r}
# Insert R code here

```

c) In basketball, a basket is awarded three points (rather than the usual two) if it is shot from farther away. Draw the histogram and the box-plot of the number of three point attempts by players in the NBA. Is it appropriate to use the $95 %$ rule with this dataset? Why or why not?

```{r}
# Insert R code here

```

d) In the dataset, $FGPct$ is the field goal percentage, $Points$ is the total number of points scored during the season, $Assists$ is the total number of assists during the season, and $Steals$ is the total number of steals during the season. Compute the mean, median, standard deviation, and interquartile range for each of these variables. Select five NBA players (LeBron James, Dwyane Wade, Chris Bosh, Tony Parker, Goran Dragic) and use z-scores to determine, relative to other players in the NBA that season, which of each player's statistics is the most impressive and which is the least impressive.

```{r}
# Insert R code here

```

e) Let us use $FTPct$, the percent of free throws made, to predict $FGPct$, the percent of field goals made. Make a scatter plot of this relationship. Is there a linear trend? If so, is it positive or negative? Find the correlation coefficient between the two variables. Find the regression line and the numerical summary of its components.

```{r}
# Insert R code here
```

Explanation / Answer

a)

##Set the working directory##

setwd("D:/")

##Read the data##

NBAPlayers2015<-read.csv("NBAPlayers2015.csv")

##See the structure of data##

str(NBAPlayers2015)

##Sample Output##

'data.frame': 182 obs. of 26 variables:

$ Player : Factor w/ 182 levels "Al Horford","Al Jefferson",..: 157 12 107 167 152 59 23 170 135 31 ...

$ Pos : Factor w/ 5 levels "C","PF","PG",..: 1 5 2 5 2 5 4 4 1 3 ...

$ Age : int 21 29 29 33 26 20 30 29 28 27 ...

$ Team : Factor w/ 31 levels "ATL","BOS","BRK",..: 21 29 25 15 19 17 20 11 19 29 ...

$ Games : int 70 78 71 63 61 81 40 82 76 82 ...

$ Starts : int 67 72 71 41 5 71 40 82 76 14 ...

$ Mins : int 1771 2502 2512 1648 1675 2541 1428 2930 1982 1964 ...

$ MinPerGame: num 25.3 32.1 35.4 26.2 27.5 31.4 35.7 35.7 26.1 24 ...

$ FGMade : int 217 375 659 225 291 383 358 366 213 258 ...

$ FGAttempt : int 399 884 1415 455 729 780 806 910 412 647 ...

$ FGPct : num 0.544 0.424 0.466 0.495 0.399 0.491 0.444 0.402 0.517 0.399 ...

$ FG3Made : int 0 118 37 10 122 7 61 194 0 83 ...

$ FG3Attempt: int 2 333 105 29 359 44 179 555 0 246 ...

$ FG3Pct : num 0 0.354 0.352 0.345 0.34 0.159 0.341 0.35 NA 0.337 ...

$ FTMade : int 103 167 306 79 129 257 189 122 131 178 ...

$ FTAttempt : int 205 198 362 126 151 347 237 143 225 205 ...

$ FTPct : num 0.502 0.843 0.845 0.627 0.854 0.741 0.797 0.853 0.582 0.868 ...

$ OffRebound: int 199 27 177 103 108 100 72 77 244 20 ...

$ DefRebound: int 324 220 549 177 187 442 192 382 504 141 ...

$ Rebounds : int 523 247 726 280 295 542 264 459 748 161 ...

$ Assists : int 66 129 124 86 55 207 122 209 72 353 ...

$ Steals : int 38 41 48 129 33 73 40 152 29 49 ...

$ Blocks : int 86 7 68 30 20 85 17 17 54 3 ...

$ Turnovers : int 99 116 122 86 62 173 89 141 95 145 ...

$ Fouls : int 222 167 125 166 113 254 87 186 144 93 ...

$ Points : int 537 1035 1661 539 833 1030 966 1048 557 777 ...

As the output shows, only three variables are stored as factors: Player, Pos, and Team. These are the categorical variables. The remaining variables are stored as int or num and are quantitative.
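The same classification can be read off programmatically instead of by eye. The sketch below uses a small made-up data frame (`toy`, with invented rows) as a stand-in for NBAPlayers2015, and treats factor or character columns as categorical:

```r
# Hypothetical three-row data frame standing in for NBAPlayers2015
toy <- data.frame(
  Player = c("A", "B", "C"),
  Pos    = c("C", "PG", "SF"),
  Age    = c(21L, 29L, 33L),
  FGPct  = c(0.544, 0.424, 0.495),
  stringsAsFactors = TRUE
)

# TRUE for factor or character columns (categorical), FALSE for numeric ones
is_categorical <- sapply(toy, function(col) is.factor(col) || is.character(col))

categorical_vars  <- names(toy)[is_categorical]
quantitative_vars <- names(toy)[!is_categorical]

print(categorical_vars)
print(quantitative_vars)
```

Replacing `toy` with the imported data frame should give the same split as the `str()` output, provided no numeric code is really a category in disguise.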

b)

##Mean##
mean(NBAPlayers2015$Blocks)
[1] 38.6044

##Standard deviation##
sd(NBAPlayers2015$Blocks)
[1] 39.82153

##Summary##
summary(NBAPlayers2015$Blocks)
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.0 14.0 23.0 38.6 48.0 200.0

##Moments##
install.packages("moments")
library(moments)

##Skewness and kurtosis of the observed number of blocks##
skewness(NBAPlayers2015$Blocks)
kurtosis(NBAPlayers2015$Blocks)

The five-number summary (median and quartiles, hence the IQR) is more resistant to outliers than the mean and standard deviation, and it is the more appropriate choice when the data are heavily skewed. The median is the middle value of the sorted data, so a few extreme values barely move it, whereas every observation enters the mean and the standard deviation. Here the mean (38.6) is well above the median (23) and the maximum is 200, which points to a right-skewed distribution with high outliers, so a positive skewness is expected.
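To see the resistance argument in numbers, here is a minimal sketch on invented block counts (not the real data): one artificial outlier of 200 is appended and the shift in each statistic is compared. The helper name `stats_of` is hypothetical.

```r
# Hypothetical block counts; the extra value 200 is an artificial outlier
blocks     <- c(3, 7, 14, 17, 20, 23, 30, 48, 54)
blocks_out <- c(blocks, 200)

# Mean/sd (not resistant) next to median/IQR (resistant)
stats_of <- function(x) c(mean = mean(x), sd = sd(x), median = median(x), IQR = IQR(x))

before <- stats_of(blocks)
after  <- stats_of(blocks_out)

# Absolute change in each statistic caused by the single outlier
shift <- abs(after - before)
print(round(rbind(before, after, shift), 2))
```

With these values the median and IQR move less than the mean and standard deviation do, which is the sense in which they are resistant.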
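c)

hist(NBAPlayers2015$FG3Attempt)
boxplot(NBAPlayers2015$FG3Attempt)

The 95% rule (about 95% of values fall within two standard deviations of the mean) is only appropriate when the distribution is roughly symmetric and bell-shaped, so the decision has to come from the shape of the histogram and box-plot. Three-point attempts cannot be negative, and the sample rows above range from 0 to 555, which suggests a right-skewed distribution. If the plots confirm that skew, the 95% rule should not be used with this variable.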
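A rough way to check the rule numerically, sketched on deterministic stand-in data rather than the real attempts: evenly spaced quantiles of a normal distribution (bell-shaped) and of an exponential distribution (right-skewed). The helper names `within_2sd` and `upper_share` are hypothetical.

```r
# Share of values within two standard deviations of the mean
within_2sd <- function(x) mean(abs(x - mean(x)) <= 2 * sd(x))

# Among the values outside that range, the fraction lying above the mean
upper_share <- function(x) {
  out <- abs(x - mean(x)) > 2 * sd(x)
  mean(x[out] > mean(x))
}

# Stand-in samples: quantile grids of a bell-shaped and a right-skewed distribution
p      <- (1:999) / 1000
bell   <- qnorm(p)
skewed <- qexp(p)

print(c(bell = within_2sd(bell), skewed = within_2sd(skewed)))
print(c(bell = upper_share(bell), skewed = upper_share(skewed)))
```

For a skewed variable the total share inside two standard deviations can still land near 95%, but the excluded values all sit in one tail, so the rule's picture of roughly 2.5% in each tail no longer holds. That is why the shape of the histogram, not the coverage alone, decides whether the rule is appropriate.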


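d)

##Mean, median, standard deviation and interquartile range##
##The columns are named Assists and Steals in the data##
vars<-c("FGPct","Points","Assists","Steals")
sapply(NBAPlayers2015[vars],function(x) c(mean=mean(x),median=median(x),sd=sd(x),IQR=IQR(x)))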


##Players to compare: LeBron James, Dwyane Wade, Chris Bosh, Tony Parker, Goran Dragic##
head(NBAPlayers2015)

##z-scores of the five players on each variable##
players<-c("LeBron James","Dwyane Wade","Chris Bosh","Tony Parker","Goran Dragic")
vars<-c("FGPct","Points","Assists","Steals")

##scale() subtracts each column's mean and divides by its standard deviation##
z<-scale(NBAPlayers2015[vars])
rownames(z)<-as.character(NBAPlayers2015$Player)
round(z[players,],2)

##Optional: percentile under a normal approximation; keep the sign of z##
round(pnorm(z[players,]),3)

For each player, the statistic with the largest positive z-score is the most impressive relative to the rest of the league, and the statistic with the smallest (most negative) z-score is the least impressive. Taking abs() of the z-score before calling pnorm() would hide whether a player is above or below the league mean, so the sign is kept.
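e)

Part e) is not worked in the answer above. A minimal sketch of the usual steps (scatter plot, correlation coefficient, least-squares line) follows. The eight value pairs are invented for illustration and say nothing about the real trend; for the actual analysis, swap in NBAPlayers2015$FTPct and NBAPlayers2015$FGPct.

```r
# Hypothetical free-throw and field-goal percentages for eight players
ft_pct <- c(0.50, 0.58, 0.63, 0.74, 0.80, 0.84, 0.85, 0.87)
fg_pct <- c(0.54, 0.52, 0.50, 0.49, 0.44, 0.47, 0.42, 0.40)

# Scatter plot with the predictor on the x-axis
plot(ft_pct, fg_pct, xlab = "FTPct", ylab = "FGPct")

# Correlation coefficient: the sign gives the direction, the magnitude the strength
r <- cor(ft_pct, fg_pct)

# Least-squares regression line FGPct = b0 + b1 * FTPct, drawn on the plot
fit <- lm(fg_pct ~ ft_pct)
abline(fit)

print(r)
print(coef(fit))     # intercept b0 and slope b1
print(summary(fit))  # standard errors, t-values, R-squared
```

The slope always has the same sign as the correlation, since b1 = r * sd(y) / sd(x), so the direction of the trend can be read from either number.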