Question

Please answer Question 3 using R (in as few steps as possible). You need to install the Lahman package from:

https://cran.r-project.org/web/packages/Lahman/

Then you will have access to the Batting and Teams data.

NL (National League) and AL (American League) are stored in the lgID (LeagueID) column.

Please answer the following questions in either R (preferred) or Python. If you choose Python, you will have to use R to save the data and then read it into Python.

1. Using the Lahman MLB data in R, list the top 5 teams since 2000 with the largest stolen bases per at bat ratio.

library(Lahman)
head(Batting)

(The question showed a screenshot of the head(Batting) output here; it contains the same first six rows reproduced in the answer below.)

Each row represents the yearly statistics for a player on a team in a year. Here, G = games played by the player, AB = at bats.

2. Using this same Batting data, plot the yearly SB.Per.AB rate. This will be computed over the entire year rather than per team-year as above.

2a. Same plot, but color the points by lgID (LeagueID). For this problem we only care about NL and AL; everything else can be filtered out.

3. Use this year/SB.Per.AB dataset (generated in #2 above) to create a model for how year relates to SB.Per.AB. In this problem you are using only yearID to predict SB.Per.AB. Try a few model fits and determine which one is best.

Explanation / Answer

R output:

> #1:
> library(Lahman)
> head(Batting)
playerID yearID stint teamID lgID G AB R H X2B X3B HR RBI SB CS BB SO
1 abercda01 1871 1 TRO NA 1 4 0 0 0 0 0 0 0 0 0 0
2 addybo01 1871 1 RC1 NA 25 118 30 32 6 0 0 13 8 1 4 0
3 allisar01 1871 1 CL1 NA 29 137 28 40 4 5 0 19 3 1 2 5
4 allisdo01 1871 1 WS3 NA 27 133 28 44 10 2 2 27 1 1 0 2
5 ansonca01 1871 1 RC1 NA 25 120 29 39 11 3 0 16 6 2 2 1
6 armstbo01 1871 1 FW1 NA 12 49 9 11 2 1 0 5 0 1 0 1
IBB HBP SH SF GIDP
1 NA NA NA NA NA
2 NA NA NA NA NA
3 NA NA NA NA NA
4 NA NA NA NA NA
5 NA NA NA NA NA
6 NA NA NA NA NA
> #?Batting
> attach(Batting)   # lets the code below use bare column names (yearID, SB, AB, lgID)
> names(Batting)
[1] "playerID" "yearID" "stint" "teamID" "lgID" "G"   
[7] "AB" "R" "H" "X2B" "X3B" "HR"
[13] "RBI" "SB" "CS" "BB" "SO" "IBB"   
[19] "HBP" "SH" "SF" "GIDP"
> data1=subset(Batting,yearID>=2000)
> team=unique(data1$teamID)
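> SB.Per.AB=NULL
> # every team x year pair is visited; a pair in which the team did not play
> # (e.g. a relocated or renamed franchise) sums to 0/0 = NaN
> for(i in 1:length(team))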
+ {
+ for(j in 2000:max(yearID))
+ {
+ z=which(data1$teamID==team[i] & data1$yearID==j)
+ SB.team.yr=sum(data1$SB[z])
+ AB.team.yr=sum(data1$AB[z])
+ SB.Per.AB=c(SB.Per.AB,SB.team.yr/AB.team.yr)
+ }
+ }
> length(team)
[1] 33
> length(SB.Per.AB)
[1] 528
> length(team)*length(2000:max(yearID))
[1] 528
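> # top 5 team-years since 2000 with the largest stolen bases per at bat ratio
> head(order(SB.Per.AB, decreasing = TRUE), 5)
[1] 24 474 131 475 188
> # these are positions in SB.Per.AB, not team names: position k holds
> # team[(k - 1) %/% 16 + 1] in year 2000 + (k - 1) %% 16, where 16 = length(2000:max(yearID))
> # (NaN entries are sorted last, so they never appear in the top 5)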
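The loop above returns positions in SB.Per.AB rather than team names, and it also visits team-year pairs that never occurred. A shorter alternative, sketched here under the assumption that the data frame has the standard Lahman columns yearID, teamID, SB and AB, sums SB and AB for each team-year that actually appears using base R's aggregate() and sorts by the ratio, so the team and year come out directly. The helper name team_year_rate is hypothetical, not part of the Lahman package.

```r
# Hypothetical helper: stolen bases per at bat for every team-year since `from`.
# Assumes columns yearID, teamID, SB and AB (as in Lahman's Batting table).
team_year_rate <- function(batting, from = 2000) {
  d <- batting[batting$yearID >= from, ]
  # formula aggregate() drops rows with NA in SB or AB, then sums per team-year;
  # only team-years present in the data are returned, so the loop's 0/0 cases cannot occur
  agg <- aggregate(cbind(SB, AB) ~ yearID + teamID, data = d, FUN = sum)
  agg$SB.Per.AB <- agg$SB / agg$AB
  # largest ratio first
  agg[order(agg$SB.Per.AB, decreasing = TRUE), ]
}

# Runs only when the Lahman package is installed
if (requireNamespace("Lahman", quietly = TRUE)) {
  print(head(team_year_rate(Lahman::Batting), 5))
}
```

Each printed row gives the teamID and yearID together with the summed SB, AB and their ratio, which answers #1 without mapping indices back by hand.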

> #2:
> yr.SB.Per.AB=NULL
> year=min(yearID):max(yearID)
> for(i in year)
+ {
+ z=which(yearID==i)
+ SB.yr=sum(SB[z])   # NA if any SB value in year i is missing
+ AB.yr=sum(AB[z])
+ yr.SB.Per.AB[i-min(year)+1]=SB.yr/AB.yr   # position 1 = first year in the data
+ }
> length(year)
[1] 145
> plot(year,yr.SB.Per.AB)
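The same yearly rate can be computed without a loop or attach(). This is a sketch assuming the Lahman column names; yearly_rate is a hypothetical helper. One difference from the loop: aggregate() with a formula drops only the individual rows that have a missing SB or AB, whereas the loop returns NA for any year containing such a row (those NA years are the observations lm() reports as deleted in #3), so values for those years can differ.

```r
# Hypothetical helper: yearly SB per AB over all leagues, one value per yearID.
yearly_rate <- function(batting) {
  # formula aggregate() drops rows with NA in SB or AB before summing
  agg <- aggregate(cbind(SB, AB) ~ yearID, data = batting, FUN = sum)
  agg$SB.Per.AB <- agg$SB / agg$AB
  agg
}

# Runs only when the Lahman package is installed
if (requireNamespace("Lahman", quietly = TRUE)) {
  yr <- yearly_rate(Lahman::Batting)
  plot(SB.Per.AB ~ yearID, data = yr, xlab = "Year", ylab = "SB per AB")
}
```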


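> # Every year contains rows from both NL and AL (and, in early seasons, other leagues),
> # so for 2a the rate has to be computed separately for each (year, lgID) pair
> # after keeping only NL and AL rows, e.g. in the last year of the loop:
> i
[1] 2015
> unique(lgID[z])
[1] NL AL
Levels: AA AL FL NA NL PL UA
> # one yearly rate per league can then be plotted, with the points colored by lgID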
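A sketch of the per-league version for 2a, assuming the same Lahman column names: it keeps only AL and NL rows, sums SB and AB for each (yearID, lgID) pair, and colors the points by league with base graphics. league_year_rate is a hypothetical helper name, and the choice of palette colors 1 and 2 is arbitrary.

```r
# Hypothetical helper: yearly SB per AB computed separately for each league.
league_year_rate <- function(batting, leagues = c("AL", "NL")) {
  d <- batting[batting$lgID %in% leagues, ]   # keep only the requested leagues
  agg <- aggregate(cbind(SB, AB) ~ yearID + lgID, data = d, FUN = sum)
  agg$lgID <- factor(agg$lgID)                # drop unused league levels
  agg$SB.Per.AB <- agg$SB / agg$AB
  agg
}

# Runs only when the Lahman package is installed
if (requireNamespace("Lahman", quietly = TRUE)) {
  lg <- league_year_rate(Lahman::Batting)
  # one color per league (palette colors 1 and 2)
  plot(SB.Per.AB ~ yearID, data = lg, col = as.integer(lg$lgID), pch = 19,
       xlab = "Year", ylab = "SB per AB")
  legend("topright", legend = levels(lg$lgID),
         col = seq_along(levels(lg$lgID)), pch = 19)
}
```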

> #3:

> fit1=lm(yr.SB.Per.AB~year)
> summary(fit1)

Call:
lm(formula = yr.SB.Per.AB ~ year)

Residuals:
      Min        1Q    Median        3Q       Max
-0.028730 -0.010550 -0.001438  0.009906  0.049295

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept)  4.985e-01  6.254e-02   7.972 1.92e-12 ***
year        -2.453e-04  3.226e-05  -7.604 1.22e-11 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.01293 on 106 degrees of freedom
  (37 observations deleted due to missingness)
Multiple R-squared:  0.353,  Adjusted R-squared:  0.3468
F-statistic: 57.82 on 1 and 106 DF,  p-value: 1.222e-11

> fit2=lm(sqrt(yr.SB.Per.AB)~year)
> summary(fit2)

Call:
lm(formula = sqrt(yr.SB.Per.AB) ~ year)

Residuals:
      Min        1Q    Median        3Q       Max
-0.089582 -0.036007 -0.003869  0.033278  0.110326

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept)  1.538e+00  1.878e-01   8.192 6.30e-13 ***
year        -7.192e-04  9.687e-05  -7.424 2.99e-11 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.03883 on 106 degrees of freedom
  (37 observations deleted due to missingness)
Multiple R-squared:  0.3421,  Adjusted R-squared:  0.3359
F-statistic: 55.12 on 1 and 106 DF,  p-value: 2.993e-11

> fit3=lm(log(yr.SB.Per.AB)~year)
> summary(fit3)

Call:
lm(formula = log(yr.SB.Per.AB) ~ year)

Residuals:
     Min       1Q   Median       3Q      Max
-1.17806 -0.47664 -0.03061  0.46711  1.05348

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) 13.080473   2.479217   5.276 7.04e-07 ***
year        -0.008797   0.001279  -6.878 4.38e-10 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.5127 on 106 degrees of freedom
  (37 observations deleted due to missingness)
Multiple R-squared:  0.3086,  Adjusted R-squared:  0.3021
F-statistic: 47.31 on 1 and 106 DF,  p-value: 4.375e-10

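The first model (untransformed response) is the best of the three by adjusted R-squared: 0.3468, versus 0.3359 for the square-root model and 0.3021 for the log model. Because the three models describe the response on different scales (SB.Per.AB, its square root, and its log), their R-squared values are not strictly comparable, so this ranking is only a rough guide.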
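One way to put the fits on an equal footing is to back-transform their fitted values to the SB.Per.AB scale and compare root-mean-square errors there. The sketch below does that and also adds a quadratic-in-year fit as one more candidate, which is not part of the original answer. It assumes every non-missing rate is strictly positive, so log() is defined; compare_fits is a hypothetical helper. Two caveats: exp() of a log-scale fit estimates a median rather than a mean, and because the quadratic model contains the linear one, its in-sample RMSE can never be larger, so a held-out or cross-validated comparison would be fairer.

```r
# Hypothetical helper: in-sample RMSE of several year-only models,
# all measured on the original SB.Per.AB scale.
compare_fits <- function(year, rate) {
  ok <- !is.na(rate)                          # drop years with a missing rate
  d <- data.frame(year = year[ok], rate = rate[ok])
  # assumes every remaining rate is > 0, so sqrt() and log() are defined
  pred <- list(
    linear    = fitted(lm(rate ~ year, data = d)),
    sqrt      = fitted(lm(sqrt(rate) ~ year, data = d))^2,    # back-transform
    log       = exp(fitted(lm(log(rate) ~ year, data = d))),  # back-transform
    quadratic = fitted(lm(rate ~ poly(year, 2), data = d))
  )
  # root-mean-square error of each model against the observed rates
  sapply(pred, function(p) sqrt(mean((d$rate - p)^2)))
}
```

With the objects from #2 in the workspace, compare_fits(year, yr.SB.Per.AB) returns one RMSE per model; the smallest value marks the closest in-sample fit on the original scale.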
