***Software to be used- R*** A medical center is interested in modeling prostate
Question
***Software to be used- R***
A medical center is interested in modeling prostate-specific antigen (PSA) and a number of prognostic clinical measurements in men with advanced prostate cancer. Data were collected on 97 men who were about to undergo radical prostatectomies.
Data
1 0.651 0.5599 15.959 50 0.0000 0 0.0000 6
2 0.852 0.3716 27.660 58 0.0000 0 0.0000 7
3 0.852 0.6005 14.732 74 0.0000 0 0.0000 7
4 0.852 0.3012 26.576 58 0.0000 0 0.0000 6
5 1.448 2.1170 30.877 62 0.0000 0 0.0000 6
6 2.160 0.3499 25.280 50 0.0000 0 0.0000 6
7 2.160 2.0959 32.137 64 1.8589 0 0.0000 6
8 2.340 1.9937 34.467 58 4.6646 0 0.0000 6
9 2.858 0.4584 34.467 47 0.0000 0 0.0000 7
10 2.858 1.2461 25.534 63 0.0000 0 0.0000 6
11 3.561 1.2840 36.598 65 0.0000 0 0.0000 6
12 3.561 0.2592 36.598 63 3.5609 0 0.0000 6
13 3.561 5.0028 20.491 63 0.0000 0 0.5488 7
14 3.857 4.3929 20.086 67 0.0000 0 0.0000 7
15 4.055 3.3535 31.187 57 0.0000 0 0.6505 7
16 4.263 4.6646 21.328 66 0.0000 0 0.0000 6
17 4.349 0.6570 33.784 70 3.4556 0 0.5488 7
18 4.437 9.8749 38.475 66 0.0000 0 1.4477 6
19 4.759 0.5712 26.311 41 0.0000 0 0.0000 6
20 4.953 1.1972 46.063 70 5.2593 0 0.0000 7
21 5.155 3.1582 30.569 59 0.0000 0 0.0000 6
22 5.259 7.8460 33.115 60 4.3492 0 3.8574 7
23 5.474 0.5827 29.371 59 0.4493 0 0.0000 6
24 5.529 5.9299 31.500 63 1.5527 0 3.2544 7
25 5.641 1.4770 39.252 69 4.9530 0 0.0000 6
26 5.871 4.2631 22.646 68 1.3499 0 0.0000 6
27 6.050 1.6653 41.264 65 0.0000 0 0.4493 7
28 6.172 0.6703 47.942 67 6.1719 0 0.0000 7
29 6.360 2.8292 22.874 67 1.2461 0 1.0513 7
30 6.619 11.1340 29.371 65 0.0000 0 5.0531 6
31 6.821 1.3364 59.740 65 7.0993 0 0.4493 6
32 7.463 1.1972 450.339 65 5.4739 0 0.0000 6
33 7.463 3.5966 20.905 71 3.5609 0 0.0000 6
34 7.538 1.0101 26.311 54 0.0000 0 0.0000 6
35 7.768 0.9900 25.028 63 0.0000 0 0.4493 6
36 8.085 3.7062 61.559 64 8.7583 0 0.0000 7
37 8.671 4.1371 38.861 73 0.5599 0 5.2593 8
38 8.935 1.5841 10.697 64 0.0000 0 0.0000 7
39 9.116 14.2963 59.740 68 3.9354 1 6.2339 7
40 9.777 2.2255 20.287 56 2.5600 0 0.8521 7
41 9.974 1.8589 23.104 60 0.0000 0 0.0000 8
42 10.074 4.2207 39.646 68 0.0000 0 0.0000 7
43 10.278 1.7860 47.942 62 5.5290 0 0.6505 6
44 10.697 5.8709 49.402 61 0.0000 0 2.2479 7
45 12.429 4.4371 30.265 66 5.7546 0 0.6505 7
46 12.807 5.2593 29.666 61 1.8589 0 0.0000 7
47 13.066 15.3329 54.598 79 6.5535 1 14.2963 8
48 13.066 3.1899 56.826 68 5.5290 0 0.6505 7
49 13.330 5.7546 33.115 43 0.0000 0 0.0000 6
50 13.330 3.3872 35.517 70 3.9354 0 0.4493 6
51 14.296 2.9743 54.055 68 0.0000 0 0.0000 7
52 14.585 5.2593 68.717 64 7.9248 0 0.0000 6
53 14.585 1.6653 37.713 64 4.4371 0 1.0513 7
54 14.732 8.4149 61.559 68 5.8709 0 4.2631 7
55 14.880 23.3361 33.784 59 0.0000 0 0.0000 8
56 15.180 3.5609 72.240 66 8.3311 0 0.0000 7
57 16.281 2.6379 17.637 47 0.0000 0 1.6487 7
58 16.281 1.5841 42.948 49 4.1371 0 0.0000 6
59 16.610 1.7160 65.366 70 1.5527 0 0.0000 8
60 16.610 2.8864 46.993 61 3.6328 0 0.0000 7
61 17.116 1.5841 91.836 73 10.2779 0 0.0000 6
62 17.288 7.3891 41.264 63 5.0531 1 6.7531 7
63 17.288 16.1190 33.784 72 0.0000 0 4.7588 8
64 17.814 7.6141 50.400 66 7.4633 1 8.2482 7
65 17.814 7.9248 37.338 64 0.0000 0 0.0000 6
66 17.993 4.3060 46.525 61 3.7434 0 0.6505 7
67 18.541 7.5383 48.424 68 5.9299 0 3.7434 7
68 19.298 9.0250 57.397 72 10.0744 0 0.6505 7
69 19.298 0.6376 82.269 69 0.0000 0 0.0000 6
70 19.492 3.2871 119.104 72 10.2779 0 0.4493 7
71 20.287 6.4237 36.234 60 0.0000 1 3.7434 7
72 20.905 3.1899 28.219 77 5.7546 0 0.0000 7
73 21.328 3.3535 46.063 69 0.0000 1 1.2461 7
74 21.758 6.2965 25.534 60 1.5527 1 3.2544 8
75 26.576 20.0855 46.993 69 0.0000 1 6.7531 8
76 28.219 23.1039 26.050 68 0.9512 1 11.2459 6
77 29.666 7.4633 83.931 72 8.3311 0 1.6487 8
78 31.187 12.6797 77.478 78 10.2779 0 0.0000 8
79 31.817 14.1540 35.874 69 0.0000 1 13.1971 7
80 33.448 16.1190 45.604 63 0.0000 0 1.4477 8
81 33.784 4.3492 21.542 66 1.7507 0 1.2461 7
82 34.124 12.3049 32.137 57 1.5527 0 10.2779 7
83 35.517 13.5991 48.911 77 0.5886 1 1.7507 7
84 35.517 14.5851 46.525 65 3.0649 0 5.7546 8
85 36.234 4.7588 40.854 60 5.4739 0 2.2479 8
86 37.713 27.1126 33.784 64 0.0000 1 10.2779 8
87 39.646 7.5383 41.679 58 5.1552 0 0.0000 6
88 40.854 5.6407 29.079 62 0.0000 1 1.3499 7
89 53.517 16.6099 112.168 65 0.0000 1 11.7048 8
90 54.055 4.7588 40.447 76 2.5600 1 2.2479 8
91 56.261 25.7903 60.340 68 0.0000 0 0.0000 6
92 62.178 12.5535 39.646 61 3.8574 1 0.0000 7
93 80.640 16.9455 48.424 68 0.0000 1 3.7434 8
94 107.770 45.6042 49.402 44 0.0000 1 8.7583 8
95 170.716 18.3568 29.964 52 0.0000 1 11.7048 8
96 239.847 17.8143 43.380 68 4.7588 1 4.7588 8
97 265.072 32.1367 52.985 68 1.5527 1 18.1741 8
Each line of the data set has an identification number and provides information on 8 other variables.
Develop a “best” model for predicting PSA and interpret. In addition, create a 90%
prediction interval for PSA levels for an individual who has the following values.
Variable Number   Variable Name              Description
1                 ID number                  1-97
2                 PSA level                  Serum prostate-specific antigen level (mg/ml)
3                 Cancer volume              Estimate of prostate cancer volume (cc)
4                 Weight                     Prostate weight (grams)
5                 Age                        Age of patient (years)
6                 Benign hyperplasia         Amount of benign prostatic hyperplasia (cm²)
7                 Seminal vesicle invasion   Presence of seminal vesicle invasion: 1 yes; 0 otherwise
8                 Capsular penetration       Degree of capsular penetration (cm)
9                 Gleason score              Pathologically determined grade of disease (scores were either 6, 7, or 8, with higher scores indicating worse prognosis)
Explanation / Answer
I am using R to solve this problem.
First, I copied the data into a CSV file with a header row of column names. The data can then be loaded into R using the read.csv function:
InputData <- read.csv("Data1.txt", header = TRUE)
#Check for dimensions once
dim(InputData)
[1] 97  9
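The rows in the question are whitespace-separated and carry no header line, so if the file is saved exactly as posted, read.table with explicit column names is one way to load it. A minimal sketch, assuming the column names that appear in the model output of this answer; the text = argument stands in for a file on disk and holds only the first three rows:

```r
# Column names assumed to match those used in the lm() output
col_names <- c("IDNum", "PSALevel", "CancerVolume", "Weight", "Age",
               "BenignHyperPlasia", "SeminalVesicleInvasion",
               "CapsularPenetration", "GleasonScore")

# First three rows of the posted data, used here instead of a file
raw_rows <- "1 0.651 0.5599 15.959 50 0.0000 0 0.0000 6
2 0.852 0.3716 27.660 58 0.0000 0 0.0000 7
3 0.852 0.6005 14.732 74 0.0000 0 0.0000 7"

# read.table splits on whitespace by default; header = FALSE because
# the posted rows have no header line
SampleData <- read.table(text = raw_rows, header = FALSE,
                         col.names = col_names)
dim(SampleData)
str(SampleData)
```

For the full data set, replacing text = raw_rows with the path to the saved file should give a data frame with 97 rows and 9 columns.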
#Convert SeminalVesicleInvasion and GleasonScore to factors
InputData$SeminalVesicleInvasion <- as.factor(InputData$SeminalVesicleInvasion)
InputData$GleasonScore <- as.factor(InputData$GleasonScore)
#Fit a linear model with all the variables using lm function
#Excluding IDNum as it is just a unique identifier
fit <- lm(PSALevel ~ . - IDNum, data = InputData)
summary(fit)
Call:
lm(formula = PSALevel ~ . - IDNum, data = InputData)
Residuals:
Min 1Q Median 3Q Max
-68.153 -7.323 -0.177 6.403 161.547
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 31.849265 28.958981 1.100 0.27442
CancerVolume 1.748107 0.615858 2.838 0.00563 **
Weight -0.004546 0.074038 -0.061 0.95118
Age -0.537278 0.471991 -1.138 0.25808
BenignHyperPlasia 1.530782 1.201007 1.275 0.20581
SeminalVesicleInvasion1 21.108723 10.844893 1.946 0.05479 .
CapsularPenetration 1.097882 1.322879 0.830 0.40883
GleasonScore7 -1.661862 7.570741 -0.220 0.82676
GleasonScore8 18.423157 10.661795 1.728 0.08750 .
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 30.91 on 88 degrees of freedom
Multiple R-squared: 0.4733, Adjusted R-squared: 0.4254
F-statistic: 9.886 on 8 and 88 DF, p-value: 1.037e-09
CancerVolume is significant at the 1% level (p = 0.0056), and SeminalVesicleInvasion1 and GleasonScore8 are significant at the 10% level. None of the other predictors is significant. So let's refit the model with only these 3 variables.
fit <- lm(PSALevel ~ CancerVolume + SeminalVesicleInvasion + GleasonScore, data = InputData)
summary(fit)
Call:
lm(formula = PSALevel ~ CancerVolume + SeminalVesicleInvasion +
GleasonScore, data = InputData)
Residuals:
Min 1Q Median 3Q Max
-59.879 -6.706 0.501 4.983 162.012
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.4890 5.7769 0.258 0.797169
CancerVolume 1.9706 0.5485 3.592 0.000528 ***
SeminalVesicleInvasion1 23.1265 9.5612 2.419 0.017541 *
GleasonScore7 -1.0806 7.2794 -0.148 0.882317
GleasonScore8 18.1149 10.3085 1.757 0.082198 .
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 30.74 on 92 degrees of freedom
Multiple R-squared: 0.4557, Adjusted R-squared: 0.432
F-statistic: 19.26 on 4 and 92 DF, p-value: 1.551e-11
CancerVolume and SeminalVesicleInvasion are now significant at the 5% level, and the GleasonScore8 indicator is significant at the 10% level. The adjusted R-squared rises slightly, from 0.4254 to 0.432, even though four predictors were dropped. The p-value of the overall F-statistic (1.551e-11) is very small, indicating that the model fits significantly better than an intercept-only (null) model.
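Dropping Weight, Age, BenignHyperPlasia and CapsularPenetration can also be checked with a partial F-test. The sketch below rebuilds that test from the rounded R-squared values and residual degrees of freedom printed in the two summaries, so the result is approximate; the variable names are illustrative only:

```r
# Values copied from the two summary() outputs
r2_full    <- 0.4733   # all predictors except IDNum
r2_reduced <- 0.4557   # CancerVolume + SeminalVesicleInvasion + GleasonScore
df_full    <- 88       # residual df, full model
df_reduced <- 92       # residual df, reduced model

# Number of coefficients dropped from the full model
q <- df_reduced - df_full

# Partial F statistic for the dropped terms
f_stat  <- ((r2_full - r2_reduced) / q) / ((1 - r2_full) / df_full)
p_value <- pf(f_stat, q, df_full, lower.tail = FALSE)
round(c(F = f_stat, p = p_value), 3)
```

The F statistic comes out below 1, so this test gives no evidence that the four dropped variables add predictive value. With the data loaded, calling anova() on the two fitted models would run the same test directly, provided the two fits are stored under separate names instead of reusing fit.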
We can check for multicollinearity using the vif function from the car package:
library(car)
vif(fit)
GVIF Df GVIF^(1/(2*Df))
CancerVolume 1.899061 1 1.378064
SeminalVesicleInvasion 1.592206 1 1.261826
GleasonScore 1.551470 2 1.116056
All adjusted values, GVIF^(1/(2*Df)), are below 1.4 and close to 1, so there is no sign of problematic multicollinearity among the three predictors.
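The third column of the vif output rescales GVIF so that terms with different degrees of freedom are comparable; for a term with 1 degree of freedom it is simply the square root of the ordinary VIF. A short check using only the numbers printed above (a common rule of thumb, not a hard limit, flags VIF values above 5 or 10):

```r
# GVIF and Df copied from the vif() output
gvif <- c(CancerVolume = 1.899061,
          SeminalVesicleInvasion = 1.592206,
          GleasonScore = 1.551470)
df   <- c(1, 1, 2)

# Adjusted value reported in the third column of the output
adj <- gvif^(1 / (2 * df))
round(adj, 6)

# Squaring puts the adjusted value back on the usual VIF scale
round(adj^2, 3)
```

On the VIF scale every term is below 2, far from the usual warning thresholds.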
The coefficient of CancerVolume is 1.9706.
This means that, holding the other predictors fixed, each 1 cc increase in prostate cancer volume is associated with an average increase of 1.9706 mg/ml in PSA level.
The coefficient of SeminalVesicleInvasion1 is 23.1265.
This means that, holding the other predictors fixed, men with seminal vesicle invasion have a PSA level that is on average 23.1265 mg/ml higher than men without it.
The coefficient of GleasonScore8 is 18.1149. This means that, holding the other predictors fixed, a Gleason score of 8 is associated with a PSA level that is on average 18.1149 mg/ml higher than a Gleason score of 6. The coefficient of GleasonScore7 (-1.0806) is not statistically distinguishable from zero, so scores 6 and 7 show no clear difference in PSA.
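The comparisons against a Gleason score of 6 and against absent invasion come from R's default treatment (dummy) coding of factors, in which the first level is the reference. A small illustration on a toy data frame (not the study data) shows the indicator columns that lm builds:

```r
# Toy Gleason scores, one per level, for illustration only
toy <- data.frame(GleasonScore = factor(c(6, 7, 8)))

# Treatment (dummy) coding: the first level, 6, is the reference,
# so it gets no column of its own
X <- model.matrix(~ GleasonScore, data = toy)
X

# The baseline level that the coefficients are measured against
levels(toy$GleasonScore)[1]
```

The columns GleasonScore7 and GleasonScore8 are 0/1 indicators, which is why each coefficient is read as a shift relative to score 6.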
To do the prediction for a new individual we can create a dataframe as below:
NewData <- data.frame(CancerVolume = 4.2633, Weight = 22.783, Age = 68,
                      BenignHyperPlasia = 1.35, SeminalVesicleInvasion = 0,
                      CapsularPenetration = 0, GleasonScore = 6)
# Build the factors with the same levels as the training data,
# so that each value is matched to the correct level by its label
NewData$SeminalVesicleInvasion <- factor(NewData$SeminalVesicleInvasion,
                                         levels = levels(InputData$SeminalVesicleInvasion))
NewData$GleasonScore <- factor(NewData$GleasonScore,
                               levels = levels(InputData$GleasonScore))
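The factor columns of the new data frame must carry the same levels as the training data, and how they are aligned matters. Building the factor with factor(x, levels = ...) matches values by label, whereas assigning to levels(x) renames the existing levels by position. For this individual (0 and 6, both first levels) the two approaches agree, but they would not for a Gleason score of 7, as this standalone sketch shows:

```r
train_levels <- c("6", "7", "8")   # levels of the training factor

# By label: the value 7 is matched to the level "7"
by_label <- factor(7, levels = train_levels)

# By position: the single existing level is renamed to the first new level
by_position <- as.factor(7)
levels(by_position) <- train_levels

as.character(by_label)      # "7"
as.character(by_position)   # "6"
```

Matching by label is therefore the safer choice when new observations may take any level.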
# Predict with the predict function; interval = "prediction" requests a prediction interval
predict(fit,newdata = NewData, interval="prediction", level = 0.90)
fit lwr upr
9.890279 -41.95032 61.73087
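The point prediction can be reproduced by hand from the reduced-model coefficients, and the interval width can be compared with what residual noise alone would give. The sketch below uses the rounded values printed in the summary and the standard formula for the half-width, t * s * sqrt(1 + h), where h is the leverage of the new observation; the variable names are illustrative:

```r
# Coefficients and residual standard error from the reduced-model summary
b0 <- 1.4890
b_vol <- 1.9706
s <- 30.74
df_resid <- 92

# Individual: CancerVolume = 4.2633, no invasion, Gleason 6 (both baselines),
# so only the intercept and the CancerVolume term contribute
point <- b0 + b_vol * 4.2633
point

# Half-width of the reported 90% prediction interval
half_width <- (61.73087 - (-41.95032)) / 2

# Half-width if only residual noise mattered (h = 0)
t_crit <- qt(0.95, df_resid)
noise_only <- t_crit * s

# Leverage implied by half_width = t * s * sqrt(1 + h)
h <- (half_width / noise_only)^2 - 1
c(half_width = half_width, noise_only = noise_only, implied_h = h)
```

The implied leverage is small, so nearly all of the interval's width comes from the residual standard error of about 30.7 mg/ml. The lower limit is negative, which is not a possible PSA value, so the interval is only as trustworthy as the model's assumption of normally distributed errors with constant variance.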