Question
Theoretical Overview
Suppose we have a set of data consisting of ordered pairs and we suspect the x and y coordinates are related. It is natural to try to find the best line that fits the data points. If we can find this line, then we can use it to make all sorts of other predictions. In this project, we're going to use several functions to find this line using a technique called least squares regression. The result will be what we call the least squares regression line (or LSRL for short).
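In order to do this, you'll need to program a statistical computation called the correlation coefficient, denoted by r in statistical symbols:
$$r = \frac{1}{n-1} \sum_{i=1}^{n} \left(\frac{x_i - \bar{x}}{s_x}\right)\left(\frac{y_i - \bar{y}}{s_y}\right)$$
Once you have the correlation coefficient, you use it along with the sample means and sample standard deviations of the x- and y-coordinates to compute the slope b and y-intercept a of your regression line via these formulas:
$$b = r \, \frac{s_y}{s_x}, \qquad a = \bar{y} - b \, \bar{x}$$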
Project Specifications
In this project, you must read the x- and y-coordinate pairs in from a data file of unknown length. Each line in the file must contain both coordinates, separated by whitespace, as shown here. In addition, you must use methods in this project, splitting the work up into smaller components (like some of Project 3) and reinforcing your skills with parameter passing and arrays.
You are required to create the following methods, and you must list them in this order above the main program (no prototypes, please!):
(Table of required methods, with columns "# (for reference)", "Role", and "Method"; the rows were not preserved.)
Explanation / Answer
METHOD 1:
First you'll need the sample standard deviations sx and sy. If your language provides a standard deviation function and you're allowed to use it (or you've already written one), use that. If not, it looks like this for x (y works the same way). Divide by n-1 rather than n: the problem asks for sample standard deviations, and the formula for r divides by n-1 as well.
avgx = average(x)
sumx = 0
for i = 1:n
sumx += (x(i) - avgx)^2
end
sigmax = sqrt(sumx/(n-1))
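The standard deviation step can be sketched in Python as follows. This is only an illustration: the names `mean` and `sample_std` are made up here, and the project may require a different language. It divides by n - 1 (the sample standard deviation the problem statement asks for), which is also what keeps the later r = sumr / (n-1) between -1 and 1.

```python
import math


def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


def sample_std(values):
    """Sample standard deviation: square root of the sum of squared
    deviations from the mean, divided by n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    avg = mean(values)
    total = sum((v - avg) ** 2 for v in values)
    return math.sqrt(total / (n - 1))
```

For example, `sample_std([1, 2, 3, 4, 5])` sums the squared deviations to 10 and returns the square root of 10 / 4.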
Once you have sx and sy, you can complete Method 1:
sumr = 0
for i = 1:n
sumr += ((x(i) - avgx)/sigmax) * ((y(i) - avgy)/sigmay)
end
r = sumr / (n-1)
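As a rough Python sketch of Method 1 (the function name `correlation` is an assumption, not something the assignment specifies), the loop above sums the products of the standardized x and y values and divides by n - 1. The standard deviations here are sample standard deviations.

```python
import math


def correlation(xs, ys):
    """Correlation coefficient r for paired lists xs and ys.

    Raises ZeroDivisionError if all x values (or all y values) are
    identical, because the standard deviation is then zero.
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length, at least 2 long")
    avg_x = sum(xs) / n
    avg_y = sum(ys) / n
    # Sample standard deviations (divide by n - 1).
    s_x = math.sqrt(sum((x - avg_x) ** 2 for x in xs) / (n - 1))
    s_y = math.sqrt(sum((y - avg_y) ** 2 for y in ys) / (n - 1))
    total = sum(((x - avg_x) / s_x) * ((y - avg_y) / s_y)
                for x, y in zip(xs, ys))
    return total / (n - 1)
```

A quick sanity check is that points lying exactly on a rising line give r very close to 1, and points on a falling line give r very close to -1.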
METHOD 2:
You'll need standard deviations again. I suggest making the scope of sigmax and sigmay public or creating a 3rd method (no rule against that!) that computes the standard deviations. You'll also need r, so Method 2 should call Method 1 to retrieve r.
r = Method1(x,y,n)
b = r * sigmay / sigmax
a = avgy - b * avgx
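Method 2 might look like this in Python. This is a sketch under assumptions: the name `regression_line` is hypothetical, r is computed inline so the block runs on its own, and the standard library's `statistics.mean` and `statistics.stdev` (sample standard deviation) stand in for hand-written helpers.

```python
import statistics


def regression_line(xs, ys):
    """Return (a, b) for the least squares line y = a + b * x."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length, at least 2 long")
    avg_x = statistics.mean(xs)
    avg_y = statistics.mean(ys)
    s_x = statistics.stdev(xs)  # sample standard deviation (n - 1)
    s_y = statistics.stdev(ys)
    # Correlation coefficient, as in Method 1.
    r = sum(((x - avg_x) / s_x) * ((y - avg_y) / s_y)
            for x, y in zip(xs, ys)) / (n - 1)
    b = r * s_y / s_x       # slope
    a = avg_y - b * avg_x   # y-intercept
    return a, b
```

For the points (1, 3), (2, 5), (3, 7), (4, 9), which lie exactly on y = 1 + 2x, this should return a close to 1 and b close to 2, up to floating-point rounding.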
MAIN FUNCTION:
Here's where you will do your I/O and call Method 2 to do all the heavy work.
i = 0
until the end of file is reached:
i = i + 1
[x(i),y(i)] = read line from file
then n = i (or, more robustly, n = length of the x or y array)
pass n, x, and y to Method 2 and retrieve a and b from it.
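The file-reading step can be sketched in Python like this. The names `parse_pairs` and `read_pairs` are hypothetical, and skipping blank lines is an assumption of this sketch rather than something the assignment states. Because the lists grow as lines are read, the file length does not need to be known in advance.

```python
def parse_pairs(lines):
    """Parse whitespace-separated 'x y' lines into two lists of floats."""
    xs, ys = [], []
    for line in lines:
        fields = line.split()  # splits on any run of spaces or tabs
        if not fields:
            continue  # assumption: blank lines are ignored
        if len(fields) != 2:
            raise ValueError(f"expected two values per line, got: {line!r}")
        xs.append(float(fields[0]))
        ys.append(float(fields[1]))
    return xs, ys


def read_pairs(path):
    """Read every coordinate pair from the file at path."""
    with open(path) as handle:
        return parse_pairs(handle)
```

After `xs, ys = read_pairs(path)`, n is simply `len(xs)`, and the two lists can be handed to Method 2.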
NOTES:
The problem description mentions 6 methods, but only 2 are required, so that statement confuses me a little. If you want to reach 6, add these to your 2 required ones: one that receives the filename and returns x, y, and n; one that calculates the standard deviation; one that calculates the average; and maybe one that writes the output. That said, I would not recommend 6 methods for this problem. Five at most would be okay, and four is probably best.
I hope this was helpful!