
Linear regression draws a straight line through a group of data points such that


Question

Linear regression draws a straight line through a group of data points such that the position and slope of the line minimize the sum of the squares of the vertical distances between the data points and the line. It fits the data in an intuitively satisfying and yet mathematically reproducible way. For linear regression to be valid, all data points should vary in exactly the same random way, and that variation should have a normal or “Gaussian” distribution, the familiar bell-shaped distribution.

To illustrate the application of linear regression, this project uses it to generate a trend line for the effect of nitrogen fertilizer on the yield of a crop of corn (maize). To guarantee that the required assumptions are met, we have created the data artificially, by adding a normally distributed random variable to a sloping straight line, with the same variance for all data points. Specifically, we added a normal random number having a standard deviation of 25 to the straight line. Here’s the equation:

y = 50 + 100 * x + randomNumber

The following plot shows one set of 10 data points, and the linear-regression fit to those data points:

Sample session:

Enter number of data points or 0 for default: 0

Fert    Yield
81      131
14      71
60      112
12      53
99      115
35      92
4       71
23      65
45      104
14      25

slope = 0.8486061764042895
yieldAt0 = 51.058940973154
yieldAtMax = 135.91955861358295
residual error = 18.87483162574109

Another sample session:

Enter number of data points or 0 for default: 10000

Fert    Yield
64      139
1       52
86      121
31      97
95      126
86      166
67      118
26      95
89      179
39      95

slope = 1.0051707825618592
yieldAt0 = 50.025474774097034
yieldAtMax = 150.54255303028296
residual error = 25.0921873778027

The first sample session prints all of the data points used in the figure. The second sample session prints just the first 10 of 10,000 points used as the basis of the regression. Of course, your random number generator will not generate the same data values as those shown above, but the four values at the bottom of your output should be close to the four values we generated – which are close to the parameters used to generate the random data.
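The sample sessions report a residual error, but the text here does not define it. As a hedged sketch only, the Java method below assumes the residual error is the square root of the sum of squared vertical residuals divided by (n - 2). That definition is an inference: it is the one that reproduces the value printed in the first sample session from that session's slope and yieldAt0. The class and method names are hypothetical.

```java
class ResidualErrorSketch {

    /**
     * Assumed definition (not stated in the problem text):
     * sqrt(sum of squared residuals / (n - 2)). Needs at least 3 points.
     */
    static double residualError(int[][] points, double slope, double yieldAt0) {
        double sumSquares = 0.0;
        for (int[] p : points) {
            double predicted = yieldAt0 + slope * p[0]; // height of the fitted line at this x
            double residual = p[1] - predicted;         // vertical distance to the line
            sumSquares += residual * residual;
        }
        return Math.sqrt(sumSquares / (points.length - 2));
    }

    public static void main(String[] args) {
        // The 10 {fertilizer, yield} pairs from the first sample session.
        int[][] data = {
            {81, 131}, {14, 71}, {60, 112}, {12, 53}, {99, 115},
            {35, 92}, {4, 71}, {23, 65}, {45, 104}, {14, 25}
        };
        // Slope and intercept copied from the first sample session's output.
        System.out.println("residual error = "
                + residualError(data, 0.8486061764042895, 51.058940973154));
    }
}
```

Run on the first sample session's data, this prints a value close to 18.87. For the 10,000-point session the same quantity comes out near 25, the standard deviation of the added noise.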

Your job is to write the program that produces these results. To generate the first sample session above, initialize a two-dimensional array with the 10 sets of output values shown. To generate the second sample session above, import the java.util.Random class, use the zero-parameter constructor to instantiate a random-number generator, and have that generator call its nextGaussian method to generate a random variable with a Gaussian distribution whose mean value is zero and whose standard deviation is 1.0. Multiply that value by 25 to get the standard deviation used in the equation above. (See Section 5.8 for more information.)
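One way the random data could be generated is sketched below. This is a minimal sketch under stated assumptions, not the textbook's own code: the problem does not say how x is chosen, so the sketch assumes the fertilizer level is a uniform integer from 0 to 100 and x = fertilizer / 100. Yields are rounded to integers to match the sample sessions. The class name, method name, and the Random parameter (which lets a caller supply a seeded generator) are hypothetical.

```java
import java.util.Random;

class DataGeneratorSketch {

    /** Returns count rows of {fertilizer, yield}. */
    static int[][] generate(int count, Random random) {
        int[][] points = new int[count][2];
        for (int i = 0; i < count; i++) {
            int fert = random.nextInt(101);              // assumption: uniform integer in 0..100
            double x = fert / 100.0;                     // assumption: x is the fraction of the maximum
            double noise = 25.0 * random.nextGaussian(); // mean 0, standard deviation 25
            points[i][0] = fert;
            points[i][1] = (int) Math.round(50 + 100 * x + noise);
        }
        return points;
    }

    public static void main(String[] args) {
        // Zero-parameter constructor, as the problem statement asks.
        int[][] points = generate(10, new Random());
        System.out.println("Fert\tYield");
        for (int[] p : points) {
            System.out.println(p[0] + "\t" + p[1]);
        }
    }
}
```

Scaling nextGaussian() by 25 works because multiplying a standard normal value by a constant multiplies its standard deviation by that constant and leaves the mean at zero.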

Here is the basic algorithm for linear regression:

1) Find the average x (avgX) and the average y (avgY).

2) Find the x_variance, which is the sum of the squares of (x[i] - avgX), divided by the number of data points.

3) Find the x_y_covariance, which is the sum of the product, (x[i] - avgX) * (y[i] - avgY), divided by the number of data points.

4) The slope of the desired regression line is slope = x_y_covariance / x_variance

5) The y-axis intercept (value of y at x=0) of the straight line is yieldAt0 = avgY - slope * avgX
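The steps of this algorithm can be sketched in Java as follows. This is a minimal illustration, not a definitive implementation: the class name and the fit method are hypothetical, and the data are the 10 points from the first sample session. Note that the slope is the covariance divided by the variance, and the intercept is taken as avgY - slope * avgX, the standard least-squares form.

```java
class RegressionSketch {

    /** Returns {slope, yieldAt0} for points given as {x, y} rows. */
    static double[] fit(int[][] points) {
        int n = points.length;

        // Step 1: averages of x and y.
        double sumX = 0.0;
        double sumY = 0.0;
        for (int[] p : points) {
            sumX += p[0];
            sumY += p[1];
        }
        double avgX = sumX / n;
        double avgY = sumY / n;

        // Steps 2 and 3: variance of x and covariance of x and y.
        double xVariance = 0.0;
        double xyCovariance = 0.0;
        for (int[] p : points) {
            double dx = p[0] - avgX;
            xVariance += dx * dx;
            xyCovariance += dx * (p[1] - avgY);
        }
        xVariance /= n;
        xyCovariance /= n;

        // Steps 4 and 5: slope and y-axis intercept.
        double slope = xyCovariance / xVariance;
        double yieldAt0 = avgY - slope * avgX;
        return new double[] {slope, yieldAt0};
    }

    public static void main(String[] args) {
        int[][] data = {
            {81, 131}, {14, 71}, {60, 112}, {12, 53}, {99, 115},
            {35, 92}, {4, 71}, {23, 65}, {45, 104}, {14, 25}
        };
        double[] line = fit(data);
        System.out.println("slope = " + line[0]);
        System.out.println("yieldAt0 = " + line[1]);
    }
}
```

On the first sample session's data this gives a slope close to 0.8486 and an intercept close to 51.06, in line with the values printed there.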

Explanation / Answer

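package Gaussian;

import java.util.Random;
import java.util.Scanner;

public class gaussian {

    // Default data: the 10 {fertilizer, yield} pairs from the first sample session.
    private static final int[][] DEFAULT_DATA = {
        {81, 131}, {14, 71}, {60, 112}, {12, 53}, {99, 115},
        {35, 92}, {4, 71}, {23, 65}, {45, 104}, {14, 25}
    };

    public static void main(String[] args) {
        Scanner sc = new Scanner(System.in);
        System.out.print("Enter number of data points or 0 for default: ");
        int dataPoints = sc.nextInt();

        int[][] fert_yield;
        if (dataPoints <= 0) {
            fert_yield = DEFAULT_DATA;
        } else {
            // y = 50 + 100 * x + randomNumber, where randomNumber is Gaussian
            // with mean 0 and standard deviation 25.
            // Assumption: fertilizer is a uniform integer in 0..100 and
            // x = fertilizer / 100; the problem does not say how x is chosen.
            Random random = new Random();
            fert_yield = new int[dataPoints][2];
            for (int i = 0; i < dataPoints; i++) {
                int fert = random.nextInt(101);
                double yield = 50 + 100 * (fert / 100.0) + 25 * random.nextGaussian();
                fert_yield[i][0] = fert;
                fert_yield[i][1] = (int) Math.round(yield);
            }
        }
        int n = fert_yield.length;

        // Print at most the first 10 points, as in the sample sessions.
        System.out.println("Fert\tYield");
        for (int i = 0; i < Math.min(n, 10); i++) {
            System.out.println(fert_yield[i][0] + "\t" + fert_yield[i][1]);
        }

        // Step 1: averages (double arithmetic avoids integer-division truncation).
        double sumF = 0.0;
        double sumY = 0.0;
        for (int[] point : fert_yield) {
            sumF += point[0];
            sumY += point[1];
        }
        double avgF = sumF / n;
        double avgY = sumY / n;

        // Steps 2 and 3: variance of x, and covariance of x and y.
        double x_variance = 0.0;
        double x_y_covariance = 0.0;
        for (int[] point : fert_yield) {
            double dx = point[0] - avgF;
            x_variance += dx * dx; // squared deviation, not the raw deviation
            x_y_covariance += dx * (point[1] - avgY);
        }
        x_variance /= n;
        x_y_covariance /= n;

        // Steps 4 and 5: slope is covariance over variance; then the intercept.
        double slope = x_y_covariance / x_variance;
        double yieldAt0 = avgY - slope * avgF;
        // Assumption: the maximum fertilizer level is 100, which matches the sample output.
        double yieldAtMax = yieldAt0 + slope * 100;

        // Assumption: the residual-error formula is missing from the problem text;
        // sqrt(sum of squared residuals / (n - 2)) reproduces the first sample's value.
        // It needs at least 3 data points.
        double sumSquares = 0.0;
        for (int[] point : fert_yield) {
            double residual = point[1] - (yieldAt0 + slope * point[0]);
            sumSquares += residual * residual;
        }
        double residualError = Math.sqrt(sumSquares / (n - 2));

        System.out.println("slope = " + slope);
        System.out.println("yieldAt0 = " + yieldAt0);
        System.out.println("yieldAtMax = " + yieldAtMax);
        System.out.println("residual error = " + residualError);
    }

}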
