You will write a program in C learn that uses a training data set to learn weigh
Question
You will write a program in C, learn, that uses a training data set to learn weights for a set of house attributes, and then applies those weights to a set of input data to calculate prices for those houses. learn takes two arguments, which are the paths to files containing the training data and input data.
Training data format - The first line will be the word “train”. The second line will contain an integer k, giving the number of attributes. The third line will contain an integer n, giving the number of houses. The next n lines will contain k + 1 floating-point numbers, separated by spaces. Each line gives data for a house. The first k numbers give the values x1···xk for that house, and the last number gives its price y.
For example, a file train.txt might contain:
train
4
7
3.000000 1.000000 1180.000000 1955.000000 221900.000000
3.000000 2.250000 2570.000000 1951.000000 538000.000000
2.000000 1.000000 770.000000 1933.000000 180000.000000
4.000000 3.000000 1960.000000 1965.000000 604000.000000
3.000000 2.000000 1680.000000 1987.000000 510000.000000
4.000000 4.500000 5420.000000 2001.000000 1230000.000000
3.000000 2.250000 1715.000000 1995.000000 257500.000000
This file contains data for 7 houses, with 4 attributes and a price for each house. The corresponding matrix X will be 7×5 and Y will be 7×1. (Recall that column 0 of X is all ones.)
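As a rough illustration of how the rows of the file map onto X and Y, the sketch below splits raw training rows into a design matrix with a leading column of ones and a price vector. The function name split_training_rows and the flat row-major layout are assumptions made for this example; they are not part of the assignment.

```c
#include <stdlib.h>

/* Hypothetical helper (not part of the assignment).
 * raw holds n rows of k+1 values exactly as they appear in the training
 * file: k attributes followed by the price.
 * On success, *X_out is an n x (k+1) row-major matrix whose column 0 is
 * all ones, and *Y_out holds the n prices.
 * Returns 0 on success, 1 if allocation fails. */
int split_training_rows(const double *raw, int n, int k,
                        double **X_out, double **Y_out)
{
    int cols = k + 1;
    double *X = malloc((size_t)n * cols * sizeof *X);
    double *Y = malloc((size_t)n * sizeof *Y);
    if (X == NULL || Y == NULL) {
        free(X);
        free(Y);
        return 1;
    }
    for (int i = 0; i < n; i++) {
        X[i * cols] = 1.0;                      /* column 0 pairs with w0 */
        for (int j = 0; j < k; j++)
            X[i * cols + j + 1] = raw[i * cols + j];
        Y[i] = raw[i * cols + k];               /* last value is the price */
    }
    *X_out = X;
    *Y_out = Y;
    return 0;
}
```

For the 7-house example above, calling this with n = 7 and k = 4 would yield a 7×5 X and a 7×1 Y. The caller is responsible for freeing both arrays.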
Input data format - The first line will be the word “data”. The second line will be an integer k, giving the number of attributes. The third line will be an integer m, giving the number of houses. The next m lines will contain k floating-point numbers, separated by spaces. Each line gives data for a house, not including its price.
For example, a file data.txt might contain:
data
4
2
3.000000 2.500000 3560.000000 1965.000000
2.000000 1.000000 1160.000000 1942.000000
This contains data for 2 houses, with 4 attributes for each house. The corresponding matrix X will be 2×5.
Output format - Your program should output the prices computed for each house in the input data using the weights derived from the training data. Each house price will be printed on its own line, rounded to the nearest integer. To print a floating-point number rounded to the nearest integer, use the formatting code %.0f, as in:
printf("%.0f\n", price);
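To make the output step concrete, here is a minimal sketch of computing one price from learned weights and formatting it with %.0f. The names predict_price and format_price are hypothetical, and writing into a buffer with snprintf (rather than calling printf directly) is a choice made here only so the result is easy to inspect.

```c
#include <stdio.h>

/* Hypothetical helper: w holds k+1 weights (w[0] is w0) and x holds the
 * k attributes of one house. Returns w0 + w1*x1 + ... + wk*xk. */
double predict_price(const double *w, const double *x, int k)
{
    double price = w[0];
    for (int j = 0; j < k; j++)
        price += w[j + 1] * x[j];
    return price;
}

/* Hypothetical helper: writes price rounded to an integer, followed by a
 * newline, into buf. Returns the value returned by snprintf. */
int format_price(double price, char *buf, size_t size)
{
    return snprintf(buf, size, "%.0f\n", price);
}
```

With these helpers, a price of 203059.7 would be formatted as the line 203060.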
Usage - Assuming the files train.txt and data.txt exist in the same directory as learn:
$ ./learn train.txt data.txt
737861
203060
Algorithm - Given matrices X and Y, your program will compute W = (X^T X)^{-1} X^T Y in order to learn W. This will require (1) multiplying, (2) transposing, and (3) inverting matrices. Transposing an m×n matrix produces an n×m matrix: each row of X becomes a column of X^T. To find the inverse of X^T X, you will use a simplified form of Gauss-Jordan elimination.
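One possible shape for the inversion step is sketched below. It assumes that "simplified" means no row swaps are performed, so a zero pivot is reported as a failure instead of being handled. The function name invert_gauss_jordan and the flat row-major layout are illustrative choices, and this variant clears the entries above and below each pivot in a single pass, which may differ from the exact procedure the assignment describes.

```c
#include <stdlib.h>

/* Hypothetical sketch of Gauss-Jordan inversion without row swaps.
 * a   : n x n input matrix, row-major, left unmodified
 * inv : n x n output buffer, row-major, receives the inverse
 * Returns 0 on success, 1 if allocation fails or a pivot is exactly zero. */
int invert_gauss_jordan(const double *a, double *inv, int n)
{
    double *m = malloc((size_t)n * n * sizeof *m);
    if (m == NULL)
        return 1;

    /* Work on a copy of a; start inv as the identity matrix. */
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++) {
            m[i * n + j] = a[i * n + j];
            inv[i * n + j] = (i == j) ? 1.0 : 0.0;
        }

    for (int p = 0; p < n; p++) {
        double pivot = m[p * n + p];
        if (pivot == 0.0) {          /* would need a row swap: give up */
            free(m);
            return 1;
        }
        /* Scale the pivot row so the pivot becomes 1. */
        for (int j = 0; j < n; j++) {
            m[p * n + j] /= pivot;
            inv[p * n + j] /= pivot;
        }
        /* Subtract multiples of the pivot row from every other row. */
        for (int r = 0; r < n; r++) {
            if (r == p)
                continue;
            double f = m[r * n + p];
            for (int j = 0; j < n; j++) {
                m[r * n + j] -= f * m[p * n + j];
                inv[r * n + j] -= f * inv[p * n + j];
            }
        }
    }
    free(m);
    return 0;
}
```

Every row operation applied to the working copy is mirrored on the identity matrix, so once the copy has been reduced to the identity, inv holds the inverse.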
For example, if the training data includes n houses and has k attributes, this data can be represented as an n×(k + 1) matrix X, where each row corresponds to a house and each column corresponds to an attribute.
Note that the first column contains 1 for all rows: this corresponds to the weight w0.
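Written out, with x_{i,j} denoting attribute j of house i (an indexing convention assumed here for illustration), the matrices and the resulting price model take roughly this form:

```latex
X = \begin{bmatrix}
1 & x_{1,1} & \cdots & x_{1,k} \\
1 & x_{2,1} & \cdots & x_{2,k} \\
\vdots & \vdots & \ddots & \vdots \\
1 & x_{n,1} & \cdots & x_{n,k}
\end{bmatrix},
\qquad
Y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix},
\qquad
W = \begin{bmatrix} w_0 \\ w_1 \\ \vdots \\ w_k \end{bmatrix}
  = (X^T X)^{-1} X^T Y

y = w_0 + w_1 x_1 + \cdots + w_k x_k
```

Because column 0 of X is all ones, w_0 is added to every price unchanged, while each remaining weight w_j scales attribute x_j.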
Implementation notes - You MUST use double to represent the attributes, weights, and prices. Using float may result in incorrect results due to rounding. To read double values from the training and input data files, you can use fscanf with the format code %lf. If learn successfully completes, it MUST return exit code 0. You MAY assume that the training and input data files are correctly formatted. You MAY assume that the first argument is a training data file and that the second argument is an input data file. However, checking that the training data file begins with “train” and that the input data file begins with “data” may be helpful if you accidentally give the wrong arguments to learn while you are testing it. To read a string containing up to 5 non-space characters, you can use the fscanf format code %5s. learn SHOULD check that the training and input data files specify the same value for k. If the training or input files do not exist, are not readable, are incorrectly formatted, or specify different values of k, learn MAY print “error” and return exit code 1. Your code will not be tested with these scenarios.
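The header checks described in these notes could look something like the following sketch. read_header is a hypothetical helper name; the only library calls it relies on are fscanf with %5s and %d, and strcmp.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: reads the three header lines of a training or input
 * file. expected is "train" or "data". On success stores the attribute
 * count in *k and the house count in *count and returns 0; returns 1 if
 * the label does not match or either integer cannot be read. */
int read_header(FILE *fp, const char *expected, int *k, int *count)
{
    char label[6] = {0};   /* up to 5 characters plus the terminator */

    if (fscanf(fp, "%5s", label) != 1)
        return 1;
    if (strcmp(label, expected) != 0)
        return 1;
    if (fscanf(fp, "%d", k) != 1 || fscanf(fp, "%d", count) != 1)
        return 1;
    return 0;
}
```

A main function built on this sketch would call it once per file, compare the two k values, and print "error" and return 1 if either call fails or the values differ.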
Explanation / Answer
learn.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
double ** transpose(double ** m,int rows,int columns);
double ** multiply(double ** m1,double ** m2,int rows1,int rows2,int columns);
double * vmultiply(double ** m,double * v,int rows,int columns);
double ** inverse(double ** m,int size);
int main(int argc,char ** argv) {
    FILE * t = NULL; //training data file
    FILE * d = NULL; //input data file
    char label[20] = {0}; //label at the start of each file (train/data)
    int columns,rows; //k+1 and n for the training data
    int dcolumns,drows; //k+1 and m for the input data
    double ** temp; //raw training rows: k attributes followed by the price
    double ** train; //training matrix X with a leading column of 1s
    double ** data; //input matrix with a leading column of 1s
    double * prices; //house prices vector Y
    double * weights; //weight vector W
    double ** tr; //transpose of the training matrix
    double ** in; //inverse of (transpose x original)
    double ** r1; //intermediate matrix products
    double ** r2;
    double * r3; //predicted prices
    int i,j;
    if (argc != 3) {
        printf("error\n");
        return 1;
    }
    //open the files for reading
    t = fopen(argv[1],"r");
    d = fopen(argv[2],"r");
    //a file can't be opened
    if (t == NULL || d == NULL) {
        printf("error\n");
        return 1;
    }
    //the first argument must be the training file
    if (fscanf(t,"%19s",label) != 1 || strcmp(label,"train") != 0) {
        printf("error\n");
        return 1;
    }
    //read k and n from the training file
    if (fscanf(t,"%d",&columns) != 1 || fscanf(t,"%d",&rows) != 1) {
        printf("error\n");
        return 1;
    }
    columns++; //k attributes plus the leading column of 1s
    //allocate memory for the training matrix and the raw rows
    train = (double **) malloc(rows * sizeof(double *));
    temp = (double **) malloc(rows * sizeof(double *));
    //allocate memory for the prices vector
    prices = (double *) malloc(rows * sizeof(double));
    for (i = 0; i < rows; i++) {
        train[i] = (double *) malloc(columns * sizeof(double));
        temp[i] = (double *) malloc(columns * sizeof(double));
        train[i][0] = 1; //make the first column all 1s
    }
    //read each training row: k attributes followed by the price
    for (i = 0; i < rows; i++) {
        for (j = 0; j < columns; j++) {
            fscanf(t,"%lf",&temp[i][j]);
        }
    }
    //the k attributes in temp go to columns 1..k of train
    for (i = 0; i < rows; i++) {
        for (j = 1; j < columns; j++) {
            train[i][j] = temp[i][j-1];
        }
    }
    //last column of temp goes to the prices vector
    for (i = 0; i < rows; i++) {
        prices[i] = temp[i][columns-1];
    }
    //W = (X^T X)^-1 X^T Y
    tr = transpose(train,rows,columns); //(k+1) x n
    r1 = multiply(tr,train,columns,rows,columns); //(k+1) x (k+1)
    in = inverse(r1,columns); //(k+1) x (k+1)
    r2 = multiply(in,tr,columns,columns,rows); //(k+1) x n
    weights = vmultiply(r2,prices,columns,rows); //k+1 weights
    //the second argument must be the input data file
    if (fscanf(d,"%19s",label) != 1 || strcmp(label,"data") != 0) {
        printf("error\n");
        return 1;
    }
    if (fscanf(d,"%d",&dcolumns) != 1 || fscanf(d,"%d",&drows) != 1) {
        printf("error\n");
        return 1;
    }
    dcolumns++; //there will be k+1 columns
    //both files must specify the same value of k
    if (dcolumns != columns) {
        printf("error\n");
        return 1;
    }
    //allocate the memory for the input matrix
    data = (double **) malloc(drows * sizeof(double *));
    for (i = 0; i < drows; i++) {
        data[i] = (double *) malloc(dcolumns * sizeof(double));
        data[i][0] = 1; //make the first column all 1s
    }
    for (i = 0; i < drows; i++) {
        for (j = 1; j < dcolumns; j++) {
            fscanf(d,"%lf",&data[i][j]);
        }
    }
    //apply the weights and print one price per line
    r3 = vmultiply(data,weights,drows,dcolumns);
    for (i = 0; i < drows; i++) {
        printf("%.0f\n",r3[i]);
    }
    fclose(t);
    fclose(d);
    return 0;
}
//returns a new columns x rows matrix holding the transpose of m (rows x columns)
double ** transpose(double ** m,int rows,int columns) {
    //allocate memory for transpose matrix
    double ** t = (double **) malloc(columns * sizeof(double *));
    int i,j;
    for (i = 0; i < columns; i++) {
        t[i] = (double *) malloc(rows * sizeof(double));
    }
    for (i = 0; i < columns; i++) {
        for (j = 0; j < rows; j++) {
            t[i][j] = m[j][i];
        }
    }
    return t;
}
//rows1 = rows of m1 and of the result; rows2 = rows of m2 (the shared inner dimension);
//columns = columns of m2 and of the result
double ** multiply(double ** m1,double ** m2,int rows1,int rows2,int columns) {
    double ** t = (double **) malloc(rows1 * sizeof(double *));
    int i,j,k;
    for (i = 0; i < rows1; i++) {
        t[i] = (double *) malloc(columns * sizeof(double));
        //start every element at 0 so the sums below accumulate correctly
        for (j = 0; j < columns; j++) {
            t[i][j] = 0;
        }
    }
    for (i = 0; i < rows1; i++) {
        for (j = 0; j < columns; j++) {
            for (k = 0; k < rows2; k++) {
                t[i][j] += m1[i][k] * m2[k][j];
            }
        }
    }
    return t;
}
//multiply a rows x columns matrix by a vector of length columns
double * vmultiply(double ** m,double * v,int rows,int columns) {
    double * t = (double *) malloc(rows * sizeof(double));
    //make sure every element is preset to 0
    int i,j;
    for (i = 0; i < rows; i++) {
        t[i] = 0;
    }
    for (i = 0; i < rows; i++) {
        for (j = 0; j < columns; j++) {
            t[i] += m[i][j] * v[j];
        }
    }
    return t;
}
//since the matrix is square, only one int parameter is needed; m is overwritten
double ** inverse(double ** m,int size) {
    //start from the identity matrix; applying the same row operations
    //to it as to m turns it into the inverse
    double ** id = (double **) malloc(size * sizeof(double *));
    int i,j,k,l;
    for (i = 0; i < size; i++) {
        id[i] = (double *) malloc(size * sizeof(double));
        for (j = 0; j < size; j++) {
            if (i == j) {
                id[i][j] = 1;
            } else {
                id[i][j] = 0;
            }
        }
    }
    //forward pass: scale each pivot to 1, then clear the entries below it
    for (i = 0; i < size; i++) {
        //reciprocal of the pivot (no row swaps are done, so the pivot is assumed nonzero)
        double rec = 1/m[i][i];
        for (j = 0; j < size; j++) {
            m[i][j] *= rec;
            id[i][j] *= rec; //make the same adjustment to the identity matrix
        }
        //check the lower triangle of the matrix for non 0s
        for (k = i+1; k < size; k++) {
            double f = -m[k][i]; //multiple of the pivot row to add to row k
            if (f != 0) {
                for (l = 0; l < size; l++) {
                    m[k][l] += (f*m[i][l]);
                    id[k][l] += (f*id[i][l]);
                }
            }
        }
    }
    //backward pass from size-1 to 0: the pivots are already 1,
    //so only the entries above each pivot need to be cleared
    for (i = size-1; i >= 0; i--) {
        for (k = i-1; k >= 0; k--) {
            double f = -m[k][i];
            if (f != 0) {
                for (l = 0; l < size; l++) {
                    m[k][l] += (f*m[i][l]);
                    id[k][l] += (f*id[i][l]);
                }
            }
        }
    }
    return id;
}