Academic Integrity: tutoring, explanations, and feedback — we don’t complete graded work or submit on a student’s behalf.

C programming assignment: Write this in C only. You can also email files to star

ID: 3887362 • Letter: C

Question

C programming assignment: Write this in C only.

You can also email files to starship21[at]g mail[dot]com

Abstract
This is the first in a series of projects that will involve sorting a large amount of data. In this first phase, you will write a simple C program to sort a list of records of movies from imdb alphabetically by the data in a given column. All projects will be done in groups. Group signups will go out later this week. One submission per group.


Introduction
The input file will be a CSV or comma-separated value file, which is a simple way to represent a database table. In a CSV file each line is one record (or row). Commas are separators that denote different types of values per record (columns). Very often the first record (row) of a CSV is a list of value types (column headings):

food,calories,fat
soup,200,12
pie,500,25
celery,-10,0

Your code will be given a CSV as well as a value type (column heading) to sort on. Your code will then need to read in the CSV, sort it on the given value type and output the sorted version of the CSV.

Sorting the CSV above on the 'food' value type:

food,calories,fat
celery,-10,0
pie,500,25
soup,200,12

Sorting the CSV above on the 'calories' value type:

food,calories,fat
celery,-10,0
soup,200,12
pie,500,25



Methodology
a. Parameters
Your code will read the records through (STDIN). Remember, the first record (line) is the column headings and should not be sorted as data. Your code must take in a command-line parameter to determine which value type (column) to sort on. If that parameter is not present (?-> throw an error, or default behavior). The first argument to your program will be '-c' to indicate sorting by column and the second will be the column name:

./sorter -c food

              Be sure to check the arguments are there and that they correspond to a listed value type (column heading) in the CSV.

             
b. Operation
Your code will be reading the CSV to be sorted from STDIN. In order to run your code to test it, you will need to open the CSV and read it in to the STDIN for your code:

cat input.file | ./sorter -c movie_title      

The line above, if entered on the terminal, will open the file "input.file" and read it in to some executing code named "sorter", which was invoked with the parameters "-c" and "movie_title".

Your code's output will be a new CSV file outputted to STDOUT. You should output each record line by line using printf.

For testing purposes you can redirect STDOUT to a file:

cat input.file | ./sorter -c movie_title      > sortedmovies.csv


c. Structure
Your code should use Mergesort to do the actual sorting of records. It is a powerful algorithm with an excellent average case.

It is strongly suggested that you use structs internally to represent each record in the CSV coming in as input.

In order to help you do both of these, please use a user-defined header file named sorter.h. We have attached one to the assignment description with some simple comments in it.
             
If you write any other definitions, typedefs, structs, unions or large helper functions (>~25 lines), be sure to put them in your header file and document them.


Results:
Submit your "sorter.c", "sorter.h" and "mergesort.c" as well as any other source files your header file references.

Document your design, assumptions, difficulties you had and testing procedure. Include any test CSV files you used in your documentation. Be sure to also include a short description of how to use your code. Look at the man pages for suggestions on format and content. Do not neglect your header file. Be sure to describe the contents of it and why you needed them.

_________________________________________

sorter.h

Explanation / Answer

mergesort.c

#include "Sorter.h"
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

FILE *file;

void mergeSortBegin(LL* dlist, char* field){ //takes in a data struct array and a field to sort by
if((dlist->head == NULL) || (dlist->head == dlist->tail)){ //if no nodes or one nodes, already solved
    printf("There are no entries in the csv, cannot sort.");
   return;
}
int n = 0;
int found = 0;
for(n = 0;n < dlist->numfields; n++){ //determines the field to be sorted
   if(strcmp(field,dlist->fields[n]) == 0){ //note: 0 = string, 1 = int, 2 = float
    dlist->sortingfield = n; //determines the field to sort by
    found = 1;
   }
}
if(found == 0){
   printf("Field not found. Please sort by one of the fields in the csv file. ");
}

dlist->sortingtype = dlist->types[dlist->sortingfield];
dlist->head = mergeSort(dlist,dlist->head);

}

Node* mergeSort(LL* dlist, Node* head){//note: String's mergesort
if((head == NULL) || (head->next == NULL)){
   return head;
}
Node* temp = head;
int n = 0;
while(temp != NULL){
   if(strcmp(temp->ndata.fielddata[11], "xXx ") ==0){
//   printf("%s, %d", temp->ndata.fielddata[11],n);

}
temp = temp->next;
n++;
}

Node* mid = split(head);

head = mergeSort(dlist, head);
mid = mergeSort(dlist, mid);

head = merge(dlist,head,mid);

return head;

}

Node* split(Node* head){ //given a head node, returns a pointer to the middle node
if((head == NULL) || (head->next == NULL)){
   return NULL;
}
Node* temp = head->next;
Node* prev = head;
while(temp != NULL){
   temp = temp->next;
   if(temp != NULL){
    prev = prev->next;
    temp = temp->next;
}
}
Node* midpt = prev->next;
prev->next = NULL;
return midpt;
}


Node* merge(LL* dlist, Node* left, Node* right){
if( left == NULL){
   return right;
}
else if(right == NULL){
   return left;
}
if(dlist->sortingtype == 0){ //if sorting a string
  
    //alphabetic sorting method
    /*
    char* leftstr = strdup(left->ndata.fielddata[dlist->sortingfield]);
    int n = 0;
    for(n = 0;n < strlen(leftstr);n++){
     leftstr[n] = tolower(leftstr[n]);
    }
    char* rightstr = strdup(right->ndata.fielddata[dlist->sortingfield]);
  
    for(n = 0;n < strlen(rightstr);n++){
     rightstr[n] = tolower(rightstr[n]);
    }
    */
  
    char* leftstr = left->ndata.fielddata[dlist->sortingfield];
    char* rightstr = right->ndata.fielddata[dlist->sortingfield];
     
    int cmp = strcmp(leftstr,rightstr);
    Node* final = NULL;
    if(cmp <= 0){
     final = left;
     final->next = merge(dlist,left->next,right);
    }
    else{
      final = right;
      final->next = merge(dlist,left,right->next);
    }
    return final;
}

}

Sorter.c

#include "Sorter.h"
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "mergesort.c"

FILE *file;


void printData(LL* dlist, data cdata){ //cmov = currentdata, prints the contents of a data struct
     int n=0;
     for(n = 0;n < dlist->numfields;n++){
       printf("%s: %s ",dlist->fields[n], cdata.fielddata[n]);
     }
}

void printTypes(LL* dlist){
int n = 0;
for(n = 0; n < dlist->numfields; n++){
   printf("%s type:", dlist->fields[n]);
   if(dlist->types[n] == 0){
   printf("String ");
   }
   else if(dlist->types[n] == 1){
    printf("Long ");
   }
   else{
    printf("Float ");
   }
}
}


void addTail(LL* dlist, data cdata){ //takes a linked list and a data, adds the data to a node and puts it at the end of the linked list
   Node* datanode = (Node*) malloc(sizeof(Node)); //allocate space for a node
   datanode->next = NULL;
   datanode->ndata = cdata; //set the new node's ndata to cdata
if((dlist->head == NULL) && (dlist->tail == NULL)){ //empty list case
   dlist->head = datanode;
   dlist->tail = datanode;
   dlist->count++;
}
else if((dlist->head == dlist->tail)){ //single node case
   dlist->tail = datanode;
   dlist->head->next = dlist->tail;
   dlist->count++;
}
else{// every other case
   dlist->tail->next = datanode;
   dlist->tail = datanode;
   dlist->count++;
}
}

void Finish(LL* dlist){ //frees everything in a LL
Node* temp = dlist->head;
int n = 0;
while(temp!= NULL){
   char** ftemp = temp->ndata.fielddata;
   for(n = 0;n < dlist->numfields;n++){
    free(ftemp[n]);
   }
   free(ftemp);
Node* oldtemp = temp;
temp = temp->next;
free(oldtemp);
}
for(n = 0;n < dlist->numfields;n++){
   free(dlist->fields[n]);
}

free(dlist->fields);
free(dlist->types);
free(dlist);
}

void initializeList(LL* dlist, char* fields){ //initializes the numfields, types, and fields in the linked list
char* temp = strdup(fields);
char* currvar = (char*) strtok(temp, ",");
int totalfields = 0; //counts total fields, determined by the number of tokens found
while(currvar != 0){ //discovers total number of fields
    totalfields++;
    currvar = (char*) strtok(NULL,",");
  
    }
dlist->numfields = totalfields;
dlist->types = (int*) malloc(sizeof(int)*dlist->numfields); //allocates fields for keeping track of what type each field is, to be used once all data is received
dlist->fields = (char**) malloc(sizeof(char*)*dlist->numfields); //keeps track of the name of the fields, to be used later when sorting
free(temp);
temp = strdup(fields);
currvar = (char*) strtok(temp, ",");
    int n;
    for(n = 0;n < dlist->numfields;n++){
      if(n == dlist->numfields-1){ //strips the newline character for the last field
   int specchar = strcspn(currvar, " "); //find the location of the first or the first
   if(specchar != 0){ //if the location isn't the start
   currvar[specchar] = ''; //remove the special characters
   }
      }
     dlist->fields[n] = strdup(currvar);
     currvar = (char*) strtok(NULL,",");
    }
free(temp);
free(currvar);
}

void initializeListTypes(LL* dlist){ //determines what type of data belongs to what field
if(dlist->head == NULL){
return;
}
Node* temp = dlist->head;
int n = 0;
for(n;n < dlist->numfields;n++){
   char* tempstr = temp->ndata.fielddata[n];
   char* endptr;
   long lg = 0;
   float fl = 0.0;
   if(n == dlist->numfields-1){ //if this field is the last field on the line, remove and
       int specchar = strcspn(tempstr, " "); //find the location of the first or the first
   if(specchar != 0){ //if the location isn't the start
   tempstr[specchar] = ''; //remove the special characters
   }
   }
   if(strchr(tempstr, '.')){ //if it has a '.', it's a string or a float/double
       fl = strtof(tempstr, &endptr);
       if((endptr == tempstr) || (endptr != (tempstr+strlen(tempstr)))){ //if endptr == tempstr, no floats at the start, the other conditional checks to make sure that the end of the read in float is the end of the string inputted
   dlist->types[n] = 0;
       }
       else{
   dlist->types[n] = 2;
       }
   }
   else{ //if it has no '.', it's either an int or a string
     lg = strtol(tempstr, &endptr, 10);
     if((endptr == tempstr) || (endptr != (tempstr+strlen(tempstr)))){ //if endptr == tempstr, no floats at the start, the other conditional checks to make sure that the end of the read in float is the end of the string inputted
   dlist->types[n] = 0;
       }
       else{
   dlist->types[n] = 1;
       }
   
}
}

}


void export(LL* dlist){
int n = 0;
for(n = 0;n < dlist->numfields;n++){
   if(n != dlist->numfields-1){
   printf("%s,",dlist->fields[n]);
   }
   else{
    printf("%s ",dlist->fields[n]);
   }
}
Node* temp = dlist->head;
while(temp != NULL){
   if(temp->ndata.comma == NULL){ //if there's no commas in the fields

    for(n = 0;n < dlist->numfields;n++){
      if(n != dlist->numfields-1){
   printf("%s,",temp->ndata.fielddata[n]);
   }
      else{
   printf("%s ",temp->ndata.fielddata[n]);
   }
      }
   }
   else{
   
    for(n = 0;n < dlist->numfields;n++){
      if(temp->ndata.comma[n] == 0){
    
      if(n != dlist->numfields-1){
   printf("%s,",temp->ndata.fielddata[n]);
   }
      else{
   printf("%s ",temp->ndata.fielddata[n]);
   }
      }
      else{
   if(n != dlist->numfields-1){
   printf(""%s",",temp->ndata.fielddata[n]);
   }
      else{
   printf(""%s" ",temp->ndata.fielddata[n]);
   }
  
      }
      }
   
   
   }


temp = temp->next;
}
}

int main(int argc, char *argv[]){
//Constructs an array containing all of the data data into cdata
file = stdin;
LL* dlist = (LL*) malloc(sizeof(LL)); //malloc a data linked list to which data nodes will be added
char* test = (char*) malloc(sizeof(char)*1000); //for getting data from the file
memset(test, 0 , sizeof(char)*1000);
//i should make sure there's no problem when there's more than 1000 chars
fgets(test,1000,file); //skips the column line
initializeList(dlist, test);

char c;
int currentvar=0;
int count = 0;
int place = 0;
int quote = 0;
data cdatanode = {};
cdatanode.fielddata = (char**) malloc(sizeof(char*)*dlist->numfields);
cdatanode.comma = NULL;
memset(test, 0 , sizeof(char)*1000);
while((c = fgetc(file)) != EOF){
    /*
    if(c < 0 || c > 128){
      continue;
    }
    */
    if((c == ',' || c == ' ') && (quote == 0)){
      cdatanode.fielddata[currentvar] = strdup(test); //copy test to cdatanode STRDUP IS MALLOC, REMEMBER TO FREE
      if(c == ' '){
      addTail(dlist, cdatanode);
      count++;
      currentvar = 0;
      place = 0;
      memset(test, 0 , sizeof(char)*1000);
      cdatanode.fielddata = (char**) malloc(sizeof(char*)*dlist->numfields); //get new char array for cdatanode
      cdatanode.comma = NULL;
    }
    else{
      currentvar++;
      place = 0;
      memset(test, 0 , sizeof(char)*1000);
    }
    
    }
    else if(c == '"'){
      if(quote == 0){
   //insert comma handling here, somehow
   cdatanode.comma = (int*) malloc(sizeof(int)*dlist->numfields);
   memset(cdatanode.comma,0,sizeof(sizeof(int)*dlist->numfields));
   cdatanode.comma[currentvar] = 1;
   quote = 1;
      }
      else{
   quote = 0;
      }
    }
    else{
      test[place] = c;
      place++;
    }
}
free(cdatanode.fielddata);
free(test);


//printData(dlist,dlist->head->ndata);
//printData(dlist,dlist->head->next->ndata);
//printData(dlist,dlist->tail->ndata);
initializeListTypes(dlist);
//printTypes(dlist);

//mergeSort(dlist, dlist->fields[25]);

int n = 0;
Node* temp = dlist->head;


if(argc >= 3){
    if(strcmp("-c", argv[1]) == 0){
mergeSortBegin(dlist, argv[2]);
    }
}

export(dlist); //exports the dlist into csv
// printData(dlist, split(dlist,dlist->head)->ndata);
// Finish(dlist); //frees everything in dlist
}

Sorter.h


//Suggestion: define a struct that mirrors a record (row) of the data set
#ifndef _SORTER_H_
#define _SORTER_H_

typedef struct _data { //THIS DATA STRUCT IS CURRENTLY BAD, PUT DATA IN NODE
char** fielddata; //2d array to contain string types
int* comma; //does the node have a comma in the field n? 0 = no, 1 = yes
//might want to make the data types a linked list also?
//or make a LL to keep track when creating, then place into array??
} data;

typedef struct _Node {
data ndata; //data in a node

//char** data; use this instead of a data struct
struct _Node* next;
}Node;

typedef struct _LL { //linked list that contains the nodes with the movies
Node* head;
Node* tail;
int count; //count of total nodes in list
int numfields; //count of the total fields in the csv
int sortingfield; //keeps track of the field to be sorted, where sortingfield = # field sorted
int sortingtype; //keeps track of what type is being sorted, 0 = string, 1 = int, 2 = float
int* types; //keeps track of types, 0 = string, 1 = int, 2 = float
char** fields; //keeps track of the name fields
}LL;

//Suggestion: prototype a mergesort function
LL* mergesortBegin(LL* dlist, char* field);
Node* mergeSort(LL* dlist, Node* head);
Node* split(Node* head);
Node* merge(LL* dlist, Node* left, Node* right);
#endif

main.c

#include "Sorter.h"
#include <stdio.h>
#include <stdlib.h>
#include <string.h>


int main(int argc, char* argv[]){
char tempstr[] = "12";
char* endptr;
float fl = 0.0;
long in = 0;
     if(strchr(tempstr, '.')){ //if it has a '.', it's a string or a float/double
       fl = strtof(tempstr, &endptr);
       if((endptr == tempstr) || (endptr != (tempstr+strlen(tempstr)))){ //if endptr == tempstr, no floats at the start, the other conditional checks to make sure that the end of the read in float is the end of the string inputted
   printf("no float ");
       }
       else{
   printf("%f ", fl);
       }
  
     //use strtof
   }
   else{ //if it has no '.', it's either an int or a string
     in = strtol(tempstr, &endptr, 10);
     if((endptr == tempstr) || (endptr != (tempstr+strlen(tempstr)))){ //if endptr == tempstr, no floats at the start, the other conditional checks to make sure that the end of the read in float is the end of the string inputted
   printf("no int ");
       }
       else{
   printf("%ld ", in);
       }
   
   
   }

}