

ID: 3864581 • Letter: T

Question

This is in C.

Consider this file of text:

The Mason–Dixon Line (or Mason and Dixon's Line)
was surveyed between
1763 and 1767 by Charles Mason and Jeremiah Dixon
in the resolution
of a border dispute between British colonies in
Colonial America. It forms
a demarcation line among four U.S. states, forming part
of the borders of
Pennsylvania, Maryland, Delaware, and West Virginia
(then part of Virginia). In popular usage, especially
since the Missouri Compromise of 1820 (apparently the
first official use of the term "Mason's and Dixon's Line"),
the Mason–Dixon Line symbolizes a cultural boundary between
the Northeastern United States and the Southern United
States (Dixie).

The job of part one of the word processor is to read text from a file, reformat the text to make it look nice, and place the new text in another file. The output from the above file might look like this:

The Mason–Dixon Line (or Mason and Dixon's Line) was surveyed between 1763 and 1767 by Charles Mason and Jeremiah Dixon in the resolution of a border dispute between British colonies in Colonial America. It forms a demarcation line among four U.S. states, forming part of the borders of Pennsylvania, Maryland, Delaware, and West Virginia (then part of Virginia). In popular usage, especially since the Missouri Compromise of 1820 (apparently the first official use of the term "Mason's and Dixon's Line"), the Mason–Dixon Line symbolizes a cultural boundary between the Northeastern United States and the Southern United States (Dixie).

Notice that the output does not have to be “justified” on the right. The idea is that we keep each line under a certain number of characters, and move words as necessary to ensure that this requirement is met. Note that there is NO space after the last word on each line.

The program runs based on arguments passed on the command line. For example:

      $ ./wordproc 60 data

The first command line argument (60) indicates that all of the output text can be no more than 60 characters long per line (including the spaces, not including new line). You can assume that this is a “reasonable” number and that we will not need to worry about a wild value such as 10. Let’s stipulate that the number is >= 25 and <= 100.

The second command line argument is the name of the file with the input text. You are to open this file and use the data that is in this file, but create a new output file with the input name and “.out”. So the above invocation would read “data” and produce “data.out”.
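One way to build the output names is to format them into a fixed buffer with snprintf and reject any name that would not fit. This is only a sketch: the helper name make_name and the idea of returning -1 on overflow are assumptions for illustration, not part of the assignment.

```c
#include <stdio.h>

/* Hypothetical helper: writes "<input><suffix>" into buf.
 * Returns 0 on success, -1 if the result would not fit in buf. */
int make_name(char *buf, size_t bufsize, const char *input, const char *suffix)
{
    /* snprintf never writes past bufsize and reports the length it wanted */
    int n = snprintf(buf, bufsize, "%s%s", input, suffix);

    return (n < 0 || (size_t)n >= bufsize) ? -1 : 0;
}
```

A caller would use it as make_name(name, sizeof name, argv[2], ".out") and treat a return value of -1 as an error worth reporting.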

The program should validate that there are two input parameters and that the first is a number within the range mentioned above. The program should provide a reasonable error message, and terminate, if the input file does not exist.
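The range check on the first argument could be sketched with strtol, which, unlike atoi, makes it possible to tell "not a number" apart from a real value. The helper name parse_limit is an assumption for illustration; the bounds 25 and 100 come from the text above.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper: converts text to an int in the range 25..100.
 * Returns 0 and stores the value on success, -1 on any invalid input. */
int parse_limit(const char *text, int *limit)
{
    char *end;
    long value;

    errno = 0;
    value = strtol(text, &end, 10);
    if (end == text || *end != '\0')    /* no digits at all, or trailing junk */
        return -1;
    if (errno == ERANGE || value < 25 || value > 100)
        return -1;
    *limit = (int)value;
    return 0;
}
```

In main, a failing parse_limit(argv[1], &limit) would be followed by an error message and a non-zero exit.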

The general algorithm is something like this:

      Initialize the output line to nothing
      While ( read a line of text from the input file )
      /* you can either fgets it and use strtok or similar, */
      /* or use fscanf and %s for the format.               */
      {
            For each word
                  Find the length of the word
                  If ( that length + length of the output line < limit )
                        Append the word to the output line
                  Else
                        Write out the output line
                        Copy the word into the output line
      }
      Print out anything remaining in output line
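The per-word step of the algorithm above might be sketched as follows. The helper name add_word is an assumption, as are two preconditions: the line buffer holds at least limit + 1 bytes, and no single word is longer than limit. The comparison counts the separating space explicitly, and the first word on a line is copied without a leading space.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: adds one word to the pending line, first writing
 * the line to out when the word would push it past limit characters.
 * Assumes line has room for limit + 1 bytes and strlen(word) <= limit. */
void add_word(FILE *out, char *line, int limit, const char *word)
{
    size_t used = strlen(line);
    size_t need = strlen(word);

    if (used == 0) {
        strcpy(line, word);                  /* first word: no leading space */
    } else if (used + 1 + need <= (size_t)limit) {
        strcat(line, " ");                   /* word fits after one space */
        strcat(line, word);
    } else {
        fprintf(out, "%s\n", line);          /* line is full: write it out */
        strcpy(line, word);                  /* word starts the next line */
    }
}
```

After the last word has been read, whatever is still in line is written out once more, which corresponds to the final "Print out anything remaining" step.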

     

In addition, the program will produce a second file; using the above example, the filename would be “data.words”. This file will contain a sorted list of all words which were encountered in the file. A part of the file for the above input might be as below. Also, you do not have to account for special cases, so “(apparently” and “apparently” would not be counted as the same word, and that is OK.

(apparently - 1

1763 - 1

1767 - 1

1820 - 1

America. - 1

a - 3

among - 1

and - 6

between - 3

and so on. Notice that the number of occurrences of each word is printed after the word. Also notice that we have a very simplistic notion of what a “word” is, and that these may include the punctuation. That’s fine.

In order to perform this function we want you to create an array of pointers, and to expand this array as needed using the “realloc” function. Shown below is an example of how this works (using some built-in words instead of words that are read from the file). It’ll give you the general idea of how to use “realloc”:

You may use the below code and make changes or use your own.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main()
{
    char **wordlist = malloc(sizeof(char*)); /* pointer to pointer to words */
    int i, numwords = 0;
    char *testwords[] = { "hello", "there", "folks", "in", "1840" };

    /* The first word goes in the one slot that malloc already gave us */
    wordlist[ numwords ] = strdup( testwords[ 0 ] );
    numwords++;

    /* This will show how you can add words to the array of pointers */
    for( i = 1; i < (int)( sizeof( testwords ) / sizeof( testwords[ 0 ] ) ); i++ )
    {
        /* Expand our array of pointers by one pointer's worth */
        wordlist = realloc( wordlist, (numwords + 1) * sizeof( char * ) );
        /* Make a duplicate of the word and save the pointer to it */
        wordlist[ numwords ] = strdup( testwords[ i ] );
        numwords++;
    }

    printf( "Added %d words to the array and they are: ", numwords );
    for( i = 0; i < numwords; i++ )
        printf( "%s ", wordlist[ i ] );
    return( 0 );
}

The variable “wordlist” needs some explanation. Remember that pointers and arrays are nearly the same thing. We are allocating an array of pointers to strings, but since this is an array that we’re allocating, it is a “pointer to a bunch of pointers to character”.
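A variation on growing the array is to double its capacity instead of adding one slot per word, and to keep the result of realloc in a temporary pointer so the old block is not lost if the call fails. The struct and function names below are assumptions for illustration. The sketch copies each word with malloc and strcpy, which does the same job as strdup.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical container for the word list. */
struct wordlist {
    char **items;      /* array of pointers to duplicated words */
    size_t count;      /* slots in use */
    size_t capacity;   /* slots allocated */
};

/* Appends a copy of word. Returns 0 on success, -1 if memory runs out;
 * on failure the list is unchanged and still valid. */
int wordlist_add(struct wordlist *list, const char *word)
{
    char *copy;

    if (list->count == list->capacity) {
        size_t newcap = list->capacity ? list->capacity * 2 : 8;
        char **grown = realloc(list->items, newcap * sizeof *grown);

        if (grown == NULL)
            return -1;             /* list->items still points at the old block */
        list->items = grown;
        list->capacity = newcap;
    }
    copy = malloc(strlen(word) + 1);
    if (copy == NULL)
        return -1;
    strcpy(copy, word);
    list->items[list->count++] = copy;
    return 0;
}
```

A list starts out as struct wordlist list = { NULL, 0, 0 }; because realloc with a NULL pointer behaves like malloc, no separate initial allocation is needed.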

Use the “qsort” function to sort the array. You will need to supply a comparison function for this. I’ll let you investigate.
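As a starting point for that investigation, here is a sketch of a comparison function for an array of char * together with one possible way to count repeated words once the array is sorted. The names compare_words and write_counts are assumptions, and the "word - count" layout follows the sample listing above.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* qsort hands the comparison function pointers to two array elements.
 * Each element is a char *, so the arguments are really pointers to char *. */
int compare_words(const void *a, const void *b)
{
    const char *const *left = a;
    const char *const *right = b;

    return strcmp(*left, *right);
}

/* Hypothetical helper: sorts words in place, then writes one
 * "word - count" line per distinct word. */
void write_counts(FILE *out, char **words, size_t n)
{
    size_t i, run = 1;

    qsort(words, n, sizeof *words, compare_words);
    for (i = 1; i <= n; i++) {
        if (i < n && strcmp(words[i], words[i - 1]) == 0) {
            run++;                                   /* same word again */
        } else {
            fprintf(out, "%s - %zu\n", words[i - 1], run);
            run = 1;                                 /* next distinct word */
        }
    }
}
```

Because identical words sit next to each other after sorting, a single pass that compares each word with its predecessor is enough to produce the counts.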

Explanation / Answer

Here is the code with the required comments for the above question. The output files for the sample input are also shown.

/*Program to reformat an input file into an output file in which
* no line exceeds the specified character limit, and to write a
* sorted list of the words with their counts to a second file
*/

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

//function used by qsort for sorting the list of words
//qsort requires the signature int (*)(const void *, const void *)
int compare(const void *s1,const void *s2)
{
   char * const *a = s1; //each argument points at one char* element of the array
   char * const *b = s2;

   return strcmp(*a, *b); //use strcmp for string comparison
}
int main(int argc,char *argv[])
{
   char output[101],word[50],*current; //an output line that will be written to output file
   FILE *infile,*outfile;//pointers to input and output files
   int i,limit,numofwords=0,count=0;
   char outfilename[100];
   char **wordlist=malloc(sizeof(char*));

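   /*check there are 2 arguments*/
   if(argc!=3)
   {
       fprintf(stderr,"Usage: %s <limit_num> <input_file>\n",argv[0]);
       exit(1);
   }

   limit=atoi(argv[1]); //convert the 1st argument to int (atoi gives 0 for non-numeric text)
   if(limit<25 || limit>100) //the question requires 25 <= limit <= 100
   {
       fprintf(stderr,"Limit must be a number between 25 and 100\n");
       exit(1);
   }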

   //open the input file first so that no output file is created when it is missing
   infile=fopen(argv[2],"r");
   if(infile==NULL) //check if we could not open the input file
   {
       fprintf(stderr,"Input file %s could not be opened\n",argv[2]);
       exit(1);
   }

   //create the output filename by appending .out (snprintf will not overrun the buffer)
   snprintf(outfilename,sizeof(outfilename),"%s.out",argv[2]);
   outfile=fopen(outfilename,"w");
   if(outfile==NULL) //check if we could not create the output file
   {
       fprintf(stderr,"Output file %s could not be created\n",outfilename);
       fclose(infile);
       exit(1);
   }

   output[0]='\0'; //start with an empty output line
   while(fscanf(infile,"%49s",word)==1) //read one word at a time; stop when no word is left
   {
       if(output[0]=='\0') //first word on the line: copy it without a leading space
       {
           strcpy(output,word);
       }
       else if((int)(strlen(word)+strlen(output))<limit) //word plus one space fits within limit
       {
           strcat(output," "); //append space and then the word
           strcat(output,word);
       }
       else
       {
           fprintf(outfile,"%s\n",output); //write out the line and copy the word as the first word
           strcpy(output,word);
       }

       //add the word to list
       wordlist=realloc(wordlist,(numofwords+1)*sizeof(char*));
       if(wordlist==NULL)
       {
           fprintf(stderr,"Out of memory\n");
           exit(1);
       }
       wordlist[numofwords]=strdup(word);
       numofwords++;
   }
   //write out any pending line
   if(output[0]!='\0')
       fprintf(outfile,"%s\n",output);


   //close both files
   fclose(infile);
   fclose(outfile);

   //now create the 2nd output file
   snprintf(outfilename,sizeof(outfilename),"%s.words",argv[2]); //create the filename by appending .words
   outfile=fopen(outfilename,"w");
   if(outfile==NULL) //check if we could not create the words file
   {
       fprintf(stderr,"Output file %s could not be created\n",outfilename);
       exit(1);
   }

   //now sort the wordlist using our compare function
   qsort(wordlist,numofwords,sizeof(char*),compare);

   //now since the list is sorted, same words appear in sequence. just count
   //the number of times each one appears and write it out to the file
   if(numofwords>0) //nothing to write when the input file had no words
   {
       current=wordlist[0];
       count=1;
       for(i=1;i<numofwords;i++)
       {
           if(strcmp(current,wordlist[i])!=0) //a different word from the previous one
           {
               fprintf(outfile,"%s - %d\n",current,count);
               count=1; //reset count for the new word
               current=wordlist[i];
           }
           else
               count++;
       }
       //write out the last word and its count
       fprintf(outfile,"%s - %d\n",current,count);
   }

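   fclose(outfile);

   //release the duplicated words and the array of pointers
   for(i=0;i<numofwords;i++)
       free(wordlist[i]);
   free(wordlist);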

   return 0;

}

input file contents

The Mason–Dixon Line (or Mason and Dixon's Line)
was surveyed between
1763 and 1767 by Charles Mason and Jeremiah Dixon
in the resolution
of a border dispute between British colonies in
Colonial America. It forms
a demarcation line among four U.S. states, forming part
of the borders of
Pennsylvania, Maryland, Delaware, and West Virginia
(then part of Virginia). In popular usage, especially
since the Missouri Compromise of 1820 (apparently the
first official use of the term "Mason's and Dixon's Line"),
the Mason–Dixon Line symbolizes a cultural boundary between
the Northeastern United States and the Southern United
States (Dixie).

contents of data.out file (output for executing with arguments 100 data)

The Mason–Dixon Line (or Mason and Dixon's Line) was surveyed between 1763 and 1767 by Charles
Mason and Jeremiah Dixon in the resolution of a border dispute between British colonies in Colonial
America. It forms a demarcation line among four U.S. states, forming part of the borders of
Pennsylvania, Maryland, Delaware, and West Virginia (then part of Virginia). In popular usage,
especially since the Missouri Compromise of 1820 (apparently the first official use of the term
"Mason's and Dixon's Line"), the Mason–Dixon Line symbolizes a cultural boundary between the
Northeastern United States and the Southern United States (Dixie).

contents of output file data.words

"Mason's-1
(Dixie).-1
(apparently-1
(or-1
(then-1
1763-1
1767-1
1820-1
America.-1
British-1
Charles-1
Colonial-1
Compromise-1
Delaware,-1
Dixon-1
Dixon's-2
In-1
It-1
Jeremiah-1
Line-2
Line"),-1
Line)-1
Maryland,-1
Mason-2
Mason–Dixon-2
Missouri-1
Northeastern-1
Pennsylvania,-1
Southern-1
States-2
The-1
U.S.-1
United-2
Virginia-1
Virginia).-1
West-1
a-3
among-1
and-6
between-3
border-1
borders-1
boundary-1
by-1
colonies-1
cultural-1
demarcation-1
dispute-1
especially-1
first-1
forming-1
forms-1
four-1
in-2
line-1
of-6
official-1
part-2
popular-1
resolution-1
since-1
states,-1
surveyed-1
symbolizes-1
term-1
the-8
usage,-1
use-1
was-1
