Academic Integrity: tutoring, explanations, and feedback — we don’t complete graded work or submit on a student’s behalf.

The program requirement is: Part A Write a multithread C program to count the fr

ID: 3768256 • Letter: T

Question

The program requirement is:

Part A

Write a multithread C program to count the frequency of words in a text file. In the program, the main thread should get the input file name from the command line arguments, open the file, do necessary global variable initialization, and perform necessary error checking during these steps. The main thread then creates a child thread to read each word from the text file. If the word has not appeared before, the child thread should access one global variable to store this word and set its frequency to 1. If the word has appeared before, the child thread should access the global variable to increase its frequency by 1. After the entire file is processed, the main thread then access the global variable and output the word-frequency data on the screen. The output should be one word each line, in the form of "word frequency". And all the words should be output alphabetically. For example, assume the compiled program is named as "a.out":

./a.out test2.txt a 3 and 6 can 3 file 3 have 3 is 6 it 9 large 3 line 3 lines 3 multiple 3 one 3 or 3 single 3 test 3 very 3 where the content of the input text file is It is a "test" file, and it is very large. And it can have one single line or multiple lines. It is a "test" file, and it is very large. And it can have one single line or multiple lines. It is a "test" file, and it is very large. And it can have one single line or multiple lines. Your program need not distinguish the upper and lower cases, e.g., "we" and "We" are deemed as the same word. And different forms of a word can be treated differently, e.g., "cat" and "cats" are treated as two different words, as well as "take", "took" and "taken" are treated as three different words. Since both the main thread and the created child thread may access the file and the global variables, you may need to use mutex and/or other mechanisms to avoid the race conditions. The entire txt file may be very large and not able to be held in the main memory. For error checking, you need to consider whether the number of command line arguments is correct and whether the input text file can be opened. For example, assume the compiled program is named as "b.out": ./b.out Usage: ./b.out ./b.out input.txt output.txt Usage: ./b.out ./b.out input.txt Input file input.txt cannot be opened. The output messages for error checking should follow the format in the above examples. If you use any dynamic memory management, you need to free the dynamically allocated memory space before your program terminates. Hint: 1. You may need to use string manipulation functions such as strlen(), strcpy(), strncpy(), strcmp(), strstr(), etc. by include "string.h" in your code. 2. You may want to identify and divide the whole job into different smaller and easy-to-handle tasks (such as error checking, reading a word from the file, changing all letters in a word into lower case, removing punctuation, updating the word frequency, etc.) and implement them in different functions. 3. You may want to refer to the codes of your programming assignments, adopting and modifying part of them to be used in your project (e.g., create, maintain and update linked list). 4. You may want to start from the one thread version, where all the tasks are done within the main thread. Then create the child thread and move tasks to it. 5. You may assume the maximum length of a word is 512.

Part B

Extend the program in the Part A to create multiple children threads. The number of children threads should be input as another command line argument. For example, assume the compiled program is named as "b.out":

./b.out test2.txt 10 a 3 and 6 can 3 file 3 have 3 is 6 it 9 large 3 line 3 lines 3 multiple 3 one 3 or 3 single 3 test 3 very 3 ./b.out Usage: ./b.out ./b.out input.txt output.txt Usage: ./b.out %s ./b.out input.txt 8 Input file input.txt cannot be opened. Since all the created children threads will access the text file to read words and access one the variable to update word frequency, you may need to use mutex and/or other mechanisms to avoid the race conditions. Hint: You may want to refer to the codes of your programming assignments, adopting and modifying part of them to be used in your project (e.g., how to use mutex to avoid the race conditions).

Extend the program in the Part B to further improve the concurrency and parallelism.
In particular, instead of letting each child thread reads word by word from the input
file, the child thread now reads several lines (which we call a block) into its local
buffer each time. It then reads and processes words from its local buffer, and when
the block in the local buffer is processed, it read another block from the input file. In
addition, during updating the word frequency, for each child thread, instead of
directly updating a global variable, it now temporarily stores and updates the word
frequency in its local variable, and then merge its local results into the global variable
before it terminates.

The size of a block (i.e., the number of lines each time a child thread reads from the
input file) should be input as another command line argument. For example, assume
the compiled program is named as "c.out":
./c.out test2.txt 10 5
a 3
and 6
can 3
file 3
have 3
is 6
it 9
large 3
line 3
lines 3
multiple 3
one 3
or 3
single 3
test 3
very 3
./c.out
Usage: ./c.out <input text file name> <number of threads> <number
of lines per block>
./c.out input.txt output.txt
Usage: ./c.out %s <input text file name> <number of threads> <number
of lines per block>
./c.out input.txt 8
Usage: ./c.out %s <input text file name> <number of threads> <number
of lines per block>
Hint:
You may assume the maximum number of words per line is 1024.

A version of code is :

#include <stdio.h>
#include <pthread.h>
#include <stdlib.h>
struct thread_data
{
FILE *fp;
long int offset;
int start;
int blockSize;
};
int words=0;
void *countFrequency(void* data)
{
struct thread_data* td=data;
char *buffer = malloc(td->blockSize);
int i,c;
i=0;c=0;
enum states { WHITESPACE, WORD };
int state = WHITESPACE;
fseek(td->fp, td->offset, td->start);
char last = ' ';
while ((fread(buffer, td->blockSize, 1, td->fp))==1)
{
if ( buffer[0]== ' ' || buffer[0] == ' ' )
{
state = WHITESPACE;
}
else if (buffer[0]==' ')
{
state = WHITESPACE;
}
else
{
if ( state == WHITESPACE )
{
words++;
}
state = WORD;
}
last = buffer[0];
}
free(buffer);
pthread_exit(NULL);
return NULL;
}
int main(int argc, char **argv)
{
int nthreads, x, id, blockSize,len;
FILE *fp;
pthread_t *threads;
struct thread_data data[nthreads];
if (argc < 2)
{
fprintf(stderr, "Usage: ./a.out <file_path>");
exit(-1);
}
if((fp=fopen(argv[1],"r"))==NULL)
{
printf("Error opening file");
exit(-1);
}
printf("Enter the number of threads: ");
scanf("%d",&nthreads);
threads = malloc(nthreads*sizeof(pthread_t));
fseek(fp, 0, SEEK_END);
len = ftell(fp);
printf("len= %d ",len);
blockSize=(len+nthreads-1)/nthreads;
printf("size= %d ",blockSize);
for(id = 0; id < nthreads; id++)
{
data[id].fp=fp;
data[id].offset = blockSize;
data[id].start = id*blockSize+1;
}
data[nthreads-1].start=(nthreads-1)*blockSize+1;
for(id = 0; id < nthreads; id++)
pthread_create(&threads[id], NULL, &countFrequency,&data[id]);
for(id = 0; id < nthreads; id++)
pthread_join(threads[id],NULL);
fclose(fp);
free(threads);
pthread_exit(NULL);
printf("%d ",words);
return 0;
}

After running it, it shows:
Enter the number of threads: 10
len= 286
size= 29
Segmentation fault

which is not the correct output in the question.

May I ask how to fix that? Also I think there are three compiled file according to the question. Thank you!

Explanation / Answer

#include #include #include #include #include #include #define threads 2 #define MAX 100 //structs for holding word buffer and thread information typedef struct partitions { int partStart; int ID; int result; char buffer2[MAX]; }; //array of structs to be used on different threads struct partitions globals[threads]; //the word counting thread DWORD WINAPI ThreadProc(LPVOID lpParam){ int count = 0; //set working struct and part of buffer where search for words starts struct partitions *temp = (struct partitions *)lpParam; count = globals[temp->ID].partStart; //algorithm for finding words based on spaces while (temp->buffer2[count] != '!' && temp->buffer2[count] != '') { if (temp->buffer2[count] == ' ' || temp->buffer2[count] == ' ' || temp->buffer2[count] == ' '){ globals[temp->ID].result += 1; } count++; } return 0; } int main(int argc, char *argv[]) { FILE *open; open = fopen(argv[1], "r"); DWORD ThreadId[threads]; HANDLE hThread[threads]; char buffer[MAX] = { 0 }; buffer[100] = '!'; for (int i = 0; i