
Please treat this as two separate programs: one for Part A and another for Part B.


Question


Part A

Write a multithreaded C program to count the frequency of words in a text file. The main thread should get the input file name from the command-line arguments, open the file, initialize any global variables, and perform the necessary error checking during these steps. The main thread then creates a child thread to read each word from the text file. If the word has not appeared before, the child thread should store it in a global variable and set its frequency to 1. If the word has appeared before, the child thread should increase its frequency in the global variable by 1. After the entire file has been processed, the main thread accesses the global variable and prints the word-frequency data to the screen. The output should contain one word per line, in the form "word frequency", with the words in alphabetical order. For example, assume the compiled program is named "a.out":

./a.out test2.txt
a 3
and 6
can 3
file 3
have 3
is 6
it 9
large 3
line 3
lines 3
multiple 3
one 3
or 3
single 3
test 3
very 3

where the content of the input text file is

It is a "test" file, and it is very large. And it can have one single line or multiple lines.

It is a "test" file, and it is very large. And it can have one single line or multiple lines.

It is a "test" file, and it is very large. And it can have one single line or multiple lines.

Your program need not distinguish between upper and lower case, e.g., "we" and "We" are treated as the same word. Different forms of a word are treated as different words, e.g., "cat" and "cats" are two different words, and "take", "took" and "taken" are three different words. Since both the main thread and the child thread may access the file and the global variables, you may need to use a mutex and/or other mechanisms to avoid race conditions. The text file may be very large and may not fit in main memory. For error checking, you need to consider whether the number of command-line arguments is correct and whether the input text file can be opened. For example, assume the compiled program is named "b.out":

./b.out
Usage: ./b.out

./b.out input.txt output.txt
Usage: ./b.out

./b.out input.txt
Input file input.txt cannot be opened.

The output messages for error checking should follow the format in the above examples. If you use any dynamic memory management, you need to free the dynamically allocated memory space before your program terminates.
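The word-frequency data described above could be held in a singly linked list kept in alphabetical order, so that printing it front to back gives the required output and freeing it at exit is a single pass. The following is only a minimal sketch under that assumption, not the required design; the names `struct word_node`, `table_add`, `table_print` and `table_free` are hypothetical, and words are assumed to be lower-cased already.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_WORD 512   /* maximum word length stated in the question's hints */

/* One table entry (hypothetical layout). */
struct word_node {
    char word[MAX_WORD + 1];
    int count;
    struct word_node *next;
};

/* Record one occurrence of `word`, keeping the list sorted by strcmp().
 * Returns 0 on success, -1 if memory allocation fails. */
int table_add(struct word_node **head, const char *word)
{
    assert(head != NULL && word != NULL);

    struct word_node **link = head;
    while (*link != NULL) {
        int cmp = strcmp((*link)->word, word);
        if (cmp == 0) {            /* seen before: bump the frequency */
            (*link)->count++;
            return 0;
        }
        if (cmp > 0)               /* passed the insertion point */
            break;
        link = &(*link)->next;
    }

    struct word_node *node = malloc(sizeof *node);
    if (node == NULL)
        return -1;
    strncpy(node->word, word, MAX_WORD);
    node->word[MAX_WORD] = '\0';
    node->count = 1;               /* first occurrence */
    node->next = *link;
    *link = node;
    return 0;
}

/* Print "word frequency", one entry per line, in list order. */
void table_print(const struct word_node *head)
{
    for (; head != NULL; head = head->next)
        printf("%s %d\n", head->word, head->count);
}

/* Release every node before the program terminates. */
void table_free(struct word_node *head)
{
    while (head != NULL) {
        struct word_node *next = head->next;
        free(head);
        head = next;
    }
}
```

In a threaded program, every call to `table_add` would need to happen while holding the mutex that protects the list.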

Hints:

1. You may need to use string manipulation functions such as strlen(), strcpy(), strncpy(), strcmp(), strstr(), etc. by including "string.h" in your code.

2. You may want to divide the whole job into smaller, easy-to-handle tasks (such as error checking, reading a word from the file, converting all letters in a word to lower case, removing punctuation, updating the word frequency, etc.) and implement them in different functions.

3. You may want to refer to the code from your programming assignments, adapting parts of it for your project (e.g., creating, maintaining and updating a linked list).

4. You may want to start from a single-threaded version, where all the tasks are done within the main thread. Then create the child thread and move tasks to it.

5. You may assume the maximum length of a word is 512.
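Reading a word, lower-casing it and dropping punctuation can be combined in one small function that pulls characters from the file one at a time, so the file never has to fit in memory. The sketch below is an illustration only; `read_word` is a hypothetical name, and it assumes that a word is a run of letters and that every other character (digits, quotes, punctuation, whitespace) is a separator.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>

#define MAX_WORD 512   /* maximum word length stated in the hints */

/* Read the next word from fp into buf, which must hold MAX_WORD + 1 bytes.
 * Letters are converted to lower case. Letters beyond MAX_WORD are read
 * but discarded. Returns 1 if a word was stored, 0 at end of file. */
int read_word(FILE *fp, char *buf)
{
    assert(fp != NULL && buf != NULL);

    int c;
    size_t len = 0;

    /* Skip separators until the first letter or end of file. */
    while ((c = fgetc(fp)) != EOF && !isalpha(c))
        ;
    if (c == EOF)
        return 0;

    /* Collect letters until the next separator or end of file. */
    do {
        if (len < MAX_WORD)
            buf[len++] = (char)tolower(c);
    } while ((c = fgetc(fp)) != EOF && isalpha(c));

    buf[len] = '\0';
    return 1;
}
```

A caller would loop with `while (read_word(fp, word))` and update the frequency data for each word returned.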

Part B

Extend the program from Part A to create multiple child threads. The number of child threads should be given as another command-line argument. For example, assume the compiled program is named "b.out":

./b.out test2.txt 10
a 3
and 6
can 3
file 3
have 3
is 6
it 9
large 3
line 3
lines 3
multiple 3
one 3
or 3
single 3
test 3
very 3

./b.out
Usage: ./b.out

./b.out input.txt output.txt
Usage: ./b.out %s

./b.out input.txt 8
Input file input.txt cannot be opened.

Since all the created child threads will access the text file to read words and access the one global variable to update word frequencies, you may need to use a mutex and/or other mechanisms to avoid race conditions. Hint: You may want to refer to the code from your programming assignments, adapting parts of it for your project (e.g., how to use a mutex to avoid race conditions).
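One possible way to let several child threads share a single open file is to have each thread take a mutex, read the next word, update the shared data, and release the mutex. The sketch below shows only that locking pattern, under stated assumptions: `struct shared_state`, `worker` and `count_tokens` are hypothetical names, words are plain whitespace-separated tokens, and the shared data is reduced to a single counter where the real program would update its word-frequency structure.

```c
#include <assert.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

#define MAX_WORD 512   /* maximum word length stated in the question */

/* State shared by all worker threads (hypothetical layout). */
struct shared_state {
    FILE *fp;                /* shared input file */
    pthread_mutex_t lock;    /* protects fp and total */
    long total;              /* stand-in for the word-frequency data */
};

static void *worker(void *arg)
{
    struct shared_state *s = arg;
    char word[MAX_WORD + 1];

    for (;;) {
        pthread_mutex_lock(&s->lock);
        /* The width 512 matches MAX_WORD. */
        int got = fscanf(s->fp, "%512s", word);
        if (got == 1)
            s->total++;      /* the real program would update its table here */
        pthread_mutex_unlock(&s->lock);
        if (got != 1)
            break;           /* end of file or read error */
    }
    return NULL;
}

/* Count the tokens in fp using nthreads threads.
 * Returns the count, or -1 if no thread could be started. */
long count_tokens(FILE *fp, int nthreads)
{
    assert(fp != NULL && nthreads > 0);

    struct shared_state s;
    s.fp = fp;
    s.total = 0;
    if (pthread_mutex_init(&s.lock, NULL) != 0)
        return -1;

    pthread_t *tids = malloc((size_t)nthreads * sizeof *tids);
    if (tids == NULL) {
        pthread_mutex_destroy(&s.lock);
        return -1;
    }

    int started = 0;
    for (int i = 0; i < nthreads; i++) {
        if (pthread_create(&tids[i], NULL, worker, &s) != 0)
            break;
        started++;
    }
    for (int i = 0; i < started; i++)
        pthread_join(tids[i], NULL);

    pthread_mutex_destroy(&s.lock);
    free(tids);
    return started > 0 ? s.total : -1;
}
```

Because the read and the update happen under the same lock, the threads take turns rather than running in parallel; this keeps the result correct but is not expected to make the program faster.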

Explanation / Answer

A multithreaded C program that splits the text file into one block per thread and counts the words in it:

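#include <ctype.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

/* Work assigned to one thread: a byte range of the input file. */
struct thread_data{
    const char *path;   /* input file; each thread opens its own handle */
    long start;         /* offset of the first byte of this block */
    long blockSize;     /* number of bytes in this block */
};

int words=0;            /* total word count, protected by wordsLock */
pthread_mutex_t wordsLock = PTHREAD_MUTEX_INITIALIZER;

void *countFrequency(void* data){

    struct thread_data* td=data;
    enum states { WHITESPACE, WORD };
    int state = WHITESPACE;
    int c, local=0;
    long i;

    /* A private FILE* avoids races on a shared file position. */
    FILE *fp = fopen(td->path,"r");
    if (fp == NULL)
        return NULL;

    /* If the byte just before this block is part of a word, that word
       was already counted by the thread handling the previous block. */
    if (td->start > 0){
        fseek(fp, td->start-1, SEEK_SET);
        c = fgetc(fp);
        if (c != EOF && !isspace(c))
            state = WORD;
    }

    /* Scan this block; a word is counted where it starts. */
    for (i = 0; i < td->blockSize && (c = fgetc(fp)) != EOF; i++){
        if (isspace(c)){
            state = WHITESPACE;
        }
        else {
            if ( state == WHITESPACE ){
                local++;
            }
            state = WORD;
        }
    }
    fclose(fp);

    /* Add the local count to the global total under the mutex. */
    pthread_mutex_lock(&wordsLock);
    words += local;
    pthread_mutex_unlock(&wordsLock);

    return NULL;
}

int main(int argc, char **argv){

    int nthreads, id;
    long len, blockSize;
    FILE *fp;
    pthread_t *threads;
    struct thread_data *data;

    if (argc < 2){
        fprintf(stderr, "Usage: ./a.out <file_path>\n");
        exit(EXIT_FAILURE);
    }

    if((fp=fopen(argv[1],"r"))==NULL){
        fprintf(stderr, "Error opening file\n");
        exit(EXIT_FAILURE);
    }

    /* Measure the file, then close it; the threads reopen it themselves. */
    if (fseek(fp, 0, SEEK_END) != 0 || (len = ftell(fp)) < 0){
        fprintf(stderr, "Error reading file size\n");
        exit(EXIT_FAILURE);
    }
    fclose(fp);

    printf("Enter the number of threads: ");
    if (scanf("%d",&nthreads) != 1 || nthreads < 1){
        fprintf(stderr, "Invalid number of threads\n");
        exit(EXIT_FAILURE);
    }

    /* Allocate only after nthreads is known. */
    threads = malloc(nthreads*sizeof(pthread_t));
    data = malloc(nthreads*sizeof(struct thread_data));
    if (threads == NULL || data == NULL){
        fprintf(stderr, "Out of memory\n");
        exit(EXIT_FAILURE);
    }

    printf("len= %ld\n",len);

    /* Round up so the blocks cover the whole file. */
    blockSize=(len+nthreads-1)/nthreads;
    printf("size= %ld\n",blockSize);

    for(id = 0; id < nthreads; id++){
        data[id].path = argv[1];
        data[id].start = id*blockSize;
        data[id].blockSize = blockSize;
    }

    for(id = 0; id < nthreads; id++){
        if (pthread_create(&threads[id], NULL, &countFrequency,&data[id]) != 0){
            fprintf(stderr, "Error creating thread\n");
            exit(EXIT_FAILURE);
        }
    }

    for(id = 0; id < nthreads; id++)
        pthread_join(threads[id],NULL);

    free(threads);
    free(data);

    printf("%d\n",words);
    return 0;
}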

/* Commonly used string functions declared in <string.h> */
#include <string.h>
char *stpcpy(char *dest, const char *src);
int strcmp(const char *string1, const char *string2);
char *strcpy(char *string1, const char *string2);
char *strerror(int errnum);
size_t strlen(const char *string);
char *strncat(char *string1, const char *string2, size_t n);
int strncmp(const char *string1, const char *string2, size_t n);
char *strncpy(char *string1, const char *string2, size_t n);

/* Case-insensitive comparison (POSIX) is declared in <strings.h> */
#include <strings.h>
int strcasecmp(const char *s1, const char *s2);
int strncasecmp(const char *s1, const char *s2, size_t n);

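/* Example use of the string functions */
#include <stdio.h>
#include <string.h>

#define SIZE 10

int main(void)
{
    char str1[] = "HELLO";          /* an array, so strtok() may modify it */
    char str2[10];
    char *ans;
    size_t length;
    char *t1;
    char src[SIZE] = "abc", dest[SIZE];
    int isrc[SIZE] = {1, 2, 3}, idest[SIZE];

    length = strlen("HELLO");       /* 5 */
    (void) strcpy(str2,str1);       /* str2 now holds "HELLO" */

    /* The searches are case-sensitive; the set and substring are strings. */
    ans = strpbrk(str1,"AEIOU");    /* first vowel: points at "ELLO" */
    ans = strchr(str1,'L');         /* first 'L': points at "LLO" */
    ans = strstr(str1,"LO");        /* substring: points at "LO" */
    printf("%zu %s %s\n", length, str2, ans);

    /* Split str1 into space-separated tokens. */
    for ( t1 = strtok(str1," ");
          t1 != NULL;
          t1 = strtok(NULL, " ") )
        printf("%s\n",t1);

    memcpy(dest,src, SIZE);                 /* copies SIZE bytes */
    memcpy(idest,isrc, SIZE*sizeof(int));   /* copies SIZE ints */
    printf("%s %d\n", dest, idest[0]);

    return 0;
}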