Question
There is one test file in the Dropbox link, called HungerGames_edit.txt, that contains the full text of Hunger Games Book 1 (LINK: https://www.dropbox.com/sh/y07ivem7umm9e54/AAD7fRMU6SL9yYAfp_uP70u2a?dl=0). We have pre-processed the file to remove all punctuation and down-case all words.
Your program needs to read in the .txt file, given the name of the file. Your program needs to store the unique words found in the file in a dynamically allocated array, then calculate and output the following information:
- The top n words (n is also a user input) and the number of times each word was found
- The total number of unique words in the file
- The total number of words in the file
- The number of array doublings needed to store all unique words in the file
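As a minimal sketch of the reading step, the total word count can be obtained by pulling whitespace-separated tokens from a stream. The helper names `countTokens` and `countTokensInText` are hypothetical and not part of the assignment; the sketch assumes the input is already lower-cased and free of punctuation, as stated above.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper: counts whitespace-separated tokens in a stream.
// No normalization is attempted because the assignment's file is
// already pre-processed.
int countTokens(std::istream& in) {
    assert(in.good());  // caller should pass an opened, readable stream
    int total = 0;
    std::string token;
    while (in >> token) {
        ++total;
    }
    return total;
}

// Convenience wrapper so the helper can be tried without a file on disk.
int countTokensInText(const std::string& text) {
    std::istringstream in(text);
    return countTokens(in);
}
```

The same loop works on a `std::ifstream`, since both stream types derive from `std::istream`. Note that the assignment's totals exclude the 50 common words, which this sketch does not filter.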
Your program needs to have two inputs: the first input is the name of the file to open and read, and the second input is the number of most frequent words to output.
Inputs HungerGames_edit.txt and 10 would return the 10 most common words in the file HungerGames_edit.txt and should produce the following results:
682 - is
492 - peeta
479 - its
431 - im
427 - can
414 - says
379 - him
368 - when
367 - no
356 - are
#
Array doubled: 7
#
Unique non-common words: 7682
#
Total non-common words: 59157
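The ranked list above amounts to ordering the stored entries by count, highest first, and printing the first n. One possible sketch uses `std::stable_sort` from the standard library instead of a hand-written sort; `WordEntry` and `sortByCountDescending` are hypothetical names, with `WordEntry` standing in for the assignment's `word` struct.

```cpp
#include <algorithm>
#include <cassert>
#include <string>

struct WordEntry {  // hypothetical stand-in for the assignment's `word` struct
    int count;
    std::string w;
};

// Sorts the first `used` entries so that the highest counts come first.
// stable_sort keeps entries with equal counts in their original order.
void sortByCountDescending(WordEntry* entries, int used) {
    assert(entries != nullptr && used >= 0);
    std::stable_sort(entries, entries + used,
                     [](const WordEntry& a, const WordEntry& b) {
                         return a.count > b.count;
                     });
}
```

Only the entries actually in use are sorted; slots past `used` in a partly filled array are left untouched.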
Use an array of structs to store the words and their counts
There are specific requirements for how your program needs to be implemented. For this assignment, you need to use an array of structs to store the words and their counts. The members of the struct are left to you, but keep it as simple as possible.
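A rough sketch of how an array of structs can hold the words and counts: search the entries in use, increment on a match, otherwise append a new entry with a count of 1. The names `WordEntry`, `findWord`, and `recordWord` are assumptions for illustration only, and growing the array is deliberately left out here.

```cpp
#include <cassert>
#include <string>

struct WordEntry {  // hypothetical stand-in for the assignment's `word` struct
    int count;
    std::string w;
};

// Returns the position of `target` among the first `used` entries, or -1.
int findWord(const WordEntry* entries, int used, const std::string& target) {
    for (int i = 0; i < used; ++i) {
        if (entries[i].w == target) {
            return i;
        }
    }
    return -1;
}

// Increments an existing entry or appends a new one.
// Returns the updated number of entries in use.
int recordWord(WordEntry* entries, int used, int capacity,
               const std::string& target) {
    int pos = findWord(entries, used, target);
    if (pos >= 0) {
        ++entries[pos].count;
        return used;
    }
    assert(used < capacity);  // growing the array is out of scope for this sketch
    entries[used].w = target;
    entries[used].count = 1;
    return used + 1;
}
```

Searching only the first `used` entries matters: slots beyond that point have never been assigned and their counts are indeterminate.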
Exclude these top 50 common words from your word counting
The attached image shows the 50 most common words in the English language. In your code, exclude these words from the words you count in the .txt file. Your code should include a separate function to determine if the current word read from the .txt file is on this list and only process the word if it is not. (To make it easier: string wordsCommon[]={"the","you","one","be","do","all","to","at","would","of","this","there","and","but","their","a","his","what","in","by","so","that","from","up","have","they","out","i","we","if","it","say","about","for","her","who","not","she","get","on","or","which","with","an","go","he","will","me","as","my"} )
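A possible shape for the separate lookup function, using the word list given above. The free function name `isCommonWord` is an assumption (the class header names its method `checkIfCommonWord`); the list is declared `static const` so that it is built only once.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Returns true if `candidate` is one of the 50 excluded common words.
bool isCommonWord(const std::string& candidate) {
    static const std::string kCommon[] = {
        "the", "you", "one", "be", "do", "all", "to", "at", "would", "of",
        "this", "there", "and", "but", "their", "a", "his", "what", "in", "by",
        "so", "that", "from", "up", "have", "they", "out", "i", "we", "if",
        "it", "say", "about", "for", "her", "who", "not", "she", "get", "on",
        "or", "which", "with", "an", "go", "he", "will", "me", "as", "my"};
    const std::size_t n = sizeof(kCommon) / sizeof(kCommon[0]);
    assert(n == 50);  // guards against accidentally dropping a word
    for (std::size_t i = 0; i < n; ++i) {
        if (candidate == kCommon[i]) {
            return true;
        }
    }
    return false;
}
```

A linear scan over 50 short strings is cheap enough for this file size; a sorted list with binary search would be an alternative if the list were much longer.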
Use the array-doubling algorithm to increase the size of your array.
We don’t know ahead of time how many unique words either of these files has, so you don’t know how big the array should be. Start with an array size of 100, and double the size as words are read in from the file and the array fills up with new words. Use dynamic memory allocation to create your array, copy the values from the current array into the new array, and then free the memory used for the current array.
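The doubling step described above can be sketched as follows. `WordEntry`, `doubleArray`, and `doublingsNeeded` are hypothetical names and not part of WordAnalysis.h; the sketch assumes the old array is completely full when it is doubled.

```cpp
#include <cassert>
#include <string>

struct WordEntry {  // hypothetical stand-in for the assignment's `word` struct
    int count;
    std::string w;
};

// Allocates an array twice as large, copies every entry across,
// frees the old array, and updates `capacity` to the new size.
WordEntry* doubleArray(WordEntry* old, int& capacity) {
    assert(old != nullptr && capacity > 0);
    WordEntry* bigger = new WordEntry[capacity * 2]();  // counts start at 0
    for (int i = 0; i < capacity; ++i) {
        bigger[i] = old[i];
    }
    delete[] old;
    capacity *= 2;
    return bigger;
}

// Number of doublings before the capacity is at least `needed`.
int doublingsNeeded(int initialCapacity, int needed) {
    assert(initialCapacity > 0);
    int doublings = 0;
    for (int cap = initialCapacity; cap < needed; cap *= 2) {
        ++doublings;
    }
    return doublings;
}
```

Starting at 100, the capacity grows 200, 400, 800, 1600, 3200, 6400, 12800, so `doublingsNeeded(100, 7682)` returns 7, which agrees with the "Array doubled: 7" line in the expected output for 7682 unique words.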
Note: some of you might wonder why we’re not using C++ Vectors for this assignment. A vector is an interface to a dynamically allocated array that uses array doubling to increase its size. In this assignment, you’re doing what happens behind-the-scenes with a Vector.
Create a WordAnalysis class
All of the functionality above should be included in methods in a WordAnalysis class. The header file for the class, called WordAnalysis.h, is also available at the end.
You need to implement the methods defined in the header exactly as they are defined. You can test that your cpp code works by building the files and running the executable:
g++ -std=c++11 WordAnalysis.cpp -o WordAnalysis
The following methods and variables are defined in WordAnalysis.h.
//////// WordAnalysis.h /////////
struct word {
    int count;
    std::string w;
};
class WordAnalysis{
    private:
        //stores the number of times the array has been doubled in the program
        int timesDoubled;
        //stores the array of words. Memory will be dynamically allocated
        word *words;
        /*current size of the array. When you double the array, wordCount = wordCount * 2 to double the size */
        int wordCount;
        //how many unique words found
        int index;
        //call this method when you want to double the array and add the new word to the array. The new word is the input to the method.
        void doubleArrayAndAdd(std::string);
        //call this method to check if a word is in the common word list
        bool checkIfCommonWord(std::string);
        //call this method to sort the words array
        void sortData();
    public:
        //call this method to open a file and read in the data. The filename is the argument
        bool readDataFile(char*); //returns an error if file not opened
        //returns index * count for each word
        int getWordCount();
        //returns index variable
        int getUniqueWordCount();
        //returns timesDoubled variable
        int getArrayDoubling();
        //call this method to print the common words. The argument is the number of words to print
        void printCommonWords(int);
        //call this method to print the final output of your program
        void printResult(int);
        //constructor. The argument is the initial size of the array
        WordAnalysis(int);
        ~WordAnalysis();
};
Explanation / Answer
main.cpp
#include <cstdlib>   // atoi
#include <iostream>
#include "WordAnalysis.h"
using namespace std;
// main expects 2 arguments: the filename and the number of words to print
int main(int argc, char* argv[])
{
    if (argc < 3) {
        cerr << "usage: " << argv[0] << " <filename> <n>" << endl;
        return 1;
    }
    char* filename = argv[1];
    int n = atoi(argv[2]);
    WordAnalysis wa(100); // initial array size of 100
    if (wa.readDataFile(filename)) {
        // print the n most frequent non-common words
        wa.printCommonWords(n);
        cout << "#" << endl;
        // timesDoubled returned from WordAnalysis
        cout << "Array doubled: " << wa.getArrayDoubling() << endl;
        cout << "#" << endl;
        // number of unique non-common words returned from WordAnalysis
        cout << "Unique non-common words: " << wa.getUniqueWordCount() << endl;
        cout << "#" << endl;
        // total number of non-common words returned from WordAnalysis
        cout << "Total non-common words: " << wa.getWordCount() << endl;
    }
    // the WordAnalysis destructor frees the final array
    return 0;
}
WordAnalysis.cpp
#include <iostream>
#include <fstream>
#include <sstream>
#include <string>
#include <stdlib.h>
#include "WordAnalysis.h"
using namespace std;
// CONSTRUCTOR & DESTRUCTOR
WordAnalysis::WordAnalysis(int n){
    wordCount = n;       // current capacity of the array
    words = new word[n]; // dynamically allocated storage
    index = 0;           // no unique words stored yet
    timesDoubled = 0;
}
WordAnalysis::~WordAnalysis(){
    delete [] words;
}
// PRIVATE FUNCTIONS
void WordAnalysis::doubleArrayAndAdd(std::string w){
    // Only acts when the array is full. The caller (readDataFile) is
    // responsible for updating wordCount, timesDoubled, and index afterwards.
    if (index == wordCount) {
        word *array_doubled = new word[wordCount * 2];
        for (int j = 0; j < wordCount; j++) {
            array_doubled[j] = words[j]; // copy the existing entries
        }
        delete [] words;       // free the old array
        words = array_doubled; // switch to the doubled array
        words[index].w = w;    // add the new word
        words[index].count = 1;
    }
}
bool WordAnalysis::checkIfCommonWord(std::string w){
    // static const so the list is built once rather than on every call
    static const std::string commonWordList[] = {"the", "be", "to", "of", "and",
        "a", "in", "that", "have", "i", "it", "for", "not",
        "on", "with", "he", "as", "you", "do", "at", "this",
        "but", "his", "by", "from", "they", "we", "say", "her",
        "she", "or", "an", "will", "my", "one", "all", "would",
        "there", "their", "what", "so", "up", "out", "if",
        "about", "who", "get", "which", "go", "me"
    };
    for (int i = 0; i < 50; i++) {
        if (w == commonWordList[i])
            return true; //common word
    }
    return false; //not a common word
}
void WordAnalysis::sortData(){ //sorts the array from most to least common word
    // bubble sort over the index entries in use; the inner bound stops one
    // short of the end so that words[j+1] never reads past the last entry
    for (int i = 0; i < index - 1; i++) {
        for (int j = 0; j < index - 1 - i; j++) {
            if (words[j].count < words[j+1].count) {
                word swapWord = words[j];
                words[j] = words[j+1];
                words[j+1] = swapWord;
            }
        }
    }
}
// PUBLIC FUNCTIONS
bool WordAnalysis::readDataFile(char *f){ //basically the 'old main'
    //returns false if the file could not be opened
    ifstream listIn(f);
    if (!listIn.good()) {
        cout << "error - file not opened" << endl;
        return false;
    }
    string line;
    string wordIn;
    while (getline(listIn, line)) {
        stringstream ss(line);
        while (ss >> wordIn) {
            if (checkIfCommonWord(wordIn)) {
                continue; // skip the 50 common words
            }
            // look for the word among the entries stored so far
            bool found = false;
            for (int i = 0; i < index; i++) {
                if (wordIn == words[i].w) {
                    words[i].count++; // word exists and just needs to be counted
                    found = true;
                    break;
                }
            }
            if (!found) {
                if (index >= wordCount) {
                    // array is full: double it, which also stores the new word
                    doubleArrayAndAdd(wordIn);
                    timesDoubled++;
                    wordCount *= 2;
                }
                else {
                    // room left: just add the word
                    words[index].w = wordIn;
                    words[index].count = 1;
                }
                index++;
            }
        }
    }
    return true;
}
int WordAnalysis::getWordCount(){
    //sums the counts of the entries in use (slots past index are unset)
    int totalWords = 0;
    for (int i = 0; i < index; i++) {
        totalWords = totalWords + words[i].count;
    }
    return totalWords;
}
int WordAnalysis::getUniqueWordCount(){
    return index;
}
int WordAnalysis::getArrayDoubling(){
    return timesDoubled;
}
void WordAnalysis::printCommonWords(int N){
    sortData(); //words array must be sorted before printing
    // never print more entries than there are unique words
    int limit = (N < index) ? N : index;
    for (int i = 0; i < limit; i++) {
        cout << words[i].count << " - " << words[i].w << endl;
    }
}
WordAnalysis.h
#ifndef WORDANALYSIS_H
#define WORDANALYSIS_H
#include <string>
struct word {
    int count;
    std::string w;
};
class WordAnalysis{
    private:
        int timesDoubled;
        word *words;
        int wordCount;
        int index;
        void doubleArrayAndAdd(std::string);
        bool checkIfCommonWord(std::string);
        void sortData();
    public:
        bool readDataFile(char*); //returns false if file not opened
        int getWordCount();
        int getUniqueWordCount();
        int getArrayDoubling();
        void printCommonWords(int);
        WordAnalysis(int);
        ~WordAnalysis();
};
#endif