write a C++ program that reads an input file of text and builds a concordance of
Question
Write a C++ program that reads an input file of text and builds a concordance of the words in the file. The program will also report the number of distinct words in the file. Separate the program into a header file, a class implementation file, and a main file.
A class will contain the implementation of a concordance type as a linked list; each node of a list/concordance will contain a word and a count of the word's appearances in the input text, and these entries will be ordered alphabetically by their words.
Any sequence of letters is a word, and all non-letter characters, including apostrophes, hyphens, and digits, are separators, equivalent to white space. One or more non-letters may separate words, and ends of lines separate words. Differences in capitalization do not make words different; that is, "HOUSE," "house," and "HouSe" are three instances of the same word.
Words may be any length in the input file, but the program should consider only their first eight characters. Thus, "manipulated" and "manipulation" are two instances of the word "manipula."
INPUT
The user will enter the name of the input file from the terminal. Any file of text is legitimate input for this program. Note that the program should be able to process its own source files without crashing. (These make interesting tests.)
OUTPUT
The program's output is a concordance of the words in the input file: a table of the words in alphabetical order accompanied by the number of times each word appears in the input text, as well as the number of distinct words in the text. The program prints this output to the terminal.
EXAMPLE
If an input file is this:
then the corresponding output might look something like this:
OTHER REQUIREMENTS
A class will implement a concordance abstract data type. Within the class, a concordance will be represented by a linked list whose links are pointers. Each node will contain a word, a count for that word, and a pointer, and the list will be ordered alphabetically by its words, in this way:
In addition to the required constructor and destructor, implement only the following operations in this class:
insert(word) - Inserts word in the invoking concordance in the correct position. If the word is already in the concordance, increment its count.
get_count(word) - Returns the count associated with word in the invoking concordance. This function returns zero if word is not in the concordance.
length() - Returns the length of the invoking concordance; that is, the number of distinct words that it lists.
Printing - Overloads the "<<" operator to print the invoking concordance to an output stream.
get_node(word,count,link) - Returns a pointer to a new node that contains word, count, and the pointer link. This function will be private.
Represent words using arrays of characters, as described below.
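The ordered insertion described above can be sketched on a bare node list. This is only an illustration under assumptions: the names MAX_LEN, Word, and Node are hypothetical, insert is written as a free function rather than a class member, and words are assumed to be already lower-cased and at most eight letters long.

```cpp
#include <cstring>

const int MAX_LEN = 8;            // assumed name for the maximum word length
typedef char Word[MAX_LEN + 1];   // one extra slot for the terminating '\0'

struct Node {                     // hypothetical node layout
    Word word;
    int count;
    Node* next;
};

// Allocates a node that holds a copy of word, the given count, and link.
Node* get_node(const char* word, int count, Node* link) {
    Node* n = new Node;
    std::strncpy(n->word, word, MAX_LEN);
    n->word[MAX_LEN] = '\0';      // strncpy does not always terminate
    n->count = count;
    n->next = link;
    return n;
}

// Inserts word into the alphabetically ordered list that starts at head,
// or increments its count if the word is already present.
void insert(Node*& head, const char* word) {
    Node** link = &head;          // pointer to the link that may be rewired
    while (*link != nullptr && std::strcmp((*link)->word, word) < 0)
        link = &(*link)->next;
    if (*link != nullptr && std::strcmp((*link)->word, word) == 0)
        ++(*link)->count;
    else
        *link = get_node(word, 1, *link);
}
```

Walking a pointer to the link, rather than a pointer to the node, avoids a special case for insertion at the front of the list.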
HINTS
This program will read its input from a file whose name the user enters.
C++ assumes that strings will be represented in arrays of characters. In an array of characters, the characters in a string occupy an initial segment of the array, and the null character '\0' marks the end of the string, though it is not itself a character in the string. Note that the null character is not the same as the pointer value NULL. An empty string (a string of length zero) is represented by an array of characters whose first position holds the null character. If you want to represent strings of a particular maximum length, as we do in this project, any arrays that will hold such strings must be one character longer than that maximum, to accommodate the terminating null character.
The library cstring contains several useful functions that operate on strings represented in this way. Note in particular:
strcpy(), which copies the contents of one string to another
strcmp(), which compares two strings alphabetically
strlen(), which returns the length of a string
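The three functions listed above might be exercised as follows. The wrapper names copy_and_measure and comes_before are hypothetical and exist only for illustration.

```cpp
#include <cstring>

// Copies src into dest (which must be large enough to hold it, including
// the null character) and returns the length of the copied string.
std::size_t copy_and_measure(char* dest, const char* src) {
    std::strcpy(dest, src);
    return std::strlen(dest);
}

// Returns true if a sorts strictly before b in strcmp order.
bool comes_before(const char* a, const char* b) {
    return std::strcmp(a, b) < 0;
}
```

Note that strcmp() returns a negative, zero, or positive value, not a bool, so the result should be compared against zero.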
Define the maximum word length as a constant in the Concordance class's header file, but before the definition section of the class. Define the word type in the definition section, and note that in the client program, it will be referred to like this: Concordance::Word. The definition file can begin like this:
The nodes in the linked list hold a word, a count, and a pointer to the next node. The class's only data member is a pointer to such a node; for example:
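The example that followed here is missing from this page. One possible layout, shown only as a sketch: the names Node, wd, and first are assumptions, and only the constructor and length() are filled in so that the fragment compiles on its own.

```cpp
const int MAX_WORD_LENGTH = 8;   // assumed name for the maximum word length

class Concordance {
public:
    typedef char Word[MAX_WORD_LENGTH + 1];

    // An empty concordance is an empty list.
    Concordance() : first(nullptr) {}

    // Counts the nodes, that is, the distinct words in the list.
    int length() const {
        int n = 0;
        for (const Node* p = first; p != nullptr; p = p->next)
            ++n;
        return n;
    }

private:
    struct Node {        // hypothetical node layout
        Word wd;         // the word, null-terminated
        int count;       // number of appearances in the input
        Node* next;      // link to the next node, or nullptr at the end
    };

    Node* first;         // the class's only data member
};
```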
In the client program, build words one character at a time using a function called, say, read_word(). If the length of a word (in the input) is greater than eight, then this function will save only the first eight letters, but it will have to continue reading letters until it reaches the end of the input word; the program will discard letters beyond the first eight in each word. Don't forget to skip leading non-letters before a word begins, and terminate each word (string) with the null character.
The function strcmp(), which compares two strings, does so according to the ASCII values of their characters. To compare alphabetically, be sure that all characters are capitalized consistently: all upper-case or all lower-case.
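To see why consistent capitalization matters: in ASCII every upper-case letter has a smaller code than every lower-case letter, so strcmp() places "Zebra" before "apple". A minimal sketch of folding a word to lower case before comparing (to_lower is a hypothetical helper name):

```cpp
#include <cctype>
#include <cstring>

// Converts a null-terminated string to lower case in place.
void to_lower(char* s) {
    for (; *s != '\0'; ++s)
        *s = static_cast<char>(std::tolower(static_cast<unsigned char>(*s)));
}
```

After folding, "HOUSE" and "HouSe" both become "house" and compare equal, and "zebra" sorts after "apple" as expected.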
Explanation / Answer
The file limits.h associated with a C++ compiler defines constants
which give the ranges for each type of integer in that implementation. For
example the number of bits used to represent the type int is given by a
constant WORD_BIT, which is set to 16 in the Borland C++ compiler. This
value could be used to allow a program to avoid producing erroneous results
caused by integer overflow. For example consider the following function
which evaluates 2^n and tests that the value of n cannot produce an integer
overflow. The function return value is used to indicate success or failure of
the operation and in the case of success the value is returned via a reference
parameter.
#include <limits.h>

// Computes two to the power n.
// If two to the power n is within the range of the type int in
// this implementation, the function stores the result in power
// and returns true. Otherwise it returns false and leaves power
// unchanged.
// The largest power of two that can be represented in the type
// int is two to the power WORD_BIT-2.
bool power2(int n, int& power)
{
    int prod = 1;
    if (n >= 0 && n < WORD_BIT - 1)
    {
        for (int i = 0; i < n; i++)
            prod *= 2;
        power = prod;
        return true;
    }
    return false;
}
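WORD_BIT is not guaranteed by the C++ standard, so the function above may not compile everywhere. A sketch of the same check using std::numeric_limits from the standard header <limits> follows; power2_portable is a hypothetical name, and this is a swapped-in technique rather than the method of the text above.

```cpp
#include <limits>

// Variant of power2 that asks the standard library for the width of int.
// digits is the number of value bits in int (31 for a 32-bit int), so the
// largest representable power of two is 2 to the power digits - 1.
bool power2_portable(int n, int& power)
{
    if (n < 0 || n >= std::numeric_limits<int>::digits)
        return false;            // 2^n would not fit; power is untouched
    int prod = 1;
    for (int i = 0; i < n; i++)
        prod *= 2;
    power = prod;
    return true;
}
```

Calling power2_portable(10, p) stores 1024 in p and returns true; a negative n, or an n equal to the number of value bits, returns false.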
power.cpp

Integer constants are normally held as being of type int. However, if an integer constant is outside the range of int, it will be held as a long integer. Long integer constants can also be written with an L appended; for example, 1234567L represents the number 1,234,567.

It is worth noting that unsigned int quantities obey the rules of arithmetic modulo 2^n, where n is the number of bits used in the representation. If 16 bits are used, an unsigned int has the range 0 to 2^16 - 1 (65535). Every positive value is represented by its remainder when divided by 65536, and a negative value is represented by adding multiples of 65536 until a value between 0 and 65535 is obtained. Thus 65530 + 10 added as unsigned values gives 4 as a result. This means that great care must be taken when working with unsigned values. The following piece of C++, when run on a compiler that uses 2 bytes to represent an int,

    int u = 10;
    int v = -17;
    int sum;
    unsigned int usum;
    sum = u + v;
    usum = u + v;
    cout << "the int sum of 10 and -17 is " << sum << endl;
    cout << "the unsigned int sum of 10 and -17 is " << usum << endl;

would output:

    the int sum of 10 and -17 is -7
    the unsigned int sum of 10 and -17 is 65529

When -7 is assigned to an unsigned int, 65536 is added to it to bring it into the range 0 to 65535, giving the result 65529. Using unsigned quantities can very easily lead to errors in programs unless they are never subtracted from one another and one can guarantee that their magnitude will stay in range. Their use is not recommended.

Integer overflow occurs when an integer takes a value outside the appropriate limits. A compiler will usually detect integer overflow at compile time when it involves integer constants. At run time, however, integer overflow will not be reported and the consequences are unpredictable. Processing may continue with the integer truncated to fit, or the program may exit.
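Most current compilers use 4 bytes for an int, so the 16-bit results above will not appear with plain unsigned int. The modulo-65536 arithmetic can still be reproduced with the fixed-width type std::uint16_t from <cstdint>, as in this sketch (add_u16 and to_u16 are hypothetical helper names, and std::uint16_t is assumed to be available on the platform):

```cpp
#include <cstdint>

// Adds two values with 16-bit unsigned wraparound. The operands are
// promoted to int for the addition; the cast reduces the sum modulo 65536.
std::uint16_t add_u16(std::uint16_t a, std::uint16_t b)
{
    return static_cast<std::uint16_t>(a + b);
}

// Converts a signed value to 16-bit unsigned, that is, to the value
// modulo 65536.
std::uint16_t to_u16(int value)
{
    return static_cast<std::uint16_t>(value);
}
```

Here add_u16(65530, 10) yields 4 and to_u16(10 + -17) yields 65529, matching the figures in the text.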
Portability problems may also occur with programs which are written for a compiler that uses 4 bytes for the type int when transferred to a compiler that only uses 2 bytes for an int. These problems will only occur if values larger than 32767 are used. If this is so and portability is a concern then it is probably best to use the type long int instead.