In this project you will implement and explore the performance of different kind
ID: 3573015 • Letter: I
Question
In this project you will implement and explore the performance of different kinds of spell checkers. I have provided a text file containing about 110,000 English words on eCourseware for you to use. You can assume that all words contain only lower-case letters. Approach 1: Binary Search Tree One easy way of implementing a spell checker is to use a binary search tree. Given a list of valid words, you can insert them into a BST. Checking whether a word is spelled correctly simply involves searching the tree for that word. If the word is found, great – if not, it must be misspelled! Remember that finding items in a BST takes O(log n) time on average, where n is the number of elements in the tree. Thus, the expected time cost of using this spell checker depends on the number of words that it’s storing. 1. (14 pts) Write a BSTSpellChecker class. This class should have an instance variable of type BinarySearchTree to store its word list. You can use the BinarySearchTree code that we wrote in lecture. BSTSpellChecker should contain the following methods: add(String s) Adds the specified word to the tree. contains(String s) Returns whether the specified word is in the tree. A method that reads a list of words from a file and adds them all into the tree. Add the words to the tree in the same order that they appear in the file (i.e., alphabetically). (Note – you will probably have to use non-recursive versions of the BST add and find methods to prevent stack overflows here!) d. Remember that adding things in order into a BST is a bad idea – this will result in a linked list with O(n) performance. Write another version of the above method that adds the words in a balanced manner. You can do this by first adding the word in the middle of the file, then recursively adding the words in the middle of the left and right halves, and so on. A good spell checker should also give some suggested corrections for words that are misspelled. Some common techniques of doing this are to take the misspelled word and see if any valid words can be formed by: Swapping any pair of adjacent characters in the word. Inserting a new character at any position in the word. (For a word containing n characters, there should be n + 1 insertions tested.) Deleting a character at any position in the word. Replacing the character at each position in the word with a new character. Inserting a space between any pair of adjacent characters in the word, to see if two smaller valid words can be formed. Of course, not all results from these transformations should be included as suggested corrections – only those that form valid words! Each result must be tested to see if it’s contained within the spell checker’s collection of words. 2. (12 pts) Within your BSTSpellChecker class, add a method closeMatches(String s) that returns a java.util.Set of valid words that are obtained by applying these transformations to the string s. Approach 2: Hash Table As we discussed in class, hash tables allow (in theory) the “holy grail” of O(1) performance for adding and retrieving items from the table. Since every add and get operation requires hashing, the hash function plays a critical role in the performance of a hash table. A hash function is a poor choice if it takes too long to compute, or if it results in frequent collisions. In this part you’ll explore the relative performance of a few different string hash functions. (8 pts) Write a HashTable class that supports the following operations: add(String newItem) Adds the specified newItem to the hash table. get(String someItem) Finds and returns the specified someItem from the table if it exists. If someItem doesn’t exist in the table, the method should return null. This HashTable class should resolve collisions via chaining. You can use either java.util.ArrayList or java.util.LinkedList for the lists at each table index. Use an initial table length of 5, and rehash the table as soon as the load factor reaches 1.0. The load factor is computed the same way as in a hash table with open addressing: load factor = (number of elements in the table) / (length of the data array). Rehashing should change the table length to 2*(old length) + 1. HashTable should also include a constructor that allows you to specify a StringHasher object indicating what type of hash function to use (more on that in the next part). (Note: You can use the HashMapChained class that we wrote in lecture as a starting point. However, note that HashTable is a little simpler – it’s designed to store only strings, and there is no separate value associated with each string key.) (8 pts) Write a StringHasher interface that contains a single method, int hash(String s). This interface should be implemented by three different classes, each of which defines a different string hash function: The first hash function is very simple – it returns 0 all the time. This makes it a good candidate for the Worst Hash Function EverTM, since collisions are guaranteed! The second hash function sums the ASCII values of all characters in the string to determine the hash. This gives a decent chance that different strings will hash to different values, but it still does not consider the placement of characters within the string. Different strings that are permutations of each other – for example, “mean” vs. “name” – will hash to the same value. The third hash function improves the second by also taking the location of each character into consideration. One popular way of doing this is the “djb2” hash function, which you can find C code for here: http://www.cse.yorku.ca/~oz/hash.html. pseudocode: function hash(string s): hash = 5381 for each character c in s: hash = hash*33 + c return hash C syntax is very close to Java, but here’s the algorithm in 5. (8 pts) Write a HashTableSpellChecker class. This class should have an instance of HashTable to store its word list. It should have the same functionality as your BSTSpellChecker class from previously. Specifically, HashTableSpellChecker should support these operations: add(String s) Adds the specified word to the spell checker. contains(String s) Returns whether the specified word is in the spell checker. A method that reads a list of words from a file and adds them all to the spell checker. closeMatches(String s) Returns a java.util.Set of valid words that are obtained by applying the previously mentioned transformations to the string s. I recommend creating an abstract superclass for both BSTSpellChecker and HashTableSpellChecker to save yourself some coding! Both the file reading and closeMatches methods can be placed in this superclass. Putting It All Together 6. (10 pts) Write a client program that allows the user to pick one of your five spell checkers (the unbalanced BST, the balanced BST, or any of the three hash table versions). The program should create a spell checker object of that type and then allow the user to enter any number of sentences to check. For each sentence, show the misspelled words and a list of suggested corrections for each one. Also have your program display some statistics when it creates the spell checker object: The time required to read the word list (For hash tables) The number of hash collisions that occurred during spell checker creation (For hash tables) The percentage of indices in the final table that are occupied (15 pts) are set aside for coding style and documentation.
Explanation / Answer
Hope this will help you:
class BSTSpellChecker
{
public static void main(String[] args) throws IOException {
System.out.println("Spell Checker Application");
System.out.println("--------------------------------");
Scanner scan = new Scanner(System.in);
String inputFileName,dictionaryName;
System.out.println("What is the input file for spelling check?");
inputFileName = scan.nextLine();
System.out.println("What is the dictionary file?");
dictionaryName = scan.nextLine();
Scanner d = new Scanner(new File(dictionaryName));
ArrayList<String> wordsAll = new ArrayList<String>();
int lines = 0;
String line = null;
BufferedReader reader = new BufferedReader(new FileReader(inputFileName));
while((line = reader.readLine()) != null){
wordsAll.add(line);
lines++;
}
ArrayList<String> wordsAllFinal = new ArrayList<String>();
int length = 0;
for(int i = 0; i<lines; i++){
String sentence = wordsAll.get(i);
String[] tokens = sentence.split(" ");
for(String word : tokens){
wordsAllFinal.add(word);
}
}
BinarySearchTree tree = new BinarySearchTree();
String inp;
for(int o = 0; o < wordsAllFinal.size(); o++) {
inp = wordsAllFinal.get(o);
tree.insert(inp);
}
System.out.println(wordsAllFinal.get(2));
int words =0, misspelled=0;
ArrayList<String> miss = new ArrayList<String>();
Scanner in = new Scanner(new File(inputFileName));
while(in.hasNext()){
for(int i=0; i<wordsAllFinal.size(); i++){
words++;
if(tree.retrieve(wordsAllFinal.get(i))==null){
misspelled++;
miss.add(wordsAllFinal.get(i));
}
}
}
in.close();
System.out.println("Spell checking ...done");
System.out.println("Possible misspelled words in the file:");
for(int i=0;i<miss.size();i++){
System.out.println(miss.get(i));
}
System.out.println("Total number of words is "+ words);
System.out.println("Possible number of misspelled words is "+ misspelled);
scan.close();
}
}
Using_Hash_table:
class SpellChecker {
private static String stringInput; // input to check;
private static String[] checkThis; // the stringInput turned array of words to check.
public static HashSet dictionary; // the dictionary used
/**
* Main method.
* @param args Argh!
*/
public static void main(String[] args) {
setup();
}//end of main
/**
* This method loads the dictionary and initiates the checks for errors in a scanned input.
*/
public static void setup(){
int tableSIZE=59000;
dictionary = new HashSet(tableSIZE);
try {
//System.out.print(System.getProperty("user.dir"));//just to find user's working directory;
// I combined FileReader into the BufferReader statement
//the file is located in edu.frostburg.cosc310
BufferedReader bufferedReader = new BufferedReader(new FileReader("./dictionary.txt"));
String line = null; // notes one line at a time
while((line = bufferedReader.readLine()) != null) {
dictionary.add(line);//add dictinary word in
}
prompt();
bufferedReader.close(); //close file
}
catch(FileNotFoundException ex) {
ex.printStackTrace();//print error
}
catch(IOException ex) {
ex.printStackTrace();//print error
}
}//end of setUp
/**
* Just a prompt for auto generated tests or manual input test.
*/
public static void prompt(){
System.out.println("Type a number from below: ");
System.out.println("1. Auto Generate Test 2.Manual Input 3.Exit");
Scanner theLine = new Scanner(System.in);
int choice = theLine.nextInt(); // for manual input
if(choice==1) autoTest();
else if(choice==2) startwInput();
else if (choice==3) System.exit(0);
else System.out.println("Invalid Input. Exiting.");
}
/**
* Manual input of sentence or words.
*/
public static void startwInput(){
//printDictionary(bufferedReader); // print dictionary
System.out.println("Spell Checker by C. Austria Please enter text to check: ");
Scanner theLine = new Scanner(System.in);
stringInput = theLine.nextLine(); // for manual input
System.out.print(" You have entered this text: "+stringInput+" Initiating Check...");
/*------------------------------------------------------------------------------------------------------------*/
//final long startTime = System.currentTimeMillis(); //speed test
WordFinder grammarNazi = new WordFinder(); //instance of MisSpell
splitString(removePunctuation(stringInput));//turn String line to String[]
grammarNazi.initialCheck(checkThis);
//final long endTime = System.currentTimeMillis();
//System.out.println("Total execution time: " + (endTime - startTime) );
}//end of startwInput
/**
* Generates a testing case.
*/
public static void autoTest(){
System.out.println("Spell Checker by C. Austria This sentence is being tested: The dog foud my hom. And m ct hisse xdgfchv!@# ");
WordFinder grammarNazi = new WordFinder(); //instance of MisSpell
splitString(removePunctuation("The dog foud my hom. And m ct hisse xdgfchv!@# "));//turn String line to String[]
grammarNazi.initialCheck(checkThis);
}//end of autoTest
/**
* This method prints the entire dictionary.
* Was used in testing.
* @param bufferedReader the dictionary file
*/
public static void printDictionary(BufferedReader bufferedReader){
String line = null; // notes one line at a time
try{
while((line = bufferedReader.readLine()) != null) {
System.out.println(line);
}
}catch(FileNotFoundException ex) {
ex.printStackTrace();//print error
}
catch(IOException ex) {
ex.printStackTrace();//print error
}
}//end of printDictionary
/**
* This methods splits the passed String and puts them into a String[]
* @param sentence The sentence that needs editing.
*/
public static void splitString(String sentence){
// split the sentence in between " " aka spaces
checkThis = sentence.split(" ");
}//end of splitString
/**
* This method removes the punctuation and capitalization from a string.
* @param sentence The sentence that needs editing.
* @return the edited sentence.
*/
public static String removePunctuation(String sentence){
String newSentence; // the new sentence
//remove evil punctuation and convert the whole line to lowercase
newSentence = sentence.toLowerCase().replaceAll("[^a-zA-Z\s]", "").replaceAll("\s+", " ");
return newSentence;
}//end of removePunctuation
}
This class checks for misspellings
public class WordFinder extends SpellChecker{
private int wordsLength;//length of String[] to check
private List<String> wrongWords = new ArrayList<String>();//stores incorrect words
/**
* This methods checks the String[] for spelling errors.
* Hashes each index in the String[] to see if it is in the dictionary HashSet
* @param words String list of misspelled words to check
*/
public void initialCheck(String[] words){
wordsLength=words.length;
System.out.println();
for(int i=0;i<wordsLength;i++){
//System.out.println("What I'm checking: "+words[i]); //test only
if(!dictionary.contains(words[i])) wrongWords.add(words[i]);
} //end for
//manualWordLookup(); //for testing dictionary only
if (!wrongWords.isEmpty()) {
System.out.println("Mistakes have been made!");
printIncorrect();
} //end if
if (wrongWords.isEmpty()) {
System.out.println(" Move along. End of Program.");
} //end if
}//end of initialCheck
/**
* This method that prints the incorrect words in a String[] being checked and generates suggestions.
*/
public void printIncorrect(){//delete this guy
System.out.print("These words [ ");
for (String wrongWord : wrongWords) {
System.out.print(wrongWord + " ");
}//end of for
System.out.println("]seems incorrect. ");
suggest();
}//end of printIncorrect
/**
* This method gives suggestions to the user based on the wrong words she/he misspelled.
*/
public void suggest(){
MisSpell test = new MisSpell();
while(!wrongWords.isEmpty()&&test.possibilities.size()<=5){
String wordCheck=wrongWords.remove(0);
test.generateMispellings(wordCheck);
//if the possibilities size is greater than 0 then print suggestions
if(test.possibilities.size()>=0) test.print(test.possibilities);
}//end of while
}//end of suggest
/*ENTERING TEST ZONE*/
/**
* This allows a tester to look thorough the dictionary for words if they are valid; and for testing only.
*/
public void manualWordLookup(){
System.out.print("Enter 'ext' to exit. ");
Scanner line = new Scanner(System.in);
String look=line.nextLine();
do{
if(dictionary.contains(look)) System.out.print(look+" is valid ");
else System.out.print(look+" is invalid ");
look=line.nextLine();
}while (!look.equals("ext"));
}//end of manualWordLookup
}
/**
* This is the main class responsible for generating misspellings.
* @author Catherine Austria
*/
public class MisSpell extends SpellChecker{
public List<String> possibilities = new ArrayList<String>();//stores possible suggestions
private List<String> tempHolder = new ArrayList<String>(); //telps for the transposition method
private int Ldistance=0; // the distance related to the two words
private String wrongWord;// the original wrong word.
/**
* Execute methods that make misspellings.
* @param wordCheck the word being checked.
*/
public void generateMispellings(String wordCheck){
wrongWord=wordCheck;
try{
concatFL(wordCheck);
concatLL(wordCheck);
replaceFL(wordCheck);
replaceLL(wordCheck);
deleteFL(wordCheck);
deleteLL(wordCheck);
pluralize(wordCheck);
transposition(wordCheck);
}catch(StringIndexOutOfBoundsException e){
System.out.println();
}catch(ArrayIndexOutOfBoundsException e){
System.out.println();
}
}
/**
* This method concats the word behind each of the alphabet letters and checks if it is in the dictionary.
* FL for first letter
* @param word the word being manipulated.
*/
public void concatFL(String word){
char cur; // current character
String tempWord=""; // stores temp made up word
for(int i=97;i<123;i++){
cur=(char)i;//assign ASCII from index i value
tempWord+=cur;
//if the word is in the dictionary then add it to the possibilities list
tempWord=tempWord.concat(word); //add passed String to end of tempWord
checkDict(tempWord); //check to see if in dictionary
tempWord="";//reset temp word to contain nothing
}//end of for
}//end of concatFL
/**
* This concatenates the alphabet letters behind each of the word and checks if it is in the dictionary. LL for last letter.
* @param word the word being manipulated.
*/
public void concatLL(String word){
char cur; // current character
String tempWord=""; // stores temp made up word
for(int i=123;i>97;i--){
cur=(char)i;//assign ASCII from index i value
tempWord=tempWord.concat(word); //add passed String to end of tempWord
tempWord+=cur;
//if the word is in the dictionary then add it to the possibilities list
checkDict(tempWord);
tempWord="";//reset temp word to contain nothing
}//end of for
}//end of concatLL
/**
* This method replaces the first letter (FL) of a word with alphabet letters.
* @param word the word being manipulated.
*/
public void replaceFL(String word){
char cur; // current character
String tempWord=""; // stores temp made up word
for(int i=97;i<123;i++){
cur=(char)i;//assign ASCII from index i value
tempWord=cur+word.substring(1,word.length()); //add the ascii of i ad the substring of the word from index 1 till the word's last index
checkDict(tempWord);
tempWord="";//reset temp word to contain nothing
}//end of for
}//end of replaceFL
/**
* This method replaces the last letter (LL) of a word with alphabet letters
* @param word the word being manipulated.
*/
public void replaceLL(String word){
char cur; // current character
String tempWord=""; // stores temp made up word
for(int i=97;i<123;i++){
cur=(char)i;//assign ASCII from index i value
tempWord=word.substring(0,word.length()-1)+cur; //add the ascii of i ad the substring of the word from index 1 till the word's last index
checkDict(tempWord);
tempWord="";//reset temp word to contain nothing
}//end of for
}//end of replaceLL
/**
* This deletes first letter and sees if it is in dictionary
* @param word the word being manipulated.
*/
public void deleteFL(String word){
String tempWord=word.substring(1,word.length()-1); // stores temp made up word
checkDict(tempWord);
//print(possibilities);
}//end of deleteFL
/**
* This deletes last letter and sees if it is in dictionary
* @param word the word being manipulated.
*/
public void deleteLL(String word){
String tempWord=word.substring(0,word.length()-1); // stores temp made up word
checkDict(tempWord);
//print(possibilities);
}//end of deleteLL
/**
* This method pluralizes a word input
* @param word the word being manipulated.
*/
public void pluralize(String word){
String tempWord=word+"s";
checkDict(tempWord);
}//end of pluralize
/**
* It's purpose is to check a word if it is in the dictionary.
* If it is, then add it to the possibilities list.
* @param word the word being checked.
*/
public void checkDict(String word){
if(dictionary.contains(word)){//check to see if tempWord is in dictionary
//if the tempWord IS in the dictionary, then check if it is in the possibilities list
//then if tempWord IS NOT in the list, then add tempWord to list
if(!possibilities.contains(word)) possibilities.add(word);
}
}//end of checkDict
/**
* This method transposes letters of a word into different places.
* Not the best implementation. This guy was my last minute addition.
* @param word the word being manipulated.
*/
public void transposition(String word){
wrongWord=word;
int wordLen=word.length();
String[] mixer = new String[wordLen]; //String[] length of the passed word
//make word into String[]
for(int i=0;i<wordLen;i++){
mixer [i]=word.substring(i,i+1);
}
shift(mixer);
}//end of transposition
/**
* This method takes a string[] list then shifts the value in between
* the elements in the list and checks if in dictionary, adds if so.
* I agree that this is probably the brute force implementation.
* @param mixer the String array being shifted around.
*/
public void shift(String[] mixer){
System.out.println();
String wordValue="";
for(int i=0;i<=tempHolder.size();i++){
resetHelper(tempHolder);//reset the helper
transposeHelper(mixer);//fill tempHolder
String wordFirstValue=tempHolder.remove(i);//remove value at index in tempHolder
for(int j=0;j<tempHolder.size();j++){
int inttemp=0;
String temp;
while(inttemp<j){
temp=tempHolder.remove(inttemp);
tempHolder.add(temp);
wordValue+=wordFirstValue+printWord(tempHolder);
inttemp++;
if(dictionary.contains(wordValue)) if(!possibilities.contains(wordValue)) possibilities.add(wordValue);
wordValue="";
}//end of while
}//end of for
}//end for
}//end of shift
/**
* This method fills a list tempHolder with contents from String[]
* @param wordMix the String array being shifted around.
*/
public void transposeHelper(String[] wordMix){
for(int i=0;i<wordMix.length;i++){
tempHolder.add(wordMix[i]);
}
}//end of transposeHelper
/**
* This resets a list
* @param thisList removes the content of a list
*/
public void resetHelper(List<String> thisList){
while(!thisList.isEmpty()) thisList.remove(0); //while list is not empty, remove first value
}//end of resetHelper
/**
* This method prints out a list
* @param listPrint the list to print out.
*/
public void print(List<String> listPrint){
if (possibilities.isEmpty()) {
System.out.print("Can't seem to find any related words for "+wrongWord);
return;
}
System.out.println("Maybe you meant these for "+wrongWord+": ");
System.out.printf("%s", listPrint);
resetHelper(possibilities);
}//end of print
/**
* This returns a String word version of a list
* @param listPrint the list to make into a word.
* @return the generated word version of a list.
*/
public String printWord(List<String> listPrint){
Object[] suggests = listPrint.toArray();
String theWord="";
for(Object word: suggests){//form listPrint elements into a word
theWord+=word;
}
return theWord;
}//end of printWord
}
Related Questions
drjack9650@gmail.com
Navigate
Integrity-first tutoring: explanations and feedback only — we do not complete graded work. Learn more.