Question
For this problem you must use Minipedia.java from the code distribution (currently implemented using the DumbList.java data structure to search for titles and to insert and delete articles) as a template to write a program, MiniGoogle.java, which allows searching the text of articles by keyword using Cosine Similarity (described previously). The steps in this process are as follows:
To find articles most similar to a search phrase P:
Use the ArticleTable iterator methods to enumerate all the articles in the ArticleTable;
For each article, compare the body of the article with the search phrase P (as if they are two documents), by first preprocessing the strings (described below), then using the TermFrequencyTable class to calculate the Cosine Similarity (calling it once for each article to compare with the search phrase);
Insert any articles with a cosine similarity greater than 0.001 into a binary heap (implementing a max priority queue).
Call getMax() for this heap to print out the top three matching articles; if no articles are found, print out "No matching articles found!" but if only 1 or 2 are found, simply print them out.
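The comparison step above can be sketched as follows. This is a minimal illustration, not the course's TermFrequencyTable: the class name CosineSketch, the helper termFrequencies, and the preprocessing (lowercasing and replacing non-alphanumeric characters with spaces) are assumptions, since the assignment's own preprocessing rules are not reproduced here.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for TermFrequencyTable; names and preprocessing are assumptions.
class CosineSketch {
    // Count how often each word occurs after a simple, assumed cleanup step.
    static Map<String, Integer> termFrequencies(String text) {
        Map<String, Integer> freq = new HashMap<>();
        String cleaned = text.toLowerCase().replaceAll("[^a-z0-9\\s]", " ");
        for (String word : cleaned.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                freq.merge(word, 1, Integer::sum);
            }
        }
        return freq;
    }

    // Cosine similarity = dot(a, b) / (|a| * |b|) over the two frequency vectors.
    static double cosineSimilarity(String s, String t) {
        Map<String, Integer> a = termFrequencies(s);
        Map<String, Integer> b = termFrequencies(t);
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            int x = e.getValue();
            normA += x * x;
            dot += x * b.getOrDefault(e.getKey(), 0);
        }
        for (int y : b.values()) {
            normB += y * y;
        }
        if (normA == 0.0 || normB == 0.0) {
            return 0.0; // an empty document is treated as matching nothing
        }
        return dot / Math.sqrt(normA * normB);
    }

    public static void main(String[] args) {
        String phrase = "binary heap";
        String body = "A binary heap is a heap.";
        double cos = cosineSimilarity(phrase, body);
        // Only scores above the assignment's 0.001 threshold would go into the heap.
        System.out.println("similarity = " + cos + ", insert = " + (cos > 0.001));
    }
}
```

Treating the search phrase as a very short document keeps the comparison symmetric, so the same routine serves both inputs.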
In order to do this, you should write the following methods.
Note that you will construct a TermFrequencyTable for each article that you compare with the search phrase.
For the heap, you may use the code from the class web site, modifying it to use the Article class (or your own class) instead of integers. Remember to insert into the heap only those articles whose cosine similarity with the search phrase is greater than 0.001.
Create your heap as a file MaxHeap.java.
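One possible shape for such a heap is sketched below. It is an assumption-laden illustration rather than the class web site's code: the name MaxHeapSketch, the generic payload type, and the method names are hypothetical, and the answer further down calls insert(cos, a), isEmpty(), and getMaxAsString() on a MaxHeap class that is not shown.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical array-backed max-heap: each entry pairs a similarity score with a payload.
class MaxHeapSketch<T> {
    private final List<Double> keys = new ArrayList<>();
    private final List<T> values = new ArrayList<>();

    boolean isEmpty() {
        return keys.isEmpty();
    }

    // Append at the end, then sift up while the new key exceeds its parent's key.
    void insert(double key, T value) {
        keys.add(key);
        values.add(value);
        int i = keys.size() - 1;
        while (i > 0) {
            int parent = (i - 1) / 2;
            if (keys.get(parent) >= keys.get(i)) {
                break;
            }
            swap(i, parent);
            i = parent;
        }
    }

    // Remove and return the payload with the largest key, then sift the new root down.
    T getMax() {
        if (isEmpty()) {
            throw new IllegalStateException("heap is empty");
        }
        T top = values.get(0);
        int last = keys.size() - 1;
        swap(0, last);
        keys.remove(last);
        values.remove(last);
        int i = 0;
        while (true) {
            int left = 2 * i + 1, right = left + 1, largest = i;
            if (left < keys.size() && keys.get(left) > keys.get(largest)) largest = left;
            if (right < keys.size() && keys.get(right) > keys.get(largest)) largest = right;
            if (largest == i) {
                break;
            }
            swap(i, largest);
            i = largest;
        }
        return top;
    }

    private void swap(int i, int j) {
        double k = keys.get(i);
        keys.set(i, keys.get(j));
        keys.set(j, k);
        T v = values.get(i);
        values.set(i, values.get(j));
        values.set(j, v);
    }

    public static void main(String[] args) {
        MaxHeapSketch<String> heap = new MaxHeapSketch<>();
        heap.insert(0.20, "Heapsort");
        heap.insert(0.90, "Binary heap");
        heap.insert(0.40, "Priority queue");
        // Print at most three matches, best first.
        for (int rank = 1; rank <= 3 && !heap.isEmpty(); rank++) {
            System.out.println("Hit #" + rank + ": " + heap.getMax());
        }
    }
}
```

Storing the score next to the payload means the Article class itself does not need to know about similarity, which may or may not match how the course code is organized.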
Explanation / Answer
import java.util.*;
public class MiniGoogle
{
private static Article[] getArticleList(DatabaseIterator db)
{
int count = db.getNumArticles();
Article[] list = new Article[count];
for(int i = 0; i < count; ++i)
list[i] = db.next();
return list;
}
private static DatabaseIterator setupDatabase(String path)
{
return new DatabaseIterator(path);
}
private static void addArticle(Scanner s, ArticleTable T)
{
System.out.println();
System.out.println("Add an article");
System.out.println("==============");
System.out.print("Enter article title: ");
String title = s.nextLine();
System.out.println("You can now enter the body of the article.");
System.out.println("Press return two times when you are done.");
String body = "";
String line = "";
do
{
line = s.nextLine();
body += line + " ";
}
while (!line.equals(""));
T.insert(new Article(title, body));
}
private static void removeArticle(Scanner s, ArticleTable T)
{
System.out.println();
System.out.println("Remove an article");
System.out.println("=================");
System.out.print("Enter article title: ");
String title = s.nextLine();
T.delete(title);
}
private static void search(Scanner s, ArticleTable T)
{
System.out.println();
System.out.println("Search by search phrase");
System.out.println("=======================");
System.out.print("Enter search phrase: ");
String phrase = s.nextLine();
T.reset();
Article a = null;
double cos;
MaxHeap h = new MaxHeap();
while(T.hasNext())
{
a = T.next();
cos = getCosineSimilarity(phrase, a.getBody());
if(cos > 0.001) // insert only articles above the required similarity threshold
h.insert(cos, a);
}
//h.printHeap();
System.out.println();
if(h.isEmpty())
{
System.out.println("No matching articles found!");
//return;
}
else
{
System.out.println("Top match: " + h.getMaxAsString() + " ");
// Print the second and third matches only if they exist; with fewer hits, just stop.
for(int i = 2; i <= 3 && !h.isEmpty(); i++)
System.out.println("Hit #" + i + ": " + h.getMaxAsString() + " ");
}
System.out.println("Press return when finished reading.");
s.nextLine();
}
private static double getCosineSimilarity(String s, String t)
{
TermFrequencyTable termTbl = new TermFrequencyTable();
termTbl.initialize(s, t);
return termTbl.cosineSimilarity();
}
public static void main(String[] args)
{
Scanner user = new Scanner(System.in);
String dbPath = "articles/";
DatabaseIterator db = setupDatabase(dbPath);
System.out.println("Read " + db.getNumArticles() +" articles from disk.");
ArticleTable T = new ArticleTable();
Article[] A = getArticleList(db);
T.initialize(A);
int choice = -1;
do
{
System.out.println();
System.out.println("Welcome to Mini-Google!");
System.out.println("=====================");
System.out.println("Make a selection from the " +
"following options:");
System.out.println();
System.out.println("Manipulating the database");
System.out.println("-------------------------");
System.out.println(" 1. add a new article");
System.out.println(" 2. remove an article");
System.out.println();
System.out.println("Searching the database");
System.out.println("----------------------");
System.out.println(" 3. search using search phrase");
System.out.println();
System.out.print("Enter a selection (1-3, or 0 to quit): ");
choice = user.nextInt();
user.nextLine();
switch (choice)
{
case 0:
return;
case 1:
addArticle(user, T);
break;
case 2:
removeArticle(user, T);
break;
case 3:
search(user, T);
break;
default:
break;
}
}
while (true); // the loop ends only when the user enters 0, which returns from main
}
}