Question
Fastest possible duplicate detection
WARNING: On the big file you (might, depending on your machine) have to use the following execution command to ensure enough JVM heap space:
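(The execution command itself was not preserved with this question. As an illustration only, with the heap size being an assumption rather than the original value and Lab10 being the class name used in the answer below, a run that raises the maximum JVM heap would look like:

java -Xmx2g Lab10 big.txt
)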
Background
This lab is about pure speed and efficiency. Not only must the output be correct, it must be produced with the correct container, used as efficiently as possible. Your program must take the name of a huge file of strings on the command line. The output of the program will be either "UNIQUE" or "NOT UNIQUE". No other output of any kind is expected or allowed except a newline after the word.
You can get the output perfect and still get a ZERO on this lab! Just use the wrong container, or use it suboptimally.
You must test your program on both the small file and the big one.
The small.txt file prints UNIQUE (and a newline) to the screen because it contains no duplicate words.
The big file prints NOT UNIQUE (and a newline) to the screen because it contains a duplicate word.
You must:
50% - choose the optimal container: ArrayList, TreeSet, or HashSet.
50% - call only the absolute minimum number of optimal method(s) on the container to solve the problem.
You will not just be graded on the correct answer. You must use the right container AND use it most efficiently.
Your program must use the command line and args[0] to read the input file.
There are a lot of words in the list, so I'll give you just a few. Please use a container such as ArrayList, TreeSet, or HashSet, and use a minimal number of methods on the container. Thank you.
small.txt
big.txt
Explanation / Answer
HashSet is the right container here because it offers expected O(1) time for its add() and contains() methods.
HashSet's add() method returns true if the element was not already in the set and false otherwise. A false return value means the set already contains that element, so the current word is a duplicate (see the short sketch after the list below).
* ArrayList takes O(n) to search for an element, so checking each new word for a duplicate is far slower.
* TreeSet takes O(log n) for add() and remove(), which is still slower than HashSet's expected O(1).
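As a minimal sketch of that add() return-value behavior (the words used here are made up purely for illustration):

import java.util.HashSet;

public class AddReturnDemo {
    public static void main(String[] args) {
        HashSet<String> set = new HashSet<>();
        System.out.println(set.add("apple"));  // true: "apple" was not yet in the set
        System.out.println(set.add("banana")); // true: "banana" was not yet in the set
        System.out.println(set.add("apple"));  // false: "apple" is a duplicate
    }
}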
The code below tests whether the words in a file are unique:
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.HashSet;

public class Lab10 {
    public static void main(String[] args) {
        String file = args[0];
        // try-with-resources closes the reader automatically
        try (BufferedReader br = new BufferedReader(new FileReader(file))) {
            HashSet<String> set = new HashSet<>();
            boolean unique = true;
            String line;
            while ((line = br.readLine()) != null) {
                // add() returns false if the word is already in the set
                if (!set.add(line)) {
                    unique = false;
                    break; // stop at the first duplicate
                }
            }
            System.out.println(unique ? "UNIQUE" : "NOT UNIQUE");
        } catch (IOException ioe) {
            ioe.printStackTrace();
        }
    }
}
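To verify, compile and run it against both files; per the assignment, small.txt should print UNIQUE and the big file should print NOT UNIQUE (add the heap-size option mentioned in the warning if the big file exhausts the default heap):

javac Lab10.java
java Lab10 small.txt
java Lab10 big.txt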