Question
PLEASE WRITE CODE IN JAVA.
PLEASE INCLUDE WHICH WIKI PAGE USED
We are going to play with analyzing the contents of Wikipedia pages. The Wikipedia pages are in XML format and were downloaded from: In the above directory are articles from Wikipedia. In class, we learned about the idea of stop words. You can find a list of stop words at http://www.ranks.nl/stopwords. Go to the Wikipedia page https://dumps.wikimedia.org/backup-index.html and pick any of the wikis to download.
A3: Create a program that records the number of occurrences of these stop words. Output each stop word and its occurrence count to a file.
A4: Create a program that records all words that are not stop words and the number of occurrences of those words. Be careful if you are using arrays: you might run out of space.
Explanation / Answer
Hi, I have written programs for both parts above. I used the dewiki (German Wikipedia) dump.
Program 1:
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class Test {
    public static void main(String[] args) throws IOException {
        // Read the multistream index file; each line looks like "offset:pageId:title",
        // so the third ":"-separated field (index 2) is the page title.
        File f1 = new File("C:/Users/Santhosh/Desktop/dewiki-20180720-pages-articles-multistream-index.txt");
        BufferedReader br = new BufferedReader(new FileReader(f1));
        List<String> main = new ArrayList<String>();
        String line = br.readLine();
        while (line != null) {
            String[] ab = line.split(":");
            if (ab.length > 2) {
                main.add(ab[2]);
            }
            line = br.readLine();
        }
        br.close();
        System.out.println(main.size());

        // Read the stop-word list, one word per line.
        File f2 = new File("C:/Users/Santhosh/Desktop/stopwords.txt");
        br = new BufferedReader(new FileReader(f2));
        List<String> stopwords = new ArrayList<String>();
        line = br.readLine();
        while (line != null) {
            stopwords.add(line.trim());
            line = br.readLine();
        }
        br.close();

        // Write each stop word and its occurrence count to the output file,
        // one entry per line.
        BufferedWriter bw = new BufferedWriter(new FileWriter(new File("C:/Users/Santhosh/Desktop/output.txt")));
        StringBuilder sb = new StringBuilder();
        for (String s1 : stopwords) {
            sb.append(s1).append(": ").append(Collections.frequency(main, s1)).append(System.lineSeparator());
        }
        bw.write(sb.toString());
        bw.close();
    }
}
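One caveat with Program 1: Collections.frequency rescans the entire token list once per stop word, which is slow on a full Wikipedia dump. A minimal sketch of a single-pass alternative (class and method names here are hypothetical, not part of the program above):

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class StopWordCounter {
    // Count how often each stop word appears in the token list in one pass.
    // Every stop word is pre-seeded with 0 so words that never occur are
    // still reported; non-stop words are simply skipped by computeIfPresent.
    static Map<String, Integer> countStopWords(List<String> tokens, List<String> stopwords) {
        Map<String, Integer> counts = new HashMap<>();
        for (String w : stopwords) {
            counts.put(w, 0);
        }
        for (String t : tokens) {
            counts.computeIfPresent(t, (k, v) -> v + 1); // increment only known stop words
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> tokens = List.of("the", "cat", "and", "the", "dog");
        List<String> stop = List.of("the", "and", "or");
        System.out.println(countStopWords(tokens, stop)); // e.g. {or=0, the=2, and=1} (order may vary)
    }
}
```

This replaces the O(tokens × stopwords) scan with a single O(tokens) pass plus constant-time map lookups.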
Program 2:
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class Test {
    public static void main(String[] args) throws IOException {
        // Read the multistream index file; the third ":"-separated field is the page title.
        File f1 = new File("C:/Users/Santhosh/Desktop/dewiki-20180720-pages-articles-multistream-index.txt");
        BufferedReader br = new BufferedReader(new FileReader(f1));
        List<String> main = new ArrayList<String>();
        String line = br.readLine();
        while (line != null) {
            String[] ab = line.split(":");
            if (ab.length > 2) {
                main.add(ab[2]);
            }
            line = br.readLine();
        }
        br.close();
        System.out.println(main.size());

        // Read the stop-word list, one word per line.
        File f2 = new File("C:/Users/Santhosh/Desktop/stopwords.txt");
        br = new BufferedReader(new FileReader(f2));
        List<String> stopwords = new ArrayList<String>();
        line = br.readLine();
        while (line != null) {
            stopwords.add(line.trim());
            line = br.readLine();
        }
        br.close();

        // Keep only the tokens that are not stop words.
        List<String> collList = new ArrayList<String>();
        for (String s1 : main) {
            if (!stopwords.contains(s1)) {
                collList.add(s1);
            }
        }

        // Distinct non-stop words. (The original code created this set but never
        // filled it, so the output file was always empty; building it from
        // collList fixes that.)
        Set<String> set1 = new HashSet<String>(collList);

        // Write each non-stop word and its occurrence count, one entry per line.
        BufferedWriter bw = new BufferedWriter(new FileWriter(new File("C:/Users/Santhosh/Desktop/output.txt")));
        StringBuilder sb = new StringBuilder();
        for (String s1 : set1) {
            sb.append(s1).append(": ").append(Collections.frequency(collList, s1)).append(System.lineSeparator());
        }
        bw.write(sb.toString());
        bw.close();
    }
}
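The assignment warns that fixed-size arrays can run out of space on a large dump. Program 2 already avoids that by using collections, but its counting step can be condensed further: a Map grows on demand and counts everything in one pass. A minimal sketch (class and method names are hypothetical):

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

public class NonStopWordCounter {
    // Count every token that is not a stop word in a single pass.
    // The Map grows as needed, so there is no fixed array size to overflow;
    // TreeMap keeps the result alphabetically sorted for readable output.
    static Map<String, Integer> countNonStopWords(List<String> tokens, Set<String> stopwords) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String t : tokens) {
            if (!stopwords.contains(t)) {
                counts.merge(t, 1, Integer::sum); // insert with 1, or add 1 to the existing count
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> tokens = List.of("cat", "the", "dog", "cat");
        Set<String> stop = Set.of("the", "and");
        System.out.println(countNonStopWords(tokens, stop)); // prints {cat=2, dog=1}
    }
}
```

Using a Set for the stop words also makes each membership check O(1) instead of the O(n) List.contains scan in Program 2.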
Please test it and let me know of any issues. Thanks, and all the best.