Write a method named stripHtmlTags that accepts a Scanner representing an input
Question
Write a method named stripHtmlTags that accepts a Scanner representing an input file as its parameter, then reads that file, assuming that the file contains an HTML web page, and prints
the file's text with all HTML tags removed. A tag is any text between < and > characters. For
example, if the file contains the following text:
<html>
<head>
<title>My web page</title>
</head>
<body>
<p>There are many pictures of my cat here,
as well as my <b>very cool</b> blog page,
which contains <font color="red">awesome</font>
stuff about my trip to Vegas.<p>
Here's my cat now: <img src="cat.jpg">
</body>
</html>
Your program should output the following text:
My web page
There are many pictures of my cat here,
as well as my very cool blog page,
which contains awesome
stuff about my trip to Vegas.
Here's my cat now:
You may assume that the file is a well-formed HTML document and that no tag contains a <
or > character inside itself.
Explanation / Answer
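import java.io.File;
import java.io.FileNotFoundException;
import java.util.Scanner;

public class StripHtmlTags {
    // Prints the file's text with everything between '<' and '>' removed.
    public static void stripHtmlTags(Scanner sc) {
        // Kept outside the line loop so a tag that wraps onto the next line stays skipped.
        boolean insideTag = false;
        while (sc.hasNextLine()) {
            String line = sc.nextLine();
            int charPrinted = 0;
            for (int i = 0; i < line.length(); i++) {
                char c = line.charAt(i);
                if (insideTag) {
                    // Discard tag contents; '>' ends the tag.
                    if (c == '>') {
                        insideTag = false;
                    }
                } else if (c == '<') {
                    // Checking every character also handles back-to-back tags such as <b><i>.
                    insideTag = true;
                } else {
                    System.out.print(c);
                    charPrinted++;
                }
            }
            // Lines that contained nothing but tags produce no output line.
            if (charPrinted >= 1) {
                System.out.println();
            }
        }
    }

    public static void main(String[] args) {
        File f = new File("htm.txt");
        // try-with-resources closes the Scanner even if reading fails.
        try (Scanner sc = new Scanner(f)) {
            stripHtmlTags(sc);
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        }
    }
}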
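An alternative worth knowing is to let a regular expression do the scanning. The sketch below is not part of the original answer: the class name RegexTagStripper and the helper stripTags are hypothetical names chosen for illustration. It assumes every tag opens and closes on the same line, and it keeps the rule that a line left empty after stripping is not printed.

```java
import java.util.Scanner;
import java.util.regex.Pattern;

class RegexTagStripper {
    // Matches '<', then any run of characters other than '>', then '>'.
    private static final Pattern TAG = Pattern.compile("<[^>]*>");

    // Hypothetical helper: returns the line with every complete tag removed.
    static String stripTags(String line) {
        return TAG.matcher(line).replaceAll("");
    }

    // Prints the stripped text line by line, skipping lines left empty.
    // Assumption: no tag spans more than one line.
    static void stripHtmlTags(Scanner sc) {
        while (sc.hasNextLine()) {
            String stripped = stripTags(sc.nextLine());
            if (!stripped.isEmpty()) {
                System.out.println(stripped);
            }
        }
    }

    public static void main(String[] args) {
        // A Scanner over a String stands in for the input file here.
        String html = "<p>as well as my <b>very cool</b> blog page,\n"
                    + "<b><i>nested</i></b>\n"
                    + "</body>\n";
        stripHtmlTags(new Scanner(html));
    }
}
```

Run as written, the demo prints "as well as my very cool blog page," followed by "nested"; the line holding only </body> is skipped. Returning a String from stripTags makes the stripping logic easy to check without capturing console output.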