


Question

We will look at a small subset of the HTML specification.  HTML documents consist of text separated by tags.  A tag is formed by a ‘<’ character, followed by the tag information, followed by a ‘>’ character.  Some tags stand alone as single elements, while others come in pairs: an opening tag and a closing tag.  For an opening tag, the tag information is a tag name, possibly followed by other information.  For a closing tag, the tag information is a ‘/’ character, followed by the tag name.  When tags come in pairs, they must be properly nested; if an opening tag <a> is followed by an opening tag <b>, then </b> must occur before </a>.  The same tag may not be nested within itself.  The exception is lists, which can be nested inside each other.
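As a rough illustration of the tag syntax described above, the following sketch splits a string into tag tokens and text tokens. The names Token, tokenize, and trim are illustrative and are not taken from Parser.hs. The sketch also assumes that a '>' appearing outside a tag is ordinary text, which the description above does not settle.

```haskell
import Data.Char (isSpace)
import Data.List (dropWhileEnd)

-- A token is either a tag (the information between '<' and '>')
-- or a run of plain text between tags.
data Token = Tag String | Text String
  deriving (Eq, Show)

-- Split a document into tokens.
-- Returns Nothing if a '<' is never closed by a matching '>'.
tokenize :: String -> Maybe [Token]
tokenize [] = Just []
tokenize ('<' : rest) =
  case break (== '>') rest of
    (info, '>' : rest') -> (Tag (trim info) :) <$> tokenize rest'
    _                   -> Nothing
tokenize s =
  let (txt, rest) = break (== '<') s
      txt'        = trim txt
  in (if null txt' then id else (Text txt' :)) <$> tokenize rest

-- Drop leading and trailing whitespace; whitespace-only text yields "".
trim :: String -> String
trim = dropWhileEnd isSpace . dropWhile isSpace
```

For instance, tokenize "<p> Here <b> bold </b>" gives Just [Tag "p", Text "Here", Tag "b", Text "bold", Tag "/b"], and tokenize "<p" gives Nothing because the tag is never closed.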

The structure for an HTML document is as follows:

Document: <head> … </head> <body> … </body>

               The text between <head> and </head> cannot contain tags.

               The body can contain text and tags, with the following tags defined:

Begin new paragraph: <p>

Line break: <br>

Bold: <b> … </b>

Italic: <i> … </i>

Hyperlink: <a href="address"> … </a>

               The address must consist of text (no tags).

Lists:  Lists can be ordered or unordered.  Either kind of list consists of an opening and closing tag pair enclosing a sequence of list items.

Ordered list: <ol> … </ol>

Unordered list: <ul> … </ul>

List item: <li>

Note that whitespace in the HTML document, including spaces, tabs, and newline characters, is ignored.
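The tag vocabulary listed above can be summarized as a small classification function. This is only a sketch: TagKind, tagName, and kindOf are hypothetical names, and treating <head> and <body> as ordinary paired tags is an assumption made for brevity.

```haskell
import Data.Char (isSpace)

-- How a tag behaves in the subset described above.
data TagKind = Single | Paired | Unknown
  deriving (Eq, Show)

-- The tag name is the first word of the tag information,
-- ignoring a leading '/' on closing tags.
tagName :: String -> String
tagName info =
  takeWhile (not . isSpace) (dropWhile (== '/') (dropWhile isSpace info))

-- Classify a tag by its name; anything outside the subset is Unknown.
kindOf :: String -> TagKind
kindOf info
  | name `elem` ["p", "br", "li"]                            = Single
  | name `elem` ["head", "body", "b", "i", "a", "ol", "ul"] = Paired
  | otherwise                                                = Unknown
  where name = tagName info
```

With these definitions, kindOf "a href=\"test.doc\"" and kindOf "/ul" are both Paired, kindOf "br" is Single, and kindOf "table" is Unknown.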

For example, the following is a valid HTML document:

<head> Example 1 </head>

<body>

<p> Here is some <b> bold </b> and some <i> italic </i> text.

<p> In another paragraph, <a href="test.doc"> here </a> is a hyperlink.  Now we’ll create a list:

<ul>

<li> Item 1

More of item 1

<p> Second paragraph in item 1

<li> Item 2

<ul>

<li> Subitem 2.1

<li> Subitem 2.2

</ul> <li> Item 3

</ul>

</body>

On the other hand, here are some invalid HTML documents:

<head> Bad Example A </head> <body>

<a href="test.doc"> another <a href="test2.doc"> link </a> </a>

<p> Creating <i> Italics and <b> bold italics </i> and now something else. </b>

<ol>

<li> Item 1

<li> Item 2

</body>

Problems with this document include:

·        Nested hyperlinks

·        Italics and bold are not nested correctly

·        No closing tag for the ordered list
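All three problems listed for Bad Example A are nesting errors, and each can be detected with a stack of currently open tags. The sketch below works on a simplified, already tokenized list of events. Event and wellNested are hypothetical names, and the check covers only nesting: it does not verify, for example, that <li> appears inside a list.

```haskell
-- Simplified tag events: an opening tag name or a closing tag name.
data Event = Open String | Close String
  deriving (Eq, Show)

-- Tags that stand alone and never take a closing tag in this subset.
singles :: [String]
singles = ["p", "br", "li"]

-- Tags allowed to nest inside themselves (the list exception).
selfNesting :: [String]
selfNesting = ["ol", "ul"]

-- Walk the events, keeping a stack of the paired tags that are still open.
wellNested :: [Event] -> Bool
wellNested = go []
  where
    -- At the end of input, every paired tag must have been closed.
    go stack [] = null stack
    go stack (Open t : es)
      -- Single tags are not pushed, since nothing ever closes them.
      | t `elem` singles = go stack es
      -- A non-list tag may not appear inside itself, e.g. <a> inside <a>.
      | t `elem` stack && t `notElem` selfNesting = False
      | otherwise = go (t : stack) es
    go (top : stack) (Close t : es)
      -- A closing tag must match the innermost open tag.
      | t == top = go stack es
    -- Mismatched closing tag, or a closing tag with nothing open.
    go _ _ = False
```

Under these assumptions, [Open "a", Open "a", Close "a", Close "a"] is rejected as a nested hyperlink, [Open "i", Open "b", Close "i", Close "b"] is rejected because </i> arrives while <b> is still open, and [Open "ol", Open "li", Open "li"] is rejected because the list is never closed.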

If you are interested in more about HTML, you can look at the full specification:

http://www.w3.org/TR/html-markup

Write a Haskell parser, named html, that returns True if the given string is a valid HTML document and False otherwise.  You can import Parser.hs and use any of its functions in your code.  As an example, if we have:

doc1 = "<head> Test1 </head> <body> Testing <p> Test <b> bold </b> text.  <ol> <li> test item <li> another one </ol> </body>"

doc2 = "<head> Test2 <body> Testing </body>"

doc3 = "<head> Test3 </head> <body> <a href=\"test.doc\">hyperlink <i> text </a> </i> </body>"

(Note that quotation marks inside the string can potentially be problematic.)

Then we would get:

> parse html doc1

(True, "")

> parse html doc2

(False, "<head> Test2 <body> Testing </body>")

> parse html doc3

(False, "<head> Test3 </head> <body> <a href=\"test.doc\">hyperlink <i> text </a> </i> </body>")
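The sample results suggest a convention: on success the parser returns True together with the unconsumed input, and on failure it returns False together with the original string. The sketch below illustrates that shape only. It does not use Parser.hs, whose interface is not shown here; Recognizer, token, plainText, headSection, and asBool are hypothetical names, and headSection covers just the <head> … </head> part of a document.

```haskell
import Data.Char (isSpace)
import Data.List (stripPrefix)

-- A recognizer consumes a prefix of the input and returns the rest,
-- or Nothing if the input does not match.
type Recognizer = String -> Maybe String

-- Match a literal string after skipping leading whitespace.
token :: String -> Recognizer
token lit = stripPrefix lit . dropWhile isSpace

-- Consume plain text up to, but not including, the next '<'.
plainText :: Recognizer
plainText = Just . dropWhile (/= '<')

-- Recognize only the head section: <head> text </head>.
headSection :: Recognizer
headSection s = token "<head>" s >>= plainText >>= token "</head>"

-- Wrap a recognizer in the (Bool, remaining input) shape of the examples:
-- on failure the original input is returned untouched.
asBool :: Recognizer -> String -> (Bool, String)
asBool r s = case r s of
  Just rest -> (True, rest)
  Nothing   -> (False, s)
```

Here asBool headSection "<head> Test1 </head>" evaluates to (True, ""), while asBool headSection applied to the doc2 string returns False with the input unchanged, because <body> appears before any </head>.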

Explanation / Answer

It is best not to expose the implementation of our parser to its users. When we explicitly used pairs for state earlier, we ran into trouble almost immediately, as soon as we considered extending the capabilities of the parser. To avoid repeating that difficulty, we hide the details of the parser type behind a newtype declaration:

-- file: doc1/Parse.hs
newtype Parse doc1 = Parse { runParse :: ParseState -> Either String (doc1, ParseState) }

-- file: doc2/Parse.hs
newtype Parse doc2 = Parse { runParse :: ParseState -> Either String (doc2, ParseState) }

-- file: doc3/Parse.hs
newtype Parse doc3 = Parse { runParse :: ParseState -> Either String (doc3, ParseState) }

In each declaration, doc1, doc2, and doc3 are type variables, so the three definitions are identical up to renaming. A parser of this type takes a ParseState and returns either an error message or a result paired with the new state. ParseState itself is not defined in this answer.
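Since the answer leaves ParseState undefined, the following minimal sketch shows one way the newtype could be made to compile and run. The ParseState fields, anyChar, and parseString are assumptions introduced for illustration; they are not part of the answer above or of Parser.hs.

```haskell
-- Hypothetical parse state: the unconsumed input and the number of
-- characters consumed so far.
data ParseState = ParseState
  { remaining :: String
  , offset    :: Int
  } deriving (Eq, Show)

-- The parser type from the answer, with a conventional type variable name.
newtype Parse a = Parse
  { runParse :: ParseState -> Either String (a, ParseState) }

-- Consume one character, or fail with a message at end of input.
anyChar :: Parse Char
anyChar = Parse $ \st -> case remaining st of
  []       -> Left ("unexpected end of input at offset " ++ show (offset st))
  (c : cs) -> Right (c, ParseState cs (offset st + 1))

-- Run a parser on a string from offset 0, discarding the final state.
parseString :: Parse a -> String -> Either String a
parseString p s = fst <$> runParse p (ParseState s 0)
```

For example, parseString anyChar "<p>" is Right '<', and parseString anyChar "" is Left "unexpected end of input at offset 0". Callers that go through parseString never see how the state is represented, which is the point of hiding it behind the newtype.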
