

Question

I am having an issue with the following:



I would turn it into an NFA, then convert that to a DFA, swap the accepting and non-accepting states, then convert back to a regex... but I'm not sure how to show that as an "algorithm". Is there an easier, more efficient way?

Give an algorithm that takes as input a regular expression E and outputs a regular expression F such that L(F) is the complement of L(E), i.e. the language Σ* - L(E). Let Σ = {a, b}. Show how your algorithm works on the regular expression a ∪ b. Do not just give a regular expression for the complement of L(a ∪ b); rather, show how your algorithm obtains such a regular expression.

Explanation / Answer

[N.B. If all you want to do is parse, this section is particularly irrelevant. But if you want to hack parsing tools, this section may be useful.]

A grammar implies a set of grammar items which can be interpreted as states in an NFA. An item corresponds to a choice of one production from the grammar, and a position within that production.

For example, if a production is:

    A -> B C D

Then the items are:

    A -> * B C D
    A -> B * C D
    A -> B C * D
    A -> B C D *

An empty production, such as B -> , has exactly one item:

    B -> *

If an item has nothing to the left of the star, it is called an initial item.

If an item has nothing to the right of the star, it is called a final item.

An item says what we think we are parsing (an A, in the above examples) and what we expect the next bit of input to be (for example, in the item A -> * B C D, we next expect to parse a B).
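To make the bookkeeping concrete, here is a minimal Python sketch (the name Item and its fields are illustrative, not taken from this library): an item is just a production together with a dot position.

    from dataclasses import dataclass
    from typing import Optional, Tuple

    @dataclass(frozen=True)
    class Item:
        lhs: str               # the non-terminal being parsed, e.g. "A"
        rhs: Tuple[str, ...]   # the production body, e.g. ("B", "C", "D")
        dot: int               # position of the star, from 0 to len(rhs)

        def is_initial(self) -> bool:
            # nothing to the left of the star
            return self.dot == 0

        def is_final(self) -> bool:
            # nothing to the right of the star: a reduction is possible
            return self.dot == len(self.rhs)

        def next_symbol(self) -> Optional[str]:
            # the symbol we expect to parse next, or None for a final item
            return None if self.is_final() else self.rhs[self.dot]

    # The four items of A -> B C D are the four possible dot positions;
    # Item("A", ("B", "C", "D"), 0) is the item A -> * B C D.
    items_of_A = [Item("A", ("B", "C", "D"), d) for d in range(4)]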

These items are states in an NFA, the LR(0) NFA, with edges defined between them.

The first item in the grammar is conventionally the start state, although any initial item can be used for this purpose.

Final items are final NFA states. A final state implies that an entire expansion has been seen and now a reduction of that expansion is possible. Final states have no outward edges.

An NFA state (item) has two kinds of edges corresponding to whether the symbol to the right of the star is a terminal or non-terminal.

Terminal edges indicate that a particular type of input token is expected. If the lookahead is of that type, it can be shifted onto the parse stack while crossing the terminal edge to a new state.

Non-terminal edges indicate a point where a nested construct should be parsed. If in state I the symbol following the star is the non-terminal B, then there are epsilon edges to all initial items associated with productions of B. Since one of those epsilon paths may lead to a successful parse of a B, there is also an edge labeled B that can be used to shift the B generated by a reduction.

For example, in the state:

    A -> w * B u

it is legal to shift the non-terminal B (this is sometimes called a "goto" move). It is also legal to make an epsilon transition to states such as:

    B -> * x
    B -> * y

presuming the corresponding productions B -> x and B -> y exist. Informally: you're allowed to absorb the next burst of input as a reduction to B; therefore, you are also allowed at this point to begin working on a parse of B.
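Both kinds of edges, plus the epsilon rule, can be sketched as a single generator over items; the helper below reuses the illustrative Item class from the earlier sketch and assumes a hypothetical productions dictionary mapping each non-terminal to a list of its production bodies.

    def edges(item, productions, terminals):
        # Yield (label, target) pairs for one LR(0) NFA state (item);
        # a label of None marks an epsilon edge.
        if item.is_final():
            return                    # final states have no outward edges
        sym = item.next_symbol()
        shifted = Item(item.lhs, item.rhs, item.dot + 1)
        if sym in terminals:
            yield (sym, shifted)      # terminal edge: shift a matching token
        else:
            yield (sym, shifted)      # non-terminal edge: the "goto" move
            for body in productions.get(sym, []):
                # epsilon edges to every initial item of sym's productions
                yield (None, Item(sym, tuple(body), 0))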

The rule applies transitively, so that if A -> w * B u epsilon-transitions to B -> * C v, that item in turn epsilon-transitions to C -> * z.
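That transitive rule is just the epsilon-closure of a set of items. A small sketch, reusing the hypothetical edges helper above:

    def epsilon_closure(items, productions, terminals):
        # All items reachable from the given items along epsilon edges only.
        closure = set(items)
        stack = list(closure)
        while stack:
            item = stack.pop()
            for label, target in edges(item, productions, terminals):
                if label is None and target not in closure:
                    closure.add(target)
                    stack.append(target)
        return frozenset(closure)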

To summarize: the LR(0) NFA has final states, which specify reductions; terminal transitions, which specify permissible input; non-terminal transitions, which specify permissible feedback from the parsing of nested constructs; and epsilon edges, which specify points at which nested constructs may begin.

The above describes all the states, designates the start and final states, and defines the transition function of a non-deterministic automaton. The epsilon edges associated with non-terminal transitions prevent the automaton from being deterministic.

But the LR(1) parser requires a deterministic LR(0) automaton, so the code in this file must not only construct the LR(0) NFA but also convert that NFA into a DFA.

To parse deterministically, we apply a few stern measures. First, we combine all states accessible by epsilon transitions into superstates. Second, when components of a superstate disagree about whether to shift some input or reduce because of it, we favor shifting. Third, we only permit reductions if we can guarantee that after a series of reductions, the look-ahead token will be shifted. Fourth, if the components of a superstate disagree about which reduction rule to apply, we pick whichever one we discover first (clearly this could be improved). Fifth and finally, if only some components of a superstate think the look-ahead token is an error, we ignore them (if all components agree the look-ahead is an error, then so do we).
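As a rough illustration of how the second, fourth, and fifth measures interact (the first is assumed to have already produced the superstate, and the third is omitted for brevity), a conflict in one superstate could be resolved as below; choose_action is a hypothetical helper, not this library's actual code.

    def choose_action(superstate, lookahead, terminals):
        # Prefer shifting; otherwise take the first reduction discovered;
        # report an error only if every component treats the lookahead as one.
        reductions = []
        for item in superstate:
            if item.is_final():
                reductions.append(("reduce", item))
            elif item.next_symbol() == lookahead and lookahead in terminals:
                return ("shift", lookahead)
        if reductions:
            return reductions[0]
        return ("error", lookahead)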

It's useful to prove that a particular grammar has the property that the components of a reachable superstate will never disagree about which reduction rule to apply. This can be proven automatically. One convenient technique for LALR(1) grammars is to convert them to yacc syntax and run a table generator on them. Some day, grammar debugging tools will be added to this library.
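Alternatively, once the superstates have been built, the property can be checked directly by scanning each superstate for two distinct final items, i.e. two competing reductions. A hedged sketch (the function name is illustrative, and a real checker would also report which states and rules conflict):

    def has_reduce_reduce_conflict(superstates):
        # A superstate containing two distinct final items offers two
        # different reductions, so its components disagree.
        for superstate in superstates:
            finals = [item for item in superstate if item.is_final()]
            if len(finals) > 1:
                return True
        return False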

In the usual way, we can construct a DFA from the LR(0) NFA, combining the multiple NFA states accessible along epsilon paths into single DFA superstates.
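Here is a generic subset-construction sketch under the same assumptions (the edges and epsilon_closure helpers above): each superstate is an epsilon-closed frozenset of items, and each DFA edge follows every like-labeled, non-epsilon NFA edge and then closes the result. The same construction, applied to an NFA built from a regular expression, is the NFA-to-DFA step in the questioner's proposed complement algorithm.

    def subset_construction(start_items, productions, terminals):
        # Build the DFA as a map from each superstate to its labeled successors.
        start = epsilon_closure(start_items, productions, terminals)
        dfa, worklist = {}, [start]
        while worklist:
            state = worklist.pop()
            if state in dfa:
                continue
            moves = {}
            for item in state:
                for label, target in edges(item, productions, terminals):
                    if label is not None:
                        moves.setdefault(label, set()).add(target)
            dfa[state] = {label: epsilon_closure(targets, productions, terminals)
                          for label, targets in moves.items()}
            worklist.extend(dfa[state].values())
        return start, dfa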