ID: 3639897 • Letter: I
Question
input: S, a list of characters (reverse of the original formula). output: T, a list of tokens.
variables: T: list<list<Nat>>, P: list<Nat>, A: state
// T is the list of tokens processed so far, P is the current token in progress, and A is the state of P as described above.
initialization: T=[], P=[], A=FRESH
Algorithm:
while true do
    if last(S) is a white space character then
        remove any and all white space from the back of S
        if P is a valid token then
            push P onto the back of T
            clear P to empty
            set A = FRESH
        else if P is not empty and does not contain a valid token then
            halt and print "Tokenizing Error"
    if S is empty then
        if P is empty then halt and return T
        else if valid_token(A) then
            push P onto the back of T
            halt and return T
        else
            halt and print "Tokenizing Error"
    if P++[last(S)] is the beginning of some valid token (that is, if can_push(A, last(S))) then
        push last(S) onto the back of P
        set A to next_state(A, last(S))
        pop the last item off of S
    else
        if P is a token (that is, if valid_token(A)) then
            push P onto the back of T
            set P to []
            set A to FRESH
        else
            halt and print "Tokenizing Error"
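The loop above can be sketched in C++ as follows. This is a hypothetical rendering, not the official lab code: it keeps S reversed so that last(S) is `S.back()`, represents tokens as strings, and supplies placeholder versions of can_push, valid_token, and next_state (the functions the lab asks you to write).

```cpp
#include <cassert>
#include <cctype>
#include <cstring>
#include <stdexcept>
#include <string>
#include <vector>

enum State { FRESH, ID, NUM, SPEC, WHITE };

static bool is_special(char c) { return std::strchr("~&^*+-()[]:,<>=", c) != nullptr; }
static bool is_white(char c)   { return c == ' ' || c == '\t' || c == '\n' || c == '\r'; }

// Placeholder versions of the three lab functions, so this sketch runs on its own.
static bool valid_token(State a) { return a != FRESH; }
static bool can_push(State a, char c) {
  switch (a) {
    case FRESH: return std::isalpha((unsigned char)c) || std::isdigit((unsigned char)c) ||
                       is_special(c) || is_white(c);
    case ID:    return std::isalnum((unsigned char)c) || c == '_';
    case NUM:   return std::isdigit((unsigned char)c) != 0;
    case SPEC:  return false;                 // special tokens have length exactly 1
    case WHITE: return is_white(c);
  }
  return false;
}
static State next_state(State a, char c) {
  if (a != FRESH) return a;                   // ID, NUM, and WHITE stay put
  if (std::isalpha((unsigned char)c)) return ID;
  if (std::isdigit((unsigned char)c)) return NUM;
  if (is_special(c)) return SPEC;
  return WHITE;
}

// Direct transcription of the pseudocode; S is the reverse of the input text.
std::vector<std::string> tokenize(std::string S) {
  std::vector<std::string> T;   // tokens emitted so far
  std::string P;                // token in progress
  State A = FRESH;
  while (true) {
    if (!S.empty() && is_white(S.back())) {
      while (!S.empty() && is_white(S.back())) S.pop_back();
      if (valid_token(A)) { T.push_back(P); P.clear(); A = FRESH; }
      else if (!P.empty()) throw std::runtime_error("Tokenizing Error");
    }
    if (S.empty()) {
      if (P.empty()) return T;
      if (valid_token(A)) { T.push_back(P); return T; }
      throw std::runtime_error("Tokenizing Error");
    }
    if (can_push(A, S.back())) {
      P.push_back(S.back());
      A = next_state(A, S.back());
      S.pop_back();
    } else if (valid_token(A)) {
      T.push_back(P); P.clear(); A = FRESH;
    } else {
      throw std::runtime_error("Tokenizing Error");
    }
  }
}
```

For example, `tokenize("24 + 1x")` (the reverse of `x1 + 42`) yields the tokens `x1`, `+`, `42`. Note that white space is still discarded by this version of the loop.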
Tokens of the new language will be numerals, identifiers, special tokens, and white space tokens, where
a numeral is a string of one or more digits,
an identifier is a string of letters, digits, and underscores, beginning with a letter,
a special token is a string of length 1, whose only member is an allowed special character,
the allowed special characters are ~ & ^ * + - ( ) [ ] : , < > =, and
a white space token is a string of one or more white space characters (tab, space, line feed, or new line).
enum state {FRESH, ID, NUM, SPEC, WHITE};
state(P,FRESH) -- P is empty.
state(P,ID) -- P begins with a letter and contains only letters, digits, and underscores.
state(P,NUM) -- P is not empty and contains only digits.
state(P,SPEC) -- P contains a single allowed special character.
state(P,WHITE) -- P is a nonempty stack of white space characters (space, tab, line feed, or new line).
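The predicates above can be checked mechanically. The following helper is purely illustrative (it is not part of the assignment): given a stack P, read front to back as a string, it returns the state whose invariant P satisfies, or throws if P is not the prefix of any token.

```cpp
#include <cassert>
#include <cctype>
#include <cstring>
#include <stdexcept>
#include <string>

enum State { FRESH, ID, NUM, SPEC, WHITE };

static bool is_special(char c) { return std::strchr("~&^*+-()[]:,<>=", c) != nullptr; }
static bool is_white(char c)   { return c == ' ' || c == '\t' || c == '\n' || c == '\r'; }

// Returns the unique state A with state(P, A), or throws if no predicate holds.
State state_of(const std::string& P) {
  if (P.empty()) return FRESH;
  if (std::isalpha((unsigned char)P[0])) {
    for (char c : P)
      if (!std::isalnum((unsigned char)c) && c != '_')
        throw std::invalid_argument("no state matches P");
    return ID;
  }
  if (std::isdigit((unsigned char)P[0])) {
    for (char c : P)
      if (!std::isdigit((unsigned char)c))
        throw std::invalid_argument("no state matches P");
    return NUM;
  }
  if (P.size() == 1 && is_special(P[0])) return SPEC;
  if (is_white(P[0])) {
    for (char c : P)
      if (!is_white(c)) throw std::invalid_argument("no state matches P");
    return WHITE;
  }
  throw std::invalid_argument("no state matches P");
}
```

Note that a string like `++` matches no predicate, since special tokens have length exactly 1.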
The maxmunch loop will have to be modified to account for the fact that white space is no longer ignored. The goal for today's lab session, however, is just to write the new versions of can_push, valid_token, and next_state, and to get help doing it from your neighbors and from the TA. The specifications of these three functions are the same as previously, but their behavior will change in the context of the new, simplified token set. The completed version of the new tokenizer will be due at the end of next week's lab (Mar 9).
bool can_push(state A, char c)
// Tells whether the result of pushing character c onto a stack in state A
// forms the beginning of a valid token.
//
// all P: all A: all c: state(P,A) & can_push(A,c) -> some L: token(L) & prefix(P++[c],L)
//
// see Test 1 solutions for definition of prefix
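With the new token set, can_push could plausibly look like this. This is a sketch, not the required solution; it assumes characters are plain `char` and that white space means space, tab, line feed, or carriage return.

```cpp
#include <cassert>
#include <cctype>
#include <cstring>

enum State { FRESH, ID, NUM, SPEC, WHITE };

static bool is_special(char c) { return std::strchr("~&^*+-()[]:,<>=", c) != nullptr; }
static bool is_white(char c)   { return c == ' ' || c == '\t' || c == '\n' || c == '\r'; }

// Can pushing c onto a stack in state A still be the prefix of some token?
bool can_push(State A, char c) {
  switch (A) {
    case FRESH: return std::isalpha((unsigned char)c) || std::isdigit((unsigned char)c) ||
                       is_special(c) || is_white(c);   // any token can start here
    case ID:    return std::isalnum((unsigned char)c) || c == '_';
    case NUM:   return std::isdigit((unsigned char)c) != 0;
    case SPEC:  return false;                          // special tokens have length exactly 1
    case WHITE: return is_white(c);
  }
  return false;
}
```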
bool valid_token(state A)
// tells whether stacks in state A are valid tokens
//
// all P: all A: state(P,A) -> (valid_token(A) <-> token(P))
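Under this token set, every nonempty stack the tokenizer can actually build is already a complete token: one letter is an identifier, one digit is a numeral, one special character is a special token, one white space character is a white space token. So a plausible valid_token (a sketch under that observation, not the official answer) is simply:

```cpp
#include <cassert>

enum State { FRESH, ID, NUM, SPEC, WHITE };

// Only the empty stack (state FRESH) fails to be a token.
bool valid_token(State A) { return A != FRESH; }
```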
state next_state(state A, char c)
// tells the state of the stack resulting from pushing character c onto a stack in state A
//
// all A: all P: all c: state(P,A) & can_push(A,c) -> state(P++[c], next_state(A,c))
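A matching sketch of next_state, assuming its precondition can_push(A, c) holds (so the SPEC case never has to transition):

```cpp
#include <cassert>
#include <cctype>
#include <cstring>

enum State { FRESH, ID, NUM, SPEC, WHITE };

static bool is_special(char c) { return std::strchr("~&^*+-()[]:,<>=", c) != nullptr; }
static bool is_white(char c)   { return c == ' ' || c == '\t' || c == '\n' || c == '\r'; }

// State of P++[c], given state(P, A) and can_push(A, c).
State next_state(State A, char c) {
  if (A != FRESH) return A;  // ID, NUM, and WHITE absorb their own characters
  if (std::isalpha((unsigned char)c)) return ID;
  if (std::isdigit((unsigned char)c)) return NUM;
  if (is_special(c)) return SPEC;
  return WHITE;              // can_push guarantees c is white space here
}
```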
Explanation / Answer
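For the eventual loop modification mentioned above, one possibility (an illustrative sketch under the placeholder helpers below, not the required solution) is to delete the special white space branch entirely: since white space is now just another token class, it flows through can_push, valid_token, and next_state exactly like letters and digits do.

```cpp
#include <cassert>
#include <cctype>
#include <cstring>
#include <stdexcept>
#include <string>
#include <vector>

enum State { FRESH, ID, NUM, SPEC, WHITE };

static bool is_special(char c) { return std::strchr("~&^*+-()[]:,<>=", c) != nullptr; }
static bool is_white(char c)   { return c == ' ' || c == '\t' || c == '\n' || c == '\r'; }

// Placeholder versions of the three lab functions for this sketch.
static bool valid_token(State a) { return a != FRESH; }
static bool can_push(State a, char c) {
  switch (a) {
    case FRESH: return std::isalpha((unsigned char)c) || std::isdigit((unsigned char)c) ||
                       is_special(c) || is_white(c);
    case ID:    return std::isalnum((unsigned char)c) || c == '_';
    case NUM:   return std::isdigit((unsigned char)c) != 0;
    case SPEC:  return false;
    case WHITE: return is_white(c);
  }
  return false;
}
static State next_state(State a, char c) {
  if (a != FRESH) return a;
  if (std::isalpha((unsigned char)c)) return ID;
  if (std::isdigit((unsigned char)c)) return NUM;
  if (is_special(c)) return SPEC;
  return WHITE;
}

// Max-munch with white space kept: no special white space branch at all.
std::vector<std::string> tokenize_keep_white(std::string S) {  // S is reversed input
  std::vector<std::string> T;
  std::string P;
  State A = FRESH;
  while (true) {
    if (S.empty()) {
      if (P.empty()) return T;
      if (valid_token(A)) { T.push_back(P); return T; }
      throw std::runtime_error("Tokenizing Error");
    }
    if (can_push(A, S.back())) {
      P.push_back(S.back());
      A = next_state(A, S.back());
      S.pop_back();
    } else if (valid_token(A)) {
      T.push_back(P); P.clear(); A = FRESH;
    } else {
      throw std::runtime_error("Tokenizing Error");
    }
  }
}
```

On the reverse of `x1 + 42`, this version emits the white space runs as tokens: `x1`, `" "`, `+`, `" "`, `42`.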