Academic Integrity: tutoring, explanations, and feedback — we don’t complete graded work or submit on a student’s behalf.

*In all problems, the dot operator of regular expressions is omitted. Consider t

ID: 3888858 • Letter: #

Question

*In all problems, the dot operator of regular expressions is omitted.

Consider the following four regular expressions

R1 = (a+b)*c(a+b)*ca*

R2 = (a+b)*c(a+b)*c(a+b)*

R3 = (a+b)*c(a+b)*c*(a+b)*

R4 = (a+b)*c*(a+b)*c(a+b)*

R5 = R1 + R2

a) Assuming we have a function getToken() that returns a token which consists of two fields: a token_type and a lexeme. Assume that calling getToken() when the end of input is reached returns a token whose token_type is EOF.

b) Give an input for which two successive calls to getToken() returns two tokens whose token_types are R1 and EOF respectively

c)Given an input for which two successive calls to getToken() returns two tokens whose token_types are R2 and EOF respectively

d)Given an input for which two successive calls to getToken() returns two tokens whose token_types are R3 and EOF respectively

e)Given an input for which two successive calls to getToken() returns two tokens whose token_types are R4 and EOF respectively

f)Explain why there is no input for which getToken() returns a token whose token_type is R5

Explanation / Answer

b) Input : abcaabcaa

     (a+b)* This expression qualify any combination of 'a' and 'b'.

     (a+b)*c This expression qualify any combination of 'a' and 'b' followed by a 'c'

     (a+b)*c(a+b)* This expression qualify any combination of 'a' and 'b' followed by a 'c' followed by any   combination of 'a' and 'b'

     (a+b)*c(a+b)*ca* : This expression qualify any combination of 'a' and 'b' followed by a 'c' followed by any   combination of 'a' and 'b' followed by 'c' and then by any number of 'a'

   Thus, input abcaabcaa will return token type R1 and EOF when two successive calls to getToken() will be made.

(b) Input: abcabbcabb

     Any string that can be generated from expression R1 can also be generated from R2 but not vice-versa as the only difference between R1 and R2 is bold part i.e. (a+b)*c(a+b)*ca* and (a+b)*c(a+b)*c(a+b)*

Expression R2 can generate any combination of 'a' and 'b' after last 'c' but R1 can generate string of 'a' of any length after last 'c'

(c) Input: acaccaba

This string can be generated by R3 and the explanation is same as above and hence two successive calls to getToken() will return R3 and EOF

(d) Input: abccabcaabb

Two successive calls to the given input will return token R4 and EOF. This follows the same explanation as above

(e) As R5 =R1+ R2 that meansR5 is a combination of R1 and R2. A string that can be generated by R5 can also be generated by R1 or R2 as it is a combination of both and hence there is no input for which getToken() returns R5 as token_type because getToken() will either return R1 or R2

For example, if input is aabbcabcabb

This string can be generated by both regular expression R2 and R5 but Token_type returned will be R2 because getToken() will start checking from regular expression R1 , R2 and so on. When it checks R2 it gets it's match and returns R2. It will never reach R5 for returning token_type.