Algorithm Tokenize ( string s ): 1. list ( string ) tokens := [ ] 2. while not i

ID: 3808134 • Letter: A

Question

Algorithm Tokenize(string s):

1. list(string) tokens := [ ]

2. while not isEmpty(s)

a. If s begins with a token, remove the longest possible token from the beginning of s and push that token onto the back of tokens

b. If head(s) is a whitespace character, pop(s).

3. return tokens

For example, below is a trace of the call tokenize("hello, world 3+1")

# it  tokens                              s
0     []                                  "hello, world 3+1"
1     ["hello"]                           ", world 3+1"
2     ["hello",","]                       " world 3+1"
3     ["hello",","]                       "world 3+1"
4     ["hello",",","world"]               " 3+1"
5     ["hello",",","world"]               "3+1"
6     ["hello",",","world","3"]           "+1"
7     ["hello",",","world","3","+"]       "1"
8     ["hello",",","world","3","+","1"]   ""
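The loop in the algorithm can be sketched in Python as follows. The question never defines what counts as a token, so the patterns in TOKEN_PATTERNS are assumptions inferred from the trace and the test cases (quoted strings, numbers, words, two-character comparisons, and any other single non-space character). Treat this as one possible reading, not as the intended solution.

```python
import re

# Hypothetical token patterns inferred from the examples; the question
# itself does not define the token grammar.
TOKEN_PATTERNS = [
    re.compile(r"'(?:\\.|[^'\\])*'"),     # single-quoted string, backslash escapes allowed
    re.compile(r"[0-9]+(?:\.[0-9]+)?"),   # integer or decimal number
    re.compile(r"[A-Za-z]+"),             # word made of letters
    re.compile(r"[<>=!]=|\S"),            # two-character comparison, else one non-space character
]


def tokenize(s):
    tokens = []                                        # step 1
    while s:                                           # step 2: while not isEmpty(s)
        # Step 2a: collect every pattern that matches at the start of s
        # and keep the longest match.
        candidates = [m.group() for m in (p.match(s) for p in TOKEN_PATTERNS) if m]
        if candidates:
            longest = max(candidates, key=len)
            tokens.append(longest)
            s = s[len(longest):]
        # Step 2b: drop one leading whitespace character.
        if s and s[0].isspace():
            s = s[1:]
    return tokens                                      # step 3


print(tokenize("hello, world 3+1"))
```

Taking the longest candidate is what makes "3.3" come out as one number and ">=" as one operator instead of two single characters.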

Homework:

Implement Tokenize in Python. That is, write a Python function called tokenize such that if s is a string for which Tokenize(s) is defined, then Tokenize(s) == tokenize(s).

Test cases:

Note the following

tokenize("hello world")

["hello", "world"]

tokenize("hello'world'foo")

["hello", "'world'", "foo"]


tokenize("'hello\'world'")

["'hello\'world'"]

tokenize("'hello world'")

["'hello world'"]

tokenize("3.33(33..")

not part of our agreement

tokenize("\")

["\"]

tokenize(" ")

[ ]

tokenize("'a'e'c'")

["'a'", "e", "'c'"]

tokenize("3.3+1")

["3.3","+","1"]

tokenize("''")

["''"]

tokenize("<59>=6")

["<","59",">=","6"]

tokenize("+")

["+"]

tokenize("foo'4$!\\h\'32\t'88")

["foo","'4$!\\h\'32\t'","88"]

tokenize("'hello'wor'ld")

not part of our agreement

For example, you could enter into repl.it or IDLE,

>>> tokenize("'hello\'world'") == ["'hello\'world'"]
True
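Checking the cases one at a time in the REPL gets tedious, so a small harness can run several of them at once. The sketch below is an assumption about how one might organise that: CASES holds only those test cases whose literals are unambiguous in Python, failing_cases is a hypothetical helper name, and str.split is a stand-in so the example runs on its own (it is not a solution to the homework).

```python
# Expected results copied from the listed test cases (only the ones
# without backslashes or quotes, to avoid escaping questions).
CASES = [
    ("hello world", ["hello", "world"]),
    (" ", []),
    ("3.3+1", ["3.3", "+", "1"]),
    ("<59>=6", ["<", "59", ">=", "6"]),
    ("+", ["+"]),
]


def failing_cases(tokenize_fn, cases=CASES):
    """Return (input, expected, actual) for every case tokenize_fn gets wrong."""
    failures = []
    for text, expected in cases:
        actual = tokenize_fn(text)
        if actual != expected:
            failures.append((text, expected, actual))
    return failures


# str.split only splits on whitespace; it is a placeholder, so some
# cases are expected to fail here.
for text, expected, actual in failing_cases(str.split):
    print(repr(text), "expected", expected, "got", actual)
```

Passing your own tokenize in place of str.split should produce no output once every listed case agrees.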

Send your tokenizer as an attached .py file

Explanation / Answer

We will need Python regular expressions (the re module).

Let us look at an example to understand how they are used.

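[..] is a character class: it matches any single character listed inside the brackets. For example, [0-9] matches one digit, and [0-9]+ matches a run of one or more digits. A leading ^, as in [^0-9], inverts the class so that it matches any character except those listed inside the [].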
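As a small illustration of character classes, the sketch below applies three of them to a sample string with re.findall. The sample text and variable names are made up for the example.

```python
import re

text = "a33+1"

# [0-9] matches exactly one character that is a digit.
digits = re.findall(r"[0-9]", text)

# [0-9]+ matches a run of one or more digits.
runs = re.findall(r"[0-9]+", text)

# [^0-9] matches one character that is NOT a digit.
others = re.findall(r"[^0-9]", text)

print(digits)   # ['3', '3', '1']
print(runs)     # ['33', '1']
print(others)   # ['a', '+']
```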

To use this with characters such as ', which already have a special meaning in Python, put a backslash (\) before them. The backslash is the escape character: it makes Python treat ' as an ordinary character rather than as a string delimiter. Please read further on Python regex.

To match a backslash itself, write four backslashes (\\\\) in an ordinary string literal: Python reduces each pair to a single backslash, and the regex engine then reads the remaining two as one literal backslash. In a raw string (r"...") two backslashes are enough.
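The two layers of escaping (the string literal first, then the regex engine) can be checked directly. This is a minimal sketch; the variable names are only for illustration.

```python
import re

# In an ordinary string literal, \' is just an escape for a quote,
# so no backslash ends up in the string.
escaped_quote = "hello\'world"
same_string = (escaped_quote == "hello'world")

# A literal backslash is written \\ in an ordinary literal and is
# a single character long.
backslash = "\\"
backslash_length = len(backslash)

# The regex engine needs its own escape on top of that: four
# backslashes in an ordinary literal, or two in a raw string.
matches_plain = re.fullmatch("\\\\", backslash) is not None
matches_raw = re.fullmatch(r"\\", backslash) is not None

print(same_string, backslash_length, matches_plain, matches_raw)
```

This also matters for the test cases: typed as an ordinary literal, "'hello\'world'" contains no backslash at all, so a raw string is needed if the backslash is meant to be part of the input.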

code -

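import re

# re.Scanner is an undocumented helper in the standard re module.
# Each entry pairs a pattern with a callback; a None callback discards the match.
scanner = re.Scanner([
    (r"[0-9]+", lambda scanner, token: token),      # runs of digits
    (r"[a-zA-Z]+", lambda scanner, token: token),   # runs of letters
    (r"[,.'\"\\]+", lambda scanner, token: token),  # punctuation, quotes, backslashes
    (r"\s+", None),                                 # None == skip token (whitespace)
])

s = "hello\'world"

# scan returns the list of tokens and whatever input it could not match.
results, remainder = scanner.scan(s)

# Rejoin the tokens; this equals s only if nothing was skipped or left over.
u = ""
for i in results:
    u = u + i

if s == u:
    print(True)
else:
    print(False)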
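To see what the scanner returns on the inputs from the question, the sketch below runs the same four patterns (with the typos fixed) on the traced example and on a quoted string. It suggests that these patterns alone do not cover the test cases: operators such as + are not matched, and a quoted string comes back in pieces. The variable names are only for illustration.

```python
import re

# Same four patterns as in the answer's scanner.
scanner = re.Scanner([
    (r"[0-9]+", lambda scanner, token: token),
    (r"[a-zA-Z]+", lambda scanner, token: token),
    (r"[,.'\"\\]+", lambda scanner, token: token),
    (r"\s+", None),
])

# Scanning stops at the first character no pattern matches;
# the unscanned tail comes back as the remainder.
results, remainder = scanner.scan("hello, world 3+1")
print(results)     # ['hello', ',', 'world', '3']
print(remainder)   # +1

# A quoted string is split at the quotes and the inner space is dropped.
quoted_results, quoted_remainder = scanner.scan("'a b'")
print(quoted_results)   # ["'", 'a', 'b', "'"]
```

Getting closer to the expected outputs would need at least a pattern for operators and one for whole quoted strings, listed before the shorter patterns, since the scanner takes the first alternative that matches rather than the longest.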

--------------------------------------------------------

thank you
