Question
Algorithm Tokenize(string s):
1. list(string) tokens := [ ]
2. while not isEmpty(s)
a. If s begins with a token, remove the longest possible token from the beginning of s and push that token onto the back of tokens
b. If head(s) is a whitespace character, pop(s).
3. return tokens
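A minimal Python sketch of the pseudocode, kept step for step. The token rules are not spelled out here, so `longest_token_length` is a hypothetical stand-in (a run of letters, a run of digits, or one punctuation character). The sketch also handles a case the pseudocode leaves open. If `s` begins with neither a token nor whitespace, neither step changes `s` and the loop would never end. The sketch raises `ValueError` in that case, which is an assumption; the homework only requires agreement on inputs where Tokenize(s) is defined.

```python
import string


def longest_token_length(s: str) -> int:
    """Hypothetical toy rule: a run of letters, a run of digits, or one
    punctuation character is a token. Returns 0 if s starts with none."""
    for alphabet in (string.ascii_letters, string.digits):
        n = 0
        while n < len(s) and s[n] in alphabet:
            n += 1
        if n > 0:
            return n
    return 1 if s and s[0] in string.punctuation else 0


def tokenize(s: str) -> list[str]:
    tokens: list[str] = []                 # 1. tokens := [ ]
    while s:                               # 2. while not isEmpty(s)
        n = longest_token_length(s)
        if n > 0:                          # 2a. remove the longest token, push it onto tokens
            tokens.append(s[:n])
            s = s[n:]
        if s and s[0].isspace():           # 2b. head(s) is whitespace: pop it
            s = s[1:]
        elif n == 0:                       # neither step changed s: stop instead of looping forever
            raise ValueError(f"no token at the start of {s!r}")
    return tokens                          # 3. return tokens
```

With this toy rule, `tokenize("hello, world 3+1")` returns `['hello', ',', 'world', '3', '+', '1']`.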
For example, below is a trace of the call Tokenize("hello, world 3+1"):
it  tokens                             s
0   []                                 "hello, world 3+1"
1   ["hello"]                          ", world 3+1"
2   ["hello",","]                      " world 3+1"
3   ["hello",","]                      "world 3+1"
4   ["hello",",","world"]              " 3+1"
5   ["hello",",","world"]              "3+1"
6   ["hello",",","world","3"]          "+1"
7   ["hello",",","world","3","+"]      "1"
8   ["hello",",","world","3","+","1"]  ""
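Each row of the trace changes s by exactly one step: row 2 removes "," and row 3 drops the following space. So the trace reads steps a and b as "if ... else if". The sketch below regenerates the rows under that reading. Its token regex (letter runs, digit runs, or one other non-space character) is an assumption chosen only to reproduce this example.

```python
import re

# Toy token rule (an assumption): letters, digits, or one other non-space character.
TOKEN = re.compile(r"[A-Za-z]+|[0-9]+|[^\sA-Za-z0-9]")


def trace(s: str) -> list[tuple[int, list[str], str]]:
    """Return (iteration, tokens, s) after each iteration, starting with iteration 0."""
    tokens: list[str] = []
    rows = [(0, list(tokens), s)]
    it = 0
    while s:
        m = TOKEN.match(s)
        if m:                       # step a: remove the longest token at the front
            tokens.append(m.group())
            s = s[m.end():]
        elif s[0].isspace():        # step b: pop one whitespace character
            s = s[1:]
        else:
            raise ValueError(f"no token at {s!r}")
        it += 1
        rows.append((it, list(tokens), s))
    return rows


for it, tokens, rest in trace("hello, world 3+1"):
    print(it, tokens, repr(rest))
```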
Homework:
Implement Tokenize in Python. That is, write a Python function called tokenize such that if s is a string for which Tokenize(s) is defined, then Tokenize(s) == tokenize(s).
Test cases:
Note the following:
tokenize("hello world")          -> ["hello", "world"]
tokenize("hello'world'foo")      -> ["hello", "'world'", "foo"]
tokenize("'hello\'world'")       -> ["'hello\'world'"]
tokenize("'hello world'")        -> ["'hello world'"]
tokenize("3.33(33..")            -> not part of our agreement
tokenize("\")                    -> ["\"]
tokenize(" ")                    -> [ ]
tokenize("'a'e'c'")              -> ["'a'", "e", "'c'"]
tokenize("3.3+1")                -> ["3.3","+","1"]
tokenize("''")                   -> ["''"]
tokenize("<59>=6")               -> ["<","59",">=","6"]
tokenize("+")                    -> ["+"]
tokenize("foo'4$!\\h\'32\t'88")  -> ["foo","'4$!\\h\'32\t'","88"]
tokenize("'hello'wor'ld")        -> not part of our agreement
For example, you could enter the following into repl.it or IDLE:
>>> tokenize("'hello\'world'") == ["'hello\'world'"]
True
Send your tokenizer as an attached .py file.
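One way to meet the test cases is sketched below, under stated assumptions. The token rules are inferred from the cases alone: identifiers, numbers with an optional decimal part, single-quoted strings where a backslash escapes the next character, operators such as `>=`, commas, and a lone backslash. The cases are read as raw characters (a lone `\` only makes sense that way), so the test inputs are written as Python raw strings `r"..."`, except the lone backslash, which is written `"\\"` because a raw literal cannot end in a single backslash. Inputs with no token at the front raise `ValueError`, which covers both "not part of our agreement" cases. Step 2a is followed literally: every pattern is tried and the longest match wins.

```python
import re

# Token patterns inferred from the test cases (an assumption, not the official rules).
TOKEN_PATTERNS = [re.compile(p) for p in (
    r"[A-Za-z_][A-Za-z0-9_]*",      # identifiers: hello, foo
    r"[0-9]+(?:\.[0-9]+)?",          # numbers: 3, 59, 3.3
    r"'(?:\\.|[^'\\])*'",            # quoted strings; a backslash escapes the next char
    r">=|<=|==|!=|[-+*/<>=]",        # operators, including two-character forms
    r",",                            # comma
    r"\\",                           # a lone backslash
)]


def tokenize(s: str) -> list[str]:
    tokens: list[str] = []
    while s:
        # Step a: the longest match among all patterns at the start of s.
        matches = (p.match(s) for p in TOKEN_PATTERNS)
        n = max((m.end() for m in matches if m), default=0)
        if n:
            tokens.append(s[:n])
            s = s[n:]
        elif s[0].isspace():         # step b: drop one whitespace character
            s = s[1:]
        else:                        # neither step applies: input is undefined
            raise ValueError(f"no token at {s!r}")
    return tokens
```

For instance, `tokenize("<59>=6")` returns `['<', '59', '>=', '6']`: at `>=6` both `>` and `>=` match, and the longer one is kept.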
Explanation / Answer
We can use Python regular expressions (the re module) for this. Let us take an example to understand how they work.
[...] is a character class: it matches any ONE of the characters written inside the brackets. So [,.] matches a single comma or period, and [,.]+ matches a run of one or more of them. Putting ^ first, as in [^,.], matches any character except those listed.
To write a character like ' that already has a special meaning in a Python string literal (it ends a '...' string), put a backslash (\) before it. The backslash is the escape character, so \' tells Python to treat the quote as an ordinary character. Please read further on Python regex.
To match a backslash itself, write four backslashes in an ordinary string literal ("\\\\"): Python turns each \\ into one \, and the regex engine then reads the resulting \\ as one literal backslash. In a raw string literal, r"\\" is enough.
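A short check of the points above, using only the standard re module. It shows that a character class matches one character (with + for a run and ^ to negate), that \' is an ordinary quote inside a '...' literal, and that a literal backslash needs four backslashes in an ordinary string but two in a raw one.

```python
import re

# A character class matches ONE character from the set.
print(re.findall(r"[,.]", "a,b..c"))    # [',', '.', '.']
# Adding + matches a run of such characters.
print(re.findall(r"[,.]+", "a,b..c"))   # [',', '..']
# A leading ^ negates the class: everything except , and .
print(re.findall(r"[^,.]+", "a,b..c"))  # ['a', 'b', 'c']

# Inside a '...' literal, \' is an ordinary quote character.
assert 'it\'s' == "it's"

# Matching one literal backslash in the three-character string a\b.
text = "a\\b"
assert re.search("\\\\", text)          # ordinary literal: four backslashes
assert re.search(r"\\", text)           # raw literal: two backslashes
print(re.split(r"\\", text))            # ['a', 'b']
```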
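Code:
import re

# Each rule pairs a regex with an action; re.Scanner calls the action
# with (scanner, matched_text) and collects whatever it returns.
scanner = re.Scanner([
    (r"[0-9]+", lambda scanner, token: token),      # runs of digits
    (r"[a-zA-Z]+", lambda scanner, token: token),   # runs of letters
    (r"[,.'\"\\]+", lambda scanner, token: token),  # runs of , . ' " and backslash
    (r"\s+", None),                                  # None == skip token (whitespace)
])

s = "hello\'world"
results, remainder = scanner.scan(s)

# Joining the tokens back together should reproduce s (this holds only
# when s has no whitespace, since whitespace is skipped).
u = "".join(results)
print(s == u)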
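The same re.Scanner approach can be pushed closer to the homework's test cases. This is a sketch: re.Scanner exists in CPython's re module but is not covered in the official re documentation, and the rules below are assumptions inferred from the test cases. The scanner tries its rules in the order listed and keeps the first one that matches; it does not compare match lengths the way step 2a does. Listing `>=` before the single-character operators stands in for longest match here. When no rule matches, the scanner stops and returns the unscanned text as `remainder`.

```python
import re


def keep(_scanner, token):
    """Scanner action: emit the matched text unchanged."""
    return token


# re.Scanner tries the rules in the order listed; the first rule that matches wins.
scanner = re.Scanner([
    (r"'(?:\\.|[^'\\])*'", keep),          # single-quoted string, \ escapes the next char
    (r"[A-Za-z_][A-Za-z0-9_]*", keep),     # identifiers
    (r"[0-9]+(?:\.[0-9]+)?", keep),        # numbers such as 3 or 3.3
    (r">=|<=|[-+*/<>=,]", keep),           # two-character operators before one-character ones
    (r"\\", keep),                         # a lone backslash
    (r"\s+", None),                        # None: skip whitespace
])


def tokenize(s: str) -> list[str]:
    tokens, remainder = scanner.scan(s)
    if remainder:  # the scanner stopped at text no rule matches
        raise ValueError(f"no token at {remainder!r}")
    return tokens


print(tokenize("<59>=6"))            # ['<', '59', '>=', '6']
print(tokenize(r"hello'world'foo"))  # ['hello', "'world'", 'foo']
```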
--------------------------------------------------------
thank you