Lab 16 - Parsing
Due by 11:59pm on 2024-03-14.
Starter Files
Download lab16.zip. Inside the archive, you will find starter files for the questions in this lab.
Topics
Calculator Language
In Project 3 you will build an interactive interpreter for the Calculator language.
The interpreter will allow the user to enter valid calculator expressions and evaluate them. For example:
calc >> (+ 3 4)
7
calc >> (+ 3 (+ 2 2))
7
To do that, we must first review the calculator language.
The Calculator language has primitive expressions, such as 2
, -4
, 5.6
, and call expressions.
A call expression is a combination that begins with an operator (+
, -
, *
, /
) followed by 0 or more expressions.
For example:
(+ 1 2)
(+ 1 2 3)
(/ 3 (+ 4 5))
Notice that the operator comes first and the operands follow. (So, instead of having (3 + 4)
, we have (+ 3 4)
.)
Additionally, the operands can be another call expression.
The rules of evaluation are as follows:
- Evaluate the operator/function (in other words, what function is it?)
- Evaluate the operands
- If an operand is another call expression, then evaluate that call expression. For example, if the expression is
(/ 3 (+ 4 5))
, evaluate(+ 4 5)
to get9
, then do(/ 3 9)
to get0.33
- If an operand is another call expression, then evaluate that call expression. For example, if the expression is
- Apply the function to the operands and achieve the result
Using these rules of evaluation on the previous examples gives us,
(+ 1 2)
= 3(+ 1 2 3)
= 6(/ 3 (+ 4 5))
= 0.33
Notice that the expression (+ 1 2 3)
evaluates to $(1+2)+3 = 6$.
Likewise, (/ 20 5 4)
evaluates to $(20 / 5) / 4 = 1$.
Practice
Evaluate the following calculator expressions (it may be useful to go to the expression tree segment if you are struggling).
(+ 3 2)
(* (- 8 4) 4)
(/ (+ 15 5) (* 2 2))
(+ 1 2 3 4 5)
(+ (- 10 2 2) (* (+ 3 2) 4))
Answers:
5
- Add 3 to 2
16
- Multiply
(- 8 4)
or 4 by 4
- Multiply
5
- Divide
(+ 15 5)
or 20 by(* 2 2)
or 4
- Divide
15
(((1 + 2) + 3) + 4) + 5
= 15
26
- Add
(- 10 2 2)
or 6 to(* (+ 3 2) 4)
or 20
- Add
Expression Tree
Expression Trees allow us to break down complicated calculator expressions. If we have a call expression, we represent that as a box with edges or lines leading to the operator/function and the operands. If an operand is another call expression, then we represent that as a box with its own edges.
For example, if the expression is (+ 2 2)
, we diagram that like so:
Additionally, if the expression is (* 3 (+ 4 5) (* 6 7 8))
:
Representing Expression Trees
We can represent expression trees using slightly different version of a linked list. Instead of using the Link
class, we will be using the Pair
class to construct a pair list of the operators and operands. Like Link
, every Pair
has a .first
and .rest
attribute; however, all Pair
lists end with nil
. For example, Pair(2, nil)
or Pair(1, Pair(2, nil))
. Representing expressions as pair lists will make it easier to evaluate the expressions later down the road.
If necessary, refer to
pair.py
for the implementation forPair
andnil
To demonstrate how to represent expressions as pair lists, look at the following examples that convert a calculator expression to a tokens list and then to a pair list:
(+ 1 1) -> ['(', '+', '1', '1', ')'] -> Pair('+', Pair(1, Pair(1, nil)))
# pair list of operands
(+ 1.5 1.5) -> ['(', '+', '1.5', '1.5', ')'] -> Pair('+', Pair(1.5, Pair(1.5, nil)))
Notice that at the start of an expression, we immediately start with a Pair
followed by the operator and a Pair
list of operands where each .first
value is an operand. Additionally, each operand is converted to a float
or int
accordingly. In the context of converting the python list to a pair list, the open parenthesis (
signifies the start of a pair list and the closing parenthesis )
signifies the end of the pair list or nil
.
In the case that there is an operand is another expression, we represent that with another pair list. For example:
(* (+ 1 1) 4) --> Pair('*', Pair(Pair('+', Pair(1, Pair(1, nil))), Pair(4, nil)))
^-----------------------------^
(+ 1 1)
For Q2, here's the algorithm for converting a python list of tokens to a Pair
list. It uses a recursive function, called parse_tokens()
that has two parameters: a list containg the tokens
and an index
value.
- If the token at the provided
index
is an open parenthesis'('
- The next token should be an operator. Store the operator in a variable.
- If the
index
is not zero, then the current tokens is a start of a sub-expression, so...- Call
parse_tokens()
by passing in the list of tokenstokens
with theindex
incremented by2
(this puts the token after the operator as the next one to be processed). The recursive call will return two values: a new pair list and a new updated index. Save the new pair list in a variable and updateindex
to equal the updated index returned by the function. - Reassign the variable you stored the operator in step 1.1 to a
Pair
object with the operator as.first
and the new pair list (returned in step 1.2.1) as.rest
- Call
- If the
index
of the current token is0
- increment the
index
by2
- increment the
- After those conditions are checked call
parse_tokens()
with the token listtokens
andindex
. Save the new pair list in a variable and updateindex
to new index returned by the function. This function call will either return the rest of the syntax tree if the input index was 0 ornil
if it wasn't. - Return a
Pair
object with the variable holding the operator from 1.1 (which was possibly updated in 1.2.2) as the.first
and the pair list returned in step 1.4 as the.rest
along with theindex
returned in 1.4 (in that order).
- If the token is a close parenthesis
')'
- return a
nil
object and theindex
incremented by 1.
- return a
- All other tokens should be operands for expression as integers or floating point numbers. Do this entire part in a try/except
block that raises a
TypeError
if converting the current token to float or integer fails.- If the current token has a decimal point
.
, convert it to afloat
, and store it in a variable. - If the current token does not have a decimal point, convert it to an
int
, and store it in a variable. - Call
parse_tokens()
with the token listtokens
and the index incremented by1
to process the next token. Save the new pair list and updateindex
to the new index returned. - Return a
Pair
object with the variable created in 3.1 or 3.2 as.first
and the pair object returned in step 3.3 as.rest
along with the index returned in step 3.3.
- If the current token has a decimal point
Required Question
Q1: Tokenize
To evaluate user input, our program needs to format that input in such a way that make it easier to evaluate. Build a tokenize()
function that takes a string expression
like the ones above and returns a list where each item in the list is a parenthesis, one of the four operators, or a number literal.
def tokenize(expression):
""" Takes a string and returns a list where each item
in the list is a parenthesis, one of the four operators (/, *, -, +),
or a number literal.
>>> tokenize("(+ 3 2)")
['(', '+', '3', '2', ')']
>>> tokenize("(- 9 3 3)")
['(', '-', '9', '3', '3', ')']
>>> tokenize("(+ 10 100)")
['(', '+', '10', '100', ')']
>>> tokenize("(+ 5.5 10.5)")
['(', '+', '5.5', '10.5', ')']
>>> expr = "(* (- 8 4) 4)"
>>> tokenize(expr)
['(', '*', '(', '-', '8', '4', ')', '4', ')']
>>> expr = "(* (- 6 8) (/ 18 3) (+ 10 1 2))"
>>> tokenize(expr)
['(', '*', '(', '-', '6', '8', ')', '(', '/', '18', '3', ')', '(', '+', '10', '1', '2', ')', ')']
"""
# Write your code here
Hint: The
.split()
or.replace()
methods may be helpful here.
Q2: Parse Tokens
You will be writing this function in Homework 6, so if you have time get started on it in the lab!
Start implementing the parse_tokens()
function that takes a list of tokens and an index and converts the list of tokens to a Pair
list. Refer to the steps above for instructions how to implement it.
def parse_tokens(tokens, index):
""" Takes a list of tokens and an index and converts the tokens to a Pair list
>>> parse_tokens(['(', '+', '1', '1', ')'], 0)
(Pair('+', Pair(1, Pair(1, nil))), 5)
>>> parse_tokens(['(', '*', '(', '-', '8', '4', ')', '4', ')'], 0)
(Pair('*', Pair(Pair('-', Pair(8, Pair(4, nil))), Pair(4, nil))), 9)
"""
# Write your code here
Submit
Submit the lab16.py
file on Canvas to Gradescope in the window on the assignment page.