Homework 6 - Parsing
Due by 11:59pm on March 26, 2025
Example Input and Output
>>> parse_tokens(['(', '+', '1', '1', ')'], 0)
(Pair('+', Pair(1, Pair(1, nil))), 5)
>>> parse_tokens(['(', '*', '(', '-', '8', '4', ')', '4', ')'], 0)
(Pair('*', Pair(Pair('-', Pair(8, Pair(4, nil))), Pair(4, nil))), 9)
Objectives
- Build an expression tree from a parsed calculator input expression
Starter Files
Download homework06.zip. Inside the archive, you will find the pair.py
library and test files for this homework.
Introduction
In this homework, you’ll build off of your work from Lab 17 and write a parse_tokens()
function that takes as input the tokenized list of symbols that the tokenize()
function you wrote generates and parses that into an expression tree that represents the mathematical expression the user input. Project 3 will then use these functions to build a full interpreter for the Calculator language. The auto grader for this homework tests parse_tokens()
specifically so make sure you name your function that.
Representing Expression Trees
This section is a review from lab 17.
We can represent expression trees using slightly different version of a linked list. Instead of using Link
, we will be using Pair
to construct a pair list of the operators and operands. Like Link
, every Pair
has a first
and rest
attribute; however, all Pair
lists end with nil
. For example, Pair(2, nil)
or Pair(1, Pair(2, nil))
. Representing expressions as pair lists will make it even easier to evaluate the expressions later down the road.
If necessary, refer to
pair.py
for the implementation forPair
andnil
To demonstrate how to represent expressions as pair lists, look at the following examples:
(+ 1 1) --> ['(', '+', '1', '1', ')'] --> Pair('+', Pair(1, Pair(1, nil)))
(+ 1.5 1.5) --> ['(', '+', '1.5', '1.5', ')'] --> Pair('+', Pair(1.5, Pair(1.5, nil)))
Notice that at the start of an expression, we immediately start with a Pair
followed by the operator and a Pair
list of operands where each .first
value is an operand. Additionally, each operand is converted to a float
or int
accordingly. In the context of converting the python list to a pair list, the open parenthesis (
signifies the start of a pair list and the closing parenthesis )
signifies the end of the pair list or nil
.
In the case that there is an operand is another expression, we represent that with another pair list. For example:
(* (- 8 4) 4) --> Pair('*', Pair(Pair('-', Pair(8, Pair(4, nil))), Pair(4, nil)))
^
(- 8 4)
Here’s the algorithm for converting a python list of tokens to a Pair
list. It uses a recursive function that has two parameters, the token list and an index value:
- If the token at the provided
index
is an open parenthesis'('
?- The next token should be an operator. Store the operator in a variable.
- If the
index
is not zero, then the current tokens is a start of a sub-expression, so…- Call
parse_tokens()
by passing in the list of tokens with theindex
incremented by2
(this puts the token after the operator as the next one to be processed). The recursive call will return two values: a new pair list and a new index respectively. Capture the new pair list returned by the function in a new variable and updateindex
to the new index returned by the function. - Set the variable you stored the operator in step 1.1 to a
Pair
object with the operator as.first
and the new pair list (returned in step 1.2.1) as.rest
- Call
- If the
index
of the current token is0
, increment theindex
by2
. - Call
parse_tokens()
with the token list andindex
. Update theindex
to new index returned by the function and capture the new pair list returned in a new variable. This function call will either return the rest of the syntax tree if the input index was 0 ornil
if it wasn’t. - Return a
Pair
object with the variable from 1.1 (possibly updated in 1.2.2) as the.first
and the pair list returned in step 1.4 as the.rest
along with theindex
returned in 1.4 (in that order).
- If the token is a close parenthesis
')'
, return anil
object and theindex
incremented by 1. - Everything else should be operands and should be integers or floating point numbers. Do this entire part in a try/except block that raises a
TypeError
if converting the current token to float or integer fails.- If the current token has a decimal point, convert it to a
float
, and store it in a variable. - If the current token does not have a decimal point, convert it to an
int
, and store it in a variable. - Call
parse_tokens()
with the token list and the index incremented by1
to process the next token. Capture the new pair list and update theindex
to the new index returned. - Return a
Pair
object with the variable created in 3.1 or 3.2 as.first
and the pair object returned in step 3.3 as.rest
along with the index returned in step 3.3.
- If the current token has a decimal point, convert it to a
Part 1 - Getting started
Task 1 - Set up
Start by creating a new calculator.py
file to hold your code in. Copy your tokenize()
function from lab 16 into this file. You’ll also want to import the Pair
class and nil
object from the pair.py
file that was included in the zip file.
from pair import Pair, nil
Next, create a parse_tokens()
function that takes a list of tokens and an index as input parameters:
def parse_tokens(tokens, index):
""" Takes a list of tokens and an index and converts the tokens to a Pair list
>>> parse_tokens(['(', '+', '1', '1', ')'], 0)
(Pair('+', Pair(1, Pair(1, nil))), 5)
>>> parse_tokens(['(', '*', '(', '-', '8', '4', ')', '4', ')'], 0)
(Pair('*', Pair(Pair('-', Pair(8, Pair(4, nil))), Pair(4, nil))), 9)
"""
# Write your code here
The calculator.py
file will be the basis of Project 3 as well.
Task 2 - Write some tests
While not necessary, it will help you in your debugging if you have some test cases generated that you could use to verify that your code is creating the correct output. We’ve provided some with the zip file, but you should add a few of your own. You could write these as either doctests or pytest functions. Think about some example input expressions and what the generated tree should look like for those expressions. Write the tests to verify that you get what you expect. Now you’re ready to start coding
Task 3 - Implement parse_tokens()
With the list of tokens from our tokenize()
function, we need to convert that into a form we can evaluate. You are going to build a syntax tree using the Pair
class (which we’ve provided you in the the pair.py
file). Remember that each Pair
has a data value called first and a link to the rest of the structures called rest. first can either be an operator, a value, or another Pair
object that contains a sub-expression. rest is either a Pair
object containing the next item in the expression or a nil
object (also provided in pair.py
) that signifies that this is the end of the expression. Our task is to convert our token list into that syntax tree.
The full algorithm for this is above in the Representing Expression Trees section. If at any point, something is not correct, for example a string where there should be numbers, an invalid operator, or extra or nor parentheses, you should have your function throw an exception reporting the error.
Since our syntax tree is a recursive structure, you’ll be following a recursive process to build it as described above and you will need to encapsulate parsing logic into a recursive function (the parse_tokens()
function). This function will return either a Pair
or nil
object and the index of the next token to process. The resulting Pair
object will be the head of a linked list of Pair
objects representing the syntax tree of the expression.
Before you start coding, review the algorithm to be sure you understand what is happening. You might try running the algorithm by hand on some simple expressions to make sure you know how it works. Once you understand it, add the code to your function to actually implement the algorithm. If you started the parse_tokens()
function in Lab 16, you’ve already started on this. Copy your code over and continue from where you left off.
Turn in your work
You’ll submit your calculator.py
file on Canvas via Gradescope where it will be checked via the auto grader. Make sure that you haven’t “hard coded” anything specific to the test data we gave you. We do not guarantee that all scenarios are tested by the test data that we have provided you.