Lab 06 - Testing
Due by 11:59pm on February 4, 2025
Starter Files
Download lab06.zip. Inside the archive, you will find starter files for the questions in this lab.
Topics
Pytest
If you ever want to use your code for something important, like medical software, you need to make sure it works properly. To ensure it works, we use testing. There are several libraries and modules you can use to test your code efficiently. For this class, we are going to use pytest.
Installing Pytest
To install pytest, run one of the following:
pip install pytest
python3 -m pip install pytest
To make sure you installed it correctly, run one of the following:
pytest -h
python3 -m pytest -h
You can uninstall a Python library by typing one of the following into the terminal:
pip uninstall <library name>
python3 -m pip uninstall <library name>
BYU Pytest Utils
The autograder uses pytest with an extra library containing extra testing utility tools. To run the autograder’s tests locally, install the BYU pytest utils library if you have not already:
pip install byu_pytest_utils
python3 -m pip install byu_pytest_utils
Using Pytest
Let’s say we are trying to test our functions square()
and find_factors()
in example.py
:
def square(x):
    return x * x

def find_factors(n):
    factors = []
    for i in range(1, n):
        if n % i == 0:
            factors.append(i)
    return factors
We can create and run our tests using pytest by writing test functions in example.py
that check if the output matches what we expect. To do this, we use the assert
statement. It takes a condition to assert is True
, and if it is False
it will raise an AssertionError
. Optionally, you can provide an error message if the assertion is False
:
assert <condition>, "<some error message>"
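As a small sketch of how the optional message behaves, the snippet below uses a hypothetical helper, check_positive, which is not part of the lab files. A passing assertion does nothing, while a failing one raises an AssertionError that carries the message.

```python
def check_positive(x):
    # Hypothetical helper, used only to demonstrate assert.
    assert x > 0, f"expected a positive number, got {x}"
    return x


# A passing assertion does nothing; execution simply continues.
check_positive(3)

# A failing assertion raises an AssertionError carrying the message.
try:
    check_positive(-2)
except AssertionError as error:
    message = str(error)
    print(message)
```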
For example, using the assert statements to create the test_square()
and test_find_factors()
functions is given below:
def test_square():
    assert square(4) == 16
    assert square(0) == 0
    assert square(1/2) == 0.25, "The square of 1/2 is not 0.25"

def test_find_factors():
    assert find_factors(15) == [1,3,5,15]
    assert find_factors(20) == [1,2,4,5,10,20]
Notice that the name of each test function matches the pattern test_*, where the * stands for any sequence of characters allowed in a function name. For pytest to recognize these functions as tests of our code, each function name must start with test_. At this point, we can type one of the following commands into the terminal:
pytest example.py
python3 -m pytest example.py
and it will run all the functions in example.py
that start with test_*
without us ever needing to call them in our code.
Here’s what the output looks like:
[Image: pytest_example]
We can see that find_factors()
does not work.
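The failure comes from range(1, n), which stops at n - 1, so n itself is never checked. One possible fix is sketched below, assuming (as the tests do) that n counts as a factor of itself.

```python
def find_factors(n):
    factors = []
    # range(1, n + 1) includes n itself; range(1, n) stops at n - 1.
    for i in range(1, n + 1):
        if n % i == 0:
            factors.append(i)
    return factors


def test_find_factors():
    assert find_factors(15) == [1, 3, 5, 15]
    assert find_factors(20) == [1, 2, 4, 5, 10, 20]
```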
If we want to display more about the failed test case, we run pytest with the -v
option.
pytest example.py -v
python3 -m pytest example.py -v
Additionally, if we only want to run one test function from a test file, we can follow the format pytest <test_file.py>::<test_function>:
pytest example.py::test_square
python3 -m pytest example.py::test_square
Note: If you’re having trouble running pytest, try each of the terminal commands. It’s possible that only one of the commands works for your operating system.
Organization with Test Files
When there is a large amount of code in one file, it is worth moving our tests into a different file for better organization. We will move all our tests into test_example.py
and import the functions from example.py
:
from example import * # from example.py import everything

def test_square():
    assert square(4) == 16
    assert square(0) == 0
    assert square(1/2) == 0.25

def test_find_factors():
    assert find_factors(15) == [1,3,5,15]
    assert find_factors(20) == [1,2,4,5,10,20]
An additional benefit of doing this is that running pytest without specifying a file will cause pytest to automatically run all files in the format test_*.py
or *_test.py
in the current directory and subdirectories.
Running the command
pytest
python3 -m pytest
will make pytest automatically run test_example.py
in this case.
Writing Tests
A very common problem in testing is figuring out how many tests you need and what inputs to test with to verify that some code works properly. On one hand, a programmer can write hundreds of tests for some code to verify it works properly at the tradeoff of the tests taking a lot of resources to execute. On the other, not having enough tests will not take a lot of resources to execute, but may fail to verify that the code works properly. One wants to find a good middle ground, and to do so requires critical thinking about what the code is intended to do. In black-box testing, a programmer only thinks about the code’s inputs and the outputs and does not consider the underlying implementation.
For example, the test_square() function follows black-box testing. It provides the square() function with the argument 4 and expects 16. It does not care whether the function uses addition, multiplication, or anything else to get the right output – just as long as the function provides the right output.
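To illustrate the idea, the sketch below runs the same black-box check against three different ways of squaring a number. The three function names and the check_square helper are made up for this example and are not part of the lab files.

```python
def square_by_multiplying(x):
    return x * x


def square_by_power(x):
    return x ** 2


def square_by_adding(x):
    # Repeated addition; only valid for non-negative integers.
    total = 0
    for _ in range(x):
        total += x
    return total


def check_square(square):
    # Black-box check: only inputs and outputs are examined.
    assert square(4) == 16
    assert square(0) == 0


for implementation in (square_by_multiplying, square_by_power, square_by_adding):
    check_square(implementation)
```

All three implementations pass, because the check never looks at how the result was computed.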
Figuring out what types of inputs to test a function with is also a hard problem that requires critical thinking about the code you are testing. Generally, there are three categories of inputs you want to consider: valid cases, invalid cases, and border cases. We will use the following specification for a factorial function as an example:
The factorial is the product of all positive integers less than or equal to n
. Computing the factorial of a negative number is not valid.
Valid Cases:
- These are the scenarios where the code is given a good input and executes properly. Here, this is the case where n is positive and the function returns a number. There should be at least two tests ensuring that, given n, it provides the correct output. For example, 3! should return 6, and 5! should return 120.
Invalid Cases:
- These are the scenarios where the code is given a bad input and fails. Here, this is the case where n is negative and the function raises an error. There should be a test ensuring that an exception is raised.
Border Cases:
- These are cases where behavior changes based on some condition or boundary in the code. For example, there is a boundary at zero. If n is less than zero, then an exception is raised. If n is greater than or equal to zero, then an answer is computed. There should be three tests ensuring proper behavior when n is equal to -1, 0, and 1.
Note: In some cases you may be working with a function that performs some mathematical computation like our
square()
function. Whenever testing a function like this, there is not really a valid or invalid case. Here, you should test positive numbers, negative numbers, zero, and any other potential concerns.
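Putting the three categories together for the factorial specification above might look like the sketch below. The factorial implementation here is only an illustration; it assumes 0! is 1 and that invalid input raises a ValueError, neither of which the specification states explicitly.

```python
import pytest


def factorial(n):
    # Sketch of the specification; assumes 0! is defined as 1.
    if n < 0:
        raise ValueError("factorial is not defined for negative numbers")
    result = 1
    for i in range(2, n + 1):
        result *= i
    return result


def test_factorial_valid():
    assert factorial(3) == 6
    assert factorial(5) == 120


def test_factorial_invalid():
    with pytest.raises(ValueError):
        factorial(-7)


def test_factorial_border():
    # The behavior changes at zero, so test on both sides of it.
    with pytest.raises(ValueError):
        factorial(-1)
    assert factorial(0) == 1
    assert factorial(1) == 1
```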
Features of Pytest
Approx
When dealing with floating point numbers (i.e., decimals), computers cannot store some values exactly in memory, so small rounding errors appear. For example,
>>> 0.1 + 0.2 == 0.3
False
To compensate for this limitation, pytest has an approx function.
>>> import pytest
>>> 0.1 + 0.2 == pytest.approx(0.3)
True
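By default, approx uses a relative tolerance of 1e-6 (one part in a million of the expected value). Provide a second argument to change the relative tolerance; for example, approx(2, 0.1) accepts values within 10% of 2.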
>>> import pytest
>>> 1.5 + 0.4 == pytest.approx(2)
False
>>> 1.5 + 0.4 == pytest.approx(2, 0.1)
True
>>> 1.5 + 0.6 == pytest.approx(2, 0.1)
True
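The tolerance can also be passed by keyword, which makes it explicit whether it is relative (rel) or absolute (abs). The values below are chosen only for illustration.

```python
import pytest

# Relative tolerance: 2 ± (0.1 * 2), so anything in [1.8, 2.2] matches.
within_relative = 1.9 == pytest.approx(2, rel=0.1)

# Absolute tolerance: 2 ± 0.05, so 1.9 is too far away.
within_absolute = 1.9 == pytest.approx(2, abs=0.05)

# approx also compares sequences element by element.
sums_match = [0.1 + 0.2, 0.2 + 0.4] == pytest.approx([0.3, 0.6])

print(within_relative, within_absolute, sums_match)
```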
Raises
Sometimes we design our code to raise errors. To test that our code does that, we can use pytest’s raises
function. We’ll talk more about Errors in Lab 7.
from math import sqrt

import pytest

def square_root(x):
    if x < 0:
        raise ValueError("Negative numbers not allowed")
    return sqrt(x)

def test_square_root_raises_exception():
    with pytest.raises(ValueError):
        square_root(-4)
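If you also want to check the error message, pytest.raises accepts a match argument (a regular expression) and gives access to the raised exception. The sketch below repeats square_root so it can run on its own; the test names are only suggestions.

```python
import math

import pytest


def square_root(x):
    if x < 0:
        raise ValueError("Negative numbers not allowed")
    return math.sqrt(x)


def test_square_root_error_message():
    # match checks the error message against a regular expression.
    with pytest.raises(ValueError, match="Negative numbers"):
        square_root(-4)


def test_square_root_error_details():
    # The raised exception is available afterwards as excinfo.value.
    with pytest.raises(ValueError) as excinfo:
        square_root(-1)
    assert "not allowed" in str(excinfo.value)
```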
Required Questions
Write your code in lab06.py and your tests in test_lab06.py
Q1: Product and Summation
Write the tests for the product and summation functions before writing any code for the functions.
Write tests for a function called product that takes in an integer parameter n. product returns the result of 1 · 2 · 3 · ... · n; however, if n is less than one or not an integer, it raises a ValueError.
Additionally, write tests for a similar function called summation that takes in an integer parameter n. summation returns the result of 1 + 2 + ... + n; however, if n is less than zero or not an integer, it raises a ValueError.
To check if a number is an integer, use the isinstance() function. For example,
>>> value_in_question = 5
>>> isinstance(value_in_question, float)
False
>>> isinstance(value_in_question, int)
True
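One detail worth knowing when choosing test inputs: in Python, bool is a subclass of int, so True and False pass an int check. The lab does not say whether booleans should be rejected, so is_plain_int below is only a hypothetical helper in case you decide to treat them as invalid.

```python
# bool is a subclass of int, so True and False pass an int check.
print(isinstance(True, int))   # True
print(isinstance(5.0, int))    # False: 5.0 is a float, even though it is whole
print(isinstance("5", int))    # False


def is_plain_int(value):
    # Hypothetical helper: accepts ints but rejects bools.
    return isinstance(value, int) and not isinstance(value, bool)
```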
When writing the tests, make sure to consider all cases. For example, product
should do the following:
- If n is less than one or not an integer, raise a ValueError
- If n is greater than or equal to one, compute 1 · 2 · ... · n
Write tests that check if your code follows these rules by thinking of what inputs would cause each case.
Make sure to use the
raises
function that comes with pytest.
After writing the tests for both functions, implement both functions. When you are done, run one of the following pairs of commands in your terminal:
pytest test_lab06.py::test_summation
pytest test_lab06.py::test_product
python3 -m pytest test_lab06.py::test_summation
python3 -m pytest test_lab06.py::test_product
If you get an error, it is either due to poorly written tests or a poorly written function. If you are confident that your tests are correct, find the bug in the respective function.
Q2: Statistics
Your younger sibling (or cousin) was covering statistics in math class today and learned about the mean, median, mode, and standard deviation of a dataset. After working on two problems where they had to calculate each statistic by hand, they had had enough. They chose to write a program with functions that would do their homework for them; however, it does not work 😞. Your sibling has already spent more time trying to debug their program than it would have taken to complete their homework, and they are too tired to keep debugging. Now, they need your help to figure out what is wrong.
Write tests for each function they wrote – square
, sqrt
, mean
, etc. If the functions fail the tests, try to find the error in their code and fix it.
When fixing errors, do not delete an entire line or rewrite a function. The errors are small and should require you to add, delete, or replace a few things.
Some of their functions may work while others do not. Some functions may rely on other broken functions. To find what the expected outputs should be, rather than calculating them by hand, it is worth searching for a calculator on the web that will do it for you. Down below is a quick review of the mean, median, mode, and standard deviation of a dataset that your sibling (or cousin) used as reference.
Mean
To calculate the mean, find the sum of the dataset and divide it by the size/length of the dataset. For example, if the dataset was [1, 1, 1, 3, 4], the sum would be 10 and the size would be 5, so the mean would be 10/5 or 2.
Median
The median is the middle value of a sorted dataset. For example, if the dataset was [1, 2, 3, 4, 5]
, the median would be 3
. If there is no middle value in the dataset because there is an even number of elements, the median is the mean/average of the two values closest to the middle. For example, if the dataset was [1, 2, 3, 4, 5, 6]
, the two values closest to the middle are 3
and 4
. Taking the mean/average of those numbers gives 3.5
which would be the median.
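Instead of a web calculator, Python's built-in statistics module can also supply expected values for your tests. The sketch below checks the mean and median examples above; using it as a reference is a suggestion, not a lab requirement.

```python
import statistics

mean_value = statistics.mean([1, 1, 1, 3, 4])          # 10 / 5

odd_median = statistics.median([1, 2, 3, 4, 5])        # the middle value
even_median = statistics.median([1, 2, 3, 4, 5, 6])    # mean of 3 and 4
unsorted_median = statistics.median([5, 1, 4, 2, 3])   # input need not be sorted

print(mean_value, odd_median, even_median, unsorted_median)
```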
Mode
The mode is the most common element in a dataset. For example, if the dataset was [1,2,1,1]
, the mode of the dataset would be 1
because it appears the most times. If two elements appear the same number of times, the mode will be (for this lab) the element that appeared the most times first. For example, if the dataset was [1,1,2,2]
, the mode would be 1
.
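The two mode examples can be checked with statistics.mode, as sketched below. This assumes Python 3.8 or newer, where a tie returns the first mode encountered in the data; for other orderings its tie-breaking may not match the lab's rule, so treat it as a reference only for cases like these.

```python
import statistics

clear_mode = statistics.mode([1, 2, 1, 1])   # 1 appears three times
tied_mode = statistics.mode([1, 1, 2, 2])    # tie: first mode encountered

print(clear_mode, tied_mode)
```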
Standard Deviation
The standard deviation represents the amount of variation of all the values in a dataset. To calculate it, we use the following formula:
$$\sigma = \sqrt{ \frac{\sum (x_i - \mu)^2 }{n} }$$
where
$\sigma$ = standard deviation
$x_i$ = individual data value
$\mu$ = mean
$n$ = dataset’s size
We can read this formula as:
1. For each data value in the dataset, find the data value minus the mean, square that result, and add it to a sum.
2. Divide the sum by the size of the dataset.
3. Take the square root of the result from step 2.
Hint: Whenever you are working with floating point numbers, it is good practice to use the approx() function. Additionally, remember that the optional second parameter, the tolerance, will be helpful.
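As a worked example of the formula with approx, the sketch below computes the standard deviation of [1, 1, 1, 3, 4] by hand and compares it with statistics.pstdev, which also divides by n. The dataset and variable names are chosen only for illustration.

```python
import math
import statistics

import pytest

data = [1, 1, 1, 3, 4]
mean = sum(data) / len(data)                            # 2.0

# Sum of the squared differences from the mean: 1 + 1 + 1 + 1 + 4.
squared_diff_sum = sum((x - mean) ** 2 for x in data)   # 8.0
# Divide by the size of the dataset.
variance = squared_diff_sum / len(data)                 # 1.6
# Take the square root.
sigma = math.sqrt(variance)

# An absolute tolerance lets us compare against a rounded expected value.
assert sigma == pytest.approx(1.2649, abs=1e-4)
# statistics.pstdev is the population standard deviation (divides by n).
assert sigma == pytest.approx(statistics.pstdev(data))
```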
Stat Analysis
This function is not broken and simply runs all of the above statistics functions on a list of numbers and returns a dictionary with all of the results. All you need to do for this function is write code to test it in test_lab06.py
. The tests for this function should cover the same general cases as the above functions (like incorrect input), as well as a good number of valid inputs. If you would like, you can pool all the test cases for all the previous functions to create test cases for this function.
Submit
Submit the lab06.py
and test_lab06.py
files on Canvas to Gradescope in the window on the assignment page.
Grading on Gradescope
If you submit your lab to Gradescope, you will be graded on two things:
- Submitting working functions
  - This will require you to write tests to identify the bugs in both the functions you write and the starter functions you’re given
  - This will be graded with regular tests
- Submitting passing tests
  - You should just submit the tests you wrote as you looked for bugs in the functions
  - This will be graded by running your tests to make sure they pass
Normally, the starter files come with the tests that the autograder will run. But in this case, doing so would defeat the purpose of having you write tests in the first place! So, unlike other assignments, you won’t be given any tests in the starter files.
Note: Gradescope has two naming conventions. As an example, test_invert
will test the actual invert
function you submit, and test_test_invert
will test the test_invert
test you submit.
Optional Questions
Q3: Refactoring Product and Summation
You may have noticed that product
and summation
are very similar to each other in that they both raise
a ValueError
if n
is less than some number or if n
is not an integer. Additionally, both functions take the total of a function (add or multiply) applied on some range of values. Because of this, we can refactor our code so the functions have the same behavior but with a cleaner design.
To refactor our code, create three new functions:
- product_short(n) - same behavior as product, but with a cleaner design
- summation_short(n) - same behavior as summation, but with a cleaner design
- accumulate(merger, initial, n)
accumulate will contain the logic of applying some function merger to initial and to each value in the range from one to n. It will then return the total after merger has been applied to each value. (merger will be either the add or mul function.) Additionally, if n is less than initial or not an integer, raise a ValueError. For example,
>>> from operator import add, mul
>>> accumulate(add, 0, 3) # 0 + 1 + 2 + 3
6
>>> accumulate(add, 2, 3) # 2 + 1 + 2 + 3
8
>>> accumulate(mul, 2, 4) # 2 * 1 * 2 * 3 * 4
48
>>> accumulate(mul, 5, 0) # Raises a ValueError
Write tests for accumulate
and then implement accumulate
. (Feel free to use the examples given above in addition to the tests you write yourself.)
pytest test_lab06.py::test_accumulate
python3 -m pytest test_lab06.py::test_accumulate
Hint: Using the second example given above, add(2, 1) gives 3, then add(3, 2) gives 5, then add(5, 3) gives 8.
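The same step-by-step trace can be written out with the operator functions directly. This only spells out the arithmetic for two of the examples; it is not an implementation of accumulate.

```python
from operator import add, mul

# accumulate(add, 2, 3), traced one step at a time.
total = 2               # start from initial
total = add(total, 1)   # 3
total = add(total, 2)   # 5
total = add(total, 3)   # 8
add_total = total

# accumulate(mul, 2, 4) follows the same pattern with mul.
total = 2               # start from initial
total = mul(total, 1)   # 2
total = mul(total, 2)   # 4
total = mul(total, 3)   # 12
total = mul(total, 4)   # 48
mul_total = total

print(add_total, mul_total)
```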
After implementing accumulate, use the same tests from test_product and test_summation for test_product_short and test_summation_short to ensure that the new versions of the functions behave exactly the same. After that, implement product_short and summation_short by calling accumulate with the right arguments. product_short and summation_short should each contain one line in their function bodies.
pytest test_lab06.py::test_summation_short
pytest test_lab06.py::test_product_short
python3 -m pytest test_lab06.py::test_summation_short
python3 -m pytest test_lab06.py::test_product_short
Q4: Invert and Change
Write the tests for the invert and change functions before writing any code for the functions.
Write the tests for a function invert that takes in a number x and limit as parameters. invert calculates 1/x, and if the quotient is less than the limit, the function returns 1/x; otherwise the function returns limit. However, if x is zero, the function raises a ZeroDivisionError.
Write the tests for a second function change that takes in numbers x, y, and limit as parameters and returns abs(y - x) / x if it is less than the limit; otherwise the function returns the limit. If x is zero, raise a ZeroDivisionError.
Tests for Invert and Change
When writing the tests, make sure to consider all cases. For example, invert
should do the following:
- If 1/x is less than the limit, return 1/x
- If 1/x is greater than the limit, return limit
- If x is zero, raise a ZeroDivisionError
Write tests that check if your code follows these rules by thinking of what inputs would cause each case.
Now implement invert
and change
.
Check your work and run pytest in the terminal:
pytest
Q5: Refactor
Notice that invert
and change
have very similar logic in that you are dividing some numerator by x
and if the result is greater than the limit
then the function returns the limit
. Because of this, we can refactor our code so it has the same behavior but with a cleaner design.
To do this we are going to add three new functions:
- invert_short - same behavior as invert but designed differently
- change_short - same behavior as change but designed differently
- limited
limited will have three parameters: numerator, denominator, and limit. It will contain the logic of dividing the numerator by the denominator; if the result is greater than the limit, the function returns the limit, and otherwise it returns the result. However, if the denominator is zero, it raises a ZeroDivisionError.
Now have invert_short
and change_short
call limited
appropriately to maintain the same behavior as invert
and change
.
Note: invert_short and change_short should each have only one line in their bodies.
Tests for Refactor
Implement two more test functions, test_invert_short and test_change_short, that ensure those two functions behave the same as invert and change.
Check your work and run pytest in the terminal:
pytest
Additional Info
Code Coverage
The effectiveness of tests can be assessed with code coverage, which measures the number of lines of code that have been executed. Ideally, your tests should cover most if not all of your program.
Statement Coverage
The percentage of lines executed in a program is measured with statement coverage. This is the most general type of code coverage, meaning it is the least specific. It only tells us the quantity of code that was reached, and not much else.
Branch Coverage
A more accurate depiction of how extensive your tests are can be measured with branch coverage. We can represent a program with multiple outcomes into a tree with several branches. Here’s a simple example:
def greater_than_five(num):
    if num > 5:
        print("This number is greater than five.")
    elif num == 5:
        print("This number is five.")
    else:
        print("This number is less than five.")
Within this function, there are three possible branches we can take, which depend upon the parameter num
. Therefore, in order to get 100% branch coverage, we must write at least three tests for greater_than_five()
, changing num
so that each of the three branches is executed.
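One way to write those three tests is sketched below. Because the function prints instead of returning a value, the hypothetical helper printed_output captures standard output with contextlib.redirect_stdout; pytest's capsys fixture is another option.

```python
import io
from contextlib import redirect_stdout


def greater_than_five(num):
    if num > 5:
        print("This number is greater than five.")
    elif num == 5:
        print("This number is five.")
    else:
        print("This number is less than five.")


def printed_output(num):
    # Hypothetical helper: capture what greater_than_five prints.
    buffer = io.StringIO()
    with redirect_stdout(buffer):
        greater_than_five(num)
    return buffer.getvalue().strip()


def test_greater_branch():
    assert printed_output(6) == "This number is greater than five."


def test_equal_branch():
    assert printed_output(5) == "This number is five."


def test_less_branch():
    assert printed_output(4) == "This number is less than five."
```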
Multiple Condition Coverage
Often, branches of code depend on multiple conditions. We can represent all the possible outcomes of the function below with a table.
def bool_operators(a,b):
    if a and b:
        print("Both are true!")
    elif a or b:
        print("One is true.")
    else:
        print("Neither are true. :(")
| a | b | result |
|---|---|---|
| True | True | Both are true! |
| True | False | One is true. |
| False | True | One is true. |
| False | False | Neither are true. :( |
To cover every combination of conditions, we would need test cases for all four rows of this table, even though two of them produce the same output.
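A sketch of one test that walks through all four combinations is shown below. It uses itertools.product to generate the combinations and captures printed output with contextlib.redirect_stdout; the expected dictionary and the test name are made up for this example.

```python
import io
from contextlib import redirect_stdout
from itertools import product


def bool_operators(a, b):
    if a and b:
        print("Both are true!")
    elif a or b:
        print("One is true.")
    else:
        print("Neither are true. :(")


# Expected output for each row of the table.
expected = {
    (True, True): "Both are true!",
    (True, False): "One is true.",
    (False, True): "One is true.",
    (False, False): "Neither are true. :(",
}


def test_all_condition_combinations():
    # product generates every (a, b) pair of True/False values.
    for a, b in product([True, False], repeat=2):
        buffer = io.StringIO()
        with redirect_stdout(buffer):
            bool_operators(a, b)
        assert buffer.getvalue().strip() == expected[(a, b)]
```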