Homework 5 - Merge Sort
Due by 11:59pm on 2023-11-01.
Example Input and Output
- Input: A txt file of unsorted numbers
- Output: A new sorted txt file from the original data (sorted from lowest to highest using merge sort)
Objectives
- Practice list manipulations
- Practice the use of recursion
- Practice the use of file I/O
- Practice the use of command line arguments
- (maybe) Practice writing tests (we will need to add some test writing into this if we want this as an objective)
Introduction
The merge sort algorithm is a simple, recursive, divide and conquer algorithm that quickly sorts a list of data. It works like this:
- Sort the left half of the list
- Sort the right half of the list
- Merge the sorted lists
The sorting is where the recursion comes in as it is done by simply calling the merge sort function again on the smaller piece of the list. The base case is when there is only one item in the list. In that case, the list is sorted and can be returned.
The merging process is done by iterating through each list (the left and right half) and picking the smallest item to be added back into the the original list. Once one list is empty, you simply copy all the remaining items from that list back into the original.
For this homework you will write a simple sort utility that takes as command line arguments two filenames, an input file with data to be sorted, one item per line, and an output file that the sorted data will be written to, again one item per line.
The homework05.zip archive has the test files for this homework.
Part 1 - Getting started
Task 1 - Set up
In the mergesort.py file, which is provided in the folder download above, you should set it up so that the file can be run as a program (add the if __name__ == "__main__"
statement we've had in other projects and homeworks). It's up to you if you want to have a main()
function to call or not.
Task 2 - File I/O
Add code to your program to read in the data from the input file and store it in a list. Then write code to write the list to the output file. You'll need to access the command line arguments to get the file names. If you run your program and the output file looks exactly like the input file, you've done this part correctly. At this point your program is just a file copy command. It's time to add sorting.
Part 2 - Sorting
For this homework, we won't be specifying what your functions should be named or even what their input parameters should be. You have enough experience that you should be able to determine that for yourself from the problem requirements. The auto grader will not be testing individual function in this homework but rather just that your program completes the necessary task, in this case, sorting an arbitrary set of values.
Task 1 - The merge function
The first step is to write a function to merge two lists as described in the introduction. It should take as input two sorted lists and return a single merged list that is still sorted.
To do this you need to look at the first element of each list, pick the smallest, remove it from the list (or move a pointer to the next index) and store that smallest value in the new list to be return. You will continue to do this until one of the lists is empty. At that point, you copy all the remaining elements from the other list into the merged list which is then returned.
Write some doc tests for this function to verify that it is working properly.
Task 2 - The sorting function
Now that we can merge two sorted lists, all that is needed is to write the sorting function. As input it should take a list, and should return a sorted list. This is the function that implements the steps described in the introduction and has the recursion.
In this function you simply split the list in half, recursively call the function on both parts and then call your merge function to merge the two sorted halves. You'll need to think about what your base case is (how do you absolutely know that an arbitrary list is sorted?) and add that case as well.
Add some doc tests to verify that your sorting function is working.
Task 3 - Putting it all together
Now that your sorting function works, you just need to add code to call it from the main part of your program. You should pass in the list you read from the input file and write the sorted list to the output file.
Turn in your work
You'll submit your mergesort.py file on Canvas via Gradescope where it will be checked via the auto grader. We will be testing your program using a number of different data files, some number, some strings, and some mixed data. Make sure that you haven't "hard coded" anything specific to the test data we gave you. We do not guarantee that all scenarios are tested by the test data that we have provided you.