Lab 23 - Plotting With MatPlotLib
Due by 11:59pm on April 15, 2025
Starter Files
Download lab23.zip. Inside the archive, you will find starter files for the questions in this lab.
Topics
Being able to visualize big datasets can help us recognize patterns. Fortunately, the MatPlotLib library helps us create all sorts of graphs that can be used to visualize data. We are going to be using the dataset from HW1 as well as another dataset containing brown dwarf data to practice creating plots and scatter plots. We will also learn to save the plots as PNG files.
Installing MatPlotLib
Try one of the following to install matplotlib:
pip install matplotlib
python3 -m pip install matplotlib
Remember, you can always uninstall a library by doing
pip uninstall <library_name>
or
python3 -m pip uninstall <library_name>
To test if you did it right, paste and run the following code:
import matplotlib.pyplot as plt
x_points = [1,5]
y_points = [1,5]
plt.plot(x_points, y_points)
plt.show()
A graph showing a straight line from (1, 1) to (5, 5) should show up.

MatPlotLib Review
There are various graphs and plots that you can create using MatPlotLib:
Plot Type | Function |
---|---|
plot | plt.plot() |
scatter | plt.scatter() |
bar | plt.bar() |
histogram | plt.hist() |
pie | plt.pie() |
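As a rough illustration of the plot types listed above, the sketch below draws one graph of each kind. The data values and file names are made up for this example, and each graph is saved and then cleared using plt.savefig() and plt.clf(), which are covered in the next section.

```python
import matplotlib.pyplot as plt

# Made-up sample data, used only for illustration
x_points = [1, 2, 3, 4]
y_points = [2, 4, 1, 3]

plt.plot(x_points, y_points)           # line plot: connects the points in order
plt.savefig("example_plot.png")
plt.clf()

plt.scatter(x_points, y_points)        # scatter plot: one dot per (x, y) pair
plt.savefig("example_scatter.png")
plt.clf()

plt.bar(["A", "B", "C", "D"], y_points)  # bar chart: one bar per category
plt.savefig("example_bar.png")
plt.clf()

plt.hist([1, 1, 2, 3, 3, 3])           # histogram: how often each value occurs
plt.savefig("example_hist.png")
plt.clf()

plt.pie([4, 1, 2, 3])                  # pie chart: each number becomes a slice
plt.savefig("example_pie.png")
plt.clf()
```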
Saving a Graph to a File and Clearing
If you want to save a graph to an output file, you can use the .savefig() method. For example,
import matplotlib.pyplot as plt
x_points = [1,5]
y_points = [1,5]
plt.plot(x_points, y_points)
plt.savefig("output_file.png") # <----------
If you are trying to save several different graphs and plots to different files, make sure that once you are done creating the current figure, you clear it using plt.clf() before you start creating the next one. Forgetting to clear the figure, as shown in the code below, will generate an image with mixed plots.
Code:
import matplotlib.pyplot as plt
x_points = [1,5]
y_points = [1,5]
plt.plot(x_points, y_points)
plt.savefig("output_file1.png")
plt.scatter(x_points, y_points)
plt.savefig("output_file2.png")
counts = [4, 1, 2, 3]
plt.pie(counts)
plt.savefig("output_file3.png")
Output of 'output_file3.png': the line, the scatter points, and the pie chart all appear in the same image.

Using plt.clf() to clear the figure prevents this issue.
import matplotlib.pyplot as plt
x_points = [1,5]
y_points = [1,5]
plt.plot(x_points, y_points)
plt.savefig("output_file1.png")
plt.clf() # <--------------------
plt.scatter(x_points, y_points)
plt.savefig("output_file2.png")
plt.clf() # <--------------------
counts = [4, 1, 2, 3]
plt.pie(counts)
plt.savefig("output_file3.png")
plt.clf() # <--------------------
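When there are many graphs to save, the same plot, save, clear pattern can be put in a loop. This is only a sketch: the file names and y values below are hypothetical.

```python
import matplotlib.pyplot as plt

x_points = [1, 5]

# Hypothetical file names, each mapped to the y values for that graph
graphs = {
    "line_up.png": [1, 5],
    "line_down.png": [5, 1],
}

for filename, y_points in graphs.items():
    plt.plot(x_points, y_points)
    plt.savefig(filename)
    plt.clf()  # clear the figure so the next graph starts empty
```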
Creating a Graph with Multiple Lines
If you want to create a graph with multiple lines, plot the y points of each line while using the same x points list. For example,
import matplotlib.pyplot as plt
x_points = [0, 1, 2, 3, 4, 5]
line1_y_points = [6, 7, 8, 9, 10, 10]
line2_y_points = [1, 2, 3, 4, 5, 6]
# more line_y_points
plt.plot(x_points, line1_y_points)
plt.plot(x_points, line2_y_points)
# more line plots
plt.show()
Colors
To add colors, you can provide an additional argument when plotting. For example, to make the line red, pass 'r' to the .plot() method:
x_points = [0, 1, 2, 3, 4, 5]
y_points = [6, 7, 8, 9, 10, 10]
plt.plot(x_points, y_points, 'r')
# x_list y_list color
Color | Syntax |
---|---|
Red | 'r' |
Green | 'g' |
Blue | 'b' |
Cyan | 'c' |
Magenta | 'm' |
Yellow | 'y' |
Black | 'k' |
White | 'w' |
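Here is a small sketch that uses two of the color codes from the table on a graph with two lines. The first line passes the code as the third argument, as described above. The second line uses the color keyword argument, which is another way matplotlib accepts the same code. The data and the file name are made up for this example.

```python
import matplotlib.pyplot as plt

x_points = [0, 1, 2, 3, 4, 5]

# Color code passed as the third positional argument
red_line = plt.plot(x_points, [6, 7, 8, 9, 10, 10], 'r')

# Same idea, using the color keyword argument instead
green_line = plt.plot(x_points, [1, 2, 3, 4, 5, 6], color='g')

plt.savefig("colors_example.png")  # hypothetical file name
plt.clf()
```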
Labels
Note: TAs should skip this section if there is not enough time.
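To add a title and axis labels, we can use
plt.title(<string>)
plt.xlabel(<string>)
plt.ylabel(<string>)
Each of these functions takes a string: plt.title() sets the title of the graph, while plt.xlabel() and plt.ylabel() set the labels of the x-axis and y-axis.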
For example,
import matplotlib.pyplot as plt
x_points = [0, 1, 2, 3, 4, 5]
y_points = [6, 7, 8, 9, 10, 10]
plt.plot(x_points, y_points)
plt.title("Your Coolness")
plt.xlabel("Months")
plt.ylabel("Coolness Level")
plt.show()

Legends
Note: TAs should skip this section if there is not enough time.
If we want a legend describing what each line represents, we have to provide a label argument set to a string when we plot each line. Once that is done, we can use plt.legend() to create the legend. For example,
import matplotlib.pyplot as plt
x_points = [0, 1, 2, 3, 4, 5]
y1_points = [6, 7, 8, 9, 10, 10]
y2_points = [4, 5, 6, 5, 7, 10]
plt.plot(x_points, y1_points, label="Isaih") # <------------
plt.plot(x_points, y2_points, label="Jake") # <------------
plt.title("TA Coolness Levels")
plt.xlabel("Months")
plt.ylabel("Coolness Level")
plt.legend() # <----------------
plt.show()

Additional Info About Histograms
Note: TAs should skip this section if there is not enough time.
For project 4, you will need to create a histogram. In this histogram, you’ll need to specify the number of bins you’ll use to sort your data. Recall that a histogram details the frequency of some value, like numbers.
Number of Bins
By default, when you create a histogram, the graph will have 10 bins. If you want to change the number of bins, provide a second argument when plotting the histogram. For example, if we provided 3 as the second argument, all the data would now go into 3 different bins.
import matplotlib.pyplot as plt
frequencies = [1,1,1,1,1,1, 2,2,2, 3, 4,4, 5]
plt.hist(frequencies, 3)
plt.show()
Compare this histogram to the histogram plotted without the 3 argument.
Bin Ranges
Additionally, you can specify the range of each bin by passing in a list of numbers. For example, if bins is [1, 2, 3, 4, 5, 6], the first bin would be between 1 and 2 (excluding 2); the second bin would be between 2 and 3 (excluding 3); the third bin would be between 3 and 4 (excluding 4); etc. The last bin is the one exception: it includes its upper edge, so here it covers 5 through 6, including 6.
import matplotlib.pyplot as plt
frequencies = [1,1,1,1,1,1, 2,2,2, 3, 4,4, 5]
plt.hist(frequencies, [1, 2, 3, 4, 5, 6])
plt.show()
Demo this code and play with changing the bin ranges.
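To see how values are sorted into bin ranges without drawing anything, here is a plain Python sketch of the rule. The function count_into_bins is a hypothetical helper written for this illustration, not part of matplotlib. It assumes each bin includes its lower edge and excludes its upper edge, except the last bin, which includes both (this matches how plt.hist treats the last bin).

```python
def count_into_bins(values, bin_edges):
    """Count how many values land in each bin defined by bin_edges."""
    counts = [0] * (len(bin_edges) - 1)
    last = len(counts) - 1
    for value in values:
        for i in range(len(counts)):
            low, high = bin_edges[i], bin_edges[i + 1]
            in_bin = low <= value < high
            # The last bin also includes its upper edge
            if i == last and value == high:
                in_bin = True
            if in_bin:
                counts[i] += 1
                break
    return counts


values = [1, 1, 1, 1, 1, 1, 2, 2, 2, 3, 4, 4, 5]
print(count_into_bins(values, [1, 2, 3, 4, 5, 6]))  # [6, 3, 1, 2, 1]
print(count_into_bins(values, [1, 3, 6]))           # [9, 4]
```

Values that fall outside every bin are simply not counted in this sketch.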
Return Values of hist()
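The hist() function returns three things. For this class, we will only use the first two. The first item is a list-like array of the number of items within each bin. The second item is a list-like array of the bin edges, which has one more entry than the first because every bin needs both a lower and an upper edge.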
For example,
import matplotlib.pyplot as plt
frequencies = [
1,1,1,1,1,1, # 6 ones
2,2,2, # 3 twos
3, # 1 three
4,4, # 2 fours
5 # 1 five
]
bin_counts, bin_nums, item = plt.hist(frequencies, [1, 2, 3, 4, 5, 6])
print(bin_counts)
print(bin_nums)
Would result in:
[6. 3. 1. 2. 1.] # the number of items within each bin
[1. 2. 3. 4. 5. 6.] # list of the bin values
(Note that each number in the lists is a float.)
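One way to use the two returned items together is to pair each count with the range of its bin. The sketch below assumes the same data and bins as the example above. The variable names are illustrative, and the third return value is stored in _ because it is not used.

```python
import matplotlib.pyplot as plt

values = [1, 1, 1, 1, 1, 1, 2, 2, 2, 3, 4, 4, 5]
bin_counts, bin_edges, _ = plt.hist(values, [1, 2, 3, 4, 5, 6])

# Bin i runs from bin_edges[i] to bin_edges[i + 1]
for i in range(len(bin_counts)):
    low = bin_edges[i]
    high = bin_edges[i + 1]
    print(f"{low} to {high}: {int(bin_counts[i])} item(s)")

plt.clf()  # clear the figure so later graphs start empty
```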
Required Questions
Plot Party In Provo!
Q1: GPA and SAT Histograms
Open and read admission_algorithms_dataset.csv and store each student's GPA and SAT score in their own respective lists. Using those two lists, generate two histograms: one displaying all the students' GPAs and the other displaying all the students' SAT scores. The GPA histogram should be saved in a file called gpa.png, and the SAT histogram should be saved in a file called sat_score.png. The naming must be exact so that the tests can find these files.
Recall that the file is organized like the following:
Student,SAT,GPA,Interest,High School Quality,Sem1,Sem2,Sem3,Sem4
Write your code under the plot_histogram function.
Hint(s): Recall the .split(<delimiter>) method of a string, and remember to convert the strings that represent numbers into floats.
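As a small illustration of the hint, the sketch below splits a single row and converts two of its fields. The row is made up for this example. It only follows the column order shown above (Student, SAT, GPA, ...) and is not taken from the actual dataset.

```python
# A made-up row in the same column order as the file
line = "Student A,1300,3.61,10,7,95,86,91,94\n"

# Remove the trailing newline, then split on commas
fields = line.strip().split(",")

# Every field is still a string, so convert the numeric ones
sat = float(fields[1])
gpa = float(fields[2])

print(fields[0], sat, gpa)
```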
Q2: Correlation between GPA and SAT
Using the same GPA and SAT lists from the previous problem, create a scatter plot between those two lists with the GPA as the x-axis and SAT as the y-axis. Save the graph to a file called correlation.png.
Are there outliers? What is off about those students’ stats?
Q3: Spectra
Open and read spectrum1.txt and spectrum2.txt. These files contain two columns representing the Wavelength and Flux respectively (which detail the intensity of light of an astronomical object). On a single graph, plot both datasets as a line plot with different colors. Data from spectrum1.txt will be blue, and data from spectrum2.txt will be green. Wavelength should be on the x-axis, and Flux should be on the y-axis. Save the final graph as spectra.png.
Note: The data in the files are separated by four spaces. To separate a line into its number components, use line.split() to get a list with the two numbers represented as strings.
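Calling .split() with no argument splits on any run of whitespace, so the four spaces are treated as one separator. The sketch below shows this on a single made-up line. The numbers are not taken from the spectrum files.

```python
# A made-up line with two numbers separated by four spaces
line = "1.05    2.5e-10\n"

# With no argument, split() breaks on whitespace and drops the newline
parts = line.split()
print(parts)  # ['1.05', '2.5e-10']

# Convert each string into a float before plotting
wavelength = float(parts[0])
flux = float(parts[1])
```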
For those interested, these are models of brown dwarfs, which are objects bigger than planets but not quite big enough to start hydrogen fusion in their cores and become stars. Both have surface temperatures of 1700 Kelvin (about 2600 degrees Fahrenheit). Spectrum 1 has no silicate clouds and spectrum 2 has fairly dense silicate clouds. The spectra look similar but there are differences in the wavelength where the flux is emitted because while they emit the same amount of energy, the clouds block light in some regions so the energy has to come out somewhere else.
Submit
Submit the lab23.py file on Canvas to Gradescope in the window on the assignment page. The autograder will test each function individually, so it doesn't matter what your main function does when you submit it.