Lab 23 - Plotting With MatPlotLib

Due by 11:59pm on April 15, 2025

Starter Files

Download lab23.zip. Inside the archive, you will find starter files for the questions in this lab.

Topics

Being able to visualize big datasets can help us recognize patterns. Fortunately, the MatPlotLib library lets us create all sorts of graphs that can be used to visualize data. We are going to use the dataset from HW1, as well as another dataset containing brown dwarf data, to practice creating histograms, line plots, and scatter plots. We will also learn to save the plots as PNG files.

Installing MatPlotLib

Try one of the following to install matplotlib:

pip install matplotlib
python3 -m pip install matplotlib

Remember, you can always uninstall a library by running pip uninstall <library_name> or python3 -m pip uninstall <library_name>.

To test if you did it right, paste and run the following code:

import matplotlib.pyplot as plt

x_points = [1,5]
y_points = [1,5]

plt.plot(x_points, y_points)
plt.show()

A graph with a single straight line from (1, 1) to (5, 5) should show up.

[Figure: plot example]
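
If no window appears (for example, on a machine without a display), the install can still be checked without plt.show(). The sketch below is one way to do that, not part of the lab's required steps: it selects the non-interactive "Agg" backend, prints the installed version, and saves the test plot to a file. The file name install_check.png is a made-up name for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend: draws to files, opens no window
import matplotlib.pyplot as plt

# If the import above worked, the library is installed; show which version.
print(matplotlib.__version__)

plt.plot([1, 5], [1, 5])
plt.savefig("install_check.png")  # hypothetical file name, chosen for this check
plt.clf()                         # clear the figure so later plots start fresh
```

If the script finishes without an ImportError and the PNG contains the line, the installation worked.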

MatPlotLib Review

There are various kinds of graphs and plots that you can create using MatPlotLib.

Plot Type: plot
Code:

x_points = [1, 5]
y_points = [1, 5]

plt.plot(x_points, y_points)
plt.show() # display the plot

Plot Type: scatter
Code:

x_points = [1, 2, 3, 4, 5]
y_points = [1, 3, 2, 4, 5]

plt.scatter(x_points, y_points)
plt.show() # display the plot

Plot Type: bar
Code:

categories = ['A', 'B', 'C', 'D']
y_points = [5, 1, 3, 1]

plt.bar(categories, y_points)
plt.show() # display the plot

Plot Type: histogram
Code:

frequencies = [
    1,1,1,1,1,1,  # 6 ones
    2,2,2,        # 3 twos
    3,            # 1 three
    4,4,          # 2 fours
    5             # 1 five
]

plt.hist(frequencies)
plt.show() # display the plot

Plot Type: pie
Code:

counts = [4, 1, 2, 3]

plt.pie(counts)
plt.show() # display the plot

Saving a Graph to a File and Clearing

If you want to save a graph to an output file, you can use the plt.savefig() function. For example,

import matplotlib.pyplot as plt

x_points = [1,5]
y_points = [1,5]

plt.plot(x_points, y_points)
plt.savefig("output_file.png") # <----------

If you are saving several different graphs and plots to different files, clear the current figure with plt.clf() once you are done with it and before you start creating the next one. Forgetting to clear the figure, as shown in the code below, will generate an image with mixed plots.

Code:

import matplotlib.pyplot as plt

x_points = [1,5]
y_points = [1,5]

plt.plot(x_points, y_points)
plt.savefig("output_file1.png")

plt.scatter(x_points, y_points)
plt.savefig("output_file2.png")

counts = [4, 1, 2, 3]

plt.pie(counts)
plt.savefig("output_file3.png")

Output of ‘output_file3.png’: the pie chart is drawn on the same figure as the earlier line and scatter plots.

Using plt.clf() to clear the figure prevents this issue.

import matplotlib.pyplot as plt

x_points = [1,5]
y_points = [1,5]

plt.plot(x_points, y_points)
plt.savefig("output_file1.png")
plt.clf() # <--------------------

plt.scatter(x_points, y_points)
plt.savefig("output_file2.png")
plt.clf() # <--------------------

counts = [4, 1, 2, 3]

plt.pie(counts)
plt.savefig("output_file3.png")
plt.clf() # <--------------------
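
Since saving and clearing always go together here, one option is to wrap the two calls in a small helper. This is only a sketch: save_and_clear and the file names are made-up names for illustration, not part of the starter files.

```python
import matplotlib.pyplot as plt

x_points = [1, 5]
y_points = [1, 5]

def save_and_clear(filename):
    """Save the current figure to filename, then clear it for the next plot."""
    plt.savefig(filename)
    plt.clf()

plt.plot(x_points, y_points)
save_and_clear("line_only.png")      # hypothetical file name

plt.scatter(x_points, y_points)
save_and_clear("scatter_only.png")   # hypothetical file name

# After clearing, the current figure has nothing drawn on it.
remaining_axes = len(plt.gcf().get_axes())
print(remaining_axes)  # 0
```

Each saved image then contains only the plot that was created right before it.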

Creating a Graph with Multiple Lines

If you want to create a graph with multiple lines, call plt.plot() once for each line, passing that line's y points along with the same x points list. For example,

import matplotlib.pyplot as plt

x_points = [0, 1, 2, 3, 4, 5]
line1_y_points = [6, 7, 8, 9, 10, 10]
line2_y_points = [1, 2, 3, 4, 5, 6]
# more line_y_points

plt.plot(x_points, line1_y_points)
plt.plot(x_points, line2_y_points)
# more line plots
plt.show()
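
When there are many lines, the repeated plt.plot() calls can be written as a loop. The following is a minimal sketch assuming the y points are collected in a list of lists; the name all_y_points, the third set of values, and the output file name are made up for illustration.

```python
import matplotlib.pyplot as plt

x_points = [0, 1, 2, 3, 4, 5]

# Hypothetical container: each inner list holds the y points for one line.
all_y_points = [
    [6, 7, 8, 9, 10, 10],
    [1, 2, 3, 4, 5, 6],
    [3, 3, 4, 4, 5, 5],
]

for y_points in all_y_points:
    plt.plot(x_points, y_points)  # each call adds one more line to the same graph

# Count the lines now drawn on the current axes: one per plt.plot() call.
line_count = len(plt.gca().get_lines())
print(line_count)  # 3

plt.savefig("multiple_lines.png")  # hypothetical file name
plt.clf()
```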

Colors

To add colors, you can provide an additional argument when plotting. For example, to make the line red, pass 'r' as the third argument to plt.plot():

x_points = [0, 1, 2, 3, 4, 5]
y_points = [6, 7, 8, 9, 10, 10]

plt.plot(x_points, y_points, 'r')
#        x_list    y_list    color

Color      Syntax
Red        'r'
Green      'g'
Blue       'b'
Cyan       'c'
Magenta    'm'
Yellow     'y'
Black      'k'
White      'w'
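
The same letter codes can also be passed by name with the color keyword argument. This matters for plt.scatter(), where the third positional argument is the marker size rather than the color. A short sketch (the output file name is made up for illustration):

```python
import matplotlib.pyplot as plt

x_points = [0, 1, 2, 3, 4, 5]
y_points = [6, 7, 8, 9, 10, 10]

# color='r' has the same effect as passing 'r' as the third argument to plot().
lines = plt.plot(x_points, y_points, color='r')

# For scatter(), pass the color by keyword.
plt.scatter(x_points, y_points, color='g')

line_color = lines[0].get_color()
print(line_color)  # r

plt.savefig("colors.png")  # hypothetical file name
plt.clf()
```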

Labels

Note: TAs should skip this section if there is not enough time.

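To add a title and axis labels, we can use

  • plt.title(<string>)
  • plt.xlabel(<string>)
  • plt.ylabel(<string>)

Each of these functions takes in a string: plt.title() uses it as the graph's title, while plt.xlabel() and plt.ylabel() use it as the label for the x-axis and y-axis.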

For example,

import matplotlib.pyplot as plt

x_points = [0, 1, 2, 3, 4, 5]
y_points = [6, 7, 8, 9, 10, 10]

plt.plot(x_points, y_points)

plt.title("Your Coolness")
plt.xlabel("Months")
plt.ylabel("Coolness Level")

plt.show()

Legends

Note: TAs should skip this section if there is not enough time.

If we want a legend describing what each line represents, we have to provide a label argument set to a string each time we plot a line. Once that is done, we can call plt.legend() to create the legend. For example,

import matplotlib.pyplot as plt

x_points = [0, 1, 2, 3, 4, 5]
y1_points = [6, 7, 8, 9, 10, 10]
y2_points = [4, 5, 6, 5, 7, 10]

plt.plot(x_points, y1_points, label="Isaih") # <------------
plt.plot(x_points, y2_points, label="Jake") # <------------

plt.title("TA Coolness Levels")
plt.xlabel("Months")
plt.ylabel("Coolness Level")

plt.legend() # <----------------

plt.show()

Additional Info About Histograms

Note: TAs should skip this section if there is not enough time.

For project 4, you will need to create a histogram. In this histogram, you’ll need to specify the number of bins used to sort your data. Recall that a histogram shows how frequently values occur in a dataset, with the values grouped into ranges called bins.

Number of Bins

By default, when you create a histogram, the graph will have 10 bins. If you want to change the number of bins, provide a second argument when plotting the histogram. For example, if we provide 3 as the second argument, all the data goes into 3 equal-width bins.

import matplotlib.pyplot as plt

frequencies = [1,1,1,1,1,1, 2,2,2, 3, 4,4, 5]
plt.hist(frequencies, 3)
plt.show()

Compare this histogram to the histogram plotted without the 3 argument.

Bin Ranges

Additionally, you can specify the range of each bin by passing in a list of numbers. For example, if bins is [1, 2, 3, 4, 5, 6], the first bin would be between 1 and 2 (excluding 2); the second bin would be between 2 and 3 (excluding 3); the third bin would be between 3 and 4 (excluding 4); etc. The last bin is the exception: it includes both of its edges, so here it would hold values from 5 to 6 (including 6).

import matplotlib.pyplot as plt

frequencies = [1,1,1,1,1,1, 2,2,2, 3, 4,4, 5]
plt.hist(frequencies, [1, 2, 3, 4, 5, 6])
plt.show()

Demo this code and play with changing the bin ranges.

Return Values of hist()

The hist() function returns three things. For this class, we will only use the first two. The first item holds the number of items within each bin. The second item holds the bin edges (the boundaries between bins), so it has one more entry than the first.

For example,

import matplotlib.pyplot as plt
frequencies = [
        1,1,1,1,1,1,  # 6 ones
        2,2,2,        # 3 twos
        3,            # 1 three
        4,4,          # 2 fours
        5             # 1 five
    ]
bin_counts, bin_nums, item = plt.hist(frequencies, [1, 2, 3, 4, 5, 6])
print(bin_counts)
print(bin_nums)

Would result in:

[6. 3. 1. 2. 1.] # the number of items within each bin
[1. 2. 3. 4. 5. 6.] # the bin edges

(Note that each number is a float.)
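
The return values also make it easy to check how the bins argument changes the grouping. The sketch below passes bins by keyword (the same as giving it as the second argument) and converts the counts to ints for printing. The variable names are chosen for this example only.

```python
import matplotlib.pyplot as plt

frequencies = [1,1,1,1,1,1, 2,2,2, 3, 4,4, 5]

# Three equal-width bins spanning the smallest value (1) to the largest (5).
counts_3, edges_3, _ = plt.hist(frequencies, bins=3)
plt.clf()

# Explicit edges: [1, 2), [2, 3), [3, 4), and a last bin [4, 5] that includes 5.
counts_4, edges_4, _ = plt.hist(frequencies, bins=[1, 2, 3, 4, 5])
plt.clf()

print([int(c) for c in counts_3])  # [9, 1, 3]
print([int(c) for c in counts_4])  # [6, 3, 1, 3]
print(len(edges_3))                # 4 edges for 3 bins
```

In the second call, the single 5 is counted together with the two 4s because the last bin includes its right edge.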

Required Questions

Plot Party In Provo!

Q1: GPA and SAT Histograms

Open and read admission_algorithms_dataset.csv and store the students’ GPAs and SAT scores in two separate lists. Using those two lists, generate two histograms: one displaying all the students’ GPAs and the other displaying all the students’ SAT scores. The GPA histogram should be saved in a file called gpa.png, and the SAT histogram should be saved in a file called sat_score.png. The naming must be exact so that the tests can find these files.

Recall that the file is organized like the following:

Student,SAT,GPA,Interest,High School Quality,Sem1,Sem2,Sem3,Sem4

Write your code in the plot_histogram function.

Hint(s): Recall the .split(<delimiter>) method of a string, and remember to convert the strings that are representing numbers into floats.
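
As a small illustration of the hint, the sketch below splits one comma-separated row and converts two of its fields. The row is made up for this example (it only follows the column order of the header shown above) and is not taken from the dataset.

```python
# Hypothetical row in the same column order as the header; the values are made up.
line = "Student 1,1200,3.5,8,7,90,85,88,92\n"

fields = line.strip().split(",")  # strip() removes the trailing newline first
sat = float(fields[1])            # index 1 is the SAT column
gpa = float(fields[2])            # index 2 is the GPA column

print(sat, gpa)  # 1200.0 3.5
```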

Q2: Correlation between GPA and SAT

Using the same GPA and SAT lists from the previous problem, create a scatter plot between those two lists with the GPA as the x-axis and SAT as the y-axis. Save the graph to a file called correlation.png.

Are there outliers? What is off about those students’ stats?

Q3: Spectra

Open and read spectrum1.txt and spectrum2.txt. Each file contains two columns, Wavelength and Flux respectively, which describe the intensity of light from an astronomical object. On a single graph, plot both datasets as line plots with different colors. Data from spectrum1.txt will be blue, and data from spectrum2.txt will be green. Wavelength should be on the x-axis, and Flux should be on the y-axis. Save the final graph as spectra.png.

Note: The two values on each line are separated by four spaces. To split a line into its two numbers, use line.split(), which returns a list with the two numbers represented as strings.
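
For instance, calling split() with no argument splits on any run of whitespace, so the four spaces and the trailing newline are handled in one step. The line below is made up for illustration and is not taken from the spectrum files.

```python
# Hypothetical line with two values separated by four spaces; the numbers are made up.
line = "1.25    0.0042\n"

parts = line.split()          # no argument: split on any run of whitespace
wavelength = float(parts[0])  # first column
flux = float(parts[1])        # second column

print(parts)  # ['1.25', '0.0042']
```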

For those interested, these are models of brown dwarfs, which are objects bigger than planets but not quite big enough to start hydrogen fusion in their cores and become stars. Both have surface temperatures of 1700 Kelvin (about 2600 degrees Fahrenheit). Spectrum 1 has no silicate clouds, and spectrum 2 has fairly dense silicate clouds. The spectra look similar, but the wavelengths at which the flux is emitted differ: although both objects emit the same amount of energy, the clouds block light in some regions, so the energy has to come out somewhere else.

Submit

Submit the lab23.py file on Canvas to Gradescope in the window on the assignment page. The autograder will test each function individually, so it doesn’t matter what your main function does when you submit it.