Lab 22 - Plotting With MatPlotLib
Due by 11:59pm on 2023-04-11.
Starter Files
Download lab22.zip. Inside the archive, you will find starter files for the questions in this lab.
Topics
Being able to visualize big datasets can help us recognize patterns. Fortunately, the MatPlotLib library helps us create all sorts of graphs for visualizing data. We are going to use the dataset from HW1 and a brown dwarf dataset to demonstrate creating line plots and scatter plots and saving them as PNG files.
Installing MatPlotLib
Try one of the following to install matplotlib:
pip install matplotlib
python3 -m pip install matplotlib
Remember, you can always uninstall a library by running
pip uninstall <library_name>
or
python3 -m pip uninstall <library_name>
To test if you did it right, paste the following code and run it:
import matplotlib.pyplot as plt
x_points = [1,5]
y_points = [1,5]
plt.plot(x_points, y_points)
plt.show()
A graph similar to the following should show up.
MatPlotLib Review
MatPlotLib can draw many kinds of graphs and plots:
Plot Type | Example | Code |
---|---|---|
plot | | |
scatter | | |
bar | | |
histogram | | |
pie | | |
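As a rough sketch of the pyplot call behind each plot type in the table, the snippet below draws one of each. The sample data is made up for illustration, and the Agg backend line is an assumption so the script can run without opening a window; remove it if you want to call plt.show().

```python
import matplotlib
matplotlib.use("Agg")  # assumption: draw off-screen so no window opens
import matplotlib.pyplot as plt

# Made-up sample data, only for illustration
x_points = [1, 2, 3, 4]
y_points = [2, 4, 1, 3]

lines = plt.plot(x_points, y_points)   # line plot
plt.clf()                              # clear before drawing the next graph

plt.scatter(x_points, y_points)        # scatter plot
plt.clf()

bars = plt.bar(x_points, y_points)     # bar chart: x positions, bar heights
plt.clf()

bin_counts, bin_edges, _ = plt.hist(y_points)  # histogram of the values
plt.clf()

plt.pie(y_points)                      # pie chart: one wedge per value
plt.clf()
```

Each call draws onto the current figure, which is why the figure is cleared between graphs here.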
Saving a Graph to a File and Clearing
If you want to save a graph to an output file, you can use the .savefig() method. For example,
import matplotlib.pyplot as plt
x_points = [1,5]
y_points = [1,5]
plt.plot(x_points, y_points)
plt.savefig("output_file.png") # <----------
If you are saving several different graphs and plots to separate files, make sure to clear the figure after you finish each one and before you start creating the next. Not doing so, as in the code below, will generate an image with mixed plots.
Code:
import matplotlib.pyplot as plt
x_points = [1,5]
y_points = [1,5]
plt.plot(x_points, y_points)
plt.savefig("output_file1.png")
plt.scatter(x_points, y_points)
plt.savefig("output_file2.png")
counts = [4, 1, 2, 3]
plt.pie(counts)
plt.savefig("output_file3.png")
Output of 'output_file3.png':
You can prevent this by using plt.clf() to clear the figure.
import matplotlib.pyplot as plt
x_points = [1,5]
y_points = [1,5]
plt.plot(x_points, y_points)
plt.savefig("output_file1.png")
plt.clf() # <--------------------
plt.scatter(x_points, y_points)
plt.savefig("output_file2.png")
plt.clf() # <--------------------
counts = [4, 1, 2, 3]
plt.pie(counts)
plt.savefig("output_file3.png")
plt.clf() # <--------------------
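One way to convince yourself that plt.clf() really empties the figure is to count the axes on the current figure before and after clearing. This is only a sketch: plt.gcf() returns the current figure, get_axes() lists what has been drawn on it, and the Agg backend line is an assumption so that no window opens.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: draw off-screen so no window opens
import matplotlib.pyplot as plt

plt.plot([1, 5], [1, 5])
axes_before = len(plt.gcf().get_axes())  # one set of axes holds the line

plt.clf()
axes_after = len(plt.gcf().get_axes())   # nothing left on the figure

print(axes_before, axes_after)
```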
Creating a Graph with Multiple Lines
If you want to create a graph with multiple lines, plot the y points of each line while using the same x points list. For example,
import matplotlib.pyplot as plt
x_points = [0, 1, 2, 3, 4, 5]
line1_y_points = [6, 7, 8, 9, 10, 10]
line2_y_points = [1, 2, 3, 4, 5, 6]
# more line_y_points
plt.plot(x_points, line1_y_points)
plt.plot(x_points, line2_y_points)
# more line plots
plt.show()
Colors
To add colors, you can provide an additional argument when plotting. For example, if we wanted to make the line red, we can use 'r' when plotting:
x_points = [0, 1, 2, 3, 4, 5]
y_points = [6, 7, 8, 9, 10, 10]
plt.plot(x_points, y_points, 'r')
#        x_list    y_list    color
Color | Syntax |
---|---|
Red | 'r' |
Green | 'g' |
Blue | 'b' |
Cyan | 'c' |
Magenta | 'm' |
Yellow | 'y' |
Black | 'k' |
White | 'w' |
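The single-letter codes in the table can also be passed through the color keyword argument, or combined with a line-style or marker code in the same string. The sketch below uses made-up data and assumes the Agg backend only so that it runs without opening a window.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: draw off-screen so no window opens
import matplotlib.pyplot as plt

x_points = [0, 1, 2, 3, 4, 5]
y_points = [6, 7, 8, 9, 10, 10]

# Same red line as plt.plot(x_points, y_points, 'r'), written with a keyword
red_line = plt.plot(x_points, y_points, color='r')[0]

# Color code followed by a line-style code: green dashed line
dashed_line = plt.plot(x_points, y_points, 'g--')[0]

# Color code followed by a marker code: blue circles, no connecting line
dots = plt.plot(x_points, y_points, 'bo')[0]
```

plt.plot returns a list of line objects, so the [0] picks out the single line that was just drawn.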
Labels
Note: TAs should skip this section if there is not enough time.
To add labels, we can use
plt.title(<string>)
plt.xlabel(<string>)
plt.ylabel(<string>)
Each of these functions takes a string to use as the label text.
For example,
import matplotlib.pyplot as plt
x_points = [0, 1, 2, 3, 4, 5]
y_points = [6, 7, 8, 9, 10, 10]
plt.plot(x_points, y_points)
plt.title("Your Coolness")
plt.xlabel("Months")
plt.ylabel("Coolness Level")
plt.show()
Legends
Note: TAs should skip this section if there is not enough time.
If we want a legend describing what each line represents, we have to provide a label argument set to a string when we plot each line. Once that is done, we can use plt.legend() to create a legend.
For example,
import matplotlib.pyplot as plt
x_points = [0, 1, 2, 3, 4, 5]
y1_points = [6, 7, 8, 9, 10, 10]
y2_points = [4, 5, 6, 5, 7, 10]
plt.plot(x_points, y1_points, label="Isaih") # <------------
plt.plot(x_points, y2_points, label="Jake") # <------------
plt.title("TA Coolness Levels")
plt.xlabel("Months")
plt.ylabel("Coolness Level")
plt.legend() # <----------------
plt.show()
Additional Info About Histograms
Note: TAs should skip this section if there is not enough time.
For project 4, you will have to specify the number of bins wanted in your histogram graph. Recall that a histogram shows how frequently values, such as numbers, occur in a dataset.
Number of Bins
By default, when you create a histogram, the graph will have 10 bins. If you want to change the number of bins, provide a second argument when plotting the histogram. For example, if we provided 3 as the second argument, all the data would now go into 3 different bins.
import matplotlib.pyplot as plt
frequencies = [1,1,1,1,1,1, 2,2,2, 3, 4,4, 5]
plt.hist(frequencies, 3)
plt.show()
Compare this histogram to the histogram plotted without the 3 argument.
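To make that comparison in numbers rather than by eye, a sketch like the following collects the bin counts for both cases. It relies on hist() returning the counts as its first value, and it assumes the Agg backend so that no window opens.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: draw off-screen so no window opens
import matplotlib.pyplot as plt

frequencies = [1,1,1,1,1,1, 2,2,2, 3, 4,4, 5]

# Default: 10 equally wide bins between the smallest and largest value
default_counts, default_edges, _ = plt.hist(frequencies)
plt.clf()

# Second argument of 3: the same range split into 3 equally wide bins
three_counts, three_edges, _ = plt.hist(frequencies, 3)
plt.clf()

print(len(default_counts), default_counts)
print(len(three_counts), three_counts)
```

With 3 bins the range from 1 to 5 is cut at roughly 2.33 and 3.67, so the ones and twos land in the first bin together.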
Bin Ranges
Additionally, you can specify the range of each bin by passing in a list of numbers. For example, if bins is [1, 2, 3, 4, 5, 6], the first bin would be between 1 and 2 (excluding 2); the second bin would be between 2 and 3 (excluding 3); the third bin would be between 3 and 4 (excluding 4); etc. The last bin is the exception: it includes both of its edges, so here it covers 5 through 6, including 6.
import matplotlib.pyplot as plt
frequencies = [1,1,1,1,1,1, 2,2,2, 3, 4,4, 5]
plt.hist(frequencies, [1, 2, 3, 4, 5, 6])
plt.show()
Demo this code and play with changing the bin ranges.
Return Values of hist()
Additionally, the hist() function returns three things. For this class, we will only use the first two.
The first item is a list of the number of items within each bin.
The second item is a list of the bin edges (the boundaries between bins), so it has one more entry than the first.
For example,
import matplotlib.pyplot as plt
frequencies = [
1,1,1,1,1,1, # 6 ones
2,2,2, # 3 twos
3, # 1 three
4,4, # 2 fours
5 # 1 five
]
bin_counts, bin_nums, item = plt.hist(frequencies, [1, 2, 3, 4, 5, 6])
print(bin_counts)
print(bin_nums)
Would result in:
[6. 3. 1. 2. 1.]
[1. 2. 3. 4. 5. 6.]
(Note that each number in the lists is a float.)
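Because the second returned item has one more entry than the first, each count lines up with a pair of neighboring edges. A minimal sketch of pairing them up follows; the summary list and the printed wording are illustrative choices, and the Agg backend is assumed so that no window opens.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: draw off-screen so no window opens
import matplotlib.pyplot as plt

frequencies = [1,1,1,1,1,1, 2,2,2, 3, 4,4, 5]
bin_counts, bin_nums, _ = plt.hist(frequencies, [1, 2, 3, 4, 5, 6])

summary = []  # hypothetical helper list of (low edge, high edge, count)
for i in range(len(bin_counts)):
    low = float(bin_nums[i])        # left edge of bin i
    high = float(bin_nums[i + 1])   # right edge of bin i
    count = int(bin_counts[i])      # counts come back as floats
    summary.append((low, high, count))
    print(f"{low} to {high}: {count} item(s)")
```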
Required Questions
Plot Party In Provo!
Q1: GPA and SAT Histograms
Open and read admission-algorithms-dataset.csv and store each student's GPA and SAT score in its own list. Using those two lists, generate two histograms: one displaying all the students' GPA data and the other displaying all the students' SAT data. The GPA histogram should be saved in a file called gpa.png, and the SAT histogram should be saved in a file called sat_score.png.
Recall that the file is organized like the following:
Student,SAT,GPA,Interest,High School Quality,Sem1,Sem2,Sem3,Sem4
Write your code under the plot_histogram function.
Recall the .split(<delimiter>) method of a string and remember to convert the strings that represent numbers into floats.
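As a small sketch of that hint, here is how one line in the layout above could be split and converted. The sample row is made up for illustration and is not taken from the real dataset; only the column order comes from the header.

```python
# Made-up row following the header layout:
# Student,SAT,GPA,Interest,High School Quality,Sem1,Sem2,Sem3,Sem4
sample_row = "Student A,1300,3.6,5,7,90,85,88,92\n"

fields = sample_row.strip().split(",")  # strip the newline, then split on commas

sat = float(fields[1])  # SAT is the second column
gpa = float(fields[2])  # GPA is the third column

print(sat, gpa)
```

The header line itself contains words rather than numbers in those columns, so it has to be skipped before converting.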
Q2: Correlation between GPA and SAT
Using the same GPA and SAT lists from the previous problem, create a scatter plot between those two lists with the GPA as the x-axis and SAT as the y-axis. Save the graph to a file called correlation.png.
Are there outliers? What is off about those students' stats?
Q3: Spectra
Open and read spectrum1.txt and spectrum2.txt. These files contain two columns representing the wavelength and flux respectively (which detail the intensity of light of an astronomical object). On a single graph, plot both datasets as a line plot with different colors. Data from spectrum1.txt will be blue, and data from spectrum2.txt will be green. Wavelength should be on the x-axis, and flux should be on the y-axis. Save the final graph as spectra.png.
Note: The data in the files are separated by four spaces. To split a line into its separate number components, use line.split() to get a list with the two numbers represented as strings.
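A minimal sketch of that note: calling split() with no argument splits on any run of whitespace, so the four spaces and the trailing newline are both handled. The sample line below is hypothetical and its numbers are not taken from the spectrum files.

```python
# Hypothetical line in the "wavelength    flux" layout
sample_line = "0.95    1.2e-05\n"

parts = sample_line.split()   # no delimiter: split on any whitespace

wavelength = float(parts[0])  # first column
flux = float(parts[1])        # second column; float() accepts e-notation

print(parts, wavelength, flux)
```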
For those interested, these are models of brown dwarfs, which are objects bigger than planets but not quite big enough to start hydrogen fusion in their cores and become stars. Both have surface temperatures of 1700 Kelvin (about 2600 degrees Fahrenheit). Spectrum 1 has no silicate clouds and spectrum 2 has fairly dense silicate clouds. The spectra look similar but there are differences in the wavelength where the flux is emitted because while they emit the same amount of energy, the clouds block light in some regions so the energy has to come out somewhere else.
Submit
Submit the lab22.py file on Canvas to Gradescope in the window on the assignment page.