Running and Quitting
Overview
Teaching: 10 min
Exercises: 0 min
Questions
How can I run Python programs?
Objectives
Launch the Jupyter Notebook, create new notebooks, and exit the Notebook.
Create Markdown cells in a notebook.
Create and run Python cells in a notebook.
Python programs are plain text files.
- They have the .py extension to let everyone (including the operating system) know it is a Python program.
- This is convention, not a requirement.
- It’s common to write them using a text editor but we are going to use the Jupyter Notebook to present the tutorials.
- The bit of extra setup is well worth it because the Notebook provides code completion and other helpful features.
- Notebook files have the extension .ipynb to distinguish them from plain-text Python programs.
- They can be exported as “pure Python” to run from the command line.
Use the Jupyter Notebook for editing and running Python.
- The Anaconda package manager is an automated way to install the Jupyter notebook.
- See the setup instructions for Anaconda installation instructions.
- It also installs all the extra libraries it needs to run.
- Once you have installed Python and the Jupyter Notebook requirements, open a shell and type:
$ jupyter notebook
- This will start a Jupyter Notebook server and open your default web browser.
- The server runs locally on your machine only and does not use an internet connection.
- The server sends messages to your browser.
- The server does the work and the web browser renders the notebook.
- You can type code into the browser and see the result when the web page talks to the server.
- This has several advantages:
- You can easily type, edit, and copy and paste blocks of code.
- Tab complete allows you to easily access the names of things you are using and learn more about them.
- It allows you to annotate your code with links, different sized text, bullets, etc. to make it more accessible to you and your collaborators.
- It allows you to display figures next to the code that produces them to tell a complete story of the analysis.
Spyder and PyCharm
You can also use a Python Integrated Development Environment (IDE) such as Spyder or PyCharm.
Jupyter notebook
Math with numbers
- Addition
2+3
5
- Multiplication
2*2
4
- More complex
9.81*10.2**2/2
510.3162
- Power
3 ** 2
9
Math with strings (?)
- Concatenation
"cat " + "dog"
'cat dog'
- repetition
"cat "*3 + "dog"
'cat cat cat dog'
- subtraction
"cat " - "dog"
...
TypeError: unsupported operand type(s) for -: 'str' and 'str'
Operators
- Addition
6 + 7
13
- Multiplication
6 * 7
42
- Division
6 / 7
0.8571428571428571
- Power
6 ** 7 # power
279936
- Modulo
66 % 7 # Modulo (remainder)
3
Logic
- Less than
6 < 7
True
- less than or equal to
6 <= 7
True
- equal to
6 == 7
False
- greater than or equal to
6 >= 7
False
- greater than
6 > 7
False
- logical and
True and False
False
- logical or
True or False
True
- logical not
not False
True
Key Points
Python programs are plain text files.
Use the Jupyter Notebook for editing and running Python.
The Notebook has Command and Edit modes.
Use the keyboard and mouse to select and edit cells.
The Notebook will turn Markdown into pretty-printed documentation.
Markdown does most of what HTML does.
Variables and Assignment
Overview
Teaching: 10 min
Exercises: 10 min
Questions
How can I store data in programs?
Objectives
Write programs that assign scalar values to variables and perform calculations with those values.
Correctly trace value changes in programs that use scalar assignment.
Use variables to store values.
- Variables are names for values.
- In Python the = symbol assigns the value on the right to the name on the left.
- The variable is created when a value is assigned to it.
- Here, Python assigns an age to a variable age and a name in quotes to a variable first_name.
age = 42
first_name = 'Ahmed'
- Variable names can only contain letters, digits, and the underscore _ (typically used to separate words in long variable names), and cannot start with a digit.
- Variable names that start with underscores like __alistairs_real_age have a special meaning so we won’t do that until we understand the convention.
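The naming rules above can be checked from within Python itself. The sketch below uses the string method isidentifier(); the list of candidate names is made up for illustration, and the for loop and list are features covered only later in the lesson.

```python
# Candidate names, invented for this illustration.
candidates = ['age', 'first_name', '2nd_place', 'total-cost', '_hidden']

for name in candidates:
    # isidentifier() is True when the text obeys Python's naming rules.
    print(name, '->', name.isidentifier())

# Keep only the names Python would accept as variable names.
valid_names = [name for name in candidates if name.isidentifier()]
print(valid_names)
```

Note that isidentifier() only checks the spelling rules: it also returns True for reserved words such as for, which still cannot be used as variable names.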
Use print to display values.
- Python has a built-in function called print that prints things as text.
- Call the function (i.e., tell Python to run it) by using its name.
- Provide values to the function (i.e., the things to print) in parentheses.
- To add a string to the printout, wrap the string in single or double quotes.
- The values passed to the function are called ‘arguments’.
print(first_name, 'is', age, 'years old')
Ahmed is 42 years old
- print automatically puts a single space between items to separate them.
- It also wraps around to a new line at the end.
Variables must be created before they are used.
- If a variable doesn’t exist yet, or if the name has been misspelled, Python reports an error.
- This is unlike some languages, which “guess” a default value.
print(last_name)
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
<ipython-input-1-c1fbb4e96102> in <module>()
----> 1 print(last_name)
NameError: name 'last_name' is not defined
- The last line of an error message is usually the most informative.
- We will look at error messages in detail later.
Variables can be used in calculations.
- We can use variables in calculations just as if they were values.
- Remember, we assigned 42 to age a few lines ago.
age = age + 3
print('Age in three years:', age)
Age in three years: 45
Use an index to get a single character from a string.
- The characters (individual letters, numbers, and so on) in a string are ordered. For example, the string ‘AB’ is not the same as ‘BA’. Because of this ordering, we can treat the string as a list of characters.
- Each position in the string (first, second, etc.) is given a number. This number is called an index or sometimes a subscript.
- Indices are numbered from 0.
- Use the position’s index in square brackets to get the character at that position.
atom_name = 'helium'
print(atom_name[0])
h
Use a slice to get a substring.
- A part of a string is called a substring. A substring can be as short as a single character.
- An item in a list is called an element. Whenever we treat a string as if it were a list, the string’s elements are its individual characters.
- A slice is a part of a string (or, more generally, any list-like thing).
- We take a slice by using [start:stop], where start is replaced with the index of the first element we want and stop is replaced with the index of the element just after the last element we want.
- Mathematically, you might say that a slice selects [start:stop).
- The difference between stop and start is the slice’s length.
- Taking a slice does not change the contents of the original string. Instead, the slice is a copy of part of the original string.
atom_name = 'sodium'
print(atom_name[0:3])
sod
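To tie the indexing and slicing rules together, here is a small sketch on the same string. The variable names first, last, middle, start, and stop are chosen only for this illustration.

```python
atom_name = 'sodium'

first = atom_name[0]   # indices are numbered from 0
last = atom_name[5]    # a six-character string ends at index 5

start = 1
stop = 4
middle = atom_name[start:stop]   # characters at indices 1, 2 and 3

print(first, last, middle)
print('slice length:', stop - start)
print('original is unchanged:', atom_name)
```

The slice stops just before index 4, so it holds stop - start = 3 characters, and atom_name still refers to the full string afterwards.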
Use the built-in function len to find the length of a string.
print(len('helium'))
6
- Nested functions are evaluated from the inside out, just like in mathematics.
Python is case-sensitive.
- Python thinks that upper- and lower-case letters are different, so Name and name are different variables.
- There are conventions for using upper-case letters at the start of variable names so we will use lower-case letters for now.
Use meaningful variable names.
- Python doesn’t care what you call variables as long as they obey the rules (alphanumeric characters and the underscore).
flabadab = 42
ewr_422_yY = 'Ahmed'
print(ewr_422_yY, 'is', flabadab, 'years old')
- Use meaningful variable names to help other people understand what the program does.
- The most important “other person” is your future self.
Predicting Values
What is the final value of position in the program below? (Try to predict the value without running the program, then check your prediction.)
initial = 'left'
position = initial
initial = 'right'
Solution
'left'
The initial variable is assigned the value ‘left’. In the second line, the position variable also receives the string value ‘left’. In the third line, the initial variable is given the value ‘right’, but the position variable retains its string value of ‘left’.
Challenge
If you assign a = 123, what happens if you try to get the second digit of a via a[1]?
Solution
Numbers are not stored in the written representation, so they can’t be treated like strings.
a = 123
print(a[1])
TypeError: 'int' object is not subscriptable
Choosing a Name
Which is a better variable name, m, min, or minutes? Why? Hint: think about which code you would rather inherit from someone who is leaving the lab:
ts = m * 60 + s
tot_sec = min * 60 + sec
total_seconds = minutes * 60 + seconds
Solution
minutes is better because min might mean something like “minimum” (and min is in fact the name of a built-in Python function, but we haven’t seen that yet).
Slicing
What does the following program print?
atom_name = 'carbon'
print('atom_name[1:3] is:', atom_name[1:3])
atom_name[1:3] is: ar
- What does thing[low:high] do?
- What does thing[low:] (without a value after the colon) do?
- What does thing[:high] (without a value before the colon) do?
- What does thing[:] (just a colon) do?
- What does thing[number:negative-number] do?
- What happens when you choose a high value which is out of range? (i.e., try atom_name[0:15])
Key Points
Use variables to store values.
Use print to display values.
Variables persist between cells.
Variables must be created before they are used.
Variables can be used in calculations.
Use an index to get a single character from a string.
Use a slice to get a substring.
Use the built-in function len to find the length of a string.
Python is case-sensitive.
Use meaningful variable names.
Data Types and Type Conversion
Overview
Teaching: 5 min
Exercises: 5 min
Questions
What kinds of data do programs store?
How can I convert one type to another?
Objectives
Explain key differences between integers and floating point numbers.
Explain key differences between numbers and character strings.
Use built-in functions to convert between integers, floating point numbers, and strings.
Every value has a type.
- Every value in a program has a specific type.
- Integer (int): represents positive or negative whole numbers like 3 or -512.
- Floating point number (float): represents real numbers like 3.14159 or -2.5.
- Character string (usually called “string”, str): text.
- Written in either single quotes or double quotes (as long as they match).
- The quote marks aren’t printed when the string is displayed.
Use the built-in function type to find the type of a value.
- Use the built-in function type to find out what type a value has.
- Works on variables as well.
- But remember: the value has the type — the variable is just a label.
print(type(52))
<class 'int'>
fitness = 'average'
print(type(fitness))
<class 'str'>
Types control what operations (or methods) can be performed on a given value.
- A value’s type determines what the program can do to it.
print(5 - 3)
2
print('hello' - 'h')
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-2-67f5626a1e07> in <module>()
----> 1 print('hello' - 'h')
TypeError: unsupported operand type(s) for -: 'str' and 'str'
You can use the “+” and “*” operators on strings.
- “Adding” character strings concatenates them.
full_name = 'Ahmed' + ' ' + 'Walsh'
print(full_name)
Ahmed Walsh
- Multiplying a character string by an integer N creates a new string that consists of that character string repeated N times.
- Since multiplication is repeated addition.
separator = '=' * 10
print(separator)
==========
Strings have a length (but numbers don’t).
- The built-in function
len
counts the number of characters in a string.
print(len(full_name))
11
- But numbers don’t have a length (not even zero).
print(len(52))
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-3-f769e8e8097d> in <module>()
----> 1 print(len(52))
TypeError: object of type 'int' has no len()
Must convert numbers to strings or vice versa when operating on them.
- Cannot add numbers and strings.
print(1 + '2')
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-4-fe4f54a023c6> in <module>()
----> 1 print(1 + '2')
TypeError: unsupported operand type(s) for +: 'int' and 'str'
- Not allowed because it’s ambiguous: should 1 + '2' be 3 or '12'?
- Some types can be converted to other types by using the type name as a function.
print(1 + int('2'))
print(str(1) + '2')
3
12
Can mix integers and floats freely in operations.
- Integers and floating-point numbers can be mixed in arithmetic.
- Python 3 automatically converts integers to floats as needed. (Integer division in Python 2 will return an integer, the floor of the division.)
print('half is', 1 / 2.0)
print('three squared is', 3.0 ** 2)
half is 0.5
three squared is 9.0
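As a rough illustration of when Python 3 produces a float and when it keeps an integer, the sketch below prints the type of a few results. The variable names are chosen only for this example.

```python
mixed_sum = 3 + 0.5        # int + float gives a float
true_quotient = 6 / 3      # '/' gives a float, even for two integers
floor_quotient = 7 // 2    # '//' between two integers gives an integer
product = 6 * 7            # int * int stays an integer

print(mixed_sum, type(mixed_sum))
print(true_quotient, type(true_quotient))
print(floor_quotient, type(floor_quotient))
print(product, type(product))
```

The conversion to float happens only when an operand is already a float or when the / operator is used.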
Variables only change value when something is assigned to them.
- If we make one cell in a spreadsheet depend on another, and update the latter, the former updates automatically.
- This does not happen in programming languages.
first = 1
second = 5 * first
first = 2
print('first is', first, 'and second is', second)
first is 2 and second is 5
- The computer reads the value of first when doing the multiplication, creates a new value, and assigns it to second.
- After that, second does not remember where it came from.
Fractions
What type of value is 3.4? How can you find out?
Solution
It is a floating-point number (often abbreviated “float”).
print(type(3.4))
<class 'float'>
Automatic Type Conversion
What type of value is 3.25 + 4?
Solution
It is a float: integers are automatically converted to floats as necessary.
result = 3.25 + 4
print(result, 'is', type(result))
7.25 is <class 'float'>
Choose a Type
What type of value (integer, floating point number, or character string) would you use to represent each of the following? Try to come up with more than one good answer for each problem. For example, in # 1, when would counting days with a floating point variable make more sense than using an integer?
- Number of days since the start of the year.
- Time elapsed from the start of the year until now in days.
- Serial number of a piece of lab equipment.
- A lab specimen’s age
- Current population of a city.
- Average population of a city over time.
Solution
The answers to the questions are:
- Integer, since the number of days would lie between 1 and 365.
- Floating point, since fractional days are required.
- Character string if the serial number contains letters and numbers, otherwise integer if the serial number consists only of numerals.
- This will vary! How do you define a specimen’s age? Whole days since collection (integer)? Date and time (string)?
- Choose floating point to represent population as large aggregates (e.g., millions), or integer to represent population in units of individuals.
- Floating point number, since an average is likely to have a fractional part.
Division Types
In Python 3, the // operator performs integer (whole-number) floor division, the / operator performs floating-point division, and the % (or modulo) operator calculates and returns the remainder from integer division:
print('5 // 3:', 5 // 3)
print('5 / 3:', 5 / 3)
print('5 % 3:', 5 % 3)
5 // 3: 1
5 / 3: 1.6666666666666667
5 % 3: 2
However, in Python 2 (and some other languages), the / operator between two integers performs a floor (//) division. To perform a float division, we have to convert one of the integers to float. The Python 2 results are written out as literal values below:
print('5 // 3:', 1)
print('5 / 3:', 1)
print('5 / float(3):', 1.6666667)
print('float(5) / 3:', 1.6666667)
print('float(5 / 3):', 1.0)
print('5 % 3:', 2)
If num_subjects is the number of subjects taking part in a study, and num_per_survey is the number that can take part in a single survey, write an expression that calculates the number of surveys needed to reach everyone once.
Solution
We want the minimum number of surveys that reaches everyone once, which is the rounded up value of num_subjects / num_per_survey. This is equivalent to performing a floor division with // on num_subjects - 1 and adding 1. (Adding 1 to num_subjects // num_per_survey directly would count one survey too many whenever num_subjects is an exact multiple of num_per_survey.)
num_subjects = 600
num_per_survey = 42
num_surveys = (num_subjects - 1) // num_per_survey + 1
print(num_subjects, 'subjects,', num_per_survey, 'per survey:', num_surveys)
600 subjects, 42 per survey: 15
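Rounding up with integer arithmetic has an edge case worth checking: when the number of subjects is an exact multiple of the group size, no extra survey is needed. The sketch below compares an integer-only formula against math.ceil for a few subject counts. The helper name surveys_needed is hypothetical, and defining functions with def is covered only later in the lesson.

```python
import math

def surveys_needed(num_subjects, num_per_survey):
    # Hypothetical helper: round the quotient up using integers only.
    return (num_subjects - 1) // num_per_survey + 1

# 630 is an exact multiple of 42; 600 and 631 are not.
for num_subjects in (600, 630, 631):
    by_floor = surveys_needed(num_subjects, 42)
    by_ceil = math.ceil(num_subjects / 42)
    print(num_subjects, 'subjects:', by_floor, 'surveys, math.ceil gives', by_ceil)
```

Both approaches agree here. The integer-only version avoids floating-point division, which can matter for very large numbers.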
Strings to Numbers
Where reasonable, float() will convert a string to a floating point number, and int() will convert a floating point number to an integer:
print("string to float:", float("3.4"))
print("float to int:", int(3.4))
string to float: 3.4
float to int: 3
If the conversion doesn’t make sense, however, an error message will occur:
print("string to float:", float("Hello world!"))
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-5-df3b790bf0a2> in <module>()
----> 1 print("string to float:", float("Hello world!"))
ValueError: could not convert string to float: 'Hello world!'
Given this information, what do you expect the following program to do?
What does it actually do?
Why do you think it does that?
print("fractional string to int:", int("3.4"))
Solution
What do you expect this program to do? It would not be so unreasonable to expect the Python 3 int function to convert the string “3.4” to 3.4 and then, with an additional type conversion, to 3. After all, Python 3 performs a lot of other magic - isn’t that part of its charm?
However, Python 3 raises an error. Why? To be consistent, possibly. If you want Python to perform two consecutive type conversions, you must write both explicitly in code.
int("3.4")
int(float("3.4"))
In [2]: int("3.4")
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-2-ec6729dfccdc> in <module>()
----> 1 int("3.4")
ValueError: invalid literal for int() with base 10: '3.4'
3
Arithmetic with Different Types
Which of the following will print 2.0? Note: there may be more than one right answer.
first = 1.0
second = "1"
third = "1.1"
first + float(second)
float(second) + float(third)
first + int(third)
first + int(float(third))
int(first) + int(float(third))
2.0 * second
Solution
Answer: 1 and 4
Complex Numbers
Python provides complex numbers, which are written as 1.0+2.0j. If val is a complex number, its real and imaginary parts can be accessed using dot notation as val.real and val.imag.
- Why do you think Python uses j instead of i for the imaginary part?
- What do you expect 1+2j + 3 to produce?
- What do you expect 4j to be? What about 4 j or 4 + j?
Solution
- Standard mathematics treatments typically use i to denote an imaginary number. However, from media reports it was an early convention established from electrical engineering that now presents a technically expensive area to change. Stack Overflow provides additional explanation and discussion.
- 4+2j
- 4j is a complex number with imaginary part 4; 4 j is a syntax error; the result of 4 + j depends on the value of the variable j.
Key Points
Every value has a type.
Use the built-in function type to find the type of a value.
Types control what operations can be done on values.
Strings can be added and multiplied.
Strings have a length (but numbers don’t).
Must convert numbers to strings or vice versa when operating on them.
Can mix integers and floats freely in operations.
Variables only change value when something is assigned to them.
Built-in Functions and Help
Overview
Teaching: 15 min
Exercises: 10 min
Questions
How can I use built-in functions?
How can I find out what they do?
What kind of errors can occur in programs?
Objectives
Explain the purpose of functions.
Correctly call built-in Python functions.
Correctly nest calls to built-in functions.
Use help to display documentation for built-in functions.
Correctly describe situations in which SyntaxError and NameError occur.
Use comments to add documentation to programs.
# This sentence isn't executed by Python.
adjustment = 0.5 # Neither is this - anything after '#' is ignored.
A function may take zero or more arguments.
- We have seen some functions already — now let’s take a closer look.
- An argument is a value passed into a function.
- len takes exactly one.
- int, str, and float create a new value from an existing one.
- print takes zero or more.
- print with no arguments prints a blank line.
- Must always use parentheses, even if they’re empty, so that Python knows a function is being called.
print('before')
print()
print('after')
before

after
Commonly-used built-in functions include max, min, and round.
- Use max to find the largest value of one or more values.
- Use min to find the smallest.
- Both work on character strings as well as numbers.
- “Larger” and “smaller” use (0-9, A-Z, a-z) to compare letters.
print(max(1, 2, 3))
print(min('a', 'A', '0'))
3
0
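The (0-9, A-Z, a-z) ordering comes from the numeric code Python stores for each character. The sketch below shows those codes with the built-in function ord and sorts three characters to confirm the order. The for loop and the list are features covered later in the lesson, and the variable names are chosen only for this example.

```python
# ord() gives the numeric code of a single character.
for character in ['0', '9', 'A', 'Z', 'a', 'z']:
    print(character, ord(character))

# Sorting uses the same comparison as max and min.
ordered = sorted(['a', 'A', '0'])
print(ordered)

smallest = min('a', 'A', '0')
largest = max('a', 'A', '0')
print(smallest, largest)
```

Digits have the lowest codes, then upper-case letters, then lower-case letters, which is why min returns '0' here.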
Functions may only work for certain (combinations of) arguments.
- max and min must be given at least one argument.
- “Largest of the empty set” is a meaningless question.
- And they must be given things that can meaningfully be compared.
print(max(1, 'a'))
TypeError: unorderable types: str() > int()
Functions may have default values for some arguments.
- round will round off a floating-point number.
- By default, rounds to zero decimal places.
round(3.712)
4
- We can specify the number of decimal places we want.
round(3.712, 1)
3.7
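A few more calls may help show how the optional second argument changes the result. This is a small sketch with variable names chosen for illustration. The last two lines show a detail that often surprises newcomers: in Python 3, a value exactly halfway between two integers is rounded to the even one.

```python
default_digits = round(3.712)   # second argument omitted: the result is an int
one_digit = round(3.712, 1)     # keep one decimal place
hundreds = round(1234.5, -2)    # a negative value rounds to the left of the point
half_to_even = round(2.5)       # ties go to the nearest even integer
half_up = round(3.5)

print(default_digits, type(default_digits))
print(one_digit, hundreds)
print(half_to_even, half_up)
```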
Use the built-in function help to get help for a function.
- Every built-in function has online documentation.
help(round)
Help on built-in function round in module builtins:
round(...)
round(number[, ndigits]) -> number
Round a number to a given precision in decimal digits (default 0 digits).
This returns an int when called with one argument, otherwise the
same type as the number. ndigits may be negative.
Python reports a syntax error when it can’t understand the source of a program.
- Won’t even try to run the program if it can’t be parsed.
# Forgot to close the quote marks around the string.
name = 'Feng
SyntaxError: EOL while scanning string literal
# An extra '=' in the assignment.
age = = 52
SyntaxError: invalid syntax
- Look more closely at the error message:
print("hello world"
File "<ipython-input-6-d1cc229bf815>", line 1
print ("hello world"
^
SyntaxError: unexpected EOF while parsing
- The message indicates a problem on the first line of the input (“line 1”).
- In this case the “ipython-input” section of the file name tells us that we are working with input into IPython, the Python interpreter used by the Jupyter Notebook.
- The -6- part of the filename indicates that the error occurred in cell 6 of our Notebook.
- Next is the problematic line of code, indicating the problem with a ^ pointer.
Python reports a runtime error when something goes wrong while a program is executing.
age = 53
remaining = 100 - aege # mis-spelled 'age'
NameError: name 'aege' is not defined
- Fix syntax errors by reading the source and runtime errors by tracing execution.
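The difference in timing between the two kinds of error can be seen in a small experiment. This sketch goes beyond what the lesson has introduced: it uses try/except to catch the errors and the built-in compile function to parse source text without running it. The variable names source, syntax_line, and name_message are chosen only for this illustration.

```python
source = "print('this line is fine')\nage = = 52\n"

# compile() only parses the text; it does not run any of it.
try:
    compile(source, '<example>', 'exec')
    syntax_line = None
except SyntaxError as error:
    syntax_line = error.lineno
    print('SyntaxError found on line', syntax_line, 'before anything ran')

# A NameError only appears once execution reaches the faulty line.
try:
    age = 53
    print('age was assigned:', age)
    remaining = 100 - aege  # mis-spelled 'age'
    name_message = None
except NameError as error:
    name_message = str(error)
    print('NameError while running:', name_message)
```

In the first case nothing is printed from the source text, because parsing fails before any line runs. In the second case the lines before the mistake execute normally.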
The Jupyter Notebook has two ways to get help.
- Place the cursor inside the parentheses of the function, hold down shift, and press tab.
- Or type a function name with a question mark after it.
Every function returns something.
- Every function call produces some result.
- If the function doesn’t have a useful result to return, it usually returns the special value None.
result = print('example')
print('result of print is', result)
example
result of print is None
What Happens When
- Explain in simple terms the order of operations in the following program: when does the addition happen, when does the subtraction happen, when is each function called, etc.
- What is the final value of radiance?
radiance = 1.0
radiance = max(2.1, 2.0 + min(radiance, 1.1 * radiance - 0.5))
Solution
- Order of operations:
1.1 * radiance = 1.1
1.1 - 0.5 = 0.6
min(radiance, 0.6) = 0.6
2.0 + 0.6 = 2.6
max(2.1, 2.6) = 2.6
- At the end, radiance = 2.6
Spot the Difference
- Predict what each of the print statements in the program below will print.
- Does max(len(rich), poor) run or produce an error message? If it runs, does its result make any sense?
easy_string = "abc"
print(max(easy_string))
rich = "gold"
poor = "tin"
print(max(rich, poor))
print(max(len(rich), len(poor)))
Solution
print(max(easy_string))
c
print(max(rich, poor))
tin
print(max(len(rich), len(poor)))
4
- It throws a TypeError. The command is trying to run max(4, 'tin') and you can’t compare a string and an integer.
Why Not?
Why don’t max and min return None when they are given no arguments?
Solution
max and min raise TypeErrors in this case because the correct number of arguments was not supplied. If they just returned None, the error would be much harder to trace, as the None would likely be stored in a variable and used later in the program, only to throw a runtime error there.
Last Character of a String
If Python starts counting from zero, and len returns the number of characters in a string, what index expression will get the last character in the string name? (Note: we will see a simpler way to do this in a later episode.)
Solution
name[len(name) - 1]
Key Points
Use comments to add documentation to programs.
A function may take zero or more arguments.
Commonly-used built-in functions include max, min, and round.
Functions may only work for certain (combinations of) arguments.
Functions may have default values for some arguments.
Use the built-in function help to get help for a function.
The Jupyter Notebook has two ways to get help.
Every function returns something.
Python reports a syntax error when it can’t understand the source of a program.
Python reports a runtime error when something goes wrong while a program is executing.
Fix syntax errors by reading the source code, and runtime errors by tracing the program’s execution.
Libraries
Overview
Teaching: 10 min
Exercises: 10 min
Questions
How can I use software that other people have written?
How can I find out what that software does?
Objectives
Explain what software libraries are and why programmers create and use them.
Write programs that import and use libraries from Python’s standard library.
Find and read documentation for standard libraries interactively (in the interpreter) and online.
Most of the power of a programming language is in its libraries.
- A library is a collection of files (called modules) that contains functions for use by other programs.
- May also contain data values (e.g., numerical constants) and other things.
- Library’s contents are supposed to be related, but there’s no way to enforce that.
- The Python standard library is an extensive suite of modules that comes with Python itself.
- Many additional libraries are available from PyPI (the Python Package Index).
- We will see later how to write new libraries.
Libraries and modules
A library is a collection of modules, but the terms are often used interchangeably, especially since many libraries only consist of a single module, so don’t worry if you mix them.
A program must import a library module before using it.
- Use import to load a library module into a program’s memory.
- Then refer to things from the module as module_name.thing_name.
- Python uses . to mean “part of”.
- Using math, one of the modules in the standard library:
import math
print('pi is', math.pi)
print('cos(pi) is', math.cos(math.pi))
pi is 3.141592653589793
cos(pi) is -1.0
- Have to refer to each item with the module’s name.
- math.cos(pi) won’t work: the reference to pi doesn’t somehow “inherit” the function’s reference to math.
Use help to learn about the contents of a library module.
- Works just like help for a function.
help(math)
Help on module math:
NAME
math
MODULE REFERENCE
http://docs.language-python.org/3.5/library/math
The following documentation is automatically generated from the Python
source files. It may be incomplete, incorrect or include features that
are considered implementation detail and may vary between Python
implementations. When in doubt, consult the module reference at the
location listed above.
DESCRIPTION
This module is always available. It provides access to the
mathematical functions defined by the C standard.
FUNCTIONS
acos(...)
acos(x)
Return the arc cosine (measured in radians) of x.
⋮ ⋮ ⋮
Import specific items from a library module to shorten programs.
- Use from ... import ... to load only specific items from a library module.
- Then refer to them directly without library name as prefix.
from math import cos, pi
print('cos(pi) is', cos(pi))
cos(pi) is -1.0
Create an alias for a library module when importing it to shorten programs.
- Use import ... as ... to give a library a short alias while importing it.
- Then refer to items in the library using that shortened name.
import math as m
print('cos(pi) is', m.cos(m.pi))
cos(pi) is -1.0
- Commonly used for libraries that are frequently used or have long names.
- E.g., the matplotlib plotting library is often aliased as mpl.
- But can make programs harder to understand, since readers must learn your program’s aliases.
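The three import forms described so far are different ways of naming the same things, which can be confirmed with the is operator (not yet introduced in this lesson; it checks whether two names refer to the very same object). This is a small sketch, with same_module and same_function chosen as illustrative names.

```python
import math
import math as m
from math import cos, pi

# An alias is just another name for the same module.
same_module = m is math
# An item imported directly is the same object the module holds.
same_function = cos is math.cos

print(same_module, same_function)
print(math.cos(math.pi), m.cos(m.pi), cos(pi))
```

Since all three spellings reach the same function, the choice between them is about readability rather than behavior.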
Exploring the Math Module
- What function from the math module can you use to calculate a square root without using sqrt?
- Since the library contains this function, why does sqrt exist?
Solution
- Using help(math) we see that we’ve got pow(x,y) in addition to sqrt(x), so we could use pow(x, 0.5) to find a square root.
- The sqrt(x) function is arguably more readable than pow(x, 0.5) when implementing equations. Readability is a cornerstone of good programming, so it makes sense to provide a special function for this specific common case.
Also, the design of Python’s math library has its origin in the C standard, which includes both sqrt(x) and pow(x,y), so a little bit of the history of programming is showing in Python’s function names.
Locating the Right Module
You want to select a random character from a string:
bases = 'ACTTGCTTGAC'
- Which standard library module could help you?
- Which function would you select from that module? Are there alternatives?
- Try to write a program that uses the function.
Solution
The random module seems like it could help you.
The string has 11 characters, each having a positional index from 0 to 10. You could use the random.randrange function (or random.randint, which takes both endpoints and includes the upper one, if you find that easier to remember) to get a random integer between 0 and 10, and then pick out the character at that position:
from random import randrange
random_index = randrange(len(bases))
print(bases[random_index])
or more compactly:
from random import randrange
print(bases[randrange(len(bases))])
Perhaps you found the random.sample function? It allows for slightly less typing:
from random import sample
print(sample(bases, 1)[0])
Note that this function returns a list of values. We will learn about lists in episode 11.
There are also other functions you could use, but with more convoluted code as a result.
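One further option from the same module, not mentioned in the solution above, is random.choice, which picks a single element from a sequence directly. The sketch below shows it next to random.randint for comparison. The seed value is arbitrary and is set only so that repeated runs give the same picks, and the names by_choice and by_randint are chosen for illustration.

```python
import random

bases = 'ACTTGCTTGAC'

random.seed(42)  # arbitrary seed, only to make the run repeatable

# random.choice picks one element of the sequence directly.
by_choice = random.choice(bases)

# random.randint includes both endpoints, so the upper bound is len - 1.
by_randint = bases[random.randint(0, len(bases) - 1)]

print(by_choice, by_randint)
```

Note the - 1 in the randint call: unlike randrange, randint can return its upper bound, so passing len(bases) would occasionally produce an index that is out of range.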
Jigsaw Puzzle (Parson’s Problem) Programming Example
Rearrange the following statements so that a random DNA base and its index in the string are printed. Not all statements may be needed. Feel free to use/add intermediate variables.
bases="ACTTGCTTGAC"
import math
import random
___ = random.randrange(n_bases)
___ = len(bases)
print("random base ", bases[___], "base index", ___)
Solution
import math
import random
bases = "ACTTGCTTGAC"
n_bases = len(bases)
idx = random.randrange(n_bases)
print("random base", bases[idx], "base index", idx)
When Is Help Available?
When a colleague of yours types help(math), Python reports an error:
NameError: name 'math' is not defined
What has your colleague forgotten to do?
Solution
Importing the math module (import math).
Importing With Aliases
- Fill in the blanks so that the program below prints 90.0.
- Rewrite the program so that it uses import without as.
- Which form do you find easier to read?
import math as m
angle = ____.degrees(____.pi / 2)
print(____)
Solution
import math as m
angle = m.degrees(m.pi / 2)
print(angle)
can be written as
import math
angle = math.degrees(math.pi / 2)
print(angle)
Since you just wrote the code and are familiar with it, you might actually find the first version easier to read. But when trying to read a huge piece of code written by someone else, or when getting back to your own huge piece of code after several months, non-abbreviated names are often easier, except where there are clear abbreviation conventions.
There Are Many Ways To Import Libraries!
Match the following print statements with the appropriate library calls.
Print commands:
1. print("sin(pi/2) =",sin(pi/2))
2. print("sin(pi/2) =",m.sin(m.pi/2))
3. print("sin(pi/2) =",math.sin(math.pi/2))
Library calls:
1. from math import sin,pi
2. import math
3. import math as m
4. from math import *
Solution
- Library calls 1 and 4. In order to directly refer to sin and pi without the library name as prefix, you need to use the from ... import ... statement. Whereas library call 1 specifically imports the two names sin and pi, library call 4 imports all public names in the math module.
- Library call 3. Here sin and pi are referred to with a shortened library name m instead of math. Library call 3 does exactly that using the import ... as ... syntax: it creates an alias for math in the form of the shortened name m.
- Library call 2. Here sin and pi are referred to with the regular library name math, so the regular import ... call suffices.
Importing Specific Items
- Fill in the blanks so that the program below prints 90.0.
- Do you find this version easier to read than preceding ones?
- Why wouldn’t programmers always use this form of import?
____ math import ____, ____
angle = degrees(pi / 2)
print(angle)
Solution
from math import degrees, pi
angle = degrees(pi / 2)
print(angle)
Most likely you find this version easier to read since it’s less dense. The main reason not to use this form of import is to avoid name clashes. For instance, you wouldn’t import degrees this way if you also wanted to use the name degrees for a variable or function of your own, or if you also imported a function named degrees from another library.
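A name clash of this kind can be sketched as follows. The reassignment of degrees to the number 45 is a made-up example, not part of the exercise; it shows that the imported function is no longer reachable under that name afterwards.

```python
from math import degrees, pi

print(degrees(pi / 2))  # math.degrees converts radians to degrees

# Hypothetical variable of our own that reuses the imported name.
degrees = 45

try:
    degrees(pi / 2)  # degrees is now an int, not a function
except TypeError as err:
    clash_message = str(err)
    print("name clash:", clash_message)
```

With import math and math.degrees(...), the variable degrees and the function would live under different names and could coexist.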
Reading Error Messages
- Read the code below and try to identify what the errors are without running it.
- Run the code, and read the error message. What type of error is it?
from math import log
log(0)
Solution
- The logarithm of x is only defined for x > 0, so 0 is outside the domain of the function.
- You get an error of type “ValueError”, indicating that the function received an inappropriate argument value. The additional message “math domain error” makes it clearer what the problem is.
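If you want a program to keep running after such an error, you can catch it. This is a minimal sketch using try/except; the names caught and message are illustrative, and the exact wording of the message may differ between Python versions.

```python
from math import log

caught = False
try:
    log(0)  # 0 is outside the domain of the logarithm
except ValueError as err:
    caught = True
    message = str(err)
    print("ValueError:", message)
```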
Key Points
Most of the power of a programming language is in its libraries.
A program must import a library module in order to use it.
Use help to learn about the contents of a library module.
Import specific items from a library to shorten programs.
Create an alias for a library when importing it to shorten programs.
Numpy and Scipy
Overview
Teaching: 40 min
Exercises: 10 min
Questions
How do I deal with tabular scientific data?
Objectives
Import the numpy library.
Understand the NDArray object.
Get some basic information about numpy and scipy objects and methods.
Numpy is the main Python library for scientific computation
- Numpy provides a new data type, the array.
- Arrays are multi-dimensional collections of data of the same intrinsic type (int, float, etc.)
Import numpy before using it
- numpy is not built in, but is often installed by default.
- Use import numpy to import the entire package.
- Use from numpy import ... to import some functions.
- Use import numpy as np to use the most common alias.
import numpy as np
import numpy
from numpy import cos
print(numpy.cos, np.cos, cos)
<ufunc 'cos'> <ufunc 'cos'> <ufunc 'cos'>
Use numpy.zeros to create arrays filled with zeros
f10 = numpy.zeros(10)
i10 = numpy.zeros(10, dtype=int)
print("default array of zeros: ", f10)
print("integer array of zeros: ", i10)
default array of zeros: [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
integer array of zeros: [0 0 0 0 0 0 0 0 0 0]
Use numpy.ones to create an array of ones.
print("Using numpy.ones : ", numpy.ones(10))
print("is the same thing as: ", numpy.zeros(10)+1)
Using numpy.ones : [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
is the same thing as: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
Use numpy.arange to generate sequences of numbers
- arange takes from one to three arguments. By default arange generates numbers starting from 0 with a step of 1.
- arange(N) generates numbers from 0..N-1
- arange(M,N) generates numbers from M..N-1
- arange(M,N,P) generates numbers from M..N-1, including only every Pth number.
numpy.arange(10)
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
- generate an array of numbers from 1 to 9 (the stop value 10 is excluded)
numpy.arange(1,10)
array([1, 2, 3, 4, 5, 6, 7, 8, 9])
- generate an array of odd numbers from 1 to 10
numpy.arange(1,10,2)
array([1, 3, 5, 7, 9])
- incorrectly generate an array of odd numbers from 1 to 10, backwards
numpy.arange(1,10,-2)
array([], dtype=int64)
- generate an array of even numbers from 10 to 2, backwards
numpy.arange(10,1,-2)
array([10, 8, 6, 4, 2])
Numpy arrays have a shape
- Numpy arrays have a shape attribute associated with them.
- You can change the shape with the reshape method.
a = numpy.arange(10)
print("a's shape is ",a.shape)
b=a.reshape(5,2)
print("b's shape is ",b.shape)
a's shape is (10,)
b's shape is (5, 2)
Numpy arrays can be treated like single numbers in arithmetic
- Arithmetic using numpy arrays is element-by-element
- Matrix operations are possible with functions or methods.
- The size and shape of the arrays should match.
a = numpy.arange(5)
b = numpy.arange(5)
print("a=",a)
print("b=",b)
print("a*b=",a*b)
print("a+b=",a+b)
a= [0 1 2 3 4]
b= [0 1 2 3 4]
a*b= [ 0 1 4 9 16]
a+b= [0 2 4 6 8]
c = numpy.ones((5,2))
d = numpy.ones((5,2)) + 100
c+d
array([[102., 102.],
[102., 102.],
[102., 102.],
[102., 102.],
[102., 102.]])
e = c.reshape(2,5)
c+e #c and e have different shapes
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-46-0e32881b9afe> in <module>()
1 e = c.reshape(2,5)
----> 2 c+e #c and e have different shapes
ValueError: operands could not be broadcast together with shapes (5,2) (2,5)
---------------------------------------------------------------------------
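Shapes do not have to be identical, only compatible under numpy's broadcasting rules. The sketch below uses a made-up array called row to contrast a compatible pair of shapes with the incompatible pair from the error above.

```python
import numpy

c = numpy.ones((5, 2))
row = numpy.array([10.0, 20.0])  # illustrative array with shape (2,)

# Shapes (5, 2) and (2,) are compatible: the trailing dimensions match,
# so row is added to every row of c.
total = c + row
print(total.shape)

# Shapes (5, 2) and (2, 5) are not compatible, so this raises ValueError.
e = c.reshape(2, 5)
try:
    c + e
except ValueError as err:
    print("ValueError:", err)
```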
The Numpy library has many functions that work on arrays
- Aggregation functions like sum, mean, size
a=numpy.arange(5)
print("a = ", a)
a = [0 1 2 3 4]
- Add all of the elements of the array together.
print("sum(a) = ", a.sum())
sum(a) = 10
- Calculate the average value of the elements in the array.
print("mean(a) = ", a.mean())
mean(a) = 2.0
- Calculate the standard deviation (std) of the array.
print("std(a) = ", a.std()) #what is this?
std(a) = 1.4142135623730951
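To see what std computes, the value can be reproduced by hand. This sketch assumes numpy's default behaviour (the population standard deviation: the square root of the mean squared deviation from the mean); the names deviations and manual_std are illustrative.

```python
import numpy

a = numpy.arange(5)

deviations = a - a.mean()                          # distance of each value from the mean
manual_std = numpy.sqrt((deviations ** 2).mean())  # root of the mean squared deviation

print(manual_std, a.std())
```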
- Calculate the sine (sin) of each element in the array
print("np.sin(a) = ", np.sin(a))
np.sin(a) = [ 0. 0.84147098 0.90929743 0.14112001 -0.7568025 ]
Check the numpy help and webpage for more functions
https://docs.scipy.org/doc/numpy/reference/routines.html
Use the axis keyword to apply a function along one dimension of the data.
- Many functions take the axis keyword to perform the aggregation along that dimension.
a = numpy.arange(10).reshape(5,2)
print("a=",a)
print("mean(a)=" ,numpy.mean(a))
print("mean(a,0)=",numpy.mean(a,axis=0))
print("mean(a,1)=",numpy.mean(a,axis=1))
a= [[0 1]
[2 3]
[4 5]
[6 7]
[8 9]]
mean(a)= 4.5
mean(a,0)= [4. 5.]
mean(a,1)= [0.5 2.5 4.5 6.5 8.5]
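A way to remember the axis keyword is that the named axis is the one that disappears from the result. The sketch below repeats the idea with sum on the same 5 x 2 array; col_sums and row_sums are illustrative names.

```python
import numpy

a = numpy.arange(10).reshape(5, 2)

col_sums = a.sum(axis=0)  # collapse the 5 rows: one value per column
row_sums = a.sum(axis=1)  # collapse the 2 columns: one value per row

print(col_sums.shape, col_sums)
print(row_sums.shape, row_sums)
```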
Use square brackets to access elements in the array
- A single integer in square brackets returns one element.
- Ranges of data can be accessed with slices.
a=numpy.arange(10)
- Access the element at index 5 (the sixth element, since indexing starts at 0)
a[5]
5
- Access the elements at indices 5 through 9 (the stop index 10 is excluded)
a[5:10]
array([5, 6, 7, 8, 9])
- Access elements from 5 to the end of the array
a[5:] #No second number means "rest of the array"
array([5, 6, 7, 8, 9])
- Access all elements from the start of the array up to, but not including, index 5.
a[:5] #No first number means "from the start of the array"
array([0, 1, 2, 3, 4])
- Access every 2nd element from the 5th to the 10th
a[5:10:2] #A third number means "every Nth element"
array([5, 7, 9])
- Access every -2nd element from the 5th to the 10th. (incorrect)
a[5:10:-2] #negative numbers mean "count backwards"
array([], dtype=int64)
- Access every -2nd element from the 10th to the 5th. (correct)
a[10:5:-2] #but you need to start and stop in the same order
array([9, 7])
Challenge 1
There is an arange function and a linspace function that take similar arguments. Explain the difference. For example, what does the following code do?
print (numpy.arange(1.,9,3))
print (numpy.linspace(1.,9,3))
Solution
arange takes the arguments start, stop, step, and generates numbers from start to stop (excluding stop), stepping by step each time.
linspace takes the arguments start, stop, number, and generates number evenly spaced values from start to stop (including stop).
print (numpy.arange(1.,9,3))
print (numpy.linspace(1.,9,3))
[1. 4. 7.]
[1. 5. 9.]
Challenge 2
Generate a 10 x 3 array of random numbers (using numpy.random.rand). From each row, find the minimum absolute value. Make use of numpy.abs and numpy.min. The result should be a one-dimensional array.
Solution
The important part of the solution is passing the axis keyword to the min function. Taking the minimum along axis=1 collapses the three columns and leaves one value per row:
a = numpy.random.rand(30).reshape(10,3)
print("a is ", a)
print()
print("min(a) along each row is ", numpy.min( numpy.abs( a ), axis=1))
Use the scipy library for common scientific and numerical methods
- scipy contains functions to generate random numbers, calculate Fourier transforms, and integrate.
- Check the scipy website for more help: https://docs.scipy.org/doc/scipy/reference/
Example : integrate y=x^2 from 0 to 10
x = numpy.arange(11)
y = x**2
import scipy.integrate
#by default, trapz assumes the independent variable is a list of integers from 0..N
print("integral of x^2 from 0 to 10", scipy.integrate.trapz(y) )#This value should be 10**3/3 = 333
integral of x^2 from 0 to 10 335.0
- Numerical integration can be imprecise with a coarse grid, so try a finer grid (this time, incorrectly: trapz is not told the x values, so it still assumes a spacing of 1).
x = numpy.linspace(0,10,1000) # finer grid
y=x**2
print("integral of x^2 from 0 to 10", scipy.integrate.trapz(y) )#This value should be 10**3/3 = 333.333
integral of x^2 from 0 to 10 33300.01668335002
- Passing the x values to trapz allows it to integrate correctly
print("integral of x^2 from 0 to 10", scipy.integrate.trapz(y,x) )#This value should be 10**3/3 = 333.333
integral of x^2 from 0 to 10 333.333500333834
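The role of the x values can be checked with a hand-written trapezoid rule. This is a minimal sketch, not SciPy's implementation: trapezoid_rule is a hypothetical helper that, like the calls above, assumes a spacing of 1 when no x values are given.

```python
import numpy

def trapezoid_rule(y, x=None):
    """Hand-written trapezoid rule; the spacing defaults to 1 when x is omitted."""
    widths = numpy.ones(len(y) - 1) if x is None else numpy.diff(x)
    return float(numpy.sum(widths * (y[1:] + y[:-1]) / 2))

x = numpy.linspace(0, 10, 1000)
y = x ** 2

no_x = trapezoid_rule(y)       # treats the points as if they were 1 apart
with_x = trapezoid_rule(y, x)  # uses the real spacing of the grid

print(no_x, with_x)
print(with_x / no_x, x[1] - x[0])  # the two results differ by the grid spacing
```

On a uniform grid the two results differ by exactly the grid spacing, which is why omitting x inflates the answer by roughly a factor of 100 here.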
We’ll come back to scipy.optimize later.
Key Points
Use the numpy library to get basic statistics out of tabular data.
Print numpy arrays.
Use mean, sum, std to get summary statistics.
Add numpy arrays together.
Study the scipy website
Use scipy to integrate tabular data.
Reading Tabular Data into arrays
Overview
Teaching: 20 min
Exercises: 0 min
Questions
How can I read tabular data?
How can I save tabular data?
Objectives
Import the numpy library.
Use numpy to load a simple CSV data set.
Get some basic information about a numpy array.
Use the Numpy package to load data using the loadtxt function
- Numpy provides the function loadtxt to read and parse numeric data from a text file.
- The file can be delimited with commas (a ‘comma separated file’), tabs, or other common delimiters
- Numerical data can be converted to floating point data or integers
- Headers and comments can be ignored during the reading of the file.
import numpy
data = numpy.loadtxt('data/galileo_flat.empty')
print(data)
[[1500. 1000.]
[1340. 828.]
[1328. 800.]
[1172. 600.]
[ 800. 300.]]
Read a comma separated file of data with headers
- If you have a delimiter in your file (a comma, tab, vertical line), specify that with the delimiter keyword.
- If you use a comment character consistently, specify it with the comments keyword.
- If you have a header you want to skip, use skiprows.
data = numpy.loadtxt('data/galileo_flat.csv', comments="#", skiprows=2, delimiter=',')
print(data)
[[1500. 1000.]
[1340. 828.]
[1328. 800.]
[1172. 600.]
[ 800. 300.]]
Remember your data has the shape ROWS X COLUMNS
- Your data will be shaped with the rows first.
- You can change the order with transpose
print("data shape is ",data.shape)
data shape is (5, 2)
Split the data into variables using unpack
- You can split data using the unpack keyword
D,H = numpy.loadtxt('data/galileo_flat.csv', comments="#", skiprows=2, delimiter=',',unpack=True)
print(D,H)
print("D shape is ",D.shape)
print("H shape is ",H.shape)
[1500. 1340. 1328. 1172. 800.] [1000. 828. 800. 600. 300.]
D shape is (5,)
H shape is (5,)
- You can split data after loading too
data = numpy.loadtxt('data/galileo_flat.csv', comments="#", skiprows=2, delimiter=',')
D,H = data.T
print(D,H)
print("D shape is ",D.shape)
print("H shape is ",H.shape)
[1500. 1340. 1328. 1172. 800.] [1000. 828. 800. 600. 300.]
D shape is (5,)
H shape is (5,)
Save data with numpy.savetxt
Saving text data is made possible with the savetxt command. It mirrors the loadtxt command.
numpy.savetxt("data/mydata.txt", data, delimiter=',')
1.500000000000000000e+03,1.000000000000000000e+03
1.340000000000000000e+03,8.280000000000000000e+02
1.328000000000000000e+03,8.000000000000000000e+02
1.172000000000000000e+03,6.000000000000000000e+02
8.000000000000000000e+02,3.000000000000000000e+02
Control the data format with the fmt keyword
- The default format writes each value in scientific notation with 18 digits after the decimal point.
- You can change the format with the fmt keyword.
numpy.savetxt("data/mydata2.txt", data, delimiter=',', fmt='%.6g')
1500,1000
1340,828
1328,800
1172,600
800,300
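The effect of fmt can be tried without touching the disk by writing into an in-memory text buffer. The sketch below assumes a small made-up array and uses io.StringIO in place of a file name; '%.2f' asks for two digits after the decimal point.

```python
import io
import numpy

# Made-up values standing in for the loaded data
data = numpy.array([[1500.0, 1000.0], [1340.0, 828.0]])

buffer = io.StringIO()  # in-memory stand-in for a text file
numpy.savetxt(buffer, data, delimiter=',', fmt='%.2f')

text = buffer.getvalue()
print(text)
```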
Add a header string with header
- Add header text to the file with the header keyword.
- Include column titles in the header keyword.
header="Distance (D), Height (H)"
newdata = numpy.vstack([D,H]).T
numpy.savetxt("data/mydata3.txt", newdata, delimiter=', ', header=header, fmt='%.6g')
# Distance (D), Height (H)
1500, 1000
1340, 828
1328, 800
1172, 600
800, 300
More complex loadtxt commands can make your data more flexible
- Using the dtype keyword allows fine control over the types of data you read.
- Using dtype allows you to ‘name’ your data columns and reference them with the name.
data = numpy.loadtxt('data/galileo_flat.csv', comments="#", skiprows=2, delimiter=',',\
dtype={'names':("Distance","Height"), 'formats':('f4','f4')})
print("data shape is ", data.shape)
print("Distance data is ", data["Distance"])
data shape is (5,)
Distance data is [1500. 1340. 1328. 1172. 800.]
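Named columns can be explored without the data file by reading from a string. The text below is a hypothetical stand-in for data/galileo_flat.csv (its real contents are not shown in this lesson), wrapped in io.StringIO so that loadtxt can treat it like a file.

```python
import io
import numpy

# Hypothetical file contents: one comment line, one title line, two data rows
text = """# made-up comment line
D,H
1500,1000
1340,828
"""

data = numpy.loadtxt(io.StringIO(text), comments="#", skiprows=2, delimiter=',',
                     dtype={'names': ("Distance", "Height"), 'formats': ('f4', 'f4')})

print(data.dtype.names)   # the column names
print(data["Height"])     # one named column
print(data[0])            # one row, containing both fields
```

Note that the array is one-dimensional: each element is a record holding both fields, which is why the shape reported above is (5,) rather than (5, 2).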
Key Points
Use the numpy.loadtxt function to load tabular data.
Use the numpy.savetxt function to save tabular data.
Use delimiters to make your text file cleaner.
Use comments in your file to describe the contents.
Plotting
Overview
Teaching: 25 min
Exercises: 15 min
Questions
How can I plot my data?
How can I save my plot for publishing?
Objectives
Create a time series plot showing a single data set.
Create a scatter plot showing relationship between two data sets.
matplotlib is the most widely used scientific plotting library in Python
- It is usually used through a sub-library called matplotlib.pyplot, imported using the alias plt.
- The Jupyter Notebook will render plots inline if we ask it to using a “magic” command.
%matplotlib inline
import matplotlib.pyplot as plt
- Simple plots are then (fairly) simple to create.
import numpy
time = numpy.array([0,1,2,3])
position = numpy.array([0,100,200,300])
plt.plot(time, position)
plt.xlabel("Time (hr)")
plt.ylabel("Position (km)")
Text(0, 0.5, 'Position (km)')
The color and format of lines and markers can be changed.
- A shortcut for simple formatting is to use the third argument string.
- 'b-' means blue line, 'ro' means red circles, 'g+-' means green + with a line
import numpy
time = numpy.arange(10)
p1 = time
p2 = time*2
p3 = time*4
plt.plot(time, p1,'b-')
plt.plot(time, p2,'ro')
plt.plot(time, p3,'g+-')
plt.xlabel("Time (hr)")
plt.ylabel("Position (km)")
Text(0, 0.5, 'Position (km)')
More complex formatting can be achieved using the plot keywords
- linewidth controls the thickness of the line
- linestyle controls the type of line
- marker controls the shape of the marker
- color controls the color of the line and marker
- label controls the labelling of the line for use with plt.legend
plt.plot(time, p1,color='blue', linestyle='-', linewidth=5,label="blue line")
plt.plot(time, p2,'ro', markersize=10, label="red dots")
plt.plot(time, p3,'g-', marker='+')
plt.xlabel("Time (hr)")
plt.ylabel("Position (km)")
plt.legend()
<matplotlib.legend.Legend at 0x7fe9b88472b0>
Built in “styles” provide consistent plots
print("available style names: ", plt.style.available)
available style names: ['Solarize_Light2', '_classic_test_patch', 'bmh', 'classic', 'dark_background', 'fast', 'fivethirtyeight', 'ggplot', 'grayscale', 'seaborn', 'seaborn-bright', 'seaborn-colorblind', 'seaborn-dark', 'seaborn-dark-palette', 'seaborn-darkgrid', 'seaborn-deep', 'seaborn-muted', 'seaborn-notebook', 'seaborn-paper', 'seaborn-pastel', 'seaborn-poster', 'seaborn-talk', 'seaborn-ticks', 'seaborn-white', 'seaborn-whitegrid', 'tableau-colorblind10']
plt.style.use("ggplot")
plt.plot(time, p1,color='blue', linestyle='-', linewidth=5,label="blue line")
plt.plot(time, p2,'ro', markersize=10, label="red dots")
plt.plot(time, p3,'g-', marker='+')
plt.xlabel("Time (hr)")
plt.ylabel("Position (km)")
plt.legend()
<matplotlib.legend.Legend at 0x7fe9a8405bb0>
plt.style.use("fivethirtyeight")
plt.plot(time, p1,color='blue', linestyle='-', linewidth=5,label="blue line")
plt.plot(time, p2,'ro', markersize=10, label="red dots")
plt.plot(time, p3,'g-', marker='+')
plt.xlabel("Time (hr)")
plt.ylabel("Position (km)")
plt.legend()
<matplotlib.legend.Legend at 0x7fe9a843aac0>
plt.style.use("seaborn-whitegrid")
plt.plot(time, p1,linestyle='-', linewidth=5,label="blue line")
plt.plot(time, p2,'o', markersize=10, label="red dots")
plt.plot(time, p3,'-', marker='+') #where's the marker?
plt.xlabel("Time (hr)")
plt.ylabel("Position (km)")
plt.legend()
<matplotlib.legend.Legend at 0x7fe9780b4070>
Plots can be scatter plots with points and no lines
numpy.random.seed(20)
x,y = numpy.random.randint(0,100,100), numpy.random.randn(100)
x=numpy.cumsum(x)
y=numpy.cumsum(y)
plt.scatter( x, y)
plt.scatter( x, 10-y**2, color='green',marker='<')
plt.xlabel("Labels still work")
plt.title("title")
Text(0.5, 1.0, 'title')
Plot data with associated uncertainties using errorbar
- Don’t join the data with a line by setting the linestyle to an empty string.
- Set a marker shape using marker.
- Use the same color for marker and errorbars.
numpy.random.seed(42)
x = numpy.random.rand(10)*10
x=numpy.cumsum(x)
error = numpy.abs(numpy.random.randn(10)*4) # error bar sizes must not be negative
y=x + numpy.random.randn(10)*0.5
plt.errorbar( x, y, yerr=error,color='green',marker='o',ls='',lw=1,label="data")
plt.xlabel("Labels still work")
plt.title("errorbar")
plt.legend()
<matplotlib.legend.Legend at 0x7fe9b888f040>
plt.errorbar?
matplotlib also makes bar charts and histograms
- If you have data grouped into counts already, bar can make a chart
x = [0,1,2,3,4,5]
y = [0,4,2,6,8,2]
plt.bar(x,y)
plt.title("Bar chart")
Text(0.5, 1.0, 'Bar chart')
- If you have raw data, hist can calculate and plot the histogram.
x = numpy.random.randint(0,100,50)
bin_count, bin_edges, boxes = plt.hist(x, bins=10)
print("The counts are ", bin_count)
The counts are [8. 5. 0. 5. 5. 6. 3. 6. 8. 4.]
bin_count, bin_edges, boxes = plt.hist(x, bins=10, rwidth=0.9)
plt.title("cleaner histogram")
Text(0.5, 1.0, 'cleaner histogram')
# Compute pie slices
N = bin_count.size
theta = 0.5*(bin_edges[1:] + bin_edges[:-1])
theta = theta * 2*numpy.pi/theta.max()
width = numpy.pi / 4 * numpy.random.rand(N)
ax = plt.subplot(111, projection='polar')
bars = ax.bar(theta, bin_count, width=width, bottom=0.0,alpha=0.5)
# Use custom colors and opacity
for r, bar in zip(bin_count, bars):
    bar.set_facecolor(plt.cm.viridis(r / bin_count.max()))
    bar.set_alpha(0.5)
t=plt.title("Something more exotic")
Define the figure size before plotting using the figure command
- plt.figure pre-defines a figure for you
- The keyword figsize takes two values to define the width and height
plt.figure(figsize=(8,2))
x = [0,1,2,3,4,5]
y = [0,4,2,6,8,2]
plt.bar(x,y)
plt.title("narrow bar chart")
Text(0.5, 1.0, 'narrow bar chart')
Place multiple plots on one figure with subplot
- plt.subplot takes three arguments: (number_of_rows, number_of_columns, location)
plt.figure(figsize=(8,2))
x = [0,1,2,3,4,5]
y = [0,4,2,6,8,2]
plt.subplot(2,2,1)
plt.bar(x,y)
plt.title("top left")
plt.subplot(2,2,2)
plt.bar(y,x)
plt.title("top right")
plt.subplot(2,2,4)
plt.bar(x,y)
plt.title("sometimes the formatting is awkward")
Text(0.5, 1.0, 'sometimes the formatting is awkward')
plt.figure(figsize=(8,3))
x = [0,1,2,3,4,5]
y = [0,4,2,6,8,2]
plt.subplot(1,3,1)
plt.bar(x,y)
plt.title("top left")
plt.subplot(1,3,2)
plt.bar(y,x)
plt.title("top right")
plt.subplot(1,3,3)
plt.bar(x,y)
plt.title("less awkward")
Text(0.5, 1.0, 'less awkward')
Figures can be saved with savefig
- After plotting, use plt.savefig to save the figure to a file
- The figure size you specified is (approximately) the size in inches.
- For PNG/JPG images you can specify the resolution with dpi
plt.figure(figsize=(8,3))
plt.plot(x,y)
plt.savefig("data/fig1.pdf") #PDF format
plt.savefig("data/fig1.png", dpi=150, transparent=True) #PNG format
Note that functions in plt refer to a global figure variable and after a figure has been displayed to the screen (e.g. with plt.show) matplotlib will make this variable refer to a new empty figure. Therefore, make sure you call plt.savefig before the plot is displayed to the screen, otherwise you may find a file with an empty plot.
It is also possible to save the figure to file by first getting a reference to the figure with plt.gcf, then calling the savefig method on that variable.
fig = plt.gcf() # get current figure
plt.bar(x,y)
fig.savefig('my_figure.png')
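Holding an explicit reference to the figure avoids relying on the global state altogether. The sketch below is one possible way to do this with plt.subplots; it selects the non-interactive Agg backend only so that it also runs outside a notebook, and it saves into an in-memory buffer (an assumption made here to avoid writing a file) rather than to a path.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no window is needed
import matplotlib.pyplot as plt

x = [0, 1, 2, 3, 4, 5]
y = [0, 4, 2, 6, 8, 2]

# plt.subplots returns the figure and its axes, so both can be referenced later
fig, ax = plt.subplots(figsize=(8, 3))
ax.bar(x, y)
ax.set_title("bar chart")

buffer = io.BytesIO()  # in-memory stand-in for a file such as 'my_figure.png'
fig.savefig(buffer, format="png", dpi=150)
png_bytes = buffer.getvalue()
plt.close(fig)

print(len(png_bytes), "bytes written")
```

Passing a file name such as 'my_figure.png' instead of the buffer would write the same image to disk.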
Key Points
matplotlib is the most widely used scientific plotting library in Python.
Plot data directly from a Pandas dataframe.
Select and transform data, then plot it.
Many styles of plot are available: see the Python Graph Gallery for more options.
Can plot many sets of data together.
Lists
Overview
Teaching: 10 min
Exercises: 10 min
Questions
How can I store multiple values?
Objectives
Explain why programs need collections of values.
Write programs that create flat lists, index them, slice them, and modify them through assignment and method calls.
A list stores many values in a single structure.
- Doing calculations with a hundred variables called pressure_001, pressure_002, etc., would be at least as slow as doing them by hand.
- Use a list to store many values together.
- Contained within square brackets [...].
- Values separated by commas ,.
- Use len to find out how many values are in a list.
pressures = [0.273, 0.275, 0.277, 0.275, 0.276]
print('pressures:', pressures)
print('length:', len(pressures))
pressures: [0.273, 0.275, 0.277, 0.275, 0.276]
length: 5
Use an item’s index to fetch it from a list.
- Just like strings.
print('zeroth item of pressures:', pressures[0])
print('fourth item of pressures:', pressures[4])
zeroth item of pressures: 0.273
fourth item of pressures: 0.276
Lists’ values can be replaced by assigning to them.
- Use an index expression on the left of assignment to replace a value.
pressures[0] = 0.265
print('pressures is now:', pressures)
pressures is now: [0.265, 0.275, 0.277, 0.275, 0.276]
Appending items to a list lengthens it.
- Use list_name.append to add items to the end of a list.
primes = [2, 3, 5]
print('primes is initially:', primes)
primes.append(7)
primes.append(9)
print('primes has become:', primes)
primes is initially: [2, 3, 5]
primes has become: [2, 3, 5, 7, 9]
- append is a method of lists.
- Like a function, but tied to a particular object.
- Use object_name.method_name to call methods.
- Deliberately resembles the way we refer to things in a library.
- We will meet other methods of lists as we go along.
- Use help(list) for a preview.
- extend is similar to append, but it allows you to combine two lists. For example:
teen_primes = [11, 13, 17, 19]
middle_aged_primes = [37, 41, 43, 47]
print('primes is currently:', primes)
primes.extend(teen_primes)
print('primes has now become:', primes)
primes.append(middle_aged_primes)
print('primes has finally become:', primes)
primes is currently: [2, 3, 5, 7, 9]
primes has now become: [2, 3, 5, 7, 9, 11, 13, 17, 19]
primes has finally become: [2, 3, 5, 7, 9, 11, 13, 17, 19, [37, 41, 43, 47]]
Note that while extend maintains the “flat” structure of the list, appending a list to a list makes the result nested: the appended list becomes a single element of the outer list.
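The difference shows up in the length of the result. In this short sketch, flat and nested are illustrative names for two lists that start out identical.

```python
flat = [2, 3, 5]
flat.extend([7, 11])     # adds each item of the argument separately

nested = [2, 3, 5]
nested.append([7, 11])   # adds the argument as one single item

print(len(flat), flat)
print(len(nested), nested)
print(nested[-1][0])     # the inner list needs a second index
```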
Use del to remove items from a list entirely.
- del list_name[index] removes an item from a list and shortens the list.
- Not a function or a method, but a statement in the language.
primes = [2, 3, 5, 7, 9]
print('primes before removing last item:', primes)
del primes[4]
print('primes after removing last item:', primes)
primes before removing last item: [2, 3, 5, 7, 9]
primes after removing last item: [2, 3, 5, 7]
The empty list contains no values.
- Use [] on its own to represent a list that doesn’t contain any values.
- “The zero of lists.”
- Helpful as a starting point for collecting values (which we will see in the next episode).
Lists may contain values of different types.
- A single list may contain numbers, strings, and anything else.
goals = [1, 'Create lists.', 2, 'Extract items from lists.', 3, 'Modify lists.']
Character strings can be indexed like lists.
- Get single characters from a character string using indexes in square brackets.
element = 'carbon'
print('zeroth character:', element[0])
print('third character:', element[3])
zeroth character: c
third character: b
Character strings are immutable.
- Cannot change the characters in a string after it has been created.
- Immutable: can’t be changed after creation.
- In contrast, lists are mutable: they can be modified in place.
- Python considers the string to be a single value with parts, not a collection of values.
element[0] = 'C'
TypeError: 'str' object does not support item assignment
- Lists and character strings are both collections.
Indexing beyond the end of the collection is an error.
- Python reports an IndexError if we attempt to access a value that doesn’t exist.
- This is a kind of runtime error.
- Cannot be detected as the code is parsed because the index might be calculated based on data.
print('99th element of element is:', element[99])
IndexError: string index out of range
Fill in the Blanks
Fill in the blanks so that the program below produces the output shown.
values = ____
values.____(1)
values.____(3)
values.____(5)
print('first time:', values)
values = values[____]
print('second time:', values)
first time: [1, 3, 5]
second time: [3, 5]
Solution
values = []
values.append(1)
values.append(3)
values.append(5)
print('first time:', values)
values = values[1:]
print('second time:', values)
How Large is a Slice?
If ‘low’ and ‘high’ are both non-negative integers, how long is the list values[low:high]?
Solution
The list values[low:high] has high - low elements. For example, values[1:4] has the 3 elements values[1], values[2], and values[3]. Note that this count only holds if low is no greater than high and high is no greater than the total length of the list values; otherwise the slice is shorter.
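The cases where the count differs from high - low can be checked directly. The list values below is a made-up example with five items.

```python
values = [10, 20, 30, 40, 50]

in_range = values[1:4]    # both bounds inside the list
too_high = values[1:99]   # high is clipped to the end of the list
reversed_bounds = values[4:2]  # low greater than high selects nothing

print(len(in_range), len(too_high), len(reversed_bounds))
```

Unlike indexing with a single out-of-range number, slicing past the end does not raise an error.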
From Strings to Lists and Back
Given this:
print('string to list:', list('tin'))
print('list to string:', ''.join(['g', 'o', 'l', 'd']))
string to list: ['t', 'i', 'n']
list to string: gold
- Explain in simple terms what list('some string') does.
- What does '-'.join(['x', 'y']) generate?
Solution
- list('some string') “splits” a string into a list of its characters.
- x-y
Working With the End
What does the following program print?
element = 'helium'
print(element[-1])
- How does Python interpret a negative index?
- If a list or string has N elements, what is the most negative index that can safely be used with it, and what location does that index represent?
- If values is a list, what does del values[-1] do?
- How can you display all elements but the last one without changing values? (Hint: you will need to combine slicing and negative indexing.)
Solution
The program prints m.
- Python interprets a negative index as starting from the end (as opposed to starting from the beginning). The last element is -1.
- The most negative index that can safely be used with a list of N elements is -N, which represents the first element.
- del values[-1] removes the last element from the list.
- values[:-1]
Stepping Through a List
What does the following program print?
element = 'fluorine'
print(element[::2])
print(element[::-1])
- If we write a slice as low:high:stride, what does stride do?
- What expression would select all of the even-numbered items from a collection?
Solution
The program prints
furn
eniroulf
- stride is the step size of the slice
- The slice 1::2 selects all even-numbered items from a collection: it starts with element 1 (which is the second element, since indexing starts at 0), goes on until the end (since no end is given), and uses a step size of 2 (i.e., selects every second element).
Slice Bounds
What does the following program print?
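element = 'lithium'
print(element[0:20])
print(element[-1:3])
Solution
lithium
The second statement prints an empty line: element[-1:3] starts at the last character, which lies after index 3, so the slice is empty.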
Sort and Sorted
What do these two programs print? In simple terms, explain the difference between sorted(letters) and letters.sort().
# Program A
letters = list('gold')
result = sorted(letters)
print('letters is', letters, 'and result is', result)
# Program B
letters = list('gold')
result = letters.sort()
print('letters is', letters, 'and result is', result)
Solution
Program A prints
letters is ['g', 'o', 'l', 'd'] and result is ['d', 'g', 'l', 'o']
Program B prints
letters is ['d', 'g', 'l', 'o'] and result is None
sorted(letters) returns a sorted copy of the list letters (the original list letters remains unchanged), while letters.sort() sorts the list letters in-place and returns None.
Copying (or Not)
What do these two programs print? In simple terms, explain the difference between new = old and new = old[:].
# Program A
old = list('gold')
new = old # simple assignment
new[0] = 'D'
print('new is', new, 'and old is', old)
# Program B
old = list('gold')
new = old[:] # assigning a slice
new[0] = 'D'
print('new is', new, 'and old is', old)
Solution
Program A prints
new is ['D', 'o', 'l', 'd'] and old is ['D', 'o', 'l', 'd']
Program B prints
new is ['D', 'o', 'l', 'd'] and old is ['g', 'o', 'l', 'd']
new = old makes new a reference to the list old; new and old point towards the same object.
new = old[:] however creates a new list object new containing all elements from the list old; new and old are different objects.
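Whether two names refer to the same object can be tested with the is operator, while == compares contents. In this sketch, alias and copy are illustrative names.

```python
old = list('gold')
alias = old     # second name for the same list object
copy = old[:]   # new list object with the same items

print(alias is old)   # True: same object
print(copy is old)    # False: different objects
print(copy == old)    # True: equal contents

alias[0] = 'D'
print(old[0], copy[0])  # the change is visible through old, but not in copy
```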
Key Points
A list stores many values in a single structure.
Use an item’s index to fetch it from a list.
Lists’ values can be replaced by assigning to them.
Appending items to a list lengthens it.
Use del to remove items from a list entirely.
The empty list contains no values.
Lists may contain values of different types.
Character strings can be indexed like lists.
Character strings are immutable.
Indexing beyond the end of the collection is an error.
For Loops
Overview
Teaching: 10 min
Exercises: 15 min
Questions
How can I make a program do many things?
Objectives
Explain what for loops are normally used for.
Trace the execution of a simple (unnested) loop and correctly state the values of variables in each iteration.
Write for loops that use the Accumulator pattern to aggregate values.
A for loop executes commands once for each value in a collection.
- Doing calculations on the values in a list one by one is as painful as working with pressure_001, pressure_002, etc.
- A for loop tells Python to execute some statements once for each value in a list, a character string, or some other collection.
- “for each thing in this group, do these operations”
for number in [2, 3, 5]:
    print(number)
- This for loop is equivalent to:
print(2)
print(3)
print(5)
- And the for loop’s output is:
2
3
5
The first line of the for loop must end with a colon, and the body must be indented.
- The colon at the end of the first line signals the start of a block of statements.
- Python uses indentation rather than {} or begin/end to show nesting.
- Any consistent indentation is legal, but almost everyone uses four spaces.
for number in [2, 3, 5]:
print(number)
IndentationError: expected an indented block
- Indentation is always meaningful in Python.
firstName="Jon"
  lastName="Smith"
  File "<ipython-input-7-f65f2962bf9c>", line 2
    lastName="Smith"
    ^
IndentationError: unexpected indent
- This error can be fixed by removing the extra spaces at the beginning of the second line.
A for loop is made up of a collection, a loop variable, and a body.
for number in [2, 3, 5]:
    print(number)
- The collection, [2, 3, 5], is what the loop is being run on.
- The body, print(number), specifies what to do for each value in the collection.
- The loop variable, number, is what changes for each iteration of the loop.
  - The “current thing”.
Loop variables can be called anything.
- As with all variables, loop variables are:
- Created on demand.
- Meaningless: their names can be anything at all.
for kitten in [2, 3, 5]:
    print(kitten)
The body of a loop can contain many statements.
- But no loop should be more than a few lines long.
- Hard for human beings to keep larger chunks of code in mind.
primes = [2, 3, 5]
for p in primes:
    squared = p ** 2
    cubed = p ** 3
    print(p, squared, cubed)
2 4 8
3 9 27
5 25 125
Use range to iterate over a sequence of numbers.
- The built-in function range produces a sequence of numbers.
  - Not a list: the numbers are produced on demand to make looping over large ranges more efficient.
- range(N) is the numbers 0..N-1
  - Exactly the legal indices of a list or character string of length N
print('a range is not a list: range(0, 3)')
for number in range(0,3):
    print(number)
a range is not a list: range(0, 3)
0
1
2
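To see the numbers a range stands for, it can be converted to a list with the built-in list function. The short sketch below also uses range(len(...)) to visit every legal index of a character string; the variable names are made up for illustration.

```python
# Converting a range to a list shows the numbers it would produce.
first_three = list(range(3))
print(first_three)

# range(len(word)) yields exactly the legal indices of word.
word = 'tin'
letters = []
for index in range(len(word)):
    letters.append(word[index])
print(letters)
```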
The Accumulator pattern turns many values into one.
- A common pattern in programs is to:
- Initialize an accumulator variable to zero, the empty string, or the empty list.
- Update the variable with values from a collection.
# Sum the first 10 integers.
total = 0
for number in range(10):
    total = total + (number + 1)
print(total)
55
- Read total = total + (number + 1) as:
  - Add 1 to the current value of the loop variable number.
  - Add that to the current value of the accumulator variable total.
  - Assign that to total, replacing the current value.
- We have to add number + 1 because range produces 0..9, not 1..10.
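An alternative, shown here only as a sketch, is to give range a start value so that the loop variable already runs from 1 to 10. range(1, 11) starts at 1 and stops before 11.

```python
# Sum the first 10 integers without adjusting the loop variable.
total = 0
for number in range(1, 11):
    total = total + number
print(total)
```

Both versions accumulate the same total; which one reads better is a matter of taste.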
Classifying Errors
Is an indentation error a syntax error or a runtime error?
Solution
An IndentationError is a syntax error. Programs with syntax errors cannot be started. A program with a runtime error will start but an error will be thrown under certain conditions.
Tracing Execution
Create a table showing the numbers of the lines that are executed when this program runs, and the values of the variables after each line is executed.
total = 0
for char in "tin":
    total = total + 1
Solution
Line no | Variables
1 | total = 0
2 | total = 0 char = 't'
3 | total = 1 char = 't'
2 | total = 1 char = 'i'
3 | total = 2 char = 'i'
2 | total = 2 char = 'n'
3 | total = 3 char = 'n'
Reversing a String
Fill in the blanks in the program below so that it prints “nit” (the reverse of the original character string “tin”).
original = "tin"
result = ____
for char in original:
    result = ____
print(result)
Solution
original = "tin"
result = ""
for char in original:
    result = char + result
print(result)
Practice Accumulating
Fill in the blanks in each of the programs below to produce the indicated result.
# Total length of the strings in the list: ["red", "green", "blue"] => 12
total = 0
for word in ["red", "green", "blue"]:
    ____ = ____ + len(word)
print(total)
Solution
total = 0
for word in ["red", "green", "blue"]:
    total = total + len(word)
print(total)
# List of word lengths: ["red", "green", "blue"] => [3, 5, 4]
lengths = ____
for word in ["red", "green", "blue"]:
    lengths.____(____)
print(lengths)
Solution
lengths = []
for word in ["red", "green", "blue"]:
    lengths.append(len(word))
print(lengths)
# Concatenate all words: ["red", "green", "blue"] => "redgreenblue"
words = ["red", "green", "blue"]
result = ____
for ____ in ____:
    ____
print(result)
Solution
words = ["red", "green", "blue"]
result = ""
for word in words:
    result = result + word
print(result)
# Create acronym: ["red", "green", "blue"] => "RGB"
# write the whole thing
Solution
acronym = ""
for word in ["red", "green", "blue"]:
    acronym = acronym + word[0].upper()
print(acronym)
Cumulative Sum
Reorder and properly indent the lines of code below so that they print a list with the cumulative sum of data. The result should be [1, 3, 5, 10].
cumulative += [sum]
for number in data:
cumulative = []
sum += number
sum = 0
print(cumulative)
data = [1,2,2,5]
Solution
sum = 0
data = [1,2,2,5]
cumulative = []
for number in data:
    sum += number
    cumulative.append(sum)
print(cumulative)
Identifying Variable Name Errors
- Read the code below and try to identify what the errors are without running it.
- Run the code and read the error message. What type of NameError do you think this is? Is it a string with no quotes, a misspelled variable, or a variable that should have been defined but was not?
- Fix the error.
- Repeat steps 2 and 3, until you have fixed all the errors.
for number in range(10):
    # use a if the number is a multiple of 3, otherwise use b
    if (Number % 3) == 0:
        message = message + a
    else:
        message = message + "b"
print(message)
Solution
message = ""
for number in range(10):
    # use a if the number is a multiple of 3, otherwise use b
    if (number % 3) == 0:
        message = message + "a"
    else:
        message = message + "b"
print(message)
Identifying Item Errors
- Read the code below and try to identify what the errors are without running it.
- Run the code, and read the error message. What type of error is it?
- Fix the error.
seasons = ['Spring', 'Summer', 'Fall', 'Winter']
print('My favorite season is ', seasons[4])
Solution
seasons = ['Spring', 'Summer', 'Fall', 'Winter']
print('My favorite season is ', seasons[3])
Key Points
A for loop executes commands once for each value in a collection.
The first line of the for loop must end with a colon, and the body must be indented.
Indentation is always meaningful in Python.
A for loop is made up of a collection, a loop variable, and a body.
Loop variables can be called anything (but it is strongly advised to give the loop variable a meaningful name).
The body of a loop can contain many statements.
Use range to iterate over a sequence of numbers.
The Accumulator pattern turns many values into one.
Looping Over Data Sets
Overview
Teaching: 5 min
Exercises: 0 min
Questions
How can I process many data sets with a single command?
Objectives
Be able to read and write globbing expressions that match sets of files.
Use glob to create lists of files.
Write for loops to perform operations on files given their names in a list.
Use a for loop to process files given a list of their names.
- A filename is just a character string.
- And lists can contain character strings.
import numpy
for filename in ["data/galileo_flat.csv","data/galileo_ramp.csv"]:
    distance, height = numpy.loadtxt(filename, skiprows=2,
                                     comments="#", delimiter=',', unpack=True)
    print(filename, distance.min(), height.max())
data/galileo_flat.csv 800.0 1000.0
data/galileo_ramp.csv 253.0 1000.0
Use glob.glob to find sets of files whose names match a pattern.
- In Unix, the term “globbing” means “matching a set of files with a pattern”.
- The most common patterns are:
  - * meaning “match zero or more characters”
  - ? meaning “match exactly one character”
- Python contains the glob library to provide pattern matching functionality.
- The glob library contains a function also called glob to match file patterns.
- E.g., glob.glob('*.txt') matches all files in the current directory whose names end with .txt.
- Result is a (possibly empty) list of character strings.
import glob
print("all csv files in data/random directory:", glob.glob("data/random/*.csv"))
all csv files in data/random directory: ['data/random/data.029.csv', 'data/random/data.015.csv', 'data/random/data.001.csv', 'data/random/data.000.csv', 'data/random/data.014.csv', 'data/random/data.028.csv', 'data/random/data.002.csv', '...','data/random/data.035.csv', 'data/random/data.009.csv', 'data/random/data.023.csv', 'data/random/data.037.csv', 'data/random/data.036.csv', 'data/random/data.022.csv', 'data/random/data.026.csv', 'data/random/data.032.csv', 'data/random/data.033.csv', 'data/random/data.027.csv', 'data/random/data.031.csv', 'data/random/data.025.csv', 'data/random/data.019.csv', 'data/random/data.018.csv', 'data/random/data.024.csv', 'data/random/data.030.csv']
print("all txt files in data/random directory:", glob.glob("data/random/*.txt"))
all txt files in data/random directory: []
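The order of the list returned by glob.glob is not guaranteed, which is why the file names above appear shuffled. The self-contained sketch below creates a few empty files in a temporary directory (the directory and file names are invented for this example), sorts the matches, and tries the ? wildcard and a pattern that matches nothing.

```python
import glob
import os
import tempfile

# Create a scratch directory with a few empty files (names are made up).
scratch = tempfile.mkdtemp()
for name in ['data.002.csv', 'data.000.csv', 'data.001.csv', 'notes.txt']:
    open(os.path.join(scratch, name), 'w').close()

# * matches zero or more characters; sorted() puts the names in order.
csv_files = sorted(glob.glob(os.path.join(scratch, '*.csv')))
for filename in csv_files:
    print(os.path.basename(filename))

# ? matches exactly one character.
one_char = glob.glob(os.path.join(scratch, 'data.00?.csv'))
print(len(one_char), 'files match data.00?.csv')

# A pattern that matches nothing gives an empty list, not an error.
print(glob.glob(os.path.join(scratch, '*.json')))
```

Wrapping the call in sorted is a cheap way to make batch runs reproducible from one machine to the next.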
Use glob and for to process batches of files.
Helps a lot if the files are named and stored systematically and consistently so that simple patterns will find the right data.
for filename in sorted(glob.glob('data/random/*.csv')):
    distance, height = numpy.loadtxt(filename, delimiter=',', unpack=True)
    print(filename, distance.mean(), height.std())
data/random/data.000.csv 0.973455156 14.253108991004671
data/random/data.001.csv -4.4384872 18.271604877007015
data/random/data.002.csv -2.28566216 13.972753882460598
...
data/random/data.096.csv 0.724618214 16.429186710317516
data/random/data.097.csv 0.48894924 15.292681284065516
data/random/data.098.csv 1.84267224 11.33741881916356
data/random/data.099.csv -3.772237556 14.155571992376832
Determining Matches
Which of these files is not matched by the expression glob.glob('data/*as*.csv')?
1. data/gapminder_gdp_africa.csv
2. data/gapminder_gdp_americas.csv
3. data/gapminder_gdp_asia.csv
Solution
1 is not matched by the glob.
Averaging over datasets
Write a program that calculates the average value from all of the data in the files, instead of from individual files.
Solution
import glob
import numpy as np
data = []
for filename in sorted(glob.glob('data/random/*.csv')):
    distance, height = np.loadtxt(filename, delimiter=',', unpack=True)
    data.append(distance)
print(np.mean(data))
Key Points
Use a for loop to process files given a list of their names.
Use glob.glob to find sets of files whose names match a pattern.
Use glob and for to process batches of files.
Writing Functions
Overview
Teaching: 10 min
Exercises: 15 min
Questions
How can I create my own functions?
Objectives
Explain and identify the difference between function definition and function call.
Write a function that takes a small, fixed number of arguments and produces a single result.
Break programs down into functions to make them easier to understand.
- Human beings can only keep a few items in working memory at a time.
- Understand larger/more complicated ideas by understanding and combining pieces.
- Components in a machine.
- Lemmas when proving theorems.
- Functions serve the same purpose in programs.
- Encapsulate complexity so that we can treat it as a single “thing”.
- Also enables re-use.
- Write one time, use many times.
Define a function using def with a name, parameters, and a block of code.
- Begin the definition of a new function with def.
- Followed by the name of the function.
  - Must obey the same rules as variable names.
- Then parameters in parentheses.
  - Empty parentheses if the function doesn’t take any inputs.
  - We will discuss this in detail in a moment.
- Then a colon.
- Then an indented block of code.
def print_greeting():
    print('Hello!')
Defining a function does not run it.
- Defining a function does not run it.
- Like assigning a value to a variable.
- Must call the function to execute the code it contains.
print_greeting()
Hello!
Arguments in call are matched to parameters in definition.
- Functions are most useful when they can operate on different data.
- Specify parameters when defining a function.
- These become variables when the function is executed.
- Are assigned the arguments in the call (i.e., the values passed to the function).
- If you don’t name the arguments when using them in the call, the arguments will be matched to parameters in the order the parameters are defined in the function.
def print_date(year, month, day):
    joined = str(year) + '/' + str(month) + '/' + str(day)
    print(joined)
print_date(1871, 3, 19)
1871/3/19
Or, we can name the arguments when we call the function, which allows us to specify them in any order:
print_date(month=3, day=19, year=1871)
1871/3/19
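Positional and named arguments can also be mixed in a single call, as long as the positional ones come first. A minimal sketch using the print_date function from above:

```python
def print_date(year, month, day):
    joined = str(year) + '/' + str(month) + '/' + str(day)
    print(joined)

# The first argument is matched by position (year);
# the remaining two are matched by name, in any order.
print_date(1871, day=19, month=3)
```

This prints the same date as the two calls above.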
Functions may return a result to their caller using return.
- Use return ... to give a value back to the caller.
- May occur anywhere in the function.
- But functions are easier to understand if return occurs:
  - At the start to handle special cases.
  - At the very end, with a final result.
def average(values):
    if len(values) == 0:
        return None
    return sum(values) / len(values)
a = average([1, 3, 4])
print('average of actual values:', a)
average of actual values: 2.6666666666666665
print('average of empty list:', average([]))
average of empty list: None
- Remember: every function returns something.
- A function that doesn’t explicitly return a value automatically returns None.
result = print_date(1871, 3, 19)
print('result of call is:', result)
1871/3/19
result of call is: None
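Because a missing result shows up as None, callers often test for it before using the value. A minimal sketch, reusing the average function from above (the variable name readings is invented here):

```python
def average(values):
    if len(values) == 0:
        return None
    return sum(values) / len(values)

readings = []                 # hypothetical empty data set
result = average(readings)
if result is None:
    print('no data to average')
else:
    print('average is', result)
```

Comparing with is None is the usual way to test for None in Python.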
Identifying Syntax Errors
- Read the code below and try to identify what the errors are without running it.
- Run the code and read the error message. Is it a SyntaxError or an IndentationError?
- Fix the error.
- Repeat steps 2 and 3 until you have fixed all the errors.
def another_function
  print("Syntax errors are annoying.")
   print("But at least python tells us about them!")
  print("So they are usually not too hard to fix.")
Solution
def another_function():
    print("Syntax errors are annoying.")
    print("But at least Python tells us about them!")
    print("So they are usually not too hard to fix.")
Definition and Use
What does the following program print?
def report(pressure):
    print('pressure is', pressure)

print('calling', report, 22.5)
Solution
calling <function report at 0x7fd128ff1bf8> 22.5
A function call always needs parentheses; otherwise you get the function object itself, which prints as its name and memory address. So, if we wanted to call the function named report, and give it the value 22.5 to report on, we could write the function call as follows:
print("calling")
report(22.5)
Order of Operations
The example above:
result = print_date(1871, 3, 19)
print('result of call is:', result)
printed:
1871/3/19
result of call is: None
Explain why the two lines of output appeared in the order they did.
What’s wrong in this example?
result = print_date(1871,3,19)
def print_date(year, month, day):
    joined = str(year) + '/' + str(month) + '/' + str(day)
    print(joined)
Solution
- The first line of output (1871/3/19) is from the print function inside print_date(), while the second line is from the print function below the function call. All of the code inside print_date() is executed first, and the program then “leaves” the function and executes the rest of the code.
- The problem with the example is that the function is defined after the call to the function is made. Python has not yet seen the definition when it reaches the call, so the call fails with a NameError.
Encapsulation
Fill in the blanks to create a function that takes a single filename as an argument, loads the data in the file named by the argument, and returns the minimum value in that data.
import pandas

def min_in_data(____):
    data = ____
    return ____
Solution
import pandas

def min_in_data(filename):
    data = pandas.read_csv(filename)
    return data.min()
Find the First
Fill in the blanks to create a function that takes a list of numbers as an argument and returns the first negative value in the list. What does your function do if the list is empty?
def first_negative(values):
    for v in ____:
        if ____:
            return ____
Solution
def first_negative(values):
    for v in values:
        if v < 0:
            return v
If an empty list is passed to this function, it returns None:
my_list = []
print(first_negative(my_list))
None
Calling by Name
Earlier we saw this function:
def print_date(year, month, day):
    joined = str(year) + '/' + str(month) + '/' + str(day)
    print(joined)
We saw that we can call the function using named arguments, like this:
print_date(day=1, month=2, year=2003)
- What does print_date(day=1, month=2, year=2003) print?
- When have you seen a function call like this before?
- When and why is it useful to call functions this way?
Solution
- 2003/2/1
- We saw examples of using named arguments when working with the pandas library. For example, when reading in a dataset using data = pandas.read_csv('data/gapminder_gdp_europe.csv', index_col='country'), the last argument index_col is a named argument.
- Using named arguments can make code more readable since one can see from the function call what name the different arguments have inside the function. It can also reduce the chances of passing arguments in the wrong order, since by using named arguments the order doesn’t matter.
Encapsulation of an If/Print Block
The code below will run on a label-printer for chicken eggs. A digital scale will report a chicken egg mass (in grams) to the computer and then the computer will print a label.
Please re-write the code so that the if-block is folded into a function.
import random
for i in range(10):

    # simulating the mass of a chicken egg
    # the (random) mass will be 70 +/- 20 grams
    mass=70+20.0*(2.0*random.random()-1.0)

    print(mass)

    #egg sizing machinery prints a label
    if(mass>=85):
        print("jumbo")
    elif(mass>=70):
        print("large")
    elif(mass<70 and mass>=55):
        print("medium")
    else:
        print("small")
The simplified program follows. What function definition will make it functional?
# revised version
import random
for i in range(10):

    # simulating the mass of a chicken egg
    # the (random) mass will be 70 +/- 20 grams
    mass=70+20.0*(2.0*random.random()-1.0)

    print(mass,print_egg_label(mass))
- Create a function definition for print_egg_label() that will work with the revised program above. Note, the function’s return value will be significant. Sample output might be 71.23 large.
- A dirty egg might have a mass of more than 90 grams, and a spoiled or broken egg will probably have a mass that’s less than 50 grams. Modify your print_egg_label() function to account for these error conditions. Sample output could be 25 too light, probably spoiled.
Solution
def print_egg_label(mass):
    #egg sizing machinery prints a label
    if(mass>=90):
        return("warning: egg might be dirty")
    elif(mass>=85):
        return("jumbo")
    elif(mass>=70):
        return("large")
    elif(mass<70 and mass>=55):
        return("medium")
    elif(mass<50):
        return("too light, probably spoiled")
    else:
        return("small")
Key Points
Break programs down into functions to make them easier to understand.
Define a function using def with a name, parameters, and a block of code.
Defining a function does not run it.
Arguments in call are matched to parameters in definition.
Functions may return a result to their caller using return.
Variable Scope
Overview
Teaching: 10 min
Exercises: 10 min
Questions
How do function calls actually work?
How can I determine where errors occurred?
Objectives
Identify local and global variables.
Identify parameters as local variables.
Read a traceback and determine the file, function, and line number on which the error occurred, the type of error, and the error message.
The scope of a variable is the part of a program that can ‘see’ that variable.
- There are only so many sensible names for variables.
- People using functions shouldn’t have to worry about what variable names the author of the function used.
- People writing functions shouldn’t have to worry about what variable names the function’s caller uses.
- The part of a program in which a variable is visible is called its scope.
pressure = 103.9
def adjust(t):
    temperature = t * 1.43 / pressure
    return temperature
- pressure is a global variable.
  - Defined outside any particular function.
  - Visible everywhere.
- t and temperature are local variables in adjust.
  - Defined in the function.
  - Not visible in the main program.
  - Remember: a function parameter is a variable that is automatically assigned a value when the function is called.
print('adjusted:', adjust(0.9))
print('temperature after call:', temperature)
adjusted: 0.01238691049085659
Traceback (most recent call last):
  File "/Users/swcarpentry/foo.py", line 8, in <module>
    print('temperature after call:', temperature)
NameError: name 'temperature' is not defined
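The same rule means that a local variable cannot overwrite a global variable that happens to share its name. In the sketch below, the global temperature is an invented value added only to demonstrate this.

```python
pressure = 103.9
temperature = 20.0   # global variable, invented for this demonstration

def adjust(t):
    temperature = t * 1.43 / pressure   # local variable with the same name
    return temperature

adjusted = adjust(0.9)
print('adjusted:', adjusted)
print('temperature after call:', temperature)   # still the global value
```

The assignment inside adjust creates a separate local variable, so the global temperature keeps its value after the call.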
Local and Global Variable Use
Trace the values of all variables in this program as it is executed. (Use ‘—’ as the value of variables before and after they exist.)
limit = 100

def clip(value):
    return min(max(0.0, value), limit)

value = -22.5
print(clip(value))
Reading Error Messages
Read the traceback below, and identify the following:
- How many levels does the traceback have?
- What is the file name where the error occurred?
- What is the function name where the error occurred?
- On which line number in this function did the error occur?
- What is the type of error?
- What is the error message?
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-2-e4c4cbafeeb5> in <module>()
      1 import errors_02
----> 2 errors_02.print_friday_message()

/Users/ghopper/thesis/code/errors_02.py in print_friday_message()
     13
     14 def print_friday_message():
---> 15     print_message("Friday")

/Users/ghopper/thesis/code/errors_02.py in print_message(day)
      9         "sunday": "Aw, the weekend is almost over."
     10     }
---> 11     print(messages[day])
     12
     13

KeyError: 'Friday'
Key Points
The scope of a variable is the part of a program that can ‘see’ that variable.
Conditionals
Overview
Teaching: 10 min
Exercises: 15 min
Questions
How can programs do different things for different data?
Objectives
Correctly write programs that use if and else statements and simple Boolean expressions (without logical operators).
Trace the execution of unnested conditionals and conditionals inside loops.
Use if statements to control whether or not a block of code is executed.
- An if statement (more properly called a conditional statement) controls whether some block of code is executed or not.
- Structure is similar to a for statement:
  - First line opens with if and ends with a colon
  - Body containing one or more statements is indented (usually by 4 spaces)
mass = 3.54
if mass > 3.0:
    print(mass, 'is large')
mass = 2.07
if mass > 3.0:
    print(mass, 'is large')
3.54 is large
Conditionals are often used inside loops.
- Not much point using a conditional when we know the value (as above).
- But useful when we have a collection to process.
masses = [3.54, 2.07, 9.22, 1.86, 1.71]
for m in masses:
    if m > 3.0:
        print(m, 'is large')
3.54 is large
9.22 is large
Use else to execute a block of code when an if condition is not true.
- else can be used following an if.
- Allows us to specify an alternative to execute when the if branch isn’t taken.
masses = [3.54, 2.07, 9.22, 1.86, 1.71]
for m in masses:
    if m > 3.0:
        print(m, 'is large')
    else:
        print(m, 'is small')
3.54 is large
2.07 is small
9.22 is large
1.86 is small
1.71 is small
Use elif to specify additional tests.
- May want to provide several alternative choices, each with its own test.
- Use elif (short for “else if”) and a condition to specify these.
- Always associated with an if.
- Must come before the else (which is the “catch all”).
masses = [3.54, 2.07, 9.22, 1.86, 1.71]
for m in masses:
    if m > 9.0:
        print(m, 'is HUGE')
    elif m > 3.0:
        print(m, 'is large')
    else:
        print(m, 'is small')
3.54 is large
2.07 is small
9.22 is HUGE
1.86 is small
1.71 is small
Conditions are tested once, in order.
- Python steps through the branches of the conditional in order, testing each in turn.
- So ordering matters.
grade = 85
if grade >= 70:
    print('grade is C')
elif grade >= 80:
    print('grade is B')
elif grade >= 90:
    print('grade is A')
grade is C
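Testing the most restrictive condition first gives the intended result. A possible reordering of the example above (the variable letter and the final else branch are added here so the outcome can be inspected):

```python
grade = 85
if grade >= 90:
    letter = 'A'
elif grade >= 80:
    letter = 'B'
elif grade >= 70:
    letter = 'C'
else:
    letter = 'below C'
print('grade is', letter)
```

With the tests in this order, a grade of 85 falls through the first test and is caught by the second.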
- Does not automatically go back and re-evaluate if values change.
velocity = 10.0
if velocity > 20.0:
    print('moving too fast')
else:
    print('adjusting velocity')
    velocity = 50.0
adjusting velocity
- Often use conditionals in a loop to “evolve” the values of variables.
velocity = 10.0
for i in range(5): # execute the loop 5 times
    print(i, ':', velocity)
    if velocity > 20.0:
        print('moving too fast')
        velocity = velocity - 5.0
    else:
        print('moving too slow')
        velocity = velocity + 10.0
print('final velocity:', velocity)
0 : 10.0
moving too slow
1 : 20.0
moving too slow
2 : 30.0
moving too fast
3 : 25.0
moving too fast
4 : 20.0
moving too slow
final velocity: 30.0
Create a table showing variables’ values to trace a program’s execution.
i | 0 | . | 1 | . | 2 | . | 3 | . | 4 | . |
velocity | 10.0 | 20.0 | . | 30.0 | . | 25.0 | . | 20.0 | . | 30.0 |
- The program must have a print statement outside the body of the loop to show the final value of velocity, since its value is updated by the last iteration of the loop.
Compound Relations Using and, or, and Parentheses
Often, you want some combination of things to be true. You can combine relations within a conditional using and and or. Continuing the example above, suppose you have
mass = [ 3.54, 2.07, 9.22, 1.86, 1.71]
velocity = [10.00, 20.00, 30.00, 25.00, 20.00]

i = 0
for i in range(5):
    if mass[i] > 5 and velocity[i] > 20:
        print("Fast heavy object. Duck!")
    elif mass[i] > 2 and mass[i] <= 5 and velocity[i] <= 20:
        print("Normal traffic")
    elif mass[i] <= 2 and velocity[i] <= 20:
        print("Slow light object. Ignore it")
    else:
        print("Whoa! Something is up with the data. Check it")
Just like with arithmetic, you can and should use parentheses whenever there is possible ambiguity. A good general rule is to always use parentheses when mixing and and or in the same condition. That is, instead of:
if mass[i] <= 2 or mass[i] >= 5 and velocity[i] > 20:
write one of these:
if (mass[i] <= 2 or mass[i] >= 5) and velocity[i] > 20:
if mass[i] <= 2 or (mass[i] >= 5 and velocity[i] > 20):
so it is perfectly clear to a reader (and to Python) what you really mean.
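The two parenthesized versions are not interchangeable. Python evaluates and before or, so the unparenthesized condition behaves like the second form. A small check with made-up values for a single object:

```python
mass = 1.5        # illustrative values, not taken from the lists above
velocity = 10.0

no_parens = mass <= 2 or mass >= 5 and velocity > 20
or_first = (mass <= 2 or mass >= 5) and velocity > 20
and_first = mass <= 2 or (mass >= 5 and velocity > 20)

print(no_parens, or_first, and_first)
```

For this light, slow object only the or_first grouping is False, which shows that the choice of parentheses changes the answer.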
Tracing Execution
What does this program print?
pressure = 71.9
if pressure > 50.0:
    pressure = 25.0
elif pressure <= 50.0:
    pressure = 0.0
print(pressure)
Solution
25.0
Trimming Values
Fill in the blanks so that this program creates a new list containing zeroes where the original list’s values were negative and ones where the original list’s values were zero or positive.
original = [-1.5, 0.2, 0.4, 0.0, -1.3, 0.4]
result = ____
for value in original:
    if ____:
        result.append(0)
    else:
        ____
print(result)
[0, 1, 1, 1, 0, 1]
Solution
original = [-1.5, 0.2, 0.4, 0.0, -1.3, 0.4]
result = []
for value in original:
    if value < 0.0:
        result.append(0)
    else:
        result.append(1)
print(result)
Initializing
Modify this program so that it finds the largest and smallest values in the list no matter what the range of values originally is.
values = [...some test data...]
smallest, largest = None, None
for v in values:
    if ____:
        smallest, largest = v, v
    ____:
        smallest = min(____, v)
        largest = max(____, v)
print(smallest, largest)
What are the advantages and disadvantages of using this method to find the range of the data?
Solution
values = [-2,1,65,78,-54,-24,100]
smallest, largest = None, None
for v in values:
    if smallest is None and largest is None:
        smallest, largest = v, v
    else:
        smallest = min(smallest, v)
        largest = max(largest, v)
print(smallest, largest)
Key Points
Use if statements to control whether or not a block of code is executed.
Conditionals are often used inside loops.
Use else to execute a block of code when an if condition is not true.
Use elif to specify additional tests.
Conditions are tested once, in order.
Create a table showing variables’ values to trace a program’s execution.
Programming Style
Overview
Teaching: 10 min
Exercises: 5 min
Questions
How can I make my programs more readable?
How do most programmers format their code?
How can programs check their own operation?
Objectives
Provide sound justifications for basic rules of coding style.
Refactor one-page programs to make them more readable and justify the changes.
Use Python community coding standards (PEP-8).
Coding style
Coding style helps us to understand the code better and makes it easier to maintain and change. Python relies strongly on coding style, as we can see from the indentation we apply to lines to define different blocks of code. Python proposes a standard style through one of its first Python Enhancement Proposals (PEP), PEP8, and highlights the importance of readability in the Zen of Python.
We may highlight some points:
- document your code
- use clear, meaningful variable names
- use spaces, not tabs, to indent lines
Follow standard Python style in your code.
- PEP8: a style guide for Python that discusses topics such as how you should name variables, how you should use indentation in your code, how you should structure your import statements, etc. Adhering to PEP8 makes it easier for other Python developers to read and understand your code, and to understand what their contributions should look like. The PEP8 application and Python library can check your code for compliance with PEP8.
- The Google style guide for Python supports the use of PEP8 and extends the coding style to more specific aspects of the structure of Python code, which may also be worth following.
Use assertions to check for internal errors.
Assertions are a simple, but powerful method for making sure that the context in which your code is executing is as you expect.
def calc_bulk_density(mass, volume):
    '''Return dry bulk density = powder mass / powder volume.'''
    assert volume > 0
    return mass / volume
If the assertion is False, the Python interpreter raises an AssertionError runtime exception. The source code for the expression that failed will be displayed as part of the error message. To ignore assertions in your code, run the interpreter with the ‘-O’ (optimize) switch. Assertions should contain only simple checks and never change the state of the program. For example, an assertion should never contain an assignment.
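An assert statement can also carry a message after a comma, which is shown when the check fails. The sketch below adds such a message to the function above; the message text and the test values are illustrative, and the try/except block is used only to show the failure without stopping the program.

```python
def calc_bulk_density(mass, volume):
    '''Return dry bulk density = powder mass / powder volume.'''
    assert volume > 0, 'volume must be positive'
    return mass / volume

print(calc_bulk_density(10.0, 4.0))

try:
    calc_bulk_density(10.0, 0.0)
except AssertionError as error:
    message = str(error)
    print('AssertionError:', message)
```

Without the assertion, the second call would instead fail with a less informative ZeroDivisionError.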
Use docstrings to provide online help.
- If the first thing in a function is a character string that is not assigned to a variable, Python attaches it to the function as the online help.
- Called a docstring (short for “documentation string”).
def average(values):
    "Return average of values, or None if no values are supplied."
    if len(values) == 0:
        return None
    return sum(values) / len(values)
help(average)
Help on function average in module __main__:

average(values)
    Return average of values, or None if no values are supplied.
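The docstring is also stored on the function itself in its __doc__ attribute, which is where help finds it. A quick check:

```python
def average(values):
    "Return average of values, or None if no values are supplied."
    if len(values) == 0:
        return None
    return sum(values) / len(values)

# The docstring is an ordinary character string attached to the function.
print(average.__doc__)
```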
Multiline Strings
Often use multiline strings for documentation. These start with three quote characters (either single or double) and end with three matching characters.
"""This string spans
   multiple lines.

Blank lines are allowed."""
What Will Be Shown?
Highlight the lines in the code below that will be available as online help. Are there lines that should be made available, but won’t be? Will any lines produce a syntax error or a runtime error?
"Find maximum edit distance between multiple sequences."
# This finds the maximum distance between all sequences.

def overall_max(sequences):
    '''Determine overall maximum edit distance.'''

    highest = 0
    for left in sequences:
        for right in sequences:
            '''Avoid checking sequence against itself.'''
            if left != right:
                this = edit_distance(left, right)
                highest = max(highest, this)

    # Report.
    return highest
Document This
Turn the comment on the following function into a docstring and check that help displays it properly.
def middle(a, b, c):
    # Return the middle value of three.
    # Assumes the values can actually be compared.
    values = [a, b, c]
    values.sort()
    return values[1]
Solution
def middle(a, b, c):
    '''Return the middle value of three.
    Assumes the values can actually be compared.'''
    values = [a, b, c]
    values.sort()
    return values[1]
Clean Up This Code
- Read this short program and try to predict what it does.
- Run it: how accurate was your prediction?
- Refactor the program to make it more readable. Remember to run it after each change to ensure its behavior hasn’t changed.
- Compare your rewrite with your neighbor’s. What did you do the same? What did you do differently, and why?
n = 10
s = 'et cetera'
print(s)
i = 0
while i < n:
    # print('at', j)
    new = ''
    for j in range(len(s)):
        left = j-1
        right = (j+1)%len(s)
        if s[left]==s[right]: new += '-'
        else: new += '*'
    s=''.join(new)
    print(s)
    i += 1
Solution
Here’s one solution.
def string_machine(input_string, iterations):
    """
    Takes input_string and generates a new string with -'s and *'s
    corresponding to characters that have identical adjacent characters
    or not, respectively. Iterates through this procedure with the resultant
    strings for the supplied number of iterations.
    """
    print(input_string)
    old = input_string
    for i in range(iterations):
        new = ''
        # iterate through characters in previous string
        for j in range(len(input_string)):
            left = j-1
            right = (j+1)%len(input_string)  # ensure right index wraps around
            if old[left]==old[right]:
                new += '-'
            else:
                new += '*'
        print(new)
        # store new string as old
        old = new

string_machine('et cetera', 10)
et cetera
*****-***
----*-*--
---*---*-
--*-*-*-*
**-------
***-----*
--**---**
*****-***
----*-*--
---*---*-
Key Points
Follow standard Python style in your code.
Use docstrings to provide online help.
Fitting data to models
Overview
Teaching: 50 min
Exercises: 10 min
Questions
How do I fit my data to a scientific model?
Objectives
Import the scipy.optimize library.
Understand the curve_fit function.
Print the results from curve_fit.
Plot the data from curve_fit.
Data analysis with Python
Many physical systems can be modeled as an equation, which in Python would be represented by a function $f$. If an appropriate function $f$ can be found for an experiment, we can use the equation to determine physical parameters related to the experiment, and we can use the model to predict new things about the world. Galileo used this method to calculate the trajectory of cannonballs by rolling them down inclined ramps.
In experimental physics, we constrain these models by designing an experiment with two quantities. The first quantity, that we can control, is the independent variable. The second quantity, that we can measure, is the dependent variable. The relationship between these two quantities can then be used to determine some physical parameters.
A simple example is measuring the path of a moving object. We could guess that the object is moving at a constant speed and design an experiment to find that speed using the model:
\[s = ut\]
Scipy provides functions that can fit model functions to data.
Scipy provides a number of functions that, given a suitable model function, can return the best estimate of the unknown parameters in the model.
Consider the experiment where the time of flight of an object moving at constant speed is measured. If the experiment is set up correctly, the unknown variable we are trying to determine is the speed $u$. The remaining variables are time $t$ and distance $s$. We can design two different experiments: one where we control time (measuring at a fixed interval) and measure distance, or one where we control distance and measure time.
In Python the model function might be written as:
def distance(time, speed):
    """Calculate the distance travelled at a constant speed for a known time."""
    return speed * time

def model_function(independent_data, parameter1, parameter2):
    """A template for the model function."""
    dependent_data = parameter1 * independent_data + parameter2
    return dependent_data
#control time, measure distance
import numpy
# derr is my estimate of errors measuring distance, my ruler is bad.
derr = 5 # metres
measured_times =numpy.arange(10,100,10) #time in seconds
measured_distances = numpy.array([ 108.2, 220.4, 360.2, 482.8,
630.6, 793.9, 947.5, 1125.0, 1314.9]) # distance in metres
distance_errors = numpy.ones_like(measured_distances)*derr
For such a simple model, the average speed can be calculated from the data quite easily.
speeds = numpy.diff(measured_distances) / numpy.diff(measured_times)
average_speed = numpy.average(speeds)
print("Average speed is {:.04g} m/s".format(average_speed))
mean_times_error = numpy.std(speeds, ddof=1)/numpy.sqrt(speeds.size)
mean_times_std = numpy.sqrt( numpy.mean( derr**2 * numpy.ones(speeds.size)) )
#error propagation, sum in quadrature
speed_error = numpy.sqrt( numpy.mean( (distance_errors / measured_distances)**2) )* average_speed
print("Standard error in average speed is {:.03g} m/s".format(mean_times_error))
print("Error in average speed is {:.03g} m/s".format(speed_error))
Average speed is 15.08 m/s
Standard error in average speed is 0.928 m/s
Error in average speed is 0.281 m/s
# Copied here to make it easier to find!
def distance(time, speed):
    """Calculate the distance travelled at a constant speed for a known time."""
    return speed * time
- You can also use scipy.optimize.curve_fit to perform this calculation.
from scipy.optimize import curve_fit
popt, pcov = curve_fit(distance, measured_times, measured_distances)
print("Speed is %4g m/s" % popt[0])
pvar = numpy.diag(pcov)
print("Error in fitted speed is {:.03g} m/s".format(numpy.sqrt(pvar[0])))
Speed is 13.6645 m/s
Error in fitted speed is 0.31 m/s
What are popt, pcov, and pvar?
- popt is a one-dimensional array of the best estimates for the parameter values; each entry matches the order in the function definition.
- pcov is the covariance matrix showing the uncertainty and interdependence of each parameter in popt. We take the diagonal elements as pvar for the variance of each parameter in popt.
- The above error didn’t consider the errors in the individual data points correctly.
- Give curve_fit the error values using the sigma keyword, and always use absolute_sigma=True.
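A sketch of how a covariance matrix is usually unpacked, assuming a two-parameter fit. The numbers in `pcov_example` are invented for illustration and are not output from the fits in this lesson.

```python
import numpy

# Hypothetical 2x2 covariance matrix for two fitted parameters (made-up values).
pcov_example = numpy.array([[0.0400, -0.0012],
                            [-0.0012, 0.0001]])

# Variances sit on the diagonal; their square roots are the standard errors.
pvar_example = numpy.diag(pcov_example)
perr_example = numpy.sqrt(pvar_example)

# Dividing by the outer product of the errors gives the correlation matrix,
# whose off-diagonal entries show how strongly the parameters are linked.
correlation = pcov_example / numpy.outer(perr_example, perr_example)

print("standard errors:", perr_example)
print("correlation between parameters: {:.2f}".format(correlation[0, 1]))
```

A correlation close to +1 or -1 would suggest that the two parameters cannot be determined independently from the data.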
Exercise 1
Predict the distance after 10 seconds and after 100 seconds.
Calculate predictions using the model function
- curve_fit needs a model function to make predictions.
- Any calculations using that model should also use the function to avoid errors.
- For example, when plotting the predictions you should call the model function rather than rewrite the equation.
# Always predict with the model function!
d10 = distance(10, popt[0])
d100 = distance(100, popt[0])
print("After 10 seconds, predicted distance = {:.4g}m".format(d10))
print("After 100 seconds, predicted distance = {:.4g}m".format(d100))
#dont_do_this
rewrite10 = popt[0] * 10
print("After 10 seconds, predicted distance = {:.4g}m".format(rewrite10))
#or this
hardcoded10 = 13.64 * 10
print("After 10 seconds, predicted distance = {:.4g}m".format(hardcoded10))
After 10 seconds, predicted distance = 136.6m
After 100 seconds, predicted distance = 1366m
After 10 seconds, predicted distance = 136.6m
After 10 seconds, predicted distance = 136.4m
popt, pcov = curve_fit(distance, measured_times, measured_distances,
absolute_sigma=True, sigma = distance_errors)
pvar = numpy.diag(pcov)
print("Average speed is {:.04g} m/s".format(popt[0]))
print("Error in fitted speed is {:.03g} m/s".format(numpy.sqrt(pvar[0])))
Average speed is 13.66 m/s
Error in fitted speed is 0.0296 m/s
- With the correct error estimates, the fit reports a smaller uncertainty on the speed, but the estimate of the average speed itself didn’t change.
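For this one-parameter model the fit can be cross-checked by hand. Minimizing the sum of squared differences for $s = ut$ with the same error $\sigma$ on every point gives $u = \sum t_i s_i / \sum t_i^2$, with an uncertainty of $\sigma / \sqrt{\sum t_i^2}$. The sketch below, in plain Python with the data copied from above, is only a check under that equal-error assumption and not a description of how curve_fit works internally:

```python
import math

# Data copied from the experiment above.
times = [10, 20, 30, 40, 50, 60, 70, 80, 90]            # seconds
distances = [108.2, 220.4, 360.2, 482.8, 630.6,
             793.9, 947.5, 1125.0, 1314.9]              # metres
derr = 5                                                # metres, same for every point

sum_ts = sum(t * s for t, s in zip(times, distances))
sum_tt = sum(t * t for t in times)

speed = sum_ts / sum_tt                  # best-fit speed for s = u*t
speed_error = derr / math.sqrt(sum_tt)   # uncertainty with equal absolute errors

print("Speed is {:.4g} m/s".format(speed))
print("Error in speed is {:.3g} m/s".format(speed_error))
```

These values reproduce the 13.66 m/s and 0.0296 m/s reported by curve_fit above.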
The model function needs to follow the curve_fit rules
- The function must take an array of independent data as its first argument.
- The function can take any number of additional parameters that will be found using curve_fit.
- The function must return a single prediction of the dependent data for each value in the independent data.
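def good_model_function(xdata, parameter_1, parameter_2, parameter_3):
    # Code that calculates the model goes here; a quadratic is used as a placeholder.
    prediction = parameter_1 * xdata**2 + parameter_2 * xdata + parameter_3
    return prediction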
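One quick way to check a model function against these rules before fitting is to call it on an array and compare shapes. This is a sketch; `straight_line` and the sample values are invented for illustration:

```python
import numpy


def straight_line(xdata, gradient, intercept):
    """Hypothetical model: one prediction for each value in xdata."""
    return gradient * xdata + intercept


xdata = numpy.array([0.0, 1.0, 2.0, 3.0])
prediction = straight_line(xdata, 2.0, 1.0)

# A well-behaved model returns one prediction per independent data value.
print(prediction)
print(prediction.shape == xdata.shape)
```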
curve_fit works with multiple parameters
Extending the above experiment, what if the object was actually accelerating? The model function is now
\[s = ut + \frac{1}{2} at^2\]
where $a$ is the acceleration. We can change the model function and run the curve_fit code again.
def distance_with_acceleration(time, speed, acceleration):
    """Calculate the distance travelled in a known time, given an initial speed
    and a constant acceleration."""
    return speed * time + 0.5 * acceleration * time**2
from scipy.optimize import curve_fit
popt2, pcov2 = curve_fit(distance_with_acceleration, measured_times, measured_distances,
absolute_sigma=True, sigma = distance_errors)
print("Initial speed is {:.04g} m/s".format(popt2[0]))
print("Error in fitted initial speed is {:.03g} m/s".format(numpy.sqrt(pcov2[0,0])))
print("Acceleration is {:.04g} m/s2".format(popt2[1]))
print("Error in fitted acceleration is {:.03g} m/s2".format(numpy.sqrt(pcov2[1,1])))
Initial speed is 10.26 m/s
Error in fitted initial speed is 0.119 m/s
Acceleration is 0.09589 m/s2
Error in fitted acceleration is 0.00325 m/s2
The data used here is fake, generated with an initial speed of 10.86 m/s and an acceleration of 0.1 $m/s^2$. The constant-speed model predicted a higher speed to compensate for the acceleration!
Exercise 1
How could we have quickly checked whether our model was good?
A plot would have quickly shown that the linear model is not correct. For small amounts of data, printing each predicted value might tell us something too.
%matplotlib inline
import matplotlib.pyplot as plt
plt.style.use("seaborn-whitegrid")  # named "seaborn-v0_8-whitegrid" in Matplotlib 3.6 and later
plt.figure(figsize=(8,6))
plt.errorbar(measured_times, measured_distances,yerr=distance_errors, marker='o', linestyle='none', label="measured data")
plt.plot(measured_times, distance(measured_times, numpy.mean(speeds)),label='simple average')
plt.plot(measured_times, distance(measured_times, popt[0]),label='$s=ut$')
plt.plot(measured_times, distance_with_acceleration(measured_times, popt2[0],popt2[1]),label=r'$s=ut+\frac{1}{2}at^2$')
plt.legend(fontsize=14)
plt.xlabel("Time (s)")
plt.ylabel("Distance (m)")
Always plot your data and model fits.
Plotting residuals
Once you have a model prediction, you can check for problems in the model using a residual plot. Plot the difference between the model prediction and the measured data.
%matplotlib inline
import matplotlib.pyplot as plt
plt.style.use("seaborn-whitegrid")  # named "seaborn-v0_8-whitegrid" in Matplotlib 3.6 and later
plt.figure(figsize=(8,6))
plt.plot(measured_times, distance(measured_times, numpy.mean(speeds))-measured_distances,label='simple average',marker='o',ls='')
plt.plot(measured_times, distance(measured_times, popt[0])-measured_distances,label='$s=ut$',marker='s',ls='')
plt.plot(measured_times, distance_with_acceleration(measured_times, popt2[0],popt2[1])-measured_distances,label=r'$s=ut+\frac{1}{2}at^2$',marker='<',ls='')
plt.legend(fontsize=14)
plt.xlabel("Time (s)")
plt.ylabel("Distance (m)")
In this example you can see that the linear model (simple average or s=ut) deviates from the data in a way that depends on the independent variable (time). If the model fit is good we expect to see differences between model and data that are random in magnitude and location (as in the quadratic fit).
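The same pattern shows up in the numbers without a plot. A sketch in plain Python, using the data from above and the fitted constant speed of 13.6645 m/s quoted earlier:

```python
times = [10, 20, 30, 40, 50, 60, 70, 80, 90]            # seconds
distances = [108.2, 220.4, 360.2, 482.8, 630.6,
             793.9, 947.5, 1125.0, 1314.9]              # metres
fitted_speed = 13.6645                                  # m/s, from the s = ut fit

# Residual = model prediction minus measurement, as in the residual plot.
residuals = [fitted_speed * t - s for t, s in zip(times, distances)]

for t, r in zip(times, residuals):
    print("t = {:2d} s  residual = {:7.1f} m".format(t, r))
```

The residuals are positive for the first seven points and negative for the last two rather than scattering around zero, which suggests the model is missing a term.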
curve_fit finds the best estimate of the parameters by minimizing chi squared.
- curve_fit works by finding the combination of parameters that gives the lowest value of a quantity $\chi^2$, defined as
\[\chi^2 = \sum_i \frac{(y_i - f(x_i))^2}{\sigma_{y_i}^2}\]
- The lower the value of $\chi^2$, the closer the model is on average to each measured data point.
- This metric penalizes outliers disproportionately because of the square in the numerator.
- The metric weights the penalty of each point by the inverse of its variance $\sigma_{y_i}^2$, penalizing (genuinely) noisier outliers less than less noisy outliers.
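To see the weighting at work, here is a small worked example with invented numbers: two points miss the model by the same amount, but one was measured ten times more precisely than the other.

```python
# Invented values: both points are 2.0 units away from the model prediction.
residual = 2.0
sigma_precise = 0.5    # small measurement error
sigma_noisy = 5.0      # large measurement error

# Each point contributes (residual / sigma)**2 to chi squared.
term_precise = (residual / sigma_precise) ** 2
term_noisy = (residual / sigma_noisy) ** 2

print("precise point contributes {:.2f}".format(term_precise))
print("noisy point contributes {:.2f}".format(term_noisy))
```

The precisely measured point contributes a hundred times more to $\chi^2$, so the fit works much harder to pass close to it.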
Reduced chi squared is easier to understand and compare between data sets.
- The value of $\chi^2$ for a good model depends on the number of data points and model parameters.
- A related variable \(\chi_r^2 = \frac{\chi^2}{\mathrm{dof}}\) is defined such that the ideal value is 1.0.
- To get the metric we need the number of degrees of freedom (dof), defined as the number of data points (N) minus the number of unknown parameters (m): \(\mathrm{dof} = N - m\).
- High values of $\chi_r^2$ are bad and suggest the model does a poor job of fitting the data.
- Low values ($\ll 1$) are also bad, suggesting the model fits the data too well.
- A low value suggests the model is fitting data better than the average error in the data should allow.
def chi2(y_measure,y_predict,errors):
    """Calculate the chi squared value given a measurement with errors and prediction"""
    return numpy.sum( (y_measure - y_predict)**2 / errors**2 )

def chi2reduced(y_measure, y_predict, errors, number_of_parameters):
    """Calculate the reduced chi squared value given a measurement with errors and prediction,
    and knowing the number of parameters in the model."""
    return chi2(y_measure, y_predict, errors)/(y_measure.size - number_of_parameters)
print("Constant velocity model chi2r=",chi2reduced(measured_distances,
distance(measured_times,popt[0]),
distance_errors,
1))
print("Constant acceleration model chi2r=",chi2reduced(measured_distances,
distance_with_acceleration(measured_times,popt2[0],popt2[1]),
distance_errors,
2))
Constant velocity model chi2r= 109.63987561403505
Constant acceleration model chi2r= 1.1810606671759618
Chi square values should be rounded to a small number of digits, keeping only 1 or 2 significant figures.
print("Constant velocity model chi2r=",round(chi2reduced(measured_distances,
distance(measured_times,popt[0]),
distance_errors,
1),-1))
print("Constant acceleration model chi2r=",round(chi2reduced(measured_distances,
distance_with_acceleration(measured_times,popt2[0],popt2[1]),
distance_errors,
2),1))
Constant velocity model chi2r= 110.0
Constant acceleration model chi2r= 1.2
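The two round calls above choose the number of digits by hand for each value. A possible alternative is a small helper that rounds to a given number of significant figures; `round_sig` is a hypothetical name introduced here, not part of Python or numpy:

```python
import math


def round_sig(value, figures=2):
    """Round value to the given number of significant figures (hypothetical helper)."""
    if value == 0:
        return 0.0
    # The position of the leading digit decides how many decimals to keep.
    exponent = math.floor(math.log10(abs(value)))
    return round(value, figures - 1 - exponent)


print(round_sig(109.63987561403505))
print(round_sig(1.1810606671759618))
```

With two significant figures this gives 110.0 and 1.2, matching the values printed above.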
Exercise 2
Put a print statement inside the model function distance_with_acceleration to print out the parameter values. What is happening to the parameter values?
def distance_with_acceleration_print(time, speed, acceleration):
    """Calculate the distance travelled in a known time, given an initial speed
    and a constant acceleration, printing the parameters on every call."""
    print("speed=",speed, "acceleration=",acceleration)
    return speed * time + 0.5 * acceleration * time**2
popt2, pcov2 = curve_fit(distance_with_acceleration_print, measured_times, measured_distances,
absolute_sigma=True, sigma = distance_errors)
speed= 1.0 acceleration= 1.0
speed= 1.0 acceleration= 1.0
speed= 1.0 acceleration= 1.0
speed= 1.0000000149011612 acceleration= 1.0
speed= 1.0 acceleration= 1.0000000149011612
speed= 10.257717023193093 acceleration= 0.0958943850247661
speed= 10.257717176044988 acceleration= 0.0958943850247661
speed= 10.257717023193093 acceleration= 0.0958943864537038
speed= 10.257717023531002 acceleration= 0.09589438501373611
Non-linear regression
- Mathematically, curve_fit uses least-squares regression to find the best parameter estimate.
- curve_fit works with non-linear models (e.g. $y=at^{(b-1)}+c$) and the error estimates are usually correct.
When fitting non-linear functions, use the p0 keyword to start curve_fit with a good estimate
- p0 is used to provide a first guess of the parameters you are trying to find.
- If you have some idea of a parameter value, use p0 to give curve_fit a better chance of finding the global minimum error for non-linear functions.
- Don’t be too precise so as not to bias the fitting process.
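One way to come up with a rough starting point is to read it off the data. The sketch below is an assumption about how one might do this for a model like $y = at^{(b-1)} + c$ and is not something curve_fit does for you: it takes $c$ from the value at $t=0$ and the power from the slope between two points on a log-log scale. The data values are the ones used in the non-linear example in this section.

```python
import math

# Data from the non-linear example in this section.
t = list(range(10))
y = [-0.173, 2.12, 9.42, 19.69, 37.16, 59.40, 96.59, 119.448, 158.0, 201.9]

# At t = 0 the power-law term vanishes (for b > 1), so y is roughly c.
c_guess = y[0]

# On a log-log scale, y - c is a straight line with slope (b - 1).
slope = math.log((y[9] - c_guess) / (y[3] - c_guess)) / math.log(t[9] / t[3])
b_guess = slope + 1

# With b and c estimated, a follows from any single point.
a_guess = (y[9] - c_guess) / t[9] ** (b_guess - 1)

# Round coarsely: the guess only needs to be in the right neighbourhood.
p0 = (round(a_guess, 1), round(b_guess, 1), round(c_guess, 1))
print(p0)
```

The resulting tuple can be passed to curve_fit through the p0 keyword. Only two data points are used for the slope, so the guess is deliberately crude.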
iteration=0
def nonlinear_function(t, a, b, c,verbose=True):
    global iteration
    if verbose:
        print(iteration, "a=",a, "b=",b, "c=",c)
    iteration = iteration+1
    return a*t**(b-1) + c
#generated "good" data
t=numpy.arange(10)
y=numpy.array([-0.173, 2.12, 9.42, 19.69, 37.16, 59.40, 96.59, 119.448, 158.0,201.9])
sigmaNL = numpy.ones(10)*0.5
First, try fitting the non-linear function with no initial guess
iteration=0
poptNL1, pcovNL1 = curve_fit(nonlinear_function, t, y,
absolute_sigma=True, sigma = sigmaNL)
0 a= 1.0 b= 1.0 c= 1.0
1 a= 1.0 b= 1.0 c= 1.0
2 a= 1.0 b= 1.0 c= 1.0
3 a= 1.0000000149011612 b= 1.0 c= 1.0
4 a= 1.0 b= 1.0000000149011612 c= 1.0
5 a= 1.0 b= 1.0 c= 1.0000000149011612
6 a= 77.19199892187382 b= 1.000001167729559 c= 1.0
7 a= 77.19200007212423 b= 1.000001167729559 c= 1.0
8 a= 77.19199892187382 b= 1.0000011826307376 c= 1.0
9 a= 77.19199892187382 b= 1.000001167729559 c= 1.0000000149011612
...
150 a= 2.5074171106029874 b= 2.9990317544021594 c= -0.9734594072738433
151 a= 2.507417147966414 b= 2.9990317544021594 c= -0.9734594072738433
152 a= 2.5074171106029874 b= 2.999031799091215 c= -0.9734594072738433
153 a= 2.5074171106029874 b= 2.9990317544021594 c= -0.9734593927681677
154 a= 2.5074210685973637 b= 2.999031031902325 c= -0.9734725519528605
Try a good guess for the parameters
iteration = 0
poptNL2, pcovNL2 = curve_fit(nonlinear_function, t, y,
absolute_sigma=True, sigma = sigmaNL, p0=(2.5,3,0))
#I think it's 2.5*t^2 with no offset
0 a= 2.5 b= 3.0 c= 0.0
1 a= 2.5 b= 3.0 c= 0.0
2 a= 2.5 b= 3.0 c= 0.0
3 a= 2.500000037252903 b= 3.0 c= 0.0
4 a= 2.5 b= 3.0000000447034836 c= 0.0
5 a= 2.5 b= 3.0 c= 1.4901161193880158e-08
6 a= 2.507540116653946 b= 2.9990074809599334 c= -0.973917163330992
7 a= 2.5075401540192055 b= 2.9990074809599334 c= -0.973917163330992
8 a= 2.507540116653946 b= 2.9990075256486275 c= -0.973917163330992
9 a= 2.507540116653946 b= 2.9990074809599334 c= -0.9739171488184953
10 a= 2.5074184226341583 b= 2.9990315172382234 c= -0.9734643979860024
11 a= 2.5074184599976044 b= 2.9990315172382234 c= -0.9734643979860024
12 a= 2.5074184226341583 b= 2.9990315619272754 c= -0.9734643979860024
13 a= 2.5074184226341583 b= 2.9990315172382234 c= -0.9734643834802524
14 a= 2.5074209783416057 b= 2.9990310475838156 c= -0.9734720313746336
Now try an unreasonable guess for the b parameter
iteration = 0
poptNL3, pcovNL3 = curve_fit(nonlinear_function, t, y,
absolute_sigma=True, sigma = sigmaNL, p0=(3,-2,0.1))
#I think it's 3/t^3 +0.1
0 a= 3.0 b= -2.0 c= 0.1
1 a= 3.0 b= -2.0 c= 0.1
2 a= 3.0 b= -2.0 c= 0.1
3 a= 3.0000000447034836 b= -2.0 c= 0.1
4 a= 3.0 b= -1.9999999701976776 c= 0.1
5 a= 3.0 b= -2.0 c= 0.10000000149011612
<ipython-input-16-1520d182c2d1>:7: RuntimeWarning: divide by zero encountered in power
return a*t**(b-1) + c
/Users/lee/anaconda3/lib/python3.8/site-packages/scipy/optimize/minpack.py:828: OptimizeWarning: Covariance of the parameters could not be estimated
warnings.warn('Covariance of the parameters could not be estimated',
It’s always important to check the fit
plt.figure(figsize=(8,6))
plt.errorbar(t,
y,
yerr=sigmaNL, marker='o',ls='none',label="Data")
def plot_and_print(popt,ls,label):
    plt.plot(t, nonlinear_function(t,popt[0],popt[1],popt[2]),label=label,ls=ls,lw=3)
plot_and_print(poptNL1,"-","No guess")
plot_and_print(poptNL2,"--","good guess")
plot_and_print(poptNL3,":","Bad guess")
plt.legend()
plt.xlabel("Time")
plt.ylabel("Value")
plt.figure(figsize=(8,6))
def plot_residual(data, popt,marker,label):
    plt.plot(t, nonlinear_function(t,popt[0],popt[1],popt[2],verbose=False)-data,label=label,marker=marker,ls='',lw=3)
plot_residual(y,poptNL1,"o","No guess")
plot_residual(y,poptNL2,"s","good guess")
plot_residual(y,poptNL3,"<","Bad guess")
plt.legend()
plt.setp(plt.gca(),ylabel="Residual",xlabel="Time (s)")
18 a= 2.5074210685973637 b= 2.999031031902325 c= -0.9734725519528605
19 a= 2.5074209783416057 b= 2.9990310475838156 c= -0.9734720313746336
20 a= 3.0 b= -2.0 c= 0.1
Key Points
scipy provides tools and functions to fit models to data.
Use curve_fit to fit linear and non-linear models to experimental data.
Use appropriate errors in the sigma keyword to get a better estimate of parameter errors.
Check the fit using a plot if possible.
Check the $\chi^2$ value to compare the fit against the errors in the measurements.
Non-linear models can be fitted, but may need an initial estimate of the parameters.
Wrap-Up
Overview
Teaching: 0 min
Exercises: 0 min
Questions
What have we learned?
What else is out there and where do I find it?
Objectives
Name and locate scientific Python community sites for software, workshops, and help.
Python supports a large community within and outwith research.
- The Python 3 documentation covers the core language and the standard library.
- PyCon is the largest annual conference for the Python community.
- SciPy is a rich collection of scientific utilities. It is also the name of a series of annual conferences.
- Jupyter is the home of the Jupyter Notebook.
- Pandas is the home of the Pandas data library.
- Stack Overflow’s general Python section can be helpful, as can the sections on NumPy, SciPy, Pandas, and other topics.
Key Points
Python supports a large community within and outwith research.