Running and Quitting

Overview

Teaching: 10 min
Exercises: 0 min

Questions

How can I run Python programs?

Objectives

Launch the Jupyter Notebook, create new notebooks, and exit the Notebook.

Create Markdown cells in a notebook.

Create and run Python cells in a notebook.

Python programs are plain text files.

They have the .py extension to let everyone (including the operating system) know it is a Python program.
- This is convention, not a requirement.
It’s common to write them using a text editor, but we are going to use the Jupyter Notebook to present the tutorials.
The bit of extra setup is well worth it because the Notebook provides code completion and other helpful features.
Notebook files have the extension .ipynb to distinguish them from plain-text Python programs.
- Can export as “pure Python” to run from the command line.

Use the Jupyter Notebook for editing and running Python.

Anaconda provides an automated way to install the Jupyter Notebook, along with all the extra libraries that the Notebook needs to run.
- See the setup instructions for Anaconda installation instructions.
Once you have installed Python and the Jupyter Notebook requirements, open a shell and type:
```
$ jupyter notebook
```
This will start a Jupyter Notebook server and open your default web browser.
The server runs locally on your machine only and does not use an internet connection.
The server sends messages to your browser.
The server does the work and the web browser renders the notebook.
You can type code into the browser and see the result when the web page talks to the server.
This has several advantages:
- You can easily type, edit, and copy and paste blocks of code.
- Tab completion lets you easily access the names of things you are using and learn more about them.
- It allows you to annotate your code with links, different sized text, bullets, etc. to make it more accessible to you and your collaborators.
- It allows you to display figures next to the code that produces them to tell a complete story of the analysis.

Spyder and PyCharm

You can also use a Python Integrated Development Environment (IDE) such as Spyder or PyCharm.

Example Jupyter Notebook

Math with numbers

Addition

2 + 3

5

Multiplication

2 * 2

4

More complex

9.81*10.2**2/2

510.3162

Power

3 ** 2

9

Math with strings

Concatenation

"cat " + "dog"

'cat dog'

Repetition

"cat "*3 + "dog"

'cat cat cat dog'

Subtraction

"cat " - "dog"

...
TypeError: unsupported operand type(s) for -: 'str' and 'str'

Operators

Addition

6 + 7

13

Multiplication

6 * 7

42

Division

6 / 7 

0.8571428571428571

Power

6 ** 7

279936

Modulo

66 % 7  # Modulo (remainder)

3
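Floor division (//, which keeps only the whole-number part of a division) and modulo fit together. A minimal sketch, reusing 66 and 7 from above with illustrative variable names, that checks the identity linking them:

```python
dividend = 66
divisor = 7

quotient = dividend // divisor   # how many whole times 7 fits into 66
remainder = dividend % divisor   # what is left over afterwards

print(quotient, remainder)                          # 9 3
print(quotient * divisor + remainder == dividend)   # True
```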

Logic

Less than

6 < 7

True

Less than or equal to

6 <= 7 

True

Equal to

6 == 7 

False

Greater than or equal to

6 >= 7 

False

Greater than

6 > 7 

False

Logical and

True and False 

False

Logical or

True or False

True

Logical not

not False

True
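Comparisons produce True or False, so their results can be combined with and, or, and not. A small sketch (the variable names are only illustrative):

```python
six_is_smaller = 6 < 7                        # True
seven_is_small = 7 <= 10                      # True
both = six_is_smaller and seven_is_small      # True only if both sides are True
either = (6 > 7) or (6 == 6)                  # True if at least one side is True
flipped = not (6 == 7)                        # not turns False into True

print(both, either, flipped)   # True True True
```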

Key Points

Python programs are plain text files.

Use the Jupyter Notebook for editing and running Python.

The Notebook has Command and Edit modes.

Use the keyboard and mouse to select and edit cells.

The Notebook will turn Markdown into pretty-printed documentation.

Markdown does most of what HTML does.

Variables and Assignment

Overview

Teaching: 10 min
Exercises: 10 min

Questions

How can I store data in programs?

Objectives

Write programs that assign scalar values to variables and perform calculations with those values.

Correctly trace value changes in programs that use scalar assignment.

Use variables to store values.

Variables are names for values.
In Python the = symbol assigns the value on the right to the name on the left.
The variable is created when a value is assigned to it.
Here, Python assigns an age to a variable age and a name in quotes to a variable first_name.

age = 42
first_name = 'Ahmed'

Variable names
- can only contain letters, digits, and underscore _ (typically used to separate words in long variable names)
- cannot start with a digit
Variable names that start with underscores like __alistairs_real_age have a special meaning, so we won’t do that until we understand the convention.

Use `print` to display values.

Python has a built-in function called print that prints things as text.
Call the function (i.e., tell Python to run it) by using its name.
Provide values to the function (i.e., the things to print) in parentheses.
To add a string to the printout, wrap the string in single or double quotes.
The values passed to the function are called ‘arguments’.

print(first_name, 'is', age, 'years old')

Ahmed is 42 years old

print automatically puts a single space between items to separate them, and wraps around to a new line at the end.
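One way to see exactly what print produces is to send its output into an in-memory text buffer instead of the screen. This sketch assumes the io module from the standard library and print's optional file, sep, and end keyword arguments, none of which this lesson otherwise uses:

```python
import io

# Capture what print would normally display.
buffer = io.StringIO()
print('Ahmed', 'is', 42, 'years old', file=buffer)
captured = buffer.getvalue()
print(repr(captured))   # 'Ahmed is 42 years old\n': spaces between items, newline at the end

# sep replaces the single space, end replaces the final newline.
compact = io.StringIO()
print('a', 'b', 'c', sep='', end='', file=compact)
print(repr(compact.getvalue()))   # 'abc'
```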

Variables must be created before they are used.

If a variable doesn’t exist yet, or if the name has been mis-spelled, Python reports an error.
- Unlike some languages, which “guess” a default value.

print(last_name)

---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-1-c1fbb4e96102> in <module>()
----> 1 print(last_name)

NameError: name 'last_name' is not defined

The last line of an error message is usually the most informative.
We will look at error messages in detail later.

Variables can be used in calculations.

We can use variables in calculations just as if they were values.
- Remember, we assigned 42 to age a few lines ago.

age = age + 3
print('Age in three years:', age)

Age in three years: 45

Use an index to get a single character from a string.

The characters (individual letters, numbers, and so on) in a string are ordered. For example, the string ‘AB’ is not the same as ‘BA’. Because of this ordering, we can treat the string as a list of characters.
Each position in the string (first, second, etc.) is given a number. This number is called an index or sometimes a subscript.
Indices are numbered from 0.
Use the position’s index in square brackets to get the character at that position.

atom_name = 'helium'
print(atom_name[0])

h

Use a slice to get a substring.

A part of a string is called a substring. A substring can be as short as a single character.
An item in a list is called an element. Whenever we treat a string as if it were a list, the string’s elements are its individual characters.
A slice is a part of a string (or, more generally, any list-like thing).
We take a slice by using [start:stop], where start is replaced with the index of the first element we want and stop is replaced with the index of the element just after the last element we want.
Mathematically, you might say that a slice selects [start:stop).
The difference between stop and start is the slice’s length.
Taking a slice does not change the contents of the original string. Instead, the slice is a copy of part of the original string.

atom_name = 'sodium'
print(atom_name[0:3])

sod
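A short sketch checking two claims above for atom_name = 'sodium': the slice's length is stop - start (when both indices lie inside the string), and slicing leaves the original string unchanged. The names start, stop, and first_part are only illustrative.

```python
atom_name = 'sodium'
start = 0
stop = 3

first_part = atom_name[start:stop]       # characters at indices 0, 1 and 2
print(first_part)                        # sod
print(len(first_part) == stop - start)   # True
print(atom_name)                         # sodium: the original string is untouched
```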

Use the built-in function `len` to find the length of a string.

print(len('helium'))

6

Nested function calls are evaluated from the inside out, just like in mathematics: here len runs first, and print then displays its result.

Python is case-sensitive.

Python thinks that upper- and lower-case letters are different, so Name and name are different variables.
There are conventions for using upper-case letters at the start of variable names, so we will use lower-case letters for now.

Use meaningful variable names.

Python doesn’t care what you call variables as long as they obey the rules (alphanumeric characters and the underscore).

flabadab = 42
ewr_422_yY = 'Ahmed'
print(ewr_422_yY, 'is', flabadab, 'years old')

Ahmed is 42 years old

Use meaningful variable names to help other people understand what the program does.
The most important “other person” is your future self.

Predicting Values

What is the final value of position in the program below? (Try to predict the value without running the program, then check your prediction.)
initial = 'left'
position = initial
initial = 'right'
Solution
'left'
The initial variable is assigned the value ‘left’. In the second line, the position variable also receives the string value ‘left’. In the third line, the initial variable is given the value ‘right’, but the position variable retains its string value of ‘left’.

Challenge

If you assign a = 123, what happens if you try to get the second digit of a via a[1]?
Solution

Numbers are not stored in the written representation, so they can’t be treated like strings.
a = 123
print(a[1])
TypeError: 'int' object is not subscriptable
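If you really do want the second digit, one option is to convert the number to its written (string) form first with str, which the next episode covers in more detail. A minimal sketch:

```python
a = 123
digits = str(a)              # '123', the written form of the number
second_digit = digits[1]     # indexing now works because digits is a string
print(second_digit)          # 2 (a one-character string, not a number)
print(int(second_digit) + 1) # 3: convert back to int before doing arithmetic
```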

Choosing a Name

Which is a better variable name, m, min, or minutes? Why? Hint: think about which code you would rather inherit from someone who is leaving the lab:

ts = m * 60 + s

tot_sec = min * 60 + sec

total_seconds = minutes * 60 + seconds

Solution

minutes is better because min might mean something like “minimum” (and actually does in Python, but we haven’t seen that yet).

Slicing

What does the following program print?
atom_name = 'carbon'
print('atom_name[1:3] is:', atom_name[1:3])
atom_name[1:3] is: ar
What does thing[low:high] do?

What does thing[low:] (without a value after the colon) do?

What does thing[:high] (without a value before the colon) do?

What does thing[:] (just a colon) do?

What does thing[number:negative-number] do?

What happens when you choose a high value which is out of range? (i.e., try atom_name[0:15])
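One way to check your answers is to try each form on atom_name and compare the output with your predictions. The particular index values below are just examples:

```python
atom_name = 'carbon'

print(atom_name[1:3])    # thing[low:high]
print(atom_name[2:])     # thing[low:]
print(atom_name[:2])     # thing[:high]
print(atom_name[:])      # thing[:]
print(atom_name[1:-1])   # thing[number:negative-number]
print(atom_name[0:15])   # a high value that is out of range
```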

Key Points

Use variables to store values.

Use print to display values.

Variables persist between cells.

Variables must be created before they are used.

Variables can be used in calculations.

Use an index to get a single character from a string.

Use a slice to get a substring.

Use the built-in function len to find the length of a string.

Python is case-sensitive.

Use meaningful variable names.

Data Types and Type Conversion

Overview

Teaching: 5 min
Exercises: 5 min

Questions

What kinds of data do programs store?

How can I convert one type to another?

Objectives

Explain key differences between integers and floating point numbers.

Explain key differences between numbers and character strings.

Use built-in functions to convert between integers, floating point numbers, and strings.

Every value has a type.

Every value in a program has a specific type.
Integer (int): represents positive or negative whole numbers like 3 or -512.
Floating point number (float): represents real numbers like 3.14159 or -2.5.
Character string (usually called “string”, str): text.
- Written in either single quotes or double quotes (as long as they match).
- The quote marks aren’t printed when the string is displayed.

Use the built-in function `type` to find the type of a value.

Use the built-in function type to find out what type a value has.
Works on variables as well.
- But remember: the value has the type — the variable is just a label.

print(type(52))

<class 'int'>

fitness = 'average'
print(type(fitness))

<class 'str'>

Types control what operations (or methods) can be performed on a given value.

A value’s type determines what the program can do to it.

print(5 - 3)

2

print('hello' - 'h')

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-2-67f5626a1e07> in <module>()
----> 1 print('hello' - 'h')

TypeError: unsupported operand type(s) for -: 'str' and 'str'

You can use the “+” and “*” operators on strings.

“Adding” character strings concatenates them.

full_name = 'Ahmed' + ' ' + 'Walsh'
print(full_name)

Ahmed Walsh

Multiplying a character string by an integer N creates a new string that consists of that character string repeated N times.
- Since multiplication is repeated addition.

separator = '=' * 10
print(separator)

==========

Strings have a length (but numbers don’t).

The built-in function len counts the number of characters in a string.

print(len(full_name))

11

But numbers don’t have a length (not even zero).

print(len(52))

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-3-f769e8e8097d> in <module>()
----> 1 print(len(52))

TypeError: object of type 'int' has no len()

Must convert numbers to strings or vice versa when operating on them.

Cannot add numbers and strings.

print(1 + '2')

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-4-fe4f54a023c6> in <module>()
----> 1 print(1 + '2')

TypeError: unsupported operand type(s) for +: 'int' and 'str'

Not allowed because it’s ambiguous: should 1 + '2' be 3 or '12'?
Some types can be converted to other types by using the type name as a function.

print(1 + int('2'))
print(str(1) + '2')

3
12

Can mix integers and floats freely in operations.

Integers and floating-point numbers can be mixed in arithmetic.
- Python 3 automatically converts integers to floats as needed. (Integer division in Python 2 will return an integer, the floor of the division.)

print('half is', 1 / 2.0)
print('three squared is', 3.0 ** 2)

half is 0.5
three squared is 9.0
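A quick way to see these conversion rules in action is to ask type about a few results. A sketch (the variable names are illustrative):

```python
int_sum = 1 + 2        # int + int stays an int
mixed_sum = 1 + 2.0    # the int is converted to a float first
half = 1 / 2           # / always produces a float in Python 3, even for two ints

print(type(int_sum))     # <class 'int'>
print(type(mixed_sum))   # <class 'float'>
print(type(half), half)  # <class 'float'> 0.5
```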

Variables only change value when something is assigned to them.

If we make one cell in a spreadsheet depend on another, and update the latter, the former updates automatically.
This does not happen in programming languages.

first = 1
second = 5 * first
first = 2
print('first is', first, 'and second is', second)

first is 2 and second is 5

The computer reads the value of first when doing the multiplication, creates a new value, and assigns it to second.
After that, second does not remember where it came from.

Fractions

What type of value is 3.4? How can you find out?
Solution

It is a floating-point number (often abbreviated “float”).
print(type(3.4))
<class 'float'>

Automatic Type Conversion

What type of value is 3.25 + 4?
Solution

It is a float: integers are automatically converted to floats as necessary.
result = 3.25 + 4
print(result, 'is', type(result))
7.25 is <class 'float'>

Choose a Type

What type of value (integer, floating point number, or character string) would you use to represent each of the following? Try to come up with more than one good answer for each problem. For example, in #1, when would counting days with a floating point variable make more sense than using an integer?

1. Number of days since the start of the year.
2. Time elapsed from the start of the year until now in days.
3. Serial number of a piece of lab equipment.
4. A lab specimen’s age.
5. Current population of a city.
6. Average population of a city over time.

Solution

The answers to the questions are:

1. Integer, since the number of days would lie between 1 and 365 (366 in a leap year).
2. Floating point, since fractional days are required.
3. Character string if the serial number contains letters and numbers, otherwise integer if the serial number consists only of numerals.
4. This will vary! How do you define a specimen’s age? Whole days since collection (integer)? Date and time (string)?
5. Choose floating point to represent population as large aggregates (e.g., millions), or integer to represent population in units of individuals.
6. Floating point number, since an average is likely to have a fractional part.

Division Types

In Python 3, the // operator performs integer (whole-number) floor division, the / operator performs floating-point division, and the ‘%’ (or modulo) operator calculates and returns the remainder from integer division:
print('5 // 3:', 5//3)
print('5 / 3:', 5/3)
print('5 % 3:', 5%3)
5 // 3: 1
5 / 3: 1.6666666666666667
5 % 3: 2
However, in Python 2 (and some other languages), the / operator between two integers performs floor division (like //). To perform float division, we have to convert one of the integers to a float. In Python 2, these expressions give:
5 // 3:       1
5 / 3:        1
5 / float(3): 1.666...
float(5) / 3: 1.666...
float(5 / 3): 1.0
5 % 3:        2
If num_subjects is the number of subjects taking part in a study, and num_per_survey is the number that can take part in a single survey, write an expression that calculates the number of surveys needed to reach everyone once.
Solution

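We want the minimum number of surveys that reaches everyone once, which is the rounded-up value of num_subjects / num_per_survey. Integer division with // rounds down, so subtract 1 from num_subjects before dividing and add 1 afterwards. Simply adding 1 to num_subjects // num_per_survey would count one survey too many whenever num_subjects is an exact multiple of num_per_survey.
num_subjects = 600
num_per_survey = 42
num_surveys = (num_subjects - 1) // num_per_survey + 1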

print(num_subjects, 'subjects,', num_per_survey, 'per survey:', num_surveys)
600 subjects, 42 per survey: 15
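The difference between rounding up and "divide with // then add 1" only shows when num_subjects is an exact multiple of num_per_survey. A sketch with a hypothetical 40 subjects per survey (and at least one subject), using math.ceil on ordinary division as a cross-check; that cross-check is fine for numbers this size, though float division can lose precision for very large integers:

```python
import math

num_subjects = 600
num_per_survey = 40   # hypothetical value: 600 is exactly 15 surveys' worth

naive = num_subjects // num_per_survey + 1              # 16: one survey too many
rounded_up = (num_subjects - 1) // num_per_survey + 1   # 15
cross_check = math.ceil(num_subjects / num_per_survey)  # 15

print(naive, rounded_up, cross_check)   # 16 15 15
```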

Strings to Numbers

Where reasonable, float() will convert a string to a floating point number, and int() will convert a floating point number to an integer:
print("string to float:", float("3.4"))
print("float to int:", int(3.4))
string to float: 3.4
float to int: 3
If the conversion doesn’t make sense, however, Python reports an error:
print("string to float:", float("Hello world!"))
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-5-df3b790bf0a2> in <module>()
----> 1 print("string to float:", float("Hello world!"))

ValueError: could not convert string to float: 'Hello world!'
Given this information, what do you expect the following program to do?

What does it actually do?

Why do you think it does that?
print("fractional string to int:", int("3.4"))
Solution

What do you expect this program to do? It would not be so unreasonable to expect the Python 3 int function to convert the string “3.4” to 3.4 and then make an additional type conversion to 3. After all, Python 3 performs a lot of other magic; isn’t that part of its charm?

However, Python 3 raises an error. Why? Possibly to be consistent: if you want Python to perform two consecutive type conversions, you must write both of them explicitly in code.
int("3.4")
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-2-ec6729dfccdc> in <module>()
----> 1 int("3.4")
ValueError: invalid literal for int() with base 10: '3.4'

int(float("3.4"))
3

Arithmetic with Different Types

Which of the following will print 2.0? Note: there may be more than one right answer.
first = 1.0
second = "1"
third = "1.1"
1. first + float(second)
2. float(second) + float(third)
3. first + int(third)
4. first + int(float(third))
5. int(first) + int(float(third))
6. 2.0 * second

Solution

Answer: 1 and 4
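You can check the answer by running the expressions that don't raise errors. Options 3 and 6 are left as comments because they stop the program:

```python
first = 1.0
second = "1"
third = "1.1"

print(first + float(second))           # option 1: 2.0
print(float(second) + float(third))    # option 2: 2.1
print(first + int(float(third)))       # option 4: 2.0
print(int(first) + int(float(third)))  # option 5: 2 (an int, so no ".0")

# Option 3, first + int(third), raises ValueError: int() can't parse "1.1".
# Option 6, 2.0 * second, raises TypeError: a string can't be multiplied by a float.
```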

Complex Numbers

Python provides complex numbers, which are written as 1.0+2.0j. If val is a complex number, its real and imaginary parts can be accessed using dot notation as val.real and val.imag.

Why do you think Python uses j instead of i for the imaginary part?

What do you expect 1+2j + 3 to produce?

What do you expect 4j to be? What about 4 j or 4 + j?

Solution

Standard mathematics treatments typically use i to denote an imaginary number. However, Python follows an early convention from electrical engineering, where i usually denotes electric current, and changing it now would be technically expensive. Stack Overflow provides additional explanation and discussion.

4+2j

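4j is a valid imaginary number; 4 j is a syntax error; and 4 + j depends on the value of a variable named j (Python raises a NameError if j has not been defined).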
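A small sketch that checks the answers above, using an illustrative variable named val:

```python
val = 1.0 + 2.0j
print(val.real, val.imag)   # 1.0 2.0
print(type(4j))             # <class 'complex'>: 4j is a single literal

total = 1 + 2j + 3
print(total)                # (4+2j)

# 4 + j would only work if a variable named j had been defined earlier;
# otherwise Python raises a NameError.
```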

Key Points

Every value has a type.

Use the built-in function type to find the type of a value.

Types control what operations can be done on values.

Strings can be added and multiplied.

Strings have a length (but numbers don’t).

Must convert numbers to strings or vice versa when operating on them.

Can mix integers and floats freely in operations.

Variables only change value when something is assigned to them.

Built-in Functions and Help

Overview

Teaching: 15 min
Exercises: 10 min

Questions

How can I use built-in functions?

How can I find out what they do?

What kind of errors can occur in programs?

Objectives

Explain the purpose of functions.

Correctly call built-in Python functions.

Correctly nest calls to built-in functions.

Use help to display documentation for built-in functions.

Correctly describe situations in which SyntaxError and NameError occur.

Use comments to add documentation to programs.

# This sentence isn't executed by Python.
adjustment = 0.5   # Neither is this - anything after '#' is ignored.

A function may take zero or more arguments.

We have seen some functions already — now let’s take a closer look.
An argument is a value passed into a function.
len takes exactly one.
int, str, and float create a new value from an existing one.
print takes zero or more.
print with no arguments prints a blank line.
- Must always use parentheses, even if they’re empty, so that Python knows a function is being called.

print('before')
print()
print('after')

before

after

Commonly-used built-in functions include `max`, `min`, and `round`.

Use max to find the largest value of one or more values.
Use min to find the smallest.
Both work on character strings as well as numbers.
- “Larger” and “smaller” compare characters in the order 0-9, then A-Z, then a-z, so digits come before upper-case letters, which come before lower-case letters.

print(max(1, 2, 3))
print(min('a', 'A', '0'))

3
0
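Python compares single characters by their numeric codes. The built-in function ord, which this lesson doesn't otherwise use, shows those codes and why '0' is the smallest of the three characters above:

```python
print(ord('0'), ord('A'), ord('a'))   # 48 65 97
print(min('a', 'A', '0'))             # 0: the smallest code
print(max('a', 'A', '0'))             # a: the largest code
```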

Functions may only work for certain (combinations of) arguments.

max and min must be given at least one argument.
- “Largest of the empty set” is a meaningless question.
And they must be given things that can meaningfully be compared.

print(max(1, 'a'))

TypeError: unorderable types: str() > int()

Functions may have default values for some arguments.

round will round off a floating-point number.
By default, rounds to zero decimal places.

round(3.712)

4

We can specify the number of decimal places we want.

round(3.712, 1)

3.7
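The help text for round (below) says it returns an int when called with one argument, otherwise the same type as the number, and that ndigits may be negative. A small sketch checking those statements:

```python
print(round(3.712))         # 4
print(type(round(3.712)))   # <class 'int'>: one argument gives an int
print(round(3.712, 1))      # 3.7: with ndigits, a float stays a float
print(round(1234, -2))      # 1200: negative ndigits rounds to hundreds here
```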

Use the built-in function `help` to get help for a function.

Every built-in function has online documentation.

help(round)

Help on built-in function round in module builtins:

round(...)
    round(number[, ndigits]) -> number

    Round a number to a given precision in decimal digits (default 0 digits).
    This returns an int when called with one argument, otherwise the
    same type as the number. ndigits may be negative.

Python reports a syntax error when it can’t understand the source of a program.

Won’t even try to run the program if it can’t be parsed.

# Forgot to close the quote marks around the string.
name = 'Feng

SyntaxError: EOL while scanning string literal

# An extra '=' in the assignment.
age = = 52

SyntaxError: invalid syntax

Look more closely at the error message:

print("hello world"

  File "<ipython-input-6-d1cc229bf815>", line 1
    print ("hello world"
                        ^
SyntaxError: unexpected EOF while parsing

The message indicates a problem on the first line of the input (“line 1”).
- In this case the “ipython-input” section of the file name tells us that we are working with input into IPython, the Python interpreter used by the Jupyter Notebook.
The -6- part of the filename indicates that the error occurred in cell 6 of our Notebook.
Next is the problematic line of code, indicating the problem with a ^ pointer.

Python reports a runtime error when something goes wrong while a program is executing.

age = 53
remaining = 100 - aege # mis-spelled 'age'

NameError: name 'aege' is not defined

Fix syntax errors by reading the source and runtime errors by tracing execution.
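To see when each kind of error appears, this sketch feeds the examples above to Python as source strings. It uses compile, exec, and try/except, none of which this lesson covers, purely for illustration:

```python
# compile() only parses source code; it never runs it.
broken_source = "print('this never runs')\nage = = 52\n"
try:
    compile(broken_source, "<example>", "exec")
except SyntaxError as error:
    print("SyntaxError found before running anything:", error.msg)

# This source parses fine, so the error only appears when line 2 executes.
faulty_source = "age = 53\nremaining = 100 - aege\n"
compiled = compile(faulty_source, "<example>", "exec")
namespace = {}
try:
    exec(compiled, namespace)
except NameError as error:
    print("NameError while running:", error)

print(namespace["age"])   # 53: line 1 ran before line 2 failed
```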

The Jupyter Notebook has two ways to get help.

Place the cursor inside the parentheses of the function, hold down shift, and press tab.
Or type a function name with a question mark after it.

Every function returns something.

Every function call produces some result.
If the function doesn’t have a useful result to return, it usually returns the special value None.

result = print('example')
print('result of print is', result)

example
result of print is None

What Happens When

Explain in simple terms the order of operations in the following program: when does the addition happen, when does the subtraction happen, when is each function called, etc.

What is the final value of radiance?
radiance = 1.0
radiance = max(2.1, 2.0 + min(radiance, 1.1 * radiance - 0.5))
Solution

1. 1.1 * radiance = 1.1
2. 1.1 - 0.5 = 0.6
3. min(radiance, 0.6) = 0.6
4. 2.0 + 0.6 = 2.6
5. max(2.1, 2.6) = 2.6

At the end, radiance = 2.6
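The same inside-out order can be traced by giving each intermediate result a name (the names are illustrative). Floating-point rounding may make the printed value differ very slightly from 2.6:

```python
import math

radiance = 1.0
scaled = 1.1 * radiance            # innermost arithmetic first: 1.1
reduced = scaled - 0.5             # about 0.6
smaller = min(radiance, reduced)   # min runs next: about 0.6
total = 2.0 + smaller              # then the addition: about 2.6
radiance = max(2.1, total)         # max runs last, then the assignment

print(radiance)                      # may show a tiny rounding error
print(math.isclose(radiance, 2.6))   # True
```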

Spot the Difference

Predict what each of the print statements in the program below will print.

Does max(len(rich), poor) run or produce an error message? If it runs, does its result make any sense?
easy_string = "abc"
print(max(easy_string))
rich = "gold"
poor = "tin"
print(max(rich, poor))
print(max(len(rich), len(poor)))
Solution
print(max(easy_string))
c
print(max(rich, poor))
tin
print(max(len(rich), len(poor)))
4
max(len(rich), poor) raises a TypeError: it is trying to run max(4, 'tin'), and you can’t compare a string and an integer.

Why Not?

Why don’t max and min return None when they are given no arguments?

Solution

max and min raise a TypeError in this case because the correct number of arguments was not supplied. If they just returned None, the error would be much harder to trace: the None would likely be stored in a variable and used later in the program, where it would probably cause a runtime error far from the real mistake.

Last Character of a String

If Python starts counting from zero, and len returns the number of characters in a string, what index expression will get the last character in the string name? (Note: we will see a simpler way to do this in a later episode.)

Solution

name[len(name) - 1]
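A quick check of this expression with an example string:

```python
name = 'helium'
last_index = len(name) - 1   # 6 characters, so valid indices are 0 to 5
print(last_index)            # 5
print(name[last_index])      # m
```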

Key Points

Use comments to add documentation to programs.

A function may take zero or more arguments.

Commonly-used built-in functions include max, min, and round.

Functions may only work for certain (combinations of) arguments.

Functions may have default values for some arguments.

Use the built-in function help to get help for a function.

The Jupyter Notebook has two ways to get help.

Every function returns something.

Python reports a syntax error when it can’t understand the source of a program.

Python reports a runtime error when something goes wrong while a program is executing.

Fix syntax errors by reading the source code, and runtime errors by tracing the program’s execution.

Libraries

Overview

Teaching: 10 min
Exercises: 10 min

Questions

How can I use software that other people have written?

How can I find out what that software does?

Objectives

Explain what software libraries are and why programmers create and use them.

Write programs that import and use libraries from Python’s standard library.

Find and read documentation for standard libraries interactively (in the interpreter) and online.

Most of the power of a programming language is in its libraries.

A library is a collection of files (called modules) that contains functions for use by other programs.
- May also contain data values (e.g., numerical constants) and other things.
- Library’s contents are supposed to be related, but there’s no way to enforce that.
The Python standard library is an extensive suite of modules that comes with Python itself.
Many additional libraries are available from PyPI (the Python Package Index).
We will see later how to write new libraries.

Libraries and modules

A library is a collection of modules, but the terms are often used interchangeably, especially since many libraries only consist of a single module, so don’t worry if you mix them.

A program must import a library module before using it.

Use import to load a library module into a program’s memory.
Then refer to things from the module as module_name.thing_name.
- Python uses . to mean “part of”.
Using math, one of the modules in the standard library:

import math

print('pi is', math.pi)
print('cos(pi) is', math.cos(math.pi))

pi is 3.141592653589793
cos(pi) is -1.0

Have to refer to each item with the module’s name.
- math.cos(pi) won’t work: the reference to pi doesn’t somehow “inherit” the function’s reference to math.

Use `help` to learn about the contents of a library module.

Works just like help for a function.

help(math)

Help on module math:

NAME
    math

MODULE REFERENCE
    http://docs.python.org/3.5/library/math

    The following documentation is automatically generated from the Python
    source files.  It may be incomplete, incorrect or include features that
    are considered implementation detail and may vary between Python
    implementations.  When in doubt, consult the module reference at the
    location listed above.

DESCRIPTION
    This module is always available.  It provides access to the
    mathematical functions defined by the C standard.

FUNCTIONS
    acos(...)
        acos(x)

        Return the arc cosine (measured in radians) of x.
⋮ ⋮ ⋮

Import specific items from a library module to shorten programs.

Use from ... import ... to load only specific items from a library module.
Then refer to them directly without library name as prefix.

from math import cos, pi

print('cos(pi) is', cos(pi))

cos(pi) is -1.0

Create an alias for a library module when importing it to shorten programs.

Use import ... as ... to give a library a short alias while importing it.
Then refer to items in the library using that shortened name.

import math as m

print('cos(pi) is', m.cos(m.pi))

cos(pi) is -1.0

Commonly used for libraries that are frequently used or have long names.
- E.g., matplotlib plotting library is often aliased as mpl.
But can make programs harder to understand, since readers must learn your program’s aliases.
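The three import styles differ only in how you name things, not in what you get. A sketch showing that they all reach the same function and constant:

```python
import math
import math as m
from math import cos, pi

# All three styles reach the same function and the same constant.
print(math.cos is m.cos is cos)   # True
print(math.pi == m.pi == pi)      # True
print(cos(pi))                    # -1.0
```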

Exploring the Math Module

What function from the math module can you use to calculate a square root without using sqrt?

Since the library contains this function, why does sqrt exist?

Solution

Using help(math) we see that we’ve got pow(x,y) in addition to sqrt(x), so we could use pow(x, 0.5) to find a square root.

The sqrt(x) function is arguably more readable than pow(x, 0.5) when implementing equations. Readability is a cornerstone of good programming, so it makes sense to provide a special function for this specific common case.

Also, the design of Python’s math library has its origin in the C standard, which includes both sqrt(x) and pow(x,y), so a little bit of the history of programming is showing in Python’s function names.
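For comparison, here are three ways to take a square root, the last using the ** power operator from the Operators section. The value 16.0 is chosen only so that every result is exact:

```python
import math

x = 16.0
print(math.sqrt(x))       # 4.0
print(math.pow(x, 0.5))   # 4.0
print(x ** 0.5)           # 4.0: the built-in ** operator, no import needed
```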

Locating the Right Module

You want to select a random character from a string:
bases = 'ACTTGCTTGAC'
Which standard library module could help you?

Which function would you select from that module? Are there alternatives?

Try to write a program that uses the function.
Solution

The random module seems like it could help you.

The string has 11 characters, each having a positional index from 0 to 10. You could use the random.randrange function to get a random integer between 0 and 10 (or random.randint, which includes its upper bound, so you would call random.randint(0, 10)), and then pick out the character at that position:
from random import randrange

random_index = randrange(len(bases))
print(bases[random_index])
or more compactly:
from random import randrange

print(bases[randrange(len(bases))])
Perhaps you found the random.sample function? It allows for slightly less typing:
from random import sample

print(sample(bases, 1)[0])
Note that this function returns a list of values. We will learn about lists in episode 11.

There are also other functions you could use, but with more convoluted code as a result.
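One more alternative from the same module is random.choice, which picks an element of a sequence directly, so no index is needed. A sketch; the call to random.seed is only there so that repeated runs pick the same character:

```python
import random

bases = 'ACTTGCTTGAC'
random.seed(42)               # fixed seed so repeated runs pick the same character
base = random.choice(bases)   # one randomly chosen character from the string
print(base)
print(base in bases)          # True
```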

Jigsaw Puzzle (Parsons Problem) Programming Example

Rearrange the following statements so that a random DNA base is printed and its index in the string. Not all statements may be needed. Feel free to use/add intermediate variables.
bases="ACTTGCTTGAC"
import math
import random
___ = random.randrange(n_bases)
___ = len(bases)
print("random base ", bases[___], "base index", ___)
Solution
import random
bases = "ACTTGCTTGAC"
n_bases = len(bases)
idx = random.randrange(n_bases)
print("random base", bases[idx], "base index", idx)

When Is Help Available?

When a colleague of yours types help(math), Python reports an error:
NameError: name 'math' is not defined
What has your colleague forgotten to do?

Solution

Importing the math module (import math)

Importing With Aliases

Fill in the blanks so that the program below prints 90.0.

Rewrite the program so that it uses import without as.

Which form do you find easier to read?
import math as m
angle = ____.degrees(____.pi / 2)
print(____)
Solution
import math as m
angle = m.degrees(m.pi / 2)
print(angle)
can be written as
import math
angle = math.degrees(math.pi / 2)
print(angle)
Since you just wrote the code and are familiar with it, you might actually find the first version easier to read. But when trying to read a huge piece of code written by someone else, or when getting back to your own huge piece of code after several months, non-abbreviated names are often easier, except where there are clear abbreviation conventions.

There Are Many Ways To Import Libraries!

Match the following print statements with the appropriate library calls.

Print commands:

print("sin(pi/2) =",sin(pi/2))

print("sin(pi/2) =",m.sin(m.pi/2))

print("sin(pi/2) =",math.sin(math.pi/2))

Library calls:

from math import sin,pi

import math

import math as m

from math import *

Solution

Print command 1 matches library calls 1 and 4. To refer to sin and pi directly, without the library name as a prefix, you need the from ... import ... statement. Library call 1 imports only sin and pi, whereas library call 4 imports all public names (functions and constants) from the math module.

Print command 2 matches library call 3. Here sin and pi are referred to with the shortened library name m instead of math, and library call 3 creates exactly that alias using the import ... as ... syntax.

Print command 3 matches library call 2. Here sin and pi are referred to with the regular library name math, so the plain import ... call suffices.

Importing Specific Items

Fill in the blanks so that the program below prints 90.0.

Do you find this version easier to read than preceding ones?

Why wouldn’t programmers always use this form of import?
____ math import ____, ____
angle = degrees(pi / 2)
print(angle)
Solution
from math import degrees, pi
angle = degrees(pi / 2)
print(angle)
Most likely you find this version easier to read since it’s less dense. The main reason not to use this form of import is to avoid name clashes. For instance, you wouldn’t import degrees this way if you also wanted to use the name degrees for a variable or function of your own, or if you were also importing a function named degrees from another library.
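To see such a clash in action, here is a short sketch: once our own variable reuses the name degrees, the function imported from math is no longer reachable under that name.

```python
from math import degrees, pi

print(degrees(pi / 2))  # 90.0: degrees is still the function from math

# Assigning to the same name replaces the imported function
degrees = 45
try:
    degrees(pi / 2)
except TypeError as err:
    print('TypeError:', err)  # 'int' object is not callable
```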

Reading Error Messages

Read the code below and try to identify what the errors are without running it.

Run the code, and read the error message. What type of error is it?
from math import log
log(0)
Solution

The logarithm of x is only defined for x > 0, so 0 is outside the domain of the function.

You get an error of type “ValueError”, indicating that the function received an inappropriate argument value. The additional message “math domain error” makes it clearer what the problem is.

Key Points

Most of the power of a programming language is in its libraries.

A program must import a library module in order to use it.

Use help to learn about the contents of a library module.

Import specific items from a library to shorten programs.

Create an alias for a library when importing it to shorten programs.

Numpy and Scipy

Overview

Teaching: 40 min
Exercises: 10 min

Questions

How do I deal with tabular scientific data?

Objectives

Import the numpy library.

Understand the NDArray object.

Get some basic information about numpy and scipy objects and methods.

Numpy is the main Python library for scientific computation

Numpy provides a new data type, the array
arrays are multi-dimensional collections of data of the same intrinsic type (int, float, etc.)

Import numpy before using it

numpy is not built in, but is often installed by default.
use import numpy to import the entire package.
use from numpy import ... to import some functions.
use import numpy as np to use the most common alias.

import numpy as np
import numpy
from numpy import cos

print(numpy.cos, np.cos, cos)

<ufunc 'cos'> <ufunc 'cos'> <ufunc 'cos'>

Use `numpy.zeros` to create arrays filled with zeros

f10 = numpy.zeros(10)
i10 = numpy.zeros(10, dtype=int)
print("default array of zeros: ", f10)
print("integer array of zeros: ", i10)

default array of zeros:  [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
integer array of zeros:  [0 0 0 0 0 0 0 0 0 0]

Use `numpy.ones` to create an array of ones.

print("Using numpy.ones    : ", numpy.ones(10))
print("is the same thing as: ", numpy.zeros(10)+1)

Using numpy.ones    :  [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
is the same thing as:  [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

Use `numpy.arange` to generate sequences of numbers

arange takes one to three arguments. By default, arange generates numbers starting from 0 with a step of 1.
arange(N) generates numbers from 0..N-1
arange(M,N) generates numbers from M..N-1
arange(M,N,P) generates numbers from M..N-1, stepping by P (that is, every Pth number).
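These start/stop/step rules match those of Python's built-in range. A small sketch comparing the two (the variable names are just for illustration):

```python
import numpy

# numpy.arange uses the same start/stop/step rules as the built-in range
stepped = numpy.arange(2, 11, 3)
same_as_range = list(range(2, 11, 3))
print(stepped, same_as_range)    # [2 5 8] [2, 5, 8]

# Unlike range, arange also accepts floating-point arguments
quarters = numpy.arange(0.0, 1.0, 0.25)
print(quarters)                  # the values 0.0, 0.25, 0.5 and 0.75
```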

numpy.arange(10)

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

generate an array of numbers from 1 up to (but not including) 10

numpy.arange(1,10)

array([1, 2, 3, 4, 5, 6, 7, 8, 9])

generate an array of odd numbers from 1 to 10

numpy.arange(1,10,2)

array([1, 3, 5, 7, 9])

incorrectly generate an array of odd numbers from 1 to 10, backwards

numpy.arange(1,10,-2)

array([], dtype=int64)

generate an array of even numbers from 10 to 2, backwards

numpy.arange(10,1,-2)

array([10,  8,  6,  4,  2])

Numpy arrays have a `shape`

Numpy arrays have a shape attribute that gives the size of each dimension
The reshape method returns the same data arranged in a new shape (the original array keeps its shape)

a = numpy.arange(10)
print("a's shape is ",a.shape)

b=a.reshape(5,2)
print("b's shape is ",b.shape)

a's shape is  (10,)
b's shape is  (5, 2)
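reshape leaves the original array alone, and it only succeeds if the new shape holds exactly the same number of elements. A short sketch, which also passes -1 to let numpy work out one dimension by itself:

```python
import numpy

a = numpy.arange(10)
b = a.reshape(5, 2)    # the same ten values arranged as 5 rows x 2 columns
c = a.reshape(2, -1)   # -1 asks numpy to work out that dimension (here 5)

print(a.shape, b.shape, c.shape)   # (10,) (5, 2) (2, 5)

# The total number of elements cannot change, so this raises a ValueError
try:
    a.reshape(3, 4)
except ValueError as err:
    print("ValueError:", err)
```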

Numpy arrays can be treated like single numbers in arithmetic

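Arithmetic using numpy arrays is element-by-element.
Matrix operations such as matrix multiplication are available through functions or operators (for example numpy.dot or @).
The shapes of the arrays must match (or be compatible under numpy's broadcasting rules).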

a = numpy.arange(5)
b = numpy.arange(5)
print("a=",a)
print("b=",b)
print("a*b=",a*b)
print("a+b=",a+b)

a= [0 1 2 3 4]
b= [0 1 2 3 4]
a*b= [ 0  1  4  9 16]
a+b= [0 2 4 6 8]

c = numpy.ones((5,2))
d = numpy.ones((5,2)) + 100
c+d

array([[102., 102.],
       [102., 102.],
       [102., 102.],
       [102., 102.],
       [102., 102.]])

e = c.reshape(2,5)
c+e #c and e have different shapes

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-46-0e32881b9afe> in <module>()
      1 e = c.reshape(2,5)
----> 2 c+e #c and e have different shapes

ValueError: operands could not be broadcast together with shapes (5,2) (2,5) 
---------------------------------------------------------------------------
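The difference between element-by-element arithmetic and a matrix product is easiest to see on a small square array. Combining an array with a single number also works, because numpy broadcasts the number across the whole array. A minimal sketch:

```python
import numpy

a = numpy.arange(4).reshape(2, 2)   # [[0 1]
                                    #  [2 3]]

elementwise = a * a   # multiplies matching elements: [[0 1] [4 9]]
matrix = a @ a        # matrix product, same as numpy.dot(a, a): [[2 3] [6 11]]
scaled = a * 10       # the single number is broadcast to every element: [[0 10] [20 30]]

print(elementwise)
print(matrix)
print(scaled)
```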

The Numpy library has many functions that work on `arrays`

Aggregation functions like sum, mean, and std

a=numpy.arange(5)
print("a = ", a)

a =  [0 1 2 3 4]

Add all of the elements of the array together.

print("sum(a) = ", a.sum())

sum(a) =  10

Calculate the average value of the elements in the array.

print("mean(a) = ", a.mean())

mean(a) =  2.0

Calculate the standard deviation (std) of the array, which measures how spread out the values are around the mean.

print("std(a) = ", a.std()) # standard deviation

std(a) =  1.4142135623730951
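Computing the standard deviation by hand shows what a.std() does: it is the square root of the average squared distance from the mean. By default numpy divides by N (the number of values); the ddof keyword changes that, which matters when comparing with tools that report the "sample" standard deviation. A sketch:

```python
import numpy

a = numpy.arange(5)

# Standard deviation: square root of the mean squared distance from the mean
by_hand = numpy.sqrt(numpy.mean((a - a.mean()) ** 2))
print(by_hand, a.std())   # both print 1.4142135623730951

# ddof=1 divides by N-1 instead of N (the "sample" standard deviation)
sample_std = a.std(ddof=1)
print(sample_std)         # the square root of 10/4, about 1.581
```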

Calculate the sin of each element in the array

print("np.sin(a) = ", np.sin(a))

np.sin(a) =  [ 0.          0.84147098  0.90929743  0.14112001 -0.7568025 ]

Check the `numpy` help and webpage for more functions

https://docs.scipy.org/doc/numpy/reference/routines.html

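Use the `axis` keyword to apply the function along one dimension of the data.

Many functions take the axis keyword to aggregate along one dimension only: axis=0 combines the rows (one result per column), and axis=1 combines the columns (one result per row)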

a = numpy.arange(10).reshape(5,2)
print("a=",a)
print("mean(a)="  ,numpy.mean(a))
print("mean(a,0)=",numpy.mean(a,axis=0))
print("mean(a,1)=",numpy.mean(a,axis=1))

a= [[0 1]
 [2 3]
 [4 5]
 [6 7]
 [8 9]]
mean(a)= 4.5
mean(a,0)= [4. 5.]
mean(a,1)= [0.5 2.5 4.5 6.5 8.5]
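Which axis is which is easy to mix up. One way to check is to look at the shape of the result: aggregating along axis=0 leaves one value per column, and along axis=1 one value per row. A sketch using sum:

```python
import numpy

a = numpy.arange(10).reshape(5, 2)   # 5 rows, 2 columns

# axis=0 collapses the rows: one result per column
col_sums = a.sum(axis=0)
# axis=1 collapses the columns: one result per row
row_sums = a.sum(axis=1)

print(col_sums, col_sums.shape)   # [20 25] (2,)
print(row_sums, row_sums.shape)   # [ 1  5  9 13 17] (5,)
```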

Use square brackets to access elements in the array

Single integers in square brackets return one element
Ranges of data can be accessed with slices

a=numpy.arange(10)

Access the element at index 5 (the sixth element, since indexing starts at 0)

a[5]

Access the elements at indices 5 through 9 (the stop index 10 is excluded)

a[5:10]

array([5, 6, 7, 8, 9])

Access elements from 5 to the end of the array

a[5:] #No second number means "rest of the array"

array([5, 6, 7, 8, 9])

Access all elements from the start of the array up to (but not including) index 5.

a[:5] #No first number means "from the start of the array"

array([0, 1, 2, 3, 4])

Access every 2nd element from index 5 up to (but not including) index 10

a[5:10:2] #A third number means "every Nth element"

array([5, 7, 9])

Access every -2nd element from index 5 to index 10. (incorrect)

a[5:10:-2] #negative numbers mean "count backwards"

array([], dtype=int64)

Access every -2nd element from index 10 down to index 5. (correct)

a[10:5:-2] #with a negative step, start must be greater than stop

array([9, 7])

Challenge 1

The arange and linspace functions take similar arguments. Explain the difference. For example, what does the following code do?
print (numpy.arange(1.,9,3))
print (numpy.linspace(1.,9,3))
Solution

arange takes the arguments start, stop, step, and generates numbers from start to stop (excluding stop), stepping by step each time.

linspace takes the arguments start, stop, num, and generates num evenly spaced numbers from start to stop (including stop).
print (numpy.arange(1.,9,3))
print (numpy.linspace(1.,9,3))
[1. 4. 7.]
[1. 5. 9.]
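linspace also has an endpoint keyword. With endpoint=False the stop value is left out, and the two functions can produce the same values. A sketch:

```python
import numpy

# linspace(start, stop, num): num evenly spaced values, including stop by default
with_stop = numpy.linspace(1., 9, 3)                      # [1. 5. 9.]
# endpoint=False leaves stop out; here it matches arange(1., 10, 3)
without_stop = numpy.linspace(1., 10, 3, endpoint=False)  # [1. 4. 7.]
stepped = numpy.arange(1., 10, 3)                         # [1. 4. 7.]

print(with_stop, without_stop, stepped)
```

The numpy documentation recommends linspace when the step is not an integer, because floating-point rounding can make the number of values produced by arange hard to predict.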

Challenge 2

Generate a 10 x 3 array of random numbers (using numpy.random.rand). From each row, find the minimum absolute value. Make use of numpy.abs and numpy.min. The result should be a one-dimensional array.
Solution

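The important part of the solution is passing the axis keyword to the min function. Each row runs along axis 1, so the minimum within each row is found with axis=1, which gives one value per row:
a = numpy.random.rand(30).reshape(10,3)
print("a is ", a)
print()
print("min(abs(a)) along each row is ", numpy.min( numpy.abs( a ), axis=1))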

Use the `scipy` library for common scientific and numerical methods

scipy contains functions to generate random numbers, calculate Fourier transforms, integrate functions and data, and much more
Check the scipy website for more help: https://docs.scipy.org/doc/scipy/reference/

Example: integrate y = x^2 from 0 to 10

x = numpy.arange(11)
y = x**2
import scipy.integrate
# by default, trapezoid assumes the samples are spaced 1 apart (older SciPy versions call this function trapz)
print("integral of x^2 from 0 to 10", scipy.integrate.trapezoid(y) )#This value should be 10**3/3 = 333.33

integral of x^2 from 0 to 10 335.0

A finer grid should make numerical integration more precise, but here the result is badly wrong: without the x values, trapezoid still assumes a spacing of 1 between samples.

x = numpy.linspace(0,10,1000) # finer grid
y=x**2
print("integral of x^2 from 0 to 10", scipy.integrate.trapezoid(y) )#This value should be 10**3/3 = 333.333

integral of x^2 from 0 to 10 33300.01668335002

Passing the x values to trapezoid allows it to integrate correctly

print("integral of x^2 from 0 to 10", scipy.integrate.trapezoid(y,x) )#This value should be 10**3/3 = 333.333

integral of x^2 from 0 to 10 333.333500333834
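The trapezoidal rule itself is short enough to write with plain numpy, and doing so shows why the x values matter: each trapezoid's area is its width (the gap between neighbouring x values) times the average of the two y values at its ends. This is a sketch of the rule, not SciPy's actual implementation:

```python
import numpy

x = numpy.linspace(0, 10, 1000)
y = x**2

# Width of each interval between neighbouring x values
widths = numpy.diff(x)
# Average height of the two samples at the ends of each interval
heights = 0.5 * (y[1:] + y[:-1])
integral = numpy.sum(widths * heights)

print(integral)   # close to the exact value 10**3/3 = 333.33...
```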

We’ll come back to scipy.optimize later.

Key Points

Use the numpy library to get basic statistics out of tabular data.

Print numpy arrays.

Use mean, sum, std to get summary statistics.

Add numpy arrays together.

Study the scipy website

Use scipy to integrate tabular data.

Reading Tabular Data into arrays

Overview

Teaching: 20 min
Exercises: 0 min

Questions

How can I read tabular data?

How can I save tabular data?

Objectives

Import the numpy library.

Use numpy to load a simple CSV data set.

Get some basic information about a numpy array.

Use the Numpy package to load data using the `loadtxt` function

Numpy provides the function loadtxt to read and parse numeric data from a text file.
The file can be delimited with commas (a ‘comma separated file’), tabs, or other common delimiters
Numerical data can be converted to floating point data or integers
Headers and comments can be ignored during the reading of the file.

import numpy
data = numpy.loadtxt('data/galileo_flat.empty')
print(data)

[[1500. 1000.]
 [1340.  828.]
 [1328.  800.]
 [1172.  600.]
 [ 800.  300.]]

Read a comma separated file of data with headers

If you have a delimiter in your file (a comma, tab, vertical line), specify that with the delimiter keyword.
If you use a comment character consistently, specify it with the comments keyword.
If you have a header you want to skip, use skiprows

data = numpy.loadtxt('data/galileo_flat.csv', comments="#", skiprows=2, delimiter=',')
print(data)

[[1500. 1000.]
 [1340.  828.]
 [1328.  800.]
 [1172.  600.]
 [ 800.  300.]]
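The contents of data/galileo_flat.csv are not shown here, so the sketch below uses a made-up in-memory file (io.StringIO) to show how comments, skiprows and delimiter work together. loadtxt accepts a file-like object as well as a file name.

```python
import io
import numpy

# A hypothetical stand-in for a small CSV file: two header lines to skip,
# a comment line, and comma-separated numbers
fake_file = io.StringIO(
    "Galileo's data\n"
    "distance,height\n"
    "# measured values follow\n"
    "1500,1000\n"
    "1340,828\n"
)

data = numpy.loadtxt(fake_file, comments="#", skiprows=2, delimiter=',')
print(data)
print(data.shape)   # (2, 2)
```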

Remember your data has the shape `ROWS X COLUMNS`

Your data will be shaped with the rows first.
You can swap rows and columns with the transpose, data.T

print("data shape is ",data.shape)

data shape is  (5, 2)

Split the data into variables using `unpack`

You can split data into one variable per column using the unpack keyword (unpack=True transposes the result)

D,H = numpy.loadtxt('data/galileo_flat.csv', comments="#", skiprows=2, delimiter=',',unpack=True)
print(D,H)
print("D shape is ",D.shape)
print("H shape is ",H.shape)

[1500. 1340. 1328. 1172.  800.] [1000.  828.  800.  600.  300.]
D shape is  (5,)
H shape is  (5,)

You can split data after loading too

data = numpy.loadtxt('data/galileo_flat.csv', comments="#", skiprows=2, delimiter=',')
D,H = data.T
print(D,H)
print("D shape is ",D.shape)
print("H shape is ",H.shape)

[1500. 1340. 1328. 1172.  800.] [1000.  828.  800.  600.  300.]
D shape is  (5,)
H shape is  (5,)

Save data with `numpy.savetxt`

Saving text data is made possible with the savetxt function. It mirrors the loadtxt function.

numpy.savetxt("data/mydata.txt", data, delimiter=',')

1.500000000000000000e+03,1.000000000000000000e+03
1.340000000000000000e+03,8.280000000000000000e+02
1.328000000000000000e+03,8.000000000000000000e+02
1.172000000000000000e+03,6.000000000000000000e+02
8.000000000000000000e+02,3.000000000000000000e+02

Control the data format with the `fmt` keyword

The default format is scientific notation with 18 digits after the decimal point (fmt='%.18e')
You can change the format with the fmt keyword

numpy.savetxt("data/mydata2.txt", data, delimiter=',', fmt='%.6g')

Add a header string with `header`

Add header text to the file with the header keyword.
Include column titles in the header keyword.

header="Distance (D), Height (H)"
newdata = numpy.vstack([D,H]).T
numpy.savetxt("data/mydata3.txt", newdata, delimiter=', ', header=header, fmt='%.6g')

# Distance (D), Height (H)
1500, 1000
1340, 828
1328, 800
1172, 600
800, 300
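savetxt writes the header as a comment line (prefixed with "# " by default), which loadtxt skips by default, so a file written this way can be read straight back. A sketch of the round trip using an in-memory buffer instead of a file on disk; it uses a single-character delimiter, which loadtxt handles most reliably:

```python
import io
import numpy

D = numpy.array([1500., 1340., 1328.])
H = numpy.array([1000., 828., 800.])

# Write to an in-memory "file" instead of disk
buffer = io.StringIO()
numpy.savetxt(buffer, numpy.vstack([D, H]).T, delimiter=',',
              header="Distance (D), Height (H)", fmt='%.6g')
print(buffer.getvalue())

# Rewind and read it back; the '#' header line is skipped as a comment
buffer.seek(0)
D2, H2 = numpy.loadtxt(buffer, delimiter=',', unpack=True)
print(D2, H2)
```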

More complex loadtxt commands can make your data more flexible

Using the dtype keyword allows fine control over the types of data you read.
Using dtype allows you to ‘name’ your data columns and reference them with the name.

data = numpy.loadtxt('data/galileo_flat.csv', comments="#", skiprows=2, delimiter=',',\
                    dtype={'names':("Distance","Height"), 'formats':('f4','f4')})
print("data shape is ", data.shape)
print("Distance data is ", data["Distance"])

data shape is  (5,)
Distance data is  [1500. 1340. 1328. 1172.  800.]

Key Points

Use numpy.loadtxt to load tabular data.

Use numpy.savetxt to save tabular data.

Use delimiters to make your text file cleaner.

Use comments in your file to describe the contents.

Plotting

Overview

Teaching: 25 min
Exercises: 15 min

Questions

How can I plot my data?

How can I save my plot for publishing?

Objectives

Create a time series plot showing a single data set.

Create a scatter plot showing relationship between two data sets.

`matplotlib` is the most widely used scientific plotting library in Python

Most plotting uses a sub-library called matplotlib.pyplot, imported using the alias plt.
The Jupyter Notebook will render plots inline if we ask it to using a “magic” command.

%matplotlib inline
import matplotlib.pyplot as plt

Simple plots are then (fairly) simple to create.

import numpy
time = numpy.array([0,1,2,3])
position = numpy.array([0,100,200,300])

plt.plot(time, position)
plt.xlabel("Time (hr)")
plt.ylabel("Position (km)")

Text(0, 0.5, 'Position (km)')


The color and format of lines and markers can be changed.

A shortcut for simple formatting is to pass a format string as the third argument.
'b-' means blue line, 'ro' means red circles, 'g+-' means green + markers joined by a line

import numpy
time = numpy.arange(10)
p1 = time
p2 = time*2
p3 = time*4

plt.plot(time, p1,'b-')
plt.plot(time, p2,'ro')
plt.plot(time, p3,'g+-')
plt.xlabel("Time (hr)")
plt.ylabel("Position (km)")

Text(0, 0.5, 'Position (km)')


More complex formatting can be achieved using the `plot` keywords

linewidth controls the thickness of the line
linestyle controls the type of line
marker controls the shape of the marker
markersize controls the size of the marker
color controls the color of the line and marker
label sets the text shown for the line by plt.legend

plt.plot(time, p1,color='blue', linestyle='-', linewidth=5,label="blue line")
plt.plot(time, p2,'ro', markersize=10, label="red dots")
plt.plot(time, p3,'g-', marker='+')
plt.xlabel("Time (hr)")
plt.ylabel("Position (km)")
plt.legend()

<matplotlib.legend.Legend at 0x7fe9b88472b0>


Built-in “styles” provide consistent plots

print("available style names: ", plt.style.available)

available style names:  ['Solarize_Light2', '_classic_test_patch', 'bmh', 'classic', 'dark_background', 'fast', 'fivethirtyeight', 'ggplot', 'grayscale', 'seaborn', 'seaborn-bright', 'seaborn-colorblind', 'seaborn-dark', 'seaborn-dark-palette', 'seaborn-darkgrid', 'seaborn-deep', 'seaborn-muted', 'seaborn-notebook', 'seaborn-paper', 'seaborn-pastel', 'seaborn-poster', 'seaborn-talk', 'seaborn-ticks', 'seaborn-white', 'seaborn-whitegrid', 'tableau-colorblind10']

plt.style.use("ggplot")
plt.plot(time, p1,color='blue', linestyle='-', linewidth=5,label="blue line")
plt.plot(time, p2,'ro', markersize=10, label="red dots")
plt.plot(time, p3,'g-', marker='+')
plt.xlabel("Time (hr)")
plt.ylabel("Position (km)")
plt.legend()

<matplotlib.legend.Legend at 0x7fe9a8405bb0>


plt.style.use("fivethirtyeight")
plt.plot(time, p1,color='blue', linestyle='-', linewidth=5,label="blue line")
plt.plot(time, p2,'ro', markersize=10, label="red dots")
plt.plot(time, p3,'g-', marker='+')
plt.xlabel("Time (hr)")
plt.ylabel("Position (km)")
plt.legend()

<matplotlib.legend.Legend at 0x7fe9a843aac0>


plt.style.use("seaborn-v0_8-whitegrid") # called "seaborn-whitegrid" before matplotlib 3.6
plt.plot(time, p1,linestyle='-', linewidth=5,label="blue line")
plt.plot(time, p2,'o', markersize=10, label="red dots")
plt.plot(time, p3,'-', marker='+') #where's the marker?
plt.xlabel("Time (hr)")
plt.ylabel("Position (km)")
plt.legend()

<matplotlib.legend.Legend at 0x7fe9780b4070>

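plt.style.use changes the style for every plot that follows. To apply a style to one plot only, matplotlib also provides plt.style.context for use in a with-block. A sketch (the facecolor checks are only there to show that the previous settings come back afterwards):

```python
import matplotlib.pyplot as plt
import numpy

time = numpy.arange(10)
color_before = plt.rcParams["axes.facecolor"]

# The ggplot style only applies inside the with-block
with plt.style.context("ggplot"):
    plt.figure()
    plt.plot(time, time * 2, label="ggplot style")
    plt.legend()
    color_inside = plt.rcParams["axes.facecolor"]

# Outside the block the earlier settings are restored
color_after = plt.rcParams["axes.facecolor"]
print(color_before, color_inside, color_after)
```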

Plots can be scatter plots with points and no lines

numpy.random.seed(20)
x,y = numpy.random.randint(0,100,100), numpy.random.randn(100)
x=numpy.cumsum(x)
y=numpy.cumsum(y)

plt.scatter( x, y)
plt.scatter( x, 10-y**2, color='green',marker='<')
plt.xlabel("Labels still work")
plt.title("title")

Text(0.5, 1.0, 'title')


Plot data with associated uncertainties using `errorbar`

To avoid joining the data with a line, set the linestyle (ls) to an empty string.
Set a marker shape using marker.
Use the same color for marker and errorbars.

numpy.random.seed(42)
x = numpy.random.rand(10)*10
x=numpy.cumsum(x)
error  = numpy.abs(numpy.random.randn(10))*4 # error bar sizes must not be negative
y=x + numpy.random.randn(10)*0.5

plt.errorbar( x, y, yerr=error,color='green',marker='o',ls='',lw=1,label="data")
plt.xlabel("Labels still work")
plt.title("errorbar")
plt.legend()

<matplotlib.legend.Legend at 0x7fe9b888f040>


In Jupyter, appending ? to a name displays its documentation:

plt.errorbar?

`matplotlib` also makes bar charts and histograms

If you have data grouped into counts already, bar can make a chart

x = [0,1,2,3,4,5]
y = [0,4,2,6,8,2]
plt.bar(x,y)
plt.title("Bar chart")

Text(0.5, 1.0, 'Bar chart')


If you have raw data, hist can calculate and plot the histogram.

x = numpy.random.randint(0,100,50)
bin_count, bin_edges, boxes = plt.hist(x, bins=10)
print("The counts are ", bin_count)

The counts are  [8. 5. 0. 5. 5. 6. 3. 6. 8. 4.]


bin_count, bin_edges, boxes = plt.hist(x, bins=10, rwidth=0.9)
plt.title("cleaner histogram")

Text(0.5, 1.0, 'cleaner histogram')


# Compute the angle (theta) and width of each bar in a polar bar chart
N = bin_count.size
theta = 0.5*(bin_edges[1:] + bin_edges[:-1])
theta = theta * 2*numpy.pi/theta.max()
width = numpy.pi / 4 * numpy.random.rand(N)

ax = plt.subplot(111, projection='polar')
bars = ax.bar(theta, bin_count, width=width, bottom=0.0,alpha=0.5)

# Use custom colors and opacity
for r, bar in zip(bin_count, bars):
    bar.set_facecolor(plt.cm.viridis(r / bin_count.max()))
    bar.set_alpha(0.5)

t=plt.title("Something more exotic")


Define the figure size before plotting using the `figure` command

plt.figure creates a new figure for the following plot commands to draw into
The keyword figsize takes two values, the width and height of the figure in inches

plt.figure(figsize=(8,2))
x = [0,1,2,3,4,5]
y = [0,4,2,6,8,2]
plt.bar(x,y)
plt.title("narrow bar chart")

Text(0.5, 1.0, 'narrow bar chart')


Place multiple plots in one figure with `subplot`

plt.subplot takes three arguments: (number_of_rows, number_of_columns, index), where the index starts at 1 and counts left to right, then top to bottom

plt.figure(figsize=(8,2))
x = [0,1,2,3,4,5]
y = [0,4,2,6,8,2]
plt.subplot(2,2,1)
plt.bar(x,y)
plt.title("top left")
plt.subplot(2,2,2)
plt.bar(y,x)
plt.title("top right")
plt.subplot(2,2,4)
plt.bar(x,y)
plt.title("sometimes the formatting is awkward")

Text(0.5, 1.0, 'sometimes the formatting is awkward')


plt.figure(figsize=(8,3))
x = [0,1,2,3,4,5]
y = [0,4,2,6,8,2]
plt.subplot(1,3,1)
plt.bar(x,y)
plt.title("top left")
plt.subplot(1,3,2)
plt.bar(y,x)
plt.title("top right")
plt.subplot(1,3,3)
plt.bar(x,y)
plt.title("less awkward")

Text(0.5, 1.0, 'less awkward')

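When subplot titles and tick labels collide, as in the 2 x 2 example, the figure's tight_layout method usually fixes the spacing. The sketch below uses plt.subplots, which creates the figure and a grid of axes in a single call, and then draws on each axes with its own methods (set_title instead of plt.title):

```python
import matplotlib.pyplot as plt

x = [0, 1, 2, 3, 4, 5]
y = [0, 4, 2, 6, 8, 2]

# Create the figure and a 2 x 2 grid of axes in one call
fig, axes = plt.subplots(2, 2, figsize=(8, 4))
axes[0, 0].bar(x, y)
axes[0, 0].set_title("top left")
axes[0, 1].bar(y, x)
axes[0, 1].set_title("top right")
axes[1, 1].bar(x, y)
axes[1, 1].set_title("bottom right")
axes[1, 0].axis("off")   # hide the unused panel

# Adjust the spacing so titles and tick labels do not overlap
fig.tight_layout()
```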

Figures can be saved with `savefig`

After plotting, use plt.savefig to save the figure to a file
The figure size you specified is (approximately) the size in inches.
For PNG/JPG images you can specify the resolution with dpi

plt.figure(figsize=(8,3))
plt.plot(x,y)
plt.savefig("data/fig1.pdf") #PDF format
plt.savefig("data/fig1.png", dpi=150, transparent=True) #PNG format


Note that functions in plt refer to a global figure variable and after a figure has been displayed to the screen (e.g. with plt.show) matplotlib will make this variable refer to a new empty figure. Therefore, make sure you call plt.savefig before the plot is displayed to the screen, otherwise you may find a file with an empty plot.

It is also possible to save the figure to a file by first getting a reference to the figure with plt.gcf, then calling that figure's savefig method.
fig = plt.gcf() # get current figure
plt.bar(x, y)
fig.savefig('my_figure.png')

Key Points

matplotlib is the most widely used scientific plotting library in Python.

Plot data directly from lists and numpy arrays.

Select and transform data, then plot it.

Many styles of plot are available: see the Python Graph Gallery for more options.

Can plot many sets of data together.

Lists

Overview

Teaching: 10 min
Exercises: 10 min

Questions

How can I store multiple values?

Objectives

Explain why programs need collections of values.

Write programs that create flat lists, index them, slice them, and modify them through assignment and method calls.

A list stores many values in a single structure.

Doing calculations with a hundred variables called pressure_001, pressure_002, etc., would be at least as slow as doing them by hand.
Use a list to store many values together.
- Contained within square brackets [...].
- Values separated by commas ,.
Use len to find out how many values are in a list.

pressures = [0.273, 0.275, 0.277, 0.275, 0.276]
print('pressures:', pressures)
print('length:', len(pressures))

pressures: [0.273, 0.275, 0.277, 0.275, 0.276]
length: 5

Use an item’s index to fetch it from a list.

Just like strings.

print('zeroth item of pressures:', pressures[0])
print('fourth item of pressures:', pressures[4])

zeroth item of pressures: 0.273
fourth item of pressures: 0.276

Lists’ values can be replaced by assigning to them.

Use an index expression on the left of assignment to replace a value.

pressures[0] = 0.265
print('pressures is now:', pressures)

pressures is now: [0.265, 0.275, 0.277, 0.275, 0.276]

Appending items to a list lengthens it.

Use list_name.append to add items to the end of a list.

primes = [2, 3, 5]
print('primes is initially:', primes)
primes.append(7)
primes.append(9)
print('primes has become:', primes)

primes is initially: [2, 3, 5]
primes has become: [2, 3, 5, 7, 9]

append is a method of lists.
- Like a function, but tied to a particular object.
Use object_name.method_name to call methods.
- Deliberately resembles the way we refer to things in a library.
We will meet other methods of lists as we go along.
- Use help(list) for a preview.
extend is similar to append, but it allows you to combine two lists. For example:

teen_primes = [11, 13, 17, 19]
middle_aged_primes = [37, 41, 43, 47]
print('primes is currently:', primes)
primes.extend(teen_primes)
print('primes has now become:', primes)
primes.append(middle_aged_primes)
print('primes has finally become:', primes)

primes is currently: [2, 3, 5, 7, 9]
primes has now become: [2, 3, 5, 7, 9, 11, 13, 17, 19]
primes has finally become: [2, 3, 5, 7, 9, 11, 13, 17, 19, [37, 41, 43, 47]]

Note that while extend maintains the “flat” structure of the list, appending a list to a list makes the result two-dimensional.

Use `del` to remove items from a list entirely.

del list_name[index] removes an item from a list and shortens the list.
Not a function or a method, but a statement in the language.

primes = [2, 3, 5, 7, 9]
print('primes before removing last item:', primes)
del primes[4]
print('primes after removing last item:', primes)

primes before removing last item: [2, 3, 5, 7, 9]
primes after removing last item: [2, 3, 5, 7]

The empty list contains no values.

Use [] on its own to represent a list that doesn’t contain any values.
- “The zero of lists.”
Helpful as a starting point for collecting values (which we will see in the next episode).

Lists may contain values of different types.

A single list may contain numbers, strings, and anything else.

goals = [1, 'Create lists.', 2, 'Extract items from lists.', 3, 'Modify lists.']

Character strings can be indexed like lists.

Get single characters from a character string using indexes in square brackets.

element = 'carbon'
print('zeroth character:', element[0])
print('third character:', element[3])

zeroth character: c
third character: b

Character strings are immutable.

Cannot change the characters in a string after it has been created.
- Immutable: can’t be changed after creation.
- In contrast, lists are mutable: they can be modified in place.
Python considers the string to be a single value with parts, not a collection of values.

element[0] = 'C'

TypeError: 'str' object does not support item assignment

Lists and character strings are both collections.

Indexing beyond the end of the collection is an error.

Python reports an IndexError if we attempt to access a value that doesn’t exist.
- This is a kind of runtime error.
- Cannot be detected as the code is parsed because the index might be calculated based on data.

print('99th element of element is:', element[99])

IndexError: string index out of range

Fill in the Blanks

Fill in the blanks so that the program below produces the output shown.

values = ____
values.____(1)
values.____(3)
values.____(5)
print('first time:', values)
values = values[____]
print('second time:', values)

first time: [1, 3, 5]
second time: [3, 5]

Solution

values = []
values.append(1)
values.append(3)
values.append(5)
print('first time:', values)
values = values[1:]
print('second time:', values)

How Large is a Slice?

If ‘low’ and ‘high’ are both non-negative integers, how long is the list values[low:high]?

Solution

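The list values[low:high] has high - low elements when low <= high <= len(values). For example, values[1:4] has the 3 elements values[1], values[2], and values[3]. If high is larger than the length of the list, Python quietly stops at the end of the list instead of raising an error, so the slice is shorter; if low >= high, the slice is empty.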
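A quick check of how slice length behaves, including a high that runs past the end of the list:

```python
values = [10, 20, 30, 40, 50]

print(len(values[1:4]))    # 3, which is high - low
print(len(values[2:99]))   # 3: a high past the end is clamped to len(values)
print(len(values[4:2]))    # 0: when low >= high the slice is empty
```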

From Strings to Lists and Back

Given this:
print('string to list:', list('tin'))
print('list to string:', ''.join(['g', 'o', 'l', 'd']))
string to list: ['t', 'i', 'n']
list to string: gold
Explain in simple terms what list('some string') does.

What does '-'.join(['x', 'y']) generate?

Solution

list('some string') “splits” a string into a list of its characters.

x-y

Working With the End

What does the following program print?
element = 'helium'
print(element[-1])
How does Python interpret a negative index?

If a list or string has N elements, what is the most negative index that can safely be used with it, and what location does that index represent?

If values is a list, what does del values[-1] do?

How can you display all elements but the last one without changing values? (Hint: you will need to combine slicing and negative indexing.)

Solution

The program prints m.

Python interprets a negative index as starting from the end (as opposed to starting from the beginning). The last element is -1.

The most negative index that can safely be used with a list of N elements is -N, which represents the first element.

del values[-1] removes the last element from the list.

values[:-1]

Stepping Through a List

What does the following program print?
element = 'fluorine'
print(element[::2])
print(element[::-1])
If we write a slice as low:high:stride, what does stride do?

What expression would select all of the even-numbered items from a collection?
Solution

The program prints
furn
eniroulf
stride is the step size of the slice

The slice 1::2 selects all even-numbered items from a collection: it starts with element 1 (which is the second element, since indexing starts at 0), goes on until the end (since no end is given), and uses a step size of 2 (i.e., selects every second element).

Slice Bounds

What does the following program print?
element = 'lithium'
print(element[0:20])
print(element[-1:3])
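Solution
lithium

The first print shows the whole string, because a stop index past the end (20) is quietly clamped to the length of the string. The second print displays an empty line: element[-1:3] starts at the last character (index 6) and stops before index 3, and a slice with a positive step is empty when its start comes after its stop.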

Sort and Sorted

What do these two programs print? In simple terms, explain the difference between sorted(letters) and letters.sort().
# Program A
letters = list('gold')
result = sorted(letters)
print('letters is', letters, 'and result is', result)
# Program B
letters = list('gold')
result = letters.sort()
print('letters is', letters, 'and result is', result)
Solution

Program A prints
letters is ['g', 'o', 'l', 'd'] and result is ['d', 'g', 'l', 'o']
Program B prints
letters is ['d', 'g', 'l', 'o'] and result is None
sorted(letters) returns a sorted copy of the list letters (the original list letters remains unchanged), while letters.sort() sorts the list letters in-place and does not return anything.

Copying (or Not)

What do these two programs print? In simple terms, explain the difference between new = old and new = old[:].
# Program A
old = list('gold')
new = old      # simple assignment
new[0] = 'D'
print('new is', new, 'and old is', old)
# Program B
old = list('gold')
new = old[:]   # assigning a slice
new[0] = 'D'
print('new is', new, 'and old is', old)
Solution

Program A prints
new is ['D', 'o', 'l', 'd'] and old is ['D', 'o', 'l', 'd']
Program B prints
new is ['D', 'o', 'l', 'd'] and old is ['g', 'o', 'l', 'd']
new = old makes new a reference to the list old; new and old point towards the same object.

new = old[:] however creates a new list object new containing all elements from the list old; new and old are different objects.
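Slicing is not the only way to get an independent list: calling list() on it or using its copy method also builds a new list object. The is operator, which checks whether two names refer to the very same object, shows the difference from plain assignment. Note that all three are shallow copies: a list nested inside old would still be shared. A sketch:

```python
old = list('gold')

copy_slice = old[:]       # slicing builds a new list
copy_list = list(old)     # so does calling list() on it
copy_method = old.copy()  # and the copy method
alias = old               # this only adds a second name for the same list

# `is` checks whether two names refer to the very same object
print(copy_slice == old, copy_slice is old)   # True False
print(alias is old)                           # True
```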

Key Points

A list stores many values in a single structure.

Use an item’s index to fetch it from a list.

Lists’ values can be replaced by assigning to them.

Appending items to a list lengthens it.

Use del to remove items from a list entirely.

The empty list contains no values.

Lists may contain values of different types.

Character strings can be indexed like lists.

Character strings are immutable.

Indexing beyond the end of the collection is an error.

For Loops

Overview

Teaching: 10 min
Exercises: 15 min

Questions

How can I make a program do many things?

Objectives

Explain what for loops are normally used for.

Trace the execution of a simple (unnested) loop and correctly state the values of variables in each iteration.

Write for loops that use the Accumulator pattern to aggregate values.

A for loop executes commands once for each value in a collection.

Doing calculations on the values in a list one by one is as painful as working with pressure_001, pressure_002, etc.
A for loop tells Python to execute some statements once for each value in a list, a character string, or some other collection.
“for each thing in this group, do these operations”

for number in [2, 3, 5]:
    print(number)

This for loop is equivalent to:

print(2)
print(3)
print(5)

And the for loop’s output is:

2
3
5

The first line of the `for` loop must end with a colon, and the body must be indented.

The colon at the end of the first line signals the start of a block of statements.
Python uses indentation rather than {} or begin/end to show nesting.
- Any consistent indentation is legal, but almost everyone uses four spaces.

for number in [2, 3, 5]:
print(number)

IndentationError: expected an indented block

Indentation is always meaningful in Python.

firstName="Jon"
  lastName="Smith"

  File "<ipython-input-7-f65f2962bf9c>", line 2
    lastName="Smith"
    ^
IndentationError: unexpected indent

This error can be fixed by removing the extra spaces at the beginning of the second line.

A `for` loop is made up of a collection, a loop variable, and a body.

for number in [2, 3, 5]:
    print(number)

The collection, [2, 3, 5], is what the loop is being run on.
The body, print(number), specifies what to do for each value in the collection.
The loop variable, number, is what changes for each iteration of the loop.
- The “current thing”.

Loop variables can be called anything.

As with all variables, loop variables are:
- Created on demand.
- Meaningless: their names can be anything at all.

for kitten in [2, 3, 5]:
    print(kitten)

The body of a loop can contain many statements.

But no loop should be more than a few lines long.
Hard for human beings to keep larger chunks of code in mind.

primes = [2, 3, 5]
for p in primes:
    squared = p ** 2
    cubed = p ** 3
    print(p, squared, cubed)

2 4 8
3 9 27
5 25 125

Use `range` to iterate over a sequence of numbers.

The built-in function range produces a sequence of numbers.
- Not a list: the numbers are produced on demand to make looping over large ranges more efficient.
range(N) is the numbers 0..N-1
- Exactly the legal indices of a list or character string of length N

print('a range is not a list: range(0, 3)')
for number in range(0,3):
    print(number)

a range is not a list: range(0, 3)
0
1
2
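To see the numbers a range produces, convert it to a list with list(). range also accepts a start value and a step. A short sketch (the list letters is only for illustration):

```python
# list() turns the on-demand numbers into an actual list
print(list(range(3)))         # [0, 1, 2]

# range(start, stop) begins at start and stops before stop
print(list(range(2, 5)))      # [2, 3, 4]

# range(start, stop, step) counts in steps
print(list(range(0, 10, 3)))  # [0, 3, 6, 9]

# range(len(...)) gives exactly the legal indices of a list
letters = ['a', 'b', 'c']
for i in range(len(letters)):
    print(i, letters[i])
```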

The Accumulator pattern turns many values into one.

A common pattern in programs is to:
1. Initialize an accumulator variable to zero, the empty string, or the empty list.
2. Update the variable with values from a collection.

# Sum the first 10 integers.
total = 0
for number in range(10):
    total = total + (number + 1)
print(total)

Read total = total + (number + 1) as:
- Add 1 to the current value of the loop variable number.
- Add that to the current value of the accumulator variable total.
- Assign that to total, replacing the current value.
We have to add number + 1 because range produces 0..9, not 1..10.
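An alternative that avoids the `+ 1` is to ask range for 1..10 directly with `range(1, 11)`. The built-in sum() gives a quick cross-check of the accumulator:

```python
# Sum the integers 1..10 using range(start, stop) instead of adding 1.
total = 0
for number in range(1, 11):
    total = total + number
print(total)               # 55

# The built-in sum() performs the same accumulation in one call.
print(sum(range(1, 11)))   # 55
```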

Classifying Errors

Is an indentation error a syntax error or a runtime error?

Solution

An IndentationError is a syntax error. Programs with syntax errors cannot be started. A program with a runtime error will start but an error will be thrown under certain conditions.

Tracing Execution

Create a table showing the numbers of the lines that are executed when this program runs, and the values of the variables after each line is executed.
total = 0
for char in "tin":
    total = total + 1
Solution

Line no	Variables
1	total = 0
2	total = 0 char = ‘t’
3	total = 1 char = ‘t’
2	total = 1 char = ‘i’
3	total = 2 char = ‘i’
2	total = 2 char = ‘n’
3	total = 3 char = ‘n’
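A quick way to check a hand-made trace is to print the variables at the end of each pass through the loop body. A sketch; note that after the loop finishes, the loop variable keeps its last value:

```python
total = 0
for char in "tin":
    total = total + 1
    # show the variables' values after line 3 runs in each iteration
    print('char =', char, 'total =', total)
# prints:
# char = t total = 1
# char = i total = 2
# char = n total = 3
```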

Reversing a String

Fill in the blanks in the program below so that it prints “nit” (the reverse of the original character string “tin”).
original = "tin"
result = ____
for char in original:
    result = ____
print(result)
Solution
original = "tin"
result = ""
for char in original:
    result = char + result
print(result)

Practice Accumulating

Fill in the blanks in each of the programs below to produce the indicated result.

# Total length of the strings in the list: ["red", "green", "blue"] => 12
total = 0
for word in ["red", "green", "blue"]:
    ____ = ____ + len(word)
print(total)

Solution

total = 0
for word in ["red", "green", "blue"]:
    total = total + len(word)
print(total)

# List of word lengths: ["red", "green", "blue"] => [3, 5, 4]
lengths = ____
for word in ["red", "green", "blue"]:
    lengths.____(____)
print(lengths)

Solution

lengths = []
for word in ["red", "green", "blue"]:
    lengths.append(len(word))
print(lengths)

# Concatenate all words: ["red", "green", "blue"] => "redgreenblue"
words = ["red", "green", "blue"]
result = ____
for ____ in ____:
    ____
print(result)

Solution

words = ["red", "green", "blue"]
result = ""
for word in words:
    result = result + word
print(result)

# Create acronym: ["red", "green", "blue"] => "RGB"
# write the whole thing

Solution

acronym = ""
for word in ["red", "green", "blue"]:
    acronym = acronym + word[0].upper()
print(acronym)

Cumulative Sum

Reorder and properly indent the lines of code below so that they print a list with the cumulative sum of data. The result should be [1, 3, 5, 10].
cumulative += [sum]
for number in data:
cumulative = []
sum += number
sum = 0
print(cumulative)
data = [1,2,2,5]
Solution
data = [1,2,2,5]
cumulative = []
sum = 0
for number in data:
    sum += number
    cumulative += [sum]
print(cumulative)
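The standard library's itertools.accumulate implements this running-total pattern directly. A sketch; the loop version below uses the name running_total (our choice) so that it does not hide the built-in sum() function, and uses append, which has the same effect as `+= [value]` here:

```python
import itertools

data = [1, 2, 2, 5]

# accumulate yields running totals; list() collects them
print(list(itertools.accumulate(data)))   # [1, 3, 5, 10]

# The same running-total loop with a name that doesn't shadow sum()
running_total = 0
cumulative = []
for number in data:
    running_total += number
    cumulative.append(running_total)
print(cumulative)                         # [1, 3, 5, 10]
```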

Identifying Variable Name Errors

1. Read the code below and try to identify what the errors are without running it.
2. Run the code and read the error message. What type of NameError do you think this is? Is it a string with no quotes, a misspelled variable, or a variable that should have been defined but was not?
3. Fix the error.
4. Repeat steps 2 and 3 until you have fixed all the errors.
for number in range(10):
    # use a if the number is a multiple of 3, otherwise use b
    if (Number % 3) == 0:
        message = message + a
    else:
        message = message + "b"
print(message)
Solution
message = ""
for number in range(10):
    # use a if the number is a multiple of 3, otherwise use b
    if (number % 3) == 0:
        message = message + "a"
    else:
        message = message + "b"
print(message)

Identifying Item Errors

1. Read the code below and try to identify what the errors are without running it.
2. Run the code and read the error message. What type of error is it?
3. Fix the error.
seasons = ['Spring', 'Summer', 'Fall', 'Winter']
print('My favorite season is ', seasons[4])
Solution
seasons = ['Spring', 'Summer', 'Fall', 'Winter']
print('My favorite season is ', seasons[3])

Key Points

A for loop executes commands once for each value in a collection.

The first line of the for loop must end with a colon, and the body must be indented.

Indentation is always meaningful in Python.

A for loop is made up of a collection, a loop variable, and a body.

Loop variables can be called anything (but it is strongly advised to give the loop variable a meaningful name).

The body of a loop can contain many statements.

Use range to iterate over a sequence of numbers.

The Accumulator pattern turns many values into one.

Looping Over Data Sets

Overview

Teaching: 5 min
Exercises: 0 min

Questions

How can I process many data sets with a single command?

Objectives

Be able to read and write globbing expressions that match sets of files.

Use glob to create lists of files.

Write for loops to perform operations on files given their names in a list.

Use a `for` loop to process files given a list of their names.

A filename is just a character string.
And lists can contain character strings.

import numpy
for filename in ["data/galileo_flat.csv","data/galileo_ramp.csv"]:
    distance, height = numpy.loadtxt(filename, skiprows=2,
                                     comments="#", delimiter=',', unpack=True)
    print(filename, distance.min(), height.max())

    data/galileo_flat.csv 800.0 1000.0
    data/galileo_ramp.csv 253.0 1000.0

Use `glob.glob` to find sets of files whose names match a pattern.

In Unix, the term “globbing” means “matching a set of files with a pattern”.
The most common patterns are:
- * meaning “match zero or more characters”
- ? meaning “match exactly one character”
Python's standard library includes the glob module to provide pattern matching functionality.
The glob module contains a function, also called glob, that matches file patterns.
E.g., glob.glob('*.txt') matches all files in the current directory whose names end with .txt.
Result is a (possibly empty) list of character strings.

import glob
print("all csv files in data/random directory:", glob.glob("data/random/*.csv"))

    all csv files in data/random directory: ['data/random/data.029.csv', 'data/random/data.015.csv', 'data/random/data.001.csv', 'data/random/data.000.csv', 'data/random/data.014.csv', 'data/random/data.028.csv', 'data/random/data.002.csv', '...','data/random/data.035.csv', 'data/random/data.009.csv', 'data/random/data.023.csv', 'data/random/data.037.csv', 'data/random/data.036.csv', 'data/random/data.022.csv', 'data/random/data.026.csv', 'data/random/data.032.csv', 'data/random/data.033.csv', 'data/random/data.027.csv', 'data/random/data.031.csv', 'data/random/data.025.csv', 'data/random/data.019.csv', 'data/random/data.018.csv', 'data/random/data.024.csv', 'data/random/data.030.csv']

print("all txt files in data/random directory:", glob.glob("data/random/*.txt"))

    all txt files in data/random directory: []
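To experiment with patterns without the lesson's data folder, the sketch below creates a few empty files in a temporary directory (the file names are made up for illustration). glob does not promise any particular order, which is why the results are wrapped in sorted().

```python
import glob
import os
import tempfile

# Create a throwaway directory containing a few empty files.
with tempfile.TemporaryDirectory() as folder:
    for name in ['data.000.csv', 'data.001.csv', 'data.010.csv', 'notes.txt']:
        open(os.path.join(folder, name), 'w').close()

    base = glob.escape(folder)  # protect any special characters in the path

    # * matches zero or more characters
    all_csv = sorted(glob.glob(os.path.join(base, '*.csv')))
    # ? matches exactly one character, so data.010.csv is not matched
    single = sorted(glob.glob(os.path.join(base, 'data.00?.csv')))
    # no matches gives an empty list, not an error
    no_match = glob.glob(os.path.join(base, '*.json'))

    print([os.path.basename(p) for p in all_csv])  # ['data.000.csv', 'data.001.csv', 'data.010.csv']
    print([os.path.basename(p) for p in single])   # ['data.000.csv', 'data.001.csv']
    print(no_match)                                # []
```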

Use `glob` and `for` to process batches of files.

Helps a lot if the files are named and stored systematically and consistently so that simple patterns will find the right data.

for filename in sorted(glob.glob('data/random/*.csv')):
    distance, height = numpy.loadtxt(filename, delimiter=',', unpack=True)
    print(filename, distance.mean(), height.std())

    data/random/data.000.csv 0.973455156 14.253108991004671
    data/random/data.001.csv -4.4384872 18.271604877007015
    data/random/data.002.csv -2.28566216 13.972753882460598
    ...
    data/random/data.096.csv 0.724618214 16.429186710317516
    data/random/data.097.csv 0.48894924 15.292681284065516
    data/random/data.098.csv 1.84267224 11.33741881916356
    data/random/data.099.csv -3.772237556 14.155571992376832

Determining Matches

Which of these files is not matched by the expression glob.glob('data/*as*.csv')?

1. data/gapminder_gdp_africa.csv
2. data/gapminder_gdp_americas.csv
3. data/gapminder_gdp_asia.csv

Solution

1 is not matched by the glob: the name gapminder_gdp_africa.csv does not contain the characters "as".

Averaging over datasets

Write a program that calculates the average value of all of the data in the files combined, instead of for each file individually.
Solution
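import glob
import numpy as np

data = []
for filename in sorted(glob.glob('data/random/*.csv')):
    distance, height = np.loadtxt(filename, delimiter=',', unpack=True)
    data.append(distance)   # collect the distance column from each file

# join the per-file arrays into one array, then average all of the values
print(np.mean(np.concatenate(data)))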

Key Points

Use a for loop to process files given a list of their names.

Use glob.glob to find sets of files whose names match a pattern.

Use glob and for to process batches of files.

Writing Functions

Overview

Teaching: 10 min
Exercises: 15 min

Questions

How can I create my own functions?

Objectives

Explain and identify the difference between function definition and function call.

Write a function that takes a small, fixed number of arguments and produces a single result.

Break programs down into functions to make them easier to understand.

Human beings can only keep a few items in working memory at a time.
Understand larger/more complicated ideas by understanding and combining pieces.
- Components in a machine.
- Lemmas when proving theorems.
Functions serve the same purpose in programs.
- Encapsulate complexity so that we can treat it as a single “thing”.
Also enables re-use.
- Write one time, use many times.

Define a function using `def` with a name, parameters, and a block of code.

Begin the definition of a new function with def.
Followed by the name of the function.
- Must obey the same rules as variable names.
Then parameters in parentheses.
- Empty parentheses if the function doesn’t take any inputs.
- We will discuss this in detail in a moment.
Then a colon.
Then an indented block of code.

def print_greeting():
    print('Hello!')

Defining a function does not run it.

Defining a function only creates it; the code inside is not executed yet.
- Like assigning a value to a variable.
Must call the function to execute the code it contains.

print_greeting()

Hello!

Arguments in call are matched to parameters in definition.

Functions are most useful when they can operate on different data.
Specify parameters when defining a function.
- These become variables when the function is executed.
- Are assigned the arguments in the call (i.e., the values passed to the function).
- If you don’t name the arguments when using them in the call, the arguments will be matched to parameters in the order the parameters are defined in the function.

def print_date(year, month, day):
    joined = str(year) + '/' + str(month) + '/' + str(day)
    print(joined)

print_date(1871, 3, 19)

1871/3/19

Or, we can name the arguments when we call the function, which allows us to specify them in any order:

print_date(month=3, day=19, year=1871)

1871/3/19
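If a call supplies too few (or too many) arguments, Python does not run the function body at all and raises a TypeError instead. A small sketch that catches the error so its message can be printed:

```python
def print_date(year, month, day):
    joined = str(year) + '/' + str(month) + '/' + str(day)
    print(joined)

message = None
try:
    print_date(1871, 3)   # day is missing, so the call fails before the body runs
except TypeError as error:
    message = str(error)
    print('TypeError:', message)   # the message names the missing parameter, day
```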

Functions may return a result to their caller using `return`.

Use return ... to give a value back to the caller.
May occur anywhere in the function.
But functions are easier to understand if return occurs:
- At the start to handle special cases.
- At the very end, with a final result.

def average(values):
    if len(values) == 0:
        return None
    return sum(values) / len(values)

a = average([1, 3, 4])
print('average of actual values:', a)

average of actual values: 2.6666666666666665

print('average of empty list:', average([]))

average of empty list: None

Remember: every function returns something.
A function that doesn’t explicitly return a value automatically returns None.

result = print_date(1871, 3, 19)
print('result of call is:', result)

1871/3/19
result of call is: None

Identifying Syntax Errors

1. Read the code below and try to identify what the errors are without running it.
2. Run the code and read the error message. Is it a SyntaxError or an IndentationError?
3. Fix the error.
4. Repeat steps 2 and 3 until you have fixed all the errors.
def another_function
  print("Syntax errors are annoying.")
   print("But at least python tells us about them!")
  print("So they are usually not too hard to fix.")
Solution
def another_function():
    print("Syntax errors are annoying.")
    print("But at least Python tells us about them!")
    print("So they are usually not too hard to fix.")

Definition and Use

What does the following program print?
def report(pressure):
    print('pressure is', pressure)

print('calling', report, 22.5)
Solution
calling <function report at 0x7fd128ff1bf8> 22.5
A function call always needs parentheses. Without them, Python does not call the function, and print shows the function object itself, including its memory address. So, if we wanted to call the function named report and give it the value 22.5 to report on, we could write:
print("calling")
report(22.5)

Order of Operations

The example above:
result = print_date(1871, 3, 19)
print('result of call is:', result)
printed:
1871/3/19
result of call is: None
Explain why the two lines of output appeared in the order they did.

What’s wrong in this example?
result = print_date(1871,3,19)

def print_date(year, month, day):
   joined = str(year) + '/' + str(month) + '/' + str(day)
   print(joined)
Solution

The first line of output (1871/3/19) is from the print function inside print_date(), while the second line is from the print function below the function call. All of the code inside print_date() is executed first, and the program then “leaves” the function and executes the rest of the code.

The problem with the example is that the function is defined after it is called. Python runs the program from top to bottom, so when the call executes, the name print_date does not exist yet and Python raises a NameError.

Encapsulation

Fill in the blanks to create a function that takes a single filename as an argument, loads the data in the file named by the argument, and returns the minimum value in that data.
import pandas

def min_in_data(____):
    data = ____
    return ____
Solution
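import pandas

def min_in_data(filename):
    data = pandas.read_csv(filename)
    # data.min() gives the minimum of each column;
    # data.min().min() would give a single overall minimum
    return data.min()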

Find the First

Fill in the blanks to create a function that takes a list of numbers as an argument and returns the first negative value in the list. What does your function do if the list is empty?
def first_negative(values):
    for v in ____:
        if ____:
            return ____
Solution
def first_negative(values):
    for v in values:
        if v<0:
            return v
If an empty list is passed to this function, it returns None:
my_list = []
print(first_negative(my_list))
None
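The function also returns None when the list has values but none of them is negative, because the loop finishes without reaching return. A quick check; the explicit `return None` at the end is optional and only makes that case visible to readers:

```python
def first_negative(values):
    for v in values:
        if v < 0:
            return v
    # reaching this line means no negative value was found
    return None

print(first_negative([3, -1, -7]))  # -1
print(first_negative([3, 1, 7]))    # None
print(first_negative([]))           # None
```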

Calling by Name

Earlier we saw this function:
def print_date(year, month, day):
    joined = str(year) + '/' + str(month) + '/' + str(day)
    print(joined)
We saw that we can call the function using named arguments, like this:
print_date(day=1, month=2, year=2003)
What does print_date(day=1, month=2, year=2003) print?

When have you seen a function call like this before?

When and why is it useful to call functions this way?

Solution

2003/2/1

We saw examples of using named arguments when working with the pandas library. For example, when reading in a dataset using data = pandas.read_csv('data/gapminder_gdp_europe.csv', index_col='country'), the last argument index_col is a named argument.

Using named arguments can make code more readable since one can see from the function call what name the different arguments have inside the function. It can also reduce the chances of passing arguments in the wrong order, since by using named arguments the order doesn’t matter.

Encapsulation of an If/Print Block

The code below will run on a label-printer for chicken eggs. A digital scale will report a chicken egg mass (in grams) to the computer and then the computer will print a label.

Please re-write the code so that the if-block is folded into a function.
import random
for i in range(10):

    # simulating the mass of a chicken egg
    # the (random) mass will be 70 +/- 20 grams
    mass = 70 + 20.0 * (2.0 * random.random() - 1.0)

    print(mass)

    # egg sizing machinery prints a label
    if mass >= 85:
        print("jumbo")
    elif mass >= 70:
        print("large")
    elif mass < 70 and mass >= 55:
        print("medium")
    else:
        print("small")
The simplified program follows. What function definition will make it functional?
# revised version
import random
for i in range(10):

    # simulating the mass of a chicken egg
    # the (random) mass will be 70 +/- 20 grams
    mass = 70 + 20.0 * (2.0 * random.random() - 1.0)

    print(mass, print_egg_label(mass))
Create a function definition for print_egg_label() that will work with the revised program above. Note, the function’s return value will be significant. Sample output might be 71.23 large.

A dirty egg might have a mass of more than 90 grams, and a spoiled or broken egg will probably have a mass that’s less than 50 grams. Modify your print_egg_label() function to account for these error conditions. Sample output could be 25 too light, probably spoiled.
Solution
def print_egg_label(mass):
    #egg sizing machinery prints a label
    if(mass>=90):
        return("warning: egg might be dirty")
    elif(mass>=85):
        return("jumbo")
    elif(mass>=70):
        return("large")
    elif(mass<70 and mass>=55):
        return("medium")
    elif(mass<50):
        return("too light, probably spoiled")
    else:
        return("small")
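Random masses make it hard to confirm that every branch works. One option is to call the function with a few fixed masses, one per label (the test masses below are our own choice). This sketch uses the same thresholds as the solution, written without parentheses around the conditions and return values:

```python
def print_egg_label(mass):
    """Return a size label (or a warning) for an egg mass in grams."""
    if mass >= 90:
        return "warning: egg might be dirty"
    elif mass >= 85:
        return "jumbo"
    elif mass >= 70:
        return "large"
    elif mass >= 55:
        return "medium"
    elif mass < 50:
        return "too light, probably spoiled"
    else:
        return "small"

# One fixed mass per branch instead of random ones
for mass in [95, 88, 71.23, 60, 52, 25]:
    print(mass, print_egg_label(mass))
```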

Key Points

Break programs down into functions to make them easier to understand.

Define a function using def with a name, parameters, and a block of code.

Defining a function does not run it.

Arguments in call are matched to parameters in definition.

Functions may return a result to their caller using return.

Variable Scope

Overview

Teaching: 10 min
Exercises: 10 min

Questions

How do function calls actually work?

How can I determine where errors occurred?

Objectives

Identify local and global variables.

Identify parameters as local variables.

Read a traceback and determine the file, function, and line number on which the error occurred, the type of error, and the error message.

The scope of a variable is the part of a program that can ‘see’ that variable.

There are only so many sensible names for variables.
People using functions shouldn’t have to worry about what variable names the author of the function used.
People writing functions shouldn’t have to worry about what variable names the function’s caller uses.
The part of a program in which a variable is visible is called its scope.

pressure = 103.9

def adjust(t):
    temperature = t * 1.43 / pressure
    return temperature

pressure is a global variable.
- Defined outside any particular function.
- Visible everywhere.
t and temperature are local variables in adjust.
- Defined in the function.
- Not visible in the main program.
- Remember: a function parameter is a variable that is automatically assigned a value when the function is called.

print('adjusted:', adjust(0.9))
print('temperature after call:', temperature)

adjusted: 0.01238691049085659

Traceback (most recent call last):
  File "/Users/swcarpentry/foo.py", line 8, in <module>
    print('temperature after call:', temperature)
NameError: name 'temperature' is not defined
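Scope works in the other direction too: assigning to a name inside a function creates a new local variable and leaves a global variable with the same name untouched. A sketch (the function name reset is hypothetical):

```python
pressure = 103.9

def reset():
    pressure = 0.0      # creates a local variable that happens to share the name
    return pressure

print('inside reset:', reset())       # inside reset: 0.0
print('global pressure:', pressure)   # global pressure: 103.9
```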

Local and Global Variable Use

Trace the values of all variables in this program as it is executed. (Use ‘—’ as the value of variables before and after they exist.)
limit = 100

def clip(value):
    return min(max(0.0, value), limit)

value = -22.5
print(clip(value))

Reading Error Messages

Read the traceback below, and identify the following:

How many levels does the traceback have?

What is the file name where the error occurred?

What is the function name where the error occurred?

On which line number in this function did the error occur?

What is the type of error?

What is the error message?
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-2-e4c4cbafeeb5> in <module>()
      1 import errors_02
----> 2 errors_02.print_friday_message()

/Users/ghopper/thesis/code/errors_02.py in print_friday_message()
     13
     14 def print_friday_message():
---> 15     print_message("Friday")

/Users/ghopper/thesis/code/errors_02.py in print_message(day)
      9         "sunday": "Aw, the weekend is almost over."
     10     }
---> 11     print(messages[day])
     12
     13

KeyError: 'Friday'

Key Points

The scope of a variable is the part of a program that can ‘see’ that variable.

Conditionals

Overview

Teaching: 10 min
Exercises: 15 min

Questions

How can programs do different things for different data?

Objectives

Correctly write programs that use if and else statements and simple Boolean expressions (without logical operators).

Trace the execution of unnested conditionals and conditionals inside loops.

Use `if` statements to control whether or not a block of code is executed.

An if statement (more properly called a conditional statement) controls whether some block of code is executed or not.
Structure is similar to a for statement:
- First line opens with if and ends with a colon
- Body containing one or more statements is indented (usually by 4 spaces)

mass = 3.54
if mass > 3.0:
    print(mass, 'is large')

mass = 2.07
if mass > 3.0:
    print(mass, 'is large')

3.54 is large

Conditionals are often used inside loops.

Not much point using a conditional when we know the value (as above).
But useful when we have a collection to process.

masses = [3.54, 2.07, 9.22, 1.86, 1.71]
for m in masses:
    if m > 3.0:
        print(m, 'is large')

3.54 is large
9.22 is large

Use `else` to execute a block of code when an `if` condition is not true.

else can be used following an if.
Allows us to specify an alternative to execute when the if branch isn’t taken.

masses = [3.54, 2.07, 9.22, 1.86, 1.71]
for m in masses:
    if m > 3.0:
        print(m, 'is large')
    else:
        print(m, 'is small')

3.54 is large
2.07 is small
9.22 is large
1.86 is small
1.71 is small

Use `elif` to specify additional tests.

May want to provide several alternative choices, each with its own test.
Use elif (short for “else if”) and a condition to specify these.
Always associated with an if.
Must come before the else (which is the “catch all”).

masses = [3.54, 2.07, 9.22, 1.86, 1.71]
for m in masses:
    if m > 9.0:
        print(m, 'is HUGE')
    elif m > 3.0:
        print(m, 'is large')
    else:
        print(m, 'is small')

3.54 is large
2.07 is small
9.22 is HUGE
1.86 is small
1.71 is small

Conditions are tested once, in order.

Python steps through the branches of the conditional in order, testing each in turn.
So ordering matters.

grade = 85
if grade >= 70:
    print('grade is C')
elif grade >= 80:
    print('grade is B')
elif grade >= 90:
    print('grade is A')

grade is C
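Putting the most restrictive test first gives the intended answer. In this sketch the else branch and its 'below C' label are our own addition, so that letter is always defined:

```python
grade = 85
if grade >= 90:
    letter = 'A'
elif grade >= 80:
    letter = 'B'
elif grade >= 70:
    letter = 'C'
else:
    letter = 'below C'
print('grade is', letter)   # grade is B
```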

Does not automatically go back and re-evaluate if values change.

velocity = 10.0
if velocity > 20.0:
    print('moving too fast')
else:
    print('adjusting velocity')
    velocity = 50.0

adjusting velocity

Often use conditionals in a loop to “evolve” the values of variables.

velocity = 10.0
for i in range(5): # execute the loop 5 times
    print(i, ':', velocity)
    if velocity > 20.0:
        print('moving too fast')
        velocity = velocity - 5.0
    else:
        print('moving too slow')
        velocity = velocity + 10.0
print('final velocity:', velocity)

0 : 10.0
moving too slow
1 : 20.0
moving too slow
2 : 30.0
moving too fast
3 : 25.0
moving too fast
4 : 20.0
moving too slow
final velocity: 30.0

Create a table showing variables’ values to trace a program’s execution.

i	velocity before	velocity after
0	10.0	20.0
1	20.0	30.0
2	30.0	25.0
3	25.0	20.0
4	20.0	30.0

The program must have a print statement outside the body of the loop to show the final value of velocity, since its value is updated by the last iteration of the loop.

Compound Relations Using and, or, and Parentheses

Often, you want some combination of things to be true. You can combine relations within a conditional using and and or. Continuing the example above, suppose you have
mass     = [ 3.54,  2.07,  9.22,  1.86,  1.71]
velocity = [10.00, 20.00, 30.00, 25.00, 20.00]

for i in range(5):
    if mass[i] > 5 and velocity[i] > 20:
        print("Fast heavy object.  Duck!")
    elif mass[i] > 2 and mass[i] <= 5 and velocity[i] <= 20:
        print("Normal traffic")
    elif mass[i] <= 2 and velocity[i] <= 20:
        print("Slow light object.  Ignore it")
    else:
        print("Whoa!  Something is up with the data.  Check it")
Just like with arithmetic, you can and should use parentheses whenever there is possible ambiguity. A good general rule is to always use parentheses when mixing and and or in the same condition. That is, instead of:
if mass[i] <= 2 or mass[i] >= 5 and velocity[i] > 20:
write one of these:
if (mass[i] <= 2 or mass[i] >= 5) and velocity[i] > 20:
if mass[i] <= 2 or (mass[i] >= 5 and velocity[i] > 20):
so it is perfectly clear to a reader (and to Python) what you really mean.
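Without parentheses, Python evaluates and before or, so the unbracketed condition means the same as the second grouping. A sketch with one pair of values, chosen for illustration so that the two groupings disagree:

```python
mass, velocity = 1.86, 10.0   # illustrative values

# and binds more tightly than or:
# this means mass <= 2 or (mass >= 5 and velocity > 20)
unbracketed = mass <= 2 or mass >= 5 and velocity > 20
grouped_first = (mass <= 2 or mass >= 5) and velocity > 20
grouped_second = mass <= 2 or (mass >= 5 and velocity > 20)

print(unbracketed, grouped_first, grouped_second)   # True False True
```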

Tracing Execution

What does this program print?

pressure = 71.9
if pressure > 50.0:
    pressure = 25.0
elif pressure <= 50.0:
    pressure = 0.0
print(pressure)

Solution
25.0

Trimming Values

Fill in the blanks so that this program creates a new list containing zeroes where the original list's values were negative and ones where the original list's values were zero or positive.

original = [-1.5, 0.2, 0.4, 0.0, -1.3, 0.4]
result = ____
for value in original:
    if ____:
        result.append(0)
    else:
        ____
print(result)

[0, 1, 1, 1, 0, 1]

Solution

original = [-1.5, 0.2, 0.4, 0.0, -1.3, 0.4]
result = []
for value in original:
    if value<0.0:
        result.append(0)
    else:
        result.append(1)
print(result)

Initializing

Modify this program so that it finds the largest and smallest values in the list no matter what the range of values originally is.

values = [...some test data...]
smallest, largest = None, None
for v in values:
    if ____:
        smallest, largest = v, v
    ____:
        smallest = min(____, v)
        largest = max(____, v)
print(smallest, largest)

What are the advantages and disadvantages of using this method to find the range of the data?

Solution

values = [-2,1,65,78,-54,-24,100]
smallest, largest = None, None
for v in values:
    if smallest is None and largest is None:
        smallest, largest = v, v
    else:
        smallest = min(smallest, v)
        largest = max(largest, v)
print(smallest, largest)

Key Points

Use if statements to control whether or not a block of code is executed.

Conditionals are often used inside loops.

Use else to execute a block of code when an if condition is not true.

Use elif to specify additional tests.

Conditions are tested once, in order.

Create a table showing variables’ values to trace a program’s execution.

Programming Style

Overview

Teaching: 10 min
Exercises: 5 min

Questions

How can I make my programs more readable?

How do most programmers format their code?

How can programs check their own operation?

Objectives

Provide sound justifications for basic rules of coding style.

Refactor one-page programs to make them more readable and justify the changes.

Use Python community coding standards (PEP-8).

Coding style

Coding style helps us to understand the code better. It helps to maintain and change the code. Python relies strongly on coding style, as we may notice by the indentation we apply to lines to define different blocks of code. Python proposes a standard style through one of its first Python Enhancement Proposals (PEP), PEP8, and highlights the importance of readability in the Zen of Python.

We may highlight some points:

document your code
use clear, meaningful variable names
use spaces, not tabs, to indent lines

Follow standard Python style in your code.

PEP8: a style guide for Python that discusses topics such as how you should name variables, how you should use indentation in your code, how you should structure your import statements, etc. Adhering to PEP8 makes it easier for other Python developers to read and understand your code, and to understand what their contributions should look like. The pycodestyle tool (formerly called pep8) can check your code for compliance with PEP8.
Google's Python style guide builds on PEP8 and extends it with more specific conventions for structuring Python code; it may also be worth following.

Use assertions to check for internal errors.

Assertions are a simple, but powerful method for making sure that the context in which your code is executing is as you expect.

def calc_bulk_density(mass, volume):
    '''Return dry bulk density = powder mass / powder volume.'''
    assert volume > 0
    return mass / volume

If the asserted condition is false, the Python interpreter raises an AssertionError runtime exception. The source code for the expression that failed will be displayed as part of the error message. To ignore assertions in your code, run the interpreter with the ‘-O’ (optimize) switch. Assertions should contain only simple checks and never change the state of the program. For example, an assertion should never contain an assignment.
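An assert statement can also carry a message after a comma, which becomes the text of the AssertionError. A sketch that triggers the assertion on purpose and catches it; note that under `python -O` the assert is skipped, so the bad call would reach the division instead:

```python
def calc_bulk_density(mass, volume):
    '''Return dry bulk density = powder mass / powder volume.'''
    # the expression after the comma is used as the error message
    assert volume > 0, 'volume must be positive'
    return mass / volume

print(calc_bulk_density(60, 20))   # 3.0

message = None
try:
    calc_bulk_density(60, 0)
except AssertionError as error:
    message = str(error)
    print('AssertionError:', message)   # AssertionError: volume must be positive
```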

Use docstrings to provide online help.

If the first thing in a function is a character string that is not assigned to a variable, Python attaches it to the function as the online help.
This string is called a docstring (short for “documentation string”).

def average(values):
    "Return average of values, or None if no values are supplied."

    if len(values) == 0:
        return None
    return sum(values) / len(values)

help(average)

Help on function average in module __main__:

average(values)
    Return average of values, or None if no values are supplied.

Multiline Strings

Multiline strings are often used for documentation. They start with three quote characters (either single or double) and end with three matching characters.
"""This string spans
multiple lines.

Blank lines are allowed."""
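Triple-quoted strings are the usual way to write a docstring that needs more than one line. In the sketch below (the function `count_vowels` is made up for illustration), `help` displays the whole docstring, and `inspect.getdoc` from the standard library returns it as a string with the source-code indentation removed.

```python
import inspect


def count_vowels(text):
    """Return the number of vowels in text.

    Upper- and lower-case vowels are both counted.
    """
    return sum(1 for letter in text.lower() if letter in "aeiou")


help(count_vowels)                   # shows the whole multiline docstring
print(inspect.getdoc(count_vowels))  # the docstring with its indentation cleaned up
```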

What Will Be Shown?

Highlight the lines in the code below that will be available as online help. Are there lines that should be made available, but won’t be? Will any lines produce a syntax error or a runtime error?

"Find maximum edit distance between multiple sequences."
# This finds the maximum distance between all sequences.

def overall_max(sequences):
    '''Determine overall maximum edit distance.'''

    highest = 0
    for left in sequences:
        for right in sequences:
            '''Avoid checking sequence against itself.'''
            if left != right:
                this = edit_distance(left, right)
                highest = max(highest, this)

    # Report.
    return highest

Document This

Turn the comment on the following function into a docstring and check that help displays it properly.

def middle(a, b, c):
    # Return the middle value of three.
    # Assumes the values can actually be compared.
    values = [a, b, c]
    values.sort()
    return values[1]

Solution

def middle(a, b, c):
    '''Return the middle value of three.
    Assumes the values can actually be compared.'''
    values = [a, b, c]
    values.sort()
    return values[1]

Clean Up This Code

Read this short program and try to predict what it does.
Run it: how accurate was your prediction?
Refactor the program to make it more readable. Remember to run it after each change to ensure its behavior hasn’t changed.
Compare your rewrite with your neighbor’s. What did you do the same? What did you do differently, and why?

n = 10
s = 'et cetera'
print(s)
i = 0
while i < n:
    # print('at', j)
    new = ''
    for j in range(len(s)):
        left = j-1
        right = (j+1)%len(s)
        if s[left]==s[right]: new += '-'
        else: new += '*'
    s=''.join(new)
    print(s)
    i += 1

Solution

Here’s one solution.

def string_machine(input_string, iterations):
    """
    Takes input_string and generates a new string with -'s and *'s
    corresponding to characters that have identical adjacent characters
    or not, respectively.  Iterates through this procedure with the resultant
    strings for the supplied number of iterations.
    """
    print(input_string)
    old = input_string
    for i in range(iterations):
        new = ''
        # iterate through characters in previous string
        for j in range(len(input_string)):
            left = j-1
            right = (j+1)%len(input_string) # ensure right index wraps around
            if old[left]==old[right]:
                new += '-'
            else:
                new += '*'
        print(new)
        # store new string as old
        old = new

string_machine('et cetera', 10)

et cetera
*****-***
----*-*--
---*---*-
--*-*-*-*
**-------
***-----*
--**---**
*****-***
----*-*--
---*---*-
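One further refactoring step, sketched below, separates computing the strings from printing them. The helper names `next_generation` and `string_generations` are our own. Because the functions return values instead of printing, the result can be checked directly, for example by comparing the first rows with the output above.

```python
def next_generation(text):
    """Return a string with '-' where a character's two neighbours match, '*' otherwise.

    Neighbours wrap around, so the first and last characters are adjacent.
    """
    size = len(text)
    # text[i - 1] wraps to the last character when i == 0
    return "".join(
        "-" if text[i - 1] == text[(i + 1) % size] else "*"
        for i in range(size)
    )


def string_generations(start, iterations):
    """Return a list holding start followed by `iterations` new generations."""
    generations = [start]
    for _ in range(iterations):
        generations.append(next_generation(generations[-1]))
    return generations


for line in string_generations('et cetera', 10):
    print(line)
```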

Key Points

Follow standard Python style in your code.

Use docstrings to provide online help.

Fitting data to models

Overview

Teaching: 50 min
Exercises: 10 min

Questions

How do I fit my data to a scientific model?

Objectives

Import the scipy.optimize library.

Understand the curve_fit function.

Print the results from curve_fit.

Plot the data and the fits from curve_fit.

Data analysis with Python

Many physical systems can be modeled as an equation, which in Python would be represented by a function $f$. If an appropriate function $f$ can be found for an experiment, we can use the equation to determine physical parameters related to the experiment, and we can use this new model to predict new things about the world. Galileo used this method to calculate the trajectory of cannonballs by rolling balls down inclined ramps.

In experimental physics, we constrain these models by designing an experiment with two quantities. The first quantity, that we can control, is the independent variable. The second quantity, that we can measure, is the dependent variable. The relationship between these two quantities can then be used to determine some physical parameters.

A simple example is measuring the path of a moving object. We could guess that the object moves at a constant speed and design an experiment to find that speed using the model:

$$s = ut$$

Scipy provides functions that can fit model functions to data.

Scipy provides a number of functions that, given a suitable model function, can return the best estimate of the unknown parameters in the model.

Consider the experiment where the time of flight of an object moving at constant speed is measured. If the experiment is set up correctly, the only unknown we are trying to determine is the speed $u$. The remaining variables are time $t$ and distance $s$. We can design two different experiments: one where we control time (measuring at fixed intervals) and measure distance, or one where we control distance and measure time.

In Python the model function might be written as:

def distance(time, speed):
    """Calculate the distance travelled at a constant speed for a known time."""
    return speed * time

def model_function(independent_data, parameter1, parameter2):
    """A template for the model function."""
    dependent_data = parameter1 * independent_data + parameter2
    return dependent_data

#control time, measure distance
import numpy
# derr is my estimate of errors measuring distance, my ruler is bad.
derr = 5 # metres
measured_times =numpy.arange(10,100,10) #time in seconds
measured_distances = numpy.array([ 108.2,  220.4,  360.2,  482.8,
        630.6,  793.9,  947.5, 1125.0, 1314.9]) # distance in metres
distance_errors = numpy.ones_like(measured_distances)*derr

For such a simple model, the average speed can be calculated from the data quite easily.

speeds = numpy.diff(measured_distances) / numpy.diff(measured_times)
average_speed = numpy.average(speeds)
print("Average speed is {:.04g} m/s".format(average_speed))
# standard error of the mean of the individual speeds
speed_standard_error = numpy.std(speeds, ddof=1) / numpy.sqrt(speeds.size)

# error propagation: combine the relative distance errors in quadrature (RMS)
speed_error = numpy.sqrt(numpy.mean((distance_errors / measured_distances)**2)) * average_speed
print("Standard error in average speed is {:.03g} m/s".format(speed_standard_error))
print("Error in average speed is {:.03g} m/s".format(speed_error))

Average speed is 15.08 m/s
Standard error in average speed is 0.928 m/s
Error in average speed is 0.281 m/s

# Copied here to make it easier to find!
def distance(time, speed):
    """Calculate the distance travelled at a constant speed for a known time."""
    return speed * time

You can also use scipy.optimize.curve_fit to perform this calculation.

from scipy.optimize import curve_fit

popt, pcov = curve_fit(distance, measured_times, measured_distances)

print("Speed is %4g m/s" % popt[0])

pvar = numpy.diag(pcov)
print("Error in fitted speed is {:.03g} m/s".format(numpy.sqrt(pvar[0])))

    Speed is 13.6645 m/s
    Error in fitted speed is 0.31 m/s

What are `popt`, `pcov`, and `pvar`?

popt is a one-dimensional array of the best estimates of the parameter values, in the same order as the parameters appear in the function definition.
pcov is the covariance matrix, showing the uncertainty and interdependence of each parameter in popt. Its diagonal elements, stored above as pvar, are the variances of the parameters in popt, so their square roots are the parameter errors.
The error calculated above did not correctly account for the errors in the individual data points.
Give curve_fit the error values using the sigma keyword, and always use absolute_sigma=True so that they are treated as absolute uncertainties rather than relative weights.
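A short sketch of reading `pcov`, using a made-up straight-line model and data rather than the experiment above. The names `straight_line`, `x_demo`, `y_demo`, `popt_line` and `pcov_line` are hypothetical and chosen so they do not overwrite the lesson's variables. The square roots of the diagonal give one standard error per parameter, and dividing `pcov` by the products of those errors gives correlation coefficients between -1 and 1.

```python
import numpy
from scipy.optimize import curve_fit


def straight_line(x, slope, intercept):
    """Hypothetical linear model used only for this illustration."""
    return slope * x + intercept


# Made-up data: slope 2, intercept 1, plus a small alternating offset.
x_demo = numpy.arange(1.0, 11.0)
y_demo = 2.0 * x_demo + 1.0 + 0.1 * (-1.0) ** x_demo
y_demo_errors = numpy.full_like(x_demo, 0.1)

popt_line, pcov_line = curve_fit(straight_line, x_demo, y_demo,
                                 sigma=y_demo_errors, absolute_sigma=True)

perr_line = numpy.sqrt(numpy.diag(pcov_line))                 # one standard error per parameter
correlation = pcov_line / numpy.outer(perr_line, perr_line)  # 1 on the diagonal

print("slope = {:.3f} +/- {:.3f}".format(popt_line[0], perr_line[0]))
print("intercept = {:.3f} +/- {:.3f}".format(popt_line[1], perr_line[1]))
print(correlation)
```

For a straight line through points with positive x values, the slope and intercept come out negatively correlated: a slightly steeper line needs a slightly lower intercept to pass through the same data.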

Exercise 1

Predict the distance travelled after 10 s and after 100 s.

Calculate predictions using the model function

curve_fit needs a model function to make predictions.
Any calculations using that model should also use the function, to avoid errors.
For example, when plotting the predictions you should call the model function rather than rewrite the equation.

# Always predict with the model function!
d10 = distance(10, popt[0])
d100 = distance(100, popt[0])
print("After 10 seconds, predicted distance = {:.4g}m".format(d10))
print("After 100 seconds, predicted distance = {:.4g}m".format(d100))

#dont_do_this
rewrite10 = popt[0] * 10
print("After 10 seconds, predicted distance = {:.4g}m".format(rewrite10))

#or this
hardcoded10 = 13.64 * 10
print("After 10 seconds, predicted distance = {:.4g}m".format(hardcoded10))

    After 10 seconds, predicted distance = 136.6m
    After 100 seconds, predicted distance = 1366m
    After 10 seconds, predicted distance = 136.6m
    After 10 seconds, predicted distance = 136.4m

popt, pcov = curve_fit(distance, measured_times, measured_distances,
                       absolute_sigma=True, sigma = distance_errors)
pvar = numpy.diag(pcov)

print("Average speed is {:.04g} m/s".format(popt[0]))
print("Error in fitted speed is {:.03g} m/s".format(numpy.sqrt(pvar[0])))

    Average speed is 13.66 m/s
    Error in fitted speed is 0.0296 m/s

With the correct error estimates, the fit is more certain about the speed, but the estimate of the average speed did not change.

The model function needs to follow the `curve_fit` rules

The function must take an array of independent data as its first argument.
The function can take any number of additional parameters, which curve_fit will find.
The function must return one prediction of the dependent data for each value in the independent data.

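def good_model_function(xdata, parameter_1, parameter_2, parameter_3):
    # example model (a quadratic): one prediction for each value in xdata
    prediction = parameter_1 + parameter_2 * xdata + parameter_3 * xdata**2
    return prediction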

`curve_fit` works with multiple parameters

Extending the above experiment, what if the object was actually accelerating? The model function is now

$$s = ut + \frac{1}{2} a t^2$$

where $a$ is the acceleration. We can change the model function and run the curve_fit code again

def distance_with_acceleration(time, speed, acceleration):
    """Calculate the distance travelled in a known time, starting at a given speed
    and moving with constant acceleration."""
    return speed * time + 0.5 * acceleration * time**2

from scipy.optimize import curve_fit
popt2, pcov2 = curve_fit(distance_with_acceleration, measured_times, measured_distances,
                       absolute_sigma=True, sigma = distance_errors)
print("Initial speed is {:.04g} m/s".format(popt2[0]))
print("Error in fitted initial speed is {:.03g} m/s".format(numpy.sqrt(pcov2[0,0])))

print("Acceleration is {:.04g} m/s2".format(popt2[1]))
print("Error in fitted acceleration is {:.03g} m/s2".format(numpy.sqrt(pcov2[1,1])))

    Initial speed is 10.26 m/s
    Error in fitted initial speed is 0.119 m/s
    Acceleration is 0.09589 m/s2
    Error in fitted acceleration is 0.00325 m/s2

The data used here are fake, generated with an initial speed of 10.86 m/s and an acceleration of 0.1 $m/s^2$. The constant-speed model predicted a higher speed to compensate for the acceleration!

Exercise 1

How could we have quickly checked whether our model was good?

A plot would have quickly shown that the linear model is not correct; for small amounts of data, printing each predicted value might also tell us something.

%matplotlib inline
import matplotlib.pyplot as plt
plt.style.use("seaborn-v0_8-whitegrid")  # this style was called "seaborn-whitegrid" before Matplotlib 3.6
plt.figure(figsize=(8,6))
plt.errorbar(measured_times, measured_distances,yerr=distance_errors, marker='o', linestyle='none', label="measured data")
plt.plot(measured_times, distance(measured_times, numpy.mean(speeds)),label='simple average')
plt.plot(measured_times, distance(measured_times, popt[0]),label='$s=ut$')
plt.plot(measured_times, distance_with_acceleration(measured_times, popt2[0],popt2[1]),label=r'$s=ut+\frac{1}{2}at^2$')
plt.legend(fontsize=14)
plt.xlabel("Time (s)")
plt.ylabel("Distance (m)")


Always plot your data and model fits.

Plotting residuals

Once you have a model prediction, you can check for problems in the model using a residual plot. Plot the difference between the model prediction and the measured data.

%matplotlib inline
import matplotlib.pyplot as plt
plt.style.use("seaborn-v0_8-whitegrid")  # this style was called "seaborn-whitegrid" before Matplotlib 3.6
plt.figure(figsize=(8,6))
plt.plot(measured_times, distance(measured_times, numpy.mean(speeds))-measured_distances,label='simple average',marker='o',ls='')
plt.plot(measured_times, distance(measured_times, popt[0])-measured_distances,label='$s=ut$',marker='s',ls='')
plt.plot(measured_times, distance_with_acceleration(measured_times, popt2[0],popt2[1])-measured_distances,label=r'$s=ut+\frac{1}{2}at^2$',marker='<',ls='')
plt.legend(fontsize=14)
plt.xlabel("Time (s)")
plt.ylabel("Residual, model minus data (m)")


In this example you can see that the linear models (simple average and $s=ut$) deviate from the data in a way that depends on the independent variable (time). If the model fit is good, we expect to see differences between model and data that are random in magnitude and location (as in the quadratic fit).

`curve_fit` finds the best estimate of the parameters by minimizing chi squared.

curve_fit works by finding the combination of parameters that gives the lowest value of a quantity $\chi^2$, defined as

$$\chi^2 = \sum_i \frac{(y_i - f(x_i))^2}{\sigma_{y_i}^2}$$

The lower the value of $\chi^2$, the closer the model is on average to each measured data point.
This metric penalizes outliers disproportionately because the differences are squared.
The metric weights each squared difference by the inverse of the variance $\sigma_{y_i}^2$, so a deviation at a genuinely noisier point is penalized less than the same deviation at a more precise point.
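To see that the result of `curve_fit` sits at a minimum of $\chi^2$ when `sigma` is supplied, the sketch below fits a made-up one-parameter model $y = kx$ and evaluates $\chi^2$ at the fitted value and 1% either side. The function and variable names are hypothetical.

```python
import numpy
from scipy.optimize import curve_fit


def proportional(x, k):
    """Hypothetical one-parameter model y = k * x."""
    return k * x


def chi_squared(y_measured, y_model, y_errors):
    """Sum of squared, error-weighted differences between data and model."""
    return numpy.sum(((y_measured - y_model) / y_errors) ** 2)


x_demo = numpy.array([1.0, 2.0, 3.0, 4.0, 5.0])
y_demo = numpy.array([2.1, 3.9, 6.2, 7.8, 10.1])  # made-up measurements
err_demo = numpy.array([0.1, 0.1, 0.2, 0.2, 0.3])

(k_best,), _ = curve_fit(proportional, x_demo, y_demo,
                         sigma=err_demo, absolute_sigma=True)

# Moving k away from the fitted value in either direction should raise chi squared.
for k in (0.99 * k_best, k_best, 1.01 * k_best):
    chi2_k = chi_squared(y_demo, proportional(x_demo, k), err_demo)
    print("k = {:.4f}, chi2 = {:.3f}".format(k, chi2_k))
```

For this particular model the weighted least-squares answer can also be written directly as $k = \sum_i (x_i y_i / \sigma_i^2) / \sum_i (x_i^2 / \sigma_i^2)$, which should agree with `k_best`.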

Reduced chi squared is easier to understand and compare between data sets.

The value of $\chi^2$ for a good model depends on the number of data points and model parameters.
A related quantity, the reduced chi squared $\chi_r^2 = \frac{\chi^2}{\mathrm{dof}}$, is defined so that its ideal value is about 1.0.
To calculate it we need the number of degrees of freedom (dof), defined as the number of data points ($N$) minus the number of unknown parameters ($m$): $\mathrm{dof} = N - m$.
High values of $\chi_r^2$ are bad and suggest the model does a poor job of fitting the data.
Low values ($\ll 1$) are also bad, suggesting the model fits the data too well.
A low value suggests the model is fitting the data better than the errors in the data should allow, for example because the errors have been overestimated.
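Why is the ideal value about 1? When the model is the true one and the errors $\sigma_{y_i}$ correctly describe the scatter of the data, each term in the $\chi^2$ sum has an average value of 1, so $\chi^2$ averages to the number of degrees of freedom. The simulation below is a sketch with made-up numbers: it generates data scattered around a known line, compares the data with that line directly (no parameters are fitted, so $\mathrm{dof} = N$), and repeats this many times. It also shows what happens if the same data are judged with errors that are twice too large.

```python
import numpy

rng = numpy.random.default_rng(seed=1)  # seeded so the numbers are repeatable

n_points = 20
x_sim = numpy.linspace(0.0, 10.0, n_points)
true_y = 3.0 * x_sim + 2.0             # made-up "true" model
sigma_sim = numpy.full(n_points, 0.5)  # assumed measurement error on every point

reduced = []             # chi2 / dof using the correct errors
reduced_big_errors = []  # chi2 / dof if every error were overestimated by a factor of 2
for trial in range(1000):
    # simulated measurements whose scatter matches sigma_sim
    measured = true_y + rng.normal(0.0, sigma_sim)
    chi2_value = numpy.sum(((measured - true_y) / sigma_sim) ** 2)
    # nothing was fitted here, so m = 0 and dof = N
    reduced.append(chi2_value / n_points)
    # doubling every sigma divides chi2 by 4
    reduced_big_errors.append(chi2_value / 4 / n_points)

print("mean chi2r, correct errors:      {:.2f}".format(numpy.mean(reduced)))
print("mean chi2r, errors 2x too large: {:.2f}".format(numpy.mean(reduced_big_errors)))
```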

def chi2(y_measure,y_predict,errors):
    """Calculate the chi squared value given a measurement with errors and prediction"""
    return numpy.sum( (y_measure - y_predict)**2 / errors**2 )

def chi2reduced(y_measure, y_predict, errors, number_of_parameters):
    """Calculate the reduced chi squared value given a measurement with errors and prediction,
    and knowing the number of parameters in the model."""
    return chi2(y_measure, y_predict, errors)/(y_measure.size - number_of_parameters)

print("Constant velocity model chi2r=",chi2reduced(measured_distances,
                                        distance(measured_times,popt[0]),
                                        distance_errors,
                                        1))

print("Constant acceleration model chi2r=",chi2reduced(measured_distances,
                                        distance_with_acceleration(measured_times,popt2[0],popt2[1]),
                                        distance_errors,
                                        2))

    Constant velocity model chi2r= 109.63987561403505
    Constant acceleration model chi2r= 1.1810606671759618

Chi square values should be rounded to a small number of digits, keeping only 1 or 2 significant figures.

print("Constant velocity model chi2r=",round(chi2reduced(measured_distances,
                                        distance(measured_times,popt[0]),
                                        distance_errors,
                                        1),-1))

print("Constant acceleration model chi2r=",round(chi2reduced(measured_distances,
                                        distance_with_acceleration(measured_times,popt2[0],popt2[1]),
                                        distance_errors,
                                        2),1))

    Constant velocity model chi2r= 110.0
    Constant acceleration model chi2r= 1.2
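The `round` calls above need a different second argument for each value (`-1` for 110, `1` for 1.2). A small helper can work that out from the size of the number; `round_sig` below is a hypothetical name, not part of Python or NumPy.

```python
import math


def round_sig(value, sig_figs=2):
    """Round value to sig_figs significant figures (hypothetical helper)."""
    if value == 0:
        return 0.0
    # Position of the leading digit, e.g. 2 for 109.6 and 0 for 1.18.
    leading = math.floor(math.log10(abs(value)))
    return round(value, sig_figs - 1 - leading)


print(round_sig(109.63987561403505))  # 110.0
print(round_sig(1.1810606671759618))  # 1.2
```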

Exercise 2

Put a print statement inside the model function distance_with_acceleration to print out the parameter values. What is happening to the parameter values?

def distance_with_acceleration_print(time, speed, acceleration):
    """Calculate the distance travelled in a known time, starting at a given speed
    and moving with constant acceleration."""

    print ("speed=",speed, "acceleration=",acceleration)
    return speed * time + 0.5 * acceleration * time**2

popt2, pcov2 = curve_fit(distance_with_acceleration_print, measured_times, measured_distances,
                       absolute_sigma=True, sigma = distance_errors)

    speed= 1.0 acceleration= 1.0
    speed= 1.0 acceleration= 1.0
    speed= 1.0 acceleration= 1.0
    speed= 1.0000000149011612 acceleration= 1.0
    speed= 1.0 acceleration= 1.0000000149011612
    speed= 10.257717023193093 acceleration= 0.0958943850247661
    speed= 10.257717176044988 acceleration= 0.0958943850247661
    speed= 10.257717023193093 acceleration= 0.0958943864537038
    speed= 10.257717023531002 acceleration= 0.09589438501373611

Non-linear regression

Mathematically, curve_fit uses least-squares regression to find the best parameter estimates.
curve_fit also works with non-linear models (e.g. $y=at^{(b-1)}+c$), and the error estimates are usually reasonable.

When fitting non-linear functions, use the `p0` keyword to start `curve_fit` with a good estimate

p0 is used to provide a first guess of the parameters you are trying to find
If you have some idea of a parameter value, use p0 to give curve_fit a better chance of finding the global minimum error for non-linear functions
Don’t be too precise so as not to bias the fitting process.

iteration=0

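# verbose is keyword-only (it comes after *), so curve_fit does not treat it as a fit parameter
def nonlinear_function(t, a, b, c, *, verbose=True):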
    global iteration
    if verbose:
        print (iteration, "a=",a, "b=",b, "c=",c)
    iteration = iteration+1
    return a*t**(b-1) + c

#generated "good" data
t=numpy.arange(10)
y=numpy.array([-0.173, 2.12, 9.42, 19.69, 37.16, 59.40, 96.59, 119.448, 158.0,201.9])
sigmaNL = numpy.ones(10)*0.5

First, try fitting the non-linear function with no initial guess

iteration=0
poptNL1, pcovNL1 = curve_fit(nonlinear_function, t, y,
                       absolute_sigma=True, sigma = sigmaNL)

a= 1.0 b= 1.0 c= 1.0
a= 1.0 b= 1.0 c= 1.0
a= 1.0 b= 1.0 c= 1.0
a= 1.0000000149011612 b= 1.0 c= 1.0
a= 1.0 b= 1.0000000149011612 c= 1.0
a= 1.0 b= 1.0 c= 1.0000000149011612
a= 77.19199892187382 b= 1.000001167729559 c= 1.0
a= 77.19200007212423 b= 1.000001167729559 c= 1.0
a= 77.19199892187382 b= 1.0000011826307376 c= 1.0
a= 77.19199892187382 b= 1.000001167729559 c= 1.0000000149011612
...
a= 2.5074171106029874 b= 2.9990317544021594 c= -0.9734594072738433
a= 2.507417147966414 b= 2.9990317544021594 c= -0.9734594072738433
a= 2.5074171106029874 b= 2.999031799091215 c= -0.9734594072738433
a= 2.5074171106029874 b= 2.9990317544021594 c= -0.9734593927681677
a= 2.5074210685973637 b= 2.999031031902325 c= -0.9734725519528605

Try a good guess for the parameters

iteration = 0
poptNL2, pcovNL2 = curve_fit(nonlinear_function, t, y,
                       absolute_sigma=True, sigma = sigmaNL, p0=(2.5,3,0))
#I think it's 2.5*t^2 with no offset

a= 2.5 b= 3.0 c= 0.0
a= 2.5 b= 3.0 c= 0.0
a= 2.5 b= 3.0 c= 0.0
a= 2.500000037252903 b= 3.0 c= 0.0
a= 2.5 b= 3.0000000447034836 c= 0.0
a= 2.5 b= 3.0 c= 1.4901161193880158e-08
a= 2.507540116653946 b= 2.9990074809599334 c= -0.973917163330992
a= 2.5075401540192055 b= 2.9990074809599334 c= -0.973917163330992
a= 2.507540116653946 b= 2.9990075256486275 c= -0.973917163330992
a= 2.507540116653946 b= 2.9990074809599334 c= -0.9739171488184953
a= 2.5074184226341583 b= 2.9990315172382234 c= -0.9734643979860024
a= 2.5074184599976044 b= 2.9990315172382234 c= -0.9734643979860024
a= 2.5074184226341583 b= 2.9990315619272754 c= -0.9734643979860024
a= 2.5074184226341583 b= 2.9990315172382234 c= -0.9734643834802524
a= 2.5074209783416057 b= 2.9990310475838156 c= -0.9734720313746336

Now try an unreasonable guess for the b parameter

iteration = 0
poptNL3, pcovNL3 = curve_fit(nonlinear_function, t, y,
                       absolute_sigma=True, sigma = sigmaNL, p0=(3,-2,0.1))
#I think it's 3/t^3 +0.1

    0 a= 3.0 b= -2.0 c= 0.1
    1 a= 3.0 b= -2.0 c= 0.1
    2 a= 3.0 b= -2.0 c= 0.1
    3 a= 3.0000000447034836 b= -2.0 c= 0.1
    4 a= 3.0 b= -1.9999999701976776 c= 0.1
    5 a= 3.0 b= -2.0 c= 0.10000000149011612


    <ipython-input-16-1520d182c2d1>:7: RuntimeWarning: divide by zero encountered in power
      return a*t**(b-1) + c
    /Users/lee/anaconda3/lib/python3.8/site-packages/scipy/optimize/minpack.py:828: OptimizeWarning: Covariance of the parameters could not be estimated
      warnings.warn('Covariance of the parameters could not be estimated',

It’s always important to check the fit

plt.figure(figsize=(8,6))
plt.errorbar(t,
             y,
             yerr=sigmaNL, marker='o',ls='none',label="Data")

def plot_and_print(popt,ls,label):
    plt.plot(t, nonlinear_function(t,popt[0],popt[1],popt[2]),label=label,ls=ls,lw=3)
plot_and_print(poptNL1,"-","No guess")
plot_and_print(poptNL2,"--","good guess")
plot_and_print(poptNL3,":","Bad guess")

plt.legend()
plt.xlabel("Time")
plt.ylabel("Value")
plt.figure(figsize=(8,6))



def plot_residual(data, popt,marker,label):
    plt.plot(t, nonlinear_function(t,popt[0],popt[1],popt[2],verbose=False)-data,label=label,marker=marker,ls='',lw=3)
plot_residual(y,poptNL1,"o","No guess")
plot_residual(y,poptNL2,"s","good guess")
plot_residual(y,poptNL3,"<","Bad guess")

plt.legend()
plt.setp(plt.gca(),ylabel="Residual",xlabel="Time (s)")

a= 2.5074210685973637 b= 2.999031031902325 c= -0.9734725519528605
a= 2.5074209783416057 b= 2.9990310475838156 c= -0.9734720313746336
a= 3.0 b= -2.0 c= 0.1


Key Points

scipy provides tools and functions to fit models to data.

Use curve_fit to fit linear and non-linear models to experimental data

Use appropriate errors in the sigma keyword to get a better estimate of parameter errors.

Check the fit using a plot if possible

Check the χ² value to compare the fit against the errors in the measurements.

Non-linear models can be fitted, but may need an initial estimate of the parameters.

Wrap-Up

Overview

Teaching: 0 min
Exercises: 0 min

Questions

What have we learned?

What else is out there and where do I find it?

Objectives

Name and locate scientific Python community sites for software, workshops, and help.

Python supports a large community within and outwith research.

The Python 3 documentation covers the core language and the standard library.
PyCon is the largest annual conference for the Python community.
SciPy is a rich collection of scientific utilities. It is also the name of a series of annual conferences.
Jupyter is the home of the Jupyter Notebook.
Pandas is the home of the Pandas data library.
Stack Overflow’s general Python section can be helpful, as can the sections on NumPy, SciPy, Pandas, and other topics.

Key Points

Python supports a large community within and outwith research.

PHY224 Python Review

Running and Quitting

Overview

Python programs are plain text files.

Use the Jupyter Notebook for editing and running Python.

Spyder

PyCharm

Jupyter notebook

Math with numbers

Math with strings (?)

Operators

Logic

Key Points

Variables and Assignment

Overview

Use variables to store values.

Use print to display values.

Variables must be created before they are used.

Variables can be used in calculations.

Use an index to get a single character from a string.

Use a slice to get a substring.

Use the built-in function len to find the length of a string.

Python is case-sensitive.

Use meaningful variable names.
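A few of these points in code, as a minimal sketch (the variable names are made up for illustration):

```python
element = "oxygen"
print(element[0])     # o: index 0 is the first character
print(element[0:3])   # oxy: the slice stops before index 3
print(len(element))   # 6

# Python is case-sensitive: Element and element are different variables.
Element = "OXYGEN"
print(element == Element)  # False
```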

Predicting Values

Solution

Challenge

Solution

Choosing a Name

Solution

Slicing

Key Points

Data Types and Type Conversion

Overview

Every value has a type.

Use the built-in function type to find the type of a value.

Types control what operations (or methods) can be performed on a given value.

You can use the “+” and “*” operators on strings.

Strings have a length (but numbers don’t).

Must convert numbers to strings or vice versa when operating on them.

Can mix integers and floats freely in operations.

Variables only change value when something is assigned to them.
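The type rules above can be checked quickly in a notebook cell. A minimal sketch with made-up values:

```python
print(type(52))              # <class 'int'>
print(type(3.14))            # <class 'float'>
print(type("52"))            # <class 'str'>
print("ab" + "cd", "ab" * 3) # abcd ababab
print(1 + int("2"))          # 3: convert the string before adding
print(str(1) + "2")          # 12: convert the number before joining strings
print(1 + 2.0)               # 3.0: ints and floats mix, the result is a float

first = 1
second = 5 * first
first = 2
print(first, second)         # 2 5: second keeps the value it was given
```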

Fractions

Solution

Automatic Type Conversion

Solution

Choose a Type

Solution

Division Types

Solution

Strings to Numbers

Solution

Arithmetic with Different Types

Solution

Complex Numbers

Solution

Key Points

Built-in Functions and Help

Overview

Use comments to add documentation to programs.

A function may take zero or more arguments.

Commonly-used built-in functions include max, min, and round.

Functions may only work for certain (combinations of) arguments.

Functions may have default values for some arguments.

Use the built-in function help to get help for a function.

Python reports a syntax error when it can’t understand the source of a program.

Python reports a runtime error when something goes wrong while a program is executing.

The Jupyter Notebook has two ways to get help.

Every function returns something.
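Some of these points in code, as a minimal sketch using only built-in functions:

```python
print(max(1, 7, 3))        # 7
print(round(3.14159))      # 3: with no second argument, round returns an int
print(round(3.14159, 2))   # 3.14: the number of digits is an optional argument

try:
    max(1, "a")            # max cannot compare a number with a string
except TypeError as error:
    print("TypeError:", error)

result = print("hello")    # print displays text...
print(result)              # ...but what it returns is None
```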

What Happens When

Solution

Spot the Difference

Solution

Why Not?

Solution

Last Character of a String

Solution

Key Points

Libraries

Overview

Use `print` to display values.

Use the built-in function `len` to find the length of a string.

Use the built-in function `type` to find the type of a value.

Commonly-used built-in functions include `max`, `min`, and `round`.

Use the built-in function `help` to get help for a function.

Use `help` to learn about the contents of a library module.

Use `numpy.zeros` to create arrays filled with zeros

Use `numpy.ones` to create an array of ones.

Use `numpy.arange` to generate evenly spaced sequences of numbers

Numpy arrays have a `shape`

The Numpy library has many functions that work on `arrays`

Check the `numpy` help and webpage for more functions

Use the `axis` keyword to use the function over a subset of the data.

Use the `scipy` library for common scientific and numerical methods

Use the Numpy package to load data with the `loadtxt` function

Remember your data has the shape `ROWS X COLUMNS`

Split the data into variables using the `unpack` keyword

Save data with `numpy.savetxt`

Control the data format with the `fmt` keyword

Add a header string with `header`

`matplotlib` is the most widely used scientific plotting library in Python

More complex formatting can be achieved using the `plot` keywords

Plot data with associated uncertainties using `errorbar`

`matplotlib` also makes bar charts and histograms

Define the figure size before plotting using the `figure` command

Place multiple plots in one figure with `subplot`

Figures can be saved with `savefig`
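The NumPy points above fit together in a short sketch. The data values and the file name `example_data.txt` are made up, and the file is written to the system's temporary directory so it does not clutter the working folder.

```python
import os
import tempfile

import numpy

zeros = numpy.zeros(3)          # three zeros: 0.0, 0.0, 0.0
ones = numpy.ones((2, 3))       # 2 rows x 3 columns, every entry 1.0
steps = numpy.arange(0, 10, 2)  # 0, 2, 4, 6, 8 (the stop value 10 is excluded)
print(ones.shape)               # (2, 3)

# 3 rows x 2 columns: first column is time, second is distance
data = numpy.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
print(numpy.mean(data, axis=0))  # mean of each column: 2.0 and 20.0
print(numpy.mean(data, axis=1))  # mean of each row: 5.5, 11.0 and 16.5

# Save with a fixed number format and a header line (written as a '#' comment).
filename = os.path.join(tempfile.gettempdir(), "example_data.txt")
numpy.savetxt(filename, data, fmt="%.2f", header="time distance")

# Load it back: '#' lines are skipped and unpack=True gives one array per column.
loaded_time, loaded_distance = numpy.loadtxt(filename, unpack=True)
print(loaded_time, loaded_distance)
```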