PHY224 Python Review

Running and Quitting

Overview

Teaching: 10 min
Exercises: 0 min
Questions
  • How can I run Python programs?

Objectives
  • Launch the Jupyter Notebook, create new notebooks, and exit the Notebook.

  • Create Markdown cells in a notebook.

  • Create and run Python cells in a notebook.

Python programs are plain text files.

Use the Jupyter Notebook for editing and running Python.

Spyder

You can also use a Python Integrated Development Environment Example Spyder

PyCharm

You can also use a Python Integrated Development Environment PyCharm

Jupyter notebook

Example Jupyter Notebook

Math with numbers

2+3
5
2*2
4
9.81*10.2**2/2
510.3162
3 ** 2
9

Math with strings (?)

"cat " + "dog"
'cat dog'
"cat "*3 + "dog"
'cat cat cat dog'
"cat " - "dog"
...
TypeError: unsupported operand type(s) for -: 'str' and 'str'

Operators

6 + 7
13
6 * 7
42
6 / 7 
0.8571428571428571
6 ** 7 # power
279936
66 % 7  # Modulo (remainder)
3

Logic

6 < 7
True
6 <= 7 
True
6 == 7 
False
6 >= 7 
False
6 > 7 
False
True and False 
False
True or False
True
not False
True

Key Points

  • Python programs are plain text files.

  • Use the Jupyter Notebook for editing and running Python.

  • The Notebook has Command and Edit modes.

  • Use the keyboard and mouse to select and edit cells.

  • The Notebook will turn Markdown into pretty-printed documentation.

  • Markdown does most of what HTML does.


Variables and Assignment

Overview

Teaching: 10 min
Exercises: 10 min
Questions
  • How can I store data in programs?

Objectives
  • Write programs that assign scalar values to variables and perform calculations with those values.

  • Correctly trace value changes in programs that use scalar assignment.

Use variables to store values.

age = 42
first_name = 'Ahmed'

Use print to display values.

print(first_name, 'is', age, 'years old')
Ahmed is 42 years old

Variables must be created before they are used.

print(last_name)
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-1-c1fbb4e96102> in <module>()
----> 1 print(last_name)

NameError: name 'last_name' is not defined

Variables can be used in calculations.

age = age + 3
print('Age in three years:', age)
Age in three years: 45

Use an index to get a single character from a string.

atom_name = 'helium'
print(atom_name[0])
h

Use a slice to get a substring.

atom_name = 'sodium'
print(atom_name[0:3])
sod

Use the built-in function len to find the length of a string.

print(len('helium'))
6

Python is case-sensitive.

Use meaningful variable names.

flabadab = 42
ewr_422_yY = 'Ahmed'
print(ewr_422_yY, 'is', flabadab, 'years old')

Predicting Values

What is the final value of position in the program below? (Try to predict the value without running the program, then check your prediction.)

initial = 'left'
position = initial
initial = 'right'

Solution

'left'

The initial variable is assigned the value ‘left’. In the second line, the position variable also receives the string value ‘left’. In third line, the initial variable is given the value ‘right’, but the position variable retains its string value of ‘left’.

Challenge

If you assign a = 123, what happens if you try to get the second digit of a via a[1]?

Solution

Numbers are not stored in the written representation, so they can’t be treated like strings.

a = 123
print(a[1])
TypeError: 'int' object is not subscriptable

Choosing a Name

Which is a better variable name, m, min, or minutes? Why? Hint: think about which code you would rather inherit from someone who is leaving the lab:

  1. ts = m * 60 + s
  2. tot_sec = min * 60 + sec
  3. total_seconds = minutes * 60 + seconds

Solution

minutes is better because min might mean something like “minimum” (and actually does in Python, but we haven’t seen that yet).

Slicing

What does the following program print?

atom_name = 'carbon'
print('atom_name[1:3] is:', atom_name[1:3])
atom_name[1:3] is: ar
  1. What does thing[low:high] do?
  2. What does thing[low:] (without a value after the colon) do?
  3. What does thing[:high] (without a value before the colon) do?
  4. What does thing[:] (just a colon) do?
  5. What does thing[number:negative-number] do?
  6. What happens when you choose a high value which is out of range? (i.e., try atom_name[0:15])

Key Points

  • Use variables to store values.

  • Use print to display values.

  • Variables persist between cells.

  • Variables must be created before they are used.

  • Variables can be used in calculations.

  • Use an index to get a single character from a string.

  • Use a slice to get a substring.

  • Use the built-in function len to find the length of a string.

  • Python is case-sensitive.

  • Use meaningful variable names.


Data Types and Type Conversion

Overview

Teaching: 5 min
Exercises: 5 min
Questions
  • What kinds of data do programs store?

  • How can I convert one type to another?

Objectives
  • Explain key differences between integers and floating point numbers.

  • Explain key differences between numbers and character strings.

  • Use built-in functions to convert between integers, floating point numbers, and strings.

Every value has a type.

Use the built-in function type to find the type of a value.

print(type(52))
<class 'int'>
fitness = 'average'
print(type(fitness))
<class 'str'>

Types control what operations (or methods) can be performed on a given value.

print(5 - 3)
2
print('hello' - 'h')
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-2-67f5626a1e07> in <module>()
----> 1 print('hello' - 'h')

TypeError: unsupported operand type(s) for -: 'str' and 'str'

You can use the “+” and “*” operators on strings.

full_name = 'Ahmed' + ' ' + 'Walsh'
print(full_name)
Ahmed Walsh
separator = '=' * 10
print(separator)
==========

Strings have a length (but numbers don’t).

print(len(full_name))
11
print(len(52))
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-3-f769e8e8097d> in <module>()
----> 1 print(len(52))

TypeError: object of type 'int' has no len()

Must convert numbers to strings or vice versa when operating on them.

print(1 + '2')
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-4-fe4f54a023c6> in <module>()
----> 1 print(1 + '2')

TypeError: unsupported operand type(s) for +: 'int' and 'str'
print(1 + int('2'))
print(str(1) + '2')
3
12

Can mix integers and floats freely in operations.

print('half is', 1 / 2.0)
print('three squared is', 3.0 ** 2)
half is 0.5
three squared is 9.0

Variables only change value when something is assigned to them.

first = 1
second = 5 * first
first = 2
print('first is', first, 'and second is', second)
first is 2 and second is 5

Fractions

What type of value is 3.4? How can you find out?

Solution

It is a floating-point number (often abbreviated “float”).

print(type(3.4))
<class 'float'>

Automatic Type Conversion

What type of value is 3.25 + 4?

Solution

It is a float: integers are automatically converted to floats as necessary.

result = 3.25 + 4
print(result, 'is', type(result))
7.25 is <class 'float'>

Choose a Type

What type of value (integer, floating point number, or character string) would you use to represent each of the following? Try to come up with more than one good answer for each problem. For example, in # 1, when would counting days with a floating point variable make more sense than using an integer?

  1. Number of days since the start of the year.
  2. Time elapsed from the start of the year until now in days.
  3. Serial number of a piece of lab equipment.
  4. A lab specimen’s age
  5. Current population of a city.
  6. Average population of a city over time.

Solution

The answers to the questions are:

  1. Integer, since the number of days would lie between 1 and 365.
  2. Floating point, since fractional days are required
  3. Character string if serial number contains letters and numbers, otherwise integer if the serial number consists only of numerals
  4. This will vary! How do you define a specimen’s age? whole days since collection (integer)? date and time (string)?
  5. Choose floating point to represent population as large aggreates (eg millions), or integer to represent population in units of individuals.
  6. Floating point number, since an average is likely to have a fractional part.

Division Types

In Python 3, the // operator performs integer (whole-number) floor division, the / operator performs floating-point division, and the ‘%’ (or modulo) operator calculates and returns the remainder from integer division:

print('5 // 3:', 5//3)
print('5 / 3:', 5/3)
print('5 % 3:', 5%3)
5 // 3: 1
5 / 3: 1.6666666666666667
5 % 3: 2

However in Python2 (and other languages), the / operator between two integer types perform a floor (//) division. To perform a float division, we have to convert one of the integers to float.

print('5 // 3:', 1)
print('5 / 3:', 1 )
print('5 / float(3):', 1.6666667 )
print('float(5) / 3:', 1.6666667 )
print('float(5 / 3):', 1.0 )
print('5 % 3:', 2)

If num_subjects is the number of subjects taking part in a study, and num_per_survey is the number that can take part in a single survey, write an expression that calculates the number of surveys needed to reach everyone once.

Solution

We want the minimum number of surveys that reaches everyone once, which is the rounded up value of num_subjects / num_per_survey. This is equivalent to performing an integer division with // and adding 1.

num_subjects = 600
num_per_survey = 42
num_surveys = num_subjects // num_per_survey + 1

print(num_subjects, 'subjects,', num_per_survey, 'per survey:', num_surveys)
600 subjects, 42 per survey: 15

Strings to Numbers

Where reasonable, float() will convert a string to a floating point number, and int() will convert a floating point number to an integer:

print("string to float:", float("3.4"))
print("float to int:", int(3.4))
string to float: 3.4
float to int: 3

If the conversion doesn’t make sense, however, an error message will occur

print("string to float:", float("Hello world!"))
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-5-df3b790bf0a2> in <module>()
----> 1 print("string to float:", float("Hello world!"))

ValueError: could not convert string to float: 'Hello world!'

Given this information, what do you expect the following program to do?

What does it actually do?

Why do you think it does that?

print("fractional string to int:", int("3.4"))

Solution

What do you expect this program to do? It would not be so unreasonable to expect the Python 3 int command to convert the string “3.4” to 3.4 and an additional type conversion to 3. After all, Python 3 performs a lot of other magic - isn’t that part of its charm?

However, Python 3 throws an error. Why? To be consistent, possibly. If you ask Python to perform two consecutive typecasts, you must convert it explicitly in code.

int("3.4")
int(float("3.4"))
In [2]: int("3.4")
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-2-ec6729dfccdc> in <module>()
----> 1 int("3.4")
ValueError: invalid literal for int() with base 10: '3.4'
3

Arithmetic with Different Types

Which of the following will print 2.0? Note: there may be more than one right answer.

first = 1.0
second = "1"
third = "1.1"
  1. first + float(second)
  2. float(second) + float(third)
  3. first + int(third)
  4. first + int(float(third))
  5. int(first) + int(float(third))
  6. 2.0 * second

Solution

Answer: 1 and 4

Complex Numbers

Python provides complex numbers, which are written as 1.0+2.0j. If val is an imaginary number, its real and imaginary parts can be accessed using dot notation as val.real and val.imag.

  1. Why do you think Python uses j instead of i for the imaginary part?
  2. What do you expect 1+2j + 3 to produce?
  3. What do you expect ‘4j’ to be? What about 4 j or `4 + j’? >

Solution

  1. Standard mathematics treatments typically use i to denote an imaginary number. However, from media reports it was an early convention established from electrical engineering that now presents a technically expensive area to change. Stack Overflow provides additional explanation and discussion
  2. 4+2j
  3. 4j, syntax error, depends on the value of j

Key Points

  • Every value has a type.

  • Use the built-in function type to find the type of a value.

  • Types control what operations can be done on values.

  • Strings can be added and multiplied.

  • Strings have a length (but numbers don’t).

  • Must convert numbers to strings or vice versa when operating on them.

  • Can mix integers and floats freely in operations.

  • Variables only change value when something is assigned to them.


Built-in Functions and Help

Overview

Teaching: 15 min
Exercises: 10 min
Questions
  • How can I use built-in functions?

  • How can I find out what they do?

  • What kind of errors can occur in programs?

Objectives
  • Explain the purpose of functions.

  • Correctly call built-in Python functions.

  • Correctly nest calls to built-in functions.

  • Use help to display documentation for built-in functions.

  • Correctly describe situations in which SyntaxError and NameError occur.

Use comments to add documentation to programs.

# This sentence isn't executed by Python.
adjustment = 0.5   # Neither is this - anything after '#' is ignored.

A function may take zero or more arguments.

print('before')
print()
print('after')
before

after

Commonly-used built-in functions include max, min, and round.

print(max(1, 2, 3))
print(min('a', 'A', '0'))
3
0

Functions may only work for certain (combinations of) arguments.

print(max(1, 'a'))
TypeError: unorderable types: str() > int()

Functions may have default values for some arguments.

round(3.712)
4
round(3.712, 1)
3.7

Use the built-in function help to get help for a function.

help(round)
Help on built-in function round in module builtins:

round(...)
    round(number[, ndigits]) -> number

    Round a number to a given precision in decimal digits (default 0 digits).
    This returns an int when called with one argument, otherwise the
    same type as the number. ndigits may be negative.

Python reports a syntax error when it can’t understand the source of a program.

# Forgot to close the quote marks around the string.
name = 'Feng
SyntaxError: EOL while scanning string literal
# An extra '=' in the assignment.
age = = 52
SyntaxError: invalid syntax
print("hello world"
  File "<ipython-input-6-d1cc229bf815>", line 1
    print ("hello world"
                        ^
SyntaxError: unexpected EOF while parsing

Python reports a runtime error when something goes wrong while a program is executing.

age = 53
remaining = 100 - aege # mis-spelled 'age'
NameError: name 'aege' is not defined

The Jupyter Notebook has two ways to get help.

Every function returns something.

result = print('example')
print('result of print is', result)
example
result of print is None

What Happens When

  1. Explain in simple terms the order of operations in the following program: when does the addition happen, when does the subtraction happen, when is each function called, etc.
  2. What is the final value of radiance?
radiance = 1.0
radiance = max(2.1, 2.0 + min(radiance, 1.1 * radiance - 0.5))

Solution

1.

  1. 1.1 * radiance = 1.1
  2. 1.1 - 0.5 = 0.6
  3. min(randiance, 0.6) = 0.6
  4. 2.0 + 0.6 = 2.6
  5. max(2.1, 2.6) = 2.6
  6. At the end, radiance = 2.6

Spot the Difference

  1. Predict what each of the print statements in the program below will print.
  2. Does max(len(rich), poor) run or produce an error message? If it runs, does its result make any sense?
easy_string = "abc"
print(max(easy_string))
rich = "gold"
poor = "tin"
print(max(rich, poor))
print(max(len(rich), len(poor)))

Solution

  1. print(max(easy_string))
    
c
print(max(rich, poor))
tin
print(max(len(rich), len(poor)))
4
  1. It throws a TypeError. The command is trying to run max(4, 'tin') and you can’t compare a string and an integer

Why Not?

Why don’t max and min return None when they are given no arguments?

Solution

max and min return TypeErrors in this case because the correct number of parameters was not supplied. If it just returned None, the error would be much harder to trace as it would likely be stored into a variable and used later in the program, only to likely throw a runtime error.

Last Character of a String

If Python starts counting from zero, and len returns the number of characters in a string, what index expression will get the last character in the string name? (Note: we will see a simpler way to do this in a later episode.)

Solution

name[len(name) - 1]

Key Points

  • Use comments to add documentation to programs.

  • A function may take zero or more arguments.

  • Commonly-used built-in functions include max, min, and round.

  • Functions may only work for certain (combinations of) arguments.

  • Functions may have default values for some arguments.

  • Use the built-in function help to get help for a function.

  • The Jupyter Notebook has two ways to get help.

  • Every function returns something.

  • Python reports a syntax error when it can’t understand the source of a program.

  • Python reports a runtime error when something goes wrong while a program is executing.

  • Fix syntax errors by reading the source code, and runtime errors by tracing the program’s execution.


Libraries

Overview

Teaching: 10 min
Exercises: 10 min
Questions
  • How can I use software that other people have written?

  • How can I find out what that software does?

Objectives
  • Explain what software libraries are and why programmers create and use them.

  • Write programs that import and use libraries from Python’s standard library.

  • Find and read documentation for standard libraries interactively (in the interpreter) and online.

Most of the power of a programming language is in its libraries.

Libraries and modules

A library is a collection of modules, but the terms are often used interchangeably, especially since many libraries only consist of a single module, so don’t worry if you mix them.

A program must import a library module before using it.

import math

print('pi is', math.pi)
print('cos(pi) is', math.cos(math.pi))
pi is 3.141592653589793
cos(pi) is -1.0

Use help to learn about the contents of a library module.

help(math)
Help on module math:

NAME
    math

MODULE REFERENCE
    http://docs.language-python.org/3.5/library/math

    The following documentation is automatically generated from the Python
    source files.  It may be incomplete, incorrect or include features that
    are considered implementation detail and may vary between Python
    implementations.  When in doubt, consult the module reference at the
    location listed above.

DESCRIPTION
    This module is always available.  It provides access to the
    mathematical functions defined by the C standard.

FUNCTIONS
    acos(...)
        acos(x)

        Return the arc cosine (measured in radians) of x.
⋮ ⋮ ⋮

Import specific items from a library module to shorten programs.

from math import cos, pi

print('cos(pi) is', cos(pi))
cos(pi) is -1.0

Create an alias for a library module when importing it to shorten programs.

import math as m

print('cos(pi) is', m.cos(m.pi))
cos(pi) is -1.0

Exploring the Math Module

  1. What function from the math module can you use to calculate a square root without using sqrt?
  2. Since the library contains this function, why does sqrt exist?

Solution

  1. Using help(math) we see that we’ve got pow(x,y) in addition to sqrt(x), so we could use pow(x, 0.5) to find a square root.
  2. The sqrt(x) function is arguably more readable than pow(x, 0.5) when implementing equations. Readability is a cornerstone of good programming, so it makes sense to provide a special function for this specific common case.

    Also, the design of Python’s math library has its origin in the C standard, which includes both sqrt(x) and pow(x,y), so a little bit of the history of programming is showing in Python’s function names.

Locating the Right Module

You want to select a random character from a string:

bases = 'ACTTGCTTGAC'
  1. Which standard library module could help you?
  2. Which function would you select from that module? Are there alternatives?
  3. Try to write a program that uses the function.

Solution

The random module seems like it could help you.

The string has 11 characters, each having a positional index from 0 to 10. You could use random.randrange function (or the alias random.randint if you find that easier to remember) to get a random integer between 0 and 10, and then pick out the character at that position:

from random import randrange

random_index = randrange(len(bases))
print(bases[random_index])

or more compactly:

from random import randrange

print(bases[randrange(len(bases))])

Perhaps you found the random.sample function? It allows for slightly less typing:

from random import sample

print(sample(bases, 1)[0])

Note that this function returns a list of values. We will learn about lists in episode 11.

There’s also other functions you could use, but with more convoluted code as a result.

Jigsaw Puzzle (Parson’s Problem) Programming Example

Rearrange the following statements so that a random DNA base is printed and its index in the string. Not all statements may be needed. Feel free to use/add intermediate variables.

bases="ACTTGCTTGAC"
import math
import random
___ = random.randrange(n_bases)
___ = len(bases)
print("random base ", bases[___], "base index", ___)

Solution

import math 
import random
bases = "ACTTGCTTGAC" 
n_bases = len(bases)
idx = random.randrange(n_bases)
print("random base", bases[idx], "base index", idx)

When Is Help Available?

When a colleague of yours types help(math), Python reports an error:

NameError: name 'math' is not defined

What has your colleague forgotten to do?

Solution

Importing the math module (import math)

Importing With Aliases

  1. Fill in the blanks so that the program below prints 90.0.
  2. Rewrite the program so that it uses import without as.
  3. Which form do you find easier to read?
import math as m
angle = ____.degrees(____.pi / 2)
print(____)

Solution

import math as m
angle = m.degrees(m.pi / 2)
print(angle)

can bewritten as

import math
angle = math.degrees(math.pi / 2)
print(angle)

Since you just wrote the code and are familiar with it, you might actually find the first version easier to read. But when trying to read a huge piece of code written by someone else, or when getting back to your own huge piece of code after several months, non-abbreviated names are often easier, except where there are clear abbreviation conventions.

There Are Many Ways To Import Libraries!

Match the following print statements with the appropriate library calls.

Print commands:

  1. print("sin(pi/2) =",sin(pi/2))
  2. print("sin(pi/2) =",m.sin(m.pi/2))
  3. print("sin(pi/2) =",math.sin(math.pi/2))

Library calls:

  1. from math import sin,pi
  2. import math
  3. import math as m
  4. from math import *

Solution

  1. Library calls 1 and 4. In order to directly refer to sin and pi without the library name as prefix, you need to use the from ... import ... statement. Whereas library call 1 specifically imports the two functions sin and pi, library call 4 imports all functions in the math module.
  2. Library call 3. Here sin and pi are referred to with a shortened library name m instead of math. Library call 3 does exactly that using the import ... as ... syntax - it creates an alias for math in the form of the shortened name m.
  3. Library call 2. Here sin and pi are referred to with the regular library name math, so the regular import ... call suffices.

Importing Specific Items

  1. Fill in the blanks so that the program below prints 90.0.
  2. Do you find this version easier to read than preceding ones?
  3. Why wouldn’t programmers always use this form of import?
____ math import ____, ____
angle = degrees(pi / 2)
print(angle)

Solution

from math import degrees, pi
angle = degrees(pi / 2)
print(angle)

Most likely you find this version easier to read since it’s less dense. The main reason not to use this form of import is to avoid name clashes. For instance, you wouldn’t import degrees this way if you also wanted to use the name degrees for a variable or function of your own. Or if you were to also import a function named degrees from another library.

Reading Error Messages

  1. Read the code below and try to identify what the errors are without running it.
  2. Run the code, and read the error message. What type of error is it?
from math import log
log(0)

Solution

  1. The logarithm of x is only defined for x > 0, so 0 is outside the domain of the function.
  2. You get an error of type “ValueError”, indicating that the function received an inappropriate argument value. The additional message “math domain error” makes it clearer what the problem is.

Key Points

  • Most of the power of a programming language is in its libraries.

  • A program must import a library module in order to use it.

  • Use help to learn about the contents of a library module.

  • Import specific items from a library to shorten programs.

  • Create an alias for a library when importing it to shorten programs.


Numpy and Scipy

Overview

Teaching: 40 min
Exercises: 10 min
Questions
  • How do I deal with tabular scientific data?

Objectives
  • Import the numpy library.

  • Understand the NDArray object.

  • Import the numpy library.

  • Get some basic information about a numpy and scipy objects and methods.

Numpy is the main Python library for scientific computation

Import numpy before using it

import numpy as np
import numpy
from numpy import cos

print(numpy.cos, np.cos, cos)
<ufunc 'cos'> <ufunc 'cos'> <ufunc 'cos'>

Use numpy.zeros to create empty arrays

f10 = numpy.zeros(10)
i10 = numpy.zeros(10, dtype=int)
print("default array of zeros: ", f10)
print("integer array of zeros: ", i10)
default array of zeros:  [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
integer array of zeros:  [0 0 0 0 0 0 0 0 0 0]

Use numpy.ones to create an array of ones.

print("Using numpy.ones    : ", numpy.ones(10))
print("is the same thing as: ", numpy.zeros(10)+1)

Using numpy.ones    :  [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
is the same thing as:  [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

Using numpy.arange to generate sets of numbers

numpy.arange(10)
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
numpy.arange(1,10)
array([1, 2, 3, 4, 5, 6, 7, 8, 9])
numpy.arange(1,10,2)
array([1, 3, 5, 7, 9])
numpy.arange(1,10,-2)
array([], dtype=int64)
numpy.arange(10,1,-2)
array([10,  8,  6,  4,  2])

Numpy arrays have a shape

a = numpy.arange(10)
print("a's shape is ",a.shape)

b=a.reshape(5,2)
print("b's shape is ",b.shape)
a's shape is  (10,)
b's shape is  (5, 2)

Numpy arrays can be treated like single numbers in arithmetic

a = numpy.arange(5)
b = numpy.arange(5)
print("a=",a)
print("b=",b)
print("a*b=",a*b)
print("a+b=",a+b)
a= [0 1 2 3 4]
b= [0 1 2 3 4]
a*b= [ 0  1  4  9 16]
a+b= [0 2 4 6 8]
c = numpy.ones((5,2))
d = numpy.ones((5,2)) + 100
c+d
array([[102., 102.],
       [102., 102.],
       [102., 102.],
       [102., 102.],
       [102., 102.]])
e = c.reshape(2,5)
c+e #c and e have different shapes
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-46-0e32881b9afe> in <module>()
      1 e = c.reshape(2,5)
----> 2 c+e #c and e have different shapes

ValueError: operands could not be broadcast together with shapes (5,2) (2,5) 
---------------------------------------------------------------------------

The Numpy library has many functions that work on arrays

a=numpy.arange(5)
print("a = ", a)
a =  [0 1 2 3 4]
print("sum(a) = ", a.sum())
sum(a) =  10
print("mean(a) = ", a.mean())
mean(a) =  2.0
print("std(a) = ", a.std()) #what is this?
std(a) =  1.4142135623730951
print("np.sin(a) = ", np.sin(a))
np.sin(a) =  [ 0.          0.84147098  0.90929743  0.14112001 -0.7568025 ]

Check the numpy help and webpage for more functions

https://docs.scipy.org/doc/numpy/reference/routines.html

Use the axis keyword to use the function over a subset of the data.

a = numpy.arange(10).reshape(5,2)
print("a=",a)
print("mean(a)="  ,numpy.mean(a))
print("mean(a,0)=",numpy.mean(a,axis=0))
print("mean(a,1)=",numpy.mean(a,axis=1))
a= [[0 1]
    [2 3]
    [4 5]
    [6 7]
    [8 9]]
mean(a)= 4.5
mean(a,0)= [4. 5.]
mean(a,1)= [0.5 2.5 4.5 6.5 8.5]

Use square brackets to access elements in the array

a=numpy.arange(10)
a[5]
5
a[5:10]
array([5, 6, 7, 8, 9])
a[5:] #No second number means "rest of the array"
array([5, 6, 7, 8, 9])
a[:5] #No first number means "from the start of the array"
array([0, 1, 2, 3, 4])
a[5:10:2] #A third number means "every Nth element"
array([5, 7, 9])
a[5:10:-2] #negative numbers mean "count backwards"
array([], dtype=int64)
a[10:5:-2] #but you need to start and stop in the same order
array([9, 7])

Challenge 1

There is an arange function and linspace function, that take similar arguments. Explain the difference. For example, what does the following code do?

print (numpy.arange(1.,9,3))
print (numpy.linspace(1.,9,3))

Solution

  • arange takes the arguments start, stop, step, and generates numbers from start to stop (excluding stop) stepping by step each time.
  • linspace takes the arguments start, stop, number, and generates numbers from start to stop (including stop) with number of steps.
print (numpy.arange(1.,9,3))
print (numpy.linspace(1.,9,3))
[1. 4. 7.]
[1. 5. 9.]

Challenge 2

Generate a 10 x 3 array of random numbers (using numpy.random.rand). From each row, find the minimum absolute value. Make use of numpy.abs and numpy.min. The result should be a one-dimensional array.

Solution

The important part of the solution is passing the axis keyword to the min function:

a = numpy.random.rand(30).reshape(10,3)
print("a is ", a)
print()
print("min(a) along each row is ", numpy.min( numpy.abs( a ), axis=0))

Use the scipy library for common scientific and numerical methods

Example : integrate y=x^2 from 0 to 10

x = numpy.arange(11)
y = x**2
import scipy.integrate
#by default, trapz assumes the independent variable is a list of integers from 0..N
print("integral of x^2 from 0 to 10", scipy.integrate.trapz(y) )#This value should be 10**3/3 = 333
integral of x^2 from 0 to 10 335.0
x = numpy.linspace(0,10,1000) # finer grid
y=x**2
print("integral of x^2 from 0 to 10", scipy.integrate.trapz(y) )#This value should be 10**3/3 = 333.333
integral of x^2 from 0 to 10 33300.01668335002
print("integral of x^2 from 0 to 10", scipy.integrate.trapz(y,x) )#This value should be 10**3/3 = 333.333
integral of x^2 from 0 to 10 333.333500333834

We’ll come back to scipy.optimize later.

Key Points

  • Use the numpy library to get basic statistics out of tabular data.

  • Print numpy arrays.

  • Use mean, sum, std to get summary statistics.

  • Add numpy arrays together.

  • Study the scipy website

  • Use scipy to integrate tabular data.


Reading Tabular Data into arrays

Overview

Teaching: 20 min
Exercises: 0 min
Questions
  • How can I read tabular data?

  • How can I save tabular data?

Objectives
  • Import the numpy library.

  • Use numpy to load a simple CSV data set.

  • Get some basic information about a numpy array.

Use the Numpy package to load data using the loadtxt command

import numpy
data = numpy.loadtxt('data/galileo_flat.empty')
print(data)
[[1500. 1000.]
 [1340.  828.]
 [1328.  800.]
 [1172.  600.]
 [ 800.  300.]]

Read a comma separated file of data with headers

data = numpy.loadtxt('data/galileo_flat.csv', comments="#", skiprows=2, delimiter=',')
print(data)
[[1500. 1000.]
 [1340.  828.]
 [1328.  800.]
 [1172.  600.]
 [ 800.  300.]]

Remember your data has the shape ROWS X COLUMNS

print("data shape is ",data.shape)
data shape is  (5, 2)

Split the data into variables using unpack

D,H = numpy.loadtxt('data/galileo_flat.csv', comments="#", skiprows=2, delimiter=',',unpack=True)
print(D,H)
print("D shape is ",D.shape)
print("H shape is ",H.shape)
[1500. 1340. 1328. 1172.  800.] [1000.  828.  800.  600.  300.]
D shape is  (5,)
H shape is  (5,)
data = numpy.loadtxt('data/galileo_flat.csv', comments="#", skiprows=2, delimiter=',')
D,H = data.T
print(D,H)
print("D shape is ",D.shape)
print("H shape is ",H.shape)
[1500. 1340. 1328. 1172.  800.] [1000.  828.  800.  600.  300.]
D shape is  (5,)
H shape is  (5,)

Save data with numpy.savetxt

Saving text data is made possible with the savetxt command. It mirrors the loadtxt command

numpy.savetxt("data/mydata.txt", data, delimiter=',')
1.500000000000000000e+03,1.000000000000000000e+03
1.340000000000000000e+03,8.280000000000000000e+02
1.328000000000000000e+03,8.000000000000000000e+02
1.172000000000000000e+03,6.000000000000000000e+02
8.000000000000000000e+02,3.000000000000000000e+02

Control the data format with the fmt keyword

numpy.savetxt("data/mydata2.txt", data, delimiter=',', fmt='%.6g')
1500,1000
1340,828
1328,800
1172,600
800,300

Add a header string with header

header="Distance (D), Header(H)"
newdata = numpy.vstack([D,H]).T
numpy.savetxt("data/mydata3.txt", newdata, delimiter=', ', header=header, fmt='%.6g')
# Distance (D), Header(H)
1500, 1000
1340, 828
1328, 800
1172, 600
800, 300

More complex loadtxt commands can make your data more flexible

data = numpy.loadtxt('data/galileo_flat.csv', comments="#", skiprows=2, delimiter=',',\
                    dtype={'names':("Distance","Height"), 'formats':('f4','f4')})
print("data shape is ", data.shape)
print("Distance data is ", data["Distance"])
data shape is  (5,)
Distance data is  [1500. 1340. 1328. 1172.  800.]

Key Points

  • Use numpy.loadtxt library to load tabular data.

  • Use numpy.savetxt library to save tabular data.

  • Use delimiters to make your text file cleaner.

  • Use comments in your file to describe the contents.


Plotting

Overview

Teaching: 25 min
Exercises: 15 min
Questions
  • How can I plot my data?

  • How can I save my plot for publishing?

Objectives
  • Create a time series plot showing a single data set.

  • Create a scatter plot showing relationship between two data sets.

matplotlib is the most widely used scientific plotting library in Python

%matplotlib inline
import matplotlib.pyplot as plt
import numpy
time = numpy.array([0,1,2,3])
position = numpy.array([0,100,200,300])

plt.plot(time, position)
plt.xlabel("Time (hr)")
plt.ylabel("Position (km)")
Text(0, 0.5, 'Position (km)')

png

The color and format of lines and markers can be changed.

import numpy
time = numpy.arange(10)
p1 = time
p2 = time*2
p3 = time*4

plt.plot(time, p1,'b-')
plt.plot(time, p2,'ro')
plt.plot(time, p3,'g+-')
plt.xlabel("Time (hr)")
plt.ylabel("Position (km)")
Text(0, 0.5, 'Position (km)')

png

More complex formatting can be achieved using the plot keywords

plt.plot(time, p1,color='blue', linestyle='-', linewidth=5,label="blue line")
plt.plot(time, p2,'ro', markersize=10, label="red dots")
plt.plot(time, p3,'g-', marker='+')
plt.xlabel("Time (hr)")
plt.ylabel("Position (km)")
plt.legend()
<matplotlib.legend.Legend at 0x7fe9b88472b0>

png

Built in “styles” provide consistent plots

print("available style names: ", plt.style.available)
available style names:  ['Solarize_Light2', '_classic_test_patch', 'bmh', 'classic', 'dark_background', 'fast', 'fivethirtyeight', 'ggplot', 'grayscale', 'seaborn', 'seaborn-bright', 'seaborn-colorblind', 'seaborn-dark', 'seaborn-dark-palette', 'seaborn-darkgrid', 'seaborn-deep', 'seaborn-muted', 'seaborn-notebook', 'seaborn-paper', 'seaborn-pastel', 'seaborn-poster', 'seaborn-talk', 'seaborn-ticks', 'seaborn-white', 'seaborn-whitegrid', 'tableau-colorblind10']
plt.style.use("ggplot")
plt.plot(time, p1,color='blue', linestyle='-', linewidth=5,label="blue line")
plt.plot(time, p2,'ro', markersize=10, label="red dots")
plt.plot(time, p3,'g-', marker='+')
plt.xlabel("Time (hr)")
plt.ylabel("Position (km)")
plt.legend()
<matplotlib.legend.Legend at 0x7fe9a8405bb0>

png

plt.style.use("fivethirtyeight")
plt.plot(time, p1,color='blue', linestyle='-', linewidth=5,label="blue line")
plt.plot(time, p2,'ro', markersize=10, label="red dots")
plt.plot(time, p3,'g-', marker='+')
plt.xlabel("Time (hr)")
plt.ylabel("Position (km)")
plt.legend()
<matplotlib.legend.Legend at 0x7fe9a843aac0>

png

plt.style.use("seaborn-whitegrid")
plt.plot(time, p1,linestyle='-', linewidth=5,label="blue line")
plt.plot(time, p2,'o', markersize=10, label="red dots")
plt.plot(time, p3,'-', marker='+') #where's the marker?
plt.xlabel("Time (hr)")
plt.ylabel("Position (km)")
plt.legend()
<matplotlib.legend.Legend at 0x7fe9780b4070>

png

Plots can be scatter plots with points and no lines

numpy.random.seed(20)
x,y = numpy.random.randint(0,100,100), numpy.random.randn(100)
x=numpy.cumsum(x)
y=numpy.cumsum(y)

plt.scatter( x, y)
plt.scatter( x, 10-y**2, color='green',marker='<')
plt.xlabel("Labels still work")
plt.title("title")
Text(0.5, 1.0, 'title')

png

Plot data with associated uncertainties using errorbar

numpy.random.seed(42)
x = numpy.random.rand(10)*10
x=numpy.cumsum(x)
error  = numpy.random.randn(10)*4
y=x + numpy.random.randn(10)*0.5

plt.errorbar( x, y, yerr=error,color='green',marker='o',ls='',lw=1,label="data")
plt.xlabel("Labels still work")
plt.title("errobar")
plt.legend()
<matplotlib.legend.Legend at 0x7fe9b888f040>

png

plt.errorbar?

matplotlib also makes bar charts and histograms

x = [0,1,2,3,4,5]
y = [0,4,2,6,8,2]
plt.bar(x,y)
plt.title("Bar chart")
Text(0.5, 1.0, 'Bar chart')

png

x = numpy.random.randint(0,100,50)
bin_count, bin_edges, boxes = plt.hist(x, bins=10)
print("The counts are ", bin_count)
The counts are  [8. 5. 0. 5. 5. 6. 3. 6. 8. 4.]

png

bin_count, bin_edges, boxes = plt.hist(x, bins=10, rwidth=0.9)
plt.title("cleaner histogram")
Text(0.5, 1.0, 'cleaner histogram')

png

# Compute pie slices
N = bin_count.size
theta = 0.5*(bin_edges[1:] + bin_edges[:-1])
theta = theta * 2*numpy.pi/theta.max()
width = numpy.pi / 4 * numpy.random.rand(N)

ax = plt.subplot(111, projection='polar')
bars = ax.bar(theta, bin_count, width=width, bottom=0.0,alpha=0.5)

# Use custom colors and opacity
for r, bar in zip(bin_count, bars):
    bar.set_facecolor(plt.cm.viridis(r / bin_count.max()))
    bar.set_alpha(0.5)

t=plt.title("Something more exotic")

png

Define the figure size before plotting using the figure command

plt.figure(figsize=(8,2))
x = [0,1,2,3,4,5]
y = [0,4,2,6,8,2]
plt.bar(x,y)
plt.title("narrow bar chart")
Text(0.5, 1.0, 'narrow bar chart')

png

Place multiple figures on one plot with subplot

plt.figure(figsize=(8,2))
x = [0,1,2,3,4,5]
y = [0,4,2,6,8,2]
plt.subplot(2,2,1)
plt.bar(x,y)
plt.title("top left")
plt.subplot(2,2,2)
plt.bar(y,x)
plt.title("top right")
plt.subplot(2,2,4)
plt.bar(x,y)
plt.title("sometimes the formatting is awkward")
Text(0.5, 1.0, 'sometimes the formatting is awkward')

png

plt.figure(figsize=(8,3))
x = [0,1,2,3,4,5]
y = [0,4,2,6,8,2]
plt.subplot(1,3,1)
plt.bar(x,y)
plt.title("top left")
plt.subplot(1,3,2)
plt.bar(y,x)
plt.title("top right")
plt.subplot(1,3,3)
plt.bar(x,y)
plt.title("less awkward")

Text(0.5, 1.0, 'less awkward')

png

Figures can be saved with savefig

plt.figure(figsize=(8,3))
plt.plot(x,y)
plt.savefig("data/fig1.pdf") #PDF format
plt.savefig("data/fig1.png", dpi=150, transparent=True) #PNG format

png

figure

Note that functions in plt refer to a global figure variable and after a figure has been displayed to the screen (e.g. with plt.show) matplotlib will make this variable refer to a new empty figure. Therefore, make sure you call plt.savefig before the plot is displayed to the screen, otherwise you may find a file with an empty plot.

It is also possible to save the figure to file by first getting a reference to the figure with plt.gcf, then calling the savefig class method from that variable.

fig = plt.gcf() # get current figure
data.plot(kind='bar')
fig.savefig('my_figure.png')

Key Points

  • matplotlib is the most widely used scientific plotting library in Python.

  • Plot data directly from a Pandas dataframe.

  • Select and transform data, then plot it.

  • Many styles of plot are available: see the Python Graph Gallery for more options.

  • Can plot many sets of data together.


Lists

Overview

Teaching: 10 min
Exercises: 10 min
Questions
  • How can I store multiple values?

Objectives
  • Explain why programs need collections of values.

  • Write programs that create flat lists, index them, slice them, and modify them through assignment and method calls.

A list stores many values in a single structure.

pressures = [0.273, 0.275, 0.277, 0.275, 0.276]
print('pressures:', pressures)
print('length:', len(pressures))
pressures: [0.273, 0.275, 0.277, 0.275, 0.276]
length: 5

Use an item’s index to fetch it from a list.

print('zeroth item of pressures:', pressures[0])
print('fourth item of pressures:', pressures[4])
zeroth item of pressures: 0.273
fourth item of pressures: 0.276

Lists’ values can be replaced by assigning to them.

pressures[0] = 0.265
print('pressures is now:', pressures)
pressures is now: [0.265, 0.275, 0.277, 0.275, 0.276]

Appending items to a list lengthens it.

primes = [2, 3, 5]
print('primes is initially:', primes)
primes.append(7)
primes.append(9)
print('primes has become:', primes)
primes is initially: [2, 3, 5]
primes has become: [2, 3, 5, 7, 9]
teen_primes = [11, 13, 17, 19]
middle_aged_primes = [37, 41, 43, 47]
print('primes is currently:', primes)
primes.extend(teen_primes)
print('primes has now become:', primes)
primes.append(middle_aged_primes)
print('primes has finally become:', primes)
primes is currently: [2, 3, 5, 7, 9]
primes has now become: [2, 3, 5, 7, 9, 11, 13, 17, 19]
primes has finally become: [2, 3, 5, 7, 9, 11, 13, 17, 19, [37, 41, 43, 47]]

Note that while extend maintains the “flat” structure of the list, appending a list to a list makes the result two-dimensional.

Use del to remove items from a list entirely.

primes = [2, 3, 5, 7, 9]
print('primes before removing last item:', primes)
del primes[4]
print('primes after removing last item:', primes)
primes before removing last item: [2, 3, 5, 7, 9]
primes after removing last item: [2, 3, 5, 7]

The empty list contains no values.

Lists may contain values of different types.

goals = [1, 'Create lists.', 2, 'Extract items from lists.', 3, 'Modify lists.']

Character strings can be indexed like lists.

element = 'carbon'
print('zeroth character:', element[0])
print('third character:', element[3])
zeroth character: c
third character: b

Character strings are immutable.

element[0] = 'C'
TypeError: 'str' object does not support item assignment

Indexing beyond the end of the collection is an error.

print('99th element of element is:', element[99])
IndexError: string index out of range

Fill in the Blanks

Fill in the blanks so that the program below produces the output shown.

values = ____
values.____(1)
values.____(3)
values.____(5)
print('first time:', values)
values = values[____]
print('second time:', values)
first time: [1, 3, 5]
second time: [3, 5]

Solution

values = []
values.append(1)
values.append(3)
values.append(5)
print('first time:', values)
values = values[1:]
print('second time:', values)

How Large is a Slice?

If ‘low’ and ‘high’ are both non-negative integers, how long is the list values[low:high]?

Solution

The list values[low:high] has high - low elements. For example, values[1:4] has the 3 elements values[1], values[2], and values[3]. Note that the expression will only work if high is less than the total length of the list values.

From Strings to Lists and Back

Given this:

print('string to list:', list('tin'))
print('list to string:', ''.join(['g', 'o', 'l', 'd']))
['t', 'i', 'n']
'gold'
  1. Explain in simple terms what list('some string') does.
  2. What does '-'.join(['x', 'y']) generate?

Solution

  1. list('some string') “splits” a string into a list of its characters.
  2. x-y

Working With the End

What does the following program print?

element = 'helium'
print(element[-1])
  1. How does Python interpret a negative index?
  2. If a list or string has N elements, what is the most negative index that can safely be used with it, and what location does that index represent?
  3. If values is a list, what does del values[-1] do?
  4. How can you display all elements but the last one without changing values? (Hint: you will need to combine slicing and negative indexing.)

Solution

The program prints m.

  1. Python interprets a negative index as starting from the end (as opposed to starting from the beginning). The last element is -1.
  2. The last index that can safely be used with a list of N elements is element -N, which represents the first element.
  3. del values[-1] removes the last element from the list.
  4. values[:-1]

Stepping Through a List

What does the following program print?

element = 'fluorine'
print(element[::2])
print(element[::-1])
  1. If we write a slice as low:high:stride, what does stride do?
  2. What expression would select all of the even-numbered items from a collection?

Solution

The program prints

furn
eniroulf
  1. stride is the step size of the slice
  2. The slice 1::2 selects all even-numbered items from a collection: it starts with element 1 (which is the second element, since indexing starts at 0), goes on until the end (since no end is given), and uses a step size of 2 (i.e., selects every second element).

Slice Bounds

What does the following program print?

element = 'lithium'
print(element[0:20])
print(element[-1:3])

Solution

lithium

Sort and Sorted

What do these two programs print? In simple terms, explain the difference between sorted(letters) and letters.sort().

# Program A
letters = list('gold')
result = sorted(letters)
print('letters is', letters, 'and result is', result)
# Program B
letters = list('gold')
result = letters.sort()
print('letters is', letters, 'and result is', result)

Solution

Program A prints

letters is ['g', 'o', 'l', 'd'] and result is ['d', 'g', 'l', 'o']

Program B prints

letters is ['d', 'g', 'l', 'o'] and result is None

sorted(letters) returns a sorted copy of the list letters (the original list letters remains unchanged), while letters.sort() sorts the list letters in-place and does not return anything.

Copying (or Not)

What do these two programs print? In simple terms, explain the difference between new = old and new = old[:].

# Program A
old = list('gold')
new = old      # simple assignment
new[0] = 'D'
print('new is', new, 'and old is', old)
# Program B
old = list('gold')
new = old[:]   # assigning a slice
new[0] = 'D'
print('new is', new, 'and old is', old)

Solution

Program A prints

new is ['D', 'o', 'l', 'd'] and old is ['D', 'o', 'l', 'd']

Program B prints

new is ['D', 'o', 'l', 'd'] and old is ['g', 'o', 'l', 'd']

new = old makes new a reference to the list old; new and old point towards the same object.

new = old[:] however creates a new list object new containing all elements from the list old; new and old are different objects.

Key Points

  • A list stores many values in a single structure.

  • Use an item’s index to fetch it from a list.

  • Lists’ values can be replaced by assigning to them.

  • Appending items to a list lengthens it.

  • Use del to remove items from a list entirely.

  • The empty list contains no values.

  • Lists may contain values of different types.

  • Character strings can be indexed like lists.

  • Character strings are immutable.

  • Indexing beyond the end of the collection is an error.


For Loops

Overview

Teaching: 10 min
Exercises: 15 min
Questions
  • How can I make a program do many things?

Objectives
  • Explain what for loops are normally used for.

  • Trace the execution of a simple (unnested) loop and correctly state the values of variables in each iteration.

  • Write for loops that use the Accumulator pattern to aggregate values.

A for loop executes commands once for each value in a collection.

for number in [2, 3, 5]:
    print(number)
print(2)
print(3)
print(5)
2
3
5

The first line of the for loop must end with a colon, and the body must be indented.

for number in [2, 3, 5]:
print(number)
IndentationError: expected an indented block
firstName="Jon"
  lastName="Smith"
  File "<ipython-input-7-f65f2962bf9c>", line 2
    lastName="Smith"
    ^
IndentationError: unexpected indent

A for loop is made up of a collection, a loop variable, and a body.

for number in [2, 3, 5]:
    print(number)

Loop variables can be called anything.

for kitten in [2, 3, 5]:
    print(kitten)

The body of a loop can contain many statements.

primes = [2, 3, 5]
for p in primes:
    squared = p ** 2
    cubed = p ** 3
    print(p, squared, cubed)
2 4 8
3 9 27
5 25 125

Use range to iterate over a sequence of numbers.

print('a range is not a list: range(0, 3)')
for number in range(0,3):
    print(number)
a range is not a list: range(0, 3)
0
1
2

The Accumulator pattern turns many values into one.

# Sum the first 10 integers.
total = 0
for number in range(10):
   total = total + (number + 1)
print(total)
55

Classifying Errors

Is an indentation error a syntax error or a runtime error?

Solution

An IndentationError is a syntax error. Programs with syntax errors cannot be started. A program with a runtime error will start but an error will be thrown under certain conditions.

Tracing Execution

Create a table showing the numbers of the lines that are executed when this program runs, and the values of the variables after each line is executed.

total = 0
for char in "tin":
    total = total + 1

Solution

Line no Variables
1 total = 0
2 total = 0 char = ‘t’
3 total = 1 char = ‘t’
2 total = 1 char = ‘i’
3 total = 2 char = ‘i’
2 total = 2 char = ‘n’
3 total = 3 char = ‘n’

Reversing a String

Fill in the blanks in the program below so that it prints “nit” (the reverse of the original character string “tin”).

original = "tin"
result = ____
for char in original:
    result = ____
print(result)

Solution

original = "tin"
result = ""
for char in original:
    result = char + result
print(result)

Practice Accumulating

Fill in the blanks in each of the programs below to produce the indicated result.

# Total length of the strings in the list: ["red", "green", "blue"] => 12
total = 0
for word in ["red", "green", "blue"]:
    ____ = ____ + len(word)
print(total)

Solution

total = 0
for word in ["red", "green", "blue"]:
    total = total + len(word)
print(total)
# List of word lengths: ["red", "green", "blue"] => [3, 5, 4]
lengths = ____
for word in ["red", "green", "blue"]:
    lengths.____(____)
print(lengths)

Solution

lengths = []
for word in ["red", "green", "blue"]:
    lengths.append(len(word))
print(lengths)
# Concatenate all words: ["red", "green", "blue"] => "redgreenblue"
words = ["red", "green", "blue"]
result = ____
for ____ in ____:
    ____
print(result)

Solution

words = ["red", "green", "blue"]
result = ""
for word in words:
    result = result + word
print(result)
# Create acronym: ["red", "green", "blue"] => "RGB"
# write the whole thing

Solution

acronym = ""
for word in ["red", "green", "blue"]:
    acronym = acronym + word[0].upper()
print(acronym)

Cumulative Sum

Reorder and properly indent the lines of code below so that they print an array with the cumulative sum of data. The result should be [1, 3, 5, 10].

cumulative += [sum]
for number in data:
cumulative = []
sum += number
sum = 0
print(cumulative)
data = [1,2,2,5]

Solution

sum = 0
data = [1,2,2,5]
cumulative = []
for number in data:
    sum += number
    cumulative.append(sum)
print(cumulative)

Identifying Variable Name Errors

  1. Read the code below and try to identify what the errors are without running it.
  2. Run the code and read the error message. What type of NameError do you think this is? Is it a string with no quotes, a misspelled variable, or a variable that should have been defined but was not?
  3. Fix the error.
  4. Repeat steps 2 and 3, until you have fixed all the errors.
for number in range(10):
    # use a if the number is a multiple of 3, otherwise use b
    if (Number % 3) == 0:
        message = message + a
    else:
        message = message + "b"
print(message)

Solution

message = ""
for number in range(10):
    # use a if the number is a multiple of 3, otherwise use b
    if (number % 3) == 0:
        message = message + "a"
    else:
        message = message + "b"
print(message)

Identifying Item Errors

  1. Read the code below and try to identify what the errors are without running it.
  2. Run the code, and read the error message. What type of error is it?
  3. Fix the error.
seasons = ['Spring', 'Summer', 'Fall', 'Winter']
print('My favorite season is ', seasons[4])

Solution

seasons = ['Spring', 'Summer', 'Fall', 'Winter']
print('My favorite season is ', seasons[3])

Key Points

  • A for loop executes commands once for each value in a collection.

  • The first line of the for loop must end with a colon, and the body must be indented.

  • Indentation is always meaningful in Python.

  • A for loop is made up of a collection, a loop variable, and a body.

  • Loop variables can be called anything (but it is strongly advised to have a meaningful name to the looping variable).

  • The body of a loop can contain many statements.

  • Use range to iterate over a sequence of numbers.

  • The Accumulator pattern turns many values into one.


Looping Over Data Sets

Overview

Teaching: 5 min
Exercises: 0 min
Questions
  • How can I process many data sets with a single command?

Objectives
  • Be able to read and write globbing expressions that match sets of files.

  • Use glob to create lists of files.

  • Write for loops to perform operations on files given their names in a list.

Use a for loop to process files given a list of their names.

import numpy
for filename in ["data/galileo_flat.csv","data/galileo_ramp.csv"]:
    distance, height = numpy.loadtxt(filename, skiprows=2,\
                                     comments="#", delimiter=',', unpack=True)
    print(filename, distance.min(), height.max())
    data/galileo_flat.csv 800.0 1000.0
    data/galileo_ramp.csv 253.0 1000.0

Use glob.glob to find sets of files whose names match a pattern.

import glob
print("all csv files in data/random directory:", glob.glob("data/random/*.csv"))
    all csv files in data/random directory: ['data/random/data.029.csv', 'data/random/data.015.csv', 'data/random/data.001.csv', 'data/random/data.000.csv', 'data/random/data.014.csv', 'data/random/data.028.csv', 'data/random/data.002.csv', '...','data/random/data.035.csv', 'data/random/data.009.csv', 'data/random/data.023.csv', 'data/random/data.037.csv', 'data/random/data.036.csv', 'data/random/data.022.csv', 'data/random/data.026.csv', 'data/random/data.032.csv', 'data/random/data.033.csv', 'data/random/data.027.csv', 'data/random/data.031.csv', 'data/random/data.025.csv', 'data/random/data.019.csv', 'data/random/data.018.csv', 'data/random/data.024.csv', 'data/random/data.030.csv']
print("all txt files in data/random directory:", glob.glob("data/random/*.txt"))
    all txt files in data/random directory: []

Use glob and for to process batches of files.

Helps a lot if the files are named and stored systematically and consistently so that simple patterns will find the right data.

for filename in sorted(glob.glob('data/random/*.csv')):
    distance, height = numpy.loadtxt(filename, delimiter=',', unpack=True)
    print(filename, distance.mean(), height.std())
    data/random/data.000.csv 0.973455156 14.253108991004671
    data/random/data.001.csv -4.4384872 18.271604877007015
    data/random/data.002.csv -2.28566216 13.972753882460598
    ...
    data/random/data.096.csv 0.724618214 16.429186710317516
    data/random/data.097.csv 0.48894924 15.292681284065516
    data/random/data.098.csv 1.84267224 11.33741881916356
    data/random/data.099.csv -3.772237556 14.155571992376832

Determining Matches

Which of these files is not matched by the expression glob.glob('data/*as*.csv')?

  1. data/gapminder_gdp_africa.csv
  2. data/gapminder_gdp_americas.csv
  3. data/gapminder_gdp_asia.csv

Solution

1 is not matched by the glob.

Averaging over datasets

Write a program that calculate the average value from all of the data in the files, instead of individual files

Solution

import glob
import numpy as np
data = []
for filename in sorted(glob.glob('data/random/*.csv')):
    distance, height = numpy.loadtxt(filename, delimiter=',', unpack=True)
    data.append(distance)
print(np.mean(data))

Key Points

  • Use a for loop to process files given a list of their names.

  • Use glob.glob to find sets of files whose names match a pattern.

  • Use glob and for to process batches of files.


Writing Functions

Overview

Teaching: 10 min
Exercises: 15 min
Questions
  • How can I create my own functions?

Objectives
  • Explain and identify the difference between function definition and function call.

  • Write a function that takes a small, fixed number of arguments and produces a single result.

Break programs down into functions to make them easier to understand.

Define a function using def with a name, parameters, and a block of code.

def print_greeting():
    print('Hello!')

Defining a function does not run it.

print_greeting()
Hello!

Arguments in call are matched to parameters in definition.

def print_date(year, month, day):
    joined = str(year) + '/' + str(month) + '/' + str(day)
    print(joined)

print_date(1871, 3, 19)
1871/3/19

Or, we can name the arguments when we call the function, which allows us to specify them in any order:

print_date(month=3, day=19, year=1871)
1871/3/19

Functions may return a result to their caller using return.

def average(values):
    if len(values) == 0:
        return None
    return sum(values) / len(values)
a = average([1, 3, 4])
print('average of actual values:', a)
2.6666666666666665
print('average of empty list:', average([]))
None
result = print_date(1871, 3, 19)
print('result of call is:', result)
1871/3/19
result of call is: None

Identifying Syntax Errors

  1. Read the code below and try to identify what the errors are without running it.
  2. Run the code and read the error message. Is it a SyntaxError or an IndentationError?
  3. Fix the error.
  4. Repeat steps 2 and 3 until you have fixed all the errors.
def another_function
  print("Syntax errors are annoying.")
   print("But at least python tells us about them!")
  print("So they are usually not too hard to fix.")

Solution

def another_function():
  print("Syntax errors are annoying.")
  print("But at least Python tells us about them!")
  print("So they are usually not too hard to fix.")

Definition and Use

What does the following program print?

def report(pressure):
    print('pressure is', pressure)

print('calling', report, 22.5)

Solution

calling <function report at 0x7fd128ff1bf8> 22.5

A function call always needs parenthesis, otherwise you get memory address of the function object. So, if we wanted to call the function named report, and give it the value 22.5 to report on, we could have our function call as follows

print("calling")
report(22.5)

Order of Operations

The example above:

result = print_date(1871, 3, 19)
print('result of call is:', result)

printed:

1871/3/19
result of call is: None

Explain why the two lines of output appeared in the order they did.

What’s wrong in this example?

result = print_date(1871,3,19)

def print_date(year, month, day):
   joined = str(year) + '/' + str(month) + '/' + str(day)
   print(joined)

Solution

  1. The first line of output (1871/3/19) is from the print function inside print_date(), while the second line is from the print function below the function call. All of the code inside print_date() is executed first, and the program then “leaves” the function and executes the rest of the code.
  2. The problem with the example is that the function is defined after the call to the function is made. Python therefore doesn’t understand the function call.

Encapsulation

Fill in the blanks to create a function that takes a single filename as an argument, loads the data in the file named by the argument, and returns the minimum value in that data.

import pandas

def min_in_data(____):
    data = ____
    return ____

Solution

import pandas

def min_in_data(filename):
    data = pandas.read_csv(filename)
    return data.min()

Find the First

Fill in the blanks to create a function that takes a list of numbers as an argument and returns the first negative value in the list. What does your function do if the list is empty?

def first_negative(values):
    for v in ____:
        if ____:
            return ____

Solution

def first_negative(values):
    for v in values:
        if v<0:
            return v

If an empty list is passed to this function, it returns None:

my_list = []
print(first_negative(my_list)
None

Calling by Name

Earlier we saw this function:

def print_date(year, month, day):
    joined = str(year) + '/' + str(month) + '/' + str(day)
    print(joined)

We saw that we can call the function using named arguments, like this:

print_date(day=1, month=2, year=2003)
  1. What does print_date(day=1, month=2, year=2003) print?
  2. When have you seen a function call like this before?
  3. When and why is it useful to call functions this way?

Solution

  1. 2003/2/1
  2. We saw examples of using named arguments when working with the pandas library. For example, when reading in a dataset using data = pandas.read_csv('data/gapminder_gdp_europe.csv', index_col='country'), the last argument index_col is a named argument.
  3. Using named arguments can make code more readable since one can see from the function call what name the different arguments have inside the function. It can also reduce the chances of passing arguments in the wrong order, since by using named arguments the order doesn’t matter.

Encapsulate of If/Print Block

The code below will run on a label-printer for chicken eggs. A digital scale will report a chicken egg mass (in grams) to the computer and then the computer will print a label.

Please re-write the code so that the if-block is folded into a function.

 import random
 for i in range(10):

    # simulating the mass of a chicken egg
    # the (random) mass will be 70 +/- 20 grams
    mass=70+20.0*(2.0*random.random()-1.0)

    print(mass)
   
    #egg sizing machinery prints a label
    if(mass>=85):
       print("jumbo")
    elif(mass>=70):
       print("large")
    elif(mass<70 and mass>=55):
       print("medium")
    else:
       print("small")

The simplified program follows. What function definition will make it functional?

 # revised version
 import random
 for i in range(10):

    # simulating the mass of a chicken egg
    # the (random) mass will be 70 +/- 20 grams
    mass=70+20.0*(2.0*random.random()-1.0)

    print(mass,print_egg_label(mass))    

  1. Create a function definition for print_egg_label() that will work with the revised program above. Note, the function’s return value will be significant. Sample output might be 71.23 large.
  2. A dirty egg might have a mass of more than 90 grams, and a spoiled or broken egg will probably have a mass that’s less than 50 grams. Modify your print_egg_label() function to account for these error conditions. Sample output could be 25 too light, probably spoiled.

Solution

def print_egg_label(mass):
    #egg sizing machinery prints a label
    if(mass>=90):
        return("warning: egg might be dirty")
    elif(mass>=85):
        return("jumbo")
    elif(mass>=70):
        return("large")
    elif(mass<70 and mass>=55):
        return("medium")
    elif(mass<50):
        return("too light, probably spoiled")
    else:
        return("small")

Key Points

  • Break programs down into functions to make them easier to understand.

  • Define a function using def with a name, parameters, and a block of code.

  • Defining a function does not run it.

  • Arguments in call are matched to parameters in definition.

  • Functions may return a result to their caller using return.


Variable Scope

Overview

Teaching: 10 min
Exercises: 10 min
Questions
  • How do function calls actually work?

  • How can I determine where errors occurred?

Objectives
  • Identify local and global variables.

  • Identify parameters as local variables.

  • Read a traceback and determine the file, function, and line number on which the error occurred, the type of error, and the error message.

The scope of a variable is the part of a program that can ‘see’ that variable.

pressure = 103.9

def adjust(t):
    temperature = t * 1.43 / pressure
    return temperature
print('adjusted:', adjust(0.9))
print('temperature after call:', temperature)
adjusted: 0.01238691049085659
Traceback (most recent call last):
  File "/Users/swcarpentry/foo.py", line 8, in <module>
    print('temperature after call:', temperature)
NameError: name 'temperature' is not defined

Local and Global Variable Use

Trace the values of all variables in this program as it is executed. (Use ‘—’ as the value of variables before and after they exist.)

limit = 100

def clip(value):
    return min(max(0.0, value), limit)

value = -22.5
print(clip(value))

Reading Error Messages

Read the traceback below, and identify the following:

  1. How many levels does the traceback have?
  2. What is the file name where the error occurred?
  3. What is the function name where the error occurred?
  4. On which line number in this function did the error occurr?
  5. What is the type of error?
  6. What is the error message?
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-2-e4c4cbafeeb5> in <module>()
      1 import errors_02
----> 2 errors_02.print_friday_message()

/Users/ghopper/thesis/code/errors_02.py in print_friday_message()
     13
     14 def print_friday_message():
---> 15     print_message("Friday")

/Users/ghopper/thesis/code/errors_02.py in print_message(day)
      9         "sunday": "Aw, the weekend is almost over."
     10     }
---> 11     print(messages[day])
     12
     13

KeyError: 'Friday'

Key Points

  • The scope of a variable is the part of a program that can ‘see’ that variable.


Conditionals

Overview

Teaching: 10 min
Exercises: 15 min
Questions
  • How can programs do different things for different data?

Objectives
  • Correctly write programs that use if and else statements and simple Boolean expressions (without logical operators).

  • Trace the execution of unnested conditionals and conditionals inside loops.

Use if statements to control whether or not a block of code is executed.

mass = 3.54
if mass > 3.0:
    print(mass, 'is large')

mass = 2.07
if mass > 3.0:
    print (mass, 'is large')
3.54 is large

Conditionals are often used inside loops.

masses = [3.54, 2.07, 9.22, 1.86, 1.71]
for m in masses:
    if m > 3.0:
        print(m, 'is large')
3.54 is large
9.22 is large

Use else to execute a block of code when an if condition is not true.

masses = [3.54, 2.07, 9.22, 1.86, 1.71]
for m in masses:
    if m > 3.0:
        print(m, 'is large')
    else:
        print(m, 'is small')
3.54 is large
2.07 is small
9.22 is large
1.86 is small
1.71 is small

Use elif to specify additional tests.

masses = [3.54, 2.07, 9.22, 1.86, 1.71]
for m in masses:
    if m > 9.0:
        print(m, 'is HUGE')
    elif m > 3.0:
        print(m, 'is large')
    else:
        print(m, 'is small')
3.54 is large
2.07 is small
9.22 is HUGE
1.86 is small
1.71 is small

Conditions are tested once, in order.

grade = 85
if grade >= 70:
    print('grade is C')
elif grade >= 80:
    print('grade is B')
elif grade >= 90:
    print('grade is A')
grade is C
velocity = 10.0
if velocity > 20.0:
    print('moving too fast')
else:
    print('adjusting velocity')
    velocity = 50.0
adjusting velocity
velocity = 10.0
for i in range(5): # execute the loop 5 times
    print(i, ':', velocity)
    if velocity > 20.0:
        print('moving too fast')
        velocity = velocity - 5.0
    else:
        print('moving too slow')
        velocity = velocity + 10.0
print('final velocity:', velocity)
0 : 10.0
moving too slow
1 : 20.0
moving too slow
2 : 30.0
moving too fast
3 : 25.0
moving too fast
4 : 20.0
moving too slow
final velocity: 30.0

Create a table showing variables’ values to trace a program’s execution.

i 0 . 1 . 2 . 3 . 4 .
velocity 10.0 20.0 . 30.0 . 25.0 . 20.0 . 30.0

Compound Relations Using and, or, and Parentheses

Often, you want some combination of things to be true. You can combine relations within a conditional using and and or. Continuing the example above, suppose you have

mass     = [ 3.54,  2.07,  9.22,  1.86,  1.71]
velocity = [10.00, 20.00, 30.00, 25.00, 20.00]

i = 0
for i in range(5):
    if mass[i] > 5 and velocity[i] > 20:
        print("Fast heavy object.  Duck!")
    elif mass[i] > 2 and mass[i] <= 5 and velocity[i] <= 20:
        print("Normal traffic")
    elif mass[i] <= 2 and velocity[i] <= 20:
        print("Slow light object.  Ignore it")
    else:
        print("Whoa!  Something is up with the data.  Check it")

Just like with arithmetic, you can and should use parentheses whenever there is possible ambiguity. A good general rule is to always use parentheses when mixing and and or in the same condition. That is, instead of:

if mass[i] <= 2 or mass[i] >= 5 and velocity[i] > 20:

write one of these:

if (mass[i] <= 2 or mass[i] >= 5) and velocity[i] > 20:
if mass[i] <= 2 or (mass[i] >= 5 and velocity[i] > 20):

so it is perfectly clear to a reader (and to Python) what you really mean.

Tracing Execution

What does this program print?

pressure = 71.9
if pressure > 50.0:
    pressure = 25.0
elif pressure <= 50.0:
    pressure = 0.0
print(pressure)

Solution

25.0

Trimming Values

Fill in the blanks so that this program creates a new list containing zeroes where the original list’s values were negative and ones where the original list’s values were positive.

original = [-1.5, 0.2, 0.4, 0.0, -1.3, 0.4]
result = ____
for value in original:
    if ____:
        result.append(0)
    else:
        ____
print(result)
[0, 1, 1, 1, 0, 1]

Solution

original = [-1.5, 0.2, 0.4, 0.0, -1.3, 0.4]
result = []
for value in original:
    if value<0.0:
        result.append(0)
    else:
        result.append(1)
print(result)

Initializing

Modify this program so that it finds the largest and smallest values in the list no matter what the range of values originally is.

values = [...some test data...]
smallest, largest = None, None
for v in values:
    if ____:
        smallest, largest = v, v
    ____:
        smallest = min(____, v)
        largest = max(____, v)
print(smallest, largest)

What are the advantages and disadvantages of using this method to find the range of the data?

Solution

values = [-2,1,65,78,-54,-24,100]
smallest, largest = None, None
for v in values:
    if smallest==None and largest==None:
        smallest, largest = v, v
    else:
        smallest = min(smallest, v)
        largest = max(largest, v)
print(smallest, largest)

Key Points

  • Use if statements to control whether or not a block of code is executed.

  • Conditionals are often used inside loops.

  • Use else to execute a block of code when an if condition is not true.

  • Use elif to specify additional tests.

  • Conditions are tested once, in order.

  • Create a table showing variables’ values to trace a program’s execution.


Programming Style

Overview

Teaching: 10 min
Exercises: 5 min
Questions
  • How can I make my programs more readable?

  • How do most programmers format their code?

  • How can programs check their own operation?

Objectives
  • Provide sound justifications for basic rules of coding style.

  • Refactor one-page programs to make them more readable and justify the changes.

  • Use Python community coding standards (PEP-8).

Coding style

Coding style helps us to understand the code better. It helps to maintain and change the code. Python relies strongly on coding style, as we may notice by the indentation we apply to lines to define different blocks of code. Python proposes a standard style through one of its first Python Enhancement Proposals (PEP), PEP8, and highlight the importance of readability in the Zen of Python.

We may highlight some points:

Follow standard Python style in your code.

Use assertions to check for internal errors.

Assertions are a simple, but powerful method for making sure that the context in which your code is executing is as you expect.

def calc_bulk_density(mass, volume):
    '''Return dry bulk density = powder mass / powder volume.'''
    assert volume > 0
    return mass / volume

If the assertion is False, the Python interpreter raises an AssertionError runtime exception. The source code for the expression that failed will be displayed as part of the error message. To ignore assertions in your code run the interpreter with the ‘-O’ (optimize) switch. Assertions should contain only simple checks and never change the state of the program. For example, an assertion should never contain an assignment.

Use docstrings to provide online help.

def average(values):
    "Return average of values, or None if no values are supplied."

    if len(values) == 0:
        return None
    return sum(values) / average(values)

help(average)
Help on function average in module __main__:

average(values)
    Return average of values, or None if no values are supplied.

Multiline Strings

Often use multiline strings for documentation. These start and end with three quote characters (either single or double) and end with three matching characters.

"""This string spans
multiple lines.

Blank lines are allowed."""

What Will Be Shown?

Highlight the lines in the code below that will be available as online help. Are there lines that should be made available, but won’t be? Will any lines produce a syntax error or a runtime error?

"Find maximum edit distance between multiple sequences."
# This finds the maximum distance between all sequences.

def overall_max(sequences):
    '''Determine overall maximum edit distance.'''

    highest = 0
    for left in sequences:
        for right in sequences:
            '''Avoid checking sequence against itself.'''
            if left != right:
                this = edit_distance(left, right)
                highest = max(highest, this)

    # Report.
    return highest

Document This

Turn the comment on the following function into a docstring and check that help displays it properly.

def middle(a, b, c):
    # Return the middle value of three.
    # Assumes the values can actually be compared.
    values = [a, b, c]
    values.sort()
    return values[1]

Solution

def middle(a, b, c):
    '''Return the middle value of three.
    Assumes the values can actually be compared.'''
    values = [a, b, c]
    values.sort()
    return values[1]

Clean Up This Code

  1. Read this short program and try to predict what it does.
  2. Run it: how accurate was your prediction?
  3. Refactor the program to make it more readable. Remember to run it after each change to ensure its behavior hasn’t changed.
  4. Compare your rewrite with your neighbor’s. What did you do the same? What did you do differently, and why?
n = 10
s = 'et cetera'
print(s)
i = 0
while i < n:
    # print('at', j)
    new = ''
    for j in range(len(s)):
        left = j-1
        right = (j+1)%len(s)
        if s[left]==s[right]: new += '-'
        else: new += '*'
    s=''.join(new)
    print(s)
    i += 1

Solution

Here’s one solution.

def string_machine(input_string, iterations):
    """
    Takes input_string and generates a new string with -'s and *'s
    corresponding to characters that have identical adjacent characters
    or not, respectively.  Iterates through this procedure with the resultant
    strings for the supplied number of iterations.
    """
    print(input_string)
    old = input_string
    for i in range(iterations):
        new = ''
        # iterate through characters in previous string
        for j in range(len(input_string)):
            left = j-1
            right = (j+1)%len(input_string) # ensure right index wraps around
            if old[left]==old[right]:
                new += '-'
            else:
                new += '*'
        print(new)
        # store new string as old
        old = new

string_machine('et cetera', 10)
et cetera
*****-***
----*-*--
---*---*-
--*-*-*-*
**-------
***-----*
--**---**
*****-***
----*-*--
---*---*-

Key Points

  • Follow standard Python style in your code.

  • Use docstrings to provide online help.


Fitting data to models

Overview

Teaching: 50 min
Exercises: 10 min
Questions
  • How do I fit my data to a scientific model.

Objectives
  • Import the scipy.optimize library.

  • Understand the curvefit function.

  • Print the results from curvefit.

  • Plot the data from curvefit.

Data analysis with Python

Many physical systems can be modeled as an equation, which in Python would be represented by a function $f$. If an appropriate function $f$ can be found for an experiment we can use the equation to determine physical parameters releted to the experiment, and we can use this new model to predict new things about the world. Galileo used this method to calculate the trajectory of canonballs by rolling them down inclined ramps.

In experimental physics, we constrain these models by designing an experiment with two quantities. The first quantity, that we can control, is the independent variable. The second quantity, that we can measure, is the dependent variable. The relationship between these two quantities can then be used to determine some physical parameters.

A simple example of measuring the path of moving object. We could guess that the model is moving at a constant speed and design an experiment to find that speed using the model:

[s = ut]

Scipy provides functions that can fit model functions to data.

Scipy provides a number of functions that, given a suitable model function, can return the best estimate of the unknown parameters in the model.

Consider the experiment where the time of flight of an object moving at constant speed is measured. If the experiment is correctly setup. The unknown variable we are trying to determine is the speed $u$. The remaining variables are time $t$ and height $s$. We can design two different experiments, one where we control time (measuring at a fixed interval) and measure distance, or one where we control distance and measure time.

In Python the model function might be written as:

def distance(time, speed):
    """Calculate the distance travelled at a constant speed for a known time."""
    return speed * time

def model_function(independent_data, parameter1, parameter2):
    """A template for the model function."""
    dependent_data = parameter1 * independent_data + parameter2
    return dependent_data
#control time, measure distance
import numpy
# derr is my estimate of errors measuring distance, my ruler is bad.
derr = 5 # metres
measured_times =numpy.arange(10,100,10) #time in seconds
measured_distances = numpy.array([ 108.2,  220.4,  360.2,  482.8,
        630.6,  793.9,  947.5, 1125.0, 1314.9]) # distance in metres
distance_errors = numpy.ones_like(measured_distances)*derr

For such a simple model, the average speed can be calculated from the data quite easily.

speeds = numpy.diff(measured_distances) / numpy.diff(measured_times)
average_speed = numpy.average(speeds)
print("Average speed is {:.04g} m/s".format(average_speed))
mean_times_error = numpy.std(speeds, ddof=1)/numpy.sqrt(speeds.size)
mean_times_std = numpy.sqrt( numpy.mean( derr**2 * numpy.ones(speeds.size)) )

#error propagation, sum in quadrature
speed_error = numpy.sqrt( numpy.mean( (distance_errors / measured_distances)**2) )* average_speed
print("Standard error in average speed is {:.03g} m/s".format(mean_times_error))
print("Error in average speed is {:.03g} m/s".format(speed_error))
Average speed is 15.08 m/s
Standard error in average speed is 0.928 m/s
Error in average speed is 0.281 m/s
# Copied here to make it easier to find!
def distance(time, speed):
    """Calculate the distance travelled at a constant speed for a known time."""
    return speed * time
from scipy.optimize import curve_fit

popt, pcov = curve_fit(distance, measured_times, measured_distances)

print("Speed is %4g m/s" % popt[0])

pvar = numpy.diag(pcov)
print("Error in fitted speed is {:.03g} m/s".format(numpy.sqrt(pvar[0])))
    Speed is 13.6645 m/s
    Error in fitted speed is 0.31 m/s

What is popt, pvar?

Exercise 1

Predict the value of distance at after 10 seconds and 100s.

Calculate predictions using the model function

# Always predict with the model function!
d10 = distance(10, popt[0])
d100 = distance(100, popt[0])
print("After 10 seconds, predicted distance = {:.4g}m".format(d10))
print("After 100 seconds, predicted distance = {:.4g}m".format(d100))

#dont_do_this
rewrite10 = popt[0] * 10
print("After 10 seconds, predicted distance = {:.4g}m".format(rewrite10))

#or this
hardcoded10 = 13.64 * 10
print("After 10 seconds, predicted distance = {:.4g}m".format(hardcoded10))
    After 10 seconds, predicted distance = 136.6m
    After 100 seconds, predicted distance = 1366m
    After 10 seconds, predicted distance = 136.6m
    After 10 seconds, predicted distance = 136.4m
popt, pcov = curve_fit(distance, measured_times, measured_distances,
                       absolute_sigma=True, sigma = distance_errors)
pvar = numpy.diag(pcov)

print("Average speed is {:.04g} m/s".format(popt[0]))
print("Error in fitted speed is {:.03g} m/s".format(numpy.sqrt(pvar[0])))
    Average speed is 13.66 m/s
    Error in fitted speed is 0.0296 m/s

The model function needs to follow the curve_fit rules

def good_model_function(xdata, parameter_1, parameter_2, parameter_3):
    # code_that_calculates_a_model
    return prediction

curve_fit works with multiple parameters

Extending the above experiment, what if the object was actually accelerating? The model function is now

[s = ut + \frac{1}{2} at^2]

where $a$ is the acceleration. We can change the model function and run the curve_fit code again

def distance_with_acceleration(time, speed, acceleration):
    """Calculate the distance travelled with at a constant speed for a known time
    and constant acceleration."""
    return speed * time + 0.5 * acceleration * time**2

from scipy.optimize import curve_fit
popt2, pcov2 = curve_fit(distance_with_acceleration, measured_times, measured_distances,
                       absolute_sigma=True, sigma = distance_errors)
print("Initial speed is {:.04g} m/s".format(popt2[0]))
print("Error in fitted initial speed is {:.03g} m/s".format(numpy.sqrt(pcov2[0,0])))

print("Acceleration is {:.04g} m/s2".format(popt2[1]))
print("Error in fitted acceleration is {:.03g} m/s2".format(numpy.sqrt(pcov2[1,1])))
    Initial speed is 10.26 m/s
    Error in fitted initial speed is 0.119 m/s
    Acceleration is 0.09589 m/s2
    Error in fitted acceleration is 0.00325 m/s2

The data use here is fake, generated with an initial speed of 10.86 m/s and an acceleration of 0.1$m/s^2$. The model with constant speed predicted a higher speed to compensate for the acceleration!

Exercise 1

How could we have quickly checked whether our model was good?

A plot would have quickly showed the linear model is not correct, or printing each value predicted might tell us something too for small amounts of data.

%matplotlib inline
import matplotlib.pyplot as plt
plt.style.use("seaborn-whitegrid")
plt.figure(figsize=(8,6))
plt.errorbar(measured_times, measured_distances,yerr=distance_errors, marker='o', linestyle='none', label="measured data")
plt.plot(measured_times, distance(measured_times, numpy.mean(speeds)),label='simple average')
plt.plot(measured_times, distance(measured_times, popt[0]),label='$s=ut$')
plt.plot(measured_times, distance_with_acceleration(measured_times, popt2[0],popt2[1]),label=r'$s=ut+\frac{1}{2}at^2$')
plt.legend(fontsize=14)
plt.xlabel("Time (s)")
plt.ylabel("Distance (m)")

png

Always plot your data and model fits.

Plotting residuals

Once you have a model prediction, you can check for problems in the model using a residual plot. Plot the difference between the model prediction and the measured data.

%matplotlib inline
import matplotlib.pyplot as plt
plt.style.use("seaborn-whitegrid")
plt.figure(figsize=(8,6))
plt.plot(measured_times, distance(measured_times, numpy.mean(speeds))-measured_distances,label='simple average',marker='o',ls='')
plt.plot(measured_times, distance(measured_times, popt[0])-measured_distances,label='$s=ut$',marker='s',ls='')
plt.plot(measured_times, distance_with_acceleration(measured_times, popt2[0],popt2[1])-measured_distances,label=r'$s=ut+\frac{1}{2}at^2$',marker='<',ls='')
plt.legend(fontsize=14)
plt.xlabel("Time (s)")
plt.ylabel("Distance (m)")

png

In this example you can see that the linear model (simple average or s=ut) deviates from the data in a way that depends on the independent variable (time). If the model fit is good we expect to see differences betwee model and data that are random in magnitude and location (as in the quadratic fit).

curve_fit find the best estimate of the parameters using by minimizing chi squared.

[\chi^2 = \sum\frac{(y_i - f(x_i))^2}{\sigma_{y_i}^2}]

Reduced chi squared is easier to understand and compare between data sets.

def chi2(y_measure,y_predict,errors):
    """Calculate the chi squared value given a measurement with errors and prediction"""
    return numpy.sum( (y_measure - y_predict)**2 / errors**2 )

def chi2reduced(y_measure, y_predict, errors, number_of_parameters):
    """Calculate the reduced chi squared value given a measurement with errors and prediction,
    and knowing the number of parameters in the model."""
    return chi2(y_measure, y_predict, errors)/(y_measure.size - number_of_parameters)
print("Constant velocity model chi2r=",chi2reduced(measured_distances,
                                        distance(measured_times,popt[0]),
                                        distance_errors,
                                        1))

print("Constant acceleration model chi2r=",chi2reduced(measured_distances,
                                        distance_with_acceleration(measured_times,popt2[0],popt2[1]),
                                        distance_errors,
                                        2))
    Constant velocity model chi2r= 109.63987561403505
    Constant acceleration model chi2r= 1.1810606671759618

Chi square values should be rounded to a small number of digits, keeping only 1 or 2 significant figures.

print("Constant velocity model chi2r=",round(chi2reduced(measured_distances,
                                        distance(measured_times,popt[0]),
                                        distance_errors,
                                        1),-1))

print("Constant acceleration model chi2r=",round(chi2reduced(measured_distances,
                                        distance_with_acceleration(measured_times,popt2[0],popt2[1]),
                                        distance_errors,
                                        2),1))
    Constant velocity model chi2r= 110.0
    Constant acceleration model chi2r= 1.2

Exercise 2

Put a print statement inside the model function distance_with_acceleration to print out the parameter values. What is happening to the parameter values?

def distance_with_acceleration_print(time, speed, acceleration):
    """Calculate the distance travelled with at a constant speed for a known time
    and constant acceleration."""

    print ("speed=",speed, "acceleration=",acceleration)
    return speed * time + 0.5 * acceleration * time**2

popt2, pcov2 = curve_fit(distance_with_acceleration_print, measured_times, measured_distances,
                       absolute_sigma=True, sigma = distance_errors)

    speed= 1.0 acceleration= 1.0
    speed= 1.0 acceleration= 1.0
    speed= 1.0 acceleration= 1.0
    speed= 1.0000000149011612 acceleration= 1.0
    speed= 1.0 acceleration= 1.0000000149011612
    speed= 10.257717023193093 acceleration= 0.0958943850247661
    speed= 10.257717176044988 acceleration= 0.0958943850247661
    speed= 10.257717023193093 acceleration= 0.0958943864537038
    speed= 10.257717023531002 acceleration= 0.09589438501373611

Non-linear regression

When fitting non-linear functions, use the p0 keyword to start curve_fit with a good estimate

iteration=0

def nonlinear_function(t, a, b, c,verbose=True):
    global iteration
    if verbose:
        print (iteration, "a=",a, "b=",b, "c=",c)
    iteration = iteration+1
    return a*t**(b-1) + c

#generated "good" data
t=numpy.arange(10)
y=numpy.array([-0.173, 2.12, 9.42, 19.69, 37.16, 59.40, 96.59, 119.448, 158.0,201.9])
sigmaNL = numpy.ones(10)*0.5

First, try fitting the non-linear function with no initial guess

iteration=0
poptNL1, pcovNL1 = curve_fit(nonlinear_function, t, y,
                       absolute_sigma=True, sigma = sigmaNL)
    0 a= 1.0 b= 1.0 c= 1.0
    1 a= 1.0 b= 1.0 c= 1.0
    2 a= 1.0 b= 1.0 c= 1.0
    3 a= 1.0000000149011612 b= 1.0 c= 1.0
    4 a= 1.0 b= 1.0000000149011612 c= 1.0
    5 a= 1.0 b= 1.0 c= 1.0000000149011612
    6 a= 77.19199892187382 b= 1.000001167729559 c= 1.0
    7 a= 77.19200007212423 b= 1.000001167729559 c= 1.0
    8 a= 77.19199892187382 b= 1.0000011826307376 c= 1.0
    9 a= 77.19199892187382 b= 1.000001167729559 c= 1.0000000149011612
...
    150 a= 2.5074171106029874 b= 2.9990317544021594 c= -0.9734594072738433
    151 a= 2.507417147966414 b= 2.9990317544021594 c= -0.9734594072738433
    152 a= 2.5074171106029874 b= 2.999031799091215 c= -0.9734594072738433
    153 a= 2.5074171106029874 b= 2.9990317544021594 c= -0.9734593927681677
    154 a= 2.5074210685973637 b= 2.999031031902325 c= -0.9734725519528605

Try a good guess for the parameters

iteration = 0
poptNL2, pcovNL2 = curve_fit(nonlinear_function, t, y,
                       absolute_sigma=True, sigma = sigmaNL, p0=(2.5,3,0))
#I think it's 2.5*t^2 with no offset
    0 a= 2.5 b= 3.0 c= 0.0
    1 a= 2.5 b= 3.0 c= 0.0
    2 a= 2.5 b= 3.0 c= 0.0
    3 a= 2.500000037252903 b= 3.0 c= 0.0
    4 a= 2.5 b= 3.0000000447034836 c= 0.0
    5 a= 2.5 b= 3.0 c= 1.4901161193880158e-08
    6 a= 2.507540116653946 b= 2.9990074809599334 c= -0.973917163330992
    7 a= 2.5075401540192055 b= 2.9990074809599334 c= -0.973917163330992
    8 a= 2.507540116653946 b= 2.9990075256486275 c= -0.973917163330992
    9 a= 2.507540116653946 b= 2.9990074809599334 c= -0.9739171488184953
    10 a= 2.5074184226341583 b= 2.9990315172382234 c= -0.9734643979860024
    11 a= 2.5074184599976044 b= 2.9990315172382234 c= -0.9734643979860024
    12 a= 2.5074184226341583 b= 2.9990315619272754 c= -0.9734643979860024
    13 a= 2.5074184226341583 b= 2.9990315172382234 c= -0.9734643834802524
    14 a= 2.5074209783416057 b= 2.9990310475838156 c= -0.9734720313746336

Now try an unreasonable guess for the b parameter

iteration = 0
poptNL3, pcovNL3 = curve_fit(nonlinear_function, t, y,
                       absolute_sigma=True, sigma = sigmaNL, p0=(3,-2,0.1))
#I think it's 3/t^3 +0.1
    0 a= 3.0 b= -2.0 c= 0.1
    1 a= 3.0 b= -2.0 c= 0.1
    2 a= 3.0 b= -2.0 c= 0.1
    3 a= 3.0000000447034836 b= -2.0 c= 0.1
    4 a= 3.0 b= -1.9999999701976776 c= 0.1
    5 a= 3.0 b= -2.0 c= 0.10000000149011612


    <ipython-input-16-1520d182c2d1>:7: RuntimeWarning: divide by zero encountered in power
      return a*t**(b-1) + c
    /Users/lee/anaconda3/lib/python3.8/site-packages/scipy/optimize/minpack.py:828: OptimizeWarning: Covariance of the parameters could not be estimated
      warnings.warn('Covariance of the parameters could not be estimated',

It’s always important to check the fit

plt.figure(figsize=(8,6))
plt.errorbar(t,
             y,
             yerr=sigmaNL, marker='o',ls='none',label="Data")

def plot_and_print(popt,ls,label):
    plt.plot(t, nonlinear_function(t,popt[0],popt[1],popt[2]),label=label,ls=ls,lw=3)
plot_and_print(poptNL1,"-","No guess")
plot_and_print(poptNL2,"--","good guess")
plot_and_print(poptNL3,":","Bad guess")

plt.legend()
plt.xlabel("Time")
plt.ylabel("Value")
plt.figure(figsize=(8,6))



def plot_residual(data, popt,marker,label):
    plt.plot(t, nonlinear_function(t,popt[0],popt[1],popt[2],verbose=False)-data,label=label,marker=marker,ls='',lw=3)
plot_residual(y,poptNL1,"o","No guess")
plot_residual(y,poptNL2,"s","good guess")
plot_residual(y,poptNL3,"<","Bad guess")

plt.legend()
plt.setp(plt.gca(),ylabel="Residual",xlabel="Time (s)")

    18 a= 2.5074210685973637 b= 2.999031031902325 c= -0.9734725519528605
    19 a= 2.5074209783416057 b= 2.9990310475838156 c= -0.9734720313746336
    20 a= 3.0 b= -2.0 c= 0.1

png

png

Key Points

  • scipy provides tools and functions to fit models to data.

  • Use curve_fit to fit linear and non-linear models to experimental data

  • Use appropriate errors in the sigma keyword to get a better estimate of parameter errors.

  • Check the fit using a plot if possible

  • Check the χ2 value to compare the fit against the errors in the measurements.

  • Non linear models can be fitted, but may need an initial esimate of the parameters.


Wrap-Up

Overview

Teaching: 0 min
Exercises: 0 min
Questions
  • What have we learned?

  • What else is out there and where do I find it?

Objectives
  • Name and locate scientific Python community sites for software, workshops, and help.

Python supports a large community within and outwith research.

Key Points

  • Python supports a large community within and outwith research.