Numpy and Scipy

Overview

Teaching: 40 min
Exercises: 10 min
Questions
  • How do I deal with tabular scientific data?

Objectives
  • Import the numpy library.

  • Understand the NDArray object.

  • Get some basic information about numpy and scipy objects and methods.

Numpy is the main Python library for scientific computation

Import numpy before using it

import numpy as np
import numpy
from numpy import cos

print(numpy.cos, np.cos, cos)
<ufunc 'cos'> <ufunc 'cos'> <ufunc 'cos'>

Use numpy.zeros to create arrays filled with zeros

f10 = numpy.zeros(10)
i10 = numpy.zeros(10, dtype=int)
print("default array of zeros: ", f10)
print("integer array of zeros: ", i10)
default array of zeros:  [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
integer array of zeros:  [0 0 0 0 0 0 0 0 0 0]
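
numpy.zeros also accepts a tuple as its first argument, which gives a multi-dimensional array. A minimal sketch (the names z and zi are only for illustration):

```python
import numpy

# Passing a tuple as the shape gives a 2 x 3 array of zeros.
z = numpy.zeros((2, 3))
print("shape:", z.shape, "dtype:", z.dtype)

# dtype sets the element type; without it numpy.zeros uses floats.
zi = numpy.zeros((2, 3), dtype=int)
print(zi)
```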

Use numpy.ones to create an array of ones.

print("Using numpy.ones    : ", numpy.ones(10))
print("is the same thing as: ", numpy.zeros(10)+1)

Using numpy.ones    :  [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
is the same thing as:  [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

Use numpy.arange to generate sequences of numbers

numpy.arange(10)
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
numpy.arange(1,10)
array([1, 2, 3, 4, 5, 6, 7, 8, 9])
numpy.arange(1,10,2)
array([1, 3, 5, 7, 9])
numpy.arange(1,10,-2)
array([], dtype=int64)
numpy.arange(10,1,-2)
array([10,  8,  6,  4,  2])

Numpy arrays have a shape

a = numpy.arange(10)
print("a's shape is ",a.shape)

b=a.reshape(5,2)
print("b's shape is ",b.shape)
a's shape is  (10,)
b's shape is  (5, 2)
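
A short sketch of a few related attributes. It assumes the same 10-element array as above; passing -1 to reshape asks numpy to work out that dimension from the number of elements.

```python
import numpy

a = numpy.arange(10)

# -1 means "infer this dimension": 10 elements / 5 rows = 2 columns.
b = a.reshape(5, -1)
print("b's shape is ", b.shape)

# ndim is the number of dimensions, size the total number of elements.
print("b.ndim =", b.ndim, " b.size =", b.size)
```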

Numpy arrays can be treated like single numbers in arithmetic

a = numpy.arange(5)
b = numpy.arange(5)
print("a=",a)
print("b=",b)
print("a*b=",a*b)
print("a+b=",a+b)
a= [0 1 2 3 4]
b= [0 1 2 3 4]
a*b= [ 0  1  4  9 16]
a+b= [0 2 4 6 8]
c = numpy.ones((5,2))
d = numpy.ones((5,2)) + 100
c+d
array([[102., 102.],
       [102., 102.],
       [102., 102.],
       [102., 102.],
       [102., 102.]])
e = c.reshape(2,5)
c+e #c and e have different shapes
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-46-0e32881b9afe> in <module>()
      1 e = c.reshape(2,5)
----> 2 c+e #c and e have different shapes

ValueError: operands could not be broadcast together with shapes (5,2) (2,5) 
---------------------------------------------------------------------------
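
Shapes do not have to be identical, only compatible. As a sketch of what does work: a one-dimensional array of length 2 can be combined with a (5, 2) array, because numpy repeats it along the rows, and transposing e restores the (5, 2) shape. The array named row is an illustrative assumption, not part of the lesson data.

```python
import numpy

c = numpy.ones((5, 2))
row = numpy.array([10.0, 20.0])   # shape (2,)

# row is broadcast across each of the 5 rows of c.
print((c + row).shape)

# Transposing a (2, 5) array gives a (5, 2) array, so the shapes match again.
e = c.reshape(2, 5)
print((c + e.T).shape)
```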

The Numpy library has many functions that work on arrays

a=numpy.arange(5)
print("a = ", a)
a =  [0 1 2 3 4]
print("sum(a) = ", a.sum())
sum(a) =  10
print("mean(a) = ", a.mean())
mean(a) =  2.0
print("std(a) = ", a.std()) #what is this?
std(a) =  1.4142135623730951
print("np.sin(a) = ", np.sin(a))
np.sin(a) =  [ 0.          0.84147098  0.90929743  0.14112001 -0.7568025 ]
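
std is the standard deviation: the square root of the mean squared distance from the mean. A minimal sketch that reproduces it by hand (the variable manual is only for illustration):

```python
import numpy

a = numpy.arange(5)

# Standard deviation written out: sqrt(mean((a - mean(a))**2)).
manual = numpy.sqrt(numpy.mean((a - a.mean()) ** 2))
print("by hand: ", manual)
print("a.std(): ", a.std())

# ddof=1 divides by N-1 instead of N (the sample standard deviation).
print("a.std(ddof=1): ", a.std(ddof=1))
```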

Check the numpy help and webpage for more functions

https://docs.scipy.org/doc/numpy/reference/routines.html

Use the axis keyword to apply a function along one axis of the array instead of over all elements.

a = numpy.arange(10).reshape(5,2)
print("a=",a)
print("mean(a)="  ,numpy.mean(a))
print("mean(a,0)=",numpy.mean(a,axis=0))
print("mean(a,1)=",numpy.mean(a,axis=1))
a= [[0 1]
    [2 3]
    [4 5]
    [6 7]
    [8 9]]
mean(a)= 4.5
mean(a,0)= [4. 5.]
mean(a,1)= [0.5 2.5 4.5 6.5 8.5]
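
The axis you name is the one that is collapsed. A short sketch with sum on the same 5 x 2 array: axis=0 combines the rows and leaves one value per column, axis=1 combines the columns and leaves one value per row.

```python
import numpy

a = numpy.arange(10).reshape(5, 2)

col_sums = a.sum(axis=0)   # collapses the 5 rows: one value per column
row_sums = a.sum(axis=1)   # collapses the 2 columns: one value per row

print("sum(a,0)=", col_sums, " shape", col_sums.shape)
print("sum(a,1)=", row_sums, " shape", row_sums.shape)
```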

Use square brackets to access elements in the array

a=numpy.arange(10)
a[5]
5
a[5:10]
array([5, 6, 7, 8, 9])
a[5:] #No second number means "rest of the array"
array([5, 6, 7, 8, 9])
a[:5] #No first number means "from the start of the array"
array([0, 1, 2, 3, 4])
a[5:10:2] #A third number means "every Nth element"
array([5, 7, 9])
a[5:10:-2] #A negative step counts backwards, so going from 5 to 10 selects nothing
array([], dtype=int64)
a[10:5:-2] #With a negative step, start must be larger than stop
array([9, 7])
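
The same bracket syntax extends to negative indices and to arrays with more than one dimension. A minimal sketch (the array b is an illustrative reshape of a):

```python
import numpy

a = numpy.arange(10)

last = a[-1]          # negative indices count from the end
backwards = a[::-1]   # the whole array in reverse order

# For a 2-D array, give one index or slice per dimension, separated by commas.
b = a.reshape(5, 2)
element = b[1, 0]         # row 1, column 0
second_column = b[:, 1]   # every row, column 1

print(last, backwards, element, second_column)
```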

Challenge 1

numpy has an arange function and a linspace function that take similar arguments. Explain the difference. For example, what does the following code do?

print (numpy.arange(1.,9,3))
print (numpy.linspace(1.,9,3))

Solution

  • arange takes the arguments start, stop, step, and generates numbers from start to stop (excluding stop), stepping by step each time.
  • linspace takes the arguments start, stop, number, and generates number evenly spaced values from start to stop (including stop).
print (numpy.arange(1.,9,3))
print (numpy.linspace(1.,9,3))
[1. 4. 7.]
[1. 5. 9.]
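
To see the spacing linspace chose, or to leave out the end point as arange does, linspace has the retstep and endpoint keywords. A short sketch:

```python
import numpy

# retstep=True also returns the spacing between consecutive values.
points, step = numpy.linspace(1., 9, 3, retstep=True)
print(points, "step =", step)

# endpoint=False excludes stop, so the 3 values divide [1, 9) evenly.
open_points = numpy.linspace(1., 9, 3, endpoint=False)
print(open_points)
```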

Challenge 2

Generate a 10 x 3 array of random numbers (using numpy.random.rand). From each row, find the minimum absolute value. Make use of numpy.abs and numpy.min. The result should be a one-dimensional array.

Solution

The important part of the solution is passing the axis keyword to the min function:

a = numpy.random.rand(30).reshape(10,3)
print("a is ", a)
print()
print("min(a) along each row is ", numpy.min( numpy.abs( a ), axis=1)) # axis=1 gives one value per row (10 values)

Use the scipy library for common scientific and numerical methods

Example : integrate y=x^2 from 0 to 10

x = numpy.arange(11)
y = x**2
import scipy.integrate
#trapezoid was named trapz in older SciPy releases; trapz has since been removed
#without x values, trapezoid assumes the samples are spaced 1 apart (dx=1)
print("integral of x^2 from 0 to 10", scipy.integrate.trapezoid(y) )#This value should be 10**3/3 = 333.333
integral of x^2 from 0 to 10 335.0
x = numpy.linspace(0,10,1000) # finer grid
y=x**2
print("integral of x^2 from 0 to 10", scipy.integrate.trapezoid(y) )#Wrong: the spacing is now 10/999, not 1
integral of x^2 from 0 to 10 33300.01668335002
print("integral of x^2 from 0 to 10", scipy.integrate.trapezoid(y,x) )#Passing x gives a value close to 10**3/3 = 333.333
integral of x^2 from 0 to 10 333.333500333834
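
To see where these numbers come from, here is a sketch of the trapezoid rule written with numpy only. The function trapezoid_rule is a hypothetical helper for illustration, not part of scipy: it sums the area of one trapezoid per pair of neighbouring samples.

```python
import numpy

def trapezoid_rule(y, x):
    """Hypothetical helper: trapezoid rule for samples y taken at points x."""
    y = numpy.asarray(y, dtype=float)
    x = numpy.asarray(x, dtype=float)
    widths = numpy.diff(x)               # distance between neighbouring points
    heights = 0.5 * (y[1:] + y[:-1])     # average of the two neighbouring samples
    return numpy.sum(widths * heights)

x_coarse = numpy.arange(11)
coarse = trapezoid_rule(x_coarse**2, x_coarse)

x_fine = numpy.linspace(0, 10, 1000)
fine = trapezoid_rule(x_fine**2, x_fine)

print("coarse grid:", coarse)
print("fine grid:  ", fine)
```

The widths are what the x argument supplies. Leaving x out is the same as setting every width to 1, which is why the fine grid without x gave a result too large by a factor of 999/10.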

We’ll come back to scipy.optimize later.

Key Points

  • Use the numpy library to get basic statistics out of tabular data.

  • Print numpy arrays.

  • Use mean, sum, std to get summary statistics.

  • Add numpy arrays together.

  • Study the scipy website.

  • Use scipy to integrate tabular data.