# Numpy and Scipy

## Overview

Teaching: 40 min
Exercises: 10 min
Questions
• How do I deal with tabular scientific data?

Objectives
• Import the numpy library.

• Understand the NDArray object.

• Get some basic information about numpy and scipy objects and methods.

## Numpy is the main Python library for scientific computation

• Numpy provides a new data type, the array
• arrays are multi-dimensional collections of data of the same intrinsic type (int, float, etc.)
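
As a minimal sketch of the "same intrinsic type" rule, the example below builds arrays from Python lists with numpy.array (a constructor this lesson does not otherwise cover) and inspects their dtype. The variable names are illustrative only.

```python
import numpy as np

# Every element of an array shares one dtype.
ints = np.array([1, 2, 3])
print(ints.dtype)          # an integer type; the exact width depends on the platform

# Mixing ints and floats promotes every element to float.
mixed = np.array([1, 2.5, 3])
print(mixed, mixed.dtype)

# Nested lists give a multi-dimensional array.
grid = np.array([[1, 2, 3], [4, 5, 6]])
print(grid.ndim, grid.shape)
```

Because all elements share one type, numpy can store them compactly and apply operations to the whole array at once.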

## Import numpy before using it

• numpy is not built in, but is often installed by default.
• use import numpy to import the entire package.
• use from numpy import ... to import some functions.
• use import numpy as np to use the most common alias.
import numpy as np
import numpy
from numpy import cos

print(numpy.cos, np.cos, cos)

<ufunc 'cos'> <ufunc 'cos'> <ufunc 'cos'>


## Use numpy.zeros to create arrays filled with zeros

f10 = numpy.zeros(10)
i10 = numpy.zeros(10, dtype=int)
print("default array of zeros: ", f10)
print("integer array of zeros: ", i10)

default array of zeros:  [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
integer array of zeros:  [0 0 0 0 0 0 0 0 0 0]
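
numpy.zeros also accepts a tuple as the shape, which gives a multi-dimensional array. A short sketch, with z2d as an illustrative name:

```python
import numpy

# A tuple as the first argument sets the shape: 2 rows, 3 columns.
z2d = numpy.zeros((2, 3))
print(z2d.shape)
print(z2d)
```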


## Use numpy.ones to create an array of ones.

print("Using numpy.ones    : ", numpy.ones(10))
print("is the same thing as: ", numpy.zeros(10)+1)


Using numpy.ones    :  [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
is the same thing as:  [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]


## Using numpy.arange to generate sets of numbers

• arange takes from one to three arguments. By default arange will generate numbers starting from 0 with a step of 1
• arange(N) generates numbers from 0..N-1
• arange(M,N) generates numbers from M..N-1
• arange(M,N,P) generates numbers from M..N-1 including only every Pth number.
numpy.arange(10)

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

• generate an array of numbers from 1 to 9 (the stop value, 10, is excluded)
numpy.arange(1,10)

array([1, 2, 3, 4, 5, 6, 7, 8, 9])

• generate an array of odd numbers from 1 to 10
numpy.arange(1,10,2)

array([1, 3, 5, 7, 9])

• incorrectly generate an array of odd numbers from 1 to 10, backwards (with a negative step the start must be larger than the stop, so the result is empty)
numpy.arange(1,10,-2)

array([], dtype=int64)

• generate an array of even numbers from 10 to 2, backwards
numpy.arange(10,1,-2)

array([10,  8,  6,  4,  2])


## Numpy arrays have a shape

• Numpy arrays have a shape attribute associated with them
• You can get a reshaped array with the reshape method
a = numpy.arange(10)
print("a's shape is ",a.shape)

b=a.reshape(5,2)
print("b's shape is ",b.shape)

a's shape is  (10,)
b's shape is  (5, 2)
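
The new shape has to hold exactly the same number of elements as the original. The sketch below illustrates this with a 12-element array; passing -1 for one dimension asks numpy to work that dimension out.

```python
import numpy

a = numpy.arange(12)
b = a.reshape(3, 4)
c = a.reshape(-1, 6)   # -1 lets numpy compute this dimension: 12 / 6 = 2
print(b.shape, c.shape)

try:
    a.reshape(5, 2)    # 5 * 2 = 10, but a has 12 elements
except ValueError as err:
    print("reshape failed:", err)
```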


## Numpy arrays can be treated like single numbers in arithmetic

• Arithmetic using numpy arrays is element-by-element
• Matrix operations are possible with functions or methods.
• The size and shape of the arrays should match.
a = numpy.arange(5)
b = numpy.arange(5)
print("a=",a)
print("b=",b)
print("a*b=",a*b)
print("a+b=",a+b)

a= [0 1 2 3 4]
b= [0 1 2 3 4]
a*b= [ 0  1  4  9 16]
a+b= [0 2 4 6 8]

c = numpy.ones((5,2))
d = numpy.ones((5,2)) + 100
c+d

array([[102., 102.],
       [102., 102.],
       [102., 102.],
       [102., 102.],
       [102., 102.]])

e = c.reshape(2,5)
c+e #c and e have different shapes

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-46-0e32881b9afe> in <module>()
1 e = c.reshape(2,5)
----> 2 c+e #c and e have different shapes

ValueError: operands could not be broadcast together with shapes (5,2) (2,5)
---------------------------------------------------------------------------
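
One possible way around a shape mismatch like this, and an example of a matrix operation done with a function, is sketched below. It uses fresh arrays of ones rather than the exact c and e from above.

```python
import numpy

c = numpy.ones((5, 2))
e = numpy.ones((2, 5))

# Transposing e gives it shape (5, 2), so element-wise addition works.
total = c + e.T
print(total.shape)

# The matrix product follows linear-algebra rules instead:
# (5, 2) times (2, 5) gives (5, 5).
product = numpy.dot(c, e)   # the @ operator, c @ e, is equivalent here
print(product.shape)
print(product[0, 0])        # each entry sums two products of 1 * 1
```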


## The Numpy library has many functions that work on arrays

• Aggregation functions like sum, mean, std
a=numpy.arange(5)
print("a = ", a)

a =  [0 1 2 3 4]

• Add all of the elements of the array together.
print("sum(a) = ", a.sum())

sum(a) =  10

• Calculate the average value of the elements in the array.
print("mean(a) = ", a.mean())

mean(a) =  2.0

• Calculate the standard deviation (std) of the array.
print("std(a) = ", a.std()) #standard deviation: the spread of the values around the mean

std(a) =  1.4142135623730951
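
To see what std computes, here is a sketch that rebuilds the standard deviation from its definition: the square root of the mean squared deviation from the mean. The names deviations and manual_std are illustrative.

```python
import numpy

a = numpy.arange(5)
deviations = a - a.mean()                       # distance of each value from the mean
manual_std = numpy.sqrt((deviations ** 2).mean())
print(manual_std, a.std())                      # the two values agree

# ddof=1 divides by N - 1 instead of N (the sample standard deviation).
print(a.std(ddof=1))
```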

• Calculate the sin of each element in the array
print("np.sin(a) = ", np.sin(a))

np.sin(a) =  [ 0.          0.84147098  0.90929743  0.14112001 -0.7568025 ]


## Check the numpy help and webpage for more functions

https://docs.scipy.org/doc/numpy/reference/routines.html

## Use the axis keyword to apply a function along one dimension of the data.

• Many functions take the axis keyword to perform the aggregation along that dimension
a = numpy.arange(10).reshape(5,2)
print("a=",a)
print("mean(a)="  ,numpy.mean(a))
print("mean(a,0)=",numpy.mean(a,axis=0))
print("mean(a,1)=",numpy.mean(a,axis=1))

a= [[0 1]
 [2 3]
 [4 5]
 [6 7]
 [8 9]]
mean(a)= 4.5
mean(a,0)= [4. 5.]
mean(a,1)= [0.5 2.5 4.5 6.5 8.5]
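
One way to remember which axis is which: the axis you name is the one that disappears from the result's shape. A small sketch using sum on the same 5 x 2 array:

```python
import numpy

a = numpy.arange(10).reshape(5, 2)

col_sums = a.sum(axis=0)   # collapses the 5 rows, result has shape (2,)
row_sums = a.sum(axis=1)   # collapses the 2 columns, result has shape (5,)
print(col_sums)
print(row_sums)
```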


## Use square brackets to access elements in the array

• A single integer in square brackets returns one element
• Ranges of data can be accessed with slices
a=numpy.arange(10)

• Access the element at index 5 (the sixth element, because indexing starts at 0)
a[5]

5

• Access the elements at indices 5 through 9 (the stop index, 10, is excluded)
a[5:10]

array([5, 6, 7, 8, 9])

• Access elements from 5 to the end of the array
a[5:] #No second number means "rest of the array"

array([5, 6, 7, 8, 9])

• Access the first five elements (indices 0 through 4).
a[:5] #No first number means "from the start of the array"

array([0, 1, 2, 3, 4])

• Access every 2nd element from the 5th to the 10th
a[5:10:2] #A third number means "every Nth element"

array([5, 7, 9])

• Access every -2nd element from the 5th to the 10th. (incorrect)
a[5:10:-2] #negative numbers mean "count backwards"

array([], dtype=int64)

• Access every -2nd element from the 10th to the 5th. (correct)
a[10:5:-2] #but you need to start and stop in the same order

array([9, 7])
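
The same bracket syntax extends to multi-dimensional arrays, with one index or slice per dimension separated by commas. A brief sketch on a 5 x 2 array:

```python
import numpy

a = numpy.arange(10).reshape(5, 2)

print(a[1, 0])     # row 1, column 0
print(a[:, 1])     # every row, column 1
print(a[1:3, :])   # rows 1 and 2, all columns
print(a[-1])       # negative indices count from the end: the last row
```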


## Challenge 1

Numpy has an arange function and a linspace function that take similar arguments. Explain the difference. For example, what does the following code do?

print (numpy.arange(1.,9,3))
print (numpy.linspace(1.,9,3))


## Solution

• arange takes the arguments start, stop, step, and generates numbers from start to stop (excluding stop) stepping by step each time.
• linspace takes the arguments start, stop, number, and generates number evenly spaced values from start to stop (including stop).
print (numpy.arange(1.,9,3))
print (numpy.linspace(1.,9,3))

[1. 4. 7.]
[1. 5. 9.]
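
As a follow-up sketch, linspace can report the step it chose through the retstep keyword, and endpoint=False makes it exclude the stop value the way arange does. The variable names are illustrative.

```python
import numpy

# retstep=True returns the spacing alongside the points.
points, step = numpy.linspace(1., 9, 3, retstep=True)
print(points, step)

# endpoint=False leaves out the stop value, as arange does.
half_open = numpy.linspace(1., 9, 4, endpoint=False)
print(half_open)
```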


## Challenge 2

Generate a 10 x 3 array of random numbers (using numpy.random.rand). From each row, find the minimum absolute value. Make use of numpy.abs and numpy.min. The result should be a one-dimensional array.

## Solution

The important part of the solution is passing the axis keyword to the min function. Each row runs along axis 1, so axis=1 gives one minimum per row (10 values in total):

a = numpy.random.rand(10,3)
print("a is ", a)
print()
print("min(a) along each row is ", numpy.min( numpy.abs( a ), axis=1))


## Use the scipy library for common scientific and numerical methods

• scipy contains functions to generate random numbers, calculate Fourier transforms, integrate numerically, and more
• Check the scipy website for more help: https://docs.scipy.org/doc/scipy/reference/

## Example : integrate y=x^2 from 0 to 10

x = numpy.arange(11)
y = x**2
import scipy.integrate
#by default, trapz assumes the sample points are spaced 1 apart (dx=1)
#note: newer SciPy releases name this function scipy.integrate.trapezoid
print("integral of x^2 from 0 to 10", scipy.integrate.trapz(y) )#This value should be 10**3/3 = 333.333

integral of x^2 from 0 to 10 335.0

• Numerical integration can be imprecise with a coarse grid. A finer grid should help, but this attempt is incorrect: trapz still assumes a spacing of 1 between points.
x = numpy.linspace(0,10,1000) # finer grid
y=x**2
print("integral of x^2 from 0 to 10", scipy.integrate.trapz(y) )#This value should be 10**3/3 = 333.333

integral of x^2 from 0 to 10 33300.01668335002

• Passing the x values to trapz allows it to integrate correctly
print("integral of x^2 from 0 to 10", scipy.integrate.trapz(y,x) )#This value should be 10**3/3 = 333.333

integral of x^2 from 0 to 10 333.333500333834
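
To make clear why the x values matter, here is a sketch of the trapezoidal rule written out with numpy alone: each pair of neighbouring points forms a trapezoid whose area is its width times its average height. The names widths, heights and manual are illustrative, and this shows the idea rather than scipy's actual implementation.

```python
import numpy

x = numpy.linspace(0, 10, 1000)
y = x ** 2

widths = numpy.diff(x)              # spacing between neighbouring x values
heights = (y[:-1] + y[1:]) / 2      # average of the two y values of each trapezoid
manual = (widths * heights).sum()
print("manual trapezoid rule:", manual)
```

Leaving out the x values amounts to setting every width to 1, which is why the earlier result was too large by a factor of about 100.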


We’ll come back to scipy.optimize later.

## Key Points

• Use the numpy library to get basic statistics out of tabular data.

• Print numpy arrays.

• Use mean, sum, std to get summary statistics.