Reading Tabular Data into Arrays

Overview

Teaching: 20 min
Exercises: 0 min
Questions
  • How can I read tabular data?

  • How can I save tabular data?

Objectives
  • Import the numpy library.

  • Use numpy to load a simple CSV data set.

  • Get some basic information about a numpy array.

Use the NumPy package to load data with the loadtxt function

import numpy
data = numpy.loadtxt('data/galileo_flat.empty')
print(data)
[[1500. 1000.]
 [1340.  828.]
 [1328.  800.]
 [1172.  600.]
 [ 800.  300.]]
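
To try the call without the lesson's data file, the sketch below feeds loadtxt an in-memory text buffer instead of a path. The buffer contents are a hypothetical stand-in for a whitespace-delimited file and are not taken from the lesson's data directory.

```python
import io
import numpy

# Hypothetical stand-in for a whitespace-delimited data file.
text = io.StringIO("1500 1000\n1340 828\n1328 800\n1172 600\n800 300\n")

# With no delimiter argument, loadtxt splits each line on whitespace
# and converts every value to a float.
data = numpy.loadtxt(text)

print(data.dtype)
print(data[0])
```

Every value comes back as a float by default, which is why the printed arrays in this lesson show trailing decimal points.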

Read a comma-separated file of data with headers

data = numpy.loadtxt('data/galileo_flat.csv', comments="#", skiprows=2, delimiter=',')
print(data)
[[1500. 1000.]
 [1340.  828.]
 [1328.  800.]
 [1172.  600.]
 [ 800.  300.]]
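
As a rough illustration of what skiprows and delimiter do, the sketch below parses a small in-memory CSV. The two leading lines are a hypothetical layout (one comment line, one line of column names); the real galileo_flat.csv may be laid out differently.

```python
import io
import numpy

# Hypothetical file contents: a comment line, a line of column names, then data.
csv_text = (
    "# hypothetical comment line\n"
    "D,H\n"
    "1500,1000\n"
    "1340,828\n"
)

# skiprows=2 discards the first two lines (the comment and the column names),
# and delimiter=',' tells loadtxt to split the remaining lines on commas.
data = numpy.loadtxt(io.StringIO(csv_text), comments="#", skiprows=2, delimiter=",")

print(data.shape)
```

Note that skiprows counts comment lines too, so the value has to match the total number of leading lines that are not data.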

Remember that your data has the shape (rows, columns)

print("data shape is ",data.shape)
data shape is  (5, 2)
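
Beyond shape, a few other attributes give basic information about an array. This is a minimal sketch that builds the array by hand, using the values printed above, so that it runs without the data file.

```python
import numpy

# The same values as the loaded data, typed in directly.
data = numpy.array([
    [1500.0, 1000.0],
    [1340.0, 828.0],
    [1328.0, 800.0],
    [1172.0, 600.0],
    [800.0, 300.0],
])

# shape is a tuple, so it can be unpacked into rows and columns.
n_rows, n_cols = data.shape

print("rows:", n_rows, "columns:", n_cols)
print("number of axes:", data.ndim)
print("total elements:", data.size)
print("element type:", data.dtype)
```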

Split the data into variables using unpack

D,H = numpy.loadtxt('data/galileo_flat.csv', comments="#", skiprows=2, delimiter=',',unpack=True)
print(D,H)
print("D shape is ",D.shape)
print("H shape is ",H.shape)
[1500. 1340. 1328. 1172.  800.] [1000.  828.  800.  600.  300.]
D shape is  (5,)
H shape is  (5,)

The same result can be obtained by loading the full array and unpacking its transpose:

data = numpy.loadtxt('data/galileo_flat.csv', comments="#", skiprows=2, delimiter=',')
D,H = data.T
print(D,H)
print("D shape is ",D.shape)
print("H shape is ",H.shape)
[1500. 1340. 1328. 1172.  800.] [1000.  828.  800.  600.  300.]
D shape is  (5,)
H shape is  (5,)
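
The two approaches can be checked against each other, and against plain column slicing. The sketch below uses a hypothetical in-memory CSV with no header lines, so skiprows is not needed.

```python
import io
import numpy

# Hypothetical three-row CSV without header lines.
csv_text = "1500,1000\n1340,828\n1328,800\n"

# unpack=True returns one array per column.
D, H = numpy.loadtxt(io.StringIO(csv_text), delimiter=",", unpack=True)

# Without unpack, the columns can be taken by slicing or from the transpose.
data = numpy.loadtxt(io.StringIO(csv_text), delimiter=",")
same_first_column = numpy.array_equal(D, data[:, 0])
same_second_column = numpy.array_equal(H, data.T[1])

print(same_first_column, same_second_column)
```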

Save data with numpy.savetxt

The savetxt function writes an array to a text file, and its arguments mirror those of loadtxt. The lines shown after each call are the contents of the file it writes.

numpy.savetxt("data/mydata.txt", data, delimiter=',')
1.500000000000000000e+03,1.000000000000000000e+03
1.340000000000000000e+03,8.280000000000000000e+02
1.328000000000000000e+03,8.000000000000000000e+02
1.172000000000000000e+03,6.000000000000000000e+02
8.000000000000000000e+02,3.000000000000000000e+02
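
One way to see how savetxt and loadtxt mirror each other is a round trip: write an array, read it back, and compare. The sketch below writes to a temporary directory rather than data/, and the file name is only illustrative.

```python
import os
import tempfile
import numpy

data = numpy.array([[1500.0, 1000.0], [1340.0, 828.0]])

with tempfile.TemporaryDirectory() as tmpdir:
    # Illustrative file name inside a throwaway directory.
    path = os.path.join(tmpdir, "mydata.txt")

    # Use the same delimiter for writing and for reading.
    numpy.savetxt(path, data, delimiter=",")
    reloaded = numpy.loadtxt(path, delimiter=",")

round_trip_ok = numpy.array_equal(data, reloaded)
print(round_trip_ok)
```

The long default format is what makes the round trip exact: it writes enough digits to recover each float without loss.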

Control the data format with the fmt keyword

numpy.savetxt("data/mydata2.txt", data, delimiter=',', fmt='%.6g')
1500,1000
1340,828
1328,800
1172,600
800,300
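
The fmt strings follow Python's printf-style formatting, so a shorter format can also round values. The sketch below uses hypothetical values chosen to make the rounding visible, and it also passes a list of formats, one per column.

```python
import os
import tempfile
import numpy

# Hypothetical values chosen to make rounding visible.
values = numpy.array([[1500.0, 0.123456789]])

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "formatted.txt")

    # A single format string is applied to every column.
    numpy.savetxt(path, values, delimiter=",", fmt="%.6g")
    with open(path) as handle:
        six_digits = handle.read().strip()

    # A list of format strings gives each column its own format.
    numpy.savetxt(path, values, delimiter=",", fmt=["%d", "%.3f"])
    with open(path) as handle:
        per_column = handle.read().strip()

print(six_digits)
print(per_column)
```

A compact format keeps files readable, but digits dropped at save time cannot be recovered when the file is loaded again.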

Add a header string with header

header="Distance (D), Height (H)"
newdata = numpy.vstack([D,H]).T
numpy.savetxt("data/mydata3.txt", newdata, delimiter=', ', header=header, fmt='%.6g')
# Distance (D), Height (H)
1500, 1000
1340, 828
1328, 800
1172, 600
800, 300
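
Because savetxt prefixes the header with "# ", and loadtxt treats "#" as the start of a comment by default, a file written this way can be read back without skiprows. The sketch below checks this with a temporary file; the header text and file name are hypothetical.

```python
import os
import tempfile
import numpy

table = numpy.array([[1500.0, 1000.0], [1340.0, 828.0]])

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "with_header.txt")

    # The header string is written on the first line, prefixed with "# ".
    numpy.savetxt(path, table, delimiter=",", header="distance,height", fmt="%.6g")

    with open(path) as handle:
        first_line = handle.readline().strip()

    # The commented header line is ignored on load.
    reloaded = numpy.loadtxt(path, delimiter=",")

print(first_line)
print(reloaded.shape)
```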

Pass a structured dtype to loadtxt to access columns by name

data = numpy.loadtxt('data/galileo_flat.csv', comments="#", skiprows=2, delimiter=',',
                    dtype={'names':("Distance","Height"), 'formats':('f4','f4')})
print("data shape is ", data.shape)
print("Distance data is ", data["Distance"])
data shape is  (5,)
Distance data is  [1500. 1340. 1328. 1172.  800.]
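
With a structured dtype the result is a one-dimensional array of records, which is why the shape has a single entry. The sketch below, using a hypothetical two-row in-memory CSV, shows how to list the field names and pull out a single record or a whole column.

```python
import io
import numpy

# Hypothetical two-row CSV without header lines.
csv_text = "1500,1000\n1340,828\n"

# Two named fields, each stored as a 4-byte float.
record_type = {"names": ("Distance", "Height"), "formats": ("f4", "f4")}
data = numpy.loadtxt(io.StringIO(csv_text), delimiter=",", dtype=record_type)

print(data.shape)          # one axis: one entry per row of the file
print(data.dtype.names)    # the field names
print(data[0])             # a single record
print(data["Height"])      # a whole column, selected by name
```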

Key Points

  • Use the numpy.loadtxt function to load tabular data.

  • Use the numpy.savetxt function to save tabular data.

  • Use the delimiter argument to set the character that separates columns in your text file.

  • Use comments in your file to describe the contents.