NumPy Tutorial for Beginners#
0. Introduction#
NumPy (Numerical Python) is a Python library used for numerical computations and working with arrays. It provides a high-performance multidimensional array object, and tools for working with these arrays.
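As a small illustration of what the array object offers, the sketch below doubles a handful of numbers, first with a plain Python list and then with a single NumPy expression. The variable names are made up for this example.

```python
import numpy as np

values = [1, 2, 3, 4, 5]

# Plain Python: loop over the elements explicitly
doubled_list = [v * 2 for v in values]

# NumPy: the multiplication is applied to every element at once
doubled_array = np.array(values) * 2

print(doubled_list)
print(doubled_array)
```

Both approaches give the same numbers; the NumPy version avoids the explicit loop.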
1. Installation and Import#
Install NumPy#
Install using pip:
pip install numpy
Install using conda:
conda install numpy
Install a specific version:
pip install numpy==1.19.5
conda install numpy==1.19.5
Importing NumPy#
import numpy as np
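One simple way to confirm that the installation worked is to print the installed version right after the import. The variable name installed_version is only illustrative.

```python
import numpy as np

# The version string tells you which NumPy release is active in this environment
installed_version = np.__version__
print(installed_version)
```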
2. Create arrays and their properties#
Create arrays using NumPy and inspect their properties, such as dimension, shape, data type, item size, and total size.
Initializing NumPy Arrays#
You can create a NumPy array using the np.array() function:
a = np.array([1, 2, 3, 4, 5],dtype=np.int32)
print(a)
[1 2 3 4 5]
b = np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
print(b[0,2])
3.0
More initializing methods#
# All 1s matrix
print(np.ones((4, 2, 2), dtype='int32'))
[[[1 1]
[1 1]]
[[1 1]
[1 1]]
[[1 1]
[1 1]]
[[1 1]
[1 1]]]
# All 0s matrix
print(np.zeros((2, 3, 3)))
[[[0. 0. 0.]
[0. 0. 0.]
[0. 0. 0.]]
[[0. 0. 0.]
[0. 0. 0.]
[0. 0. 0.]]]
# Any other number
print(np.full((2, 2), 99))
[[99 99]
[99 99]]
# Any other number (full_like) - copy shape of another array
print(np.full_like(a, 4))
[4 4 4 4 4]
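The same functions can also produce an array of NaNs, and there are `_like` variants for zeros and ones. The sketch below is a minimal example; the array named template is hypothetical.

```python
import numpy as np

# An array filled with NaN; the result has a float dtype because NaN is a float
nan_array = np.full((2, 3), np.nan)
print(nan_array)

# zeros_like / ones_like copy the shape and dtype of an existing array
template = np.array([[1, 2, 3], [4, 5, 6]], dtype=np.int32)
zeros = np.zeros_like(template)
ones = np.ones_like(template)
print(zeros.shape, zeros.dtype)
print(ones)
```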
Random Numbers#
We can also generate random numbers.
# Random decimal numbers
print(np.random.rand(4, 2))
[[0.27214113 0.42470955]
[0.15039463 0.05081564]
[0.41403664 0.83663161]
[0.47059714 0.20739323]]
# Random integer values: randint(low, high, size); low is inclusive, high is exclusive, and a single bound means 0 up to that bound
print(np.random.randint(11, size=(3, 3)))
print(np.random.randint(-4, 8, size=(3, 3)))
[[6 1 7]
[0 0 6]
[9 5 6]]
[[ 0 -4 3]
[-2 6 4]
[ 5 5 0]]
# Random numbers that follow a normal distribution
print(np.random.randn(100))
[ 0.81908006 0.69837319 1.0273331 0.60686532 -0.32244164 -0.58673988
0.06515702 0.35431672 -0.96770695 -0.34362351 -0.53119826 0.32151517
-0.23192672 -1.33286953 -0.09312971 -0.24885004 -0.18813984 -0.22393621
0.02650239 0.04099948 1.09604832 -0.0033549 0.06026991 -0.39557011
0.25973791 -0.68697636 1.11028273 1.1003993 -0.85466969 -0.60369915
-0.05662239 0.08455747 0.12121574 -0.15021884 1.59662858 -1.43942235
0.00241576 -0.51597818 -0.17797304 -2.39455806 -0.11790954 0.11579165
-0.75742716 1.65736719 -0.67701451 0.33496297 -0.57323842 0.44953001
-0.54623695 -0.10131939 0.08132299 0.41048231 0.67084456 0.44777595
0.441986 0.84863681 0.22713655 1.00297205 0.09047113 0.3898093
0.03502136 -1.10581106 0.51049216 1.06537323 -1.68371678 1.22707633
0.47282751 -1.17712218 1.77879117 0.59708234 -0.0627379 -1.38844659
-1.21025254 -0.51293237 0.28667206 0.83059634 -0.8711549 0.15470931
-0.88663017 0.13056562 -0.65468049 0.4256023 0.97835658 0.95166436
0.12898577 0.99132973 1.80608807 -0.97147976 -0.50030263 0.12205409
-0.92876715 -1.97748152 -2.06540439 0.97965414 -0.16802195 -0.35764946
-0.11523502 -0.29496799 0.66311985 0.17284526]
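The random values above change on every run. If you need reproducible results, one option is NumPy's Generator interface with a fixed seed. The sketch below assumes an arbitrary seed of 42 and mirrors the three kinds of random numbers shown above.

```python
import numpy as np

# Two generators created with the same seed produce the same numbers
rng_a = np.random.default_rng(seed=42)
rng_b = np.random.default_rng(seed=42)

uniform_a = rng_a.random((4, 2))   # uniform floats in [0, 1)
uniform_b = rng_b.random((4, 2))
print(np.array_equal(uniform_a, uniform_b))

integers = rng_a.integers(-4, 8, size=(3, 3))   # low inclusive, high exclusive
normal = rng_a.standard_normal(100)             # mean 0, standard deviation 1
print(integers)
print(normal.shape)
```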
Other#
# The identity matrix
print(np.identity(5))
[[1. 0. 0. 0. 0.]
[0. 1. 0. 0. 0.]
[0. 0. 1. 0. 0.]
[0. 0. 0. 1. 0.]
[0. 0. 0. 0. 1.]]
# linspace (start, end, number of points)
print(np.linspace(1, 5, 10))
[1. 1.44444444 1.88888889 2.33333333 2.77777778 3.22222222
3.66666667 4.11111111 4.55555556 5. ]
# An array with a range of numbers - start, end (exclusive), step
print(np.arange(1, 5))
print(np.arange(1, 5, 0.5))
[1 2 3 4]
[1. 1.5 2. 2.5 3. 3.5 4. 4.5]
# Repeat an array
arr = np.array([[1, 2, 3]])
r1 = np.repeat(arr, 3, axis=0)
print(r1)
[[1 2 3]
[1 2 3]
[1 2 3]]
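The axis argument decides the direction of the repetition. As a rough comparison, the sketch below repeats along axis=1 and contrasts it with np.tile, which repeats the array as a whole block.

```python
import numpy as np

arr = np.array([[1, 2, 3]])

# axis=1 repeats each element in place along the columns
repeated_columns = np.repeat(arr, 2, axis=1)
print(repeated_columns)

# np.tile repeats the whole array as a block instead
tiled = np.tile(arr, (1, 2))
print(tiled)
```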
# Challenge
output = np.ones((5, 5))
print(output)
z = np.zeros((3, 3))
z[1, 1] = 9
print(z)
output[1:4, 1:4] = z
print(output)
[[1. 1. 1. 1. 1.]
[1. 1. 1. 1. 1.]
[1. 1. 1. 1. 1.]
[1. 1. 1. 1. 1.]
[1. 1. 1. 1. 1.]]
[[0. 0. 0.]
[0. 9. 0.]
[0. 0. 0.]]
[[1. 1. 1. 1. 1.]
[1. 0. 0. 0. 1.]
[1. 0. 9. 0. 1.]
[1. 0. 0. 0. 1.]
[1. 1. 1. 1. 1.]]
Be careful when copying arrays!!!#
a = np.array([1, 2, 3])
b = a  # b refers to the same array as a; no data is copied
b[0] = 100
print(a)
[100 2 3]
# Use copy() to avoid this
a = np.array([1, 2, 3])
b = a.copy()
b[0] = 100
print(a)
[1 2 3]
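The same caution applies to slices: a basic slice is a view onto the original data rather than a copy. The sketch below uses np.shares_memory to check which arrays share data; the variable names are illustrative.

```python
import numpy as np

original = np.array([1, 2, 3, 4, 5])

alias = original              # same object, no new data
view = original[1:4]          # basic slice: a view onto the same data
independent = original.copy() # separate data

view[0] = 100                 # also changes original[1]
independent[0] = -1           # leaves original untouched

print(original)
print(alias is original)
print(np.shares_memory(original, view))
print(np.shares_memory(original, independent))
```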
b. Get dimension and shape of the arrays#
# Get Dimension
print(a.ndim)
print(b.ndim)
1
1
# Get shape
print(a.shape)
print(b.shape)
(3,)
(3,)
c. Get the data type, item size, and total size of the arrays.#
# Get Type
print(a.dtype,b.dtype)
int64 int64
# Get item size (bytes per element)
print(a.itemsize, b.itemsize)
8 8
# Get total size
print(a.size * a.itemsize, b.size * b.itemsize)
print(a.nbytes, b.nbytes)
24 24
24 24
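The default integer type shown above can differ between platforms, so it can help to set the dtype explicitly. The sketch below, with made-up array names, shows how the dtype changes the item size and the total number of bytes.

```python
import numpy as np

small = np.array([1, 2, 3], dtype=np.int32)
large = small.astype(np.float64)   # astype returns a converted copy

# Print the main properties of each array side by side
for name, arr in (("small", small), ("large", large)):
    print(name, arr.ndim, arr.shape, arr.dtype, arr.itemsize, arr.size, arr.nbytes)
```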
Hands on practice (5 mins)#
Please install NumPy and practice the following basics:
Install and import NumPy.
Create three different kinds of NumPy arrays, e.g. with specific numbers, with random values, and with a different data type.
Get the shape, dimension, data type, item size, and total size of the arrays.
2. Accessing/Changing specific elements, rows, columns, etc.#
a = np.array([[1, 2, 3, 4, 5, 6, 7], [8, 9, 0, 1, 2, 3, 4]])
print(a)
[[1 2 3 4 5 6 7]
[8 9 0 1 2 3 4]]
# Get a specific element [r, c]
print(a[1, 5])
3
# Get a specific row
print(a[0, :])
[1 2 3 4 5 6 7]
# Get a specific column
print(a[:, 2])
[3 0]
# Get a little more fancy [startindex:endindex:stepsize]
print(a[0, 1:-2:2])
[2 4]
# Change values
a[1, 5] = 20
print(a)
a[:, 2] = [1, 2]
print(a)
[[ 1 2 3 4 5 6 7]
[ 8 9 0 1 2 20 4]]
[[ 1 2 1 4 5 6 7]
[ 8 9 2 1 2 20 4]]
# 3D example
b = np.array([[[1, 2], [3, 4]], [[5, 6], [7, 8]]])
print(b)
[[[1 2]
[3 4]]
[[5 6]
[7 8]]]
# Get specific element (work outside in)
print(b[0, 1, 1])
print(b[:, 1, :])
4
[[3 4]
[7 8]]
# replace
b[:, 1, :] = [[9, 9], [8, 8]]
print(b)
[[[1 2]
[9 9]]
[[5 6]
[8 8]]]
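One detail worth checking when indexing is the shape of the result: an integer index drops that axis, while a slice keeps it. A minimal sketch, using a hypothetical array named grid:

```python
import numpy as np

grid = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])

row_1d = grid[0, :]        # integer index drops the axis -> shape (4,)
row_2d = grid[0:1, :]      # slice keeps the axis -> shape (1, 4)
last_column = grid[:, -1]  # negative index counts from the end

print(row_1d.shape, row_2d.shape)
print(last_column)
```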
Practice time (5 mins)#
Please practice the following:
Create different arrays such as all zeros, all nans, fill with a specific number, or random numbers.
Copy an array in different ways and check their differences.
3. Array Operation#
NumPy arrays support all standard arithmetic operations, applied element by element.
# Mathematics
a = np.array([1, 2, 3, 4])
print(a)
# Addition
print(a + 2)
# Subtraction
print(a - 2)
# Multiplication
print(a * 2)
# Division
print(a / 2)
# Power
print(a ** 2)
# Take the sin
print(np.sin(a))
# Take the cos
print(np.cos(a))
[1 2 3 4]
[3 4 5 6]
[-1 0 1 2]
[2 4 6 8]
[0.5 1. 1.5 2. ]
[ 1 4 9 16]
[ 0.84147098 0.90929743 0.14112001 -0.7568025 ]
[ 0.54030231 -0.41614684 -0.9899925 -0.65364362]
# Operation on 2 arrays
b = np.array([1, 2, 1, 2])
# Addition
print(a + b)
# Subtraction
print(a - b)
# Multiplication
print(a * b)
# Division
print(a / b)
[2 4 4 6]
[0 0 2 2]
[1 4 3 8]
[1. 1. 3. 2.]
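The arrays do not always need identical shapes. NumPy can stretch a smaller array across a larger one when the shapes are compatible, which is known as broadcasting. The sketch below is a small example with made-up values.

```python
import numpy as np

matrix = np.array([[1, 2, 3], [4, 5, 6]])   # shape (2, 3)
row = np.array([10, 20, 30])                # shape (3,)
column = np.array([[100], [200]])           # shape (2, 1)

added_row = matrix + row        # row is added to every row of matrix
added_column = matrix + column  # column is added to every column of matrix

print(added_row)
print(added_column)
```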
# Linear Algebra
a = np.ones((2, 3))
print(a)
b = np.full((3, 2), 2)
print(b)
# Matrix Multiplication
print(np.matmul(a, b))
# Find the determinant
c = np.identity(3)
print(np.linalg.det(c))
[[1. 1. 1.]
[1. 1. 1.]]
[[2 2]
[2 2]
[2 2]]
[[6. 6.]
[6. 6.]]
1.0
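Matrix multiplication is easy to confuse with the element-wise `*` shown earlier. The sketch below contrasts the two on small, made-up matrices, using the `@` operator as shorthand for np.matmul.

```python
import numpy as np

left = np.array([[1, 2], [3, 4]])
right = np.array([[0, 1], [1, 0]])

matrix_product = left @ right        # same result as np.matmul(left, right)
elementwise_product = left * right   # multiplies matching entries

print(matrix_product)
print(elementwise_product)

# The determinant is returned as a float and may carry a tiny rounding error
determinant = np.linalg.det(left)
print(determinant)
```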
# Statistics
stats = np.array([[1, 2, 3], [4, 5, 6]])
print(stats)
# Min
print(np.min(stats))
# Max
print(np.max(stats))
# Sum
print(np.sum(stats))
# Axis 0 = columns
print(np.sum(stats, axis=0))
# Axis 1 = rows
print(np.sum(stats, axis=1))
[[1 2 3]
[4 5 6]]
1
6
21
[5 7 9]
[ 6 15]
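Other reductions accept the axis argument in the same way. The sketch below applies np.mean and np.max to the same stats array and uses keepdims=True to keep the result two-dimensional.

```python
import numpy as np

stats = np.array([[1, 2, 3], [4, 5, 6]])

overall_mean = np.mean(stats)
column_means = np.mean(stats, axis=0)   # one value per column
row_max = np.max(stats, axis=1)         # one value per row
row_sums_2d = np.sum(stats, axis=1, keepdims=True)  # shape (2, 1)

print(overall_mean)
print(column_means)
print(row_max)
print(row_sums_2d)
```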
b. Reorganizing Arrays (reshape, vstack, hstack)#
# Reorganizing Arrays
before = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])
print(before)
after = before.reshape((4, 2))
print(after)
# Vertically stacking vectors
v1 = np.array([1, 2, 3, 4])
v2 = np.array([5, 6, 7, 8])
print(np.vstack([v1, v2, v1, v2]))
# Horizontal stack
h1 = np.ones((2, 4))
h2 = np.zeros((2, 2))
print(np.hstack((h1, h2)))
[[1 2 3 4]
[5 6 7 8]]
[[1 2]
[3 4]
[5 6]
[7 8]]
[[1 2 3 4]
[5 6 7 8]
[1 2 3 4]
[5 6 7 8]]
[[1. 1. 1. 1. 0. 0.]
[1. 1. 1. 1. 0. 0.]]
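For 2-D inputs, vstack and hstack behave like np.concatenate along axis 0 and axis 1. The sketch below, with hypothetical arrays, shows the correspondence.

```python
import numpy as np

top = np.array([[1, 2], [3, 4]])
bottom = np.array([[5, 6]])
side = np.array([[9], [9]])

stacked_rows = np.concatenate([top, bottom], axis=0)  # like vstack -> shape (3, 2)
stacked_cols = np.concatenate([top, side], axis=1)    # like hstack -> shape (2, 3)

print(stacked_rows)
print(stacked_cols)
```

The shapes must match along every axis except the one being joined.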
Indexing#
Besides single indices, you can select elements of a NumPy array with boolean masks or lists of indices:
a = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])
# Indexing with a list of booleans (or a boolean mask)
print(a[[True, False, True, False, True, False, True, False, True]])
print(a[a > 5])
[1 3 5 7 9]
[6 7 8 9]
# You can index with a list in NumPy
print(a[[1, 2, 8]])
[2 3 9]
# Any and All
print(a > 5)
print(np.any(a > 5))
print(np.all(a > 5))
[False False False False False True True True True]
True
False
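Conditions can be combined with `&` (and), `|` (or) and `~` (not). The sketch below reuses the same array and also shows np.where and assignment through a mask; the variable names are illustrative.

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])

in_range = (a > 2) & (a < 7)   # element-wise AND; the parentheses are required
outside = ~in_range            # element-wise NOT
print(a[in_range])
print(a[outside])

# np.where returns the positions where the condition holds
positions = np.where(in_range)[0]
print(positions)

# Assigning through a mask changes only the selected elements
clipped = a.copy()
clipped[clipped > 5] = 5
print(clipped)
```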
# Challenge
filedata = np.genfromtxt('data.txt', delimiter=',')
filedata = filedata.astype('int32')
print(filedata)
# Select specific rows with a list of row indices
print(filedata[[2, 4, 6, 8, 10, 12, 14, 16, 18, 20]])
print(np.any(filedata > 50, axis=0))
print(np.all(filedata > 50, axis=0))
print((~((filedata > 50) & (filedata < 100))))
FileNotFoundError: data.txt not found.
The cell above fails with this error unless a comma-separated file named data.txt is present in the working directory.
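Because the challenge depends on a data.txt file that is not shown here, the sketch below reads the same kind of comma-separated data from an in-memory string instead. The values in sample_text are made up for illustration, and the row indices are shortened to fit the three sample rows.

```python
import io
import numpy as np

# Hypothetical stand-in for data.txt: three rows of comma-separated integers
sample_text = "1,13,21,11,196\n3,75,12,33,766\n1,22,33,11,999\n"

filedata = np.genfromtxt(io.StringIO(sample_text), delimiter=',')
filedata = filedata.astype('int32')
print(filedata)

selected_rows = filedata[[0, 2]]                       # select rows 0 and 2
any_above = np.any(filedata > 50, axis=0)              # per column: is any value > 50?
all_above = np.all(filedata > 50, axis=0)              # per column: are all values > 50?
not_in_range = ~((filedata > 50) & (filedata < 100))   # True where value is NOT in (50, 100)

print(selected_rows)
print(any_above)
print(all_above)
print(not_in_range)
```

With a real file on disk, passing its path to np.genfromtxt in place of the StringIO object should work the same way.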
Slicing#
You can slice NumPy arrays in much the same way as Python lists:
arr = np.array([1, 2, 3, 4, 5])
print(arr[1:4]) # prints [2 3 4]
[2 3 4]
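The start, end and step of a slice are all optional, and negative values count from the end. A few more variations on the same array, as a quick sketch:

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5])

first_three = arr[:3]     # [1 2 3]
every_other = arr[::2]    # [1 3 5]
last_two = arr[-2:]       # [4 5]
reversed_arr = arr[::-1]  # [5 4 3 2 1]

print(first_three, every_other, last_two, reversed_arr)
```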
Shape and Reshape#
You can get the shape of an array with the shape attribute and change it with the reshape method:
arr = np.array([[1, 2, 3], [4, 5, 6]])
print(arr.shape) # prints (2, 3)
reshaped_arr = arr.reshape(3, 2)
print(reshaped_arr)
(2, 3)
[[1 2]
[3 4]
[5 6]]
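The new shape must hold exactly the same number of elements as the old one. The sketch below shows -1 as a placeholder that lets NumPy infer one dimension, and what happens when the sizes do not match.

```python
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6]])

auto_cols = arr.reshape(3, -1)   # -1 lets NumPy infer the remaining dimension -> (3, 2)
flat = arr.reshape(-1)           # 1-D array with all 6 elements

print(auto_cols.shape)
print(flat)

# Asking for 8 slots when there are only 6 elements raises a ValueError
try:
    arr.reshape(4, 2)
    message = None
except ValueError as error:
    message = str(error)
    print("reshape failed:", message)
```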
Practice Time#
Time: 5 mins
Read the example dataset.
Generate another random array with the same size.
Compare their statistics such as mean, min, max, etc.
Perform operations on these two arrays.
Further Reading#
NumPy is one of the fundamental packages for numerical work in Python, and there are many free online resources for exploring it further.
For example:
Check the official NumPy documentation, which provides very helpful descriptions and examples.
Numpy Tutorial (2022): For Physicists, Engineers, and Mathematicians