NumPy Tutorial for Beginners

0. Introduction

NumPy (Numerical Python) is a Python library for numerical computing. At its core is a high-performance multidimensional array object (ndarray), together with functions for creating, manipulating, and computing on these arrays.

1. Installation and Import

Install NumPy

Install using pip:

pip install numpy

Install using conda:

conda install numpy

Install a specific version:

pip install numpy==1.19.5

conda install numpy==1.19.5

Importing NumPy

import numpy as np
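To confirm that the installation worked, you can print the installed version. This is a minimal check. The exact version string depends on your environment.

```python
import numpy as np

# np.__version__ holds the installed NumPy version as a string, e.g. "1.26.4"
print(np.__version__)
```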

2. Create arrays and their properties

Create arrays with NumPy and inspect their properties: number of dimensions, shape, data type, item size, and total size.

a. Initializing NumPy arrays

You can create a NumPy array from a Python list with the np.array() function:

a = np.array([1, 2, 3, 4, 5], dtype=np.int32)
print(a)
[1 2 3 4 5]
b = np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
# Element in row 0, column 2
print(b[0, 2])
3.0
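If you leave out dtype=, np.array infers a type that can hold every element. The sketch below (variable names are illustrative) shows that a single float upcasts the whole array, and that an explicit dtype overrides the inference.

```python
import numpy as np

# Without dtype=, NumPy infers a type that can hold every element
ints = np.array([1, 2, 3])
mixed = np.array([1, 2.5, 3])   # one float upcasts the whole array to float
print(ints.dtype)    # platform default integer, e.g. int64
print(mixed.dtype)   # float64
print(mixed)         # [1.  2.5 3. ]

# An explicit dtype overrides the inference
as_float32 = np.array([1, 2, 3], dtype=np.float32)
print(as_float32.dtype)  # float32
```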

More initialization methods

# Array of all 1s with shape (4, 2, 2)
print(np.ones((4, 2, 2), dtype='int32'))
[[[1 1]
  [1 1]]

 [[1 1]
  [1 1]]

 [[1 1]
  [1 1]]

 [[1 1]
  [1 1]]]
# Array of all 0s (float64 by default)
print(np.zeros((2, 3, 3)))
[[[0. 0. 0.]
  [0. 0. 0.]
  [0. 0. 0.]]

 [[0. 0. 0.]
  [0. 0. 0.]
  [0. 0. 0.]]]
# Array filled with any other value
print(np.full((2, 2), 99))
[[99 99]
 [99 99]]
# full_like: same shape and dtype as another array, filled with a given value
print(np.full_like(a, 4))
[4 4 4 4 4]
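NumPy also provides zeros_like and ones_like, which work like full_like. For an array of NaNs (used in a later practice exercise), you can pass np.nan to np.full. NaN is a floating-point value, so the array needs a float dtype, and np.full picks float64 for it automatically. This is a short sketch that redefines a so it can run on its own.

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5], dtype=np.int32)

# zeros_like / ones_like copy the shape and dtype of an existing array
print(np.zeros_like(a))   # [0 0 0 0 0]
print(np.ones_like(a))    # [1 1 1 1 1]

# An array of NaNs: np.full infers float64 from the fill value np.nan
nans = np.full((2, 3), np.nan)
print(nans.dtype)            # float64
print(np.isnan(nans).all())  # True
```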

Random Numbers

We can also generate random numbers. The generator in these examples is not seeded, so your output will differ from the output shown.

# Random decimal numbers
print(np.random.rand(4, 2))
[[0.27214113 0.42470955]
 [0.15039463 0.05081564]
 [0.41403664 0.83663161]
 [0.47059714 0.20739323]]
# Random integers: randint(low, high, size); high is excluded, and a single bound means [0, high)
print(np.random.randint(11, size=(3, 3)))
print(np.random.randint(-4, 8, size=(3, 3)))
[[6 1 7]
 [0 0 6]
 [9 5 6]]
[[ 0 -4  3]
 [-2  6  4]
 [ 5  5  0]]
# Random numbers from the standard normal distribution (mean 0, standard deviation 1)
print(np.random.randn(100))
[ 0.81908006  0.69837319  1.0273331   0.60686532 -0.32244164 -0.58673988
  0.06515702  0.35431672 -0.96770695 -0.34362351 -0.53119826  0.32151517
 -0.23192672 -1.33286953 -0.09312971 -0.24885004 -0.18813984 -0.22393621
  0.02650239  0.04099948  1.09604832 -0.0033549   0.06026991 -0.39557011
  0.25973791 -0.68697636  1.11028273  1.1003993  -0.85466969 -0.60369915
 -0.05662239  0.08455747  0.12121574 -0.15021884  1.59662858 -1.43942235
  0.00241576 -0.51597818 -0.17797304 -2.39455806 -0.11790954  0.11579165
 -0.75742716  1.65736719 -0.67701451  0.33496297 -0.57323842  0.44953001
 -0.54623695 -0.10131939  0.08132299  0.41048231  0.67084456  0.44777595
  0.441986    0.84863681  0.22713655  1.00297205  0.09047113  0.3898093
  0.03502136 -1.10581106  0.51049216  1.06537323 -1.68371678  1.22707633
  0.47282751 -1.17712218  1.77879117  0.59708234 -0.0627379  -1.38844659
 -1.21025254 -0.51293237  0.28667206  0.83059634 -0.8711549   0.15470931
 -0.88663017  0.13056562 -0.65468049  0.4256023   0.97835658  0.95166436
  0.12898577  0.99132973  1.80608807 -0.97147976 -0.50030263  0.12205409
 -0.92876715 -1.97748152 -2.06540439  0.97965414 -0.16802195 -0.35764946
 -0.11523502 -0.29496799  0.66311985  0.17284526]
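The np.random.rand / randint / randn functions above still work. For new code, the NumPy documentation recommends the Generator interface created with np.random.default_rng. Passing a seed makes the results reproducible. The seed value 42 below is arbitrary.

```python
import numpy as np

# A seeded Generator produces the same numbers on every run
rng = np.random.default_rng(seed=42)

uniform = rng.random((4, 2))               # floats in [0, 1), like np.random.rand(4, 2)
ints = rng.integers(-4, 8, size=(3, 3))    # integers in [-4, 8); high is excluded
normal = rng.standard_normal(100)          # standard normal, like np.random.randn(100)

# Re-creating the generator with the same seed reproduces the sequence
rng_again = np.random.default_rng(seed=42)
print(np.array_equal(uniform, rng_again.random((4, 2))))  # True
```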

Other array-creation functions

# The identity matrix
print(np.identity(5))
[[1. 0. 0. 0. 0.]
 [0. 1. 0. 0. 0.]
 [0. 0. 1. 0. 0.]
 [0. 0. 0. 1. 0.]
 [0. 0. 0. 0. 1.]]
# linspace(start, stop, num): num evenly spaced points, stop included
print(np.linspace(1, 5, 10))
[1.         1.44444444 1.88888889 2.33333333 2.77777778 3.22222222
 3.66666667 4.11111111 4.55555556 5.        ]
# arange(start, stop, step): a range of numbers, stop excluded
print(np.arange(1, 5))
print(np.arange(1, 5, 0.5))
[1 2 3 4]
[1.  1.5 2.  2.5 3.  3.5 4.  4.5]
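The main practical difference between linspace and arange is whether the stop value is included. The sketch below compares the two. The endpoint parameter is part of np.linspace, and the variable names are illustrative.

```python
import numpy as np

# linspace includes the stop value by default; endpoint=False excludes it
closed = np.linspace(0, 1, 5)
half_open = np.linspace(0, 1, 5, endpoint=False)
print(closed)      # [0.   0.25 0.5  0.75 1.  ]
print(half_open)   # [0.  0.2 0.4 0.6 0.8]

# arange never includes the stop value
evens = np.arange(0, 10, 2)
print(evens)       # [0 2 4 6 8]
```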
# Repeat an array
arr = np.array([[1, 2, 3]])
r1 = np.repeat(arr, 3, axis=0)
print(r1)
[[1 2 3]
 [1 2 3]
 [1 2 3]]
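np.repeat duplicates each element, while np.tile duplicates the whole array. The short comparison below uses a 1D array v (an illustrative name) to show the difference.

```python
import numpy as np

v = np.array([1, 2, 3])

# repeat duplicates each element in place
print(np.repeat(v, 2))   # [1 1 2 2 3 3]

# tile duplicates the whole array end to end
print(np.tile(v, 2))     # [1 2 3 1 2 3]

# tile with a tuple repeats along several axes: 2 copies down, 1 across
print(np.tile(v, (2, 1)))
# [[1 2 3]
#  [1 2 3]]
```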
# Challenge
output = np.ones((5, 5))
print(output)

z = np.zeros((3, 3))
z[1, 1] = 9
print(z)

output[1:4, 1:4] = z
print(output)
[[1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1.]]
[[0. 0. 0.]
 [0. 9. 0.]
 [0. 0. 0.]]
[[1. 1. 1. 1. 1.]
 [1. 0. 0. 0. 1.]
 [1. 0. 9. 0. 1.]
 [1. 0. 0. 0. 1.]
 [1. 1. 1. 1. 1.]]
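The challenge can also be solved without slice assignment. This is an alternative sketch, not the approach used above: np.pad adds a border of constant values around z. It should produce the same 5x5 array.

```python
import numpy as np

z = np.zeros((3, 3))
z[1, 1] = 9

# Pad the 3x3 block with a 1-element border of 1s on every side
framed = np.pad(z, pad_width=1, mode='constant', constant_values=1)
print(framed)
```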

Be careful when copying arrays!

# Plain assignment does not copy: b refers to the same array as a
a = np.array([1, 2, 3])
b = a
b[0] = 100
print(a)
[100   2   3]
# Use copy() to avoid this
a = np.array([1, 2, 3])
b = a.copy()
b[0] = 100
print(a)
[1 2 3]
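The same issue appears with slices. Basic slicing returns a view that shares memory with the original array, so writing through the slice changes the original. The sketch below uses np.shares_memory to check this. The array contents are illustrative.

```python
import numpy as np

a = np.array([0, 1, 2, 3, 4])

# Basic slicing returns a view that shares memory with the original
s = a[1:3]
s[0] = 99
print(a)                       # [ 0 99  2  3  4]
print(np.shares_memory(a, s))  # True

# copy() gives an independent array
c = a[1:3].copy()
c[0] = -1
print(a)                       # [ 0 99  2  3  4]  (unchanged)
print(np.shares_memory(a, c))  # False
```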

b. Get dimension and shape of the arrays

# Get number of dimensions
print(a.ndim)
print(b.ndim)
1
1
# Get shape
print(a.shape)
print(b.shape)
(3,)
(3,)
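For arrays with more than one axis, ndim counts the axes and shape gives the length of each one. The sketch below uses a 3D array m (an illustrative name) and also shows that len() reports only the first axis.

```python
import numpy as np

m = np.zeros((2, 3, 4))

print(m.ndim)    # 3  (number of axes)
print(m.shape)   # (2, 3, 4)  (length of each axis)
print(len(m))    # 2  (len() returns the length of the first axis only)
print(m.size)    # 24 (total number of elements, 2 * 3 * 4)
```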

c. Get the data type, item size, and total size of the arrays

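# Get data type (the default integer type is platform-dependent; int64 on most 64-bit Linux and macOS systems)
print(a.dtype, b.dtype)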
int64 int64
# Get size of one element in bytes
print(a.itemsize, b.itemsize)
8 8
# Get total size in bytes (size * itemsize, also available as nbytes)
print(a.size * a.itemsize, b.size * b.itemsize)
print(a.nbytes, b.nbytes)
24 24
24 24
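The item size follows directly from the dtype. You can convert an array to a smaller type with astype, which returns a converted copy. The sketch below shows how that changes itemsize and nbytes. Be careful with narrow types: int8 can only hold values from -128 to 127.

```python
import numpy as np

a = np.array([1, 2, 3], dtype=np.int64)
small = a.astype(np.int8)      # astype returns a converted copy

print(a.itemsize, a.nbytes)          # 8 24
print(small.itemsize, small.nbytes)  # 1 3
```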

Hands-on practice (5 mins)

Please install NumPy and practice the following basics:

  1. Install and import NumPy.

  2. Create three different kinds of NumPy arrays, e.g. with specific values, with random values, and with a different data type.

  3. Get the shape, number of dimensions, data type, item size, and total size of the arrays.

3. Accessing/changing specific elements, rows, columns, etc.

a = np.array([[1, 2, 3, 4, 5, 6, 7], [8, 9, 0, 1, 2, 3, 4]])
print(a)
[[1 2 3 4 5 6 7]
 [8 9 0 1 2 3 4]]
# Get a specific element [r, c]
print(a[1, 5])
3
# Get a specific row
print(a[0, :])
[1 2 3 4 5 6 7]
# Get a specific column
print(a[:, 2])
[3 0]
# Slicing with a step: [start:stop:step] (stop is excluded)
print(a[0, 1:-2:2])
[2 4]
# Change values
a[1, 5] = 20
print(a)

a[:, 2] = [1, 2]
print(a)
[[ 1  2  3  4  5  6  7]
 [ 8  9  0  1  2 20  4]]
[[ 1  2  1  4  5  6  7]
 [ 8  9  2  1  2 20  4]]
# 3D example
b = np.array([[[1, 2], [3, 4]], [[5, 6], [7, 8]]])
print(b)
[[[1 2]
  [3 4]]

 [[5 6]
  [7 8]]]
# Get specific element (work outside in)
print(b[0, 1, 1])
print(b[:, 1, :])
4
[[3 4]
 [7 8]]
# Replace a whole slice at once
b[:, 1, :] = [[9, 9], [8, 8]]
print(b)
[[[1 2]
  [9 9]]

 [[5 6]
  [8 8]]]
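For arrays with many axes, the ellipsis (...) stands for "all remaining axes", so b[..., 0] means the same as b[:, :, 0] here. The sketch below rebuilds b with the same starting values as the 3D example using np.arange and reshape.

```python
import numpy as np

# Same starting values as the 3D example: [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]
b = np.arange(1, 9).reshape(2, 2, 2)

# ... expands to as many ":" as needed, so b[..., 0] is b[:, :, 0]
print(b[..., 0])
# [[1 3]
#  [5 7]]
```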

Practice time (5 mins)

Please practice the following:

  1. Create different arrays, such as all zeros, all NaNs, filled with a specific number, or random numbers.

  2. Copy an array in different ways and check how the copies differ.

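4. Array operations

NumPy arrays support the standard arithmetic operators, which act element-wise. NumPy also provides universal functions such as np.sin and np.cos that apply element by element: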

# Mathematics
a = np.array([1, 2, 3, 4])
print(a)

# Addition
print(a + 2)

# Subtraction
print(a - 2)

# Multiplication
print(a * 2)

# Division
print(a / 2)

# Power
print(a ** 2)

# Sine (element-wise, input in radians)
print(np.sin(a))

# Cosine (element-wise, input in radians)
print(np.cos(a))
[1 2 3 4]
[3 4 5 6]
[-1  0  1  2]
[2 4 6 8]
[0.5 1.  1.5 2. ]
[ 1  4  9 16]
[ 0.84147098  0.90929743  0.14112001 -0.7568025 ]
[ 0.54030231 -0.41614684 -0.9899925  -0.65364362]
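Adding a scalar to an array, as in a + 2 above, is a special case of broadcasting. NumPy stretches arrays of compatible shapes so that they match. The sketch below (array names are illustrative) adds a 1D row and a 2D column to a 2x3 array.

```python
import numpy as np

m = np.array([[1, 2, 3],
              [4, 5, 6]])
row = np.array([10, 20, 30])

# Shapes (2, 3) and (3,) are compatible: row is stretched across both rows
print(m + row)
# [[11 22 33]
#  [14 25 36]]

# A column of shape (2, 1) is stretched across the three columns
col = np.array([[100], [200]])
print(m + col)
# [[101 102 103]
#  [204 205 206]]
```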
# Operation on 2 arrays
b = np.array([1, 2, 1, 2])

# Addition
print(a + b)

# Subtraction
print(a - b)

# Multiplication
print(a * b)

# Division
print(a / b)
[2 4 4 6]
[0 0 2 2]
[1 4 3 8]
[1. 1. 3. 2.]
# Linear Algebra
a = np.ones((2, 3))
print(a)

b = np.full((3, 2), 2)
print(b)

# Matrix Multiplication
print(np.matmul(a, b))

# Find the determinant
c = np.identity(3)
print(np.linalg.det(c))
[[1. 1. 1.]
 [1. 1. 1.]]
[[2 2]
 [2 2]
 [2 2]]
[[6. 6.]
 [6. 6.]]
1.0
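The @ operator is shorthand for np.matmul, while * always multiplies element-wise. To solve a linear system A x = b, np.linalg.solve is generally preferred over computing the inverse explicitly. The 2x2 system below is made up for illustration, and its solution is x = [2, 3].

```python
import numpy as np

A = np.array([[3.0, 1.0],
              [1.0, 2.0]])
rhs = np.array([9.0, 8.0])

# @ is matrix multiplication; A * A would multiply element-wise instead
print(A @ A)

# Solve A x = rhs directly instead of forming inv(A)
x = np.linalg.solve(A, rhs)
print(x)                        # [2. 3.]
print(np.allclose(A @ x, rhs))  # True
```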
# Statistics
stats = np.array([[1, 2, 3], [4, 5, 6]])
print(stats)

# Min
print(np.min(stats))

# Max
print(np.max(stats))

# Sum
print(np.sum(stats))

# axis=0: collapse the rows, giving one sum per column
print(np.sum(stats, axis=0))

# axis=1: collapse the columns, giving one sum per row
print(np.sum(stats, axis=1))
[[1 2 3]
 [4 5 6]]
1
6
21
[5 7 9]
[ 6 15]
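Other statistics follow the same pattern and accept the same axis argument. The sketch below covers the mean, the standard deviation (np.std computes the population value by default, i.e. ddof=0), and argmax, which returns an index rather than a value.

```python
import numpy as np

stats = np.array([[1, 2, 3], [4, 5, 6]])

print(np.mean(stats))            # 3.5
print(np.mean(stats, axis=0))    # [2.5 3.5 4.5]  (one mean per column)
print(np.std(stats))             # population standard deviation (ddof=0)
print(np.argmax(stats))          # 5  (index into the flattened array)
print(np.argmax(stats, axis=1))  # [2 2]  (column index of each row's maximum)
```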

b. Reorganizing Arrays (reshape, vstack, hstack)

# Reorganizing Arrays
before = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])
print(before)

after = before.reshape((4, 2))
print(after)

# Vertically stacking vectors
v1 = np.array([1, 2, 3, 4])
v2 = np.array([5, 6, 7, 8])

print(np.vstack([v1, v2, v1, v2]))

# Horizontal stack
h1 = np.ones((2, 4))
h2 = np.zeros((2, 2))

print(np.hstack((h1, h2)))
[[1 2 3 4]
 [5 6 7 8]]
[[1 2]
 [3 4]
 [5 6]
 [7 8]]
[[1 2 3 4]
 [5 6 7 8]
 [1 2 3 4]
 [5 6 7 8]]
[[1. 1. 1. 1. 0. 0.]
 [1. 1. 1. 1. 0. 0.]]
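A few related helpers are shown below. reshape accepts -1 for one dimension and infers it from the total size. ravel() flattens an array to 1D, .T transposes it, and np.concatenate joins arrays along an existing axis, the same kind of joining that vstack and hstack perform.

```python
import numpy as np

before = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])

# -1 lets NumPy infer one dimension from the total size (8 / 2 = 4 rows)
print(before.reshape(-1, 2).shape)   # (4, 2)

# ravel() flattens to 1D; .T swaps the two axes
print(before.ravel())                # [1 2 3 4 5 6 7 8]
print(before.T.shape)                # (4, 2)

# concatenate joins arrays along an existing axis (axis 0 by default)
v1 = np.array([1, 2, 3, 4])
v2 = np.array([5, 6, 7, 8])
print(np.concatenate([v1, v2]))      # [1 2 3 4 5 6 7 8]
```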

Indexing

Besides integer indices and slices, you can index a NumPy array with a boolean mask or with a list of integer indices:

a = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])
# Indexing with a list of booleans (or a boolean mask)
print(a[[True, False, True, False, True, False, True, False, True]])
print(a[a > 5])
[1 3 5 7 9]
[6 7 8 9]
# You can index with a list in NumPy
print(a[[1, 2, 8]])
[2 3 9]
# Any and All
print(a > 5)
print(np.any(a > 5))
print(np.all(a > 5))
[False False False False False  True  True  True  True]
True
False
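Conditions can be combined with & (and), | (or), and ~ (not). Each comparison needs its own parentheses because these operators bind more tightly than > and <. np.where either selects values element-wise or, when given only a condition, returns the indices where the condition is true.

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])

# Combine conditions with &, |, ~; parentheses around each comparison are required
mask = (a > 2) & (a < 7)
print(a[mask])                 # [3 4 5 6]

# np.where(condition, x, y) takes x where True and y where False
print(np.where(a > 5, a, 0))   # [0 0 0 0 0 6 7 8 9]

# With only a condition, np.where returns the indices where it is True
print(np.where(a > 5)[0])      # [5 6 7 8]
```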
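# Challenge: load a comma-separated numeric file (data.txt must be in the working directory)
filedata = np.genfromtxt('data.txt', delimiter=',')
filedata = filedata.astype('int32')
print(filedata)

# Select rows by passing a list of row indices
print(filedata[[2, 4, 6, 8, 10, 12, 14, 16, 18, 20]])
# Does any value / do all values in each column exceed 50?
print(np.any(filedata > 50, axis=0))
print(np.all(filedata > 50, axis=0))
# True where a value is NOT strictly between 50 and 100
print(~((filedata > 50) & (filedata < 100)))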
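The file data.txt is not included with this tutorial, so the challenge fails with FileNotFoundError unless you provide the file yourself. np.genfromtxt also accepts file-like objects, so the sketch below runs the same steps on a small hypothetical dataset held in an io.StringIO. The numbers are made up for illustration.

```python
import io
import numpy as np

# Hypothetical stand-in for data.txt: three rows of comma-separated numbers
text = io.StringIO("1,60,120\n70,80,90\n55,10,200\n")
filedata = np.genfromtxt(text, delimiter=',').astype('int32')
print(filedata)

print(filedata[[0, 2]])                 # rows 0 and 2
print(np.any(filedata > 50, axis=0))    # [ True  True  True]
print(np.all(filedata > 50, axis=0))    # [False False  True]
print(~((filedata > 50) & (filedata < 100)))
```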

Slicing

You can slice NumPy arrays with the same start:stop:step syntax as Python lists:

arr = np.array([1, 2, 3, 4, 5])
print(arr[1:4])  # prints [2 3 4]
[2 3 4]
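The step and negative indices work as they do for lists. A negative step reverses the order, and negative start or stop values count from the end. One key difference from lists is that a NumPy slice is a view of the original data, not a copy.

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5])

print(arr[::-1])   # [5 4 3 2 1]  (negative step reverses)
print(arr[::2])    # [1 3 5]      (every second element)
print(arr[-2:])    # [4 5]        (last two elements)
```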

Shape and Reshape

You can get the shape of an array from its shape attribute and change the shape with the reshape method:

arr = np.array([[1, 2, 3], [4, 5, 6]])
print(arr.shape)  # prints (2, 3)

reshaped_arr = arr.reshape(3, 2)
print(reshaped_arr)
(2, 3)
[[1 2]
 [3 4]
 [5 6]]
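reshape only works when the new shape has the same total number of elements as the old one. The 2x3 array has 6 elements, so shapes such as (1, 6) or (6,) work, while (4, 2) asks for 8 elements and raises a ValueError.

```python
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6]])

# reshape must keep the total number of elements (here 6)
print(arr.reshape(1, 6))   # [[1 2 3 4 5 6]]
print(arr.reshape(6))      # [1 2 3 4 5 6]

try:
    arr.reshape(4, 2)      # 8 elements requested, so this fails
except ValueError as err:
    print("ValueError:", err)
```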

Practice Time

Time: 5 mins

  1. Read the example dataset.

  2. Generate another random array with the same size.

  3. Compare their statistics such as mean, min, max, etc.

  4. Perform operations on these two arrays.

Further Reading

NumPy is a fundamental package for numerical work in Python, and there are many free online resources for exploring it further.

For example:

  1. The official NumPy documentation, which provides detailed descriptions and examples.

  2. Numpy Tutorial (2022): For Physicists, Engineers, and Mathematicians

  3. Numpy for Machine Learning

  4. NumPy Explained - Full Course (3 Hrs)