NumPy Master Hub + Cheatsheets (v2.x)
The definitive, bookmarkable NumPy resource: arrays, dtypes, broadcasting, indexing, linear algebra, random, performance, interop, and troubleshooting.
Quickstart: Arrays, Shapes, Basics
import numpy as np
# Create arrays
arr = np.array([1, 2, 3], dtype=np.int64)
Z = np.zeros((2,3), dtype=np.float64)
O = np.ones((3,3))
I = np.eye(3) # identity
R = np.arange(0, 10, 2) # 0..8 step 2
L = np.linspace(0, 1, 5) # 0., 0.25, 0.5, 0.75, 1.
# Shapes & dtypes
arr.shape, arr.ndim, arr.dtype
arr.astype(np.float32)
# Elementwise ops (vectorized)
x = np.array([1, 2, 3])
y = np.array([4, 5, 6])
x + y, x * y, x / y, x**2
# Aggregations
M = np.arange(12).reshape(3,4)
M.sum(), M.mean(axis=0), M.max(axis=1)
# Boolean masks
mask = M % 2 == 0
evens = M[mask]
Prefer NumPy arrays for numeric workloads. Use dtype explicitly in performance‑sensitive code to control memory and speed.
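A quick way to see the memory impact of an explicit dtype (a minimal sketch; the array size is arbitrary):
# dtype determines bytes per element, and therefore total memory
a64 = np.arange(1_000_000, dtype=np.float64)
a32 = a64.astype(np.float32)
a64.nbytes, a32.nbytes # 8_000_000 vs 4_000_000 bytes
a64.dtype.itemsize, a32.dtype.itemsize # 8 vs 4 bytes per element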
Indexing, Slicing, Reshaping, Broadcasting
Indexing & Slicing
M = np.arange(12).reshape(3,4)
M[1, 2] # single element
M[:, 0] # first column
M[1, :] # second row
M[::2, ::-1] # stride rows, reverse cols
M[[0, 2], [1, 3]] # advanced indexing (pairs): elements (0,1) and (2,3)
M[np.ix_([0, 2], [1, 3])] # outer indexing: 2x2 block of rows {0,2} and cols {1,3}
Reshape & Views
a = np.arange(6) # shape (6,)
a2 = a.reshape(2,3) # may be a view (no copy)
f = a2.flatten() # copy
r = a2.ravel() # view when possible
b = a2.T # transposed view
Views share memory; copies don’t. Modifying a view affects the original. Use .copy() when isolation is needed.
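A minimal demonstration (values are arbitrary):
a = np.arange(6)
v = a[:3] # slice -> view
c = a[:3].copy() # explicit copy
v[0] = 99 # also changes a[0]
c[1] = 77 # a is untouched
np.shares_memory(a, v), np.shares_memory(a, c) # True, False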
Broadcasting Guide
# Shapes align from right to left:
# (3, 1) and (1, 4) -> (3, 4)
A = np.arange(3).reshape(3,1) # [[0],[1],[2]]
B = np.arange(4).reshape(1,4) # [[0 1 2 3]]
C = A + B # shape (3,4)
Stacking & Concatenation
x = np.array([1, 2, 3]); y = np.array([4, 5, 6])
np.concatenate([x,y]) # [1 2 3 4 5 6]
np.stack([x,y], axis=0) # [[1 2 3],[4 5 6]]
np.vstack([x,y]) # 2x3
np.hstack([x,y]) # shape (6,), same as concatenate here
Dtype Cheatsheet
| Use case | dtype | Notes |
|---|---|---|
| Precise floats (most cases) | float64 | Default precision in scientific computing |
| Memory‑saving for large arrays | float32/int32 | Halves memory; check precision impact |
| Binary masks | bool | Use vectorized boolean ops |
| Categorical labels | int8/int16 | Compact when the label space is small |
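Before committing to a downcast, it helps to check both the memory saved and the rounding error introduced (a sketch on made‑up data):
x = np.random.default_rng(0).normal(size=100_000) # float64 by default
x32 = x.astype(np.float32)
x.nbytes, x32.nbytes # 800_000 vs 400_000 bytes
np.abs(x32.astype(np.float64) - x).max() # worst-case rounding error (order 1e-7 here)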
Linear Algebra: Copy‑Paste Recipes
M = np.array([[3.,2.],[1.,4.]])
v = np.array([1.,2.])
M @ v # matrix-vector
M @ M # matrix-matrix
np.linalg.det(M) # determinant
np.linalg.inv(M) # inverse (avoid in numeric pipelines if possible)
w, V = np.linalg.eig(M) # eigenvalues/vectors
U, S, VT = np.linalg.svd(M) # singular value decomposition
x = np.linalg.solve(M, v) # solve Mx=v (prefer over inv(M)@v)
Prefer np.linalg.solve and decompositions to explicit inversion; they are more numerically stable and often faster.
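For the small system above both routes agree, but the residual check is the habit worth keeping (a sketch reusing M and v from the recipe):
x_solve = np.linalg.solve(M, v)
x_inv = np.linalg.inv(M) @ v # works here, but less stable in general
np.allclose(x_solve, x_inv) # True for this well-conditioned M
np.allclose(M @ x_solve, v) # residual check: True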
Random & Sampling (Generator API)
# Modern API: create a Generator; avoid global legacy RandomState
rng = np.random.default_rng(seed=42)
rng.integers(0, 10, size=(2,3))
rng.normal(loc=0, scale=1, size=1000)
rng.choice([1, 2, 3], size=5, replace=True, p=[0.7, 0.2, 0.1])
rng.shuffle(arr) # in-place shuffle
Keep a dedicated Generator per component for reproducibility and parallel safety.
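One way to give each component its own independent stream is to spawn child seeds from a single root (a sketch; the number of components is arbitrary):
root = np.random.SeedSequence(42)
children = root.spawn(3) # one child seed per component/worker
rngs = [np.random.default_rng(s) for s in children]
[r.normal(size=2) for r in rngs] # independent, reproducible streams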
Fast I/O: CSV, NPZ, Memory‑Mapped
# Binary formats: fastest for NumPy
np.save("array.npy", arr) # single array
np.savez_compressed("bundle.npz", a=A, b=B) # multiple arrays
# Text/CSV (small/medium)
np.savetxt("data.csv", A, delimiter=",", fmt="%.6f")
# Load
A2 = np.load("array.npy")
bundle = np.load("bundle.npz")
A3 = bundle["a"]
# Memory‑mapped (huge arrays, low RAM)
mm = np.memmap("big.dat", dtype=np.float32, mode="r", shape=(10000,10000))
mean = mm.mean() # OS streams pages; no full load
For tabular workflows, consider interop with Pandas for CSV/Parquet, then convert to NumPy via .to_numpy() when needed.
Interoperability: Pandas, SciPy, Plotly
Pandas
import pandas as pd
df = pd.read_csv("data.csv")
A = df[["x","y","z"]].to_numpy(dtype=np.float64, copy=False)
df["magnitude"] = np.sqrt((A**2).sum(axis=1))
SciPy
import scipy.signal as sig
x = np.linspace(0,1,500)
y = np.sin(2*np.pi*10*x)
ys = sig.savgol_filter(y, window_length=31, polyorder=3)
Plotly
import plotly.express as px
import pandas as pd
df = pd.DataFrame({"x":np.arange(100), "y":np.sin(np.arange(100)/10)})
fig = px.line(df, x="x", y="y", title="Sine")
fig.write_html("chart.html")
Performance: Vectorization, Strides, Memory
Vectorize and Batch
# Slow (Python loop)
out = np.empty_like(x)
for i in range(len(x)):
    out[i] = x[i]*x[i] + y[i]
# Fast (vectorized)
out = x*x + y
Broadcast Instead of Tile
# Avoid materializing large intermediate arrays via tile/repeat.
A = np.random.rand(1000,1)
B = np.random.rand(1,1000)
C = A + B # broadcasted (no giant copies)
Memory & Views
# Use views when possible
B = A.reshape(-1, 1) # likely view
C = A[::2] # strided view (every other element)
# Force copy when isolating
C_iso = np.array(C, copy=True)
Choose Dtypes Wisely
# Halve memory with float32 (check accuracy)
A32 = A.astype(np.float32, copy=False)
Profile with %timeit (IPython), cProfile, or scalene. For CPU‑heavy kernels, consider numba or numexpr.
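Outside IPython, the standard-library timeit module gives a quick micro-benchmark (a sketch; sizes are arbitrary):
import timeit
x = np.random.default_rng(0).random(1_000_000)
timeit.timeit(lambda: sum(v*v for v in x), number=3) # Python-level loop
timeit.timeit(lambda: (x*x).sum(), number=3) # vectorized ufunc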
Troubleshooting & Gotchas
Shape Mismatch
Symptom: ValueError: operands could not be broadcast together
# Inspect shapes before ops
print(A.shape, B.shape)
# Expand dims
B2 = np.expand_dims(B, axis=0) # or axis=1
# Or use np.newaxis: B[np.newaxis, :]
Unintended In‑Place Mutation
# writing through a view modifies the source
B = A.T
B[0, 0] = 99 # A[0, 0] changes too
# Fix: explicit copy
B = A.T.copy()
Precision Loss
# Casting float64 -> float32 lowers precision
A32 = A.astype(np.float32)
# Fix: stay in float64 for sensitive calculations; cast late at I/O boundaries
Slow Python Loops
# Replace loops with vectorized ufuncs or broadcasting
# If custom kernel needed:
import numba as nb
@nb.njit
def kernel(x, y, out):
    for i in range(x.size):
        out[i] = x[i]*x[i] + y[i]
Be cautious with chained advanced indexing: the first fancy index usually returns a copy, so mutations made through the chain may not reach the original array.
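A short illustration of the difference:
M = np.arange(12).reshape(3, 4)
M[M > 5][0] = -1 # chained: M[M > 5] is a copy, so M is unchanged
(M == -1).any() # False
M[M > 5] = -1 # single-step assignment writes into M
(M == -1).any() # True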
Frequently Asked Questions
When should I use float32 vs float64?
Use float64 for numeric stability in analytics and modeling; float32 when memory is tight and accuracy requirements are modest (e.g., large images, embeddings, or when pushing GPU memory limits).
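A quick way to see the precision gap (illustrative values):
np.finfo(np.float32).eps, np.finfo(np.float64).eps # ~1.19e-07 vs ~2.22e-16
np.float32(1.0) + np.float32(1e-8) == np.float32(1.0) # True: the increment is lost in float32
1.0 + 1e-8 == 1.0 # False in float64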
How do I safely normalize large arrays?
# Avoid nan/inf with epsilon
eps = 1e-12
norm = np.sqrt((A*A).sum(axis=1, keepdims=True)) + eps
A_norm = A / norm
Best way to compute pairwise distances?
# For moderate sizes:
from numpy.linalg import norm
D = norm(A[:,None,:] - B[None,:,:], axis=2)
# For larger tasks, consider sklearn.metrics.pairwise or faiss.
How do I ensure reproducible randomness?
rng = np.random.default_rng(0) # single seed per component/module
If this page helped, consider linking to it from your “Resources,” “Data Science 101,” or “Numerical Computing” sections. Sharing supports more free content like this.
© 2025 Pythoneo