NumPy Master Hub + Cheatsheets (v2.x)
The definitive, bookmarkable NumPy resource: arrays, dtypes, broadcasting, indexing, linear algebra, random, performance, interop, and troubleshooting.
Quickstart: Arrays, Shapes, Basics
import numpy as np
# Create arrays
arr = np.array([1, 2, 3], dtype=np.int64)
Z = np.zeros((2,3), dtype=np.float64)
O = np.ones((3,3))
I = np.eye(3) # identity
R = np.arange(0, 10, 2) # 0..8 step 2
L = np.linspace(0, 1, 5) # 0., 0.25, 0.5, 0.75, 1.
# Shapes & dtypes
arr.shape, arr.ndim, arr.dtype
arr.astype(np.float32)
# Elementwise ops (vectorized)
x = np.array([1, 2, 3])
y = np.array([4, 5, 6])
x + y, x * y, x / y, x**2
# Aggregations
M = np.arange(12).reshape(3,4)
M.sum(), M.mean(axis=0), M.max(axis=1)
# Boolean masks
mask = M % 2 == 0
evens = M[mask]
Prefer NumPy arrays for numeric workloads. Use dtype explicitly in performance‑sensitive code to control memory and speed.
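A quick way to see the memory impact of an explicit dtype (a minimal sketch; the array size is arbitrary):
# dtype determines bytes per element, and therefore total memory
a64 = np.arange(1_000_000, dtype=np.float64)
a32 = a64.astype(np.float32)
a64.nbytes, a32.nbytes # 8_000_000 vs 4_000_000 bytes
a64.dtype.itemsize, a32.dtype.itemsize # 8 vs 4 bytes per element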
Indexing, Slicing, Reshaping, Broadcasting
Indexing & Slicing
M = np.arange(12).reshape(3,4)
M[1, 2] # single element
M[:, 0] # first column
M[1, :] # second row
M[::2, ::-1] # stride rows, reverse cols
M[[0, 2], [1, 3]] # advanced indexing (pairs): elements (0,1) and (2,3)
M[np.ix_([0, 2], [1, 3])] # outer indexing: 2x2 block of rows {0,2} and cols {1,3}
Reshape & Views
a = np.arange(6) # shape (6,)
a2 = a.reshape(2,3) # may be a view (no copy)
f = a2.flatten() # copy
r = a2.ravel() # view when possible
b = a2.T # transposed view
Views share memory; copies don’t. Modifying a view affects the original. Use .copy() when isolation is needed.
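A minimal demonstration (values are arbitrary):
a = np.arange(6)
v = a[:3] # slice -> view
c = a[:3].copy() # explicit copy
v[0] = 99 # also changes a[0]
c[1] = 77 # a is untouched
np.shares_memory(a, v), np.shares_memory(a, c) # True, False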
Broadcasting Guide
# Shapes align from right to left:
# (3, 1) and (1, 4) -> (3, 4)
A = np.arange(3).reshape(3,1) # [[0],[1],[2]]
B = np.arange(4).reshape(1,4) # [[0 1 2 3]]
C = A + B # shape (3,4)
Stacking & Concatenation
x = np.array([1, 2, 3]); y = np.array([4, 5, 6])
np.concatenate([x,y]) # [1 2 3 4 5 6]
np.stack([x,y], axis=0) # [[1 2 3],[4 5 6]]
np.vstack([x,y]) # 2x3
np.hstack([x,y]) # shape (6,), same as concatenate here
Dtype Cheatsheet
| Use case | dtype | Notes |
|---|---|---|
| Precise floats (most cases) | float64 | Default precision in scientific computing |
| Memory‑saving for large arrays | float32/int32 | Halves memory; check precision impact |
| Binary masks | bool | Use vectorized boolean ops |
| Categorical labels | int8/int16 | Compact when the label space is small |
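Before committing to a downcast, it helps to check both the memory saved and the rounding error introduced (a sketch on made‑up data):
x = np.random.default_rng(0).normal(size=100_000) # float64 by default
x32 = x.astype(np.float32)
x.nbytes, x32.nbytes # 800_000 vs 400_000 bytes
np.abs(x32.astype(np.float64) - x).max() # worst-case rounding error (order 1e-7 here)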
Linear Algebra: Copy‑Paste Recipes
M = np.array([[3.,2.],[1.,4.]])
v = np.array([1.,2.])
M @ v # matrix-vector
M @ M # matrix-matrix
np.linalg.det(M) # determinant
np.linalg.inv(M) # inverse (avoid in numeric pipelines if possible)
w, V = np.linalg.eig(M) # eigenvalues/vectors
U, S, VT = np.linalg.svd(M) # singular value decomposition
x = np.linalg.solve(M, v) # solve Mx=v (prefer over inv(M)@v)
Prefer np.linalg.solve and decompositions to explicit inversion; they are more numerically stable and often faster.
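For the small system above both routes agree, but the residual check is the habit worth keeping (a sketch reusing M and v from the recipe):
x_solve = np.linalg.solve(M, v)
x_inv = np.linalg.inv(M) @ v # works here, but less stable in general
np.allclose(x_solve, x_inv) # True for this well-conditioned M
np.allclose(M @ x_solve, v) # residual check: True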
Random & Sampling (Generator API)
# Modern API: create a Generator; avoid global legacy RandomState
rng = np.random.default_rng(seed=42)
rng.integers(0, 10, size=(2,3))
rng.normal(loc=0, scale=1, size=1000)
rng.choice([1, 2, 3], size=5, replace=True, p=[0.7, 0.2, 0.1])
rng.shuffle(arr) # in-place shuffle
Keep a dedicated Generator per component for reproducibility and parallel safety.
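One way to give each component its own independent stream is to spawn child seeds from a single root (a sketch; the number of components is arbitrary):
root = np.random.SeedSequence(42)
children = root.spawn(3) # one child seed per component/worker
rngs = [np.random.default_rng(s) for s in children]
[r.normal(size=2) for r in rngs] # independent, reproducible streams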
Fast I/O: CSV, NPZ, Memory‑Mapped
# Binary formats: fastest for NumPy
np.save("array.npy", arr) # single array
np.savez_compressed("bundle.npz", a=A, b=B) # multiple arrays
# Text/CSV (small/medium)
np.savetxt("data.csv", A, delimiter=",", fmt="%.6f")
# Load
A2 = np.load("array.npy")
bundle = np.load("bundle.npz")
A3 = bundle["a"]
# Memory‑mapped (huge arrays, low RAM)
mm = np.memmap("big.dat", dtype=np.float32, mode="r", shape=(10000,10000))
mean = mm.mean() # OS streams pages; no full load
For tabular workflows, consider interop with Pandas for CSV/Parquet, then convert to NumPy via .to_numpy() when needed.
Interoperability: Pandas, SciPy, Plotly
Pandas
import pandas as pd
df = pd.read_csv("data.csv")
A = df[["x","y","z"]].to_numpy(dtype=np.float64, copy=False)
df["magnitude"] = np.sqrt((A**2).sum(axis=1))
SciPy
import scipy.signal as sig
x = np.linspace(0,1,500)
y = np.sin(2*np.pi*10*x)
ys = sig.savgol_filter(y, window_length=31, polyorder=3)
Plotly
import plotly.express as px
import pandas as pd
df = pd.DataFrame({"x":np.arange(100), "y":np.sin(np.arange(100)/10)})
fig = px.line(df, x="x", y="y", title="Sine")
fig.write_html("chart.html")
Performance: Vectorization, Strides, Memory
Vectorize and Batch
# Slow (Python loop)
out = np.empty_like(x)
for i in range(len(x)):
    out[i] = x[i]*x[i] + y[i]
# Fast (vectorized)
out = x*x + y
Broadcast Instead of Tile
# Avoid materializing large intermediate arrays via tile/repeat.
A = np.random.rand(1000,1)
B = np.random.rand(1,1000)
C = A + B # broadcasted (no giant copies)
Memory & Views
# Use views when possible
B = A.reshape(-1, 1) # likely view
C = A[::2] # strided view (every other element)
# Force copy when isolating
C_iso = np.array(C, copy=True)
Choose Dtypes Wisely
# Halve memory with float32 (check accuracy)
A32 = A.astype(np.float32, copy=False)
Profile with %timeit (IPython), cProfile, or scalene. For CPU‑heavy kernels, consider numba or numexpr.
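Outside IPython, the standard-library timeit module gives a quick micro-benchmark (a sketch; sizes are arbitrary):
import timeit
x = np.random.default_rng(0).random(1_000_000)
timeit.timeit(lambda: sum(v*v for v in x), number=3) # Python-level loop
timeit.timeit(lambda: (x*x).sum(), number=3) # vectorized ufunc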
Troubleshooting & Gotchas
Shape Mismatch
Symptom: ValueError: operands could not be broadcast together
# Inspect shapes before ops
print(A.shape, B.shape)
# Expand dims
B2 = np.expand_dims(B, axis=0) # or axis=1
# Or use np.newaxis: B[np.newaxis, :]
Unintended In‑Place Mutation
# writing through a view modifies the source
B = A.T
B[0, 0] = 99 # A[0, 0] changes too
# Fix: explicit copy
B = A.T.copy()
Precision Loss
# Casting float64 -> float32 lowers precision
A32 = A.astype(np.float32)
# Fix: stay in float64 for sensitive calculations; cast late at I/O boundaries
Slow Python Loops
# Replace loops with vectorized ufuncs or broadcasting
# If custom kernel needed:
import numba as nb
@nb.njit
def kernel(x, y, out):
    for i in range(x.size):
        out[i] = x[i]*x[i] + y[i]
Be cautious with chained advanced indexing: the first fancy index usually returns a copy, so mutations made through the chain may not reach the original array.
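A short illustration of the difference:
M = np.arange(12).reshape(3, 4)
M[M > 5][0] = -1 # chained: M[M > 5] is a copy, so M is unchanged
(M == -1).any() # False
M[M > 5] = -1 # single-step assignment writes into M
(M == -1).any() # True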
Frequently Asked Questions
When should I use float32 vs float64?
Use float64 for numeric stability in analytics and modeling; float32 when memory is tight and accuracy requirements are modest (e.g., large images, embeddings, or when pushing GPU memory limits).
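A quick way to see the precision gap (illustrative values):
np.finfo(np.float32).eps, np.finfo(np.float64).eps # ~1.19e-07 vs ~2.22e-16
np.float32(1.0) + np.float32(1e-8) == np.float32(1.0) # True: the increment is lost in float32
1.0 + 1e-8 == 1.0 # False in float64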
How do I safely normalize large arrays?
# Avoid nan/inf with epsilon
eps = 1e-12
norm = np.sqrt((A*A).sum(axis=1, keepdims=True)) + eps
A_norm = A / norm
Best way to compute pairwise distances?
# For moderate sizes:
from numpy.linalg import norm
D = norm(A[:,None,:] - B[None,:,:], axis=2)
# For larger tasks, consider sklearn.metrics.pairwise or faiss.
How do I ensure reproducible randomness?
rng = np.random.default_rng(0) # single seed per component/module
If this page helped, consider linking to it from your “Resources,” “Data Science 101,” or “Numerical Computing” sections. Sharing supports more free content like this.
© 2025 Pythoneo