std

Compute the standard deviation along the specified axis.
numpy.std(a, axis=None, dtype=None, out=None, ddof=0, keepdims=False, *, 
          where=True, mean=None, correction=None)

Parameters

a
array_like
Calculate the standard deviation of these values.
axis
None or int or tuple of ints
Axis or axes along which the standard deviation is computed. The default is to compute the standard deviation of the flattened array.
dtype
dtype
Type to use in computing the standard deviation. For arrays of integer type the default is float64, for arrays of float types it is the same as the array type.
out
ndarray
Alternative output array in which to place the result.
ddof
int or float
default:"0"
Delta Degrees of Freedom. The divisor used in calculations is N - ddof, where N represents the number of elements. By default ddof is zero.
keepdims
bool
default:"False"
If True, the axes which are reduced are left in the result as dimensions with size one.
where
array_like of bool
Elements to include in the standard deviation.
mean
array_like
Provide the mean to prevent its recalculation. The mean should have a shape as if it was calculated with keepdims=True.
correction
int or float
Array API compatible name for the ddof parameter. Only one of ddof and correction may be provided at a time.
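The keyword-only parameters can be combined; here is a hedged sketch (where requires NumPy ≥ 1.20; the mean and correction keywords require NumPy ≥ 2.0 and are shown only as comments):

```python
import numpy as np

a = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])

# keepdims=True keeps the reduced axis with size one, so the result
# broadcasts back against the original array (e.g. for z-scoring).
col_std = np.std(a, axis=0, keepdims=True)   # shape (1, 3)
z = (a - np.mean(a, axis=0, keepdims=True)) / col_std

# where= restricts the calculation to selected elements,
# e.g. to skip a sentinel value.
data = np.array([1.0, 2.0, 3.0, -999.0])
clean_std = np.std(data, where=data > -100.0)  # std of [1, 2, 3]

# On NumPy >= 2.0 the following are also accepted (not run here):
#   np.std(a, axis=0, mean=np.mean(a, axis=0, keepdims=True))
#   np.std(data, correction=1)   # same as ddof=1
```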

Returns

standard_deviation : ndarray Array containing the standard deviation values.

What It Represents

The standard deviation measures how spread out data is from the mean:

\sigma = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (x_i - \mu)^2}

Where:
  • σ (sigma) is the standard deviation
  • μ (mu) is the mean
  • N is the number of elements
A low standard deviation means data points are close to the mean (consistent). A high standard deviation means data points are spread out (variable). The ddof parameter:
  • ddof=0: population standard deviation (divide by N)
  • ddof=1: sample standard deviation (divide by N-1, Bessel’s correction)
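The formula and the ddof divisors can be checked by hand; a small sanity-check sketch:

```python
import numpy as np

x = np.array([2, 4, 4, 4, 5, 5, 7, 9], dtype=float)
mu = x.mean()

# Population std: divide the squared deviations by N ...
pop_std = np.sqrt(np.sum((x - mu) ** 2) / len(x))
# ... sample std: divide by N - 1 (Bessel's correction).
samp_std = np.sqrt(np.sum((x - mu) ** 2) / (len(x) - 1))

# pop_std matches np.std(x), samp_std matches np.std(x, ddof=1)
```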

Examples

import numpy as np

# Basic standard deviation
data = np.array([2, 4, 4, 4, 5, 5, 7, 9])
np.std(data)
# 2.0

# Compare consistent vs variable data
consistent = np.array([5, 5, 5, 5, 5])
variable = np.array([1, 3, 5, 7, 9])

np.std(consistent)  # 0.0 (no variation)
np.std(variable)    # 2.828... (high variation)

# Population vs sample standard deviation
data = np.array([1, 2, 3, 4, 5])

np.std(data, ddof=0)  # 1.414... (population)
np.std(data, ddof=1)  # 1.581... (sample, larger)

# Standard deviation along axis
scores = np.array([
    [85, 90, 78],  # Student 1
    [92, 88, 91],  # Student 2
    [78, 82, 75]   # Student 3
])

# Consistency per student (across tests)
np.std(scores, axis=1)
# array([4.9, 1.7, 2.9])
# Student 2 is most consistent

# Variation per test (across students)
np.std(scores, axis=0)
# array([5.7, 3.4, 6.9])
# Test 3 has highest variation

# Temperature data analysis
temps = np.array([72, 75, 71, 78, 73, 76, 74])
mean_temp = np.mean(temps)
std_temp = np.std(temps)

print(f"Average: {mean_temp:.1f}°F ± {std_temp:.1f}°F")
# Average: 74.1°F ± 2.2°F

var

Compute the variance along the specified axis.
numpy.var(a, axis=None, dtype=None, out=None, ddof=0, keepdims=False, *, 
          where=True, mean=None, correction=None)

Parameters

a
array_like
Array containing numbers whose variance is desired.
axis
None or int or tuple of ints
Axis or axes along which the variance is computed. The default is to compute the variance of the flattened array.
dtype
dtype
Type to use in computing the variance. For arrays of integer type the default is float64.
out
ndarray
Alternative output array in which to place the result.
ddof
int or float
default:"0"
Delta Degrees of Freedom. The divisor used in calculations is N - ddof.
keepdims
bool
default:"False"
If True, the axes which are reduced are left in the result as dimensions with size one.
where
array_like of bool
Elements to include in the variance.
mean
array_like
Provide the mean to prevent its recalculation.
correction
int or float
Array API compatible name for the ddof parameter.

Returns

variance : ndarray Array containing the variance values.

What It Represents

The variance is the average of squared deviations from the mean:

\sigma^2 = \frac{1}{N} \sum_{i=1}^{N} (x_i - \mu)^2

Variance is the square of the standard deviation: var = std². Variance measures spread in squared units of the original data:
  • If data is in meters, variance is in meters²
  • If data is in dollars, variance is in dollars²
Standard deviation is often preferred because it’s in the same units as the data.
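To make the units point concrete, a quick sketch with hypothetical heights in meters:

```python
import numpy as np

heights_m = np.array([1.60, 1.70, 1.80])  # meters

var_m2 = np.var(heights_m)  # units: meters squared
std_m = np.std(heights_m)   # units: meters, same as the data

# std is just the square root of var, back in the original units
```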

Examples

import numpy as np

# Basic variance
data = np.array([2, 4, 4, 4, 5, 5, 7, 9])
np.var(data)
# 4.0

# Relationship between variance and standard deviation
var = np.var(data)
std = np.std(data)

var  # 4.0
std  # 2.0
std ** 2  # 4.0 (equals var)

# Why variance is useful: additivity property
# var(X + Y) = var(X) + var(Y) for independent variables

# Portfolio variance example
stock_a_returns = np.array([0.05, 0.02, 0.07, 0.03, 0.06])
stock_b_returns = np.array([0.04, 0.08, 0.02, 0.05, 0.06])

var_a = np.var(stock_a_returns, ddof=1)
var_b = np.var(stock_b_returns, ddof=1)

print(f"Stock A variance: {var_a:.6f}")
print(f"Stock B variance: {var_b:.6f}")
# Stock B is riskier (higher variance)

# Compare groups
group1 = np.array([10, 12, 11, 13, 12, 11])
group2 = np.array([8, 15, 9, 16, 7, 17])

np.mean(group1)  # 11.5
np.mean(group2)  # 12.0 (similar means)

np.var(group1)   # 0.9 (low variance, consistent)
np.var(group2)   # 16.7 (high variance, variable)
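The additivity property mentioned above can be checked empirically with a quick simulation (a sketch; for finite samples the equality is only approximate, so the check uses a tolerance):

```python
import numpy as np

rng = np.random.default_rng(42)
x = rng.normal(0.0, 2.0, size=100_000)  # variance ~ 4
y = rng.normal(0.0, 3.0, size=100_000)  # variance ~ 9

# For independent variables, var(X + Y) ~ var(X) + var(Y) ~ 13
combined = np.var(x + y)
summed = np.var(x) + np.var(y)
```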

nanstd

Compute the standard deviation along the specified axis, ignoring NaNs.
numpy.nanstd(a, axis=None, dtype=None, out=None, ddof=0, keepdims=False, *, where=True)

Parameters

a
array_like
Calculate the standard deviation of the non-NaN values.
axis
None or int or tuple of ints
Axis or axes along which the standard deviation is computed.
dtype
dtype
Type to use in computing the standard deviation.
out
ndarray
Alternative output array in which to place the result.
ddof
int or float
default:"0"
Delta Degrees of Freedom.
keepdims
bool
default:"False"
If True, the axes which are reduced are left in the result as dimensions with size one.
where
array_like of bool
Elements to include in the standard deviation.

Returns

standard_deviation : ndarray Standard deviation with NaN values ignored.

What It Represents

Same as std, but automatically excludes NaN (Not a Number) values from the calculation. This is essential when working with real-world data that has missing values. Using regular std on data with NaNs returns NaN. Using nanstd ignores the NaN values and computes the statistic on valid data only.

Examples

import numpy as np

# Data with missing values
data = np.array([1.0, 2.0, np.nan, 4.0, 5.0])

# Regular std returns NaN
np.std(data)
# nan

# nanstd ignores NaN values
np.nanstd(data)
# 1.58...  # std of [1, 2, 4, 5]

# Real-world example: sensor data with failures
temperature_readings = np.array([
    [72.0, 73.0, np.nan, 74.0],
    [71.0, np.nan, 73.0, 75.0],
    [73.0, 74.0, 72.0, 73.0]
])

# Standard deviation per sensor (axis=0)
np.nanstd(temperature_readings, axis=0)
# array([0.81..., 0.5, 0.5, 0.81...])

# Standard deviation per time (axis=1)
np.nanstd(temperature_readings, axis=1)
# array([0.81..., 1.63..., 0.70...])

# Financial data with missing days
returns = np.array([0.02, np.nan, 0.01, 0.03, np.nan, -0.01])
volatility = np.nanstd(returns, ddof=1)
print(f"Volatility: {volatility:.4f}")
# Volatility: 0.0171
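One edge case worth noting: when every value in a slice is NaN, nanstd has nothing to compute and returns NaN for that slice, emitting a RuntimeWarning. A hedged sketch:

```python
import warnings
import numpy as np

data = np.array([
    [1.0, 2.0],        # valid slice
    [np.nan, np.nan],  # all-NaN slice
])

# Suppress the RuntimeWarning ("Mean of empty slice" /
# "Degrees of freedom <= 0") that NumPy emits for the all-NaN row.
with warnings.catch_warnings():
    warnings.simplefilter("ignore", RuntimeWarning)
    row_std = np.nanstd(data, axis=1)
# row_std -> [0.5, nan]
```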

nanvar

Compute the variance along the specified axis, ignoring NaNs.
numpy.nanvar(a, axis=None, dtype=None, out=None, ddof=0, keepdims=False, *, where=True)

Parameters

a
array_like
Array containing numbers whose variance is desired, possibly with NaN values.
axis
None or int or tuple of ints
Axis or axes along which the variance is computed.
dtype
dtype
Type to use in computing the variance.
out
ndarray
Alternative output array in which to place the result.
ddof
int or float
default:"0"
Delta Degrees of Freedom.
keepdims
bool
default:"False"
If True, the axes which are reduced are left in the result as dimensions with size one.
where
array_like of bool
Elements to include in the variance.

Returns

variance : ndarray Variance with NaN values ignored.

What It Represents

Same as var, but automatically excludes NaN values from the calculation. Essential for computing variance on datasets with missing or invalid data. The relationship nanvar = (nanstd)² holds, just like var = std².
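Note that with ddof, each slice divides by its own count of valid (non-NaN) elements minus ddof, not by the full slice length. A small sketch:

```python
import numpy as np

arr = np.array([
    [1.0, 2.0, np.nan],  # 2 valid values -> divisor 2 - 1 = 1
    [1.0, 2.0, 3.0],     # 3 valid values -> divisor 3 - 1 = 2
])

row_var = np.nanvar(arr, axis=1, ddof=1)
# row_var -> [0.5, 1.0]
```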

Examples

import numpy as np

# Data with missing values
data = np.array([1.0, 2.0, np.nan, 4.0, 5.0])

# Regular var returns NaN
np.var(data)
# nan

# nanvar ignores NaN values
np.nanvar(data)
# 2.5  # var of [1, 2, 4, 5]

# Verify relationship with nanstd
var = np.nanvar(data)
std = np.nanstd(data)
var  # 2.5
std ** 2  # 2.5 (equals var)

# Experimental measurements with equipment failures
measurements = np.array([
    [10.2, 10.5, np.nan, 10.1],
    [10.3, 10.4, 10.6, np.nan],
    [np.nan, 10.3, 10.4, 10.5]
])

# Variance across all measurements
np.nanvar(measurements)
# 0.0222...

# Variance per trial (axis=0)
np.nanvar(measurements, axis=0)
# array([0.0025, 0.0067, 0.01, 0.04])

# Quality control: detect high-variance batches
batch_data = np.array([
    [50.1, 50.2, np.nan, 50.0],
    [50.5, 49.8, 51.2, np.nan],  # High variance
    [50.0, 50.1, 50.0, 50.1]
])

variances = np.nanvar(batch_data, axis=1, ddof=1)
for i, var in enumerate(variances):
    status = "REJECT" if var > 0.05 else "ACCEPT"
    print(f"Batch {i+1}: var={var:.4f} - {status}")
# Batch 1: var=0.0100 - ACCEPT
# Batch 2: var=0.4900 - REJECT
# Batch 3: var=0.0033 - ACCEPT

See Also

Averages

Mean, median, and central tendency measures

Correlating

Correlation and covariance functions
