std
Compute the standard deviation along the specified axis.
numpy.std(a, axis=None, dtype=None, out=None, ddof=0, keepdims=False, *, where=True, mean=None, correction=None)
Parameters
a : array_like
Calculate the standard deviation of these values.
axis : None or int or tuple of ints, optional
Axis or axes along which the standard deviation is computed. The default is to compute the standard deviation of the flattened array.
dtype : dtype, optional
Type to use in computing the standard deviation. For arrays of integer type the default is float64; for arrays of float types it is the same as the array type.
out : ndarray, optional
Alternative output array in which to place the result.
ddof : int, optional
Delta Degrees of Freedom. The divisor used in calculations is N - ddof, where N represents the number of elements. By default ddof is zero.
keepdims : bool, optional
If True, the axes which are reduced are left in the result as dimensions with size one.
where : array_like of bool, optional
Elements to include in the standard deviation.
mean : array_like, optional
Provide the mean to prevent its recalculation. The mean should have a shape as if it was calculated with keepdims=True.
correction : int or float, optional
Array API compatible name for the ddof parameter. Only one of them can be provided at the same time.
Returns
standard_deviation : ndarray
Array containing the standard deviation values.
What It Represents
The standard deviation measures how spread out data is from the mean:
\sigma = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (x_i - \mu)^2}
Where:
σ (sigma) is the standard deviation
μ (mu) is the mean
N is the number of elements
A low standard deviation means data points are close to the mean (consistent).
A high standard deviation means data points are spread out (variable).
The ddof parameter:
ddof=0: population standard deviation (divide by N)
ddof=1: sample standard deviation (divide by N-1, Bessel’s correction)
Examples
import numpy as np
# Basic standard deviation
data = np.array([2, 4, 4, 4, 5, 5, 7, 9])
np.std(data)
# 2.0
# Compare consistent vs variable data
consistent = np.array([5, 5, 5, 5, 5])
variable = np.array([1, 3, 5, 7, 9])
np.std(consistent)  # 0.0 (no variation)
np.std(variable)    # 2.828... (high variation)
# Population vs sample standard deviation
data = np.array([1, 2, 3, 4, 5])
np.std(data, ddof=0)  # 1.414... (population)
np.std(data, ddof=1)  # 1.581... (sample, larger)
# Standard deviation along axis
scores = np.array([
    [85, 90, 78],  # Student 1
    [92, 88, 91],  # Student 2
    [78, 82, 75]   # Student 3
])
# Consistency per student (across tests)
np.std(scores, axis=1)
# array([4.92, 1.70, 2.87])
# Student 2 is most consistent
# Variation per test (across students)
np.std(scores, axis=0)
# array([5.72, 3.40, 6.94])
# Test 3 has highest variation
# Temperature data analysis
temps = np.array([72, 75, 71, 78, 73, 76, 74])
mean_temp = np.mean(temps)
std_temp = np.std(temps)
print(f"Average: {mean_temp:.1f}°F ± {std_temp:.1f}°F")
# Average: 74.1°F ± 2.2°F
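As an aside, keepdims=True leaves the reduced axis in place so the result broadcasts back against the original array, which is handy for z-scoring each row (a sketch reusing the scores array from above):

```python
import numpy as np

scores = np.array([[85, 90, 78],
                   [92, 88, 91],
                   [78, 82, 75]])
# keepdims=True gives shape (3, 1), which broadcasts against (3, 3)
mu = np.mean(scores, axis=1, keepdims=True)
sigma = np.std(scores, axis=1, keepdims=True)
z = (scores - mu) / sigma
```

After this, every row of z has mean 0 and standard deviation 1, so students can be compared on a common scale.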
var
Compute the variance along the specified axis.
numpy.var(a, axis=None, dtype=None, out=None, ddof=0, keepdims=False, *, where=True, mean=None, correction=None)
Parameters
a : array_like
Array containing numbers whose variance is desired.
axis : None or int or tuple of ints, optional
Axis or axes along which the variance is computed. The default is to compute the variance of the flattened array.
dtype : dtype, optional
Type to use in computing the variance. For arrays of integer type the default is float64.
out : ndarray, optional
Alternative output array in which to place the result.
ddof : int, optional
Delta Degrees of Freedom. The divisor used in calculations is N - ddof.
keepdims : bool, optional
If True, the axes which are reduced are left in the result as dimensions with size one.
where : array_like of bool, optional
Elements to include in the variance.
mean : array_like, optional
Provide the mean to prevent its recalculation.
correction : int or float, optional
Array API compatible name for the ddof parameter.
Returns
variance : ndarray
Array containing the variance values.
What It Represents
The variance is the average of squared deviations from the mean:
\sigma^2 = \frac{1}{N} \sum_{i=1}^{N} (x_i - \mu)^2
Variance is the square of the standard deviation: var = std².
Variance measures spread in squared units of the original data:
If data is in meters, variance is in meters²
If data is in dollars, variance is in dollars²
Standard deviation is often preferred because it’s in the same units as the data.
Examples
import numpy as np
# Basic variance
data = np.array([2, 4, 4, 4, 5, 5, 7, 9])
np.var(data)
# 4.0
# Relationship between variance and standard deviation
var = np.var(data)
std = np.std(data)
var       # 4.0
std       # 2.0
std ** 2  # 4.0 (equals var)
# Why variance is useful: additivity property
# var(X + Y) = var(X) + var(Y) for independent variables
# Portfolio variance example
stock_a_returns = np.array([0.05, 0.02, 0.07, 0.03, 0.06])
stock_b_returns = np.array([0.04, 0.08, 0.02, 0.05, 0.06])
var_a = np.var(stock_a_returns, ddof=1)
var_b = np.var(stock_b_returns, ddof=1)
print(f"Stock A variance: {var_a:.6f}")
print(f"Stock B variance: {var_b:.6f}")
# Stock A variance: 0.000430
# Stock B variance: 0.000500
# Stock B is riskier (higher variance)
# Compare groups
group1 = np.array([10, 12, 11, 13, 12, 11])
group2 = np.array([8, 15, 9, 16, 7, 17])
np.mean(group1)  # 11.5
np.mean(group2)  # 12.0 (similar means)
np.var(group1)   # 0.92 (low variance, consistent)
np.var(group2)   # 16.67 (high variance, variable)
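The additivity property mentioned above can be checked numerically. A sketch with simulated independent samples; with finite samples the equality is only approximate, since the sample covariance is near but not exactly zero:

```python
import numpy as np

# For independent X and Y, var(X + Y) ≈ var(X) + var(Y)
rng = np.random.default_rng(0)
x = rng.normal(size=100_000)
y = rng.normal(size=100_000)
lhs = np.var(x + y)
rhs = np.var(x) + np.var(y)
# lhs and rhs agree to roughly two decimal places
```

Note that no such additivity holds for standard deviations, which is one reason variance is the more convenient quantity in probability calculations.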
nanstd
Compute the standard deviation along the specified axis, ignoring NaNs.
numpy.nanstd(a, axis=None, dtype=None, out=None, ddof=0, keepdims=False, *, where=True)
Parameters
a : array_like
Calculate the standard deviation of the non-NaN values.
axis : None or int or tuple of ints, optional
Axis or axes along which the standard deviation is computed.
dtype : dtype, optional
Type to use in computing the standard deviation.
out : ndarray, optional
Alternative output array in which to place the result.
ddof : int, optional
Delta Degrees of Freedom.
keepdims : bool, optional
If True, the axes which are reduced are left in the result as dimensions with size one.
where : array_like of bool, optional
Elements to include in the standard deviation.
Returns
standard_deviation : ndarray
Standard deviation with NaN values ignored.
What It Represents
Same as std, but automatically excludes NaN (Not a Number) values from the calculation. This is essential when working with real-world data that has missing values.
Using regular std on data with NaNs returns NaN. Using nanstd ignores the NaN values and computes the statistic on valid data only.
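One way to see this: nanstd matches a plain std taken over only the non-NaN entries, selected here with a boolean mask (a quick check):

```python
import numpy as np

data = np.array([1.0, 2.0, np.nan, 4.0, 5.0])
# Drop the NaNs explicitly with a boolean mask, then take a regular std
manual = np.std(data[~np.isnan(data)])
auto = np.nanstd(data)
# manual and auto are equal
```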
Examples
import numpy as np
# Data with missing values
data = np.array([1.0, 2.0, np.nan, 4.0, 5.0])
# Regular std returns NaN
np.std(data)
# nan
# nanstd ignores NaN values
np.nanstd(data)
# 1.58...  # std of [1, 2, 4, 5]
# Real-world example: sensor data with failures
temperature_readings = np.array([
    [72.0, 73.0, np.nan, 74.0],
    [71.0, np.nan, 73.0, 75.0],
    [73.0, 74.0, 72.0, 73.0]
])
# Standard deviation per sensor (axis=0)
np.nanstd(temperature_readings, axis=0)
# array([0.81..., 0.5, 0.5, 0.81...])
# Standard deviation per time step (axis=1)
np.nanstd(temperature_readings, axis=1)
# array([0.81..., 1.63..., 0.70...])
# Financial data with missing days
returns = np.array([0.02, np.nan, 0.01, 0.03, np.nan, -0.01])
volatility = np.nanstd(returns, ddof=1)
print(f"Volatility: {volatility:.4f}")
# Volatility: 0.0171
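When many values are NaN, it is worth checking how many valid points actually back each statistic, since a column with only one or two readings gives a far less reliable estimate. A sketch using the sensor array from above:

```python
import numpy as np

readings = np.array([[72.0, 73.0, np.nan, 74.0],
                     [71.0, np.nan, 73.0, 75.0],
                     [73.0, 74.0, 72.0, 73.0]])
# Number of valid (non-NaN) readings behind each per-sensor statistic
counts = np.sum(~np.isnan(readings), axis=0)
stds = np.nanstd(readings, axis=0)
# counts -> [3, 2, 2, 3]: sensors 2 and 3 rest on only two readings each
```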
nanvar
Compute the variance along the specified axis, ignoring NaNs.
numpy.nanvar(a, axis=None, dtype=None, out=None, ddof=0, keepdims=False, *, where=True)
Parameters
a : array_like
Array containing numbers whose variance is desired, possibly with NaN values.
axis : None or int or tuple of ints, optional
Axis or axes along which the variance is computed.
dtype : dtype, optional
Type to use in computing the variance.
out : ndarray, optional
Alternative output array in which to place the result.
ddof : int, optional
Delta Degrees of Freedom.
keepdims : bool, optional
If True, the axes which are reduced are left in the result as dimensions with size one.
where : array_like of bool, optional
Elements to include in the variance.
Returns
variance : ndarray
Variance with NaN values ignored.
What It Represents
Same as var, but automatically excludes NaN values from the calculation. Essential for computing variance on datasets with missing or invalid data.
The relationship nanvar = (nanstd)² holds, just like var = std².
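One subtlety worth noting: the N in the N - ddof divisor is the count of non-NaN elements, not the array length. A quick sketch:

```python
import numpy as np

data = np.array([1.0, 2.0, np.nan, 4.0, 5.0])
# N is the count of non-NaN elements (4), not the array length (5)
n = np.count_nonzero(~np.isnan(data))
v0 = np.nanvar(data)          # divides by n     -> 2.5
v1 = np.nanvar(data, ddof=1)  # divides by n - 1 -> 3.33...
```

This matters for ddof=1: with heavy missingness, the effective sample size behind each value can be much smaller than the array suggests.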
Examples
import numpy as np
# Data with missing values
data = np.array([1.0, 2.0, np.nan, 4.0, 5.0])
# Regular var returns NaN
np.var(data)
# nan
# nanvar ignores NaN values
np.nanvar(data)
# 2.5  # var of [1, 2, 4, 5]
# Verify relationship with nanstd
var = np.nanvar(data)
std = np.nanstd(data)
var       # 2.5
std ** 2  # 2.5 (equals var)
# Experimental measurements with equipment failures
measurements = np.array([
    [10.2, 10.5, np.nan, 10.1],
    [10.3, 10.4, 10.6, np.nan],
    [np.nan, 10.3, 10.4, 10.5]
])
# Variance across all measurements
np.nanvar(measurements)
# 0.0222...
# Variance per measurement position (axis=0)
np.nanvar(measurements, axis=0)
# array([0.0025, 0.0067, 0.01, 0.04])
# Quality control: detect high-variance batches
batch_data = np.array([
    [50.1, 50.2, np.nan, 50.0],
    [50.5, 49.8, 51.2, np.nan],  # High variance
    [50.0, 50.1, 50.0, 50.1]
])
variances = np.nanvar(batch_data, axis=1, ddof=1)
for i, var in enumerate(variances):
    status = "REJECT" if var > 0.05 else "ACCEPT"
    print(f"Batch {i + 1}: var={var:.4f} - {status}")
# Batch 1: var=0.0100 - ACCEPT
# Batch 2: var=0.4900 - REJECT
# Batch 3: var=0.0033 - ACCEPT
See Also
Averages Mean, median, and central tendency measures
Correlating Correlation and covariance functions