Count Functions
count
Counts the number of rows. Signature:count()
Returns: LONG
Example:
count (column)
Counts non-NULL values in a column. Signature:count(column)
Parameters:
column- Any column
LONG
Example:
count_distinct
Counts unique non-NULL values. Signature:count_distinct(column)
Parameters:
column- Any column
LONG
Example:
Sum Functions
sum
Computes sum of values. Signature:sum(column)
Parameters:
column- Numeric column
nsum
Neumaier sum - more accurate summation using compensated algorithm. Signature:nsum(D)
Parameters:
D- DOUBLE column
DOUBLE
Example:
Average Functions
avg
Computes arithmetic mean. Signature:avg(column)
Parameters:
column- Numeric column
DOUBLE
Example:
weighted_avg
Weighted average. Signature:weighted_avg(DD)
Parameters:
- First
D- Value column - Second
D- Weight column
DOUBLE
Example:
Min/Max Functions
min
Finds minimum value. Signature:min(column)
Parameters:
column- Any comparable column
max
Finds maximum value. Signature:max(column)
Parameters:
column- Any comparable column
Statistical Functions
stddev
Computes sample standard deviation. Signature:stddev(D)
Parameters:
D- DOUBLE column
DOUBLE
Example:
stddev_pop
Computes population standard deviation. Signature:stddev_pop(D)
Parameters:
D- DOUBLE column
DOUBLE
stddev_samp
Alias forstddev(). Sample standard deviation.
Signature: stddev_samp(D)
Parameters:
D- DOUBLE column
DOUBLE
variance
Computes sample variance. Signature:variance(D)
Parameters:
D- DOUBLE column
DOUBLE
Example:
var_pop
Computes population variance. Signature:var_pop(D)
Parameters:
D- DOUBLE column
DOUBLE
var_samp
Alias forvariance(). Sample variance.
Signature: var_samp(D)
Parameters:
D- DOUBLE column
DOUBLE
Regression Functions
regr_slope
Computes slope of linear regression. Signature:regr_slope(DD)
Parameters:
- First
D- Dependent variable (y) - Second
D- Independent variable (x)
DOUBLE
Example:
regr_intercept
Computes y-intercept of linear regression. Signature:regr_intercept(DD)
Parameters:
- First
D- Dependent variable (y) - Second
D- Independent variable (x)
DOUBLE
String Aggregation
string_agg
Concatenates string values with separator. Signature:string_agg(Ss)
Parameters:
S- STRING columns- Separator string
STRING
Example:
string_distinct_agg
Concatenates unique string values. Signature:string_distinct_agg(Ss)
Parameters:
S- STRING columns- Separator string
STRING
Example:
Mode Function
mode
Finds most frequent value. Signature:mode(column)
Parameters:
column- Any column
First/Last Functions
first
Returns first value in group. Signature:first(column)
Parameters:
column- Any column
last
Returns last value in group. Signature:last(column)
Parameters:
column- Any column
Advanced Aggregations
ksum
Kahan summation - compensated sum for better accuracy. Signature:ksum(D)
Parameters:
D- DOUBLE column
DOUBLE
Example:
haversine_dist_deg
Computes average haversine distance between lat/lon points. Signature:haversine_dist_deg(DDDD)
Parameters:
- Latitude 1 (degrees)
- Longitude 1 (degrees)
- Latitude 2 (degrees)
- Longitude 2 (degrees)
DOUBLE (kilometers)
Using Aggregations
With GROUP BY
With SAMPLE BY
Without GROUP BY
Performance Notes
SIMD Acceleration
These functions use SIMD instructions for vectorized computation:sum(),avg(),min(),max()stddev(),variance()- Statistical aggregations
Batch Computation
Many aggregations support batch processing for better CPU cache utilization.Memory Efficiency
Aggregations use constant memory per group, regardless of group size.NULL Handling
- Most aggregations skip NULL values
count()counts all rowscount(column)counts non-NULL values- Empty groups return NULL for most functions, 0 for
count()