Precision metrics measure how many of the items your system retrieved are actually relevant. The reval package provides three functions that build on each other: Precision for a single cutoff, AveragePrecision to account for ranking order, and MeanAveragePrecision to aggregate across multiple queries.

Concepts

Precision@K answers: “Of the top-K items I returned, what fraction were relevant?” It treats all positions equally and ignores items ranked below K.

Average Precision (AP) answers: “How well did I rank the relevant items near the top?” It computes precision at each rank where a relevant item appears, then averages those values, so retrieving relevant items earlier earns a higher score than retrieving them later.

Mean Average Precision (MAP) is the macro-average of AP across a set of queries, providing a single number that summarizes system-wide retrieval quality.

QueryResult

The QueryResult struct bundles the predicted ranking and relevance judgements for a single query, and is the input type for MeanAveragePrecision.
type QueryResult struct {
    Predicted []string
    Relevance map[string]int
}
Predicted
[]string
required
The ranked list of item identifiers returned by the system, ordered from most to least relevant.
Relevance
map[string]int
required
A map of item identifier to relevance grade. Items with a grade of 1 or higher are considered relevant; items with a grade below 1 are treated as non-relevant.

Precision

func Precision(predicted []string, relevance map[string]int, k int) float64
Returns the fraction of the top-K predicted items that are relevant. Precision@K measures result quality at a fixed retrieval depth, making it useful when users only inspect the first K results. Returns 0.0 when k is 0.
predicted
[]string
required
The ranked list of retrieved item identifiers.
relevance
map[string]int
required
A map of item identifier to relevance grade. Items with grade ≥ 1 are counted as hits.
k
int
required
The cutoff depth. Only the first k items in predicted are evaluated.
Returns float64 — the ratio of relevant hits in the top-K to K.

Example

func ExamplePrecision() {
	predicted := []string{"A", "B", "C", "D"}
	relevance := map[string]int{
		"A": 3,
		"B": 2,
		"C": 0,
		"D": 0,
		"E": 3,
	}

	s := reval.Precision(predicted, relevance, 3)
	fmt.Println("Precision@3:", s)

	// Output:
	// Precision@3: 0.6666666666666666
}
Items “A” and “B” are relevant (grades 3 and 2), “C” is not (grade 0). With K=3, precision = 2/3 ≈ 0.667. Item “D” is never considered because it falls outside the K=3 cutoff.
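Sweeping the cutoff on the same data shows how K changes the score: the first two items are hits, so precision is 1.0 at K=1 and K=2, then falls as non-relevant items enter the window. A sketch of that sweep, re-implementing the documented arithmetic with a hypothetical helper (not the reval source):

```go
package main

import "fmt"

// precisionAtK counts items in the top-k with grade >= 1 and divides by k,
// returning 0 when k is 0, per the documented behaviour.
func precisionAtK(predicted []string, relevance map[string]int, k int) float64 {
	if k == 0 {
		return 0
	}
	hits := 0
	for i, id := range predicted {
		if i >= k {
			break
		}
		if relevance[id] >= 1 {
			hits++
		}
	}
	return float64(hits) / float64(k)
}

func main() {
	predicted := []string{"A", "B", "C", "D"}
	relevance := map[string]int{"A": 3, "B": 2, "C": 0, "D": 0, "E": 3}
	for k := 1; k <= 4; k++ {
		fmt.Printf("Precision@%d: %.3f\n", k, precisionAtK(predicted, relevance, k))
	}
	// Prints 1.000, 1.000, 0.667, 0.500 for K = 1..4.
}
```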

AveragePrecision

func AveragePrecision(predicted []string, relevance map[string]int, k int) float64
Returns the mean of precision values computed at each rank position where a relevant item appears in the top-K list. Unlike flat Precision@K, AP rewards systems that rank relevant items higher — the same set of hits produces a higher AP when concentrated near rank 1. Returns 0.0 if no relevant items are found.
predicted
[]string
required
The ranked list of retrieved item identifiers.
relevance
map[string]int
required
A map of item identifier to relevance grade. Items with grade ≥ 1 are treated as relevant.
k
int
required
The cutoff depth. Only the first k items in predicted are evaluated.
Returns float64 — the average of per-rank precision scores at each relevant hit.

Example

func ExampleAveragePrecision() {
	predicted := []string{"C", "A", "B", "D"}
	relevance := map[string]int{
		"A": 1,
		"B": 1,
		"C": 0,
		"D": 0,
		"E": 1,
	}

	s := reval.AveragePrecision(predicted, relevance, 4)
	fmt.Printf("Average Precision@4: %.4f\n", s)

	// Output:
	// Average Precision@4: 0.5833
}
The first hit is “A” at rank 2 (precision = 1/2), the second hit is “B” at rank 3 (precision = 2/3). AP = (0.5 + 0.667) / 2 ≈ 0.583. “C” and “D” are not relevant so they contribute nothing.

MeanAveragePrecision

func MeanAveragePrecision(results []QueryResult, k int) float64
Returns the arithmetic mean of AveragePrecision across all queries in results. MAP summarizes retrieval quality over an entire test set, making it the standard offline evaluation metric for ranked retrieval systems. Returns 0.0 for an empty slice.
results
[]QueryResult
required
A slice of QueryResult values, one per query. Each entry contains the system’s ranked output and the corresponding relevance judgements.
k
int
required
The cutoff depth passed to AveragePrecision for every query.
Returns float64 — the mean Average Precision across all queries.

Example

func ExampleMeanAveragePrecision() {
	results := []reval.QueryResult{
		{
			Predicted: []string{"C", "A", "B", "D"},
			Relevance: map[string]int{
				"A": 1,
				"B": 1,
				"C": 0,
				"D": 0,
				"E": 1,
			},
		},
		{
			Predicted: []string{"A", "B", "C", "D"},
			Relevance: map[string]int{
				"A": 1,
				"B": 0,
				"C": 1,
				"D": 0,
				"E": 1,
			},
		},
	}

	s := reval.MeanAveragePrecision(results, 4)
	fmt.Printf("Mean Average Precision@4: %.4f\n", s)

	// Output:
	// Mean Average Precision@4: 0.7083
}
