Precision metrics measure how many of the items your system retrieved are actually relevant. The reval package provides three functions that build on each other: Precision for a single cutoff, AveragePrecision to account for ranking order, and MeanAveragePrecision to aggregate across multiple queries.

Concepts

Precision@K answers: “Of the top-K items I returned, what fraction were relevant?” It treats all positions equally and ignores items ranked below K.

Average Precision (AP) answers: “How well did I rank the relevant items near the top?” It computes precision at each rank where a relevant item appears, then averages those values, so retrieving relevant items earlier earns a higher score than retrieving them later.

Mean Average Precision (MAP) is the macro-average of AP across a set of queries, providing a single number that summarizes system-wide retrieval quality.

QueryResult

The QueryResult struct bundles the predicted ranking and relevance judgements for a single query, and is the input type for MeanAveragePrecision.
type QueryResult struct {
    Predicted []string
    Relevance map[string]int
}
Predicted
[]string
required
The ranked list of item identifiers returned by the system, ordered from most to least relevant.
Relevance
map[string]int
required
A map of item identifier to relevance grade. Items with a grade of 1 or higher are considered relevant; items with a grade below 1 are treated as non-relevant.

Precision

func Precision(predicted []string, relevance map[string]int, k int) float64
Returns the fraction of the top-K predicted items that are relevant. Precision@K measures result quality at a fixed retrieval depth, making it useful when users only inspect the first K results. Returns 0.0 when k is 0.
predicted
[]string
required
The ranked list of retrieved item identifiers.
relevance
map[string]int
required
A map of item identifier to relevance grade. Items with grade ≥ 1 are counted as hits.
k
int
required
The cutoff depth. Only the first k items in predicted are evaluated.
Returns float64 — the ratio of relevant hits in the top-K to K.

Example

func ExamplePrecision() {
	predicted := []string{"A", "B", "C", "D"}
	relevance := map[string]int{
		"A": 3,
		"B": 2,
		"C": 0,
		"D": 0,
		"E": 3,
	}

	s := reval.Precision(predicted, relevance, 3)
	fmt.Println("Precision@3:", s)

	// Output:
	// Precision@3: 0.6666666666666666
}
Items “A” and “B” are relevant (grades 3 and 2), “C” is not (grade 0). With K=3, precision = 2/3 ≈ 0.667. Item “D” is never considered because it falls outside the K=3 cutoff.
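Sweeping the cutoff on the same data shows how K changes the score: the first two items are hits, so precision is 1.0 at K=1 and K=2, then falls as non-relevant items enter the window. A sketch of that sweep, re-implementing the documented arithmetic with a hypothetical helper (not the reval source):

```go
package main

import "fmt"

// precisionAtK counts items in the top-k with grade >= 1 and divides by k,
// returning 0 when k is 0, per the documented behaviour.
func precisionAtK(predicted []string, relevance map[string]int, k int) float64 {
	if k == 0 {
		return 0
	}
	hits := 0
	for i, id := range predicted {
		if i >= k {
			break
		}
		if relevance[id] >= 1 {
			hits++
		}
	}
	return float64(hits) / float64(k)
}

func main() {
	predicted := []string{"A", "B", "C", "D"}
	relevance := map[string]int{"A": 3, "B": 2, "C": 0, "D": 0, "E": 3}
	for k := 1; k <= 4; k++ {
		fmt.Printf("Precision@%d: %.3f\n", k, precisionAtK(predicted, relevance, k))
	}
	// Prints 1.000, 1.000, 0.667, 0.500 for K = 1..4.
}
```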

AveragePrecision

func AveragePrecision(predicted []string, relevance map[string]int, k int) float64
Returns the mean of precision values computed at each rank position where a relevant item appears in the top-K list. Unlike flat Precision@K, AP rewards systems that rank relevant items higher — the same set of hits produces a higher AP when concentrated near rank 1. Returns 0.0 if no relevant items are found.
predicted
[]string
required
The ranked list of retrieved item identifiers.
relevance
map[string]int
required
A map of item identifier to relevance grade. Items with grade ≥ 1 are treated as relevant.
k
int
required
The cutoff depth. Only the first k items in predicted are evaluated.
Returns float64 — the average of per-rank precision scores at each relevant hit.

Example

func ExampleAveragePrecision() {
	predicted := []string{"C", "A", "B", "D"}
	relevance := map[string]int{
		"A": 1,
		"B": 1,
		"C": 0,
		"D": 0,
		"E": 1,
	}

	s := reval.AveragePrecision(predicted, relevance, 4)
	fmt.Printf("Average Precision@4: %.4f\n", s)

	// Output:
	// Average Precision@4: 0.5833
}
The first hit is “A” at rank 2 (precision = 1/2), the second hit is “B” at rank 3 (precision = 2/3). AP = (0.5 + 0.667) / 2 ≈ 0.583. “C” and “D” are not relevant so they contribute nothing.

MeanAveragePrecision

func MeanAveragePrecision(results []QueryResult, k int) float64
Returns the arithmetic mean of AveragePrecision across all queries in results. MAP summarizes retrieval quality over an entire test set, making it the standard offline evaluation metric for ranked retrieval systems. Returns 0.0 for an empty slice.
results
[]QueryResult
required
A slice of QueryResult values, one per query. Each entry contains the system’s ranked output and the corresponding relevance judgements.
k
int
required
The cutoff depth passed to AveragePrecision for every query.
Returns float64 — the mean Average Precision across all queries.

Example

func ExampleMeanAveragePrecision() {
	results := []reval.QueryResult{
		{
			Predicted: []string{"C", "A", "B", "D"},
			Relevance: map[string]int{
				"A": 1,
				"B": 1,
				"C": 0,
				"D": 0,
				"E": 1,
			},
		},
		{
			Predicted: []string{"A", "B", "C", "D"},
			Relevance: map[string]int{
				"A": 1,
				"B": 0,
				"C": 1,
				"D": 0,
				"E": 1,
			},
		},
	}

	s := reval.MeanAveragePrecision(results, 4)
	fmt.Printf("Mean Average Precision@4: %.4f\n", s)

	// Output:
	// Mean Average Precision@4: 0.7083
}
