Recall
Recall@K measures the fraction of all relevant items that appear within the top-K positions of predicted. A score of 1.0 means every relevant item was found within the top-K results; a score of 0.0 means none were. Returns 0.0 when the relevance map contains no relevant items.
The ranked list of retrieved item identifiers, ordered from most to least relevant.
A map of item identifier to relevance grade. Items with grade ≥ 1 are counted as relevant. The total number of such items forms the denominator of the recall calculation.
The cutoff depth. Only the first k items in predicted are examined when counting hits.
Returns
float64 — the ratio of relevant items found in the top-K to the total number of relevant items.
Example
There are 4 relevant items in total (“A”, “B”, “C”, “E” — all with grade ≥ 1). With K=3, the top-3 predicted items are “A”, “B”, and “C” — all three are relevant. Recall@3 = 3/4 = 0.75. Item “E” is relevant but was never retrieved within the cutoff, so the score is less than 1.0.
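The worked example above can be sketched in Go. This is a minimal illustration, not the library's actual code: the function name recallAtK is an assumption, and the float64 return type follows the return-type note above.

```go
package main

import "fmt"

// recallAtK returns the fraction of relevant items (grade >= 1 in rel)
// found within the first k entries of predicted. It returns 0.0 when
// rel contains no relevant items.
// Note: recallAtK is a hypothetical name for illustration.
func recallAtK(predicted []string, rel map[string]int, k int) float64 {
	totalRelevant := 0
	for _, grade := range rel {
		if grade >= 1 {
			totalRelevant++
		}
	}
	if totalRelevant == 0 {
		return 0.0
	}
	if k > len(predicted) {
		k = len(predicted)
	}
	hits := 0
	for _, id := range predicted[:k] {
		if rel[id] >= 1 { // missing ids read as grade 0
			hits++
		}
	}
	return float64(hits) / float64(totalRelevant)
}

func main() {
	// Values from the example: 4 relevant items, top-3 hits are A, B, C.
	predicted := []string{"A", "B", "C", "D"}
	rel := map[string]int{"A": 1, "B": 2, "C": 1, "D": 0, "E": 1}
	fmt.Println(recallAtK(predicted, rel, 3)) // 3 hits / 4 relevant = 0.75
}
```

Item "E" never appears in predicted, so no cutoff value of k can push the score to 1.0 for this ranking.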
Precision vs. recall
Precision@K
Measures result quality. Answers: “Are the items I returned actually useful?” A high-precision system returns fewer but more reliable results.
Recall@K
Measures result completeness. Answers: “Did I surface all the useful items?” A high-recall system ensures few relevant items are missed within the top-K.
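The contrast between the two metrics comes down to the denominator: the same hit count is divided by K for precision and by the total number of relevant items for recall. A small sketch under the same assumptions as above (hitsInTopK is a hypothetical helper) makes the difference concrete using the earlier example, where the two scores diverge:

```go
package main

import "fmt"

// hitsInTopK counts items with grade >= 1 among the first k entries of
// predicted. Hypothetical helper for illustration.
func hitsInTopK(predicted []string, rel map[string]int, k int) int {
	if k > len(predicted) {
		k = len(predicted)
	}
	hits := 0
	for _, id := range predicted[:k] {
		if rel[id] >= 1 {
			hits++
		}
	}
	return hits
}

func main() {
	predicted := []string{"A", "B", "C"}
	rel := map[string]int{"A": 1, "B": 2, "C": 1, "D": 0, "E": 1}
	k := 3

	totalRelevant := 0
	for _, grade := range rel {
		if grade >= 1 {
			totalRelevant++
		}
	}

	hits := hitsInTopK(predicted, rel, k)
	// Same numerator, different denominators:
	// Precision@3 = 3/3 = 1.00 (every returned item was useful)
	// Recall@3    = 3/4 = 0.75 ("E" was never surfaced)
	fmt.Printf("Precision@%d = %.2f\n", k, float64(hits)/float64(k))
	fmt.Printf("Recall@%d = %.2f\n", k, float64(hits)/float64(totalRelevant))
}
```

Here precision is perfect while recall is not, which is exactly the quality-versus-completeness distinction described above.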