BERTScore measures semantic similarity between a candidate text and a reference text using dense token embeddings. Rather than counting exact token matches like ROUGE, BERTScore compares meaning by looking at how similar the embedding vectors are — so paraphrases and synonyms can still earn high scores.
BERTScore expects pre-computed token embeddings — for example, the contextual vectors produced by a BERT model for each token in a sentence. You must extract these embeddings yourself before calling this function; reval does not perform tokenisation or model inference.

Internally, the function computes a greedy alignment using dot-product similarity: each token in the candidate is matched to its most similar token in the reference (and vice versa), and the scores are averaged to produce precision, recall, and F1.

For best results, L2-normalise your embeddings with Normalize before passing them in, so that the dot product equals cosine similarity.

BERTScore

func BERTScore(candidates, refs [][]float64) (precision, recall, f1 float64)
Returns the BERTScore between a candidate and a reference, both represented as sequences of dense token embedding vectors. Precision is computed by greedily matching each candidate token to its best reference token; recall is computed in the reverse direction; F1 is their harmonic mean. Returns zero for all outputs when either slice is empty.
candidates
[][]float64
required
A sequence of token embedding vectors for the candidate text. Each inner slice is the embedding for one token. All vectors should have the same dimensionality.
refs
[][]float64
required
A sequence of token embedding vectors for the reference text. Each inner slice is the embedding for one token. All vectors should have the same dimensionality as those in candidates.
Returns three float64 values:
precision
float64
Average maximum similarity from each candidate token to the most similar reference token.
recall
float64
Average maximum similarity from each reference token to the most similar candidate token.
f1
float64
Harmonic mean of precision and recall.

Example

func ExampleBERTScore() {
	candidates := [][]float64{
		{0.1, 0.2, 0.3},
		{0.4, 0.5, 0.6},
	}
	refs := [][]float64{
		{0.1, 0.2, 0.3},
		{0.7, 0.8, 0.9},
	}

	precision, recall, f1 := reval.BERTScore(candidates, refs)
	fmt.Printf("%.4f, %.4f, %.4f\n", precision, recall, f1)

	// Output:
	// 0.8600, 0.7700, 0.8125
}

DotProduct

func DotProduct(a, b []float64) float64
Returns the dot product of two vectors. This is the similarity function used internally by BERTScore for greedy token alignment. When vectors are L2-normalised, the dot product equals cosine similarity. Returns 0 if the vectors have different lengths.
a
[]float64
required
The first vector.
b
[]float64
required
The second vector. Must have the same length as a, otherwise 0 is returned.
Returns float64 — the sum of element-wise products.
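A short usage sketch, using a local implementation that mirrors the documented behavior (sum of element-wise products, 0 on a length mismatch):

```go
package main

import "fmt"

// dotProduct mirrors the documented behavior of reval.DotProduct:
// the sum of element-wise products, or 0 when lengths differ.
func dotProduct(a, b []float64) float64 {
	if len(a) != len(b) {
		return 0
	}
	s := 0.0
	for i := range a {
		s += a[i] * b[i]
	}
	return s
}

func main() {
	fmt.Println(dotProduct([]float64{1, 2, 3}, []float64{4, 5, 6})) // 1*4 + 2*5 + 3*6 = 32
	fmt.Println(dotProduct([]float64{1, 2}, []float64{1, 2, 3}))    // length mismatch → 0
}
```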

L2Norm

func L2Norm(a []float64) float64
Returns the L2 (Euclidean) norm of vector a — the square root of the sum of squared elements. This is the magnitude used to normalise vectors before computing cosine similarity.
a
[]float64
required
The input vector.
Returns float64 — the Euclidean length of the vector.
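For illustration, a local implementation mirroring the documented definition (square root of the sum of squared elements):

```go
package main

import (
	"fmt"
	"math"
)

// l2Norm mirrors the documented behavior of reval.L2Norm:
// the Euclidean length of the vector.
func l2Norm(a []float64) float64 {
	s := 0.0
	for _, v := range a {
		s += v * v
	}
	return math.Sqrt(s)
}

func main() {
	fmt.Println(l2Norm([]float64{3, 4})) // sqrt(9 + 16) = 5
}
```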

Normalize

func Normalize(a []float64) []float64
Returns a new vector that is the L2-normalised version of a — that is, a divided by its L2 norm so that the result has unit length. If the norm is zero (the zero vector), the original slice is returned unchanged.
a
[]float64
required
The input vector to normalise.
Returns []float64 — a new slice with the same direction as a and a magnitude of 1.0.
Normalise your embeddings before passing them to BERTScore. When both the candidate and reference embeddings are unit vectors, the dot product computed internally is equivalent to cosine similarity, which is the standard similarity measure used in the original BERTScore paper.
