WeaviateRetriever is a dspy.Module that connects to a Weaviate Cloud cluster and executes hybrid search — combining dense vector similarity with sparse keyword matching — against a named collection. It accepts an optional precomputed query embedding for the vector leg, and an optional metadata dictionary (produced by MetadataExtractor) that is automatically converted into Weaviate property filters. The result is a dspy.Prediction containing a ranked list of passage strings ready for the answer generation step.
Constructor
Full URL of your Weaviate Cloud cluster endpoint (e.g. "https://your-cluster.weaviate.cloud"). Must be provided; omitting it raises a ValueError at construction time.
API key used to authenticate with the Weaviate cluster. Must be provided; omitting it raises a ValueError at construction time.
Name of the Weaviate collection (class) to query. The collection must already exist in the cluster; if not, a ValueError is raised during initialization.
Default number of passages to retrieve per query. Can be overridden per call via the top_k parameter in forward.
Optional dictionary mapping metadata property names to their expected Python types, for example {"category": str, "title": str}. Only keys present in this schema are used when building Weaviate property filters from the metadata argument in forward. Keys not in the schema are silently skipped.
Methods
forward
The user query string used for the keyword leg of hybrid search. An empty or whitespace-only query immediately returns an empty passages list without contacting Weaviate.
Precomputed dense embedding vector for the vector leg of hybrid search. Passed directly as the vector argument to collection.query.hybrid. When None, Weaviate falls back to keyword-only search.
Number of passages to retrieve for this call. Overrides the constructor-level top_k when provided; otherwise the constructor default is used.
Optional metadata dictionary (typically produced by MetadataExtractor.forward) to filter results. Converted to a Weaviate Filter object via metadata_to_weaviate_filter. Keys not in metadata_schema are ignored.
Returns a dspy.Prediction with a single attribute:
- passages: List[str] of non-empty document_text property values from the matching Weaviate objects, ranked by hybrid score.
If the search fails, an empty passages list is returned so the pipeline can continue.
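The control flow described for forward can be sketched without a live cluster. This is a minimal illustration, not the module's actual code: forward_sketch, fake_search, and broken_search are hypothetical names, the search callable stands in for the Weaviate call, SimpleNamespace stands in for dspy.Prediction, and the failure branch is an assumption based on the note that an empty list is returned so the pipeline can continue.

```python
from types import SimpleNamespace
from typing import Callable, List, Optional


def forward_sketch(
    query: str,
    search: Callable[[str, int], List[str]],  # hypothetical stand-in for the Weaviate call
    default_top_k: int = 3,                   # plays the role of the constructor-level top_k
    top_k: Optional[int] = None,
) -> SimpleNamespace:
    """Mimic the documented control flow of forward with a pluggable search function."""
    # An empty or whitespace-only query never reaches the backend.
    if not query or not query.strip():
        return SimpleNamespace(passages=[])

    # A per-call top_k overrides the default when provided.
    k = top_k if top_k is not None else default_top_k

    try:
        passages = search(query, k)
    except Exception:
        # Assumption: a failed search degrades to an empty result.
        return SimpleNamespace(passages=[])

    return SimpleNamespace(passages=passages)


def fake_search(query: str, k: int) -> List[str]:
    # Toy backend that returns the first k entries of a fixed corpus.
    return ["alpha", "beta", "gamma", "delta"][:k]


def broken_search(query: str, k: int) -> List[str]:
    # Toy backend that always fails.
    raise RuntimeError("cluster unreachable")


print(forward_sketch("   ", fake_search).passages)                    # []
print(forward_sketch("hybrid search", fake_search, top_k=2).passages)  # ['alpha', 'beta']
print(forward_sketch("hybrid search", broken_search).passages)         # []
```

Returning an empty list in both the empty-query and failure cases means downstream steps only ever need to handle one result shape.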
metadata_to_weaviate_filter
Converts a metadata dictionary into a Weaviate Filter object suitable for the filters argument of collection.query.hybrid.
Metadata key-value pairs to translate into property filters.
Returns a wvc.query.Filter built from Filter.and_(...) over individual Filter.by_property(key).like(val) conditions, or None if no valid filter conditions were found.
The method applies the following logic for each key-value pair:
- Skip the key if it is not present in self.metadata_schema.
- Check whether the value matches the expected Python type in the schema.
- If the type doesn't match, attempt coercion (e.g. str(2021) → "2021"); skip the field if coercion fails.
- Build a .like() property filter and add it to the filter list.
- Combine all valid filters with a logical AND.
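The per-key logic above can be sketched in plain Python. This is an illustration under stated assumptions rather than the library's implementation: build_conditions is a hypothetical name, and each filter is represented as a (property, operator, value) tuple instead of a real Weaviate Filter object, so the example runs without the Weaviate client. The returned list stands for the conditions that would be combined with a logical AND.

```python
from typing import Any, Dict, List, Optional, Tuple

# Hypothetical stand-in for a Weaviate property filter: (property, operator, value).
Condition = Tuple[str, str, Any]


def build_conditions(
    metadata: Dict[str, Any],
    schema: Dict[str, type],
) -> Optional[List[Condition]]:
    """Return the conditions to AND together, or None if nothing is usable."""
    conditions: List[Condition] = []
    for key, value in metadata.items():
        # Keys outside the schema are skipped.
        if key not in schema:
            continue
        expected = schema[key]
        # Coerce mismatched values; skip the field if coercion fails.
        if not isinstance(value, expected):
            try:
                value = expected(value)
            except (TypeError, ValueError):
                continue
        # Record a "like" condition for this property.
        conditions.append((key, "like", value))
    # An empty list means no valid filter could be built.
    return conditions or None


schema = {"category": str, "year": int}
print(build_conditions({"category": "news", "year": "2021", "author": "x"}, schema))
# [('category', 'like', 'news'), ('year', 'like', 2021)]
print(build_conditions({"year": "not a number"}, schema))
# None
```

Returning None rather than an empty combination lets the caller pass the result straight through as an optional filters argument.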
Hybrid search internals
Internally, forward calls collection.query.hybrid, passing the query string, the optional vector, the result limit, and the optional filters. The certainty metadata field is requested alongside each result but is not currently surfaced in the returned dspy.Prediction. Passages are extracted from the document_text property of each returned object, and any object with an empty or missing document_text is silently dropped.
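The extraction step can be illustrated with stand-in result objects. This is a minimal sketch, assuming each returned object exposes a properties dictionary; extract_passages is a hypothetical helper name, and SimpleNamespace objects replace real Weaviate results so the example runs on its own.

```python
from types import SimpleNamespace
from typing import Any, Iterable, List


def extract_passages(objects: Iterable[Any]) -> List[str]:
    """Collect non-empty document_text values, preserving result order."""
    passages: List[str] = []
    for obj in objects:
        text = obj.properties.get("document_text")
        # Objects with a missing or empty document_text are dropped.
        if text:
            passages.append(text)
    return passages


# Stand-in results in ranked order; two of them carry no usable text.
results = [
    SimpleNamespace(properties={"document_text": "First passage."}),
    SimpleNamespace(properties={"document_text": ""}),
    SimpleNamespace(properties={"title": "No text here"}),
    SimpleNamespace(properties={"document_text": "Second passage."}),
]

print(extract_passages(results))  # ['First passage.', 'Second passage.']
```

Because dropped objects are not replaced, the returned list can be shorter than the requested number of passages.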