OpenAIEmbeddingModel
OpenAI-based embedding model for converting text to vector representations.
Constructor
OpenAIEmbeddingModel(
    model_name: str = "text-embedding-3-small",
    model_batch_size: int = 50,
    n_concurrent_jobs: int = 5,
)
model_name
str
default: "text-embedding-3-small"
OpenAI embedding model to use (e.g., "text-embedding-3-small", "text-embedding-3-large")
model_batch_size
int
default: 50
Number of texts to embed in each batch
n_concurrent_jobs
int
default: 5
Maximum number of concurrent API requests
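The two throughput parameters work together: texts are chunked into groups of model_batch_size, and at most n_concurrent_jobs batch requests run at once. A minimal sketch of that pattern (an illustration, not kura's actual implementation; embed_batch stands in for the API call):

```python
import asyncio

async def embed_batched(texts, embed_batch, batch_size=50, n_concurrent=5):
    """Chunk texts into batches and embed them with bounded concurrency."""
    sem = asyncio.Semaphore(n_concurrent)
    batches = [texts[i:i + batch_size] for i in range(0, len(texts), batch_size)]

    async def run(batch):
        async with sem:  # at most n_concurrent requests in flight
            return await embed_batch(batch)

    per_batch = await asyncio.gather(*(run(b) for b in batches))
    # gather preserves batch order, so flattening keeps embeddings aligned with texts
    return [vec for batch in per_batch for vec in batch]
```

Larger batches mean fewer requests; more concurrency means higher throughput until the provider's rate limits push back.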
Methods
embed()
Embed a list of texts into vector representations.
async def embed(texts: list[str]) -> list[list[float]]
List of text strings to embed
List of embedding vectors (one per input text)
Example:
from kura.embedding import OpenAIEmbeddingModel
model = OpenAIEmbeddingModel(
    model_name="text-embedding-3-large",
    model_batch_size=100,
    n_concurrent_jobs=10,
)
texts = ["Hello world", "Embedding example"]
embeddings = await model.embed(texts)
print(f"Generated {len(embeddings)} embeddings")
SentenceTransformerEmbeddingModel
Local embedding model using Sentence Transformers (requires the sentence-transformers package).
Constructor
SentenceTransformerEmbeddingModel(
    model_name: str = "all-MiniLM-L6-v2",
    model_batch_size: int = 128,
    device: str = "cpu",
)
model_name
str
default: "all-MiniLM-L6-v2"
Sentence Transformer model name from HuggingFace
model_batch_size
int
default: 128
Number of texts to embed in each batch
device
str
default: "cpu"
Device to run the model on ("cpu", "cuda", "mps")
Methods
embed()
Embed a list of texts into vector representations.
async def embed(texts: list[str]) -> list[list[float]]
List of text strings to embed
List of embedding vectors (one per input text)
Example:
from kura.embedding import SentenceTransformerEmbeddingModel
# Use local model (no API calls)
model = SentenceTransformerEmbeddingModel(
    model_name="all-MiniLM-L6-v2",
    device="cuda",  # use GPU if available
)
texts = ["Local embedding", "No API needed"]
embeddings = await model.embed(texts)
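Whichever backend you choose, embed() returns plain lists of floats, so downstream similarity math needs no extra dependencies. For instance, a small helper (illustrative, not part of kura) to compare two returned vectors:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    # 1.0 means identical direction, 0.0 means orthogonal (unrelated)
    return dot / (norm_a * norm_b)
```

This is the same measure clustering methods typically use to group nearby embeddings.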
CohereEmbeddingModel
Cohere-based embedding model (requires cohere package).
Constructor
CohereEmbeddingModel(
    model_name: str = "embed-v4.0",
    model_batch_size: int = 96,
    n_concurrent_jobs: int = 5,
    input_type: str = "clustering",
    api_key: str | None = None,
)
model_name
str
default: "embed-v4.0"
Cohere embedding model to use
model_batch_size
int
default: 96
Number of texts to embed in each batch
n_concurrent_jobs
int
default: 5
Maximum number of concurrent API requests
input_type
str
default: "clustering"
Type of input for Cohere ("clustering", "search_document", "search_query")
api_key
str | None
default: None
Cohere API key (if None, uses an environment variable)
Methods
embed()
Embed a list of texts into vector representations.
async def embed(texts: list[str]) -> list[list[float]]
List of text strings to embed
List of embedding vectors (one per input text)
Example:
from kura.embedding import CohereEmbeddingModel
model = CohereEmbeddingModel(
    model_name="embed-v4.0",
    input_type="clustering",
)
texts = ["Cohere embedding", "Alternative provider"]
embeddings = await model.embed(texts)
embed_summaries()
Embed conversation summaries and return items ready for clustering. This is a utility function that wraps the embedding model to produce the dictionary format expected by clustering methods.
async def embed_summaries(
    summaries: list[ConversationSummary],
    embedding_model: BaseEmbeddingModel,
) -> list[dict[str, Union[ConversationSummary, list[float]]]]
summaries
list[ConversationSummary]
required
List of conversation summaries to embed
embedding_model
BaseEmbeddingModel
required
Embedding model to use
Returns
list[dict[str, Union[ConversationSummary, list[float]]]]
List of dictionaries with "item" (ConversationSummary) and "embedding" (list[float]) keys
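The returned shape is straightforward to reproduce: pair each summary with its vector under the "item" and "embedding" keys. A self-contained sketch with stand-ins for kura's types (Summary and fake_embed below are hypothetical placeholders, not kura APIs):

```python
import asyncio
from dataclasses import dataclass

@dataclass
class Summary:
    """Stand-in for kura's ConversationSummary."""
    summary: str

async def fake_embed(texts: list[str]) -> list[list[float]]:
    # Stand-in for an embedding model's embed() call
    return [[float(len(t))] for t in texts]

async def embed_summaries_sketch(summaries: list[Summary]) -> list[dict]:
    vectors = await fake_embed([s.summary for s in summaries])
    # One dict per summary, in input order, matching the format described above
    return [{"item": s, "embedding": v} for s, v in zip(summaries, vectors)]
```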
Example:
from kura.embedding import embed_summaries, OpenAIEmbeddingModel
from kura.cluster import KmeansClusteringModel
# Embed summaries
embedding_model = OpenAIEmbeddingModel()
embedded_items = await embed_summaries(summaries, embedding_model)
# Use with clustering
clustering_method = KmeansClusteringModel()
cluster_mapping = clustering_method.cluster(embedded_items)