SQLMorph uses large language models in two distinct roles: completion models generate natural-language query variants in the JQE and TQA pipelines, and embedding models power the semantic evaluation metrics that compare column names by meaning rather than exact string match. All providers are accessed through a single ModelManager.create_model() factory that accepts a ModelProvider enum, a ModelType enum, and a provider-specific model name enum. This page shows how to configure each supported provider.

ModelManager factory

from src.core.model_manager import ModelManager, ModelProvider, ModelType
from src.core.model_manager.openai_model import OpenAIModel
import os

model = ModelManager.create_model(
    model_provider=ModelProvider.OPENAI,
    model_type=ModelType.COMPLETION,
    model_name=OpenAIModel.GPT_4O,
    openai_api_key=os.getenv("OPENAI_API_KEY"),
)

create_model() returns an instance of the appropriate class (OpenAIChatCompletion, OllamaChatCompletion, HuggingFaceChatCompletion, or their embedding counterparts). Every completion instance exposes get_chat_completion(messages), and every embedding instance exposes get_embedding(input_data).
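Because every completion instance exposes get_chat_completion(messages) and every embedding instance exposes get_embedding(input_data), downstream code can be written against those two methods alone. The sketch below expresses that contract with typing.Protocol and adds in-memory stubs for offline unit tests. The names ChatModel, EmbeddingModel, FakeChatModel, FakeEmbedder, and run_once are hypothetical, and the return types (a string and a list of floats) are assumptions, since this page does not document them.

```python
from typing import Protocol


class ChatModel(Protocol):
    """Anything with get_chat_completion(messages); hypothetical name."""

    def get_chat_completion(self, messages: list[dict[str, str]]) -> str: ...


class EmbeddingModel(Protocol):
    """Anything with get_embedding(input_data); hypothetical name."""

    def get_embedding(self, input_data: str) -> list[float]: ...


class FakeChatModel:
    """Test stub: echoes the most recent user message instead of calling an API."""

    def get_chat_completion(self, messages: list[dict[str, str]]) -> str:
        user_turns = [m["content"] for m in messages if m["role"] == "user"]
        return f"echo: {user_turns[-1]}"


class FakeEmbedder:
    """Test stub: a 5-dimensional vector of vowel counts, deterministic and offline."""

    def get_embedding(self, input_data: str) -> list[float]:
        text = input_data.lower()
        return [float(text.count(vowel)) for vowel in "aeiou"]


def run_once(llm: ChatModel, embedder: EmbeddingModel, text: str) -> tuple[str, int]:
    """Provider-agnostic code: accepts real create_model() instances or the stubs."""
    reply = llm.get_chat_completion([{"role": "user", "content": text}])
    dimension = len(embedder.get_embedding(text))
    return reply, dimension
```

For example, run_once(FakeChatModel(), FakeEmbedder(), "california_schools") returns ("echo: california_schools", 5) without any network access, and the same function accepts the objects returned by create_model().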
create_model() accepts the following keyword arguments:

model_provider (ModelProvider, required): Selects the backend. One of ModelProvider.OPENAI, ModelProvider.OLLAMA, or ModelProvider.HUGGINGFACE.

model_type (ModelType, required): ModelType.COMPLETION for chat/instruction models; ModelType.EMBEDDING for embedding models.

model_name (OpenAIModel | OllamaModel | HuggingFaceModel, required): A provider-specific enum value identifying the model. It must belong to the enum of the chosen model_provider (for example, an OpenAIModel value with ModelProvider.OPENAI).

openai_api_key (string): Your OpenAI API key. Required when model_provider is ModelProvider.OPENAI. Pass os.getenv("OPENAI_API_KEY") after sourcing scripts/load_dotenv.sh.

portkey_api_key (string, optional): Portkey gateway API key. When set, all OpenAI requests are routed through the Portkey gateway for observability and caching.

portkey_config_id (string, optional): Portkey config ID for advanced routing and fallback rules. Used together with portkey_api_key.
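The credential rules for create_model() (openai_api_key is required for OpenAI; portkey_config_id is only used together with portkey_api_key) can be checked before any request is made. The helper below is a hypothetical sketch that reads a mapping such as os.environ and returns keyword arguments for create_model(). The environment variable names PORTKEY_API_KEY and PORTKEY_CONFIG_ID are assumptions; only OPENAI_API_KEY is named on this page.

```python
import os
from collections.abc import Mapping


def openai_credentials(env: Mapping[str, str] = os.environ) -> dict[str, str]:
    """Build credential kwargs for ModelManager.create_model() (hypothetical helper).

    Assumes Portkey settings, if any, live in PORTKEY_API_KEY / PORTKEY_CONFIG_ID.
    """
    api_key = env.get("OPENAI_API_KEY")
    if not api_key:
        raise ValueError("OPENAI_API_KEY is not set; source scripts/load_dotenv.sh first")

    kwargs = {"openai_api_key": api_key}

    portkey_key = env.get("PORTKEY_API_KEY")
    portkey_config = env.get("PORTKEY_CONFIG_ID")
    if portkey_config and not portkey_key:
        # A Portkey config ID is only meaningful alongside a Portkey API key.
        raise ValueError("PORTKEY_CONFIG_ID requires PORTKEY_API_KEY")
    if portkey_key:
        kwargs["portkey_api_key"] = portkey_key
        if portkey_config:
            kwargs["portkey_config_id"] = portkey_config
    return kwargs
```

The result can be unpacked into the factory call, for example ModelManager.create_model(model_provider=ModelProvider.OPENAI, model_type=ModelType.COMPLETION, model_name=OpenAIModel.GPT_4O, **openai_credentials()), so a missing key fails immediately with a readable message.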

Provider configuration

For OpenAI, set OPENAI_API_KEY in your .env file, then load it by sourcing scripts/load_dotenv.sh before running experiments:

source scripts/load_dotenv.sh

Completion models
from src.core.model_manager import ModelManager, ModelProvider, ModelType
from src.core.model_manager.openai_model import OpenAIModel
import os

llm = ModelManager.create_model(
    model_provider=ModelProvider.OPENAI,
    model_type=ModelType.COMPLETION,
    model_name=OpenAIModel.GPT_4O,
    openai_api_key=os.getenv("OPENAI_API_KEY"),
)

response = llm.get_chat_completion(
    messages=[{"role": "user", "content": "Generate a SQL query for..."}]
)
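The messages argument in the call above follows the OpenAI chat format: a list of dicts with role and content keys. A small builder keeps prompts consistent across calls. This is a hypothetical helper, not part of SQLMorph; it assumes the selected backend accepts the same OpenAI-style list, including optional system and assistant turns used as few-shot examples.

```python
from typing import Optional


def build_messages(
    user_prompt: str,
    system_prompt: Optional[str] = None,
    examples: Optional[list[tuple[str, str]]] = None,
) -> list[dict[str, str]]:
    """Assemble an OpenAI-style chat message list (hypothetical helper).

    examples: optional (user, assistant) pairs inserted as few-shot demonstrations.
    """
    messages: list[dict[str, str]] = []
    if system_prompt:
        messages.append({"role": "system", "content": system_prompt})
    for user_text, assistant_text in examples or []:
        messages.append({"role": "user", "content": user_text})
        messages.append({"role": "assistant", "content": assistant_text})
    # The actual request always comes last.
    messages.append({"role": "user", "content": user_prompt})
    return messages
```

With this helper, the earlier call becomes llm.get_chat_completion(messages=build_messages("Generate a SQL query for...", system_prompt="You are a SQL expert.")), and the system prompt and examples can be shared across every query variant.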

Embedding models
from src.core.model_manager import ModelManager, ModelProvider, ModelType
from src.core.model_manager.openai_model import OpenAIModel
import os

embedder = ModelManager.create_model(
    model_provider=ModelProvider.OPENAI,
    model_type=ModelType.EMBEDDING,
    model_name=OpenAIModel.TEXT_EMBEDDING_3_SMALL,
    openai_api_key=os.getenv("OPENAI_API_KEY"),
)

vector = embedder.get_embedding("california_schools")
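The semantic metrics compare column names by meaning rather than by exact string match. A common way to compare two vectors returned by get_embedding() is cosine similarity. The sketch below assumes get_embedding() returns a flat list of floats and uses a hypothetical 0.8 threshold; neither the threshold nor the function names come from SQLMorph, whose actual metric may be computed differently.

```python
import math
from collections.abc import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors, in [-1, 1]."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as unrelated to everything
    return dot / (norm_a * norm_b)


def columns_match(vec_a: Sequence[float], vec_b: Sequence[float], threshold: float = 0.8) -> bool:
    """Hypothetical rule: two column names match when their embeddings are close enough."""
    return cosine_similarity(vec_a, vec_b) >= threshold
```

In practice both vectors would come from the same embedder, for example columns_match(embedder.get_embedding("school_name"), embedder.get_embedding("name_of_school")). Vectors from different embedding models live in different spaces and should not be compared.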

Available OpenAI models

| Enum value | Model string | Type |
|---|---|---|
| OpenAIModel.GPT_52 | gpt-5.2 | Completion |
| OpenAIModel.O1_PREVIEW | o1-preview | Completion |
| OpenAIModel.O1_MINI | o1-mini | Completion |
| OpenAIModel.GPT_4O | gpt-4o | Completion |
| OpenAIModel.GPT_4O_MINI | gpt-4o-mini | Completion |
| OpenAIModel.GPT_4_TURBO | gpt-4-turbo | Completion |
| OpenAIModel.GPT_4 | gpt-4 | Completion |
| OpenAIModel.GPT_3_5_TURBO | gpt-3.5-turbo | Completion |
| OpenAIModel.TEXT_EMBEDDING_3_SMALL | text-embedding-3-small | Embedding |
| OpenAIModel.TEXT_EMBEDDING_3_LARGE | text-embedding-3-large | Embedding |
| OpenAIModel.TEXT_EMBEDDING_ADA_002 | text-embedding-ada-002 | Embedding |

GPT-4o (OpenAIModel.GPT_4O) is the default model for JQE NL query generation. For metrics, set EMBEDDING_MODEL in scripts/metrics_config.sh to one of TEXT_EMBEDDING_3_SMALL, TEXT_EMBEDDING_3_LARGE, or TEXT_EMBEDDING_ADA_002.
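A typo in EMBEDDING_MODEL would only surface once the metrics run, so it can help to validate the value up front. The sketch below mirrors the embedding rows of the OpenAI models table as a plain dict. It is a hypothetical stand-in for the real OpenAIModel enum, whose definition is not shown on this page, and it assumes the setting from scripts/metrics_config.sh is visible to Python as an exported environment variable.

```python
import os
from typing import Optional

# Embedding rows of the OpenAI models table: enum member name -> model string.
EMBEDDING_MODELS = {
    "TEXT_EMBEDDING_3_SMALL": "text-embedding-3-small",
    "TEXT_EMBEDDING_3_LARGE": "text-embedding-3-large",
    "TEXT_EMBEDDING_ADA_002": "text-embedding-ada-002",
}


def resolve_embedding_model(name: Optional[str] = None) -> str:
    """Return the OpenAI model string for an EMBEDDING_MODEL value (hypothetical helper).

    Falls back to the EMBEDDING_MODEL environment variable when name is None.
    """
    if name is None:
        name = os.environ.get("EMBEDDING_MODEL", "")
    key = name.strip()
    if key not in EMBEDDING_MODELS:
        allowed = ", ".join(EMBEDDING_MODELS)
        raise ValueError(f"unsupported EMBEDDING_MODEL {name!r}; expected one of: {allowed}")
    return EMBEDDING_MODELS[key]
```

Completion model names such as GPT_4O are rejected here, which matches the note that only OpenAI embedding models are supported for metrics.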

Choosing a provider

| Use case | Recommended provider | Notes |
|---|---|---|
| JQE NL query generation | OpenAI (GPT_4O) | Default in the JQE pipeline. |
| TQA NL query generation | OpenAI (GPT_4O) | Requires OPENAI_API_KEY. |
| Semantic evaluation metrics | OpenAI (embedding models) | Only OpenAI embeddings are currently supported for metrics. Configure via EMBEDDING_MODEL in scripts/metrics_config.sh. |
| Local / offline experiments | Ollama | No API key required; requires a running Ollama daemon. |
| Custom research pipelines | HuggingFace | Full control over model weights; downloads models from the Hugging Face Hub. |
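Ollama needs no API key, but local experiments fail if the daemon is not running. Ollama serves its HTTP API on localhost port 11434 by default, and GET /api/tags lists the locally pulled models, so a quick probe can run before an experiment starts. The function below is a hypothetical pre-flight check built on the Python standard library; it is not part of SQLMorph.

```python
import http.client
import json
import urllib.request
from typing import Optional


def local_ollama_models(base_url: str = "http://localhost:11434",
                        timeout: float = 2.0) -> Optional[list[str]]:
    """Return names of locally pulled Ollama models, or None if the daemon is unreachable."""
    url = base_url.rstrip("/") + "/api/tags"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            payload = json.load(resp)
    except (OSError, ValueError, http.client.HTTPException):
        # OSError covers URLError, refused connections, and timeouts;
        # ValueError covers a reply that is not valid JSON.
        return None
    return [model.get("name", "") for model in payload.get("models", [])]
```

If local_ollama_models() returns None, start the daemon (for example with ollama serve) before running; if it returns an empty list, pull the model you intend to use first.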
