## Endpoints
| Method | Path |
|---|---|
| POST | /v1/chat/completions |
| POST | /v1/completions |
`/v1/completions` uses a `prompt` string rather than a `messages` array.
## Request parameters

| Parameter | Description |
|---|---|
| `model` | The model identifier to use for generation. Corresponds to the `base_model` value returned by `GET /v1/models`. If omitted, the server selects the first available model. |
| `messages` | Array of message objects forming the conversation history. Each object must have `role` (`system`, `user`, or `assistant`) and `content` (string, or array for vision). |
| `max_tokens` | Maximum number of new tokens to generate. |
| `stream` | When `true`, tokens are returned as server-sent events as they are generated. |
| `temperature` | Sampling temperature between 0 and 2. Lower values produce more deterministic output. |
| `top_p` | Nucleus sampling probability mass. |
| `seed` | Random seed. `0` means a random seed is chosen each request. |
| `stop` | One or more sequences at which generation stops. |
| `frequency_penalty` | Penalizes tokens based on how frequently they have appeared so far. |
| `presence_penalty` | Penalizes tokens that have appeared at all in the generated text so far. |
| `user` | An optional user identifier. When h2oGPT authentication is enabled, pass `username:password` here to authenticate. |
| `tools` | List of tools the model may call. Each tool must have `type: "function"` and a `function` object with `name` and `description`. |
| `tool_choice` | Set to `"auto"` to let the model decide which tool to call. |
| `extra_body` | h2oGPT-specific parameters. Any field from `H2oGPTParams` in `openai_server/server.py` can be passed here, for example `langchain_mode`, `top_k_docs`, `system_prompt`, and `chat_conversation`. |

## Response
| Field | Description |
|---|---|
| `id` | Unique identifier for this completion. |
| `object` | `"chat.completion"`, or `"chat.completion.chunk"` for streaming. |
| `created` | Unix timestamp when the completion was created. |
| `model` | The model used for generation. |
| `usage` | Token usage statistics with `prompt_tokens`, `completion_tokens`, and `total_tokens`. |

## Non-streaming example
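A minimal sketch of a non-streaming request, built with the standard library only. The model name and the `localhost:5000` base URL are assumptions (use a model listed by `GET /v1/models` and your server's actual host/port); any OpenAI-compatible client works the same way.

```python
import json
from urllib import request

# Request body for a non-streaming chat completion.
# The model name is illustrative; pick one from GET /v1/models.
payload = {
    "model": "h2oai/h2ogpt-4096-llama2-13b-chat",
    "messages": [
        {"role": "system", "content": "You are a concise assistant."},
        {"role": "user", "content": "What is h2oGPT?"},
    ],
    "max_tokens": 128,
    "temperature": 0.3,
}

req = request.Request(
    "http://localhost:5000/v1/chat/completions",  # assumed host/port
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
# With a running server, uncomment to send and read the reply:
# resp = request.urlopen(req)
# print(json.load(resp)["choices"][0]["message"]["content"])
print(json.dumps(payload, indent=2))
```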
## Streaming example
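The same request with `stream` enabled; the server then replies with server-sent events, one `data: {...}` line per chunk, terminated by `data: [DONE]`. The client snippet in the comments is a sketch using the official `openai` package (model name and port are assumptions).

```python
import json

# Setting stream=True switches the response to server-sent events.
payload = {
    "model": "h2oai/h2ogpt-4096-llama2-13b-chat",  # illustrative
    "messages": [{"role": "user", "content": "Write a haiku about rivers."}],
    "stream": True,
}

# Sketch with the official `openai` client (assumed installed):
#   client = openai.OpenAI(base_url="http://localhost:5000/v1", api_key="EMPTY")
#   for chunk in client.chat.completions.create(**payload):
#       print(chunk.choices[0].delta.content or "", end="")
print(json.dumps(payload))
```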
## Text completions
`POST /v1/completions` accepts a `prompt` string instead of a `messages` array:
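A sketch of the request body (model name is an assumption). The generated text comes back in `choices[0]["text"]` rather than inside a message object.

```python
import json

# /v1/completions takes a single prompt string instead of messages.
payload = {
    "model": "h2oai/h2ogpt-4096-llama2-13b-chat",  # illustrative
    "prompt": "Q: What is nucleus sampling?\nA:",
    "max_tokens": 64,
    "temperature": 0.0,
}
# POST to http://localhost:5000/v1/completions (assumed host/port).
print(json.dumps(payload))
```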
## Vision / image understanding
To send an image alongside a text prompt, use the `image_url` content type in the `messages` array. The URL can be an `https://` URL or a base64-encoded data URI.
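A sketch of a mixed text-plus-image user message, building a data URI from raw image bytes (the placeholder bytes stand in for your file's contents, e.g. `open("photo.jpg", "rb").read()`; a plain `https://` URL goes in the same `url` slot):

```python
import base64
import json

# Build a base64 data URI from raw image bytes (placeholder shown).
jpeg_bytes = b"<raw JPEG bytes>"
data_uri = "data:image/jpeg;base64," + base64.b64encode(jpeg_bytes).decode()

payload = {
    "model": "OpenGVLab/InternVL-Chat-V1-5",  # vision-capable model
    "messages": [{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this image."},
            {"type": "image_url", "image_url": {"url": data_uri}},
        ],
    }],
}
print(json.dumps(payload)[:80])
```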
Vision requires a vision-capable model such as `OpenGVLab/InternVL-Chat-V1-5` or `THUDM/cogvlm2-llama3-chat-19B`. Load h2oGPT with that model set as `visible_models`.

## Tool calling
Pass a list of function definitions in `tools` and set `tool_choice="auto"`:
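A sketch with one function definition (the `get_weather` function, its parameters, and the model name are all illustrative). When the model decides to call it, the response carries `tool_calls` instead of plain content.

```python
import json

# One hypothetical function the model may call.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

payload = {
    "model": "h2oai/h2ogpt-4096-llama2-13b-chat",  # illustrative
    "messages": [{"role": "user", "content": "Weather in Prague?"}],
    "tools": tools,
    "tool_choice": "auto",
}
print(json.dumps(payload)[:60])
```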
## JSON mode
Set `response_format` to force JSON output:
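A minimal `json_object` request body (model name is an assumption):

```python
import json

# response_format with type json_object forces well-formed JSON output.
payload = {
    "model": "h2oai/h2ogpt-4096-llama2-13b-chat",  # illustrative
    "messages": [{"role": "user", "content": "List three primes as JSON."}],
    "response_format": {"type": "json_object"},
}
print(json.dumps(payload))
```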
To validate output against a schema, use `json_schema`:
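A sketch of a `json_schema` request body; the nesting follows the OpenAI `response_format` convention, and the schema name and fields are illustrative.

```python
import json

# response_format carrying an inline JSON Schema (fields are illustrative).
payload = {
    "model": "h2oai/h2ogpt-4096-llama2-13b-chat",  # illustrative
    "messages": [{"role": "user", "content": "Name a city and its population."}],
    "response_format": {
        "type": "json_schema",
        "json_schema": {
            "name": "city_info",
            "schema": {
                "type": "object",
                "properties": {
                    "city": {"type": "string"},
                    "population": {"type": "integer"},
                },
                "required": ["city", "population"],
            },
        },
    },
}
print(json.dumps(payload)[:60])
```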
Supported `response_format.type` values: `text`, `json_object`, `json_code`, `json_schema`.
## Authentication with user credentials
When the h2oGPT server runs with `--auth_access=closed`, pass `user` as `username:password`:
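A sketch of the request body; the username and password below are placeholders, and the model name is illustrative.

```python
import json

# Credentials travel in the standard `user` field as "username:password".
payload = {
    "model": "h2oai/h2ogpt-4096-llama2-13b-chat",  # illustrative
    "messages": [{"role": "user", "content": "Hello"}],
    "user": "alice:s3cret",  # placeholder credentials
}
print(json.dumps(payload))
```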