curl --request POST \
  --url https://api.example.com/v1/chat/completions

{
  "id": "<string>",
  "choices": [
    {}
  ]
}
Uses the confidential-chat.ts library to connect directly to the provider endpoint inside the TEE.

POST {NEXT_PUBLIC_VLLM_BASE_URL}/v1/chat/completions

Base URL: Set by the NEXT_PUBLIC_VLLM_BASE_URL environment variable or provided by the user in the UI.
Authorization: Bearer token sent in the Authorization header.
model: Set by NEXT_PUBLIC_VLLM_MODEL or user settings.
messages: Array of message objects, each with role (“system”, “user”, or “assistant”) and content (string).

{
  "model": "Qwen/Qwen2.5-32B-Instruct",
  "messages": [
    {
      "role": "system",
      "content": "You are a helpful assistant."
    },
    {
      "role": "user",
      "content": "What is Intel TDX?"
    }
  ],
  "temperature": 0.7,
  "max_tokens": 4096,
  "stream": true
}
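As a rough illustration of how the request above could be assembled, the sketch below builds the URL, headers, and JSON body for the endpoint. The names `buildChatRequest`, `ChatRequestBody`, and `PreparedRequest` are hypothetical and not part of confidential-chat.ts, and the `Bearer` prefix is an assumption based on the bearer token mentioned for the Authorization header.

```typescript
// Hypothetical helper: only the endpoint path and the body field names
// (model, messages, temperature, max_tokens, stream) come from the page.
type Role = "system" | "user" | "assistant";

interface ChatMessage {
  role: Role;
  content: string;
}

interface ChatRequestBody {
  model: string;
  messages: ChatMessage[];
  temperature?: number;
  max_tokens?: number;
  stream?: boolean;
}

interface PreparedRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

function buildChatRequest(
  baseUrl: string,
  apiKey: string,
  body: ChatRequestBody
): PreparedRequest {
  // Strip trailing slashes so the path is not doubled up.
  const root = baseUrl.replace(/\/+$/, "");
  return {
    url: `${root}/v1/chat/completions`,
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      // Assumption: the token is sent with the standard Bearer scheme.
      Authorization: `Bearer ${apiKey}`,
    },
    body: JSON.stringify(body),
  };
}
```

The prepared request could then be handed to whichever fetch implementation is in use, for example `fetchImpl(req.url, { method: req.method, headers: req.headers, body: req.body })`.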
message: Object with role and content
finish_reason: Reason for completion (“stop”, “length”, etc.)

{
  "id": "cmpl-123456",
  "choices": [
    {
      "message": {
        "role": "assistant",
        "content": "Intel TDX (Trust Domain Extensions) is a confidential computing technology..."
      },
      "finish_reason": "stop"
    }
  ]
}
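A minimal sketch of reading the non-streaming response shown above. The interface and the `readFirstChoice` helper are hypothetical; they model only the fields that appear in the example (`id`, `choices[].message`, `choices[].finish_reason`).

```typescript
// Hypothetical types covering only the fields shown in the example response.
interface ChatCompletionResponse {
  id: string;
  choices: Array<{
    message?: { role: string; content: string };
    finish_reason?: string | null;
  }>;
}

function readFirstChoice(
  response: ChatCompletionResponse
): { content: string; finishReason: string | null } {
  const choice = response.choices[0];
  if (!choice || !choice.message) {
    throw new Error("Response contains no message in choices[0]");
  }
  return {
    content: choice.message.content,
    finishReason: choice.finish_reason ?? null,
  };
}

const exampleResponse: ChatCompletionResponse = {
  id: "cmpl-123456",
  choices: [
    {
      message: {
        role: "assistant",
        content:
          "Intel TDX (Trust Domain Extensions) is a confidential computing technology...",
      },
      finish_reason: "stop",
    },
  ],
};

const firstChoice = readFirstChoice(exampleResponse);
// A finish_reason of "length" would suggest the reply was cut off.
const wasTruncated = firstChoice.finishReason === "length";
```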
When stream: true is set, the server returns Server-Sent Events (SSE) as lines prefixed with data:.
data: {"choices":[{"delta":{"content":"Intel"}}]}
data: {"choices":[{"delta":{"content":" TDX"}}]}
data: {"choices":[{"delta":{"content":" is"}}]}
data: [DONE]
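The SSE lines above can be folded back into the full reply by concatenating each `delta.content` until the `[DONE]` sentinel arrives. The following is a simplified sketch, not the parser used by confidential-chat.ts: `collectSseContent` is a hypothetical name, and it assumes the stream has already been split into complete lines.

```typescript
// Hypothetical types for the streamed chunks shown in the example.
interface StreamDelta {
  content?: string;
  reasoning_content?: string;
}

interface StreamChunk {
  choices: Array<{ delta?: StreamDelta; finish_reason?: string | null }>;
}

function collectSseContent(lines: string[]): { content: string; done: boolean } {
  let content = "";
  let done = false;
  for (const raw of lines) {
    const line = raw.trim();
    // Skip blank separator lines and any non-data SSE fields.
    if (!line.startsWith("data:")) continue;
    const payload = line.slice("data:".length).trim();
    if (payload === "[DONE]") {
      done = true;
      break;
    }
    const chunk = JSON.parse(payload) as StreamChunk;
    content += chunk.choices[0]?.delta?.content ?? "";
  }
  return { content, done };
}

const sampleLines = [
  'data: {"choices":[{"delta":{"content":"Intel"}}]}',
  'data: {"choices":[{"delta":{"content":" TDX"}}]}',
  'data: {"choices":[{"delta":{"content":" is"}}]}',
  "data: [DONE]",
];

const collected = collectSseContent(sampleLines);
// collected.content === "Intel TDX is", collected.done === true
```

A real network stream can deliver partial lines, so a production reader would need to buffer bytes until a newline before calling a function like this.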
choices[0].delta.content: Content chunk
choices[0].delta.reasoning_content: Reasoning chunk (for models that support it)
choices[0].finish_reason: Present in final chunk before [DONE]
max_tokens too small or prompt too long

import { streamConfidentialChat } from "@/lib/confidential-chat";
import { createAtlsFetch } from "@phala/dcap-qvl-web";

// Create an aTLS fetch with attestation verification
const atlsFetch = await createAtlsFetch({
  attestationServiceUrl: "https://your-attestation-service.com",
  verifyQuote: true,
});

// Stream chat completions
const stream = streamConfidentialChat(
  {
    messages: [
      { role: "system", content: "You are a helpful assistant." },
      { role: "user", content: "What is Intel TDX?" },
    ],
    model: "Qwen/Qwen2.5-32B-Instruct",
    temperature: 0.7,
    max_tokens: 4096,
    stream: true,
  },
  {
    provider: {
      baseUrl: "https://your-provider.com",
      apiKey: "your-bearer-token",
    },
    fetchImpl: atlsFetch, // Use the aTLS fetch
  }
);

// Process the stream: incremental deltas, errors, then the final result
for await (const chunk of stream) {
  if (chunk.type === "delta") {
    console.log(chunk.content);
  } else if (chunk.type === "error") {
    console.error("Error:", chunk.error);
  } else if (chunk.type === "done") {
    console.log("Complete:", chunk.content);
    console.log("Finish reason:", chunk.finish_reason);
  }
}
The confidential-chat.ts library validates all messages.
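The individual validation rules are not listed here, so the sketch below is only an assumption of what such checks might look like. It enforces the two constraints stated for the request body (role must be “system”, “user”, or “assistant”, and content must be a string) and additionally assumes that an empty message list is rejected. `validateMessages` is a hypothetical name, not the library's function.

```typescript
// Roles taken from the request description; everything else is assumed.
const ALLOWED_ROLES: readonly string[] = ["system", "user", "assistant"];

// Returns a list of human-readable problems; an empty list means valid.
function validateMessages(messages: unknown): string[] {
  if (!Array.isArray(messages) || messages.length === 0) {
    // Assumption: at least one message is required.
    return ["messages must be a non-empty array"];
  }
  const errors: string[] = [];
  messages.forEach((msg: unknown, i: number) => {
    if (typeof msg !== "object" || msg === null) {
      errors.push(`messages[${i}] must be an object`);
      return;
    }
    const m = msg as { role?: unknown; content?: unknown };
    if (typeof m.role !== "string" || !ALLOWED_ROLES.includes(m.role)) {
      errors.push(`messages[${i}].role must be one of ${ALLOWED_ROLES.join(", ")}`);
    }
    if (typeof m.content !== "string") {
      errors.push(`messages[${i}].content must be a string`);
    }
  });
  return errors;
}
```

Returning a list of problems rather than throwing on the first one makes it easier to surface every issue to the user at once; the actual library may behave differently.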
Base URL: NEXT_PUBLIC_VLLM_BASE_URL or user-provided
Model: NEXT_PUBLIC_VLLM_MODEL or user-provided
System prompt: NEXT_PUBLIC_DEFAULT_SYSTEM_PROMPT or custom
Temperature: NEXT_PUBLIC_DEFAULT_TEMPERATURE (default: 0.7)
Max tokens: NEXT_PUBLIC_DEFAULT_MAX_TOKENS (default: 4098)

reasoning_content in message objects
reasoning_delta chunks

The fetchImpl parameter must be provided, typically using createAtlsFetch from @phala/dcap-qvl-web.

frontend/lib/confidential-chat.ts
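One way to picture how the configuration sources listed above combine is a resolver that prefers user-provided values, then environment variables, then the documented defaults (0.7 and 4098). This is a sketch under those assumptions; `resolveChatDefaults` and `ChatDefaults` are hypothetical names, and the precedence order is inferred from the "or user-provided" wording rather than confirmed.

```typescript
interface ChatDefaults {
  baseUrl: string | undefined;
  model: string | undefined;
  systemPrompt: string | undefined;
  temperature: number;
  maxTokens: number;
}

// Parse a numeric environment value, falling back when unset or invalid.
function parseNumber(value: string | undefined, fallback: number): number {
  if (value === undefined || value.trim() === "") return fallback;
  const parsed = Number(value);
  return Number.isFinite(parsed) ? parsed : fallback;
}

function resolveChatDefaults(
  env: Record<string, string | undefined>,
  overrides: Partial<ChatDefaults> = {}
): ChatDefaults {
  return {
    // Assumed precedence: user-provided value, then environment variable.
    baseUrl: overrides.baseUrl ?? env["NEXT_PUBLIC_VLLM_BASE_URL"],
    model: overrides.model ?? env["NEXT_PUBLIC_VLLM_MODEL"],
    systemPrompt: overrides.systemPrompt ?? env["NEXT_PUBLIC_DEFAULT_SYSTEM_PROMPT"],
    temperature:
      overrides.temperature ?? parseNumber(env["NEXT_PUBLIC_DEFAULT_TEMPERATURE"], 0.7),
    maxTokens:
      overrides.maxTokens ?? parseNumber(env["NEXT_PUBLIC_DEFAULT_MAX_TOKENS"], 4098),
  };
}
```

In Next.js browser code, `NEXT_PUBLIC_` variables are inlined at build time only when referenced as literal `process.env.NEXT_PUBLIC_...` expressions, so the `env` object would need to be built explicitly from those references rather than by passing `process.env` directly.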