
Overview

The chat endpoint processes user messages and returns AI-generated responses using the Groq LLM service. It supports streaming responses, rate limiting, and automatic authentication.

Endpoint

POST /api/chat

Request

Headers

Content-Type
string
required
Must be application/json

Body Parameters

messages
array
required
Array of message objects representing the conversation history. Minimum 1 message, maximum 100 messages.

Each message object contains:
  • role (string, required): One of "user", "assistant", or "system"
  • content (string | object): Message content (max 10KB for strings)
  • id (string, optional): Message identifier
  • createdAt (string | Date, optional): Message timestamp
model
string
default:"llama-3.1-8b-instant"
AI model to use for generating responses. Currently supported:
  • llama-3.1-8b-instant (default)
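
Taken together, the body parameters above can be sketched as TypeScript types, with a client-side pre-check mirroring the documented limits. The type and function names here are illustrative, not part of the API, and the check is not the server's actual validator:

```typescript
type Role = "user" | "assistant" | "system";

// Union of the part shapes shown under "Message Content Types" below.
interface ContentPart {
  type: "text" | "image" | "file";
  text?: string;
  imageUrl?: string;
  mimeType?: string;
  data?: string;      // base64-encoded, for file parts
  mediaType?: string; // for file parts
}

interface ChatMessage {
  role: Role;
  content: string | { parts: ContentPart[] };
  id?: string;
  createdAt?: string | Date;
}

interface ChatRequest {
  messages: ChatMessage[]; // 1–100 messages
  model?: string;          // defaults to "llama-3.1-8b-instant"
}

// Illustrative pre-flight check: 1–100 messages, string content ≤ 10KB.
function validateChatRequest(req: ChatRequest): string[] {
  const errors: string[] = [];
  if (req.messages.length < 1) errors.push("at least 1 message required");
  if (req.messages.length > 100) errors.push("at most 100 messages allowed");
  for (const m of req.messages) {
    if (
      typeof m.content === "string" &&
      new TextEncoder().encode(m.content).length > 10_240
    ) {
      errors.push("string content exceeds 10KB");
    }
  }
  return errors;
}
```

Running the check before sending the request lets a client surface the same validation failures the server would return as a 400.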

Message Content Types

The content field supports multiple formats:

Simple Text

{
  "role": "user",
  "content": "What is the status of equipment UMA-001?"
}

Multi-part Content

{
  "role": "user",
  "content": {
    "parts": [
      {
        "type": "text",
        "text": "Analyze this equipment image"
      },
      {
        "type": "image",
        "imageUrl": "https://example.com/image.jpg",
        "mimeType": "image/jpeg"
      }
    ]
  }
}

File Attachments

{
  "role": "user",
  "content": {
    "parts": [
      {
        "type": "file",
        "data": "base64-encoded-data",
        "mediaType": "application/pdf"
      }
    ]
  }
}

Response

Success Response (200 OK)

Returns a streaming response using Server-Sent Events (SSE) format. The stream contains AI-generated text chunks.
stream
ReadableStream
Server-Sent Events stream containing AI response chunks
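
A client consumes the stream by reading the response body and splitting out SSE `data:` lines. The exact payload format inside each event is not specified here, so this sketch only extracts the raw data strings; `parseSSEChunk` and `streamChat` are illustrative names:

```typescript
// Collect the payload of each `data:` field from a chunk of SSE text.
function parseSSEChunk(chunk: string): string[] {
  const events: string[] = [];
  for (const line of chunk.split("\n")) {
    if (line.startsWith("data:")) {
      events.push(line.slice(5).trimStart());
    }
  }
  return events;
}

// Example consumption with fetch (not exercised here): decode each body
// chunk and feed it through the parser, concatenating the data payloads.
async function streamChat(url: string, body: unknown): Promise<string> {
  const res = await fetch(url, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(body),
  });
  if (!res.ok || !res.body) throw new Error(`request failed: ${res.status}`);
  const decoder = new TextDecoder();
  let text = "";
  for await (const chunk of res.body as unknown as AsyncIterable<Uint8Array>) {
    for (const data of parseSSEChunk(decoder.decode(chunk, { stream: true }))) {
      text += data;
    }
  }
  return text;
}
```

Note that SSE events can span chunk boundaries; a production client should buffer partial lines between chunks rather than parsing each chunk independently.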

Error Responses

400 Bad Request
object
Returned when the request format is invalid or validation fails. Validation messages are returned in Spanish; the example message below translates to "At least one message is required."
{
  "error": "Invalid request format",
  "details": [
    {
      "code": "too_small",
      "minimum": 1,
      "path": ["messages"],
      "message": "Se requiere al menos un mensaje"
    }
  ]
}
429 Too Many Requests
object
Returned when the rate limit is exceeded. See Rate Limiting for details. The example message below translates to "You have exceeded the request limit. Try again in a few seconds."
{
  "error": "Too Many Requests",
  "message": "Has excedido el límite de solicitudes. Intenta nuevamente en unos segundos.",
  "retryAfter": 30
}
Headers:
  • Retry-After: Seconds until next request is allowed
  • X-RateLimit-Remaining: 0
500 Internal Server Error
object
Returned when an unexpected error occurs during processing. The error string below translates to "Error processing the request."
{
  "error": "Error al procesar la solicitud",
  "details": "Error message details"
}
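
For the 429 case, a client should wait for the number of seconds given in the Retry-After header before retrying. A minimal sketch of that computation (`retryDelayMs` is an illustrative name, and the 5-second fallback is an assumption for when the header is absent):

```typescript
// Delay before retrying after a 429, honoring Retry-After (seconds).
function retryDelayMs(headers: Headers, fallbackMs = 5_000): number {
  const retryAfter = headers.get("Retry-After");
  const seconds = retryAfter ? Number(retryAfter) : NaN;
  return Number.isFinite(seconds) ? seconds * 1000 : fallbackMs;
}
```

The documented 429 body also carries a `retryAfter` field, so a client could read either the header or the JSON payload.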

Examples

Simple Chat Request

curl -X POST https://your-domain.com/api/chat \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {
        "role": "user",
        "content": "What maintenance tasks are pending?"
      }
    ]
  }'

Multi-turn Conversation

curl -X POST https://your-domain.com/api/chat \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {
        "role": "user",
        "content": "Show me HVAC equipment"
      },
      {
        "role": "assistant",
        "content": "I found 3 HVAC units..."
      },
      {
        "role": "user",
        "content": "What is the status of the first one?"
      }
    ],
    "model": "llama-3.1-8b-instant"
  }'

With Image Analysis

curl -X POST https://your-domain.com/api/chat \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {
        "role": "user",
        "content": {
          "parts": [
            {
              "type": "text",
              "text": "Identify this equipment part"
            },
            {
              "type": "image",
              "imageUrl": "https://example.com/part.jpg",
              "mimeType": "image/jpeg"
            }
          ]
        }
      }
    ]
  }'

Technical Details

Message History Limit

The API automatically limits conversation history to the last 8 messages to avoid token limit issues with the LLM provider. This is especially important for Groq’s llama-3.1-8b-instant model which has a 6000 TPM (tokens per minute) limit.
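
Since the server only keeps the last 8 messages, a client can trim long conversations before sending them to avoid transferring history that will be discarded anyway. A minimal sketch (`trimHistory` is an illustrative name, not part of the API):

```typescript
// Mirror of the server's history trimming: keep only the most recent
// 8 messages, as stated above.
const HISTORY_LIMIT = 8;

function trimHistory<T>(messages: T[], limit = HISTORY_LIMIT): T[] {
  return messages.length > limit ? messages.slice(-limit) : messages;
}
```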

Authentication

The endpoint automatically attempts silent authentication when no valid session token exists. If that attempt fails, the request still proceeds, but subsequent calls to external AI services may fail with network errors.

Streaming Configuration

  • Max Duration: 60 seconds (Vercel function timeout)
  • Stream Sources: Disabled
  • Stream Reasoning: Disabled
  • Stop Condition: Maximum 5 AI tool execution steps
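
If this endpoint is deployed as a Vercel route handler using the Vercel AI SDK (an assumption about the stack; the names below follow those libraries' conventions and are not confirmed by this page), the limits above would typically be declared like this:

```typescript
// Vercel route segment config: function timeout in seconds.
export const maxDuration = 60;

// Inside the handler (illustrative, AI SDK v5-style):
// const result = streamText({
//   model,
//   messages,
//   stopWhen: stepCountIs(5), // max 5 tool-execution steps
// });
// return result.toUIMessageStreamResponse({
//   sendSources: false,   // "Stream Sources: Disabled"
//   sendReasoning: false, // "Stream Reasoning: Disabled"
// });
```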

IP Validation

Client IP addresses are validated before processing:
  • Development: Localhost connections allowed
  • Production: Valid public IP required
  • Invalid IPs receive a 400 Bad Request response
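
The rules above can be sketched as a small IPv4 check: loopback addresses are only acceptable in development, and private-range addresses are rejected in production. This is an illustration of the documented behavior, not the server's actual validation code, and it handles IPv4 only:

```typescript
// Illustrative IP validation following the rules listed above.
function isAllowedIp(ip: string, env: "development" | "production"): boolean {
  const octets = ip.split(".").map(Number);
  if (
    octets.length !== 4 ||
    octets.some((o) => !Number.isInteger(o) || o < 0 || o > 255)
  ) {
    return false; // malformed address → 400 Bad Request
  }
  const [a, b] = octets;
  const isLoopback = a === 127; // 127.0.0.0/8
  const isPrivate =
    a === 10 ||                        // 10.0.0.0/8
    (a === 172 && b >= 16 && b <= 31) || // 172.16.0.0/12
    (a === 192 && b === 168);            // 192.168.0.0/16
  if (isLoopback) return env === "development"; // localhost allowed in dev only
  return !isPrivate;                            // public IP required otherwise
}
```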
