
Overview

OpenWhispr integrates with multiple AI providers for intelligent text processing when you address your named agent (e.g., “Hey Jarvis, summarize this”).

Provider Types

Cloud Providers

OpenAI, Anthropic, Google Gemini, and Groq: API-based models accessed over the network

Local Providers

Qwen, Llama, Mistral, Gemma, and GPT-OSS: privacy-first GGUF models run locally via llama.cpp

Cloud Providers

OpenAI

id (string): openai
endpoint (string): https://api.openai.com/v1/responses (Responses API) or /chat/completions (fallback)

Available Models

GPT-5.2 (gpt-5.2)
  • Latest flagship reasoning model
  • Best for complex tasks requiring deep reasoning
GPT-5 Mini (gpt-5-mini)
  • Fast and cost-efficient
  • Good balance for most use cases
GPT-5 Nano (gpt-5-nano)
  • Ultra-fast, low latency
  • Best for real-time processing
GPT-4.1 (gpt-4.1)
  • Strong baseline model
  • 1M token context window
GPT-4.1 Mini (gpt-4.1-mini)
  • Smaller, faster version
  • Good for shorter tasks
GPT-4.1 Nano (gpt-4.1-nano)
  • Lowest latency GPT-4.1 variant
GPT-5 and o-series models use the new Responses API (September 2025). The system automatically falls back to Chat Completions API for older models or if Responses API is unavailable.
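The endpoint selection described above can be sketched as a simple model-name check. This is an illustrative sketch, not OpenWhispr's actual routing code; the detection rule (prefix match on `gpt-5` and o-series names) is an assumption.

```typescript
// Hypothetical sketch: route GPT-5/o-series models to the Responses API,
// everything else to Chat Completions. OpenWhispr's real detection logic may differ.
function pickOpenAIEndpoint(modelId: string): string {
  const usesResponsesApi =
    modelId.startsWith('gpt-5') || /^o\d/.test(modelId);
  return usesResponsesApi
    ? 'https://api.openai.com/v1/responses'
    : 'https://api.openai.com/v1/chat/completions';
}

console.log(pickOpenAIEndpoint('gpt-5-mini')); // Responses API URL
console.log(pickOpenAIEndpoint('gpt-4.1'));    // Chat Completions URL
```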

Anthropic

id (string): anthropic
endpoint (string): https://api.anthropic.com/v1/messages (via IPC bridge to avoid CORS)

Available Models

Claude Opus 4.6 (claude-opus-4-6)
  • Most capable Claude model
  • Best for complex reasoning tasks
Claude Sonnet 4.6 (claude-sonnet-4-6)
  • Balanced performance and speed
  • Recommended for general use
Claude Haiku 4.5 (claude-haiku-4-5)
  • Fast with near-frontier intelligence
  • Best for quick tasks
Anthropic API calls are routed through the main process via IPC to avoid CORS restrictions in the renderer process.

Google Gemini

id (string): gemini
endpoint (string): https://generativelanguage.googleapis.com/v1beta

Available Models

Gemini 3.1 Pro (gemini-3.1-pro-preview)
  • Next-gen flagship model for complex reasoning
  • Largest context window
  • Requires a minimum of 2,000 output tokens
Gemini 3 Flash (gemini-3-flash-preview)
  • Ultra-fast, high-capability next-gen model
  • Good balance of speed and intelligence
Gemini 2.5 Flash Lite (gemini-2.5-flash-lite)
  • Lowest latency and cost
  • Best for simple cleanup tasks

Groq

id (string): groq
endpoint (string): https://api.groq.com/openai/v1/chat/completions

Available Models

Qwen3 32B (qwen/qwen3-32b)
  • Powerful reasoning model
  • 131K context window
  • Thinking mode disabled for speed
GPT-OSS 120B (openai/gpt-oss-120b)
  • OpenAI’s open-weight flagship
  • 500 tokens/sec throughput
GPT-OSS 20B (openai/gpt-oss-20b)
  • Fast open-weight model
  • 1000 tokens/sec throughput
LLaMA 3.3 70B (llama-3.3-70b-versatile)
  • Meta’s versatile model
  • 280 tokens/sec
LLaMA 3.1 8B (llama-3.1-8b-instant)
  • Ultra-fast: 560 tokens/sec
  • 131K context window
Llama 4 Scout (meta-llama/llama-4-scout-17b-16e-instruct)
  • Meta’s efficient multimodal model
  • 750 tokens/sec
Compound (groq/compound)
  • Groq’s compound system
  • 450 tokens/sec
Compound Mini (groq/compound-mini)
  • Fast compound system
  • 3x lower latency
Kimi K2 0905 (moonshotai/kimi-k2-instruct-0905)
  • Moonshot AI’s 1T MoE model
  • 256K context window

Local Providers

Local models run entirely on your device using llama.cpp for maximum privacy. All models are in GGUF format.

Qwen (Alibaba)

id (string): qwen
baseUrl (string): https://huggingface.co
promptTemplate (string): ChatML format: <|im_start|>system\n{system}<|im_end|>\n<|im_start|>user\n{user}<|im_end|>\n<|im_start|>assistant\n
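Filling this template is plain string substitution. A minimal sketch (the function name is illustrative, not OpenWhispr's API):

```typescript
// Minimal sketch of filling the ChatML template above by string substitution.
const CHATML_TEMPLATE =
  '<|im_start|>system\n{system}<|im_end|>\n' +
  '<|im_start|>user\n{user}<|im_end|>\n' +
  '<|im_start|>assistant\n';

function buildChatMLPrompt(system: string, user: string): string {
  return CHATML_TEMPLATE
    .replace('{system}', system)
    .replace('{user}', user);
}

// The llama.cpp completion endpoint receives this single prompt string.
const prompt = buildChatMLPrompt('You clean up dictation.', 'fix this text');
```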

Qwen3 Series (Latest - Thinking Mode Support)

  • Qwen3 8B (qwen3-8b-q4_k_m): 5.0GB, recommended
  • Qwen3 8B (Q5) (qwen3-8b-q5_k_m): 5.9GB, higher quality
  • Qwen3 4B (qwen3-4b-q4_k_m): 2.5GB, compact with reasoning
  • Qwen3 1.7B (qwen3-1.7b-q8_0): 1.8GB, small but capable
  • Qwen3 0.6B (qwen3-0.6b-q8_0): 0.6GB, for edge devices
  • Qwen3 32B (qwen3-32b-q4_k_m): 19.8GB, most powerful local model
Qwen2.5 Series

  • Qwen2.5 7B (qwen2.5-7b-instruct-q4_k_m): 4.7GB, 128K context
  • Qwen2.5 7B (Q5) (qwen2.5-7b-instruct-q5_k_m): 5.4GB, higher quality
  • Qwen2.5 3B (qwen2.5-3b-instruct-q5_k_m): 2.4GB, balanced
  • Qwen2.5 1.5B (qwen2.5-1.5b-instruct-q5_k_m): 1.3GB, basic tasks
  • Qwen2.5 0.5B (qwen2.5-0.5b-instruct-q5_k_m): 0.5GB, fastest

Mistral AI

id (string): mistral
promptTemplate (string): Mistral format: [INST] {system}\n\n{user} [/INST]
Mistral 7B Instruct v0.3 (mistral-7b-instruct-v0.3-q4_k_m) — Recommended
  • Size: 4.4GB
  • Context: 32,768 tokens
  • Fast and efficient instruction model
Mistral 7B Instruct v0.3 (Q5) (mistral-7b-instruct-v0.3-q5_k_m)
  • Size: 5.1GB
  • Higher quality version

Meta Llama

id (string): llama
promptTemplate (string): Llama 3 format: <|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\n{system}<|eot_id|><|start_header_id|>user<|end_header_id|>\n\n{user}<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n
Llama 3.2 3B (llama-3.2-3b-instruct-q4_k_m) — Recommended
  • Size: 2.0GB
  • Context: 131,072 tokens
  • Small but capable multilingual model
Llama 3.2 1B (llama-3.2-1b-instruct-q4_k_m)
  • Size: 0.8GB
  • Tiny model for edge devices
Llama 3.1 8B (llama-3.1-8b-instruct-q4_k_m)
  • Size: 4.9GB
  • Powerful model with great performance

OpenAI OSS

id (string): openai-oss
promptTemplate (string): ChatML format (same as Qwen)
GPT-OSS 20B (gpt-oss-20b-mxfp4) — Recommended
  • Size: 12.1GB
  • Quantization: MXFP4 (4-bit microscaling float)
  • Context: 128,000 tokens
  • OpenAI’s open-weight model for consumer hardware

Gemma (Google)

id (string): gemma
promptTemplate (string): Gemma format: <bos><start_of_turn>user\n{system}\n\n{user}<end_of_turn>\n<start_of_turn>model\n
Gemma 3 4B (gemma-3-4b-it-q4_k_m) — Recommended
  • Size: 2.49GB
  • Context: 131,072 tokens
  • Great balance of speed and quality
Gemma 3 1B (gemma-3-1b-it-q4_k_m)
  • Size: 0.81GB
  • Ultra-fast, best for short dictation cleanup

Using AI Models

Via ReasoningService

import reasoningService from '@/services/ReasoningService';

const result = await reasoningService.processText(
  'Transcribed text here',
  'gpt-5-mini', // model ID
  'Jarvis', // agent name
  {
    systemPrompt: 'Custom system prompt (optional)',
    temperature: 0.3,
    maxTokens: 4096
  }
);

console.log(result); // Processed text

API Call Example (OpenAI)

// Automatically uses Responses API for GPT-5/o-series
const apiKey = await window.electronAPI.getOpenAIKey();

const response = await fetch('https://api.openai.com/v1/responses', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    'Authorization': `Bearer ${apiKey}`
  },
  body: JSON.stringify({
    model: 'gpt-5-mini',
    input: [
      { role: 'system', content: 'You are a helpful assistant.' },
      { role: 'user', content: 'Clean up this text: ...' }
    ],
    store: false
  })
});

const data = await response.json();
const text = data.output.find(item => 
  item.type === 'message'
)?.content?.find(c => 
  c.type === 'output_text'
)?.text;

Local Model Download

const result = await window.electronAPI.downloadLocalModel(
  'qwen3-8b-q4_k_m',
  (progress) => {
    console.log(`Downloading: ${progress.percentage}%`);
  }
);

if (result.success) {
  console.log(`Model downloaded to: ${result.path}`);
}

Check Local Model Availability

const available = await window.electronAPI.checkLocalReasoningAvailable();

if (available) {
  console.log('llama.cpp server is ready');
} else {
  console.log('Local reasoning unavailable');
}

Model Registry

All models are defined in src/models/modelRegistryData.json as a single source of truth:
{
  "cloudProviders": [
    {
      "id": "openai",
      "name": "OpenAI",
      "models": [
        {
          "id": "gpt-5.2",
          "name": "GPT-5.2",
          "description": "Latest flagship reasoning model"
        }
      ]
    }
  ],
  "localProviders": [
    {
      "id": "qwen",
      "name": "Qwen",
      "models": [
        {
          "id": "qwen3-8b-q4_k_m",
          "name": "Qwen3 8B",
          "size": "5.0GB",
          "hfRepo": "Qwen/Qwen3-8B-GGUF",
          "fileName": "Qwen3-8B-Q4_K_M.gguf",
          "recommended": true
        }
      ]
    }
  ]
}
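Because the registry is plain JSON, it can be consumed directly. The sketch below flattens it into a model-id → provider-id map; the interfaces are assumptions based on the excerpt above, not the app's real types.

```typescript
// Illustrative sketch: flattening modelRegistryData.json into an id -> provider map.
// Types are inferred from the JSON excerpt above, not OpenWhispr's actual definitions.
interface RegistryModel { id: string; name: string; }
interface RegistryProvider { id: string; name: string; models: RegistryModel[]; }
interface Registry { cloudProviders: RegistryProvider[]; localProviders: RegistryProvider[]; }

function buildProviderIndex(registry: Registry): Map<string, string> {
  const index = new Map<string, string>();
  for (const provider of [...registry.cloudProviders, ...registry.localProviders]) {
    for (const model of provider.models) index.set(model.id, provider.id);
  }
  return index;
}

const registry: Registry = {
  cloudProviders: [{ id: 'openai', name: 'OpenAI', models: [{ id: 'gpt-5.2', name: 'GPT-5.2' }] }],
  localProviders: [{ id: 'qwen', name: 'Qwen', models: [{ id: 'qwen3-8b-q4_k_m', name: 'Qwen3 8B' }] }],
};
const index = buildProviderIndex(registry);
```

A single flat index like this is one way a lookup such as getModelProvider can answer in O(1) regardless of provider count.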

Model Provider Detection

import { getModelProvider, getCloudModel } from '@/models/ModelRegistry';

const provider = getModelProvider('gpt-5-mini');
console.log(provider); // 'openai'

const model = getCloudModel('claude-sonnet-4-6');
console.log(model); // { id: 'claude-sonnet-4-6', name: 'Claude Sonnet 4.6', ... }

API Key Management

// Get API keys (cached for performance)
const openaiKey = await window.electronAPI.getOpenAIKey();
const anthropicKey = await window.electronAPI.getAnthropicKey();
const geminiKey = await window.electronAPI.getGeminiKey();
const groqKey = await window.electronAPI.getGroqKey();

// Save API keys (automatically persists to .env)
await window.electronAPI.saveOpenAIKey('sk-...');
await window.electronAPI.saveAnthropicKey('sk-ant-...');
await window.electronAPI.saveGeminiKey('AIza...');
await window.electronAPI.saveGroqKey('gsk_...');

// Clear API key cache after updating
reasoningService.clearApiKeyCache('openai');
API keys are stored in environment variables and automatically reloaded on app start. Keys are cached in memory during runtime for better performance.
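The caching behavior described above could be implemented along these lines. This is a minimal sketch, not the actual ReasoningService implementation; the class and method names are hypothetical.

```typescript
// Minimal sketch of a per-provider API key cache with explicit invalidation,
// mirroring the behavior described above (hypothetical, not the real code).
type KeyFetcher = () => Promise<string>;

class ApiKeyCache {
  private cache = new Map<string, string>();

  async get(provider: string, fetchKey: KeyFetcher): Promise<string> {
    const cached = this.cache.get(provider);
    if (cached !== undefined) return cached; // hit: skip the IPC round trip
    const key = await fetchKey();            // miss: load from the main process
    this.cache.set(provider, key);
    return key;
  }

  clear(provider?: string): void {
    if (provider !== undefined) this.cache.delete(provider);
    else this.cache.clear();
  }
}
```

Calling clear('openai') after saving a new key forces the next get to re-fetch, which is why the docs recommend clearing the cache after updates.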

Custom Reasoning Endpoint

For self-hosted or custom OpenAI-compatible APIs:
import { saveSettings } from '@/stores/settingsStore';

await saveSettings({
  reasoningProvider: 'custom',
  cloudReasoningBaseUrl: 'https://your-api.com/v1',
  customReasoningApiKey: 'your-api-key'
});
Custom endpoints must use HTTPS (HTTP only allowed for local network: localhost, 127.0.0.1, 192.168.*, 10.*).
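The HTTPS rule above could be checked with a small validator like the following. The function is hypothetical and covers only the hosts listed above; the app's actual validation may be stricter.

```typescript
// Hypothetical helper enforcing the rule above: HTTPS required, plain HTTP
// allowed only for local-network hosts. Not the app's actual validator.
function isAllowedEndpoint(url: string): boolean {
  const parsed = new URL(url); // throws on malformed URLs
  if (parsed.protocol === 'https:') return true;
  if (parsed.protocol !== 'http:') return false;
  const host = parsed.hostname;
  return (
    host === 'localhost' ||
    host === '127.0.0.1' ||
    host.startsWith('192.168.') ||
    host.startsWith('10.')
  );
}
```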

Thinking Mode

Thinking (extended reasoning) support varies by model:

Cloud Models:
  • GPT-5 series (via Responses API)
  • Claude Opus/Sonnet 4.6 (extended thinking)
  • Gemini 3.1 Pro (reasoning mode)
Local Models:
  • Qwen3 series (thinking mode in ChatML format)
  • GPT-OSS 20B (reasoning capabilities)
Disabled for Speed:
  • Groq Qwen models (set reasoning_effort: "none" for faster inference)
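For the Groq case, disabling thinking is a matter of adding reasoning_effort to the request body. A sketch with illustrative message content:

```typescript
// Sketch of a Groq Chat Completions request body with thinking disabled
// via reasoning_effort, as described above. Message content is illustrative.
const groqRequestBody = {
  model: 'qwen/qwen3-32b',
  messages: [
    { role: 'system', content: 'You are a dictation cleanup assistant.' },
    { role: 'user', content: 'clean this up please' },
  ],
  reasoning_effort: 'none', // skip the thinking phase for faster inference
  temperature: 0.3,
};
```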

Token Limits

// From src/config/constants.ts
const TOKEN_LIMITS = {
  MIN_TOKENS: 512,
  MAX_TOKENS: 8192,
  MIN_TOKENS_GEMINI: 2000, // Gemini 3.1 Pro requires higher minimum
  MAX_TOKENS_GEMINI: 8192,
  TOKEN_MULTIPLIER: 2 // Output tokens = input length * 2
};
The system automatically calculates appropriate max_tokens based on input length.
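The calculation implied by these constants can be sketched as scale-then-clamp. This assumes "input length" is a token count; the real code may measure differently.

```typescript
// Sketch of the max_tokens calculation implied by the constants above:
// scale input length by TOKEN_MULTIPLIER, then clamp to the provider's
// min/max. Assumes input length is a token count (an assumption).
const TOKEN_LIMITS = {
  MIN_TOKENS: 512,
  MAX_TOKENS: 8192,
  MIN_TOKENS_GEMINI: 2000,
  MAX_TOKENS_GEMINI: 8192,
  TOKEN_MULTIPLIER: 2,
};

function calcMaxTokens(inputTokens: number, isGemini = false): number {
  const min = isGemini ? TOKEN_LIMITS.MIN_TOKENS_GEMINI : TOKEN_LIMITS.MIN_TOKENS;
  const max = isGemini ? TOKEN_LIMITS.MAX_TOKENS_GEMINI : TOKEN_LIMITS.MAX_TOKENS;
  return Math.min(max, Math.max(min, inputTokens * TOKEN_LIMITS.TOKEN_MULTIPLIER));
}
```

The Gemini branch reflects the 2,000-token minimum noted for Gemini 3.1 Pro above.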
