ESPHome’s voice_assistant component turns any ESP32 device with a microphone into a local voice assistant. It streams microphone audio to Home Assistant’s Assist pipeline, which handles wake-word detection, speech-to-text, intent processing, and text-to-speech, all of which can run locally without cloud services. Responses can be played back through a connected speaker or media player.
Voice Assistant requires Home Assistant 2023.5 or later.
Audio and voice components consume significant RAM and CPU. Crashes may occur if you include too many additional components, especially Bluetooth/BLE. If you experience crashes, consult the Troubleshooting guide for backtrace instructions.

Minimal Example

i2s_audio:
  i2s_lrclk_pin: GPIO25
  i2s_bclk_pin: GPIO26

microphone:
  - platform: i2s_audio
    id: board_mic
    i2s_din_pin: GPIO23
    adc_type: external

speaker:
  - platform: i2s_audio
    id: board_speaker
    i2s_dout_pin: GPIO22
    dac_type: external    # the i2s_audio speaker platform requires a DAC type

voice_assistant:
  microphone: board_mic
  speaker: board_speaker

Configuration Variables

microphone (Microphone ID or list): The microphone source(s) for audio input. Provide a single source directly, or a list of up to two sources for dual-channel streaming (see Dual Microphone Channel below).
speaker (ID): The speaker component to use for TTS response playback. Cannot be used together with media_player.
media_player (ID): The media player component to use for TTS response playback. Cannot be used together with speaker.
micro_wake_word (ID): The micro_wake_word component for on-device wake-word detection. When configured, Home Assistant can remotely change which wake-word model is active.
use_wake_word (boolean): Enable wake-word detection in the Home Assistant Assist pipeline. Defaults to false.
conversation_timeout (Time): How long to preserve conversation context before resetting the conversation_id. Defaults to 300s.
noise_suppression_level (int, 0–4): Noise suppression level applied in the Assist pipeline. Defaults to 0 (disabled).
auto_gain (dBFS, 0–31): Automatic gain control level. Defaults to 0dBFS (disabled).
volume_multiplier (float): Volume scaling multiplier applied to the microphone. Must be greater than 0. Defaults to 1 (no scaling).
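
As a rough illustration of how the optional variables above combine, the sketch below tunes the audio path for a noisy room. The specific values (level 2, 31dBFS, a multiplier of 2.0, a 120s timeout) are assumptions chosen for the example, not recommendations; board_mic and board_speaker are the IDs from the minimal example.

```yaml
voice_assistant:
  microphone: board_mic
  speaker: board_speaker
  use_wake_word: false          # default; true enables wake-word detection in the Assist pipeline
  conversation_timeout: 120s    # assumed value; the default is 300s
  noise_suppression_level: 2    # 0 (disabled) to 4
  auto_gain: 31dBFS             # 0dBFS disables automatic gain control
  volume_multiplier: 2.0        # must be > 0; 1 leaves the level unchanged
```

Starting from the defaults and changing one value at a time makes it easier to tell which setting actually improves recognition.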

Automation Triggers

Pipeline State Triggers

voice_assistant:
  microphone: board_mic
  speaker: board_speaker
  on_listening:
    - light.turn_on:
        id: status_led
        effect: "pulse"
  on_stt_vad_start:
    - logger.log: "Voice activity detected"
  on_stt_end:
    - logger.log:
        format: "You said: %s"
        args: [x.c_str()]
  on_tts_start:
    - logger.log:
        format: "Response: %s"
        args: [x.c_str()]
  on_tts_end:
    - logger.log:
        format: "Audio URL: %s"
        args: [x.c_str()]
  on_end:
    - light.turn_off: status_led
  on_error:
    - logger.log:
        format: "Error: %s - %s"
        args: [code.c_str(), message.c_str()]

Wake Word Triggers

voice_assistant:
  on_wake_word_detected:
    - logger.log: "Wake word heard!"
    - light.turn_on: status_led

Client Connection Triggers

voice_assistant:
  on_client_connected:
    - logger.log: "Home Assistant connected"
  on_client_disconnected:
    - logger.log: "Home Assistant disconnected"

Intent Triggers

voice_assistant:
  on_intent_start:
    - logger.log: "Processing intent..."
  on_intent_progress:
    - lambda: |-
        if (!x.empty()) {
          ESP_LOGI("va", "Streaming TTS URL: %s", x.c_str());
        }
  on_intent_end:
    - logger.log: "Intent complete"

TTS Stream Triggers (requires speaker)

voice_assistant:
  speaker: board_speaker
  on_tts_stream_start:
    - logger.log: "TTS audio starting"
  on_tts_stream_end:
    - light.turn_off: status_led

Timer Triggers

voice_assistant:
  on_timer_started:
    - logger.log:
        format: "Timer started: %d seconds"
        args: [timer.total_seconds]
  on_timer_finished:
    - rtttl.play: "alert:d=4,o=5,b=100:e,e,e"
  on_timer_cancelled:
    - logger.log: "Timer cancelled"
  on_timer_tick:
    - lambda: |-
        for (auto &t : timers) {
          ESP_LOGI("timer", "Remaining: %d s", t.seconds_left);
        }

Voice Assistant Actions

voice_assistant.start

Listen for a single voice command, then stop. Silence detection automatically determines when the user has finished speaking.
on_...:
  - voice_assistant.start:
      silence_detection: true    # default: true
      wake_word: "hey jarvis"    # optional: wake word that triggered start

voice_assistant.start_continuous

Continuously listen for commands. Automatically restarts after each response. Call voice_assistant.stop to end the cycle.
on_...:
  - voice_assistant.start_continuous:

voice_assistant.stop

Stop the current listening session or continuous cycle.
on_...:
  - voice_assistant.stop:

Conditions

voice_assistant.is_running

Returns true if the voice assistant is currently active.

voice_assistant.connected

Returns true if Home Assistant is connected and ready.
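
Both conditions can be used wherever ESPHome accepts a condition, such as an if action. The sketch below assumes a button on GPIO0 and ESPHome's standard and/not condition combinators; it starts a session only when Home Assistant is connected and no session is already running.

```yaml
binary_sensor:
  - platform: gpio
    pin: GPIO0            # assumed button pin
    name: "VA Button"
    on_press:
      - if:
          condition:
            and:
              - voice_assistant.connected:
              - not:
                  voice_assistant.is_running:
          then:
            - voice_assistant.start:
          else:
            - logger.log: "Voice assistant busy or Home Assistant not connected"
```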

Usage Patterns

Push to Talk

Hold a button to listen, release to send.
voice_assistant:
  microphone:
    microphone: board_mic
    channels: 0
    gain_factor: 4
  speaker: board_speaker

binary_sensor:
  - platform: gpio
    pin: GPIO0
    name: "PTT Button"
    on_press:
      - voice_assistant.start:
          silence_detection: false   # manual release controls end
    on_release:
      - voice_assistant.stop:

Click to Toggle

Click once to start listening; click again to stop.
binary_sensor:
  - platform: gpio
    pin: GPIO0
    name: "VA Button"
    on_click:
      - if:
          condition: voice_assistant.is_running
          then:
            - voice_assistant.stop:
          else:
            - voice_assistant.start_continuous:

Always-On with Wake Word (Micro Wake Word)

micro_wake_word:
  id: mww_component    # referenced by voice_assistant below
  models:
    - model: hey_jarvis
  on_wake_word_detected:
    - voice_assistant.start:
        wake_word: !lambda return wake_word;

voice_assistant:
  microphone: board_mic
  speaker: board_speaker
  micro_wake_word: mww_component
  on_end:
    - micro_wake_word.start:    # re-arm wake word after each interaction

esphome:
  on_boot:
    - micro_wake_word.start:    # start listening immediately on boot
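
One possible variation, offered as an assumption rather than a recommended pattern, is to arm the wake word only while Home Assistant is connected. It uses the client connection triggers instead of on_boot and relies on the micro_wake_word.stop action alongside micro_wake_word.start. The on_end handler checks voice_assistant.connected so that a session ended by a disconnect does not re-arm detection.

```yaml
voice_assistant:
  microphone: board_mic
  speaker: board_speaker
  micro_wake_word: mww_component
  on_client_connected:
    - micro_wake_word.start:    # arm once Home Assistant is ready
  on_client_disconnected:
    - voice_assistant.stop:
    - micro_wake_word.stop:     # stop listening while nothing can answer
  on_end:
    - if:
        condition:
          voice_assistant.connected:
        then:
          - micro_wake_word.start:    # re-arm after each interaction
```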

Dual Microphone Channel

Provide two microphone sources for improved voice pipeline performance. Home Assistant 2026.6+ is required to use the second channel.
voice_assistant:
  id: va
  microphone:
    - microphone: i2s_mics
      channels: 0    # processed audio (with gain, noise suppression)
    - microphone: i2s_mics
      channels: 1    # raw audio
  speaker: board_speaker
Both microphone sources must provide 16 kHz, 16-bit, mono audio as required by the Assist pipeline.
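
The i2s_mics microphone referenced above is not defined in this section. A minimal sketch of what it might look like follows, assuming the pins from the minimal example and assuming the i2s_audio microphone platform's channel, sample_rate, and bits_per_sample options are available in your ESPHome release; whether channel 0 really carries processed audio depends on the hardware.

```yaml
i2s_audio:
  i2s_lrclk_pin: GPIO25      # assumed pins, copied from the minimal example
  i2s_bclk_pin: GPIO26

microphone:
  - platform: i2s_audio
    id: i2s_mics
    i2s_din_pin: GPIO23
    adc_type: external
    channel: stereo          # assumed: capture both channels so 0 and 1 can be selected
    sample_rate: 16000       # 16 kHz, as the Assist pipeline requires
    bits_per_sample: 16bit   # 16-bit samples
```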

LED Feedback Example

Provide clear visual feedback throughout the voice pipeline.
light:
  - platform: neopixelbus
    id: ring_light
    type: GRB
    pin: GPIO4
    num_leds: 12
    effects:
      - pulse:
          name: "listening_pulse"
          transition_length: 500ms
          update_interval: 500ms

voice_assistant:
  microphone: board_mic
  speaker: board_speaker
  on_listening:
    - light.turn_on:
        id: ring_light
        blue: 100%
        brightness: 80%
        effect: "listening_pulse"
  on_stt_vad_end:
    - light.turn_on:
        id: ring_light
        red: 100%
        green: 100%
        blue: 0%
        brightness: 60%
        effect: none
  on_tts_stream_start:
    - light.turn_on:
        id: ring_light
        green: 100%
        brightness: 60%
  on_end:
    - light.turn_off: ring_light
  on_error:
    - light.turn_on:
        id: ring_light
        red: 100%
        brightness: 80%
    - delay: 2s
    - light.turn_off: ring_light
