Experimental Feature: This feature is currently in testing, so stability is not guaranteed.
Traditional Voice Activity Detection (VAD) is excellent at segmenting speech based on pauses, but it can break a single cohesive idea into multiple smaller transcripts if the speaker pauses to think. Many LLMs are quick to interrupt and respond as soon as a transcript is sent, even when turn-skipping tools are enabled. The Thought Detection feature solves this by analyzing the semantic and vocal content of the speech in real time to determine when a user has finished expressing a complete thought, leading to fewer interruptions and a better voice assistant interaction.

How It Works

  1. Enable Thought Detection by adding a simple query parameter to your WebSocket URL.
  2. As you stream audio, the server transcribes it internally but does not immediately send back a transcript after every pause. Instead, it buffers these interim transcripts, accumulating them into a single longer string.
  3. Only when the model determines a thought is complete does the server send a single message of type complete_thought containing the full text of that idea (see the sketch below).
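
For reference, here is a minimal sketch of that flow over a raw WebSocket connection using the third-party websockets package. The endpoint URL below is a placeholder, authentication is omitted, and the JSON field names (type, text) are assumptions based on the message type described above; the Python SDK example further down handles all of this for you.

import asyncio
import json

import websockets  # third-party: pip install websockets

# Placeholder endpoint; substitute the real streaming URL and your auth scheme.
WS_URL = "wss://api.example.com/v1/stream?detect_thoughts=true"

async def listen():
    async with websockets.connect(WS_URL) as ws:
        # Audio chunks would be sent from another task via ws.send(chunk).
        async for raw in ws:
            msg = json.loads(raw)
            # The server buffers interim transcripts and only emits a
            # complete_thought message once the full idea has been spoken.
            if msg.get("type") == "complete_thought":
                print("🧠", msg.get("text"))

# asyncio.run(listen())  # run once the URL and authentication are filled in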

Enabling the Feature

Activation is controlled by a single URL parameter.
detect_thoughts (boolean, default: false)
Set this to true in the WebSocket connection URL to enable server-side thought detection.
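
If you are constructing the connection URL yourself rather than using the SDK, the parameter is appended as a query string (the base URL below is a placeholder):

from urllib.parse import urlencode

BASE_URL = "wss://api.example.com/v1/stream"  # placeholder base URL
ws_url = f"{BASE_URL}?{urlencode({'detect_thoughts': 'true'})}"
# -> wss://api.example.com/v1/stream?detect_thoughts=true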

Install the SDK mic addon

pip install fennec-asr[mic]

Python SDK Example

The following example (mic_ws_continuous_thought_detection_sdk.py) streams microphone audio and prints each completed thought as it is detected.
import os, asyncio
from dotenv import load_dotenv
from fennec_asr import Realtime
from fennec_asr.mic import stream_microphone

load_dotenv()

API_KEY = os.getenv("FENNEC_API_KEY")
SAMPLE_RATE = 16000
CHANNELS = 1
CHUNK_MS = 32
SINGLE_UTTERANCE = False
DETECT_THOUGHTS = True

# VAD tuning options, passed to the server in the start message (see main() below).
VAD = {
    "threshold": 0.45,
    "min_silence_ms": 100,
    "speech_pad_ms": 200,
    "final_silence_s": 0.1,
    "start_trigger_ms": 36,
    "min_voiced_ms": 48,
    "min_chars": 1,
    "min_words": 1,
    "amp_extend": 1200,
    "force_decode_ms": 0,
    "debug": False,
}

async def main():
    if not API_KEY:
        raise RuntimeError("Set FENNEC_API_KEY")

    rt = (
        Realtime(API_KEY, sample_rate=SAMPLE_RATE, channels=CHANNELS, detect_thoughts=DETECT_THOUGHTS)
        .on("open",    lambda: print("✅ ready"))
        .on("thought", lambda t: print("🧠", t))
        .on("error",   lambda e: print("❌", e))
        .on("close",   lambda: print("👋 closed"))
    )

    # Pass extra start-message options (single-utterance mode and VAD tuning) to the server.
    rt._start_msg["single_utterance"] = SINGLE_UTTERANCE
    rt._start_msg["vad"] = VAD

    async with rt:
        await stream_microphone(rt, samplerate=SAMPLE_RATE, channels=CHANNELS, chunk_ms=CHUNK_MS)

if __name__ == "__main__":
    asyncio.run(main())

Example Interaction

Here is how the experience differs when speaking the same sentence: “I was thinking about the quarterly report… and it seems like the numbers for Q3 are a bit lower than we expected.”

Without Thought Detection

The application receives multiple, fragmented transcripts based purely on pauses.

Received Transcripts:
I was thinking about the quarterly report
and it seems like the numbers for Q3 are a bit lower than we expected.

With Thought Detection

The application receives a single, semantically complete transcript after the user finishes their entire point.

Received Transcript:
✅ Thought Complete: I was thinking about the quarterly report, and it seems like the numbers for Q3 are a bit lower than we expected.
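
In application code, this means your downstream logic runs exactly once per idea. A minimal sketch of a callback you could register for the SDK's thought event (dispatch_to_agent is a placeholder for your own LLM or agent call):

def dispatch_to_agent(text: str) -> None:
    # Placeholder for your own LLM / voice-agent call.
    print("➡️ sending to agent:", text)

def handle_thought(text: str) -> None:
    # Fires once per completed thought; pause-based fragments never reach the agent.
    dispatch_to_agent(text)

# On the Realtime client from the SDK example above:
#     rt.on("thought", handle_thought)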

Why is this useful?

Thought Detection helps AI voice agents feel more intuitive and natural. Being interrupted by an overly enthusiastic LLM is annoying for users, and this feature is a simple way to eliminate that from the start. It is also useful for live transcription: captions arrive as complete, well-spaced thoughts out of the box rather than unbroken blocks of text.