Experimental Feature: This feature is currently in testing, so stability is not guaranteed.
Traditional Voice Activity Detection (VAD) is excellent at segmenting speech based on pauses, but it can split a single cohesive idea into multiple smaller transcripts if the speaker pauses to think. Many LLMs are quick to interrupt and respond the moment a transcription arrives, even when turn-skipping tools are enabled. The Thought Detection feature solves this by analyzing the semantic and vocal content of the speech in real time to determine when a user has finished expressing a complete thought, leading to fewer interruptions and a better voice assistant interaction.

How It Works

  1. Enable Thought Detection by sending a JSON start message on the WebSocket that includes detect_thoughts: true. You can also tune end_thought_eagerness and force_complete_time here.
  2. As you stream audio, the server transcribes it internally but does not immediately send back a transcript after every pause. Instead, it buffers these transcripts and keeps them as one longer string.
  3. Only when the model determines a thought is complete does the server send a single message of type complete_thought containing the full text of that idea.
The moment a thought is considered “complete” is tunable. Use end_thought_eagerness to make the detector more or less willing to close a thought, and force_complete_time to set a drop-dead timeout (in seconds) that will force emission of the current buffered thought after silence. This is helpful if a finished turn is incorrectly identified as an unfinished turn.
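As a minimal sketch of the receiving side (assuming the websockets library and a connection ws on which the start message has already been sent), the client simply waits for messages of type complete_thought:

import json

async def receive_thoughts(ws):
    # The server buffers intermediate transcripts internally and only emits
    # a complete_thought message once it decides the idea is finished.
    async for msg in ws:
        try:
            data = json.loads(msg)
        except ValueError:
            continue  # ignore any non-JSON frames
        if data.get("type") == "complete_thought":
            print("Thought:", data.get("text", ""))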

Enabling the Feature

Start Message Fields
{
  "type": "start",
  "sample_rate": 16000,
  "channels": 1,
  "detect_thoughts": true,
  "end_thought_eagerness": "medium",   // "low" | "medium" | "high"
  "force_complete_time": 2.0,          // seconds, will force complete thought so users aren't stranded (min 1, max 60)
  "context": "optional correction context string", // enables server-side transcript correction if provided
  "vad": { "...": "unchanged VAD settings" }
}
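If you are talking to the WebSocket directly, this start message is simply the first frame you send after connecting. A minimal sketch (assuming the websockets library; the URL, token handling, and audio streaming are covered in the full example below):

import json
import websockets

async def open_stream(url: str):
    ws = await websockets.connect(url, max_size=None)
    start_msg = {
        "type": "start",
        "sample_rate": 16000,
        "channels": 1,
        "detect_thoughts": True,
        "end_thought_eagerness": "medium",
        "force_complete_time": 2.0,
    }
    await ws.send(json.dumps(start_msg))  # configure the session before streaming audio
    return ws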

Tuning When a Thought Ends

These fields belong in the initial start message you send after opening the WebSocket.

end_thought_eagerness (string, default: "medium")
Controls how aggressively the model closes a thought. Allowed values: "low", "medium", "high".

force_complete_time (number, default: 2.0)
A drop-dead timer in seconds. If the model has not marked the current thought complete and the user has gone silent, the server will force-emit the buffered text once this many seconds have elapsed. Range: 1.0–60.0 seconds. Values outside this range are rejected.

Note: You still enable the feature with detect_thoughts: true. These extra fields only tune when the thought ends.
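As a rough client-side illustration of these documented constraints (a sketch only; the helper name is ours, and the authoritative validation happens server-side):

def thought_tuning(eagerness: str = "medium", force_complete_time: float = 2.0) -> dict:
    # Mirror the documented ranges before sending the start message.
    if eagerness not in ("low", "medium", "high"):
        raise ValueError('end_thought_eagerness must be "low", "medium", or "high"')
    if not 1.0 <= force_complete_time <= 60.0:
        raise ValueError("force_complete_time must be between 1 and 60 seconds")
    return {
        "detect_thoughts": True,
        "end_thought_eagerness": eagerness,
        "force_complete_time": force_complete_time,
    }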

Getting Started

Install the SDK mic addon

pip install fennec-asr[mic]

Python SDK Example

An SDK Example (mic_ws_continuous_thought_detection_sdk.py)
import os, asyncio
from dotenv import load_dotenv
from fennec_asr import Realtime
from fennec_asr.mic import stream_microphone

load_dotenv()

API_KEY = os.getenv("FENNEC_API_KEY")
SAMPLE_RATE = 16000
CHANNELS = 1
CHUNK_MS = 32
SINGLE_UTTERANCE = False
DETECT_THOUGHTS = True

VAD = {
    "threshold": 0.45,
    "min_silence_ms": 100,
    "speech_pad_ms": 200,
    "final_silence_s": 0.1,
    "start_trigger_ms": 36,
    "min_voiced_ms": 48,
    "min_chars": 1,
    "min_words": 1,
    "amp_extend": 1200,
    "force_decode_ms": 0,
    "debug": False,
}

async def main():
    if not API_KEY:
        raise RuntimeError("Set FENNEC_API_KEY")

    rt = (
        Realtime(API_KEY, sample_rate=SAMPLE_RATE, channels=CHANNELS, detect_thoughts=DETECT_THOUGHTS)
        .on("open",    lambda: print("✅ ready"))
        .on("thought", lambda t: print("🧠", t))
        .on("error",   lambda e: print("❌", e))
        .on("close",   lambda: print("👋 closed"))
    )

    rt._start_msg["single_utterance"] = SINGLE_UTTERANCE
    rt._start_msg["detect_thoughts"] = True
    rt._start_msg["end_thought_eagerness"] = "medium"   # "low" | "medium" | "high"
    rt._start_msg["force_complete_time"] = 2.0          # 1–60 seconds
    rt._start_msg["vad"] = VAD

    async with rt:
        await stream_microphone(rt, samplerate=SAMPLE_RATE, channels=CHANNELS, chunk_ms=CHUNK_MS)

if __name__ == "__main__":
    asyncio.run(main())
Don’t want to use the SDK? Here is a full code sample:
This client script includes a simple ENABLE_THOUGHT_DETECTION flag that toggles detect_thoughts in the start message, then listens for the specific complete_thought messages sent by the server.
A Full Example (mic_ws_thought_detection.py)
import asyncio
import json
import signal
import sounddevice as sd
import websockets
import httpx
import os

ENABLE_THOUGHT_DETECTION = True

WS_BASE = "wss://api.fennec-asr.com/api/v1/transcribe/stream"
HTTP_TOKEN_URL = "https://api.fennec-asr.com/api/v1/transcribe/streaming-token"
API_KEY = os.getenv("FENNEC_API_KEY") or "YOUR_API_KEY"

SAMPLE_RATE = 16_000
CHANNELS = 1
DTYPE = "int16"
CHUNK_MS = 100
FRAMES_PER_CHUNK = int(SAMPLE_RATE * (CHUNK_MS / 1000.0))

START_MSG = {
    "type": "start",
    "sample_rate": 16000,
    "channels": 1,
    "single_utterance": False,
    "detect_thoughts": True,
    "end_thought_eagerness": "medium",   # "low" | "medium" | "high"
    "force_complete_time": 2.0,          # 1–60 seconds
    # "context": "optional correction context string",
    "vad": {
        "threshold": 0.45,
        "min_silence_ms": 400,
        "speech_pad_ms": 200,
        "final_silence_s": 0,
        "start_trigger_ms": 30,
        "min_voiced_ms": 100,
        "min_chars": 1,
        "min_words": 1,
        "amp_extend": 1200
    }
}

shutdown_event = asyncio.Event()

def _handle_sigint(*_):
    shutdown_event.set()

async def audio_sender(ws, stream):
    print("\n🎙️  Streaming mic… speak a full thought and pause. (Ctrl+C to stop)", flush=True)
    while not shutdown_event.is_set():
        try:
            data, _ = await asyncio.to_thread(stream.read, FRAMES_PER_CHUNK)
            await ws.send(bytes(data))
        except websockets.exceptions.ConnectionClosed:
            break
    # Signal end-of-stream; if the socket is already closed, send() raises
    # ConnectionClosed, which is safe to ignore here.
    try:
        await ws.send('{"type":"eos"}')
    except websockets.exceptions.ConnectionClosed:
        pass

async def message_receiver(ws):
    """Listens for messages and handles them based on the selected mode."""
    async for msg in ws:
        try:
            data = json.loads(msg)
            if data.get("type") == "complete_thought":
                text = data.get("text", "")
                print(f"\n✅ Thought Complete: {text}", flush=True)
        except Exception:
            # Ignore non-JSON messages
            pass

async def fetch_streaming_token() -> str:
    """
    Exchanges your API key for a short-lived streaming token.
    Send the token via ?streaming_token=... in the WS URL.
    """
    if not API_KEY or API_KEY == "YOUR_API_KEY":
        raise RuntimeError("Set FENNEC_API_KEY (or replace API_KEY).")
    async with httpx.AsyncClient(timeout=10) as client:
        resp = await client.post(
            HTTP_TOKEN_URL,
            headers={"X-API-Key": API_KEY, "content-type": "application/json"},
            json={},  # body not required; header is what matters
        )
        resp.raise_for_status()
        data = resp.json()
        token = data.get("token")
        if not token:
            raise RuntimeError(f"Token endpoint returned no token: {data}")
        return token

async def main():
    print("Fetching streaming token…")
    token = await fetch_streaming_token()
    WEBSOCKET_URL = f"{WS_BASE}?streaming_token={token}"
    print(f"Connecting to: {WEBSOCKET_URL}")
    try:
        async with websockets.connect(
                WEBSOCKET_URL,
                max_size=None,
                ping_interval=5
            ) as ws:
            await ws.send(json.dumps(START_MSG))
            print("✅ WebSocket connected and configured.")

            with sd.RawInputStream(
                samplerate=SAMPLE_RATE, channels=CHANNELS, dtype=DTYPE, blocksize=FRAMES_PER_CHUNK
            ) as stream:
                sender = asyncio.create_task(audio_sender(ws, stream))
                receiver = asyncio.create_task(message_receiver(ws))
                await asyncio.wait([sender, receiver], return_when=asyncio.FIRST_COMPLETED)
    except Exception as e:
        print(f"❌ An error occurred: {e}")

if __name__ == "__main__":
    try:
        sd.query_devices(kind="input")  # raises if no default input device is available
    except Exception:
        print("\n❌ No input microphone found.")
    else:
        signal.signal(signal.SIGINT, _handle_sigint)
        asyncio.run(main())

Example Interaction

Here is how the experience differs when speaking the same sentence: “I was thinking about the quarterly report… and it seems like the numbers for Q3 are a bit lower than we expected.”

Without Thought Detection

The application receives multiple, fragmented transcripts based purely on pauses.

Received Transcripts:
I was thinking about the quarterly report
and it seems like the numbers for Q3 are a bit lower than we expected.

With Thought Detection

The application receives a single, semantically complete transcript after the user finishes their entire point.

Received Transcript:
✅ Thought Complete: I was thinking about the quarterly report, and it seems like the numbers for Q3 are a bit lower than we expected.

Why is this useful?

Thought Detection helps AI voice agents feel more intuitive and natural. It’s frustrating for users to be interrupted by an overly enthusiastic LLM, and this feature is a simple way to eliminate that problem from the start. It’s also useful for live transcription, where the output arrives as well-spaced, complete thoughts out of the box. No more walls of text!