ASR Message Processing and AI Trigger Flow

ML & AI · flowchart diagram · NOASSERTION

This flowchart illustrates the real-time processing of ASR messages, including speaker tracking, text accumulation, silence detection, and the logic for tr

Source: https://github.com/lanzeweie/ASR-interview-coder-llm/blob/cad6297ac1dc16063215e57c175239b2019d44ab/README/Mermaid.md
Curated by lanzeweie
Tags: ASR, Real-time Processing, AI Triggering, Silence Detection, Speaker Tracking, Configurable Parameters, Voice AI

Mermaid source

flowchart TD
    Start([ASR message input]) --> CheckLen{"Length ≥ 3?"}
    CheckLen -- No --> Ignore[Ignore message]
    CheckLen -- Yes --> UpdateTime[Update last-message time]

    UpdateTime --> ExtractSpeaker[Extract speaker info]
    ExtractSpeaker --> SameSpeaker{"Current speaker<br/>already exists?"}

    SameSpeaker -- No --> NewSpeaker["Set current speaker<br/>Reset accumulated text"]
    SameSpeaker -- Yes --> Accumulate[Accumulate text]

    Accumulate --> CheckThreshold{"Accumulated chars ≥ minimum (10)?"}
    NewSpeaker --> CheckThreshold

    CheckThreshold -- No --> Wait[Wait for more audio]
    CheckThreshold -- Yes --> StartSilence{"Silence detection started?"}

    StartSilence -- No --> StartTimer[Start silence timer]
    StartSilence -- Yes --> CheckSilence{"Silence ≥ threshold (2 s)?"}

    StartTimer --> Wait
    CheckSilence -- No --> CheckForce{"Text ≥ 3× threshold?"}
    CheckSilence -- Yes --> Trigger[Trigger analysis]

    CheckForce -- Yes --> Trigger
    CheckForce -- No --> CheckTimeout{"Silence ≥ 2× threshold?"}

    CheckTimeout -- Yes --> Trigger
    CheckTimeout -- No --> CheckSilence

    Trigger --> RunAnalysis[[Run intelligent analysis]]
    RunAnalysis --> CheckResult{Model verdict}

    CheckResult -- true --> NeedsAI["Think tank (智囊团) needs to be started"]
    CheckResult -- false --> NoAI["Ordinary conversation, no AI needed"]

    NeedsAI --> Reset1[Reset silence detection]
    NoAI --> Reset2[Reset silence detection]

    Reset1 --> ResetSpeakerState[Reset state variables]
    Reset2 --> ResetSpeakerState
    ResetSpeakerState --> Ready[Ready for new messages]
    Ready --> Start

    Ignore --> Ready
    Wait --> Start

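    %% Details of the user-configurable parameters
    subgraph ConfigArea ["⚙️ User-configurable parameters"]
        direction TB
        subgraph Basic [Basic parameters]
            Config1["Minimum message length: 3 characters<br/>Filters out messages too short to be useful"]
            Config2["Accumulation threshold: 10 characters<br/>Silence detection starts once reached"]
        end
        subgraph Timing [Timing parameters]
            Config3["Silence threshold: 2 s<br/>First trigger condition to be met"]
            Config4["Force threshold: 3× accumulation<br/>30 characters force analysis"]
            Config5["Timeout threshold: 4 s<br/>Silence timeout triggers automatically"]
        end
        subgraph Speaker [Speaker parameters]
            Config6["Voiceprint recognition<br/>Distinguishes different speakers"]
            Config7["Accumulation logic<br/>Same speaker accumulates, different speaker resets"]
        end
    end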

    style Trigger fill:#ff9999
    style RunAnalysis fill:#8B4513
    style NeedsAI fill:#FF6B6B
    style NoAI fill:#90EE90
    style ResetSpeakerState fill:#90EE90
    style CheckThreshold fill:#e1f5fe
    style CheckSilence fill:#e1f5fe
    style CheckForce fill:#e1f5fe
    style CheckTimeout fill:#e1f5fe
    style SameSpeaker fill:#e1f5fe

What this diagram shows

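The diagram traces an ASR message from input through a minimum-length check (3 characters), speaker identification, and per-speaker text accumulation. Once the accumulated text reaches 10 characters, silence detection starts. Analysis is triggered when silence reaches the 2-second threshold, when the text reaches 3 times the accumulation threshold (30 characters), or when silence reaches 2 times the silence threshold (4 seconds). A model then decides whether AI assistance is needed; either way, silence detection and the state variables are reset before the next message. All of these thresholds are user-configurable.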
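
The first half of the flow (length filter, speaker tracking, accumulation) can be sketched in Python as follows. This is a minimal illustration, not the project's code: `SpeakerBuffer` and `on_message` are hypothetical names, and the sketch assumes that a new speaker's first message starts the fresh buffer, which the diagram does not spell out.

```python
from dataclasses import dataclass
from typing import Optional

MIN_MESSAGE_LEN = 3   # from the diagram: shorter messages are ignored
ACCUMULATE_MIN = 10   # from the diagram: characters needed before silence detection


@dataclass
class SpeakerBuffer:
    """Hypothetical container for the state the flowchart tracks."""
    speaker: Optional[str] = None
    text: str = ""
    last_message_time: Optional[float] = None

    def on_message(self, speaker: str, text: str, now: float) -> bool:
        """Feed one ASR message.

        Returns True when this message leaves the buffer at or above the
        accumulation threshold; ignored messages always return False.
        """
        if len(text) < MIN_MESSAGE_LEN:
            return False                  # "Ignore message" branch
        self.last_message_time = now      # "Update last-message time"
        if speaker != self.speaker:
            # New speaker: switch and reset the accumulated text.
            # Assumption: the new message seeds the fresh buffer.
            self.speaker = speaker
            self.text = text
        else:
            self.text += text             # same speaker: keep accumulating
        return len(self.text) >= ACCUMULATE_MIN
```

Timestamps are passed in explicitly rather than read from a clock, which keeps the sketch deterministic and easy to test.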

When to use it

Use this diagram when designing, documenting, or explaining real-time voice interaction systems, intelligent assistants, or any application where ASR output must be processed as it arrives to decide when to invoke an AI model for deeper analysis. It is particularly useful when the AI should be invoked only after a speaker has reached a natural pause or said enough to analyze, rather than on every ASR fragment.

How to adapt it for your project

This flow can be adapted by adjusting the configurable parameters (message length, accumulation thresholds, silence durations) to suit different conversational speeds or AI response requirements. It can be extended to include sentiment analysis before AI triggering, integrate different speaker diarization methods, or incorporate more complex context-aware AI decision-making.
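
One way to make the parameters adjustable is to group them in a single configuration object and derive the force and timeout thresholds from multipliers, as the diagram does (3× accumulation, 2× silence). The sketch below is an assumption-laden illustration: `TriggerConfig`, `should_trigger`, and the field names are hypothetical; only the default values come from the diagram.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TriggerConfig:
    """Hypothetical parameter bundle; defaults mirror the diagram."""
    min_message_len: int = 3
    accumulate_min: int = 10          # characters before silence detection starts
    silence_threshold_s: float = 2.0
    force_multiplier: int = 3         # force trigger at 3x the accumulation threshold
    timeout_multiplier: int = 2       # timeout at 2x the silence threshold

    @property
    def force_chars(self) -> int:
        return self.accumulate_min * self.force_multiplier

    @property
    def timeout_s(self) -> float:
        return self.silence_threshold_s * self.timeout_multiplier


def should_trigger(chars: int, silence_s: float, cfg: TriggerConfig) -> bool:
    """Return True if any of the diagram's three trigger conditions holds."""
    if chars < cfg.accumulate_min:
        return False                  # still waiting for more audio
    return (
        silence_s >= cfg.silence_threshold_s   # normal silence trigger
        or chars >= cfg.force_chars            # forced trigger on long text
        # Kept for parity with the diagram; with a multiplier >= 1 this
        # is already implied by the first condition.
        or silence_s >= cfg.timeout_s
    )
```

For slower speakers, a longer pause could be tolerated with something like `TriggerConfig(silence_threshold_s=3.0)`, which also moves the derived timeout to 6 seconds.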

Key concepts

ASR Message Processing
Real-time Speech Analysis
Silence Detection
Speaker Tracking
AI Triggering Logic