ASR Message Processing and Intelligent Analysis Flow

System Design · flowchart diagram · NOASSERTION

Details the real-time processing of ASR messages, including filtering, speaker identification, text accumulation, and the logic for triggering intelligent analysis.

Source: https://github.com/lanzeweie/ASR-interview-coder-llm/blob/cad6297ac1dc16063215e57c175239b2019d44ab/README/Mermaid.md
Curated by lanzeweie
ASR Speech Recognition Real-time Processing Conversational AI System Design Flow Control Voice Assistant

Mermaid source

flowchart TD
    Start([ASR message input]) --> CheckLen{Length ≥ 3?}
    CheckLen -- No --> Ignore[Ignore message]
    CheckLen -- Yes --> UpdateTime[Update last-message time]

    UpdateTime --> ExtractSpeaker[Extract speaker info]
    ExtractSpeaker --> SameSpeaker{Same as<br/>current speaker?}

    SameSpeaker -- No --> NewSpeaker[Set current speaker<br/>Reset accumulated text]
    SameSpeaker -- Yes --> Accumulate[Accumulate text]

    Accumulate --> CheckThreshold{"Accumulated chars ≥ minimum (10)?"}
    NewSpeaker --> CheckThreshold

    CheckThreshold -- No --> Wait[Wait for more audio]
    CheckThreshold -- Yes --> StartSilence{Silence detection started?}

    StartSilence -- No --> StartTimer[Start silence timer]
    StartSilence -- Yes --> CheckSilence{"Silence ≥ threshold (2 s)?"}

    StartTimer --> Wait
    CheckSilence -- No --> CheckForce{Text ≥ 3× threshold?}
    CheckSilence -- Yes --> Trigger[Trigger analysis]

    CheckForce -- Yes --> Trigger
    CheckForce -- No --> CheckTimeout{Silence ≥ 2× threshold?}

    CheckTimeout -- Yes --> Trigger
    CheckTimeout -- No --> CheckSilence

    Trigger --> RunAnalysis[[Run intelligent analysis]]
    RunAnalysis --> CheckResult{Model verdict}

    CheckResult -- true --> NeedsAI[Launch the AI think tank]
    CheckResult -- false --> NoAI[Ordinary conversation, no AI needed]

    NeedsAI --> Reset1[Reset silence detection]
    NoAI --> Reset2[Reset silence detection]

    Reset1 --> ResetSpeakerState[Reset state variables]
    Reset2 --> ResetSpeakerState
    ResetSpeakerState --> Ready[Ready for new messages]
    Ready --> Start

    Ignore --> Ready
    Wait --> Start

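    %% Detailed description of the user-configurable parameters
    subgraph ConfigArea [⚙️ User-configurable parameters]
        direction TB
        subgraph Basic [Basic parameters]
            Config1["Minimum message length: 3 chars<br/>Filters out short, invalid messages"]
            Config2["Accumulation threshold: 10 chars<br/>Starts silence detection once reached"]
        end
        subgraph Timing [Timing parameters]
            Config3["Silence threshold: 2 s<br/>First trigger condition checked"]
            Config4["Force threshold: 3× accumulation<br/>30 chars force an analysis"]
            Config5["Timeout threshold: 4 s<br/>Triggers automatically on silence timeout"]
        end
        subgraph Speaker [Speaker parameters]
            Config6["Voiceprint recognition<br/>Distinguishes between speakers"]
            Config7["Accumulation logic<br/>Same speaker accumulates; a new speaker resets"]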
        end
    end

    style Trigger fill:#ff9999
    style RunAnalysis fill:#8B4513
    style NeedsAI fill:#FF6B6B
    style NoAI fill:#90EE90
    style ResetSpeakerState fill:#90EE90
    style CheckThreshold fill:#e1f5fe
    style CheckSilence fill:#e1f5fe
    style CheckForce fill:#e1f5fe
    style CheckTimeout fill:#e1f5fe
    style SameSpeaker fill:#e1f5fe

What this diagram shows

This flowchart traces the complete lifecycle of an Automatic Speech Recognition (ASR) message, from input to the intelligent analysis module. It covers message-length filtering, speaker identification and per-speaker text accumulation, the accumulation threshold (10 characters) that arms silence detection, the three conditions that trigger analysis (2 seconds of silence, accumulated text reaching 3× the threshold, or a 4-second timeout), and the reset that prepares the system for the next message. A separate panel lists the user-configurable parameters that tune this logic.
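The per-message logic in the flowchart can be sketched as a small state machine. This is a minimal Python sketch under stated assumptions, not the project's implementation: the class and method names (`AsrTriggerSketch`, `on_message`, `poll`) are hypothetical, the analysis model is passed in as a plain callable that returns True when AI is needed, timestamps are supplied by the caller in seconds, and the timeout is assumed to be measured from when the silence timer started. Measuring the timeout from the last message, as the diagram's label reads, would mean it could never fire, because that branch is only reached after silence has already been found shorter than the 2-second threshold.

```python
from typing import Callable, Optional

# Defaults from the diagram's configuration panel.
MIN_MESSAGE_LEN = 3                          # shorter messages are ignored
ACCUMULATE_THRESHOLD = 10                    # chars before silence detection starts
SILENCE_THRESHOLD_S = 2.0                    # silence that triggers analysis
FORCE_THRESHOLD = 3 * ACCUMULATE_THRESHOLD   # 30 chars force a trigger
TIMEOUT_S = 2 * SILENCE_THRESHOLD_S          # 4 s timeout


class AsrTriggerSketch:
    """Hypothetical state machine mirroring the flowchart (not the project's code)."""

    def __init__(self, analyze: Callable[[str], bool]) -> None:
        self.analyze = analyze  # stand-in for "Run intelligent analysis"
        self._reset()

    def _reset(self) -> None:
        # "Reset silence detection" and "Reset state variables"
        self.speaker: Optional[str] = None
        self.text = ""
        self.last_message_time: Optional[float] = None
        self.timer_start: Optional[float] = None

    def on_message(self, speaker: str, text: str, now: float) -> Optional[bool]:
        """Feed one ASR result; returns the verdict if analysis ran, else None."""
        if len(text) < MIN_MESSAGE_LEN:
            return None                   # "Ignore message"
        self.last_message_time = now      # "Update last-message time"
        if speaker != self.speaker:
            self.speaker = speaker        # new speaker: restart accumulation
            self.text = text
        else:
            self.text += text             # same speaker: accumulate
        return self._check(now)

    def poll(self, now: float) -> Optional[bool]:
        """Call periodically so silence and timeout can fire between messages."""
        if self.last_message_time is None:
            return None
        return self._check(now)

    def _check(self, now: float) -> Optional[bool]:
        if len(self.text) < ACCUMULATE_THRESHOLD:
            return None                   # "Wait for more audio"
        if self.timer_start is None:
            self.timer_start = now        # "Start silence timer", then wait
            return None
        silence = now - self.last_message_time
        if (silence >= SILENCE_THRESHOLD_S                 # silence trigger
                or len(self.text) >= FORCE_THRESHOLD        # forced trigger
                or now - self.timer_start >= TIMEOUT_S):    # timeout, assumed from timer start
            return self._trigger()
        return None

    def _trigger(self) -> bool:
        verdict = self.analyze(self.text)  # True: needs AI, False: ordinary chat
        self._reset()
        return verdict
```

Because the last-message time is updated before any check runs, silence is always zero when a message arrives; the silence and timeout triggers therefore fire through `poll`. For example, `on_message("A", "hello there", 0.5)` starts the silence timer, and a later `poll(2.6)` runs the analyzer on "hello there" because 2.1 s of silence exceeds the 2-second threshold.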

When to use it

Use this diagram when designing real-time conversational AI systems, voice assistants, meeting transcription services, or any application requiring intelligent processing of continuous ASR output. It's particularly useful for defining how to segment continuous speech into meaningful chunks for analysis, manage speaker turns, and optimize resource usage by triggering AI only when necessary.

How to adapt it for your project

This flow can be adapted by modifying the configurable parameters such as minimum message length, text accumulation thresholds, and silence detection durations to suit different conversational speeds or application requirements. The 'Run Analysis' module can be replaced with various AI models (e.g., intent recognition, sentiment analysis, summarization). Speaker identification logic can be enhanced with more sophisticated diarization techniques, and the 'Needs AI'/'No AI' decision can be based on more complex criteria.
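One way to make these parameters tunable and the analysis step swappable is to gather them into a single configuration object and treat the analyzer as a plain function from text to a "needs AI?" flag. This is a minimal Python sketch under assumptions: `TriggerConfig`, `force_multiplier`, `timeout_multiplier`, and `keyword_analyzer` are hypothetical names, the default derived values (30 characters, 4 seconds) follow the ratios stated in the diagram, and the keyword matcher is only a toy stand-in for an intent-recognition, sentiment, or summarization model.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class TriggerConfig:
    """Hypothetical container for the diagram's user-configurable parameters."""
    min_message_len: int = 3         # filter out very short messages
    accumulate_threshold: int = 10   # chars before silence detection starts
    silence_threshold_s: float = 2.0 # silence that triggers analysis
    force_multiplier: int = 3        # force trigger at 3x the accumulation threshold
    timeout_multiplier: int = 2      # timeout at 2x the silence threshold

    def __post_init__(self) -> None:
        if self.min_message_len < 1 or self.accumulate_threshold < 1:
            raise ValueError("length thresholds must be positive")
        if self.silence_threshold_s <= 0:
            raise ValueError("silence threshold must be positive")

    @property
    def force_threshold_chars(self) -> int:
        return self.force_multiplier * self.accumulate_threshold

    @property
    def timeout_s(self) -> float:
        return self.timeout_multiplier * self.silence_threshold_s


# "Run intelligent analysis" as a swappable callable: text in, "needs AI?" out.
Analyzer = Callable[[str], bool]


def keyword_analyzer(keywords: tuple[str, ...]) -> Analyzer:
    """Toy stand-in for a model: flags text that mentions any keyword."""
    lowered = tuple(k.lower() for k in keywords)
    return lambda text: any(k in text.lower() for k in lowered)


# Example: a slower-paced meeting profile with longer pauses.
meeting = TriggerConfig(accumulate_threshold=20, silence_threshold_s=3.0)
print(meeting.force_threshold_chars, meeting.timeout_s)  # 60 6.0
needs_ai = keyword_analyzer(("how do I", "explain"))
print(needs_ai("Can you explain recursion?"))  # True
```

Deriving the force and timeout values from multipliers keeps the diagram's 3× and 2× relationships intact when the base thresholds are tuned; if your application needs them to vary independently, store them as plain fields instead.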

Key concepts

ASR Processing
Real-time Speech Analysis
Silence Detection
Speaker Diarization
Configurable Thresholds