Details the real-time processing of ASR messages, including filtering, speaker identification, text accumulation, and logic for triggering intelligent anal
flowchart TD
Start([ASR message input]) --> CheckLen{Length ≥ 3?}
CheckLen -- No --> Ignore[Ignore message]
CheckLen -- Yes --> UpdateTime[Update last-message time]
UpdateTime --> ExtractSpeaker[Extract speaker info]
ExtractSpeaker --> SameSpeaker{Current speaker<br/>already set?}
SameSpeaker -- No --> NewSpeaker[Set current speaker<br/>Reset accumulated text]
SameSpeaker -- Yes --> Accumulate[Accumulate text]
Accumulate --> CheckThreshold{"Accumulated chars ≥ minimum (10)?"}
NewSpeaker --> CheckThreshold
CheckThreshold -- No --> Wait[Wait for more audio]
CheckThreshold -- Yes --> StartSilence{Silence detection started?}
StartSilence -- No --> StartTimer[Start silence timer]
StartSilence -- Yes --> CheckSilence{"Silence ≥ threshold (2 s)?"}
StartTimer --> Wait
CheckSilence -- No --> CheckForce{Text ≥ 3× threshold?}
CheckSilence -- Yes --> Trigger[Trigger analysis]
CheckForce -- Yes --> Trigger
CheckForce -- No --> CheckTimeout{Silence ≥ 2× threshold?}
CheckTimeout -- Yes --> Trigger
CheckTimeout -- No --> CheckSilence
Trigger --> RunAnalysis[[Run intelligent analysis]]
RunAnalysis --> CheckResult{Model verdict}
CheckResult -- true --> NeedsAI[AI think tank needed]
CheckResult -- false --> NoAI[Ordinary conversation, no AI needed]
NeedsAI --> Reset1[Reset silence detection]
NoAI --> Reset2[Reset silence detection]
Reset1 --> ResetSpeakerState[Reset state variables]
Reset2 --> ResetSpeakerState
ResetSpeakerState --> Ready[Ready for new messages]
Ready --> Start
Ignore --> Ready
Wait --> Start
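%% Detailed description of user-configurable parameters
subgraph ConfigArea [⚙️ User-configurable parameters]
direction TB
subgraph Basic [Basic parameters]
Config1["Minimum message length: 3 characters<br/>Filters out messages too short to be useful"]
Config2["Accumulation threshold: 10 characters<br/>Silence detection starts once reached"]
end
subgraph Timing [Timing parameters]
Config3["Silence threshold: 2 s<br/>Triggers when first met"]
Config4["Force threshold: 3× accumulation threshold<br/>Analysis is forced at 30 characters"]
Config5["Timeout threshold: 4 s<br/>Triggers automatically on silence timeout"]
end
subgraph Speaker [Speaker parameters]
Config6["Voiceprint recognition<br/>Distinguishes between speakers"]
Config7["Accumulation logic<br/>Accumulate for the same speaker, reset on a new speaker"]
end
end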
style Trigger fill:#ff9999
style RunAnalysis fill:#8B4513
style NeedsAI fill:#FF6B6B
style NoAI fill:#90EE90
style ResetSpeakerState fill:#90EE90
style CheckThreshold fill:#e1f5fe
style CheckSilence fill:#e1f5fe
style CheckForce fill:#e1f5fe
style CheckTimeout fill:#e1f5fe
style SameSpeaker fill:#e1f5fe
This flowchart traces the complete lifecycle of an Automatic Speech Recognition (ASR) message, from initial input to the decision on whether to trigger an intelligent analysis module. Messages shorter than 3 characters are ignored; the rest are attributed to a speaker and accumulated. Once 10 characters have accumulated, silence detection starts, and analysis is triggered by 2 seconds of silence, by the text reaching 30 characters (3 times the accumulation threshold), or by a 4-second silence timeout. After analysis, silence detection and the speaker state are reset so the system is ready for new messages. The diagram also lists the user-configurable parameters that tune this logic.
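The trigger conditions in the diagram can be sketched as a single decision function. This is a minimal illustration, not the original implementation: the names `TriggerConfig` and `should_trigger` are hypothetical, and it assumes the caller already tracks the accumulated character count and the seconds elapsed since the last message.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TriggerConfig:
    # Defaults mirror the values shown in the diagram's parameter panel.
    accumulate_chars: int = 10      # silence detection starts at this length
    silence_seconds: float = 2.0    # silence needed to trigger analysis
    force_multiplier: int = 3       # force trigger at 3 x accumulate_chars
    timeout_multiplier: int = 2     # timeout at 2 x silence_seconds


def should_trigger(accumulated_chars: int, silence_s: float,
                   cfg: TriggerConfig = TriggerConfig()) -> Optional[str]:
    """Return the reason analysis should fire, or None to keep waiting."""
    if accumulated_chars < cfg.accumulate_chars:
        return None  # not enough text yet: wait for more audio
    if silence_s >= cfg.silence_seconds:
        return "silence"
    if accumulated_chars >= cfg.force_multiplier * cfg.accumulate_chars:
        return "forced"
    if silence_s >= cfg.timeout_multiplier * cfg.silence_seconds:
        # With timeout_multiplier >= 1 the silence check above already
        # covers this case; it is kept only to mirror the diagram.
        return "timeout"
    return None


print(should_trigger(12, 2.5))   # 12 chars, 2.5 s of silence
print(should_trigger(30, 0.5))   # 30 chars, speaker still talking
```

Returning the reason rather than a bare boolean makes it easier to log why a given segment was sent for analysis.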
Use this diagram when designing real-time conversational AI systems, voice assistants, meeting transcription services, or any application requiring intelligent processing of continuous ASR output. It's particularly useful for defining how to segment continuous speech into meaningful chunks for analysis, manage speaker turns, and optimize resource usage by triggering AI only when necessary.
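Speaker-turn handling of this kind might look like the following sketch. `TurnAccumulator` is a hypothetical helper, not part of any library. It assumes that a message from a new speaker seeds the fresh buffer (the diagram only says the accumulated text is reset) and that messages are concatenated without a separator.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TurnAccumulator:
    """Buffers ASR text for the current speaker turn (illustrative only)."""
    min_message_chars: int = 3
    current_speaker: Optional[str] = None
    text: str = ""
    last_message_time: float = 0.0

    def add(self, speaker: str, message: str, now: float) -> bool:
        """Add one ASR message; return False if it was filtered out."""
        if len(message) < self.min_message_chars:
            return False  # too short: ignore the message
        self.last_message_time = now
        if speaker != self.current_speaker:
            # Different speaker: switch speaker and start a fresh buffer.
            self.current_speaker = speaker
            self.text = message
        else:
            # Same speaker: keep accumulating.
            self.text += message
        return True

    def reset(self) -> None:
        """Clear the turn state after analysis has run."""
        self.current_speaker = None
        self.text = ""


acc = TurnAccumulator()
acc.add("speaker_a", "hello", now=0.0)
acc.add("speaker_a", " world", now=1.0)
print(acc.current_speaker, repr(acc.text))
```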
This flow can be adapted by modifying the configurable parameters such as minimum message length, text accumulation thresholds, and silence detection durations to suit different conversational speeds or application requirements. The 'Run Analysis' module can be replaced with various AI models (e.g., intent recognition, sentiment analysis, summarization). Speaker identification logic can be enhanced with more sophisticated diarization techniques, and the 'Needs AI'/'No AI' decision can be based on more complex criteria.
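One way to keep the analysis step replaceable is to treat it as a plain callable that maps accumulated text to a boolean. The sketch below is an assumption about how that could be wired, not the original design: `Analyzer`, `keyword_analyzer`, `length_analyzer`, and `run_analysis` are hypothetical names, and the two toy rules stand in for a real model.

```python
from typing import Callable

# An analyzer takes the accumulated text and returns True when AI help is needed.
Analyzer = Callable[[str], bool]


def keyword_analyzer(text: str) -> bool:
    """Toy stand-in for a real model: flags text that looks like a question."""
    return "?" in text or text.lower().startswith(("how", "what", "why"))


def length_analyzer(text: str) -> bool:
    """Another toy rule: long turns are assumed to deserve analysis."""
    return len(text) >= 30


def run_analysis(text: str, analyzer: Analyzer) -> str:
    """Dispatch to whichever analyzer is plugged in and label the outcome."""
    return "needs_ai" if analyzer(text) else "no_ai"


print(run_analysis("How do we fix this", keyword_analyzer))
print(run_analysis("Sounds good to me.", keyword_analyzer))
```

Swapping in intent recognition, sentiment analysis, or summarization then only requires supplying a different callable with the same signature.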