This flowchart illustrates the real-time processing of ASR messages, including speaker tracking, text accumulation, silence detection, and the logic for tr
flowchart TD
Start([ASR message input]) --> CheckLen{Length ≥ 3?}
CheckLen -- No --> Ignore[Ignore message]
CheckLen -- Yes --> UpdateTime[Update last-message time]
UpdateTime --> ExtractSpeaker[Extract speaker info]
ExtractSpeaker --> SameSpeaker{Current speaker<br/>already exists?}
SameSpeaker -- No --> NewSpeaker[Set current speaker<br/>Reset accumulated text]
SameSpeaker -- Yes --> Accumulate[Accumulate text]
Accumulate --> CheckThreshold{"Accumulated chars ≥ minimum (10)?"}
NewSpeaker --> CheckThreshold
CheckThreshold -- No --> Wait[Wait for more audio]
CheckThreshold -- Yes --> StartSilence{Silence detection started?}
StartSilence -- No --> StartTimer[Start silence timer]
StartSilence -- Yes --> CheckSilence{"Silence ≥ threshold (2 s)?"}
StartTimer --> Wait
CheckSilence -- No --> CheckForce{Text ≥ 3x threshold?}
CheckSilence -- Yes --> Trigger[Trigger analysis]
CheckForce -- Yes --> Trigger
CheckForce -- No --> CheckTimeout{Silence ≥ 2x threshold?}
CheckTimeout -- Yes --> Trigger
CheckTimeout -- No --> CheckSilence
Trigger --> RunAnalysis[[Run intelligent analysis]]
RunAnalysis --> CheckResult{Model verdict}
CheckResult -- true --> NeedsAI[Launch the AI think tank]
CheckResult -- false --> NoAI["Ordinary conversation, no AI needed"]
NeedsAI --> Reset1[Reset silence detection]
NoAI --> Reset2[Reset silence detection]
Reset1 --> ResetSpeakerState[Reset state variables]
Reset2 --> ResetSpeakerState
ResetSpeakerState --> Ready[Ready for new messages]
Ready --> Start
Ignore --> Ready
Wait --> Start
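%% Details of the user-configurable parameters
subgraph ConfigArea ["⚙️ User-configurable parameters"]
direction TB
subgraph Basic [Basic parameters]
Config1["Minimum message length: 3 characters<br/>Filters out messages too short to be useful"]
Config2["Accumulation threshold: 10 characters<br/>Silence detection starts once reached"]
end
subgraph Timing [Timing parameters]
Config3["Silence threshold: 2 seconds<br/>First condition that triggers analysis"]
Config4["Force threshold: 3x accumulation<br/>Analysis is forced at 30 characters"]
Config5["Timeout threshold: 4 seconds<br/>Silence timeout triggers automatically"]
end
subgraph Speaker [Speaker parameters]
Config6["Voiceprint recognition<br/>Distinguishes different speakers"]
Config7["Accumulation logic<br/>Accumulate for the same speaker, reset for a different speaker"]
end
end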
style Trigger fill:#ff9999
style RunAnalysis fill:#8B4513
style NeedsAI fill:#FF6B6B
style NoAI fill:#90EE90
style ResetSpeakerState fill:#90EE90
style CheckThreshold fill:#e1f5fe
style CheckSilence fill:#e1f5fe
style CheckForce fill:#e1f5fe
style CheckTimeout fill:#e1f5fe
style SameSpeaker fill:#e1f5fe
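The diagram follows an ASR message from input through a minimum-length check (3 characters), speaker identification, and text accumulation. Text accumulates while the speaker stays the same and is reset when the speaker changes. Once the accumulated text reaches 10 characters, silence detection starts. Analysis is then triggered when silence reaches the 2-second threshold, when the text reaches 30 characters (3x the accumulation threshold), or when silence reaches the 4-second timeout (2x the silence threshold). After the model returns its verdict, silence detection and the state variables are reset whether or not the AI think tank is needed. All thresholds are user-configurable.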
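The decision path in the diagram can be sketched in Python as a small state holder. This is a minimal sketch under stated assumptions, not the system's actual implementation: the class `AsrAccumulator`, its field and method names, and the polling-style `should_trigger(now)` call are all hypothetical, and it assumes that a new speaker's first message seeds the fresh accumulation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AsrAccumulator:
    """Hypothetical state holder mirroring the flowchart's decision nodes."""

    min_message_len: int = 3         # shorter messages are ignored
    accumulate_threshold: int = 10   # chars needed before silence detection starts
    silence_threshold: float = 2.0   # seconds of silence that trigger analysis
    force_multiplier: int = 3        # force trigger at 3x the accumulation threshold
    timeout_multiplier: int = 2      # timeout at 2x the silence threshold

    speaker: Optional[str] = None
    text: str = ""
    last_message_time: float = 0.0
    silence_started: bool = False

    def on_message(self, speaker: str, text: str, now: float) -> None:
        """Handle one ASR message: length check, speaker check, accumulation."""
        if len(text) < self.min_message_len:
            return  # "Ignore message"
        self.last_message_time = now
        if speaker != self.speaker:
            # New speaker: set current speaker and reset the accumulated text.
            # Assumption: the new message itself starts the new accumulation.
            self.speaker = speaker
            self.text = text
        else:
            self.text += text

    def should_trigger(self, now: float) -> bool:
        """Poll the trigger conditions in the order the diagram checks them."""
        if len(self.text) < self.accumulate_threshold:
            return False  # "Wait for more audio"
        if not self.silence_started:
            self.silence_started = True  # "Start silence timer", then wait
            return False
        silence = now - self.last_message_time
        if silence >= self.silence_threshold:
            return True
        if len(self.text) >= self.force_multiplier * self.accumulate_threshold:
            return True
        # Mirrors the diagram's timeout node. Because the timeout is a multiple
        # of the silence threshold, the first check above already covers it.
        return silence >= self.timeout_multiplier * self.silence_threshold

    def reset(self) -> None:
        """Reset silence detection and state variables after analysis."""
        self.speaker = None
        self.text = ""
        self.silence_started = False
```

A caller would feed each ASR message to `on_message`, poll `should_trigger` on a timer, run the analysis when it returns `True`, and then call `reset`.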
Use this diagram when designing, documenting, or explaining real-time voice interaction systems, intelligent assistants, or any application where ASR output needs to be processed dynamically to decide when to invoke an AI model for deeper analysis. It's particularly useful for systems requiring nuanced control over AI triggering based on conversational flow.
This flow can be adapted by adjusting the configurable parameters (message length, accumulation thresholds, silence durations) to suit different conversational speeds or AI response requirements. It can be extended to include sentiment analysis before AI triggering, integrate different speaker diarization methods, or incorporate more complex context-aware AI decision-making.
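As an illustration of adapting the parameters, the thresholds could be grouped into one configuration object so that the derived values (force threshold, timeout) follow the base values automatically. This is only a sketch: `TriggerConfig`, its field names, and the `FAST_PACED` preset values are hypothetical and chosen for illustration. Only the defaults (3, 10, 2 seconds, 3x, 2x) come from the diagram.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TriggerConfig:
    """Hypothetical grouping of the diagram's user-configurable parameters."""

    min_message_len: int = 3          # minimum message length, in characters
    accumulate_threshold: int = 10    # characters before silence detection starts
    silence_threshold_s: float = 2.0  # silence that triggers analysis, in seconds

    @property
    def force_threshold(self) -> int:
        # The diagram forces analysis at 3x the accumulation threshold.
        return 3 * self.accumulate_threshold

    @property
    def timeout_s(self) -> float:
        # The diagram's timeout is 2x the silence threshold.
        return 2 * self.silence_threshold_s


DEFAULT = TriggerConfig()

# Illustrative preset for faster conversations; these values are assumptions.
FAST_PACED = TriggerConfig(accumulate_threshold=6, silence_threshold_s=1.0)

print(DEFAULT.force_threshold, DEFAULT.timeout_s)
print(FAST_PACED.force_threshold, FAST_PACED.timeout_s)
```

Deriving the force and timeout values from the base thresholds keeps the 3x and 2x relationships intact when a base value is tuned.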