Illustrates the core data flow of a voice chat assistant, covering user input, ASR, intent classification, context pooling, knowledge retrieval, and response generation.
sequenceDiagram
participant U as User
participant F as Frontend
participant S as Server API
participant A as Agent A
participant Pool as Context Pool<br/>(Redis)
participant Bus as Message Bus
participant B as Agent B
participant C as Agent C
participant JSON as Knowledge Base<br/>(JSON)
participant D as Agent D
participant DB as MongoDB
U->>F: Open /chat?merchant=dongli&userId=uuid123&mode=voice
F->>S: POST /api/user-enter
S->>D: System notification: user entered
D->>DB: Write: user-entry event
U->>F: Voice input (push-to-talk)
F->>S: POST /api/process-input (audio)
S->>A: processInput(uuid123, audio)
A->>A: ASR speech-to-text + intent classification
A->>Pool: addTurn(user question)
A->>Bus: publish(A→B)
Bus->>B: Notify B of a new task
B->>Pool: getRecentTurns(uuid123, 5 turns)
alt Cache hit
B->>Pool: findSimilarAnswer()
Pool-->>B: Return cached answer
B->>B: Polish reply + TTS
else Cache miss
B->>Bus: publish(B→C)
Bus->>C: Notify C to search
C->>JSON: Search knowledge base
C->>Pool: Query context (when multiple results)
C->>Bus: publish(C→B, results)
Bus->>B: Deliver results to B
B->>B: Generate reply + TTS
end
B->>Pool: addTurn(assistant reply)
B->>Bus: publish(B→USER)
Bus->>S: responseStore.save(traceId)
D->>DB: Write: full flow log
F->>S: GET /api/poll-response?traceId=xxx
S-->>F: Return reply + audioBase64
F->>U: Play audio + show text
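This sequence diagram details the end-to-end interaction for a voice chat assistant. It starts when the user opens the chat page. The frontend calls POST /api/user-enter, and the Server API notifies Agent D, which records a user-entry event in MongoDB. When the user speaks (push-to-talk), the frontend posts the audio to /api/process-input. Agent A performs Automatic Speech Recognition (ASR) and intent classification, appends the user's question to the Redis-backed Context Pool, and publishes a task for Agent B on the message bus. Agent B loads the five most recent turns from the pool and looks for a similar, previously answered question. On a cache hit, it polishes the cached answer and synthesizes speech (TTS). On a cache miss, it asks Agent C, via the bus, to search the JSON knowledge base (C consults the Context Pool when several results match), then generates the reply and TTS audio from C's results. Either way, Agent B appends its reply to the Context Pool and publishes it to the bus, which saves it in the server's response store under the request's traceId. Agent D writes a log of the complete flow to MongoDB. Meanwhile the frontend polls GET /api/poll-response with the traceId, receives the reply text and base64-encoded audio, and plays the audio while displaying the text.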
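The routing through the bus (A→B, then either B→USER on a cache hit or B→C→B on a miss, then poll by traceId) can be sketched as a small in-process program. This is a minimal Python sketch built on several assumptions. The bus is a synchronous publish/subscribe dispatcher rather than a real broker, and ASR/TTS are omitted, so Agent A receives plain text. An exact-match dictionary stands in for findSimilarAnswer(). Agent C takes the first keyword match instead of consulting the Context Pool. The knowledge-base entries, the `Assistant` class, and all method and topic names are hypothetical.

```python
import uuid
from collections import defaultdict
from typing import Callable, Dict, List, Optional

# Hypothetical stand-in for the JSON knowledge base searched by Agent C.
KNOWLEDGE_BASE = {
    "opening hours": "We are open 9am to 9pm.",
    "delivery": "We deliver within the city.",
}
FALLBACK = "Sorry, I could not find an answer to that."


class Bus:
    """Synchronous in-process pub/sub standing in for the diagram's Bus."""

    def __init__(self) -> None:
        self._handlers: Dict[str, List[Callable[[dict], None]]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Callable[[dict], None]) -> None:
        self._handlers[topic].append(handler)

    def publish(self, topic: str, message: dict) -> None:
        # Deliver the message to every handler subscribed to this topic.
        for handler in self._handlers[topic]:
            handler(message)


class Assistant:
    """Hypothetical wiring of Agents B and C and the response store."""

    def __init__(self) -> None:
        self.bus = Bus()
        self.answer_cache: Dict[str, str] = {}     # exact-match stand-in for findSimilarAnswer()
        self.response_store: Dict[str, dict] = {}  # replies keyed by traceId
        self.bus.subscribe("A->B", self.agent_b_on_question)
        self.bus.subscribe("B->C", self.agent_c_on_search)
        self.bus.subscribe("C->B", self.agent_b_on_results)
        self.bus.subscribe("B->USER", self.save_response)

    def process_input(self, user_id: str, text: str) -> str:
        """Agent A's hand-off (ASR omitted): assign a traceId, publish A->B."""
        trace_id = uuid.uuid4().hex
        self.bus.publish("A->B", {"trace_id": trace_id, "user_id": user_id, "question": text})
        return trace_id

    @staticmethod
    def _cache_key(question: str) -> str:
        return question.strip().lower()

    def agent_b_on_question(self, msg: dict) -> None:
        cached = self.answer_cache.get(self._cache_key(msg["question"]))
        if cached is not None:
            # Cache hit: reply directly without involving Agent C.
            self.bus.publish("B->USER", {**msg, "reply": cached, "cached": True})
        else:
            # Cache miss: ask Agent C to search the knowledge base.
            self.bus.publish("B->C", msg)

    def agent_c_on_search(self, msg: dict) -> None:
        question = msg["question"].lower()
        results = [answer for keyword, answer in KNOWLEDGE_BASE.items() if keyword in question]
        self.bus.publish("C->B", {**msg, "results": results})

    def agent_b_on_results(self, msg: dict) -> None:
        if msg["results"]:
            # Take the first match; in the diagram, C consults the pool when several match.
            reply = msg["results"][0]
            self.answer_cache[self._cache_key(msg["question"])] = reply
        else:
            reply = FALLBACK
        self.bus.publish("B->USER", {**msg, "reply": reply, "cached": False})

    def save_response(self, msg: dict) -> None:
        # Mirrors responseStore.save(traceId): park the reply until it is polled.
        self.response_store[msg["trace_id"]] = {"reply": msg["reply"], "cached": msg["cached"]}

    def poll_response(self, trace_id: str) -> Optional[dict]:
        """What GET /api/poll-response might return; None means not ready yet."""
        return self.response_store.get(trace_id)


app = Assistant()
first = app.process_input("uuid123", "What are your opening hours?")
print(app.poll_response(first))   # {'reply': 'We are open 9am to 9pm.', 'cached': False}
second = app.process_input("uuid123", "what are your opening hours?")
print(app.poll_response(second))  # {'reply': 'We are open 9am to 9pm.', 'cached': True}
```

Because this bus delivers messages synchronously, the reply is already stored when `process_input` returns. With a real asynchronous bus, `poll_response` would return `None` until Agent B publishes, which is why the frontend polls.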
This pattern is useful for designing conversational AI systems, voice assistants, chatbots, or customer service automation platforms that require real-time voice processing, context management, knowledge retrieval, and asynchronous communication between microservices or agents. It's particularly relevant for systems needing to maintain conversation history and leverage cached responses.
This diagram can be adapted by adding agents for specific tasks (e.g., sentiment analysis or transaction processing), using different knowledge sources (e.g., external APIs or a retrieval-augmented generation (RAG) pipeline), implementing alternative caching strategies, or replacing the message bus with a durable message queue for more robust asynchronous communication. The Context Pool could be extended with more sophisticated session management, and the ASR and TTS components could be swapped for other providers.
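As a starting point for extending the Context Pool, the sketch below models its three operations from the diagram (addTurn, getRecentTurns, findSimilarAnswer) in memory. It assumes a bounded per-user history. It uses `difflib` string similarity with a hypothetical 0.8 threshold, because the diagram does not say how findSimilarAnswer measures similarity. The class and method names are illustrative, not the original implementation. A Redis-backed version would typically keep each user's turns in a list: append with RPUSH, bound it with LTRIM, and read the last five with `LRANGE key -5 -1`.

```python
from collections import defaultdict, deque
from difflib import SequenceMatcher
from typing import Deque, Dict, List, Optional


class ContextPool:
    """In-memory stand-in for the Redis-backed Context Pool (hypothetical API)."""

    def __init__(self, max_turns: int = 50) -> None:
        # One bounded history per user: once max_turns is reached, appending
        # a new turn silently drops the oldest one.
        self._turns: Dict[str, Deque[dict]] = defaultdict(lambda: deque(maxlen=max_turns))

    def add_turn(self, user_id: str, role: str, text: str) -> None:
        """Append one turn; mirrors addTurn in the diagram."""
        self._turns[user_id].append({"role": role, "text": text})

    def get_recent_turns(self, user_id: str, n: int = 5) -> List[dict]:
        """Return up to the last n turns, oldest first; mirrors getRecentTurns."""
        history = list(self._turns.get(user_id, ()))
        return history[-n:] if n > 0 else []

    def find_similar_answer(self, user_id: str, question: str,
                            threshold: float = 0.8) -> Optional[str]:
        """Return the assistant reply to the most similar earlier user question.

        Similarity is difflib's ratio in [0, 1]; 0.8 is an arbitrary example threshold.
        """
        history = list(self._turns.get(user_id, ()))
        best_answer: Optional[str] = None
        best_score = threshold
        # Only user turns directly followed by an assistant turn count as answered,
        # so the current, still-unanswered question never matches itself.
        for prev, nxt in zip(history, history[1:]):
            if prev["role"] == "user" and nxt["role"] == "assistant":
                score = SequenceMatcher(None, prev["text"], question).ratio()
                if score >= best_score:
                    best_answer, best_score = nxt["text"], score
        return best_answer


pool = ContextPool()
pool.add_turn("uuid123", "user", "What are your opening hours?")
pool.add_turn("uuid123", "assistant", "We are open 9am to 9pm.")
pool.add_turn("uuid123", "user", "What are your opening hours today?")
print(pool.get_recent_turns("uuid123", 2))
print(pool.find_similar_answer("uuid123", "What are your opening hours today?"))  # We are open 9am to 9pm.
```

Richer session management, such as per-session expiry or cross-user answer caching, would slot in behind the same three methods.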