This flowchart illustrates the process of handling multi-turn conversations with a Large Language Model, focusing on history management, prompt rendering,
%%{init: {"theme": "base", "themeVariables": {"background": "#ffffff", "primaryColor": "#eef6ff", "primaryBorderColor": "#60a5fa", "primaryTextColor": "#0f172a", "lineColor": "#64748b"}}}%%
flowchart TB
U["user_message"] --> H1["追加到 conversations 历史"]
H1 --> R["_render_prompt() / build_generation_prompt()<br>渲染 system / user / assistant 历史"]
R --> E["encode → input_ids"]
E --> C{"是否超出 context_length?"}
C -- 是 --> CL["裁剪最早历史轮次<br>保留最近上下文"]
C -- 否 --> G["model.generate()"]
CL --> G
G --> D["decode → assistant_text"]
D --> S["sanitize<br>清理 surrogate · 条件性追加 EOS"]
S --> H2["写回 conversations 历史"]
H2 --> N["下一轮继续"]
classDef hist fill:#eff6ff,stroke:#60a5fa,color:#0f172a;
classDef prompt fill:#f8fafc,stroke:#94a3b8,color:#0f172a;
classDef run fill:#f0fdf4,stroke:#22c55e,color:#0f172a;
classDef risk fill:#fff7ed,stroke:#fb923c,color:#0f172a;
class U,H1,H2,N hist;
class R,E prompt;
class G,D,S,CL run;
class C risk;
The diagram details the steps involved in processing a user message in a multi-turn LLM conversation. It starts by appending the user message to the conversation history, then renders a complete prompt from system, user, and assistant history. This prompt is encoded into input IDs, followed by a check for context length overflow. If overflow occurs, the earliest conversation history is truncated to retain recent context. The model then generates a response, which is decoded into assistant text. Finally, the response undergoes sanitization (e.g., cleaning surrogate characters, adding EOS tokens) before being written back to the conversation history, preparing for the next turn.
Useful for designing and implementing conversational AI systems, chatbots, or any application involving multi-turn interactions with large language models where managing conversation history and context window is crucial. It's particularly relevant for ensuring stable and coherent long-running dialogues.
This flow can be adapted by implementing different context truncation strategies (e.g., summarization, importance-based pruning), integrating external knowledge bases, or adding more sophisticated response validation and error handling. Custom sanitization rules can be developed for specific application needs. The prompt rendering logic can be extended to support various prompt engineering techniques.