This diagram illustrates the process of handling multi-turn conversations for a language model, including appending user messages, rendering prompts, encod
%%{init: {"theme": "base", "themeVariables": {"background": "#ffffff", "primaryColor": "#eef6ff", "primaryBorderColor": "#60a5fa", "primaryTextColor": "#0f172a", "lineColor": "#64748b"}}}%%
flowchart TB
U["user_message"] --> H1["追加到 conversations 历史"]
H1 --> R["_render_prompt() / build_generation_prompt()<br>渲染 system / user / assistant 历史"]
R --> E["encode → input_ids"]
E --> C{"是否超出 context_length?"}
C -- 是 --> CL["裁剪最早历史轮次<br>保留最近上下文"]
C -- 否 --> G["model.generate()"]
CL --> G
G --> D["decode → assistant_text"]
D --> S["sanitize<br>清理 surrogate · 条件性追加 EOS"]
S --> H2["写回 conversations 历史"]
H2 --> N["下一轮继续"]
classDef hist fill:#eff6ff,stroke:#60a5fa,color:#0f172a;
classDef prompt fill:#f8fafc,stroke:#94a3b8,color:#0f172a;
classDef run fill:#f0fdf4,stroke:#22c55e,color:#0f172a;
classDef risk fill:#fff7ed,stroke:#fb923c,color:#0f172a;
class U,H1,H2,N hist;
class R,E prompt;
class G,D,S,CL run;
class C risk;
It details the lifecycle of a user message in a multi-turn conversation, from being appended to history, rendered into a prompt, encoded, checked against context length (with truncation if needed), generated by the model, decoded, sanitized, and then stored back into history for subsequent turns. It highlights context management and prompt engineering aspects.
This diagram is useful when designing or understanding the conversational flow for AI chatbots, especially those based on large language models, where managing conversation history, context window, and prompt construction is critical for coherent multi-turn interactions. It's also relevant for debugging issues related to long conversations or token limits.
This flow can be adapted by changing the prompt rendering strategy, implementing different context truncation policies (e.g., summarization instead of simple truncation), integrating different sanitization rules, or adding steps for persona management or external tool calls within the conversation loop. The 'model.generate()' step can be replaced with specific API calls or custom inference logic.