Multi-turn LLM Conversation Flow with Context Management

ML & AI · flowchart diagram · MIT

This flowchart illustrates the process of handling multi-turn conversations with a Large Language Model, focusing on history management, prompt rendering,

Source: https://github.com/jiaran-king/MicroLM/blob/782ae02f10c14b484a317f22115a066b3b10b91d/Readme/%E9%A1%B9%E7%9B%AE%E5%85%A8%E6%99%AF%E5%9B%BE/00-%E5%85%A8%E6%B5%81%E7%A8%8B%E5%88%86%E6%9E%90%EF%BC%88%E8%AE%AD%E7%BB%83%E3%80%81%E6%8E%A8%E7%90%86%E3%80%81%E8%AF%84%E6%B5%8B%E4%B8%8E%E9%83%A8%E7%BD%B2%EF%BC%89.md
Curated by jiaran-king
LLM Chatbot Conversational AI Prompt Management Context Truncation State Management AI Workflow

Mermaid source

%%{init: {"theme": "base", "themeVariables": {"background": "#ffffff", "primaryColor": "#eef6ff", "primaryBorderColor": "#60a5fa", "primaryTextColor": "#0f172a", "lineColor": "#64748b"}}}%%
flowchart TB
    U["user_message"] --> H1["追加到 conversations 历史"]
    H1 --> R["_render_prompt() / build_generation_prompt()<br>渲染 system / user / assistant 历史"]
    R --> E["encode → input_ids"]
    E --> C{"是否超出 context_length?"}
    C -- 是 --> CL["裁剪最早历史轮次<br>保留最近上下文"]
    C -- 否 --> G["model.generate()"]
    CL --> G
    G --> D["decode → assistant_text"]
    D --> S["sanitize<br>清理 surrogate · 条件性追加 EOS"]
    S --> H2["写回 conversations 历史"]
    H2 --> N["下一轮继续"]

    classDef hist fill:#eff6ff,stroke:#60a5fa,color:#0f172a;
    classDef prompt fill:#f8fafc,stroke:#94a3b8,color:#0f172a;
    classDef run fill:#f0fdf4,stroke:#22c55e,color:#0f172a;
    classDef risk fill:#fff7ed,stroke:#fb923c,color:#0f172a;
    class U,H1,H2,N hist;
    class R,E prompt;
    class G,D,S,CL run;
    class C risk;

What this diagram shows

The diagram details the steps involved in processing a user message in a multi-turn LLM conversation. It starts by appending the user message to the conversation history, then renders a complete prompt from system, user, and assistant history. This prompt is encoded into input IDs, followed by a check for context length overflow. If overflow occurs, the earliest conversation history is truncated to retain recent context. The model then generates a response, which is decoded into assistant text. Finally, the response undergoes sanitization (e.g., cleaning surrogate characters, adding EOS tokens) before being written back to the conversation history, preparing for the next turn.

When to use it

Useful for designing and implementing conversational AI systems, chatbots, or any application involving multi-turn interactions with large language models where managing conversation history and context window is crucial. It's particularly relevant for ensuring stable and coherent long-running dialogues.

How to adapt it for your project

This flow can be adapted by implementing different context truncation strategies (e.g., summarization, importance-based pruning), integrating external knowledge bases, or adding more sophisticated response validation and error handling. Custom sanitization rules can be developed for specific application needs. The prompt rendering logic can be extended to support various prompt engineering techniques.

Key concepts

Multi-turn Conversation
Context Window Management
Prompt Engineering
Tokenization
Conversation History