LLM Single-Turn Text Generation Inference Path

ML & AI · flowchart diagram · MIT

This flowchart illustrates the end-to-end inference path for single-turn text generation in a Large Language Model, from prompt input to final output text.

Source: https://github.com/jiaran-king/MicroLM/blob/782ae02f10c14b484a317f22115a066b3b10b91d/Readme/%E9%A1%B9%E7%9B%AE%E5%85%A8%E6%99%AF%E5%9B%BE/00-%E5%85%A8%E6%B5%81%E7%A8%8B%E5%88%86%E6%9E%90%EF%BC%88%E8%AE%AD%E7%BB%83%E3%80%81%E6%8E%A8%E7%90%86%E3%80%81%E8%AF%84%E6%B5%8B%E4%B8%8E%E9%83%A8%E7%BD%B2%EF%BC%89.md
Curated by jiaran-king
LLM · Inference · Text Generation · Tokenizer · Sampling · Deep Learning · NLP

Mermaid source

%%{init: {"theme": "base", "themeVariables": {"background": "#ffffff", "primaryColor": "#eef6ff", "primaryBorderColor": "#60a5fa", "primaryTextColor": "#0f172a", "lineColor": "#64748b"}}}%%
flowchart TB
    P["prompt: str"] --> R["resolve_generation_prompt()<br>plain text or chat prompt"]
    R --> T["tokenizer.encode()<br>BPE or HF tokenizer"]
    T --> I["prompt_ids<br>list[int] → tensor(1, seq_len)"]
    I --> G["model.generate()<br>prefill + decode loop"]
    G --> S["sampling<br>temperature · top-p · EOS"]
    S --> O["generated_ids"]
    O --> D["tokenizer.decode()"]
    D --> Y["output_text"]

    classDef step fill:#eff6ff,stroke:#60a5fa,color:#0f172a;
    classDef sample fill:#fff7ed,stroke:#fb923c,color:#0f172a;
    classDef out fill:#f0fdf4,stroke:#22c55e,color:#0f172a;
    class P,R,T,I,G,D step;
    class S sample;
    class O,Y out;

What this diagram shows

The diagram traces the sequential steps of single-turn text generation with a Large Language Model. A raw prompt is first resolved into the right format (plain text or chat), encoded into token IDs, and passed to the model as a tensor of shape (1, seq_len). The model runs a prefill pass over the prompt and then a decode loop. At each decode step the next token is sampled under the temperature and top-p settings, and an End-Of-Sequence (EOS) token ends generation. Finally, the generated IDs are decoded back into human-readable output text.
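The sampling box can be illustrated with a minimal, dependency-free sketch. It is not MicroLM's implementation: `sample_next_token`, `generate`, and `toy_next_logits` are hypothetical names, the "model" is a stand-in function that returns a list of logits, and treating a non-positive temperature as greedy decoding is an assumption of this sketch.

```python
import math
import random


def sample_next_token(logits, temperature=1.0, top_p=1.0, rng=random):
    """Pick one token id from raw logits using temperature and top-p sampling."""
    if temperature <= 0:
        # Assumption of this sketch: non-positive temperature means greedy decoding.
        return max(range(len(logits)), key=lambda i: logits[i])

    # Temperature scaling followed by a numerically stable softmax.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Top-p: keep the smallest set of most likely tokens whose mass reaches top_p.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = [], 0.0
    for i in order:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break

    # random.choices renormalizes the weights of the kept tokens.
    return rng.choices(kept, weights=[probs[i] for i in kept], k=1)[0]


def generate(next_logits, prompt_ids, max_new_tokens, eos_id, **sampling):
    """Decode loop: get logits, sample one token, stop at EOS or the length cap."""
    ids = list(prompt_ids)
    generated = []
    for _ in range(max_new_tokens):
        token = sample_next_token(next_logits(ids), **sampling)
        if token == eos_id:
            break
        ids.append(token)
        generated.append(token)
    return generated


def toy_next_logits(ids):
    """Stand-in model over a 4-token vocabulary: strongly prefers (last id + 1) % 4."""
    nxt = (ids[-1] + 1) % 4
    return [5.0 if i == nxt else 0.0 for i in range(4)]


generated_ids = generate(toy_next_logits, [0], max_new_tokens=10, eos_id=3, temperature=0)
```

A real model would replace `toy_next_logits` with a forward pass that reuses a key-value cache after prefill; the sketch recomputes from the full id list at every step only to stay short.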

When to use it

Use this diagram to understand or explain the inference pipeline of a text generation model, debug generation issues, or design custom text generation applications. It's particularly useful when discussing the interplay between tokenization, model generation, and sampling strategies.

How to adapt it for your project

This flow can be adapted by swapping in a different tokenizer (e.g., SentencePiece), changing the decoding strategy (e.g., adding top-k filtering or replacing sampling with beam search), adding post-processing steps for the output text, or extending it to multi-turn conversations by placing a history management component before prompt resolution.
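As one illustration of the multi-turn extension, a history component could sit in front of prompt resolution and turn past messages into a single prompt string. The sketch below is hypothetical: the `ChatHistory` class, the `<user>` / `<assistant>` tags, and the keep-the-last-N-turns policy are assumptions, not the chat template used by the source project.

```python
from dataclasses import dataclass, field


@dataclass
class ChatHistory:
    """Keeps recent (role, text) messages and renders them as one prompt string."""

    max_turns: int = 4  # user/assistant pairs to retain; assumed to be >= 1
    turns: list = field(default_factory=list)

    def add(self, role, text):
        self.turns.append((role, text))
        # Drop the oldest messages so at most max_turns pairs remain.
        self.turns = self.turns[-2 * self.max_turns:]

    def to_prompt(self, user_message):
        # Hypothetical tag format; a real model needs its own chat template.
        lines = [f"<{role}>{text}" for role, text in self.turns]
        lines.append(f"<user>{user_message}")
        lines.append("<assistant>")
        return "\n".join(lines)


history = ChatHistory(max_turns=1)
history.add("user", "hi")
history.add("assistant", "hello")
history.add("user", "a")
history.add("assistant", "b")
prompt = history.to_prompt("c")
```

The resulting string would then enter the flow at the `prompt: str` node. Truncating by turn count is the simplest policy; truncating by token count after encoding is a common alternative when the context window is the real limit.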

Key concepts