MicroLM Model Forward Pass

ML & AI · flowchart diagram · MIT

Illustrates the forward pass architecture of a micro-sized Large Language Model (LLM), detailing its core components from input to output logits.

Source: https://github.com/jiaran-king/MicroLM/blob/782ae02f10c14b484a317f22115a066b3b10b91d/Readme/%E9%A1%B9%E7%9B%AE%E5%85%A8%E6%99%AF%E5%9B%BE/00-%E5%85%A8%E6%B5%81%E7%A8%8B%E5%88%86%E6%9E%90%EF%BC%88%E8%AE%AD%E7%BB%83%E3%80%81%E6%8E%A8%E7%90%86%E3%80%81%E8%AF%84%E6%B5%8B%E4%B8%8E%E9%83%A8%E7%BD%B2%EF%BC%89.md
Curated by jiaran-king
LLM, Transformer, Deep Learning, Neural Network, Forward Pass, AI Model, Machine Learning

Mermaid source

%%{init: {"theme": "base", "themeVariables": {"background": "#ffffff", "primaryColor": "#f8fafc", "primaryBorderColor": "#94a3b8", "primaryTextColor": "#0f172a", "lineColor": "#64748b"}}}%%
flowchart TB
    M1["input_ids"] --> M2["Embedding"]
    M2 --> M3["8 × TransformerBlock"]
    M3 --> M4["RMSNorm → Attention → Residual"]
    M4 --> M5["RMSNorm → SwiGLU FFN → Residual"]
    M5 --> M6["Final RMSNorm"]
    M6 --> M7["lm_head"]
    M7 --> M8["next-token logits"]

    classDef model fill:#f8fafc,stroke:#94a3b8,color:#0f172a;
    classDef attn fill:#f0fdf4,stroke:#22c55e,color:#0f172a;
    classDef ffn fill:#fff7ed,stroke:#fb923c,color:#0f172a;
    classDef out fill:#fdf4ff,stroke:#d946ef,color:#0f172a;
    class M1,M2,M3,M6,M7 model;
    class M4 attn;
    class M5 ffn;
    class M8 out;

What this diagram shows

This flowchart depicts the complete forward pass of the MicroLM model, a micro-sized Large Language Model. The input_ids enter an Embedding layer, followed by a stack of 8 TransformerBlocks. The two nodes after the stack describe what happens inside each block: RMSNorm followed by Attention with a residual connection, then RMSNorm followed by a SwiGLU FFN with a residual connection. The output of the last block passes through a final RMSNorm and an lm_head to produce next-token logits.
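The data flow in the diagram can be sketched in plain Python. This is a minimal illustration, not MicroLM's actual implementation: it assumes single-head causal attention, no positional encoding, RMSNorm without a learned gain, and random toy weights. The helper names (`init_model`, `forward`, `causal_attention`, `swiglu_ffn`) and every dimension other than the 8 blocks are hypothetical.

```python
import math
import random


def matvec(W, x):
    """Multiply matrix W (rows x cols) by vector x (length cols)."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def add(a, b):
    """Element-wise sum, used for the residual connections."""
    return [p + q for p, q in zip(a, b)]


def rms_norm(x, eps=1e-6):
    """RMSNorm without a learned gain (assumption): x / sqrt(mean(x^2) + eps)."""
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [v / rms for v in x]


def softmax(scores):
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def silu(v):
    return v / (1.0 + math.exp(-v))


def causal_attention(xs, blk):
    """Single-head causal self-attention (assumption: one head, no positions)."""
    d = len(xs[0])
    q = [matvec(blk["Wq"], x) for x in xs]
    k = [matvec(blk["Wk"], x) for x in xs]
    v = [matvec(blk["Wv"], x) for x in xs]
    out = []
    for t in range(len(xs)):
        # Position t only attends to positions 0..t.
        scores = [sum(a * b for a, b in zip(q[t], k[s])) / math.sqrt(d)
                  for s in range(t + 1)]
        weights = softmax(scores)
        mixed = [sum(w * v[s][i] for s, w in enumerate(weights)) for i in range(d)]
        out.append(matvec(blk["Wo"], mixed))
    return out


def swiglu_ffn(x, blk):
    """SwiGLU: W_down @ (silu(W_gate @ x) * (W_up @ x))."""
    gate = [silu(g) for g in matvec(blk["W_gate"], x)]
    up = matvec(blk["W_up"], x)
    return matvec(blk["W_down"], [g * u for g, u in zip(gate, up)])


def init_model(vocab=16, d_model=8, d_ff=16, n_layers=8, seed=0):
    """Random toy weights; every size is hypothetical except n_layers=8."""
    rng = random.Random(seed)

    def mat(rows, cols):
        return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]

    blocks = [
        {"Wq": mat(d_model, d_model), "Wk": mat(d_model, d_model),
         "Wv": mat(d_model, d_model), "Wo": mat(d_model, d_model),
         "W_gate": mat(d_ff, d_model), "W_up": mat(d_ff, d_model),
         "W_down": mat(d_model, d_ff)}
        for _ in range(n_layers)
    ]
    return {"embedding": mat(vocab, d_model), "blocks": blocks,
            "lm_head": mat(vocab, d_model)}


def forward(model, input_ids):
    """input_ids -> Embedding -> N blocks -> final RMSNorm -> lm_head -> logits."""
    xs = [model["embedding"][i] for i in input_ids]
    for blk in model["blocks"]:
        # RMSNorm -> Attention -> Residual
        attn = causal_attention([rms_norm(x) for x in xs], blk)
        xs = [add(x, a) for x, a in zip(xs, attn)]
        # RMSNorm -> SwiGLU FFN -> Residual
        xs = [add(x, swiglu_ffn(rms_norm(x), blk)) for x in xs]
    xs = [rms_norm(x) for x in xs]
    return [matvec(model["lm_head"], x) for x in xs]


model = init_model()
logits = forward(model, [3, 1, 4])  # one logit vector of length vocab per position
```

Each entry of `logits` scores every vocabulary token as the candidate next token for that position. Because the attention in this sketch is causal, the logits at a position depend only on the tokens at or before it.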

When to use it

Use this diagram to understand the sequential data flow and architectural components of a Transformer-based LLM. It is well suited to explaining how input tokens pass through embedding, attention, and feed-forward layers to produce output predictions, and it is useful for teaching, architectural reviews, and debugging model inference.

How to adapt it for your project

To represent other Transformer variants, change the number of TransformerBlocks, swap the normalization layer (e.g., LayerNorm instead of RMSNorm), or replace the FFN activation (e.g., GELU instead of SwiGLU). You could also expand individual blocks to show the internals of the attention mechanism, or add pre- and post-processing steps.
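One way to make these substitutions repeatable is to generate the Mermaid source from a few parameters. The sketch below is a hypothetical helper (`build_forward_pass_mermaid` is not part of MicroLM or Mermaid) that reproduces the node chain of the diagram above and omits the theme and classDef styling for brevity.

```python
def build_forward_pass_mermaid(n_blocks=8, norm="RMSNorm", ffn="SwiGLU FFN"):
    """Return Mermaid flowchart source mirroring the diagram's node chain."""
    labels = [
        "input_ids",
        "Embedding",
        f"{n_blocks} × TransformerBlock",
        f"{norm} → Attention → Residual",
        f"{norm} → {ffn} → Residual",
        f"Final {norm}",
        "lm_head",
        "next-token logits",
    ]
    lines = ["flowchart TB"]
    for i in range(len(labels) - 1):
        src, dst = i + 1, i + 2
        # Only the first edge declares its source label; later edges reuse
        # the already-declared node ID, as in the original diagram.
        left = f'M{src}["{labels[i]}"]' if i == 0 else f"M{src}"
        lines.append(f'    {left} --> M{dst}["{labels[i + 1]}"]')
    return "\n".join(lines)


if __name__ == "__main__":
    # Example: a 12-block variant with LayerNorm and a GELU feed-forward network.
    print(build_forward_pass_mermaid(n_blocks=12, norm="LayerNorm", ffn="GELU FFN"))
```

Called with its defaults, the function emits the same eight nodes and seven edges as the original source, so the styling lines can be appended unchanged.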

Key concepts