MicroLM Model Forward Pass

ML & AI · flowchart diagram · MIT

Illustrates the forward-pass architecture of MicroLM, a compact Transformer-based language model, tracing the flow from input_ids to next-token logits.

Source: https://github.com/jiaran-king/MicroLM/blob/782ae02f10c14b484a317f22115a066b3b10b91d/Readme/%E9%A1%B9%E7%9B%AE%E5%85%A8%E6%99%AF%E5%9B%BE/00-%E5%85%A8%E6%B5%81%E7%A8%8B%E5%88%86%E6%9E%90%EF%BC%88%E8%AE%AD%E7%BB%83%E3%80%81%E6%8E%A8%E7%90%86%E3%80%81%E8%AF%84%E6%B5%8B%E4%B8%8E%E9%83%A8%E7%BD%B2%EF%BC%89.md
Curated by jiaran-king
Transformer · LLM · Neural Network · Deep Learning · Model Architecture · Forward Pass · MicroLM

Mermaid source

%%{init: {"theme": "base", "themeVariables": {"background": "#ffffff", "primaryColor": "#f8fafc", "primaryBorderColor": "#94a3b8", "primaryTextColor": "#0f172a", "lineColor": "#64748b"}}}%%
flowchart TB
    M1["input_ids"] --> M2["Embedding"]
    M2 --> M3["8 × TransformerBlock"]
    M3 --> M4["RMSNorm → Attention → Residual"]
    M4 --> M5["RMSNorm → SwiGLU FFN → Residual"]
    M5 --> M6["Final RMSNorm"]
    M6 --> M7["lm_head"]
    M7 --> M8["next-token logits"]

    classDef model fill:#f8fafc,stroke:#94a3b8,color:#0f172a;
    classDef attn fill:#f0fdf4,stroke:#22c55e,color:#0f172a;
    classDef ffn fill:#fff7ed,stroke:#fb923c,color:#0f172a;
    classDef out fill:#fdf4ff,stroke:#d946ef,color:#0f172a;
    class M1,M2,M3,M6,M7 model;
    class M4 attn;
    class M5 ffn;
    class M8 out;

What this diagram shows

This flowchart depicts the sequential data flow through MicroLM during a forward pass. The input_ids are mapped to vectors by an Embedding layer, and the result passes through a stack of eight TransformerBlock layers. The two nodes that follow "8 × TransformerBlock" expand the contents of a single block rather than adding further stages: each block applies RMSNorm, then Attention, and adds the result back through a residual connection, then repeats the pattern with RMSNorm, a SwiGLU FFN, and a second residual connection. After the last block, a Final RMSNorm feeds lm_head, which produces the next-token logits.
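The flow in the diagram can be sketched in plain Python to make the order of operations concrete. This is an illustrative sketch, not MicroLM's actual code: the single attention head, the causal mask, the absence of positional encoding, the toy sizes, and every function and parameter name (rms_norm, swiglu_ffn, init_params, and so on) are assumptions, since the diagram specifies only the sequence of layers.

```python
import math
import random

def rms_norm(x, weight, eps=1e-6):
    # Scale the vector by the reciprocal of its root-mean-square, then by a learned gain.
    scale = 1.0 / math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [v * scale * w for v, w in zip(x, weight)]

def matvec(W, x):
    # W is a list of rows; returns the matrix-vector product W @ x.
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def add(a, b):
    return [u + v for u, v in zip(a, b)]

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def causal_attention(xs, p):
    # Single-head self-attention (assumed); position t attends to positions 0..t only.
    qs = [matvec(p["Wq"], x) for x in xs]
    ks = [matvec(p["Wk"], x) for x in xs]
    vs = [matvec(p["Wv"], x) for x in xs]
    d = len(qs[0])
    out = []
    for t, q in enumerate(qs):
        scores = [sum(a * b for a, b in zip(q, ks[s])) / math.sqrt(d) for s in range(t + 1)]
        probs = softmax(scores)
        ctx = [sum(pr * vs[s][i] for s, pr in enumerate(probs)) for i in range(d)]
        out.append(matvec(p["Wo"], ctx))
    return out

def silu(z):
    return z / (1.0 + math.exp(-z))

def swiglu_ffn(x, p):
    # SwiGLU: down(silu(gate(x)) * up(x)).
    gate = [silu(g) for g in matvec(p["W_gate"], x)]
    up = matvec(p["W_up"], x)
    return matvec(p["W_down"], [g * u for g, u in zip(gate, up)])

def transformer_block(xs, p):
    # RMSNorm -> Attention -> Residual
    attn = causal_attention([rms_norm(x, p["norm1"]) for x in xs], p)
    xs = [add(x, a) for x, a in zip(xs, attn)]
    # RMSNorm -> SwiGLU FFN -> Residual
    return [add(x, swiglu_ffn(rms_norm(x, p["norm2"]), p)) for x in xs]

def forward(input_ids, params):
    xs = [params["embedding"][i] for i in input_ids]       # Embedding
    for block in params["blocks"]:                         # 8 x TransformerBlock
        xs = transformer_block(xs, block)
    xs = [rms_norm(x, params["final_norm"]) for x in xs]   # Final RMSNorm
    return [matvec(params["lm_head"], x) for x in xs]      # lm_head -> logits

def init_params(vocab=16, dim=8, hidden=16, n_blocks=8, seed=0):
    # Hypothetical toy sizes with random weights, for illustration only.
    rng = random.Random(seed)
    def mat(rows, cols):
        return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]
    def block():
        return {"norm1": [1.0] * dim, "norm2": [1.0] * dim,
                "Wq": mat(dim, dim), "Wk": mat(dim, dim),
                "Wv": mat(dim, dim), "Wo": mat(dim, dim),
                "W_gate": mat(hidden, dim), "W_up": mat(hidden, dim),
                "W_down": mat(dim, hidden)}
    return {"embedding": mat(vocab, dim),
            "blocks": [block() for _ in range(n_blocks)],
            "final_norm": [1.0] * dim,
            "lm_head": mat(vocab, dim)}

params = init_params()
logits = forward([1, 5, 3], params)
```

Here logits holds one list of vocabulary-sized scores per input position, and logits[-1] scores the token that would follow the sequence. Placing each RMSNorm before its sublayer, with the residual added afterwards, follows the order shown in the diagram's node labels.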

When to use it

Use this diagram to understand or explain the core architecture and data flow of a Transformer-based language model, particularly a compact or educational implementation such as MicroLM. It suits architecture reviews, onboarding new team members, and model design documentation.

How to adapt it for your project

Adapt this diagram by changing the number of TransformerBlock layers, modifying the internal structure of the blocks (e.g., a different attention mechanism or FFN activation), adjusting embedding dimensions, or adding pre- or post-processing layers. The lm_head can also be swapped for a task-specific head (e.g., classification instead of next-token prediction).
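One way to make these adaptations repeatable is to generate the Mermaid source from a few parameters instead of editing it by hand. The helper below is a hypothetical sketch: the function name microlm_flowchart and its parameters are not part of MicroLM, and it emits only the nodes and edges, leaving out the init theme and classDef styling lines.

```python
def microlm_flowchart(n_blocks=8, attention="Attention", ffn="SwiGLU FFN",
                      norm="RMSNorm", output="next-token logits"):
    """Return Mermaid flowchart source for a MicroLM-style forward pass."""
    labels = [
        "input_ids",
        "Embedding",
        f"{n_blocks} × TransformerBlock",
        f"{norm} → {attention} → Residual",
        f"{norm} → {ffn} → Residual",
        f"Final {norm}",
        "lm_head",
        output,
    ]
    lines = ["flowchart TB"]
    for i in range(len(labels) - 1):
        # Only the first edge declares its source label; later sources are already defined.
        src = f'M1["{labels[0]}"]' if i == 0 else f"M{i + 1}"
        lines.append(f'    {src} --> M{i + 2}["{labels[i + 1]}"]')
    return "\n".join(lines)

# Example variant: 12 blocks, a different attention mechanism and FFN activation.
variant = microlm_flowchart(n_blocks=12,
                            attention="Grouped-Query Attention",
                            ffn="GELU FFN")
print(variant)
```

Called with its defaults, the function reproduces the node and edge lines of the diagram above, so the original classDef and class lines can be appended unchanged.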

Key concepts

Transformer Architecture
Embedding Layer
Attention Mechanism
Feed-Forward Network (FFN)
Residual Connections