Illustrates the forward-pass architecture of MicroLM, a compact Transformer-based language model, tracing the flow from input_ids to next-token logits.
%%{init: {"theme": "base", "themeVariables": {"background": "#ffffff", "primaryColor": "#f8fafc", "primaryBorderColor": "#94a3b8", "primaryTextColor": "#0f172a", "lineColor": "#64748b"}}}%%
flowchart TB
M1["input_ids"] --> M2["Embedding"]
M2 --> M3["8 × TransformerBlock"]
M3 --> M4["RMSNorm → Attention → Residual"]
M4 --> M5["RMSNorm → SwiGLU FFN → Residual"]
M5 --> M6["Final RMSNorm"]
M6 --> M7["lm_head"]
M7 --> M8["next-token logits"]
classDef model fill:#f8fafc,stroke:#94a3b8,color:#0f172a;
classDef attn fill:#f0fdf4,stroke:#22c55e,color:#0f172a;
classDef ffn fill:#fff7ed,stroke:#fb923c,color:#0f172a;
classDef out fill:#fdf4ff,stroke:#d946ef,color:#0f172a;
class M1,M2,M3,M6,M7 model;
class M4 attn;
class M5 ffn;
class M8 out;
This flowchart depicts the sequential data flow through MicroLM during its forward pass. Token IDs (input_ids) are first mapped to vectors by an Embedding layer. The result then passes through a stack of 8 TransformerBlocks. Each block applies two pre-norm sub-layers in turn: RMSNorm followed by Attention, whose output is added back to the block input through a residual connection; then RMSNorm followed by a SwiGLU feed-forward network (FFN), again with a residual connection. After the last block, a Final RMSNorm normalizes the hidden states, and the lm_head projects each of them to next-token logits over the vocabulary.
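The data flow above can be sketched in plain Python (no third-party libraries). This is a minimal, non-authoritative illustration, not MicroLM's actual code: the toy sizes (VOCAB, D_MODEL, D_FF), the single attention head, the causal mask, the random initialization, and every function name are assumptions; only the layer order and the count of 8 blocks come from the diagram. The diagram does not show how token positions are encoded, so the sketch leaves positional information out entirely.

```python
import math
import random

# Hypothetical toy sizes; MicroLM's real dimensions are not given in the diagram.
# N_LAYERS = 8 matches the "8 × TransformerBlock" node.
VOCAB, D_MODEL, D_FF, N_LAYERS = 16, 8, 16, 8

def rand_matrix(rows, cols, rng):
    """Small random weights stored as a list of rows (rows x cols)."""
    return [[rng.gauss(0.0, 0.02) for _ in range(cols)] for _ in range(rows)]

def matvec(x, w):
    """Row vector times matrix: x (len rows) @ w (rows x cols) -> len cols."""
    return [sum(x[i] * w[i][j] for i in range(len(x))) for j in range(len(w[0]))]

def add(u, v):
    """Element-wise sum, used for the residual connections."""
    return [a + b for a, b in zip(u, v)]

def rms_norm(x, gain, eps=1e-6):
    """RMSNorm: divide by the root-mean-square of x, then apply a per-feature gain."""
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [g * v / rms for g, v in zip(gain, x)]

def silu(v):
    return v / (1.0 + math.exp(-v))

def swiglu_ffn(x, p):
    """SwiGLU FFN: (SiLU(x W_gate) * (x W_up)) W_down."""
    gate, up = matvec(x, p["w_gate"]), matvec(x, p["w_up"])
    return matvec([silu(g) * u for g, u in zip(gate, up)], p["w_down"])

def causal_attention(xs, p):
    """Single-head causal self-attention (assumption: head count is not shown)."""
    qs = [matvec(x, p["wq"]) for x in xs]
    ks = [matvec(x, p["wk"]) for x in xs]
    vs = [matvec(x, p["wv"]) for x in xs]
    out = []
    for t, q in enumerate(qs):
        # Position t attends only to positions 0..t (causal mask).
        scores = [sum(a * b for a, b in zip(q, ks[s])) / math.sqrt(D_MODEL)
                  for s in range(t + 1)]
        m = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(sc - m) for sc in scores]
        z = sum(weights)
        mixed = [sum(weights[s] / z * vs[s][j] for s in range(t + 1))
                 for j in range(D_MODEL)]
        out.append(matvec(mixed, p["wo"]))
    return out

def init_params(seed=0):
    """Random toy weights for every layer in the diagram (hypothetical init)."""
    rng = random.Random(seed)

    def block_params():
        return {
            "norm1": [1.0] * D_MODEL,
            "wq": rand_matrix(D_MODEL, D_MODEL, rng),
            "wk": rand_matrix(D_MODEL, D_MODEL, rng),
            "wv": rand_matrix(D_MODEL, D_MODEL, rng),
            "wo": rand_matrix(D_MODEL, D_MODEL, rng),
            "norm2": [1.0] * D_MODEL,
            "w_gate": rand_matrix(D_MODEL, D_FF, rng),
            "w_up": rand_matrix(D_MODEL, D_FF, rng),
            "w_down": rand_matrix(D_FF, D_MODEL, rng),
        }

    return {
        "embedding": rand_matrix(VOCAB, D_MODEL, rng),
        "blocks": [block_params() for _ in range(N_LAYERS)],
        "final_norm": [1.0] * D_MODEL,
        "lm_head": rand_matrix(D_MODEL, VOCAB, rng),
    }

def forward(input_ids, params):
    """input_ids -> next-token logits (one row of VOCAB scores per position)."""
    h = [list(params["embedding"][i]) for i in input_ids]  # Embedding
    for blk in params["blocks"]:  # 8 × TransformerBlock
        # RMSNorm → Attention → Residual
        attn = causal_attention([rms_norm(x, blk["norm1"]) for x in h], blk)
        h = [add(x, a) for x, a in zip(h, attn)]
        # RMSNorm → SwiGLU FFN → Residual
        h = [add(x, swiglu_ffn(rms_norm(x, blk["norm2"]), blk)) for x in h]
    h = [rms_norm(x, params["final_norm"]) for x in h]  # Final RMSNorm
    return [matvec(x, params["lm_head"]) for x in h]  # lm_head

logits = forward([1, 5, 3], init_params(seed=0))
next_id = max(range(VOCAB), key=lambda k: logits[-1][k])  # greedy next-token pick
```

The last row of `logits` scores the token that would follow the final input position; the earlier rows score the next token at each earlier position. Because of the causal mask, running the forward pass on a prefix reproduces the first rows of the longer sequence exactly.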
Use this diagram to understand or explain the core architecture and data flow of a Transformer-based language model, particularly compact or educational implementations such as MicroLM. It suits architectural reviews, onboarding new team members, and documenting model designs.
This diagram can be adapted by changing the number of TransformerBlocks, modifying the internal structure of the blocks (e.g., a different attention mechanism or FFN activation), adjusting the embedding dimension, or adding pre- or post-processing layers. The lm_head can also be swapped for a task-specific head (e.g., a classification head instead of next-token prediction).
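To see how these adaptations change model size, here is a small parameter-count sketch in Python. It is an illustration under stated assumptions, not MicroLM's configuration: the class `MicroLMConfig`, the function `param_count`, and every default except `n_layers = 8` are hypothetical. It assumes bias-free linear layers, attention with four `d_model × d_model` projections, a three-matrix SwiGLU FFN, and an lm_head that is not tied to the embedding.

```python
from dataclasses import dataclass, replace
from typing import Optional

@dataclass(frozen=True)
class MicroLMConfig:
    """Hypothetical knobs; only n_layers = 8 comes from the diagram."""
    vocab_size: int = 256
    d_model: int = 64               # embedding / hidden dimension (toy value)
    d_ff: int = 172                 # SwiGLU hidden width (toy value)
    n_layers: int = 8               # number of TransformerBlocks
    head_out: Optional[int] = None  # None -> vocab_size (next-token logits)

def param_count(cfg: MicroLMConfig) -> int:
    """Count weights in the diagram's layers under the assumptions above."""
    d, f = cfg.d_model, cfg.d_ff
    attn = 4 * d * d   # W_q, W_k, W_v, W_o
    ffn = 3 * d * f    # W_gate, W_up, W_down
    norms = 2 * d      # two RMSNorm gains per block
    out_dim = cfg.vocab_size if cfg.head_out is None else cfg.head_out
    return (cfg.vocab_size * d                      # Embedding
            + cfg.n_layers * (attn + ffn + norms)   # TransformerBlocks
            + d                                     # Final RMSNorm gain
            + d * out_dim)                          # lm_head (untied)

base = MicroLMConfig()
print(param_count(base))                        # 429120
print(param_count(replace(base, n_layers=12)))  # 627264
print(param_count(replace(base, head_out=2)))   # 412864 (2-class head)
```

Under these assumptions each extra TransformerBlock adds the same fixed amount (4·d_model² + 3·d_model·d_ff + 2·d_model weights), so the block parameters grow linearly with depth, while replacing the lm_head with a small classification head shrinks only the final projection.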