A detailed flowchart illustrating the pre-training loop for a language model, covering data batching, forward pass, cross-entropy loss, backpropagation, an
%%{init: {"theme": "base", "themeVariables": {"background": "#ffffff", "primaryColor": "#f0fdf4", "primaryBorderColor": "#22c55e", "primaryTextColor": "#0f172a", "lineColor": "#64748b"}}}%%
flowchart TB
P1["train_ids.npy / valid_ids.npy"] --> P2["get_batch()<br>randomly slice windows x / y"]
P2 --> P3["model(x)"]
P3 --> P4["cross_entropy(logits, y)"]
P4 --> P5["backward()"]
P5 --> P6["AdamW + cosine scheduler + gradient clipping"]
P6 --> P7["checkpoint / train_log.jsonl / wandb"]
classDef train fill:#f0fdf4,stroke:#22c55e,color:#0f172a;
classDef loss fill:#fff7ed,stroke:#fb923c,color:#0f172a;
classDef out fill:#eff6ff,stroke:#60a5fa,color:#0f172a;
class P1,P2,P3,P5,P6 train;
class P4 loss;
class P7 out;
This flowchart shows the essential steps of a language model pre-training loop. Token IDs are loaded from train_ids.npy and valid_ids.npy, and get_batch() slices random windows from them to form input/target pairs (x, y). The model runs a forward pass on x, cross-entropy loss is computed between the resulting logits and y, and backward() propagates the gradients. The weights are then updated with AdamW, using a cosine learning rate scheduler and gradient clipping. Each iteration ends with checkpointing and logging to train_log.jsonl and wandb.
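The three less obvious pieces of the loop (random window sampling, the cosine schedule, and gradient clipping) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the implementation behind the diagram: the function signatures, the choice of y as x shifted by one token, the absence of warmup, and clipping by global L2 norm are all assumptions, and a real loop would use array and tensor libraries rather than Python lists.

```python
import math
import random


def get_batch(ids, batch_size, block_size, rng):
    """Sample random windows; y is x shifted one token to the right (assumed)."""
    xs, ys = [], []
    for _ in range(batch_size):
        # The last valid start leaves room for block_size inputs plus one target.
        start = rng.randrange(len(ids) - block_size)
        xs.append(ids[start : start + block_size])
        ys.append(ids[start + 1 : start + 1 + block_size])
    return xs, ys


def cosine_lr(step, total_steps, max_lr, min_lr=0.0):
    """Cosine decay from max_lr at step 0 to min_lr at total_steps (no warmup)."""
    progress = min(step, total_steps) / total_steps
    return min_lr + 0.5 * (max_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


def clip_by_global_norm(grads, max_norm):
    """Scale all gradients down together if their joint L2 norm exceeds max_norm."""
    total = math.sqrt(sum(g * g for g in grads))
    scale = min(1.0, max_norm / (total + 1e-6))
    return [g * scale for g in grads]


# Hypothetical usage with a toy token stream.
rng = random.Random(0)
token_ids = list(range(100))
x, y = get_batch(token_ids, batch_size=4, block_size=8, rng=rng)
lr_mid = cosine_lr(step=50, total_steps=100, max_lr=1e-3)
clipped = clip_by_global_norm([3.0, 4.0], max_norm=1.0)
```

Scaling every gradient by the same factor preserves the update direction, which is why clipping is applied to the joint norm rather than to each value separately.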
Use this diagram to understand or design the core pre-training process for transformer-based language models. It's particularly useful for visualizing the data flow from raw input IDs to model output, loss computation, and the optimization steps involved in training deep learning models, especially when implementing custom training loops or debugging existing ones.
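When debugging the loss step, one commonly used sanity check is that a model producing near-uniform predictions should report a cross-entropy close to ln(V), where V is the vocabulary size. The sketch below computes cross-entropy for a single position in plain Python to illustrate this; the vocabulary size and target index are hypothetical values chosen for the example.

```python
import math


def cross_entropy(logits, target):
    """Negative log-probability of `target` under softmax(logits)."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(v - m) for v in logits))
    return log_sum_exp - logits[target]


vocab_size = 50_000  # hypothetical vocabulary size
uniform_loss = cross_entropy([0.0] * vocab_size, target=123)
expected = math.log(vocab_size)  # about 10.82 for this vocabulary size
```

An initial loss far above this value often points to a problem upstream of the optimizer, such as misaligned x / y windows or badly scaled initial weights.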
This diagram can be adapted by changing the data loading mechanism (e.g., streaming, different sampling strategies), replacing the model architecture (e.g., different Transformer variants), using a different loss function (e.g., for specific tasks), or employing alternative optimizers and learning rate schedules. Additional steps like distributed training, mixed precision, or custom metrics can also be integrated.
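One way to keep the learning rate schedule swappable is to treat it as a function from step to learning rate, so the loop does not care which schedule it receives. The sketch below assumes that design; the names `make_cosine`, `make_linear`, and `lr_trace` are hypothetical and do not come from the diagram.

```python
import math
from typing import Callable, List

Schedule = Callable[[int], float]  # maps a step index to a learning rate


def make_cosine(total_steps: int, max_lr: float) -> Schedule:
    """Cosine decay from max_lr to 0 over total_steps."""
    def schedule(step: int) -> float:
        progress = min(step, total_steps) / total_steps
        return 0.5 * max_lr * (1.0 + math.cos(math.pi * progress))
    return schedule


def make_linear(total_steps: int, max_lr: float) -> Schedule:
    """Linear decay from max_lr to 0 over total_steps."""
    def schedule(step: int) -> float:
        return max_lr * (1.0 - min(step, total_steps) / total_steps)
    return schedule


def lr_trace(schedule: Schedule, steps: List[int]) -> List[float]:
    """Evaluate any schedule at the given steps, as a training loop would."""
    return [schedule(s) for s in steps]


cosine_trace = lr_trace(make_cosine(100, 1e-3), [0, 25, 100])
linear_trace = lr_trace(make_linear(100, 1e-3), [0, 25, 100])
```

Both schedules start at the same peak and end at zero, but the cosine variant stays higher early in training, which is the kind of difference this interface makes easy to compare.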