ML & AI System Diagrams

RAG pipelines, training architectures, inference stacks, agentic systems — how production ML and AI systems fit together.

52 diagrams

Production RAG Pipeline (Ingest → Embed → Store → Retrieve → Generate)

Production-grade Retrieval-Augmented Generation pipeline showing offline ingest (chunk + embed + store) and online query (retrieve + rerank + generate) with a feedback loop for continuous evals.

flowchart · MIT
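
The two paths named in this entry (offline ingest, online query) can be sketched in a few lines. This is a toy illustration only, not the diagram's implementation: the bag-of-words `embed`, the in-memory list used as a store, the overlap-based `rerank`, and every function name here are hypothetical stand-ins for a real embedding model, vector database, and reranker. Generation is reduced to building the prompt that a model would receive.

```python
import math
from collections import Counter

def chunk(text, size=8):
    """Split text into chunks of at most `size` words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def embed(text):
    """Toy embedding (assumption): lowercase bag-of-words counts."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def ingest(docs):
    """Offline path: chunk each document, embed each chunk, store both."""
    return [(c, embed(c)) for d in docs for c in chunk(d)]

def retrieve(store, query, k=3):
    """Online path, step 1: top-k chunks by cosine similarity."""
    q = embed(query)
    return sorted(store, key=lambda item: cosine(q, item[1]), reverse=True)[:k]

def rerank(query, hits, k=1):
    """Online path, step 2 (toy): prefer chunks sharing more query terms."""
    terms = set(query.lower().split())
    return sorted(hits, key=lambda item: len(terms & set(item[1])), reverse=True)[:k]

def build_prompt(query, hits):
    """Online path, step 3: assemble the context a generator would see."""
    context = "\n".join(text for text, _ in hits)
    return f"Context:\n{context}\n\nQuestion: {query}"
```

A typical call chain would be `store = ingest(docs)`, then `build_prompt(q, rerank(q, retrieve(store, q)))`. The feedback loop for evals mentioned in the description is not modeled here.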

MicroLM Development & Qwen Migration Workflow

This diagram illustrates two parallel workflows for developing and deploying language models: a self-developed MicroLM pipeline and a Qwen-based migration

flowchart · MIT

MicroLM Project Global Overview: Self-Developed & Qwen Pipelines

Detailed overview of the MicroLM project, showcasing two parallel LLM development pipelines: a self-developed TransformerLM and a Qwen-based fine-tuning an

flowchart · MIT

Multi-turn Conversation Flow with Context Management

This diagram illustrates the process of handling multi-turn conversations for a language model, including appending user messages, rendering prompts, encod

flowchart · MIT

Multi-turn LLM Conversation Flow with Context Management

This flowchart illustrates the process of handling multi-turn conversations with a Large Language Model, focusing on history management, prompt rendering,

flowchart · MIT

Large Language Model Single-Turn Text Generation Inference Path

Illustrates the complete inference path for single-turn text generation in an LLM, from initial prompt to final output, emphasizing prompt resolution and t

flowchart · MIT

LLM Single-Turn Text Generation Inference Path

This flowchart illustrates the end-to-end inference path for single-turn text generation in a Large Language Model, from prompt input to final output text.

flowchart · MIT

MicroLM Transformer Forward Path

Details the forward pass of a MicroLM Transformer model, showing the flow from input_ids through Embedding, Transformer Blocks, MultiHeadSelfAttention, Swi

flowchart · MIT

MicroLM Transformer Language Model Forward Pass

Illustrates the internal forward pass of a MicroLM Transformer model, detailing components like embedding, multi-head attention, SwiGLU FFN, and RMSNorm.

flowchart · MIT

KV Cache Inference Path for LLMs

Illustrates the KV Cache inference path in large language models, detailing how it optimizes token generation by reusing historical K/V states to achieve s

flowchart · MIT

KV Cache Inference Path for Large Language Models

Illustrates the KV Cache inference path in LLMs, showing how prefill and iterative decoding with cached Key/Value states optimize token generation by avoid

flowchart · MIT
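
As a rough illustration of the prefill and decode phases described above, the sketch below keeps one growing list of keys and values for a single attention head. It assumes the query, key, and value vectors are already computed per position; `KVCache`, `prefill`, and `decode_step` are hypothetical names, and a real implementation would run prefill in parallel over tensors rather than Python lists.

```python
import math

def attend(q, keys, values):
    """Single-head scaled dot-product attention of one query over all positions."""
    scale = math.sqrt(len(q))
    scores = [sum(a * b for a, b in zip(q, k)) / scale for k in keys]
    peak = max(scores)  # subtract the max for a numerically stable softmax
    weights = [math.exp(s - peak) for s in scores]
    total = sum(weights)
    return [sum(w * v[i] for w, v in zip(weights, values)) / total
            for i in range(len(values[0]))]

class KVCache:
    """Holds the K/V vectors of every position processed so far."""
    def __init__(self):
        self.keys, self.values = [], []

    def append(self, k, v):
        self.keys.append(k)
        self.values.append(v)

def prefill(cache, prompt_states):
    """Cache K/V for every prompt position; return the last position's output."""
    for _, k, v in prompt_states:
        cache.append(k, v)
    last_q = prompt_states[-1][0]
    return attend(last_q, cache.keys, cache.values)

def decode_step(cache, q, k, v):
    """Add only the new token's K/V, then attend over the whole cache."""
    cache.append(k, v)
    return attend(q, cache.keys, cache.values)
```

Each decode step computes K/V for one new token only; earlier positions are read from the cache instead of being recomputed.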

LLM Pre-training Loop with AdamW and Cosine Scheduler

A detailed flowchart illustrating the pre-training loop for a language model, covering data batching, forward pass, cross-entropy loss, backpropagation, an

flowchart · MIT
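
A cosine learning-rate schedule of the kind named in this entry's title might look like the following sketch. The linear warmup phase, the `min_lr` floor, and all parameter names are assumptions added for illustration; the listing does not say whether the diagram's scheduler includes them.

```python
import math

def cosine_lr(step, max_lr, min_lr, warmup_steps, total_steps):
    """Learning rate at `step`: linear warmup (assumed), then cosine decay."""
    if step < warmup_steps:
        # Ramp linearly up to max_lr over the warmup steps.
        return max_lr * (step + 1) / warmup_steps
    if step >= total_steps:
        # Past the schedule: hold at the floor.
        return min_lr
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return min_lr + 0.5 * (max_lr - min_lr) * (1 + math.cos(math.pi * progress))
```

In a training loop, the returned value would be written into the optimizer's learning rate before each update step.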

LLM Pretraining Workflow

This diagram illustrates a typical pretraining loop for a Language Model, covering data loading, forward pass, loss calculation, backpropagation, optimizat

flowchart · MIT

LoRA Integration and Parameter-Efficient Fine-tuning Workflow

Illustrates the LoRA (Low-Rank Adaptation) process for parameter-efficient fine-tuning, showing how it modifies a pre-trained model to train only a small f

flowchart · MIT

LoRA Integration and Parameter-Efficient Fine-tuning

This diagram illustrates the LoRA (Low-Rank Adaptation) process for parameter-efficient fine-tuning, showing how it modifies a pre-trained model by replaci

flowchart · MIT
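
The core idea of LoRA can be sketched without any framework: the frozen weight `W` is left untouched, and a low-rank product `B·A` scaled by `alpha / r` is added to its output. The class below is a minimal pure-Python illustration with hypothetical names (`LoRALinear`, `matvec`); it is not the PEFT API and omits training, dropout, and bias.

```python
def matvec(m, x):
    """Multiply matrix m (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in m]

class LoRALinear:
    """Frozen linear layer plus a trainable low-rank update."""

    def __init__(self, weight, a, b, alpha):
        self.weight = weight         # frozen, shape d_out x d_in
        self.a = a                   # trainable, shape r x d_in
        self.b = b                   # trainable, shape d_out x r
        self.scale = alpha / len(a)  # alpha / r

    def forward(self, x):
        base = matvec(self.weight, x)
        delta = matvec(self.b, matvec(self.a, x))
        return [h + self.scale * d for h, d in zip(base, delta)]

    def trainable_params(self):
        """Only A and B are trained: r * d_in + d_out * r values."""
        return len(self.a) * len(self.a[0]) + len(self.b) * len(self.b[0])
```

With `B` initialised to zeros, as is conventional, the layer starts out producing exactly the frozen model's output; the parameter count grows with `r * (d_in + d_out)` rather than `d_in * d_out`.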

SFT Data Protocol and Assistant-Only Loss Training Path

This diagram illustrates the Supervised Fine-Tuning (SFT) data pipeline, focusing on processing conversational data and applying an assistant-only masked l

flowchart · MIT

SFT Data Protocol and Training Path

This diagram illustrates the data processing pipeline for Supervised Fine-Tuning (SFT) of a language model, emphasizing the critical 'assistant-only masked

flowchart · MIT
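
An assistant-only masked loss is commonly built by copying token ids into the labels for assistant turns and writing an ignore value everywhere else. The sketch below assumes turns arrive as `(role, token_ids)` pairs that are already tokenized, and uses `-100`, the default `ignore_index` of PyTorch's cross-entropy loss. The function names are hypothetical, and the next-token shift between inputs and labels is left out for brevity.

```python
IGNORE_INDEX = -100  # default ignore_index of PyTorch's CrossEntropyLoss

def build_example(turns):
    """Concatenate turns; keep labels only for assistant tokens."""
    input_ids, labels = [], []
    for role, token_ids in turns:
        input_ids.extend(token_ids)
        if role == "assistant":
            labels.extend(token_ids)
        else:
            # System and user tokens are visible as context but not scored.
            labels.extend([IGNORE_INDEX] * len(token_ids))
    return input_ids, labels

def masked_mean_loss(token_losses, labels):
    """Average per-token losses over positions that are not masked out."""
    kept = [loss for loss, y in zip(token_losses, labels) if y != IGNORE_INDEX]
    return sum(kept) / len(kept) if kept else 0.0
```

The model still attends to the full conversation; the mask only controls which positions contribute to the gradient.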

End-to-End LLM Engineering and Deployment Pipeline

This flowchart illustrates a comprehensive pipeline for developing, training, optimizing, and deploying Large Language Models (LLMs), from raw data to a ve

flowchart · MIT

LLM Engineering Pipeline

A comprehensive flowchart outlining the end-to-end process of building, training, optimizing, and deploying a Large Language Model (LLM) into a verifiable

flowchart · MIT

Large Language Model Structured Output Evaluation Flow

This flowchart illustrates a structured evaluation process for Large Language Models (LLMs) to assess their ability to produce accurate and well-formed JSO

flowchart · MIT

LLM Structured Output Evaluation Flow for JSON Generation

This flowchart illustrates a structured evaluation process for comparing Large Language Models (LLMs) on their ability to produce accurate and usable JSON

flowchart · MIT

MicroLM Model Forward Pass

Illustrates the forward pass architecture of a micro-sized Large Language Model (LLM), detailing its core components from input to output logits.

flowchart · MIT

MicroLM Model Forward Pass

Illustrates the forward pass architecture of the MicroLM, a compact Transformer-based language model, detailing the flow from input_ids to next-token logit

flowchart · MIT

Qwen LoRA Fine-tuning Path

Illustrates the Qwen LoRA fine-tuning process, from data tokenization with ChatML to PEFT LoRA injection and masked loss training for Qwen2.5-1.5B-Instruct

flowchart · MIT

Qwen LoRA Fine-tuning Path

Illustrates the Qwen LoRA fine-tuning process, from data tokenization and loss mask construction to PEFT LoRA injection and training loop.

flowchart · MIT

Qwen / vLLM Service Inference Path

Illustrates the deployment and inference pipeline for a Qwen model fine-tuned with LoRA, served via vLLM's OpenAI-compatible API, handling HTTP requests an

flowchart · MIT

Qwen/vLLM Service Inference Path

This diagram illustrates the service-based inference path for a Qwen large language model, leveraging vLLM to expose an OpenAI-compatible HTTP API.

flowchart · MIT

BPE Training Path for MicroLM

This flowchart illustrates the Byte Pair Encoding (BPE) training process for MicroLM, from initial corpus preparation to generating a custom 6400-token voc

flowchart · MIT

GPT-2 Style Byte-Pair Encoding (BPE) Tokenizer Training

Illustrates the step-by-step process of training a GPT-2 style Byte-Pair Encoding (BPE) tokenizer, from initial corpus to final vocabulary files.

flowchart · MIT
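
The merge-learning loop at the heart of byte-level BPE training can be sketched as below. This is a simplification under stated assumptions: words are split on whitespace rather than with GPT-2's regex pre-tokenizer, raw byte values stand in for GPT-2's byte-to-unicode mapping, ties are broken by the larger pair, and `train_bpe` is a hypothetical name. No vocabulary files are written.

```python
from collections import Counter

def train_bpe(corpus, num_merges):
    """Learn up to `num_merges` merges; returns [((left, right), new_id), ...]."""
    # Each distinct word becomes a tuple of byte values (ids 0..255) with a count.
    words = Counter(tuple(w.encode("utf-8")) for w in corpus.split())
    merges = []
    next_id = 256  # first id after the 256 base byte tokens
    for _ in range(num_merges):
        # Count adjacent pairs, weighted by word frequency.
        pairs = Counter()
        for word, freq in words.items():
            for pair in zip(word, word[1:]):
                pairs[pair] += freq
        if not pairs:
            break  # nothing left to merge
        best = max(pairs, key=lambda p: (pairs[p], p))
        merges.append((best, next_id))
        # Replace every occurrence of the best pair with the new token id.
        merged_words = Counter()
        for word, freq in words.items():
            out, i = [], 0
            while i < len(word):
                if i + 1 < len(word) and (word[i], word[i + 1]) == best:
                    out.append(next_id)
                    i += 2
                else:
                    out.append(word[i])
                    i += 1
            merged_words[tuple(out)] += freq
        words = merged_words
        next_id += 1
    return merges
```

The final vocabulary size is 256 plus the number of merges learned, so the merge budget is what sets the target vocabulary size.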

Language Model Tokenization and Memmap Data Preparation

Illustrates the process of converting raw text data into token ID sequences and storing them in memory-mapped files for efficient language model training.

flowchart · MIT

LLM Deployment Verification Path

This flowchart illustrates a comprehensive verification path for deploying Large Language Models (LLMs), covering smoke tests, performance benchmarks, and

flowchart · MIT

LLM Evaluation Path for MicroLM

A flowchart illustrating the self-developed evaluation path for LLMs, comparing pretrain, baseline, and LoRA checkpoints using fixed prompts and human scor

flowchart · MIT

MicroLM Self-Developed Evaluation Path

Flowchart detailing the MicroLM evaluation process from model checkpoints and prompt sets to human scoring and capability assessment, focusing on dialogue,

flowchart · MIT

Pre-training Data Preparation Flow for MicroLM

Illustrates the complete data preparation pipeline for pre-training a MicroLM, from raw Chinese corpus to cleaned, split, and EOS-encoded datasets.

flowchart · MIT

LoRA Model Merging for Deployment

This flowchart illustrates the process of merging a LoRA adapter with a base model using PEFT to create a deployable model directory, ready for serving fra

flowchart · MIT

LoRA Model Merging for Deployment

Illustrates the process of merging a base model with a LoRA adapter into a single model directory, ready for deployment with inference frameworks like vLLM

flowchart · MIT

AI Agent Orchestration Pipeline with Feedback Loops

Illustrates an AI agent orchestration pipeline, detailing task assignment, iterative agent work, confidence aggregation, validation, and feedback loops for

flowchart · unknown license

CFN Loop Flow Diagram: Modular AI Agent Orchestration

Illustrates a modular, iterative workflow for orchestrating AI agents, focusing on complexity reduction through helper scripts and distinct implementation/

flowchart · unknown license

ASR Message Processing and AI Trigger Flow

This flowchart illustrates the real-time processing of ASR messages, including speaker tracking, text accumulation, silence detection, and the logic for tr

flowchart · NOASSERTION

ASR Interview Coder LLM System Flow

This diagram illustrates the complete processing flow for an ASR-based LLM system, from input to answer generation, including smart analysis, intent recogn

flowchart · NOASSERTION

Resume Parsing and User Profile Generation Flow

This flowchart illustrates the process of parsing a user's resume to generate a structured XML profile, which is then used by an interview agent for person

flowchart · NOASSERTION

AI Intent Recognition Flow with LLMs

This flowchart illustrates the process of AI-driven intent recognition, including agent availability checks, prompt building, model invocation, and JSON pa

flowchart · NOASSERTION

AI Intent Recognition Workflow with LLMs

This flowchart illustrates a sophisticated AI-driven intent recognition process, detailing how an LLM identifies core questions, generates discussion outli

flowchart · NOASSERTION

API Endpoint Flow for Distributed Feature Engineering and Inference

Illustrates the API endpoint flow for a system handling distributed feature engineering and inference, including health checks, prediction, and feature ret

flowchart · unknown license

AI Chatbot Architecture with Llama 3.1 and GPT-4o-mini

Illustrates an AI chatbot's RAG architecture, detailing how user queries are processed through summarization, a supervisor LLM, vectorstore retrieval, and

flowchart · NOASSERTION

Three-Stage Training and Inference Flow for Memory-Augmented AI

Illustrates a three-stage training process for an AI model incorporating memory grids, followed by its inference generation flow, including a detailed sing

flowchart · GPL-3.0

AI Interview Agent Job Analysis and Dual-Mode Response Flow

This diagram illustrates a two-stage process for an AI interview agent: first, generating a professional job analysis, and then leveraging it in a dual-mod

flowchart · NOASSERTION

Job Analysis and Dual-Mode Question Answering System

Illustrates a system for generating professional job analysis from user input and then using it to answer questions in either a direct or deep-thinking dua

flowchart · NOASSERTION

RAG Workflow for Conversational AI

This sequence diagram illustrates a Retrieval-Augmented Generation (RAG) workflow for conversational AI, from user voice input to synthesized speech output

sequence · unknown license

Chatbot ML Development and Evaluation Pipeline

A sequential pipeline illustrating the stages of a chatbot's machine learning model development, including preparation, testing, evaluation, bias detection

flowchart · NOASSERTION

Dedalus Orchestrator AI Coaching System

This diagram illustrates the Dedalus Orchestrator, an AI-powered coaching system that processes user queries, delegates analysis to specialized agents, int

flowchart · unknown license

Memory Consolidation Flow for AI Agents

Illustrates the process of an AI agent's working memory being consolidated into episodic memory within a MemoryMesh, including vector index management and

sequence · MIT