RAG pipelines, training architectures, inference stacks, agentic systems — how production ML and AI systems fit together.
Production-grade Retrieval-Augmented Generation pipeline showing offline ingest (chunk + embed + store) and online query (retrieve + rerank + generate) with a feedback loop for continuous evals.
This diagram illustrates two parallel workflows for developing and deploying language models: a self-developed MicroLM pipeline and a Qwen-based migration
Detailed overview of the MicroLM project, showcasing two parallel LLM development pipelines: a self-developed TransformerLM and a Qwen-based fine-tuning an
This diagram illustrates the process of handling multi-turn conversations for a language model, including appending user messages, rendering prompts, encod
This flowchart illustrates the process of handling multi-turn conversations with a Large Language Model, focusing on history management, prompt rendering,
Illustrates the complete inference path for single-turn text generation in an LLM, from initial prompt to final output, emphasizing prompt resolution and t
This flowchart illustrates the end-to-end inference path for single-turn text generation in a Large Language Model, from prompt input to final output text.
Details the forward pass of a MicroLM Transformer model, showing the flow from input_ids through Embedding, Transformer Blocks, MultiHeadSelfAttention, Swi
Illustrates the internal forward pass of a MicroLM Transformer model, detailing components like embedding, multi-head attention, SwiGLU FFN, and RMSNorm.
Illustrates the KV Cache inference path in large language models, detailing how it optimizes token generation by reusing historical K/V states to achieve s
Illustrates the KV Cache inference path in LLMs, showing how prefill and iterative decoding with cached Key/Value states optimize token generation by avoid
A detailed flowchart illustrating the pre-training loop for a language model, covering data batching, forward pass, cross-entropy loss, backpropagation, an
This diagram illustrates a typical pretraining loop for a Language Model, covering data loading, forward pass, loss calculation, backpropagation, optimizat
Illustrates the LoRA (Low-Rank Adaptation) process for parameter-efficient fine-tuning, showing how it modifies a pre-trained model to train only a small f
This diagram illustrates the LoRA (Low-Rank Adaptation) process for parameter-efficient fine-tuning, showing how it modifies a pre-trained model by replaci
This diagram illustrates the Supervised Fine-Tuning (SFT) data pipeline, focusing on processing conversational data and applying an assistant-only masked l
This diagram illustrates the data processing pipeline for Supervised Fine-Tuning (SFT) of a language model, emphasizing the critical 'assistant-only masked
This flowchart illustrates a comprehensive pipeline for developing, training, optimizing, and deploying Large Language Models (LLMs), from raw data to a ve
A comprehensive flowchart outlining the end-to-end process of building, training, optimizing, and deploying a Large Language Model (LLM) into a verifiable
This flowchart illustrates a structured evaluation process for Large Language Models (LLMs) to assess their ability to produce accurate and well-formed JSO
This flowchart illustrates a structured evaluation process for comparing Large Language Models (LLMs) on their ability to produce accurate and usable JSON
Illustrates the forward pass architecture of a micro-sized Large Language Model (LLM), detailing its core components from input to output logits.
Illustrates the forward pass architecture of the MicroLM, a compact Transformer-based language model, detailing the flow from input_ids to next-token logit
Illustrates the Qwen LoRA fine-tuning process, from data tokenization with ChatML to PEFT LoRA injection and masked loss training for Qwen2.5-1.5B-Instruct
Illustrates the Qwen LoRA fine-tuning process, from data tokenization and loss mask construction to PEFT LoRA injection and training loop.
Illustrates the deployment and inference pipeline for a Qwen model fine-tuned with LoRA, served via vLLM's OpenAI-compatible API, handling HTTP requests an
This diagram illustrates the service-based inference path for a Qwen large language model, leveraging vLLM to expose an OpenAI-compatible HTTP API.
This flowchart illustrates the Byte Pair Encoding (BPE) training process for MicroLM, from initial corpus preparation to generating a custom 6400-token voc
Illustrates the step-by-step process of training a GPT-2 style Byte-Pair Encoding (BPE) tokenizer, from initial corpus to final vocabulary files.
Illustrates the process of converting raw text data into token ID sequences and storing them in memory-mapped files for efficient language model training.
This flowchart illustrates a comprehensive verification path for deploying Large Language Models (LLMs), covering smoke tests, performance benchmarks, and
A flowchart illustrating the self-developed evaluation path for LLMs, comparing pretrain, baseline, and LoRA checkpoints using fixed prompts and human scor
Flowchart detailing the MicroLM evaluation process from model checkpoints and prompt sets to human scoring and capability assessment, focusing on dialogue,
Illustrates the complete data preparation pipeline for pre-training a MicroLM, from raw Chinese corpus to cleaned, split, and EOS-encoded datasets.
This flowchart illustrates the process of merging a LoRA adapter with a base model using PEFT to create a deployable model directory, ready for serving fra
Illustrates the process of merging a base model with a LoRA adaptor into a single model directory, ready for deployment with inference frameworks like vLLM
Illustrates an AI agent orchestration pipeline, detailing task assignment, iterative agent work, confidence aggregation, validation, and feedback loops for
Illustrates a modular, iterative workflow for orchestrating AI agents, focusing on complexity reduction through helper scripts and distinct implementation/
This flowchart illustrates the real-time processing of ASR messages, including speaker tracking, text accumulation, silence detection, and the logic for tr
This diagram illustrates the complete processing flow for an ASR-based LLM system, from input to answer generation, including smart analysis, intent recogn
This flowchart illustrates the process of parsing a user's resume to generate a structured XML profile, which is then used by an interview agent for person
This flowchart illustrates the process of AI-driven intent recognition, including agent availability checks, prompt building, model invocation, and JSON pa
This flowchart illustrates a sophisticated AI-driven intent recognition process, detailing how an LLM identifies core questions, generates discussion outli
Illustrates the API endpoint flow for a system handling distributed feature engineering and inference, including health checks, prediction, and feature ret
Illustrates an AI chatbot's RAG architecture, detailing how user queries are processed through summarization, a supervisor LLM, vectorstore retrieval, and
Illustrates a three-stage training process for an AI model incorporating memory grids, followed by its inference generation flow, including a detailed sing
This diagram illustrates a two-stage process for an AI interview agent: first, generating a professional job analysis, and then leveraging it in a dual-mod
Illustrates a system for generating professional job analysis from user input and then using it to answer questions in either a direct or deep-thinking dua
This sequence diagram illustrates a Retrieval-Augmented Generation (RAG) workflow for conversational AI, from user voice input to synthesized speech output
A sequential pipeline illustrating the stages of a chatbot's machine learning model development, including preparation, testing, evaluation, bias detection
This diagram illustrates the Dedalus Orchestrator, an AI-powered coaching system that processes user queries, delegates analysis to specialized agents, int
Illustrates the process of an AI agent's working memory being consolidated into episodic memory within a MemoryMesh, including vector index management and