AI & ML System Design
LLM serving, RAG, agents, multi-agent orchestration, evaluation, cost, safety, ML fundamentals, feature stores, recommendations, multimodal, voice.
- Modules: 15
- Hours: 7
- Difficulty: Intermediate to Advanced
- 9.0 (advanced)
LLM Serving Architecture (vLLM, TGI, TensorRT-LLM)
Design a production LLM inference stack: continuous batching, paged attention, KV-cache management, and multi-tenant GPU scheduling.
- 9.1 (intermediate)
RAG Pipelines (Retrieval-Augmented Generation)
Design production RAG: chunking, embedding models, hybrid dense-plus-sparse retrieval, reranking, and the eval loops that keep it honest.
- 9.2 (advanced)
Vector Search at Scale (HNSW, IVF-PQ, DiskANN)
Design billion-scale vector search: HNSW, IVF-PQ, and DiskANN indexes, product quantization, hybrid BM25-vector search, and sharding strategies.
- 9.3 (advanced)
AI Agent Architectures (ReAct, Reflection, Planning, Tool Use, Memory)
The canonical patterns for turning an LLM into an agent: ReAct's think-act-observe loop, reflection and self-critique, planner-executor decomposition, tool use and function calling, and how agents manage short- and long-term memory.
- 9.4 (advanced)
Multi-Agent Orchestration (LangGraph, OpenAI Agents SDK, AutoGen, Swarm)
Composing multiple agents into a reliable system: orchestrator-worker topologies, handoffs and delegation, shared memory, parallel fan-out, and the failure modes of agent graphs.
- 9.5 (advanced)
LLM Evaluation and Observability (Ragas, LangSmith, TruLens, LLM-as-Judge)
How to evaluate LLM systems before and after they ship: golden datasets, reference-free metrics, LLM-as-judge, continuous eval pipelines, and the observability stack for production LLMs.
- 9.6 (intermediate)
LLMOps and Prompt Engineering (Versioning, Guardrails, Red-Teaming)
The operational side of shipping LLM features: prompt-as-code, versioning, rollback, A/B testing prompts, structured outputs, and red-teaming before launch.
- 9.7 (intermediate)
LLM Cost Optimisation (Semantic Cache, Model Routing, Cascading, Prompt Caching)
The cost-engineering toolbox for production LLMs: semantic caching, model routing, small-then-big model cascades, provider-side prompt caching (Anthropic, OpenAI), and the unit economics that decide per-request margin.
- 9.8 (advanced)
LLM Safety and Guardrails (OWASP LLM Top 10, Prompt Injection, PII, Jailbreaks)
The safety-engineering surface for LLM applications: OWASP LLM Top 10, prompt-injection defence, PII redaction, jailbreak containment, and the defence-in-depth model for public-facing agents.
- 9.9 (intermediate)
ML System Design Fundamentals
The classic ML systems backbone every modern AI product sits on: candidate generation, ranking, two-tower embeddings, offline/online feature parity, and the training-serving skew problem.
- 9.10 (advanced)
Feature Stores and Model Serving (Feast, Tecton, KServe, BentoML, MLflow)
The infrastructure that makes ML shippable: online and offline feature stores, the model registry, model servers, shadow deploys, and the production lifecycle around a trained model.
- 9.11 (advanced)
Recommendation Systems Deep Dive (DLRM, Two-Tower, Embedding Retrieval, Cold Start)
How modern recommenders work end to end: candidate generation via approximate nearest-neighbour (ANN) search over embeddings, DLRM-style ranking, exploration-exploitation, cold-start handling, and the evaluation loop that keeps metrics honest.
- 9.12 (advanced)
Realtime AI and Voice Agents (Streaming Inference, WebRTC, LiveKit, Deepgram)
Designing sub-second voice agents: streaming ASR, low-latency LLM inference, streaming TTS, WebRTC transport, interruption handling, and the end-to-end latency budget.
- 9.13 (intermediate)
Multimodal AI Systems (CLIP, Whisper, LayoutLM, Document AI)
Designing systems that ingest images, audio, video, and documents: CLIP-style embeddings for cross-modal retrieval, Whisper pipelines, OCR-plus-layout models, and the storage architecture for unstructured data.
- 9.14 (intermediate)
Data Infrastructure for AI (Embedding Pipelines, Chunking, Unstructured ETL, MCP)
The data plane that feeds AI systems: source connectors, chunking strategies, embedding at scale, metadata schema, freshness, and the Model Context Protocol as a standard interface.