Data Systems
Storage engines, OLAP, streams, search, vectors.
- Modules: 10
- Hours: 4
- Difficulty: Intermediate to Advanced
- 4.0 (advanced)
Storage Engines: B-Trees, LSM-Trees, and Why Your Database Feels the Way It Does
How B-tree and LSM-tree storage engines shape read, write, and space amplification, with examples from InnoDB, PostgreSQL, RocksDB, and Cassandra.
- 4.1 (intermediate)
OLTP vs OLAP: Row Stores, Column Stores, and Matching Shape to Workload
Why transactional systems use row-oriented storage and analytical systems use columnar, with examples from Postgres, MySQL, Redshift, BigQuery, ClickHouse, and Snowflake.
- 4.2 (intermediate)
Data Warehouses and Data Lakes: Structure, Schema, and the Lakehouse
How warehouses such as Redshift, BigQuery, and Snowflake, S3-based data lakes, and the lakehouse pattern built on Delta Lake, Iceberg, and Hudi fit together.
- 4.3 (intermediate)
Stream vs Batch Processing: Lambda, Kappa, and the End of That Debate
Batch with Spark and Hadoop, streaming with Kafka Streams, Flink, and Spark Streaming, and how Lambda and Kappa architectures stack up.
- 4.4 (intermediate)
Change Data Capture: Streaming the Database's Inner Monologue
How Debezium, Maxwell, and the outbox pattern turn WAL and binlog entries into reliable event streams, and when each approach is the right call.
- 4.5 (intermediate)
Search Systems: Inverted Indexes, BM25, and Running Elasticsearch in Production
How Elasticsearch, OpenSearch, and Solr build inverted indexes, score with BM25, and handle faceting, relevance tuning, and sharding at scale.
- 4.6 (intermediate)
Time-Series Databases: Metrics, Events, and Retention at Scale
How Prometheus, InfluxDB, TimescaleDB, and VictoriaMetrics handle write-heavy time-series workloads with downsampling and retention policies.
- 4.7 (intermediate)
Graph Databases: Property Graphs, Cypher, and When Joins Are the Problem
How Neo4j, Amazon Neptune, and Dgraph model relationships, and when graph queries beat recursive SQL joins.
- 4.8 (advanced)
Vector Databases: Embeddings, ANN Indexes, and the Retrieval Layer for AI
How Pinecone, Weaviate, Milvus, and pgvector store and search embeddings using HNSW and IVF approximate nearest neighbor indexes.
- 4.9 (intermediate)
Key-Value Stores: Redis, Memcached, DynamoDB, and Picking the Right Hash Table
How Redis, Memcached, and DynamoDB differ in durability, data model, and scaling, and when each is the right key-value store.