Part 4 of 11

Data Systems

Storage engines, OLAP, streams, search, vectors.

Modules: 10
Hours: 4
Difficulty: Intermediate to Advanced
  1. 4.0 (advanced)

    Storage Engines: B-Trees, LSM-Trees, and Why Your Database Feels the Way It Does

    How B-tree and LSM-tree storage engines shape read, write, and space amplification, with examples from InnoDB, PostgreSQL, RocksDB, and Cassandra.

    25 min · MySQL, PostgreSQL, Cassandra, +3
  2. 4.1 (intermediate)

    OLTP vs OLAP: Row Stores, Column Stores, and Matching Shape to Workload

    Why transactional systems use row-oriented storage and analytical systems use columnar, with examples from Postgres, MySQL, Redshift, BigQuery, ClickHouse, and Snowflake.

    25 min · PostgreSQL, MySQL, BigQuery, +6
  3. 4.2 (intermediate)

    Data Warehouses and Data Lakes: Structure, Schema, and the Lakehouse

    How Redshift, BigQuery, Snowflake, S3-based lakes, and the lakehouse pattern (Delta Lake, Iceberg, and Hudi) fit together.

    25 min · BigQuery, S3, Spark, +6
  4. 4.3 (intermediate)

    Stream vs Batch Processing: Lambda, Kappa, and the End of That Debate

    Batch with Spark and Hadoop, streaming with Kafka Streams, Flink, and Spark Streaming, and how Lambda and Kappa architectures stack up.

    25 min · Kafka, Flink, Spark, +2
  5. 4.4 (intermediate)

    Change Data Capture: Streaming the Database's Inner Monologue

    How Debezium, Maxwell, and the outbox pattern turn WAL and binlog entries into reliable event streams, and when each approach is the right call.

    25 min · PostgreSQL, MySQL, Kafka, +2
  6. 4.5 (intermediate)

    Search Systems: Inverted Indexes, BM25, and Running Elasticsearch in Production

    How Elasticsearch, OpenSearch, and Solr build inverted indexes, score with BM25, and handle faceting, relevance tuning, and sharding at scale.

    30 min · Elasticsearch, OpenSearch, PostgreSQL, +1
  7. 4.6 (intermediate)

    Time-Series Databases: Metrics, Events, and Retention at Scale

    How Prometheus, InfluxDB, TimescaleDB, and VictoriaMetrics handle write-heavy time-series workloads with downsampling and retention policies.

    25 min · Prometheus, Grafana, InfluxDB, +2
  8. 4.7 (intermediate)

    Graph Databases: Property Graphs, Cypher, and When Joins Are the Problem

    How Neo4j, Amazon Neptune, and Dgraph model relationships, and when graph queries beat recursive SQL joins.

    25 min · Neo4j, RocksDB
  9. 4.8 (advanced)

    Vector Databases: Embeddings, ANN Indexes, and the Retrieval Layer for AI

    How Pinecone, Weaviate, Milvus, and pgvector store and search embeddings using HNSW and IVF approximate nearest neighbor indexes.

    25 min · Pinecone, Weaviate, Milvus, +6
  10. 4.9 (intermediate)

    Key-Value Stores: Redis, Memcached, DynamoDB, and Picking the Right Hash Table

    How Redis, Memcached, and DynamoDB differ in durability, data model, and scaling, and when each is the right key-value store.

    25 min · Redis, Memcached, DynamoDB, +1