Course Schedule
Part 1: Resources and Deployment
Week 1
Mon, Jan 22
No Class
Fri, Jan 26
Deployment (Linux Shell)
Released: P1 (Docker)
Watch: Lecture
Slides: PDF
Worksheet: PDF
Anki Flashcards: Deck
Week 2
Mon, Jan 29
Deployment (Linux Pipelines)
Read: Designing Data-Intensive Applications, Kleppmann ("Batch Processing with Unix Tools" section of Chapter 10, "Batch Processing")
Watch: Lecture
Slides: PDF
Week 3
Mon, Feb 5
Compute Resources (PyTorch Basics)
Read: Machine Learning with PyTorch and Scikit-Learn ("PyTorch's computation graphs", "PyTorch tensor objects for storing and updating model parameters", and "Computing gradients via automatic differentiation" sections of Chapter 13, "Going Deeper - Mechanics of PyTorch")
Watch: Lecture
Slides: PDF
Anki Flashcards: Deck
Wed, Feb 7
Compute Resources (PyTorch Optimization)
Read: Machine Learning with PyTorch and Scikit-Learn ("Building input pipelines in PyTorch" and "Building an NN model in PyTorch" sections of Chapter 12, "Parallelizing Neural Network Training with PyTorch")
Due: P1
Released: P2 (PyTorch, COVID)
Watch: Lecture
Slides: PDF
Anki Flashcards: Deck
Quiz: week 2 and before (cumulative)
Fri, Feb 9
Compute Resources (Threads)
Read: Fluent Python, 2nd Edition ("What's New in This Chapter" through "A Bit of Jargon" of Chapter 19, "Concurrency Models in Python")
Watch: Lecture
Slides: PDF
Worksheet: PDF
Anki Flashcards: Deck
Week 4
Mon, Feb 12
Compute Resources (Locks)
Read: Mastering Concurrency in Python ("Working With Threads In Python" chapter)
Watch: Lecture
Slides: PDF
Worksheet: PDF
Anki Flashcards: Deck
Wed, Feb 14
Memory Resources (Caching)
Read: Systems Performance, Gregg (6.2.2; "CPU Caches" and "Latency" subsections of 6.4.1)
Watch: Lecture
Slides: PDF
Worksheet: PDF
Anki Flashcards: Deck
Quiz: week 3 and before (cumulative)
Fri, Feb 16
Memory Resources (Caching Practice)
Due: P2
Released: P3 (Threads+Caching+gRPC, Model Serving)
Watch: Lecture
Slides: PDF
Week 5
Mon, Feb 19
Memory Resources (PyArrow)
Read: Gallery of Processor Cache Effects (Examples 1 and 2)
Watch: Lecture
Anki Flashcards: Deck
Wed, Feb 21
Network Resources (Overview)
Read: Designing Data-Intensive Applications, Kleppmann (Chapter 4, "Encoding and Evolution")
Watch: Lecture
Slides: PDF
Anki Flashcards: Deck
Quiz: week 4 and before (cumulative)
Fri, Feb 23
Network Resources (gRPC+Compose)
Read: gRPC Basics Tutorial
Watch: Lecture
Slides: PDF
Anki Flashcards: Deck
Part 2: Clusters and Hadoop Ecosystem
Week 6
Wed, Feb 28
Storage Resources (Formats and DBs)
Read: Designing Data-Intensive Applications, Kleppmann ("Transaction Processing or Analytics?" and "Column-Oriented Storage" sections of Chapter 3, "Storage and Retrieval")
Watch: Lecture
Slides: PDF
Anki Flashcards: Deck
Fri, Mar 1
SQL Databases (MySQL)
Read: MySQL Crash Course, Silva (Chapters 3+5), Designing Data-Intensive Applications, Kleppmann ("The Meaning of ACID" section in Chapter 7, "Transactions")
Watch: Lecture
Slides: PDF
Anki Flashcards: Deck
Quiz: week 5 and before (cumulative)
Week 7
Mon, Mar 4
SQL Databases (MySQL 2)
Read: MySQL Crash Course, Silva (Chapters 3+5), Designing Data-Intensive Applications, Kleppmann ("The Meaning of ACID" section in Chapter 7, "Transactions")
Watch: Lecture
Slides: PDF
Anki Flashcards: Deck
Wed, Mar 6
HDFS Overview
Read: Mastering Hadoop 3, Singh et al. ("Deep Dive Into the Hadoop Distributed File System" chapter)
Due: P3
Released: P4 (HDFS, Loans)
Watch: Lecture
Slides: PDF
Anki Flashcards: Deck
Quiz: week 6 and before (cumulative)
Fri, Mar 8
HDFS Practice
Read: Mastering Hadoop 3, Singh et al. ("Deep Dive Into the Hadoop Distributed File System" chapter)
Watch: Lecture
Anki Flashcards: Deck
Week 8
Mon, Mar 11
MapReduce
Read: Learning Spark, 2nd edition by Damji et al. (sections "The Importance of an Optimal Storage Solution", "Databases", and "Data Lakes" of Chapter 9, "Building Reliable Data Lakes with Apache Spark")
No TopHat!
Midterm:
- Regular: 5:45 to 6:45 pm; Location: TBD
- McBurney: 5:45 to 8:45 pm; Location: TBD
Watch: Lecture
Slides: PDF
Anki Flashcards: Deck
Fri, Mar 15
Spark DataFrames
Read: Learning Spark, 2nd edition by Damji et al. (Chapter 4, "Spark SQL and DataFrames: Introduction to Built-in Data Sources")
Watch: Lecture
Anki Flashcards: Deck
Week 9
Mon, Mar 18
Spark SQL
Read: Designing Data-Intensive Applications, Kleppmann ("Reduce-Side Joins and Grouping" section of Chapter 10, "Batch Processing")
Due: P4
Released: P5 (Spark, Loans)
Watch: Lecture
Slides: PDF
Anki Flashcards: Deck
Wed, Mar 20
Spark Internals and Performance
Read: Learning Spark, 2nd edition by Damji et al. (Chapter 7, "Optimizing and Tuning Spark Applications")
Watch: Lecture
Slides: PDF
Anki Flashcards: Deck
Quiz: week 8 and before (cumulative)
Fri, Mar 22
Spark Machine Learning
Read: Learning Spark, 2nd edition by Damji et al. (Chapter 10, "Machine Learning with MLlib")
Watch: Lecture
Watch: Lecture
Slides: PDF
Anki Flashcards: Deck
Week 10
Mon, Mar 25
Spring Break
Wed, Mar 27
Spring Break
Fri, Mar 29
Spring Break
Week 11
Mon, Apr 1
Wide Tables: HBase and Cassandra
Read: Cassandra, The Definitive Guide, by Carpenter et al. (Chapter 4, "The Cassandra Query Language")
Watch: Lecture
Slides: PDF
Anki Flashcards: Deck
Wed, Apr 3
Cassandra Query Language (CQL)
Watch: Lecture
Anki Flashcards: Deck
Quiz: week 10 and before (cumulative)
Fri, Apr 5
Cassandra Partitioning
Read: Cassandra, The Definitive Guide, by Carpenter et al. (sections "Data Centers and Racks" to "Hinted Handoff" of Chapter 6, "The Cassandra Architecture")
Due: P5
Released: P6 (Cassandra, Weather)
Watch: Lecture
Slides: PDF
Worksheet: PDF
Anki Flashcards: Deck
Week 12
Wed, Apr 10
Streaming: Kafka Concepts
Read: Kafka, The Definitive Guide, 2nd edition by Shapira et al. ("Enter Kafka" section of Chapter 1, "Meet Kafka")
Watch: Lecture
Slides: PDF
Anki Flashcards: Deck
Quiz: week 11 and before (cumulative)
Week 13
Mon, Apr 15
Streaming: Kafka Reliability
Read: Kafka, The Definitive Guide, 2nd edition by Shapira et al. (Chapter 7, "Reliable Data Delivery")
Released: P7 (Kafka, Weather Stations)
Watch: Lecture
Slides: PDF
Anki Flashcards: Deck
Wed, Apr 17
Streaming: Spark Programming
Read: Learning Spark, 2nd edition by Damji et al. (Chapter 8, "Structured Streaming")
UPDATED Due: P6 (now due Apr 18)
Watch: Lecture
Anki Flashcards: Deck
Quiz: week 12 and before (cumulative)
Part 3: The Cloud
Week 14
Wed, Apr 24
BigQuery 1
Watch: Lecture
Slides: PDF
Anki Flashcards: Deck
Quiz: week 13 and before (cumulative)
Fri, Apr 26
BigQuery 2
Read: Google BigQuery: The Definitive Guide, by Lakshmanan et al. ("BigQuery Geographic Information Systems" section of Chapter 8, "Advanced Queries")
Due: P7
Released: P8 (BigQuery, Loans)
Watch: Lecture
Anki Flashcards: Deck
Week 15
Mon, Apr 29
BigQuery 3
Read: Google BigQuery: The Definitive Guide, by Lakshmanan et al. (Chapter 9, "Machine Learning in BigQuery")
Watch: Lecture
Slides: PDF
Anki Flashcards: Deck