Course Schedule

Part 1: Resources and Deployment

Week 1

Mon, Jan 22
No Class
Wed, Jan 24
Course Intro
Read: Syllabus
Watch: Lecture
Slides: PDF
Anki Flashcards: Deck
Fri, Jan 26
Deployment (Linux Shell)
Released: P1 (Docker)
Watch: Lecture
Slides: PDF
Worksheet: PDF
Anki Flashcards: Deck

Week 2

Wed, Jan 31
Deployment (Docker)
Watch: Lecture
Slides: PDF
Worksheet: PDF
Anki Flashcards: Deck
Quiz: week 1
Fri, Feb 2
Deployment (Docker 2)
Watch: Lecture

Week 3

Week 4

Mon, Feb 12
Compute Resources (Locks)
Read: Mastering Concurrency in Python ("Working With Threads In Python" chapter)
Watch: Lecture
Slides: PDF
Worksheet: PDF
Anki Flashcards: Deck
Wed, Feb 14
Memory Resources (Caching)
Read: Systems Performance, Gregg (6.2.2; "CPU Caches" and "Latency" subsections of 6.4.1)
Watch: Lecture
Slides: PDF
Worksheet: PDF
Anki Flashcards: Deck
Quiz: week 3 and before (cumulative)
Fri, Feb 16
Memory Resources (Caching Practice)
Due: P2
Released: P3 (Threads+Caching+gRPC, Model Serving)
Watch: Lecture
Slides: PDF

Week 5

Mon, Feb 19
Memory Resources (PyArrow)
Read: Gallery of Processor Cache Effects (Examples 1 and 2)
Watch: Lecture
Anki Flashcards: Deck
Wed, Feb 21
Network Resources (Overview)
Read: Designing Data-Intensive Applications, Kleppmann (Chapter 4, "Encoding and Evolution")
Watch: Lecture
Slides: PDF
Anki Flashcards: Deck
Quiz: week 4 and before (cumulative)
Fri, Feb 23
Network Resources (gRPC+Compose)
Read: gRPC Basics Tutorial
Watch: Lecture
Slides: PDF
Anki Flashcards: Deck

Part 2: Clusters and Hadoop Ecosystem

Week 6

Mon, Feb 26
Storage Resources (File Systems)
Watch: Lecture
Slides: PDF
Anki Flashcards: Deck

Week 7

Week 8

Mon, Mar 11
MapReduce
Read: Learning Spark, 2nd edition by Damji et al. (sections "The Importance of an Optimal Storage Solution", "Databases", and "Data Lakes" of chapter 9, "Building Reliable Data Lakes with Apache Spark")
No TopHat!
Midterm:
  • Regular: 5:45 to 6:45 pm; Location: TBD
  • McBurney: 5:45 to 8:45 pm; Location: TBD

Watch: Lecture
Slides: PDF
Anki Flashcards: Deck
Wed, Mar 13
Spark RDDs
Watch: Lecture

Week 9

Fri, Mar 22
Spark Machine Learning
Read: Learning Spark, 2nd edition by Damji et al. (Chapter 10, "Machine Learning with MLlib")
Watch: Lecture
Slides: PDF
Anki Flashcards: Deck

Week 10

Mon, Mar 25
Spring Break
Wed, Mar 27
Spring Break
Fri, Mar 29
Spring Break

Week 11

Mon, Apr 1
Wide Tables: HBase and Cassandra
Read: Cassandra: The Definitive Guide, by Carpenter et al. (Chapter 4, "The Cassandra Query Language")
Watch: Lecture
Slides: PDF
Anki Flashcards: Deck
Wed, Apr 3
Cassandra Query Language (CQL)
Watch: Lecture
Anki Flashcards: Deck
Quiz: week 10 and before (cumulative)

Week 12

Mon, Apr 8
Cassandra Replication
Watch: Lecture
Slides: PDF
Anki Flashcards: Deck
Fri, Apr 12
Streaming: Kafka Demos
Watch: Lecture
Anki Flashcards: Deck

Week 13

Mon, Apr 15
Streaming: Kafka Reliability
Read: Kafka: The Definitive Guide, 2nd edition by Shapira et al. (Chapter 7, "Reliable Data Delivery")
Released: P7 (Kafka, Weather Stations)
Watch: Lecture
Slides: PDF
Anki Flashcards: Deck
Wed, Apr 17
Streaming: Spark Programming
Read: Learning Spark, 2nd edition by Damji et al. (Chapter 8, "Structured Streaming")
UPDATED Due: P6 (now due Apr 18)
Watch: Lecture
Anki Flashcards: Deck
Quiz: week 12 and before (cumulative)
Fri, Apr 19
Streaming: Spark Concepts
Guest lecture by Tyler
No TopHat!
Watch: Lecture
Slides: PDF
Anki Flashcards: Deck

Part 3: The Cloud

Week 14

Mon, Apr 22
The Cloud
Watch: Lecture
Slides: PDF
Anki Flashcards: Deck
Wed, Apr 24
BigQuery 1
Watch: Lecture
Slides: PDF
Anki Flashcards: Deck
Quiz: week 13 and before (cumulative)

Week 15

Wed, May 1
BigQuery 4
Watch: Lecture
Anki Flashcards: Deck
Quiz: week 14 and before (cumulative)
Fri, May 3
Review
UPDATED Due: P8 (now due May 6)
Final exam:
  • Regular: Sunday, May 5, 2:45 PM - 4:45 PM; Location: check Canvas announcement
  • McBurney: Sunday, May 5, 1:00 PM - 5:00 PM; Location: check Canvas announcement

Watch: Lecture
Anki Flashcards: Deck