CS 229R: Algorithms for Big Data

CS 229R: Algorithms for Big Data (Fall 2015, Harvard Univ.). Instructor: Professor Jelani Nelson. Big data is data so large that it does not fit in the main memory of a single machine, and the need to process such data with efficient algorithms arises in Internet search, network traffic monitoring, machine learning, scientific computing, signal processing, and several other areas. This course will cover mathematically rigorous models for developing such algorithms, as well as some provable limitations of algorithms operating in those models. Topics discussed will include sketching and streaming, dimensionality reduction, numerical linear algebra, compressed sensing, and external memory and cache-obliviousness. (from

Lecture 23 - External memory model: Linked list, Matrix multiplication, B-tree

External memory model: Linked list, Matrix multiplication, B-tree, Buffered repository tree, Sorting

Go to the Course Home or watch other lectures:

Lecture 01 - Course Introduction, Basic tail bounds, Morris' algorithm
Lecture 02 - Distinct elements, k-wise independence, Geometric subsampling of streams
Lecture 03 - Necessity of randomized/approximate guarantees, Linear sketching
Lecture 04 - p-stable sketch analysis, Nisan's PRG, High 𝓁p norms (p>2) via max-stability
Lecture 05 - Analysis of 𝓁p estimation algorithm via max-stability, Deterministic point query
Lecture 06 - CountMin sketch, Point query, Heavy hitters, Sparse approximation
Lecture 07 - CountSketch, 𝓁0 sampling, Graph sketching
Lecture 08 - Amnesic dynamic programming (approximate distance to monotonicity)
Lecture 09 - Communication complexity + application to median and F0 lower bounds
Lecture 10 - Randomized and approximate F0 lower bounds, Disjointness, Fp lower bound
Lecture 11 - Khintchine, Decoupling, Hanson-Wright, Proof of distributional JL lemma
Lecture 12 - Alon's JL lower bound, Beyond worst case analysis
Lecture 13 - ORS theorem (distributional JL implies Gordon's theorem), Sparse JL
Lecture 14 - Sparse JL proof wrap-up, Fast JL Transform, Approximate nearest neighbor
Lecture 15 - Approximate matrix multiplication with Frobenius error via sampling / JL
Lecture 16 - Linear least squares via subspace embeddings, Leverage score sampling
Lecture 17 - Oblivious subspace embeddings, Faster iterative regression, Sketch-and-solve regression
Lecture 18 - Low-rank approximation, Column-based matrix reconstruction, K-means
Lecture 19 - RIP and connection to incoherence, Basis pursuit, Krahmer-Ward theorem
Lecture 20 - Krahmer-Ward proof, Iterative Hard Thresholding
Lecture 21 - 𝓁1/𝓁1 recovery, RIP1, Unbalanced expanders, Sequential Sparse Matching Pursuit
Lecture 22 - Matrix completion
Lecture 23 - External memory model: Linked list, Matrix multiplication, B-tree
Lecture 24 - Competitive paging, Cache-oblivious algorithms
Lecture 25 - MapReduce: TeraSort, Minimum spanning tree, Triangle counting