InfoCoBuild

CS 229R: Algorithms for Big Data

CS 229R: Algorithms for Big Data (Fall 2015, Harvard University). Instructor: Professor Jelani Nelson. Big data is data so large that it does not fit in the main memory of a single machine. The need to process such data with efficient algorithms arises in Internet search, network traffic monitoring, machine learning, scientific computing, signal processing, and several other areas. This course covers mathematically rigorous models for developing such algorithms, as well as some provable limitations of algorithms operating in those models. Topics include sketching and streaming, dimensionality reduction, numerical linear algebra, compressed sensing, and external memory and cache-obliviousness. (from harvard.edu)

Lecture 01 - Course Introduction, Basic tail bounds, Morris' algorithm

Logistics, Course topics, Basic tail bounds (Markov, Chebyshev, Chernoff, Bernstein), Morris' algorithm
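
For readers unfamiliar with the last topic, here is a minimal sketch of the textbook form of Morris' approximate counter. It is not taken from the lecture itself. The counter stores only a small integer X, increments it with probability 2^(-X) on each event, and reports 2^X - 1, which equals the true count in expectation. The names MorrisCounter and average_estimate are hypothetical, chosen for illustration only.

```python
import random


class MorrisCounter:
    """Approximate counter that stores only a small integer x (illustrative sketch)."""

    def __init__(self, rng=None):
        self.x = 0
        # Use the caller's generator if given, so runs can be made reproducible.
        self.rng = rng if rng is not None else random.Random()

    def increment(self):
        # Increase x with probability 2^(-x); otherwise leave it unchanged.
        if self.rng.random() < 2.0 ** (-self.x):
            self.x += 1

    def estimate(self):
        # 2^x - 1 is an unbiased estimate of the number of increments so far.
        return 2 ** self.x - 1


def average_estimate(n, trials, seed=0):
    """Average the estimates of `trials` independent counters after n increments each."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        counter = MorrisCounter(rng)
        for _ in range(n):
            counter.increment()
        total += counter.estimate()
    return total / trials


if __name__ == "__main__":
    # A single counter is noisy; averaging many independent copies reduces the variance.
    print(average_estimate(n=200, trials=2000))
```

A single counter has high variance, so the demo averages many independent copies, which is the standard way to tighten the estimate using Chebyshev's inequality.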

Lectures in this course:

Lecture 01 - Course Introduction, Basic tail bounds, Morris' algorithm

Lecture 02 - Distinct elements, k-wise independence, Geometric subsampling of streams

Lecture 03 - Necessity of randomized/approximate guarantees, Linear sketching

Lecture 04 - p-stable sketch analysis, Nisan's PRG, High 𝓁_p norms (p>2) via max-stability

Lecture 05 - Analysis of 𝓁_p estimation algorithm via max-stability, Deterministic point query

Lecture 06 - CountMin sketch, Point query, Heavy hitters, Sparse approximation

Lecture 07 - CountSketch, 𝓁₀ sampling, Graph sketching

Lecture 08 - Amnesic dynamic programming (approximate distance to monotonicity)

Lecture 09 - Communication complexity + application to median and F₀ lower bounds

Lecture 10 - Randomized and approximate F₀ lower bounds, Disjointness, F_p lower bound

Lecture 11 - Khintchine, Decoupling, Hanson-Wright, Proof of distributional JL lemma

Lecture 12 - Alon's JL lower bound, Beyond worst case analysis

Lecture 13 - ORS theorem (distributional JL implies Gordon's theorem), Sparse JL

Lecture 14 - Sparse JL proof wrap-up, Fast JL Transform, Approximate nearest neighbor

Lecture 15 - Approximate matrix multiplication with Frobenius error via sampling / JL

Lecture 16 - Linear least squares via subspace embeddings, Leverage score sampling

Lecture 17 - Oblivious subspace embeddings, Faster iterative regression, Sketch-and-solve regression

Lecture 18 - Low-rank approximation, Column-based matrix reconstruction, K-means

Lecture 19 - RIP and connection to incoherence, Basis pursuit, Krahmer-Ward theorem

Lecture 20 - Krahmer-Ward proof, Iterative Hard Thresholding

Lecture 21 - 𝓁₁/𝓁₁ recovery, RIP1, Unbalanced expanders, Sequential Sparse Matching Pursuit

Lecture 22 - Matrix completion

Lecture 23 - External memory model: Linked list, Matrix multiplication, B-tree

Lecture 24 - Competitive paging, Cache-oblivious algorithms

Lecture 25 - MapReduce: TeraSort, Minimum spanning tree, Triangle counting