**Preparatory Material**

Lecture 01 - Probability Basics 1

Lecture 02 - Probability Basics 2

Lecture 03 - Linear Algebra 1

Lecture 04 - Linear Algebra 2

**Introduction to RL and Immediate RL**

Lecture 05 - Introduction to RL

Lecture 06 - RL Framework and Applications

Lecture 07 - Introduction to Immediate RL

Lecture 08 - Bandit Optimalities

Lecture 09 - Value Function-Based Methods

**Bandit Algorithms**

Lecture 10 - Upper Confidence Bound 1 (UCB 1)

Lecture 11 - Concentration Bounds

Lecture 12 - UCB 1 Theorem

Lecture 13 - Probably Approximately Correct (PAC) Bounds

Lecture 14 - Median Elimination

Lecture 15 - Thompson Sampling

**Policy Gradient Methods and Introduction to Full RL**

Lecture 16 - Policy Search

Lecture 17 - REINFORCE

Lecture 18 - Contextual Bandits

Lecture 19 - Full RL Introduction

Lecture 20 - Returns, Value Functions and Markov Decision Processes (MDPs)

**MDP Formulation, Bellman Equations and Optimality Proofs**

Lecture 21 - MDP Modelling

Lecture 22 - Bellman Equation

Lecture 23 - Bellman Optimality Equation

Lecture 24 - Cauchy Sequence and Green's Equation

Lecture 25 - Banach Fixed Point Theorem

Lecture 26 - Convergence Proof

**Dynamic Programming and Monte Carlo Methods**

Lecture 27 - Lπ Convergence

Lecture 28 - Value Iteration

Lecture 29 - Policy Iteration

Lecture 30 - Dynamic Programming

Lecture 31 - Monte Carlo

Lecture 32 - Control in Monte Carlo

**Monte Carlo and Temporal Difference Methods**

Lecture 33 - Off-Policy MC

Lecture 34 - UCT (Upper Confidence Bound 1 applied to Trees)

Lecture 35 - TD(0)

Lecture 36 - TD(0) Control

Lecture 37 - Q-Learning

Lecture 38 - Afterstate

**Eligibility Traces**

Lecture 39 - Eligibility Traces

Lecture 40 - Backward View of Eligibility Traces

Lecture 41 - Eligibility Trace Control

Lecture 42 - Thompson Sampling Recap

**Function Approximation**

Lecture 43 - Function Approximation

Lecture 44 - Linear Parameterization

Lecture 45 - State Aggregation Methods

Lecture 46 - Function Approximation and Eligibility Traces

Lecture 47 - Least-Squares Temporal Difference (LSTD) and LSTDQ

Lecture 48 - LSPI and Fitted Q

**DQN, Fitted Q and Policy Gradient Approaches**

Lecture 49 - DQN and Fitted Q-Iteration

Lecture 50 - Policy Gradient Approach

Lecture 51 - Actor Critic and REINFORCE

Lecture 52 - REINFORCE (cont.)

Lecture 53 - Policy Gradient with Function Approximation

**Hierarchical Reinforcement Learning**

Lecture 54 - Hierarchical Reinforcement Learning

Lecture 55 - Types of Optimality

Lecture 56 - Semi-Markov Decision Processes

Lecture 57 - Options

Lecture 58 - Learning with Options

Lecture 59 - Hierarchical Abstract Machines

**Hierarchical RL: MAXQ**

Lecture 60 - MAXQ

Lecture 61 - MAXQ Value Function Decomposition

Lecture 62 - Option Discovery

**POMDPs (Partially Observable Markov Decision Processes)**

Lecture 63 - POMDP Introduction

Lecture 64 - Solving POMDPs