
Reinforcement Learning

Reinforcement Learning. Instructor: Prof. Balaraman Ravindran, Department of Computer Science and Engineering, IIT Madras. Reinforcement learning is a paradigm that aims to model the trial-and-error learning process needed in many problem settings where explicit instructive signals are not available. It has roots in operations research, behavioral psychology, and AI. The goal of the course is to introduce the basic mathematical foundations of reinforcement learning, as well as to highlight some of the recent directions of research. (from nptel.ac.in)

Lecture 29 - Policy Iteration
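
As a quick orientation to what this lecture covers (a minimal sketch only, not code from the course), the snippet below runs policy iteration on a small made-up two-state, two-action MDP: exact policy evaluation by solving the linear Bellman system, followed by greedy policy improvement, repeated until the policy stops changing.

```python
# A minimal policy-iteration sketch on a made-up 2-state, 2-action MDP.
# P and R below are arbitrary illustrative numbers, not from the lecture.
import numpy as np

n_states, n_actions, gamma = 2, 2, 0.9

# P[s, a, s'] = transition probability, R[s, a] = expected immediate reward
P = np.array([[[0.8, 0.2], [0.1, 0.9]],
              [[0.5, 0.5], [0.3, 0.7]]])
R = np.array([[1.0, 0.0],
              [0.0, 2.0]])

policy = np.zeros(n_states, dtype=int)           # start from an arbitrary policy
while True:
    # Policy evaluation: solve (I - gamma * P_pi) V = R_pi exactly
    P_pi = P[np.arange(n_states), policy]        # (n_states, n_states)
    R_pi = R[np.arange(n_states), policy]        # (n_states,)
    V = np.linalg.solve(np.eye(n_states) - gamma * P_pi, R_pi)

    # Policy improvement: act greedily with respect to Q(s, a)
    Q = R + gamma * (P @ V)                      # (n_states, n_actions)
    new_policy = Q.argmax(axis=1)
    if np.array_equal(new_policy, policy):       # stable policy => optimal
        break
    policy = new_policy

print("optimal policy:", policy, "state values:", V)
```

For a finite MDP the loop terminates after finitely many iterations, since each improvement step produces a strictly better policy until the optimal one is reached.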


Go to the Course Home or watch other lectures:

Preparatory Material
Lecture 01 - Probability Basics 1
Lecture 02 - Probability Basics 2
Lecture 03 - Linear Algebra 1
Lecture 04 - Linear Algebra 2
Introduction to RL and Immediate RL
Lecture 05 - Introduction to RL
Lecture 06 - RL Framework and Applications
Lecture 07 - Introduction to Immediate RL
Lecture 08 - Bandit Optimalities
Lecture 09 - Value Function based Methods
Bandit Algorithms
Lecture 10 - Upper Confidence Bound 1 (UCB 1)
Lecture 11 - Concentration Bounds
Lecture 12 - UCB 1 Theorem
Lecture 13 - Probably Approximately Correct (PAC) Bounds
Lecture 14 - Median Elimination
Lecture 15 - Thompson Sampling
Policy Gradient Methods and Introduction to Full RL
Lecture 16 - Policy Search
Lecture 17 - REINFORCE
Lecture 18 - Contextual Bandits
Lecture 19 - Full RL Introduction
Lecture 20 - Returns, Value Functions and Markov Decision Processes (MDPs)
MDP Formulation, Bellman Equations and Optimality Proofs
Lecture 21 - MDP Modelling
Lecture 22 - Bellman Equation
Lecture 23 - Bellman Optimality Equation
Lecture 24 - Cauchy Sequence and Green's Equation
Lecture 25 - Banach Fixed Point Theorem
Lecture 26 - Convergence Proof
Dynamic Programming and Monte Carlo Methods
Lecture 27 - Lpi Convergence
Lecture 28 - Value Iteration
Lecture 29 - Policy Iteration
Lecture 30 - Dynamic Programming
Lecture 31 - Monte Carlo
Lecture 32 - Control in Monte Carlo
Monte Carlo and Temporal Difference Methods
Lecture 33 - Off Policy MC
Lecture 34 - UCT (Upper Confidence Bound 1 applied to Trees)
Lecture 35 - TD (0)
Lecture 36 - TD (0) Control
Lecture 37 - Q-Learning
Lecture 38 - Afterstate
Eligibility Traces
Lecture 39 - Eligibility Traces
Lecture 40 - Backward View of Eligibility Traces
Lecture 41 - Eligibility Trace Control
Lecture 42 - Thompson Sampling Recap
Function Approximation
Lecture 43 - Function Approximation
Lecture 44 - Linear Parameterization
Lecture 45 - State Aggregation Methods
Lecture 46 - Function Approximation and Eligibility Traces
Lecture 47 - Least-Squares Temporal Difference (LSTD) and LSTDQ
Lecture 48 - LSPI and Fitted Q
DQN, Fitted Q and Policy Gradient Approaches
Lecture 49 - DQN and Fitted Q-Iteration
Lecture 50 - Policy Gradient Approach
Lecture 51 - Actor Critic and REINFORCE
Lecture 52 - REINFORCE (cont.)
Lecture 53 - Policy Gradient with Function Approximation
Hierarchical Reinforcement Learning
Lecture 54 - Hierarchical Reinforcement Learning
Lecture 55 - Types of Optimality
Lecture 56 - Semi Markov Decision Processes
Lecture 57 - Options
Lecture 58 - Learning with Options
Lecture 59 - Hierarchical Abstract Machines
Hierarchical RL: MAXQ
Lecture 60 - MAXQ
Lecture 61 - MAXQ Value Function Decomposition
Lecture 62 - Option Discovery
POMDPs (Partially Observable Markov Decision Processes)
Lecture 63 - POMDP Introduction
Lecture 64 - Solving POMDP