# InfoCoBuild

## Reinforcement Learning

Reinforcement Learning. Instructor: Prof. Balaraman Ravindran, Department of Computer Science and Engineering, IIT Madras. Reinforcement learning is a paradigm that aims to model the trial-and-error learning process that is needed in many problem situations where explicit instructive signals are not available. It has roots in operations research, behavioral psychology and AI. The goal of the course is to introduce the basic mathematical foundations of reinforcement learning, as well as highlight some of the recent directions of research. (from nptel.ac.in)

### Lecture 05 - Introduction to RL

This lecture gives an introduction to the area of reinforcement learning, discussing illustrative examples as well as some applications in which reinforcement learning has been applied successfully.

Go to the Course Home or watch other lectures:

### Preparatory Material

- Lecture 01 - Probability Basics 1
- Lecture 02 - Probability Basics 2
- Lecture 03 - Linear Algebra 1
- Lecture 04 - Linear Algebra 2

### Introduction to RL and Immediate RL

- Lecture 05 - Introduction to RL
- Lecture 06 - RL Framework and Applications
- Lecture 07 - Introduction to Immediate RL
- Lecture 08 - Bandit Optimalities
- Lecture 09 - Value Function based Methods

### Bandit Algorithms

- Lecture 10 - Upper Confidence Bound 1 (UCB 1)
- Lecture 11 - Concentration Bounds
- Lecture 12 - UCB 1 Theorem
- Lecture 13 - Probably Approximately Correct (PAC) Bounds
- Lecture 14 - Median Elimination
- Lecture 15 - Thompson Sampling

### Policy Gradient Methods and Introduction to Full RL

- Lecture 16 - Policy Search
- Lecture 17 - REINFORCE
- Lecture 18 - Contextual Bandits
- Lecture 19 - Full RL Introduction
- Lecture 20 - Returns, Value Functions and Markov Decision Processes (MDPs)

### MDP Formulation, Bellman Equations and Optimality Proofs

- Lecture 21 - MDP Modelling
- Lecture 22 - Bellman Equation
- Lecture 23 - Bellman Optimality Equation
- Lecture 24 - Cauchy Sequence and Green's Equation
- Lecture 25 - Banach Fixed Point Theorem
- Lecture 26 - Convergence Proof

### Dynamic Programming and Monte Carlo Methods

- Lecture 27 - Lpi Convergence
- Lecture 28 - Value Iteration
- Lecture 29 - Policy Iteration
- Lecture 30 - Dynamic Programming
- Lecture 31 - Monte Carlo
- Lecture 32 - Control in Monte Carlo

### Monte Carlo and Temporal Difference Methods

- Lecture 33 - Off Policy MC
- Lecture 34 - UCT (Upper Confidence Bound 1 applied to Trees)
- Lecture 35 - TD(0)
- Lecture 36 - TD(0) Control
- Lecture 37 - Q-Learning
- Lecture 38 - Afterstate

### Eligibility Traces

- Lecture 39 - Eligibility Traces
- Lecture 40 - Backward View of Eligibility Traces
- Lecture 41 - Eligibility Trace Control
- Lecture 42 - Thompson Sampling Recap

### Function Approximation

- Lecture 43 - Function Approximation
- Lecture 44 - Linear Parameterization
- Lecture 45 - State Aggregation Methods
- Lecture 46 - Function Approximation and Eligibility Traces
- Lecture 47 - Least-Squares Temporal Difference (LSTD) and LSTDQ
- Lecture 48 - LSPI and Fitted Q

### DQN, Fitted Q and Policy Gradient Approaches

- Lecture 49 - DQN and Fitted Q-Iteration
- Lecture 50 - Policy Gradient Approach
- Lecture 51 - Actor Critic and REINFORCE
- Lecture 52 - REINFORCE (cont.)
- Lecture 53 - Policy Gradient with Function Approximation

### Hierarchical Reinforcement Learning

- Lecture 54 - Hierarchical Reinforcement Learning
- Lecture 55 - Types of Optimality
- Lecture 56 - Semi Markov Decision Processes
- Lecture 57 - Options
- Lecture 58 - Learning with Options
- Lecture 59 - Hierarchical Abstract Machines

### Hierarchical RL: MAXQ

- Lecture 60 - MAXQ
- Lecture 61 - MAXQ Value Function Decomposition
- Lecture 62 - Option Discovery

### POMDPs (Partially Observable Markov Decision Processes)

- Lecture 63 - POMDP Introduction
- Lecture 64 - Solving POMDP