Course Syllabus

Reinforcement Learning

CSCI-B659, Spring 2016


Class Meets

When: Tuesday and Thursday 4:00pm-5:15pm

Where: Swain West, Room 103



Adam White

Office: TBA









Office Hours (instructor)

Wednesday 2pm-4pm, or by appointment


Office Hours (AIs)




Course content

Reinforcement learning is a framework for modeling an autonomous agent's interaction with an unknown world. The agent's objective is to learn the effects of its actions and modify its policy in order to maximize future reward. The study of reinforcement learning emphasizes a learning approach to artificial intelligence. Unlike in supervised learning, the agent is not explicitly told the correct answers (labels); rather, an RL agent must learn only from reward and from trial-and-error interaction with the world. This general framework has been used to optimize helicopter flight, schedule elevators, and achieve super-human performance in many games (e.g., Backgammon, Go, and Atari). Ideas from reinforcement learning have also been used to explain learning in animals and to model dopamine activity in the human brain.


This course provides an introduction to some of the foundational ideas on which modern reinforcement learning is built, including Markov decision processes, value functions, Monte Carlo estimation, dynamic programming, temporal difference learning, eligibility traces, and function approximation. The course will develop an intuitive understanding of these concepts (taking the agent's perspective) while also focusing on the mathematical theory of reinforcement learning. Programming assignments and projects will require implementing and testing complete decision-making systems.


The objective of this course is twofold. The first is to prepare you to conduct research in reinforcement learning. The second is to give you the knowledge required to apply reinforcement learning techniques to novel applications.


Topics to be covered:

  • Overview of reinforcement learning: the agent-environment framework, successes of reinforcement learning
  • Bandit problems and online learning
  • Markov decision processes
  • Returns and value functions
  • Solution methods: dynamic programming
  • Solution methods: Monte Carlo learning
  • Solution methods: temporal difference learning
  • Eligibility traces
  • Value function approximation
  • Models and planning (table lookup case)
  • Case studies: successful examples of RL systems
  • Frontiers of RL research



Prerequisites:

This course will rely on basic statistics (e.g., probability distributions and expected values) and basic linear algebra (e.g., inner products). You should be able to program in some language (e.g., Python, C).



Grading:

  • 50%: five assignments
  • 10%: thought questions
  • 10%: project proposal
  • 30%: final project


Text and resources:

Required: Reinforcement Learning: An Introduction, by Richard S. Sutton and Andrew G. Barto

Supplemental: Algorithms for Reinforcement Learning, by Csaba Szepesvári

(both freely available online)


Late Policy and Academic Honesty

Assignments can be done in groups, but you must clearly state whom you collaborated with and the nature of the collaboration. All sources used for problem solutions must be acknowledged (e.g., websites, books, research papers, personal communication with people). For example:

I worked with Sally on questions 4 and 5. It was Sally's idea to use larger tile sizes in the experiment, but I coded the experiment myself.

Every student must write their own code and conduct their own experiments. Sharing data or results is not allowed.


The project can be done in pairs, with no restrictions on what is shared. Each pair submits one report, and both partners receive the same grade.

Academic honesty is taken seriously; for details, see the Indiana University Code of Student Rights, Responsibilities, and Conduct.
