Conducting a scientific experiment
In this course you will have to design and conduct experiments. In your assignments, the experiment will usually be clearly specified: you just have to follow the directions and produce the desired plots. In your research projects, however, you will have to carefully design an experiment and complete a write-up of it. This is hard, and many active researchers in the field struggle with it as well. The purpose of this document is to give you some guidelines.
Reinforcement learning experiments are like any scientific experiment: you should have a hypothesis you wish to test, an experiment to test that hypothesis, and some conclusions based on the results of the experiment. Let's go through each in turn.
The research question
Your hypothesis should be something that you can clearly state, and about something of interest. For example:
Which of two variations of the Sarsa algorithm (two different coding schemes) is better on Mountain Car?
Many times you may be asking a question that has been studied before. That's OK; it is useful to recreate known results, and this process often leads to better understanding and new insights.
The experiment
In order to answer your question you need to conduct an experiment. How difficult it is to establish your result (how careful you have to be) depends on your hypothesis. For example, the hypothesis above, about which variation of Sarsa is better, might require testing over a large range of \alpha, \lambda, and \epsilon values, averaging over many independent runs, and plotting standard errors and/or conducting a t-test. In other words, it is difficult to say one algorithm is better than another.
Doing a fair comparison is a topic unto itself. We will talk about it in class, but in general it is something that takes trial and error to get better at.
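To make the statistics mentioned above concrete, here is a minimal sketch of computing a mean with its standard error over independent runs, and a Welch t statistic for comparing two algorithm variations. The scores are invented for illustration, the function names are hypothetical, and the choice of Welch's unequal-variance form of the t-test is an assumption rather than something this guide prescribes.

```python
import math
import statistics


def mean_and_standard_error(samples):
    """Return (mean, standard error of the mean) for a list of per-run scores."""
    mean = statistics.fmean(samples)
    # Sample standard deviation (n - 1 denominator) divided by sqrt(n).
    sem = statistics.stdev(samples) / math.sqrt(len(samples))
    return mean, sem


def welch_t(a, b):
    """Return (t statistic, approximate degrees of freedom) for Welch's t-test."""
    var_a = statistics.variance(a) / len(a)
    var_b = statistics.variance(b) / len(b)
    t = (statistics.fmean(a) - statistics.fmean(b)) / math.sqrt(var_a + var_b)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (var_a + var_b) ** 2 / (
        var_a ** 2 / (len(a) - 1) + var_b ** 2 / (len(b) - 1)
    )
    return t, df


# Hypothetical per-run scores (one number per independent run), invented here.
variant_a = [120.0, 132.0, 125.0, 140.0, 128.0]
variant_b = [150.0, 145.0, 160.0, 155.0, 148.0]

mean_a, sem_a = mean_and_standard_error(variant_a)
mean_b, sem_b = mean_and_standard_error(variant_b)
t_stat, dof = welch_t(variant_a, variant_b)
print(f"A: {mean_a:.1f} +/- {sem_a:.1f}, B: {mean_b:.1f} +/- {sem_b:.1f}")
print(f"t = {t_stat:.2f} with about {dof:.1f} degrees of freedom")
```

The t statistic still has to be compared against a t distribution to obtain a p-value, and five runs per variation is far too few for a real comparison; the point is only to show what is being computed.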
The write-up
In your write-up you need to completely explain your experimental setup and algorithm implementation details so that others could nearly perfectly reproduce your results. Science is about reproducibility! One way to make this easier is to split things up. Clearly and separately explain the problem under study (e.g., Mountain Car), and the learning algorithms/agents you apply to your problem (e.g., Sarsa with tile coding).
You also want to describe how you conducted your experiment. You want to keep these three things (problem, algorithm, and experiment description) clearly separated in the reader's mind. The best way to do this is to use different tenses while writing:
- The problem and algorithms are described in present tense. They are not in the past or the future, they just are.
- For example, the Sarsa algorithm maintains (present tense) an approximation to the state-action value function, and the Mountain Car problem has (present tense) a 2-dimensional state space with 3 discrete actions: left, right, and coast.
- The experiment is something you did in the past, so we use past tense.
- For example, in my experiment, I applied (past tense) 10 instances of the Sarsa algorithm, each with a different value of \alpha, to the problem. Each of these algorithm instances was initialized (past tense) with \vec{w}=0 and then run (past tense) for 200 episodes. The whole procedure was repeated (past tense) 200 times (i.e., runs). The random seed was initialized (past tense) to the same value for each algorithm instance at the beginning of each batch of 200 runs. For each run, the cumulative reward during each episode was recorded (past tense) and averaged (past tense) over the 200 runs to produce the learning curves shown in Figure
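The procedure in the example above can be sketched as a loop. This is only an outline under stated assumptions: `run_episode` is a hypothetical placeholder that returns a made-up number instead of running Sarsa on Mountain Car, and the \alpha values, the seed, and the weight-vector length are invented for illustration.

```python
import random

NUM_RUNS = 200       # runs per algorithm instance, as in the example
NUM_EPISODES = 200   # episodes per run, as in the example
NUM_WEIGHTS = 8      # hypothetical length of the weight vector
BASE_SEED = 0        # hypothetical seed, shared by every algorithm instance
ALPHAS = [(i + 1) / 20 for i in range(10)]  # 10 hypothetical step sizes


def run_episode(weights, alpha, rng):
    """Hypothetical stand-in for one episode of Sarsa on Mountain Car.

    A real implementation would update `weights` in place using `alpha`
    and return the cumulative reward of the episode. This placeholder
    ignores both and returns a made-up number.
    """
    return -float(rng.randint(100, 200))


def learning_curve(alpha, num_runs=NUM_RUNS, num_episodes=NUM_EPISODES,
                   seed=BASE_SEED):
    """Average the per-episode cumulative reward over a batch of runs."""
    rng = random.Random(seed)  # same seed at the start of each batch of runs
    totals = [0.0] * num_episodes
    for _ in range(num_runs):
        weights = [0.0] * NUM_WEIGHTS  # w = 0 at the start of every run
        for episode in range(num_episodes):
            totals[episode] += run_episode(weights, alpha, rng)
    return [total / num_runs for total in totals]


# One learning curve per algorithm instance (one instance per alpha value).
curves = {alpha: learning_curve(alpha) for alpha in ALPHAS}
```

Because the placeholder ignores \alpha and every batch starts from the same seed, all ten curves come out identical here; with a real agent they would differ only through the algorithm's behavior, which is the point of sharing the seed.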
As you can see, describing your experiment can be very time consuming and boring, but it is essential to ensure others could reproduce your experiment: leaving out even the smallest detail can produce entirely different plots. Write simply and clearly.
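One way to avoid leaving out a detail is to keep every setting in a single record that is saved alongside the results, so that the write-up can be checked against it. The sketch below is one possible layout, not a required format: the class name, the field names, the seed, and the \alpha values are all hypothetical.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class ExperimentRecord:
    """Hypothetical container for the settings needed to rerun an experiment."""
    problem: str
    algorithm: str
    alphas: list
    num_runs: int
    num_episodes: int
    seed: int
    initial_weights: str


record = ExperimentRecord(
    problem="Mountain Car",
    algorithm="Sarsa with tile coding",
    alphas=[(i + 1) / 20 for i in range(10)],  # invented values
    num_runs=200,
    num_episodes=200,
    seed=0,  # invented value
    initial_weights="all zeros",
)

# Serialize with sorted keys so the saved record is stable across reruns.
record_json = json.dumps(asdict(record), indent=2, sort_keys=True)
print(record_json)
```

Keeping the problem, the algorithm, and the experiment settings as separate fields mirrors the separation recommended for the write-up itself.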
Results and Conclusions
Finally we can move to describing our results and forming conclusions. Again we want to keep things clear and separated. The results of the experiment are described in present tense. The results are very different from the conclusions. Do not embellish them or draw conclusions here; just describe the data you have. It is appropriate to note any interesting trends, but only explain them if you have a concrete understanding of why they occurred.
For example, the results in Figure X show that algorithm A finds a policy for getting out of the valley within the first 10 episodes, but later exhibits significant oscillation in steps to goal.
In our conclusions we interpret the results of our experiment. Here we typically revisit our hypothesis and indicate whether we have a clear answer to our original question. Be careful not to overclaim here. You can draw conclusions only about the result of your experiment, not about broader things you did not test. For example, you might say which algorithm variation appears to be better in Mountain Car given the parameter values you tested and the number of episodes you ran. You cannot say anything about other problems you did not test, or more generally about which algorithm is better overall. I typically say something like "Based on these results we conclude" to make it clear that I am drawing a conclusion and not describing a result.