This paper addresses a distributed reinforcement learning problem in which the returns of many agents simultaneously update a single shared policy. The authors apply novel reinforcement learning techniques to this problem and use a traffic simulator in the learning process. Two new algorithms are introduced: one based on a value function and one that uses direct policy evaluation. Both algorithms are shown to perform comparably well.
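To make the shared-policy setting concrete, the following is a minimal sketch and not either of the paper's two algorithms. It assumes plain tabular Q-learning, a hypothetical five-state chain in place of the traffic simulator, and a uniformly random behavior policy; the names `step`, `greedy`, and `train` and all constants are illustrative. The only point it shows is that several agents, each with its own state, write their experience into one shared table.

```python
import random

N_STATES = 5          # hypothetical toy chain: states 0..4, state 4 is the goal
ACTIONS = (-1, +1)    # move left or move right
ALPHA, GAMMA = 0.5, 0.9


def step(state, action):
    """Assumed toy dynamics: reward 1.0 on reaching the goal, otherwise 0.0."""
    nxt = min(max(state + action, 0), N_STATES - 1)
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done


def greedy(q, state):
    """Index of the highest-valued action in the shared table."""
    return max(range(len(ACTIONS)), key=lambda a: q[state][a])


def train(n_agents=4, n_rounds=500, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]  # one table shared by all agents
    states = [0] * n_agents                    # each agent keeps its own position
    for _ in range(n_rounds):
        for i in range(n_agents):
            s = states[i]
            a = rng.randrange(len(ACTIONS))    # random exploration (off-policy)
            s2, r, done = step(s, ACTIONS[a])
            target = r if done else r + GAMMA * max(q[s2])
            q[s][a] += ALPHA * (target - q[s][a])  # every agent updates the same table
            states[i] = 0 if done else s2          # restart this agent at the goal
    return q


if __name__ == "__main__":
    shared_q = train()
    # Greedy action read from the shared table for each non-goal state.
    print([ACTIONS[greedy(shared_q, s)] for s in range(N_STATES - 1)])
```

Because Q-learning is off-policy, the random behavior policy is enough here for the shared table to rank actions; a real multi-agent traffic setting would need a more careful exploration and credit-assignment scheme than this sketch provides.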