Abstract

In this paper, the authors describe how a distributed reinforcement learning problem, in which the returns of many agents simultaneously update a single shared policy, is addressed by applying novel reinforcement learning techniques. A traffic simulator is used in the learning process. Two new algorithms are introduced: a value function-based algorithm and one that uses a direct policy evaluation approach. Both algorithms are shown to perform comparably well.
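The shared-policy setting the abstract describes can be illustrated with a minimal sketch: several agents act in parallel, and every agent's experience updates one common value table. The toy environment, hyperparameters, and tabular Q-learning rule below are illustrative assumptions, not the paper's actual algorithms.

```python
import random

# Hypothetical sketch of distributed RL with a single shared policy:
# several agents act concurrently and all of their updates are applied
# to one shared Q-table. The two-state, two-action environment is a
# made-up example, not taken from the paper.

ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1
N_STATES, N_ACTIONS = 2, 2

def step(state, action):
    """Toy dynamics: action 1 yields reward 1; the action chooses the next state."""
    reward = 1.0 if action == 1 else 0.0
    next_state = action
    return next_state, reward

def run_shared_q_learning(n_agents=4, n_steps=500, seed=0):
    rng = random.Random(seed)
    q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]  # the single shared policy
    states = [0] * n_agents
    for _ in range(n_steps):
        for i in range(n_agents):  # every agent updates the same table
            s = states[i]
            if rng.random() < EPSILON:           # epsilon-greedy exploration
                a = rng.randrange(N_ACTIONS)
            else:
                a = max(range(N_ACTIONS), key=lambda x: q[s][x])
            s2, r = step(s, a)
            # Standard tabular Q-learning update applied to the shared table
            q[s][a] += ALPHA * (r + GAMMA * max(q[s2]) - q[s][a])
            states[i] = s2
    return q

q = run_shared_q_learning()
```

After training, the shared table prefers the rewarding action in every state, showing that the agents' pooled experience shaped one common policy rather than per-agent policies.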


Original document

The different versions of the original document can be found in:

http://www.sci.brooklyn.cuny.edu/~parsons/courses/790-spring-2004/notes/pendrith.pdf
http://portal.acm.org/citation.cfm?doid=336595.337554
https://dl.acm.org/citation.cfm?id=336595.337554
https://doi.org/10.1145/336595.337554
https://trid.trb.org/view/715027
https://core.ac.uk/display/101466319
https://academic.microsoft.com/#/detail/2023790196
http://dx.doi.org/10.1145/336595.337554

Document information

Published on 01/01/2000

Volume 2000, 2000
DOI: 10.1145/336595.337554
Licence: CC BY-NC-SA

