Distributed reinforcement learning for a traffic engineering application

Latest revision as of 17:41, 3 February 2021

Abstract

In this paper, the authors describe how a distributed reinforcement learning problem, in which the returns of many agents are simultaneously updating a single shared policy, is addressed by applying novel reinforcement learning techniques. A traffic simulator is used in the learning process. Two new algorithms are introduced: a value function-based algorithm and one that uses a direct policy evaluation approach. Both algorithms are shown to perform comparably well.
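The setting described above, in which many agents feed their experience into one shared policy, can be illustrated with a minimal sketch. This is not either of the paper's two algorithms and it does not use the traffic simulator: it assumes plain tabular Q-learning on a hypothetical five-state chain, and every name in it (`step`, `greedy`, `train_shared`, the constants) is made up for illustration. The only point is that several agents, each at its own position, write their updates into a single shared value table from which they all act.

```python
import random

# Hypothetical toy problem (an assumption, not the paper's traffic simulator):
# a chain of states 0..4 where reaching the right end pays a reward of 1.
N_STATES = 5
ACTIONS = (0, 1)          # 0 = move left, 1 = move right
GOAL = N_STATES - 1


def step(state, action):
    """Deterministic chain dynamics; the episode ends at the right end."""
    nxt = max(0, state - 1) if action == 0 else min(GOAL, state + 1)
    reward = 1.0 if nxt == GOAL else 0.0
    return nxt, reward, nxt == GOAL


def greedy(q, state, rng):
    """Greedy action under the shared table, breaking ties at random."""
    best = max(q[state])
    return rng.choice([a for a in ACTIONS if q[state][a] == best])


def train_shared(n_agents=4, rounds=2000, alpha=0.1, gamma=0.9,
                 epsilon=0.2, seed=0):
    """Standard Q-learning in which all agents update one shared table."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]   # the single shared table
    states = [0] * n_agents                      # each agent's own position
    for _ in range(rounds):
        for i in range(n_agents):
            s = states[i]
            # Epsilon-greedy action selection from the shared table.
            if rng.random() < epsilon:
                a = rng.choice(ACTIONS)
            else:
                a = greedy(q, s, rng)
            s2, r, done = step(s, a)
            # One-step Q-learning target; every agent writes into the same q.
            target = r if done else r + gamma * max(q[s2])
            q[s][a] += alpha * (target - q[s][a])
            states[i] = 0 if done else s2        # restart the agent at the left end
    return q


if __name__ == "__main__":
    shared_q = train_shared()
    for s in range(GOAL):
        print(s, "right" if shared_q[s][1] > shared_q[s][0] else "left")
```

Because the table is shared, experience gathered by one agent immediately changes the action choices of all the others, which is the property the abstract highlights. In this sketch the agents take turns, so updates never collide; a truly simultaneous setting would need some rule for combining concurrent updates, which is outside the scope of this illustration.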

Original document

The different versions of the original document can be found in:

http://dx.doi.org/10.1145/336595.337554

http://www.sci.brooklyn.cuny.edu/~parsons/courses/790-spring-2004/notes/pendrith.pdf

https://dblp.uni-trier.de/db/conf/agents/agents2000.html#Pendrith00

http://portal.acm.org/citation.cfm?doid=336595.337554

https://dl.acm.org/citation.cfm?id=336595.337554

https://doi.org/10.1145/336595.337554

https://trid.trb.org/view/715027

https://core.ac.uk/display/101466319

https://academic.microsoft.com/#/detail/2023790196

http://dl.acm.org/ft_gateway.cfm?id=337554&ftid=7539&dwn=1
