Abstract

We present a new Q-function operator for temporal difference (TD) learning methods that explicitly encodes robustness against significant rare events (SREs) in critical domains. The operator, which we call the $\kappa$-operator, allows a robust policy to be learned in a model-based fashion [...]