Abstract
We extend trust region policy optimization (TRPO) to cooperative multiagent reinforcement learning (MARL) for partially observable Markov games (POMGs)...