Successive Over-Relaxation Q-Learning

Kamanchi, C; Diddigi, RB; Bhatnagar, S

Kamanchi, C (corresponding author), Indian Inst Sci, Dept Comp Sci & Automat, Bengaluru 560012, India.

IEEE CONTROL SYSTEMS LETTERS, 2020; 4 (1): 55

Abstract

In a discounted reward Markov decision process (MDP), the objective is to find the optimal value function, i.e., the value function corresponding to a......


