We propose the first online actor-critic scheme with adaptive basis to find a local optimal control policy for a Markov Decision Process (MDP) under the weighted discounted cost objective. We ...
To execute the algorithms, run main.py with the corresponding arguments. The methods can be run on two grid-world environments, Cliff World (CW) and Frozen Lake (FL), which you can specify with the ...