News

We propose the first online actor-critic scheme with an adaptive basis to find a locally optimal control policy for a Markov Decision Process (MDP) under the weighted discounted cost objective. We ...
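As a rough illustration of the actor-critic idea, here is a generic one-step actor-critic on a toy MDP with a single state and two actions. It is not the adaptive-basis scheme described above. The critic is a single tabular value, the actor is a softmax over action preferences, and the costs, step sizes, and the function name `actor_critic_one_state` are hypothetical choices. Because the objective is a cost, the actor moves *against* the TD error rather than along it.

```python
import numpy as np


def actor_critic_one_state(n_steps=5000, gamma=0.9, alpha=0.1, beta=0.1, seed=0):
    """Generic one-step actor-critic on a hypothetical one-state, two-action MDP.

    Action 0 costs 1, action 1 costs 0, and the single state always transitions
    to itself. The goal is to MINIMIZE discounted cost (not the adaptive-basis method).
    """
    rng = np.random.default_rng(seed)
    costs = np.array([1.0, 0.0])  # hypothetical per-action costs
    theta = np.zeros(2)           # actor: softmax action preferences
    v = 0.0                       # critic: cost-to-go estimate of the single state
    for _ in range(n_steps):
        pi = np.exp(theta - theta.max())
        pi /= pi.sum()
        a = rng.choice(2, p=pi)
        # TD error of the cost-to-go critic; the next state equals the current one.
        delta = costs[a] + gamma * v - v
        v += beta * delta                      # critic: TD(0) update
        grad_log_pi = -pi                      # gradient of log softmax w.r.t. theta ...
        grad_log_pi[a] += 1.0                  # ... is e_a - pi
        theta -= alpha * delta * grad_log_pi   # actor: descend, since we minimize cost
    pi = np.exp(theta - theta.max())
    return pi / pi.sum(), v


pi, v = actor_critic_one_state()
print(pi, v)  # the policy should strongly prefer the zero-cost action 1
```

In this toy setting the TD error is never negative after the costly action and never positive after the free one, so the preference for action 1 can only grow. A real grid-world problem needs many states and function approximation for the critic. That is where an adaptive basis would come in.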
To execute the algorithms, run main.py with the corresponding arguments. The methods can be run on two grid-world environments, Cliff World (CW) and Frozen Lake (FL), which you can specify with the ...
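For readers unfamiliar with Cliff World, the sketch below models the classic cliff-walking grid (4 x 12, start at the bottom-left, goal at the bottom-right, cliff in between), phrased with costs instead of rewards to match the cost objective. This is an assumed layout. The repository's own environment, its class names, and its command-line arguments may differ. `CliffWorld`, `step_cost`, and `cliff_cost` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class CliffWorld:
    """Hypothetical cost-based Cliff World (classic 4x12 cliff-walking layout)."""

    n_rows: int = 4
    n_cols: int = 12
    step_cost: float = 1.0     # assumed cost of an ordinary move
    cliff_cost: float = 100.0  # assumed cost of falling off the cliff

    # Actions: 0 = up, 1 = right, 2 = down, 3 = left (class constant, not a field)
    MOVES = ((-1, 0), (0, 1), (1, 0), (0, -1))

    def __post_init__(self):
        self.start = (self.n_rows - 1, 0)
        self.goal = (self.n_rows - 1, self.n_cols - 1)
        self.state = self.start

    def reset(self):
        self.state = self.start
        return self.state

    def step(self, action):
        """Apply an action and return (next_state, cost, done)."""
        dr, dc = self.MOVES[action]
        # Moves into the outer walls leave the agent where it is.
        r = min(max(self.state[0] + dr, 0), self.n_rows - 1)
        c = min(max(self.state[1] + dc, 0), self.n_cols - 1)
        if r == self.n_rows - 1 and 0 < c < self.n_cols - 1:
            # Stepping into the cliff: pay a large cost and restart at the start cell.
            self.state = self.start
            return self.state, self.cliff_cost, False
        self.state = (r, c)
        return self.state, self.step_cost, self.state == self.goal


env = CliffWorld()
env.reset()
print(env.step(1))  # moving right from the start falls into the cliff
```

The safe path goes up one row, right along the row above the cliff, and down into the goal. The short path along the cliff edge risks the large cliff cost.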