Hello everyone
I am currently spending some time writing a DDPG algorithm for continuous control. I run it on the Lunar Lander environment with an actor learning rate of 1e-4 and a critic learning rate of 1e-3. It learns how to land, but at some point (around 40 consecutive successful episodes) it unfortunately starts to forget. I know this is in the nature of function approximation; we lose the guarantees of GPI once function approximation is involved. Is there any way to mitigate this forgetting? I use the Adam optimizer for training and Gaussian noise to prompt exploration. Should I decrease the noise std or the learning rates of the networks over time?
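For reference, here is a minimal sketch of the kind of decay I have in mind (PyTorch; the network sizes and decay schedule values are just illustrative placeholders, not my actual code):

```python
import numpy as np
import torch

class GaussianNoise:
    """Zero-mean Gaussian action noise whose std is annealed over episodes."""
    def __init__(self, action_dim, std_start=0.2, std_end=0.05, decay_episodes=500):
        self.action_dim = action_dim
        self.std_start = std_start
        self.std_end = std_end
        self.decay_episodes = decay_episodes
        self.episode = 0

    def std(self):
        # Linear anneal from std_start to std_end over decay_episodes episodes.
        frac = min(self.episode / self.decay_episodes, 1.0)
        return self.std_start + frac * (self.std_end - self.std_start)

    def sample(self):
        return np.random.normal(0.0, self.std(), size=self.action_dim)

    def step_episode(self):
        self.episode += 1


# Placeholder actor/critic for LunarLanderContinuous (obs dim 8, action dim 2).
actor = torch.nn.Sequential(
    torch.nn.Linear(8, 64), torch.nn.ReLU(), torch.nn.Linear(64, 2), torch.nn.Tanh()
)
critic = torch.nn.Sequential(
    torch.nn.Linear(8 + 2, 64), torch.nn.ReLU(), torch.nn.Linear(64, 1)
)

# Learning rates as described above: actor 1e-4, critic 1e-3, both with Adam.
actor_opt = torch.optim.Adam(actor.parameters(), lr=1e-4)
critic_opt = torch.optim.Adam(critic.parameters(), lr=1e-3)

# Optional learning-rate decay, e.g. halve both every 200 episodes.
actor_sched = torch.optim.lr_scheduler.StepLR(actor_opt, step_size=200, gamma=0.5)
critic_sched = torch.optim.lr_scheduler.StepLR(critic_opt, step_size=200, gamma=0.5)

noise = GaussianNoise(action_dim=2)

# Called once at the end of each episode:
#   noise.step_episode()
#   actor_sched.step()
#   critic_sched.step()
```

Would annealing like this (noise std and/or learning rates) be enough to stop the late-training collapse, or is there something else people usually do?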
Thank you in advance