Critic Network is failing to predict rewards

https://preview.redd.it/y7o4wybc5ezd1.png?width=2058&format=png&auto=webp&s=66421482f30fdad610078cf6e12499ced4e87810

I am training an actor-critic model, but it is not learning the task effectively. I noticed that the critic loss is not decreasing during training, so I plotted the true rewards against the critic outputs to check the critic network's performance. As you can see in the plot, it is not learning anything at all. I tried a vanilla LSTM and also a custom LSTM block with a residual connection and a feed-forward network, but both behave the same way.
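A minimal sketch of one way to put a number on this kind of comparison, in plain Python. It assumes the critic is meant to estimate discounted returns rather than per-step rewards (the post does not say which), and the helper names `discounted_returns` and `explained_variance` and the `gamma` value are hypothetical, not taken from the actual training code.

```python
def discounted_returns(rewards, gamma=0.99):
    """Compute G_t = r_t + gamma * G_{t+1} for a single episode."""
    returns = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each step can reuse the return of the next step.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def explained_variance(targets, predictions):
    """Return 1 - Var(targets - predictions) / Var(targets).

    Close to 1: predictions track the targets.
    Close to 0: no better than predicting a constant.
    Negative: worse than predicting a constant.
    """
    n = len(targets)
    mean_t = sum(targets) / n
    var_t = sum((y - mean_t) ** 2 for y in targets) / n
    if var_t == 0.0:
        return float("nan")  # undefined when the targets never vary
    residuals = [y - p for y, p in zip(targets, predictions)]
    mean_r = sum(residuals) / n
    var_r = sum((r - mean_r) ** 2 for r in residuals) / n
    return 1.0 - var_r / var_t


# Hypothetical episode: the only reward arrives at the last step.
episode_rewards = [0.0, 0.0, 1.0]
targets = discounted_returns(episode_rewards, gamma=0.5)

# Stand-in for critic outputs that ignore the state entirely.
flat_critic = [0.5, 0.5, 0.5]
score = explained_variance(targets, flat_critic)
```

If the plot compares critic outputs against raw per-step rewards, a score computed against discounted returns may tell a different story, so it could be worth checking both before concluding that the critic learns nothing.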

I am using shared layers for both the actor and critic heads, and a single optimizer to train them. What could be the problem here?

Author

BagComprehensive79

Subreddit

r/reinforcementlearning

Posted: 2 months ago