Actor-Critic model taking the same action for all inputs

Hello everyone,

I am working on an Actor-Critic model that makes decisions from time series data. I tried different model architectures (LSTM, Transformer, and even Mamba) to see whether the results would differ, but nothing changed. Basically, I pass time series data to my model and it picks one of 6 actions at each time step.

For the input data I tried two approaches. With full context length, the input has shape (seq_len, hidden_dim); the model takes all time steps and outputs an action for each time step. With fixed context length, the input has shape (batch_size, seq_len, hidden_dim); in this case the model outputs one action per batch item rather than per time step.

I also implemented epsilon-greedy exploration, and at the end of each epoch I print the percentage of times each action was selected so I can check the model's output.
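For reference, here is a minimal sketch of this kind of action selection and per-action reporting. It assumes epsilon-greedy is layered on top of categorical sampling from the actor's softmax output. The function names, the logits, and the epsilon value are hypothetical and are not taken from the actual training code.

```python
import math
import random
from collections import Counter

NUM_ACTIONS = 6  # the post states there are 6 possible actions


def softmax(logits):
    """Convert raw actor logits into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def select_action(logits, epsilon, rng):
    """Epsilon-greedy on top of categorical sampling (assumed combination)."""
    if rng.random() < epsilon:
        return rng.randrange(len(logits))  # explore: uniform random action
    probs = softmax(logits)
    # otherwise sample an action from the actor's distribution
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]


def action_percentages(actions, num_actions=NUM_ACTIONS):
    """Percentage of steps on which each action was selected."""
    counts = Counter(actions)
    return [100.0 * counts.get(a, 0) / len(actions) for a in range(num_actions)]


rng = random.Random(0)
logits = [0.2, 0.1, 0.0, -0.1, 0.3, 0.05]  # hypothetical actor output
actions = [select_action(logits, epsilon=0.1, rng=rng) for _ in range(1000)]
print([round(p, 1) for p in action_percentages(actions)])
```

Printing the same percentages separately for training and evaluation makes it easier to see how much of the variety comes from exploration and sampling rather than from the policy itself.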

This is where my problem starts. I anneal epsilon over epochs, so exploration decreases as training progresses. When I check the critic loss over time, it drops significantly (the MSE loss for the critic starts around 0.9 and goes down to 0.002, for reward values between -1 and 1), so I think the critic is learning.
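As an illustration of epsilon annealing over epochs, here is a minimal sketch that assumes a linear schedule. The post does not say which schedule is used, so the schedule itself, the function name, and the start, end, and duration values are all hypothetical.

```python
def annealed_epsilon(epoch, eps_start=1.0, eps_end=0.05, decay_epochs=50):
    """Linearly decay epsilon from eps_start to eps_end over decay_epochs.

    After decay_epochs the value stays at eps_end. All defaults are
    placeholder values, not the ones used in the actual experiment.
    """
    frac = min(epoch / decay_epochs, 1.0)  # fraction of the decay completed
    return eps_start + frac * (eps_end - eps_start)


for epoch in (0, 25, 50, 100):
    print(epoch, round(annealed_epsilon(epoch), 3))
```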

At the end of each epoch I also run an evaluation to see how things stand, and I check the selected-action percentages there as well. The problem is that my model selects the same action no matter what the input is. During training I use categorical sampling with epsilon-greedy, so it selects different actions, but when I use argmax at the evaluation step it always selects the same action.
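One situation that produces exactly this symptom is sketched below with made-up logits, not output from the actual model. If the actor's distribution is almost uniform for every input but one action has a small, constant edge, sampling yields varied actions while argmax always returns the same one. Whether this is what happens here is an assumption. Comparing the two action histograms and the mean policy entropy on real actor outputs would be one way to check.

```python
import math
import random
from collections import Counter


def softmax(logits):
    """Convert logits into probabilities (max subtracted for stability)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]


def entropy(probs):
    """Shannon entropy in nats; the maximum for 6 actions is log(6)."""
    return -sum(p * math.log(p) for p in probs if p > 0)


rng = random.Random(0)

# Hypothetical actor outputs for 500 different inputs: the logits vary
# slightly with the input, but action 4 always has a small constant edge.
batch_logits = []
for _ in range(500):
    logits = [rng.gauss(0.0, 0.01) for _ in range(6)]
    logits[4] += 0.1
    batch_logits.append(logits)

# Training-style selection: sample from the categorical distribution.
sampled = [rng.choices(range(6), weights=softmax(l), k=1)[0] for l in batch_logits]
# Evaluation-style selection: take the argmax of the logits.
greedy = [max(range(6), key=lambda a, l=l: l[a]) for l in batch_logits]

mean_entropy = sum(entropy(softmax(l)) for l in batch_logits) / len(batch_logits)

print("sampled counts:", sorted(Counter(sampled).items()))
print("greedy counts: ", sorted(Counter(greedy).items()))
print("mean entropy:", round(mean_entropy, 3), "max possible:", round(math.log(6), 3))
```

A mean entropy close to log(6) together with a single-action argmax histogram would suggest the actor output barely depends on the input, even though the critic loss is falling.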

What I tried: different model architectures with various parameter sizes; learning rates ranging from very high to very low; different initial epsilon values; and different input types (fixed length and full context).

None of these made any difference. I implemented my actor-critic model from this example and double-checked it for mistakes: https://towardsdatascience.com/understanding-actor-critic-methods-931b97b6df3f

After all this, I still couldn't find a solution. Does anyone have any ideas?

Author

BagComprehensive79

Subreddit

r/reinforcementlearning

Post Details

Posted: 3 months ago
External URL: reddit.com/r/reinforceme...