I'm pretty new to RL, and I was wondering if I could get some insight on how an idea of mine could work for Pokemon battling. For those unfamiliar with Pokemon battling, it's an imperfect-information game with a very large state space and a lot of hidden information regarding the opponent's moves, their stats, etc. For generality, let's say each player has a team of 6 Pokemon (we'll call this 6v6). The action space is the 4 attacks plus 5 potential switches, for a total of 9 moves.
My main question is: can you train an agent by solving "subgames," where first it becomes a 1v1 pro (each player has only 1 Pokemon), and then becomes a pro at 2v2 by learning a policy that reduces the 2v2 game into a favorable 1v1 scenario, eventually scaling up to 6v6? So rather than starting from the (daunting) 6v6 problem, we explicitly start with the much simpler endgame scenarios and frame the problem as learning policies that steer the game into states we have already solved. Is there an RL algorithm that couples naturally with this? The motivation is that the agent would hopefully learn concepts like win conditions and reducing games to favorable endgame scenarios, which are the larger macro concepts that separate top human players from people simply clicking the super effective move.
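To make that concrete, here is a minimal sketch of what the curriculum could look like as a training loop. Everything here is an assumption: `make_env(team_size)` is a hypothetical factory returning a gym-style battle environment (e.g. something built on a simulator interface like poke-env), and `agent` is any DQN-style agent exposing `act` / `remember` / `learn`; none of these names come from an existing API.

```python
def train_curriculum(agent, make_env, stages=(1, 2, 3, 4, 5, 6),
                     episodes_per_stage=50_000):
    """Train the same agent on progressively larger team sizes (1v1 -> 6v6)."""
    for team_size in stages:
        env = make_env(team_size)  # hypothetical env factory, one per subgame
        for _ in range(episodes_per_stage):
            state, done = env.reset(), False
            while not done:
                action = agent.act(state)                    # epsilon-greedy over the 9 moves
                next_state, reward, done, _ = env.step(action)
                agent.remember(state, action, reward, next_state, done)
                agent.learn()                                # one DQN gradient step
                state = next_state
```

The key design choice is that the same agent (and replay setup) carries over between stages, so whatever it learned in the smaller endgames is the starting point for the next stage.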
For example, how would you modify Deep Q-learning to adopt this learning paradigm? Is there some way you could "freeze" the weights of the network that are good at approximating Q-values for the 1v1 states, then train and freeze another set of weights that approximate Q-values for 2v2 states, and so on across these rounds of subgame training? Would that make any sense to do?
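Here is a rough PyTorch illustration of that freezing idea, assuming you split the Q-network into a trunk trained on the earlier subgame and extra layers for later stages; the layer names, sizes, and split are purely illustrative, not a tested architecture.

```python
import torch
import torch.nn as nn

class CurriculumQNet(nn.Module):
    def __init__(self, state_dim, n_actions=9):
        super().__init__()
        # Trunk intended to be learned during the 1v1 stage.
        self.trunk_1v1 = nn.Sequential(nn.Linear(state_dim, 256), nn.ReLU())
        # Extra capacity intended for later stages (2v2, ..., 6v6).
        self.trunk_later = nn.Sequential(nn.Linear(256, 256), nn.ReLU())
        self.head = nn.Linear(256, n_actions)

    def forward(self, x):
        return self.head(self.trunk_later(self.trunk_1v1(x)))

def freeze_stage(module):
    """Stop gradient updates for weights trained in an earlier subgame."""
    for p in module.parameters():
        p.requires_grad = False

# After the 1v1 stage converges, freeze its trunk and keep training
# only the remaining layers on 2v2 states.
net = CurriculumQNet(state_dim=128)          # state_dim is an assumption
freeze_stage(net.trunk_1v1)
optimizer = torch.optim.Adam(
    (p for p in net.parameters() if p.requires_grad), lr=1e-4)
```

Whether hard freezing is actually better than simply fine-tuning the whole network on the larger subgame is an open question; the sketch just shows the mechanics.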