I'm pretty new to RL, and I was wondering if I could get some insight on how an idea of mine could work for Pokemon battling. For those unfamiliar with Pokemon battling, it's an imperfect-information game with a very large state space and a lot of hidden information regarding the opponent's moves, their stats, etc. For generality, let's say each player has a team of 6 Pokemon (we'll call this 6v6). The action space is the 4 attacks plus 5 potential switches, for a total of 9 moves.
My main question is: can you train an agent by solving "subgames," where first it becomes a 1v1 pro (each player has only 1 Pokemon), and then becomes a pro at 2v2 by learning a policy that reduces the 2v2 game into a favorable 1v1 scenario, eventually scaling up to 6v6? So rather than starting from the (daunting) 6v6 problem, we explicitly start with the much simpler endgame scenarios and frame the problem as learning policies that steer the game into states we have already solved. Is there an RL algorithm that couples naturally with this? The motivation is that the agent would hopefully learn concepts like win conditions and reducing games to favorable endgame scenarios, which are the larger macro concepts that separate top human players from people simply clicking the super effective move.
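To make that concrete, here is a minimal sketch of what the curriculum could look like as a training loop. Everything here is an assumption: `make_env(team_size)` is a hypothetical factory returning a gym-style battle environment (e.g. something built on a simulator interface like poke-env), and `agent` is any DQN-style agent exposing `act` / `remember` / `learn`; none of these names come from an existing API.

```python
def train_curriculum(agent, make_env, stages=(1, 2, 3, 4, 5, 6),
                     episodes_per_stage=50_000):
    """Train the same agent on progressively larger team sizes (1v1 -> 6v6)."""
    for team_size in stages:
        env = make_env(team_size)  # hypothetical env factory, one per subgame
        for _ in range(episodes_per_stage):
            state, done = env.reset(), False
            while not done:
                action = agent.act(state)                    # epsilon-greedy over the 9 moves
                next_state, reward, done, _ = env.step(action)
                agent.remember(state, action, reward, next_state, done)
                agent.learn()                                # one DQN gradient step
                state = next_state
```

The key design choice is that the same agent (and replay setup) carries over between stages, so whatever it learned in the smaller endgames is the starting point for the next stage.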
For example, how would you modify Deep Q-learning to adopt this learning paradigm? Is there some way you could "freeze" the weights of the network that are good at approximating Q-values for the 1v1 states, then train and freeze another set of weights that approximate Q-values for 2v2 states, and so on across these rounds of subgame training? Would that make any sense to do?
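Here is a rough PyTorch illustration of that freezing idea, assuming you split the Q-network into a trunk trained on the earlier subgame and extra layers for later stages; the layer names, sizes, and split are purely illustrative, not a tested architecture.

```python
import torch
import torch.nn as nn

class CurriculumQNet(nn.Module):
    def __init__(self, state_dim, n_actions=9):
        super().__init__()
        # Trunk intended to be learned during the 1v1 stage.
        self.trunk_1v1 = nn.Sequential(nn.Linear(state_dim, 256), nn.ReLU())
        # Extra capacity intended for later stages (2v2, ..., 6v6).
        self.trunk_later = nn.Sequential(nn.Linear(256, 256), nn.ReLU())
        self.head = nn.Linear(256, n_actions)

    def forward(self, x):
        return self.head(self.trunk_later(self.trunk_1v1(x)))

def freeze_stage(module):
    """Stop gradient updates for weights trained in an earlier subgame."""
    for p in module.parameters():
        p.requires_grad = False

# After the 1v1 stage converges, freeze its trunk and keep training
# only the remaining layers on 2v2 states.
net = CurriculumQNet(state_dim=128)          # state_dim is an assumption
freeze_stage(net.trunk_1v1)
optimizer = torch.optim.Adam(
    (p for p in net.parameters() if p.requires_grad), lr=1e-4)
```

Whether hard freezing is actually better than simply fine-tuning the whole network on the larger subgame is an open question; the sketch just shows the mechanics.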