Let's say we have a dataset of Overwatch games for a single player. The data includes metrics like elims, deaths, number of character swaps, etc., with a binary target column indicating whether they won the game.
For this scenario, we are only interested in deaths and in making a recommendation based on the model. Let's say that after training the model, we find that the average SHAP value for deaths is 0.15, which ranks 4th among all the metrics.
My first question is: can we say that this is the 4th most "important" feature as it relates to whether this player will win or lose the game, even if this isn't 100% known or totally comprehensive?
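To make the ranking question concrete, here is a minimal sketch of how such a ranking is commonly computed, assuming the "average SHAP value" is the mean of the absolute per-game SHAP values (a plain signed mean can cancel toward zero). The matrix `shap_rows`, the feature names, and the helper functions are hypothetical and the numbers are made up for illustration; they do not come from a real model.

```python
# Hypothetical per-game SHAP values (rows = games, columns = features).
# The numbers are invented for illustration only.
feature_names = ["elims", "deaths", "swaps"]
shap_rows = [
    [0.30, -0.20, 0.05],
    [-0.25, 0.10, -0.02],
    [0.35, -0.15, 0.01],
    [-0.30, 0.15, -0.04],
]


def mean_abs_shap(rows, names):
    """Return {feature: mean of |SHAP|} across all rows."""
    n = len(rows)
    return {name: sum(abs(r[j]) for r in rows) / n for j, name in enumerate(names)}


def mean_signed_shap(rows, names):
    """Return {feature: plain mean of SHAP}; positives and negatives can cancel."""
    n = len(rows)
    return {name: sum(r[j] for r in rows) / n for j, name in enumerate(names)}


def rank_features(rows, names):
    """Feature names sorted from largest to smallest mean |SHAP|."""
    scores = mean_abs_shap(rows, names)
    return sorted(names, key=lambda name: scores[name], reverse=True)


ranking = rank_features(shap_rows, feature_names)
print(ranking)
print(mean_abs_shap(shap_rows, feature_names)["deaths"])
print(mean_signed_shap(shap_rows, feature_names)["deaths"])
```

In this toy data, deaths has a sizeable mean absolute SHAP value but a signed mean close to zero, so the rank reflects how much the feature moves this model's predictions, not which direction it moves them.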
Regardless, does this SHAP value relate at all to the values within the feature itself? For example, we intuitively know that high deaths is a bad thing in Overwatch, but low deaths could also mean that the player is being far too conservative and not helping their team, which actually contributes to them losing.
My last question is: is there any way, given a SHAP value for a feature, to know whether that feature being big is a good or bad thing?
I understand that there are manual, domain-specific ways to go about this. But is there a way that's "just good enough, even if not totally comprehensive" to figure out if a metric being big is a good thing when trying to predict a win or loss?
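One rough, "good enough" check I can imagine is to compare each game's feature value against that game's SHAP value for the same feature: the sign of their correlation hints at whether bigger is better, and averaging SHAP values within bins of the feature can reveal a non-monotonic pattern like the low-deaths case above. The sketch below assumes per-game SHAP values are already available; the `deaths` and `deaths_shap` lists, the bin edges, and both helpers are hypothetical, with invented numbers.

```python
# Hypothetical deaths per game and the matching SHAP value for deaths.
# The numbers are invented for illustration only.
deaths = [2, 3, 6, 7, 8, 12, 14, 15]
deaths_shap = [-0.10, -0.05, 0.12, 0.15, 0.10, -0.20, -0.25, -0.30]


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 if either input has no variance."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / (vx * vy) ** 0.5


def mean_shap_by_bin(values, shap_values, edges):
    """Average SHAP within each [edges[i], edges[i+1]) bin; None if a bin is empty."""
    n_bins = len(edges) - 1
    sums, counts = [0.0] * n_bins, [0] * n_bins
    for v, s in zip(values, shap_values):
        for i in range(n_bins):
            if edges[i] <= v < edges[i + 1]:
                sums[i] += s
                counts[i] += 1
                break
    return [sums[i] / counts[i] if counts[i] else None for i in range(n_bins)]


# Negative correlation suggests "more deaths pushes the prediction toward a loss".
direction = pearson(deaths, deaths_shap)
# Per-bin means show whether that story holds across the whole range.
by_bin = mean_shap_by_bin(deaths, deaths_shap, edges=[0, 5, 10, 20])
print(direction, by_bin)
```

In this made-up example the overall correlation is negative, yet the lowest bin also has a negative average SHAP value, which is exactly the kind of "too conservative" pattern a single sign would hide. Either way, this only describes what the model learned, not necessarily what causes wins.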