As the title says, I'm working on building a logistic regression model. I've set up a function to flag outlier observations based on Tukey's fences, as a couple of my predictor variables are heavily skewed. I'm doing a 70/30 train/test split. Of 299 observations, 107 lie outside Tukey's fences.
What I'm wondering is this: would it violate any conventions to split the outliers 70/30 as well, i.e. to stratify the split on the outlier flag? My training set would consist of 70% of the "typical" observations and 70% of the outliers, and my testing set would be the other 30% of each. The idea is to give the model adequate exposure to extreme cases during training while still being able to test on those extreme cases. Thanks in advance!
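To make the question concrete, here is a minimal sketch of the kind of split I mean, using only the Python standard library. The function names, the `k=1.5` fence multiplier, the "inclusive" quantile method, and the seed are all assumptions for illustration, not my actual code, and the flags in the usage lines are placeholders that only reproduce the counts (107 flagged out of 299).

```python
import random
import statistics


def tukey_outlier_flags(values, k=1.5):
    """Return True for each value outside [Q1 - k*IQR, Q3 + k*IQR].

    k=1.5 is the conventional multiplier; the quantile method is an assumption.
    """
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v < lower or v > upper for v in values]


def stratified_split(flags, train_frac=0.7, seed=0):
    """Split row indices so outliers and typical rows are each divided
    train_frac / (1 - train_frac) between the training and testing sets."""
    rng = random.Random(seed)
    train, test = [], []
    for stratum in (True, False):
        # Collect the row indices belonging to this stratum and shuffle them.
        idx = [i for i, f in enumerate(flags) if f == stratum]
        rng.shuffle(idx)
        cut = round(train_frac * len(idx))
        train.extend(idx[:cut])
        test.extend(idx[cut:])
    return sorted(train), sorted(test)


# Placeholder flags matching the counts in the post: 107 outliers, 192 typical.
flags = [True] * 107 + [False] * 192
train_idx, test_idx = stratified_split(flags)
```

With those counts, the training set gets 75 outliers and 134 typical observations (209 rows), and the testing set gets 32 outliers and 58 typical observations (90 rows). With several skewed predictors, a row would presumably be flagged if any one of them falls outside its fences, and that combined flag would be passed to `stratified_split`.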
Posted 3 years ago on reddit.com/r/AskStatisti...