Dealing with outliers in logistic regression

Post Body

As the title says, I'm building a logistic regression model. A couple of my predictor variables are heavily skewed, so I've set up a function that flags outlier observations using Tukey's fences. I'm doing a 70/30 train/test split. Of 299 observations, 107 lie outside Tukey's fences.
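For anyone reading along, here is a minimal sketch of what a Tukey's-fences flagging function might look like. This is not my actual code: the function names, the conventional multiplier `k=1.5`, the "inclusive" quartile method, and the rule that a row counts as an outlier if any listed predictor falls outside its fences are all assumptions made for illustration.

```python
import statistics

def tukey_outlier_flags(values, k=1.5):
    """Return a list of booleans, True where a value lies outside Tukey's fences.

    Fences are Q1 - k*IQR and Q3 + k*IQR. k=1.5 is the conventional choice
    and is an assumption here, as is the "inclusive" quartile method.
    """
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v < lower or v > upper for v in values]

def flag_rows(rows, columns, k=1.5):
    """Flag a row if ANY of the listed columns is outside that column's fences.

    `rows` is a list of dicts; `columns` names the predictors to check.
    Both are hypothetical shapes chosen for this sketch.
    """
    per_column = [tukey_outlier_flags([row[c] for row in rows], k) for c in columns]
    return [any(flags) for flags in zip(*per_column)]

# Hypothetical toy data: one extreme value in a small sample.
sample = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
flags = tukey_outlier_flags(sample)
```

Different quartile conventions give slightly different fences, so the number of flagged observations can shift a little depending on the method used.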

What I'm wondering is this: do I violate any conventions by also splitting the outliers 70/30, i.e. stratifying the split by outlier status? My training set would consist of 70% of the "typical" observations and 70% of the outliers, and my testing set would be the other 30% of each. The thought process here is to build the model with adequate exposure to extreme cases, and also to be able to test on those extreme cases. Thanks in advance!
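To make the split I'm describing concrete, here is a rough sketch of splitting each group 70/30 separately and then combining the pieces. The function name `stratified_split`, the fixed seed, and the placeholder flag list are all hypothetical; only the counts (299 observations, 107 flagged) come from my data.

```python
import random

def stratified_split(is_outlier, train_frac=0.7, seed=0):
    """Split row indices so each group (typical / outlier) is divided train_frac : rest.

    `is_outlier` is a list of booleans, one per observation.
    Returns (train_indices, test_indices), each sorted.
    """
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train_idx, test_idx = [], []
    for group in (False, True):
        idx = [i for i, flag in enumerate(is_outlier) if flag == group]
        rng.shuffle(idx)
        n_train = round(train_frac * len(idx))
        train_idx.extend(idx[:n_train])
        test_idx.extend(idx[n_train:])
    return sorted(train_idx), sorted(test_idx)

# Placeholder flags matching the counts in the post: 107 flagged, 192 typical.
is_outlier = [True] * 107 + [False] * 192
train, test = stratified_split(is_outlier)
```

With these counts, the sketch puts 134 typical and 75 flagged observations in training (209 total) and 58 typical and 32 flagged in testing (90 total), so both sets keep roughly the same share of extreme cases.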

Author

PhoenixRising256

Subreddit

r/AskStatistics

Post Details

Posted: 3 years ago
External URL: reddit.com/r/AskStatisti...