Dealing with outliers in logistic regression

As the title says, I'm working on building a logistic regression model. I've set up a function to flag outlier observations based on Tukey's fences, as a couple of my predictor variables are heavily skewed. I'm doing a 70/30 train/test split. Of 299 observations, 107 lie outside Tukey's fences.
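For readers who want to see the flagging step concretely, here is a minimal sketch in Python using only the standard library. The post does not show the actual function or say which language was used, so the names `tukey_fences` and `flag_outlier_rows`, the conventional multiplier `k=1.5`, and the quartile interpolation method are all assumptions made for illustration.

```python
from statistics import quantiles


def tukey_fences(values, k=1.5):
    """Return the (lower, upper) Tukey fences for one variable.

    Assumption: quartiles use linear interpolation ("inclusive" method);
    other quartile conventions shift the fences slightly.
    """
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def flag_outlier_rows(rows, k=1.5):
    """Flag a row if any of its predictors lies outside that predictor's fences.

    Assumption: "any predictor" is the rule; the post does not say how
    flags from several variables are combined.
    """
    columns = list(zip(*rows))  # transpose rows into per-variable columns
    fences = [tukey_fences(col, k) for col in columns]
    return [
        any(v < lo or v > hi for v, (lo, hi) in zip(row, fences))
        for row in rows
    ]


# Toy data (made up): two predictors, six observations.
rows = [(1, 10), (2, 11), (3, 12), (4, 13), (5, 500), (100, 14)]
flags = flag_outlier_rows(rows)  # last two rows are flagged
```

With heavily skewed predictors, a rule like this can flag a large share of the data, which is consistent with 107 of 299 observations being flagged here.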

What I'm wondering is this: do I violate any conventions by splitting the outliers 70/30 as well? My training set would then consist of 70% of the "typical" observations and 70% of the outliers, and my test set would be the remaining 30% of each. The idea is to give the model adequate exposure to extreme cases during training while still being able to test on them. Thanks in advance!
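The split described above amounts to stratifying on the outlier flag. A rough standard-library sketch of that idea follows; `stratified_split`, its parameters, and the fixed seed are hypothetical names chosen for illustration, not something from the post.

```python
import random


def stratified_split(flags, train_frac=0.7, seed=0):
    """Split row indices so each group (flagged / not flagged) is divided
    train_frac / (1 - train_frac) separately.

    Returns two sorted lists of indices: (train_idx, test_idx).
    """
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train_idx, test_idx = [], []
    for group in (False, True):
        idx = [i for i, f in enumerate(flags) if bool(f) == group]
        rng.shuffle(idx)
        n_train = round(train_frac * len(idx))
        train_idx.extend(idx[:n_train])
        test_idx.extend(idx[n_train:])
    return sorted(train_idx), sorted(test_idx)


# Counts from the post: 299 observations, 107 of them flagged.
# The ordering of the flags here is arbitrary and only for illustration.
example_flags = [True] * 107 + [False] * 192
train_idx, test_idx = stratified_split(example_flags)
outliers_in_train = sum(example_flags[i] for i in train_idx)
```

With those counts, rounding gives 75 flagged and 134 typical rows for training (209 total), leaving 32 flagged and 58 typical rows (90 total) for testing.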

Posted 3 years ago