This is an open question to the datasets community. To give some background first, I have two main sources of data -- the real-time ingest (which doesn't get score data since the comments and posts are brand new) and the monthly recrawl that re-ingests the entire previous month in order to get score data for comments and posts.
As many of you know, Reddit has recently banned the subreddit /r/incels. This ban occurred while I was re-crawling October comments.
I do have the very last comments and posts made to /r/incels in my real-time table.
My question is this: Is it worth taking the /r/incels comments and posts from my real-time table and including that data in the October monthly dumps? Obviously the score will be 1 for those comments and posts, since they were never recrawled, but for the sake of completeness and academic research, it may be worth adding that data to the October dump.
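To make the idea concrete, here is a rough sketch of what such a backfill could look like. It is not my actual pipeline: the function name, the record fields (`id`, `subreddit`, `score`) and the `from_realtime` flag are assumptions made up for illustration. Records the recrawl already captured are kept as they are, and only the missing ones from the banned subreddit are pulled from the real-time table with a placeholder score of 1.

```python
from typing import Dict, Iterable, List


def backfill_from_realtime(
    recrawled: Iterable[Dict],
    realtime: Iterable[Dict],
    subreddit: str,
) -> List[Dict]:
    """Append real-time records for one subreddit that the recrawl missed.

    Hypothetical helper: field names are assumed, not taken from the real dumps.
    """
    merged = list(recrawled)
    seen_ids = {rec["id"] for rec in merged}
    wanted = subreddit.lower()

    for rec in realtime:
        if rec["subreddit"].lower() != wanted:
            continue  # only backfill the banned subreddit
        if rec["id"] in seen_ids:
            continue  # the recrawl already has this one, with a real score
        filled = dict(rec)  # copy so the real-time record is not mutated
        filled.setdefault("score", 1)  # never recrawled, so no real score
        filled["from_realtime"] = True  # assumed flag so researchers can filter
        merged.append(filled)
        seen_ids.add(rec["id"])

    return merged


recrawled = [{"id": "a", "subreddit": "incels", "score": 10}]
realtime = [
    {"id": "a", "subreddit": "incels"},
    {"id": "b", "subreddit": "incels"},
    {"id": "c", "subreddit": "datasets"},
]
october = backfill_from_realtime(recrawled, realtime, "incels")
```

Marking the backfilled rows with a flag would let anyone doing score-based analysis drop them, since a score of 1 there means "unknown" rather than an observed value.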
The October monthly dump is almost complete for comments and should be made available tomorrow or very early Saturday.
tl;dr: Should I include the last comments and posts from /r/incels in the October dump files?