Reddit September 2018 Comments are now available for download
Stats:
Key | Value |
---|---|
Filename | RC_2018-09.xz |
Location | https://files.pushshift.io/reddit/comments/RC_2018-09.xz |
Start Time | 2018-09-01 00:00:00 UTC |
End Time | 2018-09-30 23:59:59 UTC |
Compressed Size | 10,715,442,268 bytes (~11 GB) |
Uncompressed Size | 117,964,567,469 bytes (~118 GB) |
Compression Type | .xz (LZMA/LZMA2) |
Subreddit Cardinality | 109,651 |
Author Cardinality | 5,052,316 |
Largest Score | 65,693 |
Lowest Score | -59,834 |
Number of Objects | 104,473,929 comments |
SHA256 Checksum | 5324affffdc7f39d2bd4e109adffbd3e2b245d9f57cc67759d7e109ea2d9ebb4 |
File Format | ndjson (new line \n delimited JSON objects) |
File Encoding | UTF-8 (Unicode Encoded / 7-bit ASCII Safe) |
Data Visual | Hourly View of Data |
Top Subreddits | 50 Most Active Subreddits |
Top Authors | 50 Most Active Authors |
Time View | Top 5 Subreddits Time Aggregation |
Term View | Top 25 Subreddits with Comments mentioning Trump |
Admin Activity | Top 15 Subreddits with the most admin comments |
Verbose Comments | Top 25 Subreddits with comments greater than 5,000 characters in length |
Huge Trees | Top 5 subreddits with comment nest levels greater than 500 |
Fast Replies | Top 10 subreddits with comment replies less than 30 seconds |
Fast Replies (Authors) | Top 100 Authors with the most comment replies less than 30 seconds |
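
Once the file is downloaded, it is worth confirming it against the SHA256 checksum listed above before decompressing anything. Below is a minimal sketch; the script name verify_sha256.py and the 1 MB read size are just illustrative choices (on most Linux systems, `sha256sum RC_2018-09.xz` does the same job):

    #!/usr/bin/env python3
    # Compare a local copy of RC_2018-09.xz against the published SHA256 checksum.
    import hashlib
    import sys

    EXPECTED = "5324affffdc7f39d2bd4e109adffbd3e2b245d9f57cc67759d7e109ea2d9ebb4"

    h = hashlib.sha256()
    with open(sys.argv[1], "rb") as f:
        # Hash in 1 MB chunks so the ~11 GB file never has to fit in memory
        for chunk in iter(lambda: f.read(1024 * 1024), b""):
            h.update(chunk)

    digest = h.hexdigest()
    print(digest)
    if digest != EXPECTED:
        sys.exit("Checksum mismatch - download may be corrupt or incomplete")

Usage: ./verify_sha256.py RC_2018-09.xz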
This file contains Reddit comments for September 2018. There are four quarantined subreddits included in this dump: ice_poseidon, cringeanarchy, theredpill and braincels. I decided to include these in the standard dump since they have been part of previous dumps for a long time and they represent four of the largest subreddits quarantined by Reddit to date.
Python example of reading data (read_data.py):
    #!/usr/bin/env python3
    import ujson as json
    import sys

    for line in sys.stdin:
        # obj is a dict object representing the comment data
        obj = json.loads(line)
        print(obj['subreddit'], obj['author'], obj['score'], sep=',')
Linux command line to process the first 1,000 comments:

    xz -cd RC_2018-09.xz | head -n 1000 | ./read_data.py
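
The .xz archive can also be streamed directly from Python with the standard library's lzma module, so the xz binary is not strictly required. As a rough sketch (the script name top_subreddits.py and the top-50 cutoff are my own choices, echoing the "Top Subreddits" view above), this counts comments per subreddit in a single pass:

    #!/usr/bin/env python3
    # Stream the compressed dump and tally comments per subreddit.
    import lzma
    import sys
    from collections import Counter

    try:
        import ujson as json  # faster, if installed
    except ImportError:
        import json  # standard library fallback

    counts = Counter()
    # lzma.open in text mode decompresses and decodes the UTF-8 ndjson on the fly
    with lzma.open(sys.argv[1], mode="rt", encoding="utf-8") as f:
        for line in f:
            obj = json.loads(line)
            counts[obj["subreddit"]] += 1

    # Print the 50 most active subreddits and their comment counts
    for subreddit, n in counts.most_common(50):
        print(subreddit, n, sep=',')

Run it as ./top_subreddits.py RC_2018-09.xz. Keep in mind this decompresses all ~118 GB of JSON in a single pass, so expect it to take a while.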