Reddit September 2018 Comments are now available for download

Stats:

| Key | Value |
|---|---|
| Filename | RC_2018-09.xz |
| Location | https://files.pushshift.io/reddit/comments/RC_2018-09.xz |
| Start Time | 2018-09-01 00:00:00 UTC |
| End Time | 2018-09-30 23:59:59 UTC |
| Compressed Size | 10,715,442,268 bytes (~11 GB) |
| Uncompressed Size | 117,964,567,469 bytes (~118 GB) |
| Compression Type | .xz (LZMA/LZMA2) |
| Subreddit Cardinality | 109,651 |
| Author Cardinality | 5,052,316 |
| Largest Score | 65,693 |
| Lowest Score | -59,834 |
| Number of Objects | 104,473,929 comments |
| SHA256 Checksum | 5324affffdc7f39d2bd4e109adffbd3e2b245d9f57cc67759d7e109ea2d9ebb4 |
| File Format | ndjson (newline \n delimited JSON objects) |
| File Encoding | UTF-8 (Unicode encoded / 7-bit ASCII safe) |
| Data Visual | Hourly View of Data |
| Top Subreddits | 50 Most Active Subreddits |
| Top Authors | 50 Most Active Authors |
| Time View | Top 5 Subreddits Time Aggregation |
| Term View | Top 25 Subreddits with Comments Mentioning Trump |
| Admin Activity | Top 15 Subreddits with the Most Admin Comments |
| Verbose Comments | Top 25 Subreddits with Comments Greater Than 5,000 Characters in Length |
| Huge Trees | Top 5 Subreddits with Comment Nest Levels Greater Than 500 |
| Fast Replies | Top 10 Subreddits with Comment Replies in Under 30 Seconds |
| Fast Replies (Authors) | Top 100 Authors with the Most Comment Replies in Under 30 Seconds |
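
Before decompressing, the SHA256 checksum from the table can be used to confirm that the ~11 GB download is intact. A minimal Python sketch, assuming RC_2018-09.xz is in the current working directory:

#!/usr/bin/env python3

# Sketch: verify the SHA256 checksum of the downloaded archive against the
# value listed in the table above.
import hashlib

EXPECTED = "5324affffdc7f39d2bd4e109adffbd3e2b245d9f57cc67759d7e109ea2d9ebb4"

sha256 = hashlib.sha256()
with open("RC_2018-09.xz", "rb") as f:
    # Hash in 1 MiB chunks so the ~11 GB file never has to fit in memory
    for chunk in iter(lambda: f.read(1024 * 1024), b""):
        sha256.update(chunk)

print("OK" if sha256.hexdigest() == EXPECTED else "Checksum mismatch!")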

This file contains Reddit comments for September 2018. There are four quarantined subreddits included in this dump: ice_poseidon, cringeanarchy, theredpill and braincels. I decided to include them in the standard dump since they have been part of previous dumps for a long time and they are four of the largest subreddits Reddit has quarantined to date.
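
If you would rather leave those four quarantined subreddits out of your own analysis, a minimal sketch along these lines drops them while streaming decompressed ndjson from stdin (the same pattern as read_data.py below; the comparison is lowercased since subreddit names appear with mixed case in the data):

#!/usr/bin/env python3

# Sketch: skip comments from the four quarantined subreddits while streaming
# ndjson comment objects from stdin (e.g. via xz -cd RC_2018-09.xz | ...).
import json
import sys

QUARANTINED = {"ice_poseidon", "cringeanarchy", "theredpill", "braincels"}

for line in sys.stdin:
    obj = json.loads(line)
    # Lowercase the subreddit name so mixed-case names still match the set above
    if obj["subreddit"].lower() in QUARANTINED:
        continue
    print(obj["subreddit"], obj["author"], obj["score"], sep=",")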

Python example of reading data (read_data.py):

#!/usr/bin/env python3

# read_data.py -- read ndjson comment objects from stdin and print a CSV-style
# line of subreddit, author and score for each comment.
import sys

try:
    import ujson as json  # faster third-party parser (pip install ujson)
except ImportError:
    import json  # fall back to the standard library

for line in sys.stdin:
    # Each line is one JSON object representing a single comment
    obj = json.loads(line)
    print(obj['subreddit'], obj['author'], obj['score'], sep=',')

Linux command line to process the first 1,000 comments (make the script executable first with chmod +x read_data.py):
xz -cd RC_2018-09.xz | head -n 1000 | ./read_data.py
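
If you prefer to stay entirely in Python, the standard-library lzma module can decompress the archive on the fly instead of piping through xz. A minimal sketch, assuming RC_2018-09.xz sits in the working directory (streaming all ~118 GB of decompressed data will take some time):

#!/usr/bin/env python3

# Sketch: stream the archive directly from Python using the standard-library
# lzma module, with no shell pipeline required.
import json
import lzma

with lzma.open("RC_2018-09.xz", "rt", encoding="utf-8") as f:
    for line in f:
        obj = json.loads(line)
        print(obj["subreddit"], obj["author"], obj["score"], sep=",")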
