Reading the Monthly Reddit Dumps with Python
There have been some questions about how to deal with the large .zst files. This is a very basic module that reads a .zst file line by line for you without loading the entire file into memory.
https://github.com/pushshift/zreader
Basic usage:
import zreader
import ujson as json

# Adjust chunk_size as necessary -- defaults to 16,384 if not specified
# Using the default is generally fine for most types of file systems / drives
# (named "reader" so the zreader module itself is not shadowed)
reader = zreader.Zreader("RC_2019-08.zst", chunk_size=8192)

# Read each line from the reader
for line in reader.read():
    obj = json.loads(line)
    print(obj['author'], obj['subreddit'], sep=",")
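For anyone curious how reading in chunks can still give you whole lines, here is a rough sketch of the general idea. This is not zreader's actual code; `iter_lines` is a hypothetical helper name, and it uses only the standard library so it can run against an in-memory stream. The assumption is that any binary object with a `read(n)` method could be passed in, for example a decompressing stream reader.

```python
import io
import json


def iter_lines(stream, chunk_size=16384):
    """Yield decoded lines from a binary stream, reading chunk_size bytes at a time.

    Hypothetical helper for illustration only; not part of zreader.
    """
    buffer = b""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        buffer += chunk
        # Everything before the last newline is a complete line;
        # the remainder stays in the buffer until more data arrives.
        *lines, buffer = buffer.split(b"\n")
        for line in lines:
            if line:
                # Decode only whole lines, so a multi-byte character
                # split across two chunks is never decoded in halves.
                yield line.decode("utf-8")
    if buffer:
        # Final line that had no trailing newline.
        yield buffer.decode("utf-8")


# Stand-in for a decompressed dump: two newline-delimited JSON objects.
sample = io.BytesIO(
    b'{"author": "a", "subreddit": "x"}\n{"author": "b", "subreddit": "y"}\n'
)
records = [json.loads(line) for line in iter_lines(sample, chunk_size=8)]
```

Only one chunk plus one partial line is held at a time, which is why memory use stays flat no matter how large the dump is.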
Post Details
- Posted
- 4 years ago
- External URL
- reddit.com/r/pushshift/c...