It took a long time to get caught up and work through the Reddit API changes, but the files are now available at https://files.pushshift.io/reddit/submissions
The files are named by month, for example RS_2018-08.xz for August 2018.
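For anyone who wants to read one of these files directly, here is a minimal sketch in Python. It assumes the file has already been downloaded and that it is xz-compressed text with one JSON object per line; the post does not spell out the layout, so treat that as an assumption. The function name `iter_submissions` is made up for illustration.

```python
import json
import lzma
from typing import Iterator


def iter_submissions(path: str) -> Iterator[dict]:
    """Yield one submission dict per non-empty line of an .xz dump.

    Assumption: the dump is newline-delimited JSON. This is a guess
    about the file layout, not something stated in the post.
    """
    # "rt" decompresses on the fly and decodes to text, so the whole
    # file never has to fit in memory.
    with lzma.open(path, mode="rt", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                yield json.loads(line)
```

Usage would look something like `for post in iter_submissions("RS_2018-08.xz"): ...`, streaming through the month one object at a time.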
New fields:
There are some new fields available and a field that I have added to the data through a lot of data mining. Here is an example of a JSON object from one of the files
Reddit now includes the author id under the "author_fullname" field. The value starts with "t2_", followed by a base 36 representation of an integer, just like the other id fields.
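If you want the author id as a plain integer, something like the following should work. It is only a sketch: it assumes the value has the form `t2_` followed by base 36 digits, the helper name `author_id_to_int` is hypothetical, and the id in the usage line is made up.

```python
def author_id_to_int(author_fullname: str) -> int:
    """Convert an account fullname such as 't2_1a' to its integer id.

    Assumption: the value is 't2', an underscore, then base 36 digits.
    """
    prefix, _, base36_id = author_fullname.partition("_")
    if prefix != "t2" or not base36_id:
        raise ValueError(f"not an account fullname: {author_fullname!r}")
    # int() accepts any base from 2 to 36, so no custom decoder is needed.
    return int(base36_id, 36)


print(author_id_to_int("t2_1a"))  # 1 * 36 + 10 = 46
```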
I have added a field called "author_created_utc" -- this field is the epoch time at which the author's account was created on Reddit. Data scientists may find it useful for analyses that depend on account age.
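As one example of what this field allows, here is a rough sketch that computes how old an account was when it made a submission. It assumes both "created_utc" and "author_created_utc" are epoch seconds, and that "author_created_utc" can be missing for some objects (for instance deleted accounts); that second point is a guess on my part. The helper names are hypothetical.

```python
from datetime import datetime, timezone
from typing import Optional

SECONDS_PER_DAY = 86400


def to_utc(epoch_seconds: int) -> datetime:
    """Turn an epoch value into a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(int(epoch_seconds), tz=timezone.utc)


def account_age_days(submission: dict) -> Optional[float]:
    """Days between account creation and the submission, or None.

    Returns None when either timestamp is absent (assumed possible).
    """
    created = submission.get("created_utc")
    author_created = submission.get("author_created_utc")
    if created is None or author_created is None:
        return None
    return (int(created) - int(author_created)) / SECONDS_PER_DAY
```

With made-up values, `account_age_days({"created_utc": 1533081600, "author_created_utc": 1532217600})` gives `10.0`.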
If you have any questions, please let me know. July and August comments are near completion and should be available in a few days.
subreddit_subscribers -- Every submission object now includes the number of accounts subscribed to its subreddit. This number is accurate as of the "retrieved_on" time (when I ingested the object), not the "created_utc" time (when the submission was posted). If you are analyzing subreddit growth, always pair the subscriber count with the retrieved_on time.
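To make the retrieved_on point concrete, here is a small sketch that builds a subscriber time series for one subreddit, keyed on "retrieved_on" rather than "created_utc". It assumes each object also carries a "subreddit" name field and that the timestamps are epoch seconds; the function name `subscriber_series` is made up.

```python
from datetime import datetime, timezone
from typing import Iterable, List, Tuple


def subscriber_series(
    submissions: Iterable[dict], subreddit: str
) -> List[Tuple[datetime, int]]:
    """Return (retrieved_on, subscriber count) pairs sorted by time."""
    points = []
    for post in submissions:
        # "subreddit" as the name field is an assumption here.
        if post.get("subreddit") != subreddit:
            continue
        count = post.get("subreddit_subscribers")
        seen = post.get("retrieved_on")
        if count is None or seen is None:
            continue  # skip objects without the fields we need
        # The count reflects ingest time, so timestamp it with retrieved_on.
        when = datetime.fromtimestamp(int(seen), tz=timezone.utc)
        points.append((when, int(count)))
    points.sort(key=lambda point: point[0])
    return points
```

Sorting by retrieved_on matters because objects are not guaranteed to be ingested in the same order they were posted.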
Thank you!