For many years after Pushshift started, comment volume on Reddit was low enough that fetching new data sequentially was sufficient to keep up with new content.
Unfortunately, spam has become a huge issue on Reddit, and at times millions of spam comments are generated in short bursts.
The new ingest script will add the following features:
- Ability to use multiple accounts and combine their rate limits, so that Pushshift can stay near real-time despite spam bursts.
- Ability to keep the ingest near real-time. The goal is for the ingest to fetch new material within 5 seconds of it being created on Reddit. Massive spam bursts may occasionally cause the ingest to lag a bit more than 5 seconds, but once the new script is in production, the Pushshift API should never fall minutes or hours behind.
- More timely monthly dumps. Monthly dumps should be available within two weeks of the previous month's end, with a goal of releasing them within seven days.
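To make the multi-account idea concrete, here is a minimal sketch of how per-account rate limits might be pooled. It is not the actual Pushshift ingest code: the class names `AccountBudget` and `AccountPool`, the sliding-window accounting, and the limit values in the usage note are all assumptions chosen for illustration, and no real Reddit API calls are made.

```python
import time
from dataclasses import dataclass, field


@dataclass
class AccountBudget:
    """Sliding-window request budget for one (hypothetical) API account."""
    name: str
    max_requests: int       # assumed per-account limit within one window
    window_seconds: float   # assumed window length in seconds
    _stamps: list = field(default_factory=list, repr=False)

    def available(self, now: float) -> bool:
        # Forget requests that have aged out of the window.
        self._stamps = [t for t in self._stamps if now - t < self.window_seconds]
        return len(self._stamps) < self.max_requests

    def spend(self, now: float) -> None:
        # Record one request made at time `now`.
        self._stamps.append(now)


class AccountPool:
    """Rotates over accounts so that their individual limits add up."""

    def __init__(self, accounts):
        self.accounts = list(accounts)
        self._next = 0  # index of the account to try first

    def acquire(self, now=None):
        """Return the name of an account with budget left, or None if all are exhausted."""
        now = time.monotonic() if now is None else now
        n = len(self.accounts)
        for offset in range(n):
            idx = (self._next + offset) % n
            account = self.accounts[idx]
            if account.available(now):
                account.spend(now)
                self._next = (idx + 1) % n
                return account.name
        return None
```

With three accounts that each allow, say, 2 requests per 60 seconds, `AccountPool.acquire` hands out six requests in one window before returning `None`, which is the sense in which the limits are combined. A caller would use the returned name to pick the credentials for its next fetch and back off briefly when it gets `None`.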
I know a lot of services depend on the API being near real-time, and it is frustrating for me to see the API fall hours behind because of large spam bursts on Reddit. This upgrade will alleviate many of the current issues with the API falling behind.
The plan is to have the new ingest in production by February.
Posted 4 years ago