First, I apologize for being out of touch with so many people. I've had to deal with family issues while also keeping Pushshift up and running, and I just got overwhelmed.
I'd like to give a huge thanks to the other moderators for holding down the fort while I was away.
I know there is a lot to discuss, but I wanted to let you all know that I'm still deeply invested in the project, and we now have some additional help besides just me. The API itself has gone through some major issues recently that we've been working on (by adding more servers, etc.).
Tonight, the production API will finally be getting the new ingest code put in place so that ingestion (comments, etc.) doesn't fall behind.
I know there is a lot to catch up on so please feel free to make this an open forum for any outstanding issues, etc. I'll be around much more often now.
Thanks again everyone!
A few major points:
I'm removing Postgres from the API; it's just a pain in the ass to maintain duplicated data. Postgres will still be used for archival purposes, but I am going to move the API away from relying on Postgres to answer queries that don't have a query parameter. Elasticsearch should be able to catch up. In the meantime, while the code gets fixed, you can get the latest comments and submissions without a query term by passing q=*. Once the Postgres dependency is removed from the API, people won't need to pass q=* anymore.
The beta ingest is currently down because I'm moving things over to api.pushshift.io. api.pushshift.io should get caught up within the next few hours at most and will be up-to-date with all recent comments and submissions.
beta.pushshift.io should be back up by tomorrow.
I'm going to run some queries to fill in any missing data in api.pushshift.io in the next few days.
All monthly submission dumps are up to date.
All monthly comment dumps will be up to date this weekend.
files.pushshift.io is being moved to an entirely new server, off the network that powers the APIs. There is just too much congestion on the web server (sometimes over 25,000 requests per second coming in).
If you are downloading data from files.pushshift.io, you may see interruptions until this weekend. We need to free up bandwidth for the API endpoints, but rest assured the data isn't going anywhere. If you see missing files, it's because we're moving 2.5 terabytes to a new server, which should complete in 2-3 days.
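For anyone unsure what the q=* workaround looks like in practice, here is a rough sketch of building such a request URL. Only api.pushshift.io and q=* come from the post above; the `/reddit/search/<kind>/` path, the `size` parameter, and the `latest_url` helper are assumptions for illustration, so adjust them to whatever the API actually exposes.

```python
from urllib.parse import urlencode

# Assumption: this search path is illustrative; the post only names api.pushshift.io.
BASE_URL = "https://api.pushshift.io/reddit/search"


def latest_url(kind: str, size: int = 25) -> str:
    """Build a URL for the newest comments or submissions using q=*.

    `kind` and `size` are hypothetical names for this sketch, not
    confirmed API parameters.
    """
    if kind not in ("comment", "submission"):
        raise ValueError(f"unknown kind: {kind!r}")
    # q=* stands in for "no query term" until the Postgres dependency is removed.
    params = {"q": "*", "size": size}
    return f"{BASE_URL}/{kind}/?{urlencode(params)}"


if __name__ == "__main__":
    print(latest_url("comment"))
    print(latest_url("submission", size=100))
```

Note that `urlencode` percent-encodes the `*`, which a server should decode back to `*`. Once the Postgres dependency is gone, the `q` entry should simply be droppable from `params`.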