The API was "stuttering" because of connectivity problems with PostgreSQL. Certain very expensive API requests triggered SQL queries that ran essentially forever. As these requests accumulated, they consumed more and more I/O. Worse still, they tied up PostgreSQL connections, so the number available dwindled until none were left.
Latency crept up slowly over time until the connection pool was exhausted. At that point, the API started returning internal 5xx errors.
The fix was a script that checks all SELECT queries currently running in Postgres and terminates any that have been running for more than a minute.
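For anyone who wants to try something similar, here is a rough sketch of what such a watchdog could look like. This is not the actual script. It assumes a DB-API connection such as one from psycopg2 (which returns intervals as `timedelta`), and the names `is_runaway_select` and `kill_runaway_selects` are made up for illustration. `pg_stat_activity` and `pg_terminate_backend` are standard PostgreSQL.

```python
from datetime import timedelta

# Threshold from the post: anything running longer than a minute.
MAX_RUNTIME = timedelta(minutes=1)

# pg_stat_activity lists one row per backend; skip our own session.
LIST_ACTIVE_SQL = """
SELECT pid, query, now() - query_start AS runtime
FROM pg_stat_activity
WHERE state = 'active' AND pid <> pg_backend_pid()
"""


def is_runaway_select(query, runtime, max_runtime=MAX_RUNTIME):
    """True if the statement is a SELECT that has exceeded max_runtime."""
    if not query or runtime is None:
        return False
    return query.lstrip().lower().startswith("select") and runtime > max_runtime


def kill_runaway_selects(conn, max_runtime=MAX_RUNTIME):
    """Terminate long-running SELECT backends; return the pids that were targeted.

    `conn` is assumed to be a DB-API 2.0 connection (e.g. from psycopg2)
    with enough privileges to call pg_terminate_backend.
    """
    killed = []
    cur = conn.cursor()
    cur.execute(LIST_ACTIVE_SQL)
    for pid, query, runtime in cur.fetchall():
        if is_runaway_select(query, runtime, max_runtime):
            cur.execute("SELECT pg_terminate_backend(%s)", (pid,))
            killed.append(pid)
    cur.close()
    return killed
```

You would run something like this from cron or a loop, e.g. `kill_runaway_selects(psycopg2.connect(dsn))` with your own connection string. Setting a `statement_timeout` on the API's database role is another way to get a similar effect without an external script.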
Once these long-running requests were killed, API latency for certain requests dropped substantially, and the number of 5xx errors fell significantly.