Hey all,
I wrote a moderation bot a while back that has grown pretty popular over the last few years, bigger (and more importantly, busier) than I thought it would be.
Context
The bot fetches new posts and comments in /r/mod every minute (2 calls/min). For each of those items, it fetches the author's main profile, posts, and comments (3 calls/item). If the author's profile and history meet certain criteria, it takes action on the user in the subreddit where the original post or comment took place (varies, usually 2 calls/min).
Ethical Considerations
While the community is split on taking subreddit moderation actions on users for their participation outside of that subreddit and the admins have indicated that it's frowned upon, moderators have found this bot to be a significant help in cutting back on spam and self-promotion in their subreddits. This bot's purpose is not to censor users for their political ideology, but instead to specifically target those whose intent is to lure readers to their profiles for monetary gain. I originally wrote it because users would often post adult content (for example, in their profile pictures) to subreddits -- frequently visited by minors -- with reckless disregard for that subreddit's rules against it, and it has been adopted by many other subreddits since.
Problem
Given the current rate limit of 60 calls/min, the bot can no longer keep up with the amount of content it needs to ingest. To make matters worse, my hosting platform charges for run time by the millisecond, and PRAW's busy-waiting for rate limit resets is becoming too costly to sustain.
Over the last 10 minutes alone, I've had 580 items come in, necessitating 1740 calls for user profiles and history, making for a whopping 174 calls per minute, well over the allowed 60 per minute.
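For anyone who wants to check the math, here is a rough sketch of the budget using only the figures quoted above. The function and constant names are made up for illustration, and it deliberately ignores cache hits and the calls spent on moderation actions, so treat it as a back-of-the-envelope estimate rather than a measurement.

```python
# Figures taken from the post; the names are illustrative only.
RATE_LIMIT_PER_MIN = 60     # allowed API calls per minute
LISTING_CALLS_PER_MIN = 2   # fetching new posts and comments from /r/mod
CALLS_PER_ITEM = 3          # author's profile, posts, and comments


def required_calls_per_min(items: int, window_min: float) -> float:
    """Average calls/min needed to scan the author of every item in the window."""
    return items * CALLS_PER_ITEM / window_min


def max_items_per_min(limit: int = RATE_LIMIT_PER_MIN) -> float:
    """Items/min that fit in the budget once the listing calls are paid for."""
    return (limit - LISTING_CALLS_PER_MIN) / CALLS_PER_ITEM


demand = required_calls_per_min(580, 10)
print(demand)                          # 174.0
print(demand / RATE_LIMIT_PER_MIN)     # 2.9
print(round(max_items_per_min(), 2))   # 19.33
```

In other words, demand is roughly 2.9 times the limit, and under these assumptions the budget covers about 19 full scans per minute against an incoming 58 items per minute.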
Attempted Solutions
Some things I've tried to cut down the number of queries it needs to perform include:
- Cache user profiles so that recently seen users need not have their profile, posts, and comments scraped again. This has helped a lot, but it's not sufficient to keep up with the subreddits' growing user bases.
- Skip fetching the rest of a user's content when a positive match is found. This works, but the vast majority (about 98%) of user scans result in no match, and thus require the full scan to take place.
- Queue users for scanning so that while a user may not be immediately scanned on demand, they will eventually be scanned when the rate limit allows for it. This also works great, but the queue consistently grows larger and larger, and there is no off-peak time of day to catch up. After running for 12 hours overnight, the queue is about 36 minutes behind.
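To make the caching and queueing ideas above concrete, here is a minimal sketch of how the two could fit together. It is not the bot's actual code: `TTLCache`, `ScanQueue`, and the `scan` callback are hypothetical names, nothing here touches PRAW, and every scan is charged the full 3 calls even though an early match would cost fewer.

```python
import time
from collections import deque
from typing import Callable, Deque, Dict, Optional, Set, Tuple


class TTLCache:
    """Remembers scan verdicts for `ttl` seconds so repeat authors cost no calls."""

    def __init__(self, ttl: float, clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl
        self._clock = clock
        self._entries: Dict[str, Tuple[float, bool]] = {}

    def get(self, user: str) -> Optional[bool]:
        entry = self._entries.get(user)
        if entry is None:
            return None
        stored_at, verdict = entry
        if self._clock() - stored_at > self._ttl:
            del self._entries[user]  # stale entry: force a fresh scan
            return None
        return verdict

    def put(self, user: str, verdict: bool) -> None:
        self._entries[user] = (self._clock(), verdict)


class ScanQueue:
    """De-duplicated FIFO of authors, drained within a per-tick call budget."""

    def __init__(self, cache: TTLCache, calls_per_scan: int = 3):
        self._cache = cache
        self._calls_per_scan = calls_per_scan
        self._pending: Deque[str] = deque()
        self._queued: Set[str] = set()

    def enqueue(self, user: str) -> bool:
        """Queue `user` unless a fresh verdict is cached or they are already waiting."""
        if user in self._queued or self._cache.get(user) is not None:
            return False
        self._pending.append(user)
        self._queued.add(user)
        return True

    def drain(self, budget: int, scan: Callable[[str], bool]) -> int:
        """Scan queued users while the budget covers a full scan; return calls spent."""
        spent = 0
        while self._pending and budget - spent >= self._calls_per_scan:
            user = self._pending.popleft()
            self._queued.discard(user)
            self._cache.put(user, scan(user))
            spent += self._calls_per_scan
        return spent

    def __len__(self) -> int:
        return len(self._pending)


# Usage: four incoming items from three distinct authors, budget for two scans.
cache = TTLCache(ttl=3600)
queue = ScanQueue(cache)
for name in ["alice", "bob", "alice", "carol"]:
    queue.enqueue(name)
spent = queue.drain(budget=6, scan=lambda user: False)
print(spent, len(queue))  # 6 1
```

A structure like this smooths out bursts, but it cannot fix a sustained shortfall: if authors arrive faster than the budget allows them to be scanned, the backlog keeps growing, which matches what I'm seeing.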
Some solutions I've considered but chosen not to implement include:
- /r/mod/new and /r/mod/comments consist of about 18% posts and 82% comments. I could try having the bot fetch only posts (no comments) every minute to reduce the workload, but that would significantly reduce its value. Scanning comments is a component of the bot's core functionality, so I can't ditch them in favor of posts alone.
- Have the bot omit fetching either a user's post or comment history, but that would also result in a value reduction because positive matches are about half and half between posts and comments.
- Become more selective of which subreddits the bot works in, but I'm happy to see it being used everywhere that it currently is.
- Create multiple OAuth clients for increased rate limit allocations. This would work, but it's likely to make the admins very unhappy and could be a violation of TOS.
- Have subreddits (especially larger ones) create their own bot user and allow my application to perform actions with it. This would greatly increase the complexity for implementation and end user adoption. Currently, moderators simply invite the bot to their subreddit and it starts working. Having to create a new bot user and log into the application with it could deter some moderators from using it. Plus, I've seen many other, more popular bots not have to operate this way.
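For a sense of scale, here is a rough estimate of what the first two options would do to the call rate. It assumes the 18%/82% split applies to the 580-item sample from the Problem section, that the two listing calls are one for posts and one for comments, and that cache hits and moderation actions can be ignored. All names are illustrative.

```python
RATE_LIMIT_PER_MIN = 60
ITEMS = 580          # items seen in the 10-minute sample
WINDOW_MIN = 10
POST_SHARE = 0.18    # assumed to hold for this sample


def calls_per_min(items: float, calls_per_item: int, listing_calls: int) -> float:
    """Average calls/min: per-author scans plus the listing fetches."""
    return items * calls_per_item / WINDOW_MIN + listing_calls


current = calls_per_min(ITEMS, 3, 2)
posts_only = calls_per_min(ITEMS * POST_SHARE, 3, 1)  # assumes 1 listing call
one_history = calls_per_min(ITEMS, 2, 2)              # profile + one history

print(round(current, 1))      # 176.0
print(round(posts_only, 1))   # 32.3
print(round(one_history, 1))  # 118.0
```

Under those assumptions, scanning posts only would fit within 60 calls/min, while dropping one of the two history fetches would still leave the bot at roughly double the limit.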
Plea
I'm a little stuck on what to do. I feel like I've limited my API calls to the extent that I can while still maintaining the bot's core functionality. Has anyone had experience with reaching out to [email protected] for a rate limit increase, and if so, how did it go?
Thanks for your time reading this. I didn't intend for it to be so long, but I wanted to provide as much detail as I could.
TL;DR
Run a mod bot. Mod bot is too busy to keep up with rate limits. I've tried reducing API calls as much as I can, but it's still too busy.