Link: https://pushshift.io/reddit-statistics/
As some of you are aware, I ingest Reddit data in real time and make it searchable via the Pushshift API. I recently started working on a live dashboard for examining Reddit activity as it happens. The link above goes to it.
There are several graphs here:
The first three graphs show Reddit comment and submission activity in real time. The third graph (the hour graph) is still filling in because I'm using Redis to offload stress from the database and Elasticsearch, so that the dashboard can handle several thousand hits per second if necessary.
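To illustrate the caching idea, here is a minimal sketch of the cache-aside pattern with expiring entries. It is not the dashboard's actual code: a plain in-process dictionary stands in for Redis, and the names `TTLCache`, `get_or_compute`, and the `clock` parameter are hypothetical. In a real setup, Redis key expiry would play the role of the stored deadline.

```python
import time
from typing import Any, Callable, Dict, Tuple


class TTLCache:
    """In-process stand-in for a Redis-style cache with per-key expiry.

    Assumption: this only illustrates the pattern; it is not how the
    dashboard itself is implemented.
    """

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so expiry can be tested without sleeping
        self._store: Dict[str, Tuple[float, Any]] = {}

    def get_or_compute(self, key: str, compute: Callable[[], Any]) -> Any:
        """Return the cached value for key, calling compute() only on a miss."""
        now = self.clock()
        entry = self._store.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # cache hit: the backend is not touched
        value = compute()    # miss or expired: one expensive backend query
        self._store[key] = (now + self.ttl, value)
        return value
```

With a TTL of a few seconds, thousands of page loads per second would trigger at most one backend query per key per TTL window, at the cost of the graph being slightly stale.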
The next graph shows Subreddit Activity for the previous 10,000 comments. It basically shows which subreddits are most active based on where comments are being made.
The Author Comment Activity graph shows which authors are making the most comments (across the past 10,000 comments).
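One way to keep "most active over the previous 10,000 comments" rankings up to date is a fixed-size sliding window with running counts. The sketch below is an assumption about how this could be done, not the dashboard's actual implementation; `SlidingWindowCounter` and its methods are hypothetical names. The same structure works for subreddits or authors, depending on which field is fed in.

```python
from collections import Counter, deque
from typing import Deque, List, Tuple


class SlidingWindowCounter:
    """Counts keys (e.g. subreddit or author names) over the last N items."""

    def __init__(self, window_size: int = 10_000) -> None:
        if window_size <= 0:
            raise ValueError("window_size must be positive")
        self.window_size = window_size
        self._window: Deque[str] = deque()
        self._counts: Counter = Counter()

    def add(self, key: str) -> None:
        """Record one new comment and evict the oldest if the window is full."""
        self._window.append(key)
        self._counts[key] += 1
        if len(self._window) > self.window_size:
            oldest = self._window.popleft()
            self._counts[oldest] -= 1
            if self._counts[oldest] == 0:
                del self._counts[oldest]  # keep the counter free of zero entries

    def top(self, n: int = 10) -> List[Tuple[str, int]]:
        """Return the n most frequent keys in the current window."""
        return self._counts.most_common(n)
```

Each incoming comment costs constant time to record, so the ranking can be refreshed as often as the graph needs it.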
The next graph shows which authors are currently making the most posts to Reddit.
The next graph shows the most popular discussions on Reddit. This uses machine learning, and the model is still training (it will take a while to get just right).
I'm working on some dataviz to show which submissions are most active based on comment volume.
Some interesting observations:
1) AutoModerator appears to remove over 1% of all Reddit comments immediately. It also adds around 0.75% to the total comment volume.
2) Submissions appear to be more bot-driven than comments. At night, comment volume falls off more sharply than submission volume. The most likely cause I can think of is bots making submissions late at night. I can verify this with some more digging into the data.
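The night-time comparison in the second observation could be quantified with a simple trough-to-peak ratio over hourly counts. This is only a sketch of one possible measure, not the analysis behind the observation; `trough_to_peak_ratio` and `falls_off_more_sharply` are hypothetical helper names, and the inputs are assumed to be per-hour totals over the same time span.

```python
from typing import Sequence


def trough_to_peak_ratio(hourly_counts: Sequence[float]) -> float:
    """Return min/max of the series; a lower value means a sharper fall-off."""
    if not hourly_counts:
        raise ValueError("hourly_counts must not be empty")
    peak = max(hourly_counts)
    if peak <= 0:
        raise ValueError("peak count must be positive")
    return min(hourly_counts) / peak


def falls_off_more_sharply(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if series a drops further from its peak than series b does."""
    return trough_to_peak_ratio(a) < trough_to_peak_ratio(b)
```

If comments consistently show a lower ratio than submissions, that is consistent with (but does not prove) a larger automated share among submissions.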
If you have any suggestions for real-time data visualizations, please share them!
Keep in mind that this page is basically still "alpha" and may have some kinks to work out.
Edit:
As for the most popular discussions on Reddit, there is a lot of room for improvement: identifying the largest bots and excluding them from the analysis, removing certain subreddits that "pollute" the results, and so on. If you have any ideas, please feel free to share them. Let's have an open discussion about this!
Posted 6 years ago. External URL: reddit.com/r/TheoryOfRed...