
This post has been de-listed

It is no longer included in search results and normal feeds (front page, hot posts, subreddit posts, etc). It remains visible only via the author's post history.

Examining Reddit in Real-Time -- some interesting observations and potential new questions.

Link: https://pushshift.io/reddit-statistics/

As some of you are aware, I ingest Reddit data in real time and make it available for search via the Pushshift API. I recently started working on a dashboard to examine Reddit activity as it happens; the link above goes to it.

There are several graphs here:

The first three graphs show Reddit comment and submission activity in real time. The third graph (the hour graph) is still filling in because I'm using Redis to take load off the database and Elasticsearch, so that the dashboard can handle several thousand hits per second if necessary.
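A common way for a cache like Redis to shield a database is the cache-aside pattern: each dashboard hit reads a recently computed result, and the expensive query only runs when that result has expired. The sketch below is an assumption about how such a layer could work, not Pushshift's actual code. An in-memory dict stands in for Redis, and `TTLCache` and `get_or_compute` are hypothetical names.

```python
import time
from typing import Any, Callable, Dict, Tuple


class TTLCache:
    """Hypothetical in-memory stand-in for Redis: entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so expiry can be tested without sleeping
        self._store: Dict[str, Tuple[float, Any]] = {}  # key -> (expires_at, value)

    def get_or_compute(self, key: str, compute: Callable[[], Any]) -> Any:
        """Cache-aside: return a fresh cached value, else run compute() and store it."""
        now = self.clock()
        hit = self._store.get(key)
        if hit is not None and hit[0] > now:
            return hit[1]  # cache hit: the database/Elasticsearch is not touched
        value = compute()  # cache miss: run the expensive query once
        self._store[key] = (now + self.ttl, value)
        return value
```

However many visitors load the page within one TTL window, the backing query runs at most once per window. With a real Redis server, the lookup and store would map onto Redis `GET` and `SET key value EX seconds`, letting Redis expire keys itself.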

The Subreddit Activity graph covers the previous 10,000 comments and shows which subreddits are most active, based on where comments are being made.

The Author Comment Activity graph shows which authors are making the most comments (across the past 10,000 comments).
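Both activity graphs amount to counting a key (subreddit or author) over a sliding window of the most recent comments. Below is a minimal sketch of that idea, assuming comments arrive one at a time from the ingest stream. The `RollingTopN` class is hypothetical and not necessarily how the dashboard computes these counts.

```python
from collections import Counter, deque
from typing import Deque, List, Tuple


class RollingTopN:
    """Counts keys (e.g. subreddit or author) over the most recent `window` events."""

    def __init__(self, window: int = 10_000):
        self.window = window
        self.events: Deque[str] = deque()
        self.counts: Counter = Counter()

    def add(self, key: str) -> None:
        self.events.append(key)
        self.counts[key] += 1
        if len(self.events) > self.window:
            # Evict the oldest event so counts reflect only the last `window` items.
            old = self.events.popleft()
            self.counts[old] -= 1
            if self.counts[old] == 0:
                del self.counts[old]

    def top(self, n: int) -> List[Tuple[str, int]]:
        """The n most frequent keys in the current window, highest count first."""
        return self.counts.most_common(n)
```

One instance fed `comment["subreddit"]` and another fed `comment["author"]` would back the two graphs. Each `add` is constant time, and the ranking is only computed when `top` is called.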

The next graph shows which authors are currently making the most posts to Reddit.

The next graph shows the most popular discussions on Reddit. It uses machine learning and is still training, so it will take a while to get just right.

I'm also working on a visualization showing which submissions are most active based on comment volume.

Some interesting observations:

1) AutoModerator appears to remove over 1% of all Reddit comments immediately. Its own comments also make up around 0.75% of total comment volume.

2) Submissions appear to be more bot-driven than comments. At night, comment volume falls off more sharply than submission volume. The most likely explanation I can think of is bots making submissions late at night, though I'd need to dig further into the data to verify this.
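Both observations come down to simple ratios over a sample of the stream. For example, 0.75% of a 10,000-comment window is about 75 AutoModerator comments. The sketch below shows one way to compute these numbers. Several parts are assumptions. Comments are dicts with an `"author"` field, and a hypothetical boolean `"removed_by_automod"` flag stands in for however removals are actually detected. Timestamps are epoch seconds, as in Reddit's `created_utc`, bucketed by UTC hour. The trough-to-peak ratio is an illustrative metric for "sharper fall-off", not necessarily how the dashboard measures it.

```python
from collections import Counter
from datetime import datetime, timezone
from typing import Dict, Iterable, Mapping


def automod_shares(comments: Iterable[Mapping]) -> Dict[str, float]:
    """Percent of comments written by AutoModerator and percent removed by it.

    "removed_by_automod" is a hypothetical flag; the real removal signal may differ.
    """
    total = authored = removed = 0
    for c in comments:
        total += 1
        if c.get("author") == "AutoModerator":
            authored += 1
        if c.get("removed_by_automod", False):
            removed += 1
    if total == 0:
        return {"authored_pct": 0.0, "removed_pct": 0.0}
    return {
        "authored_pct": 100.0 * authored / total,
        "removed_pct": 100.0 * removed / total,
    }


def trough_to_peak_ratio(timestamps: Iterable[int]) -> float:
    """Volume in the quietest UTC hour divided by volume in the busiest hour.

    A lower ratio means a sharper overnight fall-off. Hours with no events
    count as zero, so all 24 hours are considered.
    """
    per_hour = Counter(
        datetime.fromtimestamp(ts, tz=timezone.utc).hour for ts in timestamps
    )
    counts = [per_hour.get(h, 0) for h in range(24)]
    peak = max(counts)
    return min(counts) / peak if peak else 0.0
```

If comments show a lower ratio than submissions, that confirms the steeper nightly drop. It does not by itself show that bots cause it. Splitting submissions by known bot accounts and recomputing the ratio would be the natural next check.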

If you have any suggestions for real-time data visualizations, please feel free to share them!

Keep in mind that this page is basically still "alpha" and may have some kinks to work out.

Edit:

As for the most popular discussions on Reddit, there is a lot of room for improvement: identifying the largest bots and excluding them from the analysis, removing certain subreddits that "pollute" the results, and so on. If you have any ideas, please feel free to suggest them. Let's have an open discussion about this!

Posted 6 years ago