One thing missing from Pushshift's Reddit data archives is a comprehensive data source covering all subreddits and which moderators mod which subreddits.
I am working on building a database that I will make available through new endpoints, along with adding the data to the monthly dumps. It would probably also be helpful to include information about all new subreddits that are created each month.
I should have the data ingested in the next week -- I will update when it is completed.
If you are interested in getting data for all Reddit subreddits (including their subscriber counts, etc.) and also getting information on moderators (who mods what, when they were added, what permissions they have, etc.) then this data is for you!
One issue that makes getting this data difficult is that the subreddit information provided by Reddit also includes all user subreddits. You can get data from Reddit's API by simply walking through the ids sequentially -- but the introduction of user subreddits vastly increased the number of subreddits available via the API, so finding the regular subreddits by walking the ids sequentially is now much more difficult.
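To make the sequential walk concrete, here is a rough sketch in Python. It assumes the usual Reddit conventions: subreddit fullnames are `t5_` plus the id in base 36, and the `/api/info` endpoint accepts a comma-separated list of fullnames in its `id` parameter. The helper names, the batch size of 100, the User-Agent string, and the rule for spotting user subreddits (a `subreddit_type` of `user` or a `u_` name prefix) are my own assumptions for illustration, not how the Pushshift ingest actually works. Reddit may also require OAuth for these requests.

```python
import json
import urllib.parse
import urllib.request

BASE36_DIGITS = "0123456789abcdefghijklmnopqrstuvwxyz"


def to_base36(n: int) -> str:
    """Convert a non-negative integer id to its base-36 string."""
    if n < 0:
        raise ValueError("id must be non-negative")
    if n == 0:
        return "0"
    digits = []
    while n:
        n, remainder = divmod(n, 36)
        digits.append(BASE36_DIGITS[remainder])
    return "".join(reversed(digits))


def fullnames(start: int, count: int) -> list[str]:
    """Build subreddit fullnames (t5_ prefix) for `count` sequential ids."""
    return [f"t5_{to_base36(i)}" for i in range(start, start + count)]


def info_url(names: list[str]) -> str:
    """URL for one batched lookup against /api/info."""
    query = urllib.parse.urlencode({"id": ",".join(names)})
    return "https://www.reddit.com/api/info.json?" + query


def is_user_subreddit(data: dict) -> bool:
    """Heuristic (assumption): user subreddits are typed 'user' or named u_*."""
    name = data.get("display_name") or ""
    return data.get("subreddit_type") == "user" or name.startswith("u_")


def fetch_regular_subreddits(start: int, count: int = 100,
                             user_agent: str = "subreddit-walk-sketch/0.1") -> list[dict]:
    """Fetch one batch of sequential ids and drop the user subreddits."""
    request = urllib.request.Request(
        info_url(fullnames(start, count)),
        headers={"User-Agent": user_agent},
    )
    with urllib.request.urlopen(request) as response:
        listing = json.load(response)
    children = listing["data"]["children"]
    return [
        child["data"]
        for child in children
        if child["kind"] == "t5" and not is_user_subreddit(child["data"])
    ]
```

Calling `fetch_regular_subreddits(4594300)` would request ids 4594300 through 4594399 in a single call. Ids that do not exist are simply missing from the response, so the walk continues with the next batch either way.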
For example, the earliest subreddits on Reddit start around ID 6, then skip around in the low thousands, and then skip up to around ID 4.5 million. Here is an example of some subreddits by ascending ID (below). As you can see, the ids skip around and then become relatively stable after ID 4,594,300.
id | subreddit |
---|---|
6 | reddit.com |
40718 | nsfw |
95442 | features |
95455 | request |
95487 | olympics |
96552 | de |
96553 | fr |
96554 | es |
96555 | ko |
96557 | zh |
96558 | ja |
98753 | pl |
98754 | hu |
98755 | pt |
98756 | tr |
98757 | id |
98758 | eo |
98759 | eu |
98760 | it |
98761 | nl |
98762 | no |
98763 | vi |
98764 | ro |
98765 | ca |
98766 | ru |
107283 | infogami |
107958 | pixoh |
113928 | programming |
154523 | flagr |
154536 | joel |
154538 | kiko |
158101 | sv |
158104 | moveon |
158107 | inkling |
158109 | voo2do |
199564 | hy |
199566 | freeculture |
199571 | askcaterina |
212700 | lipstick.com |
218200 | arxiv |
226752 | web2 |
234707 | askfounders |
261446 | jackbe |
273591 | bg |
301018 | podbop |
323046 | rail |
367159 | rlc |
461806 | nytimes |
466679 | slate |
524490 | washingtonpost |
536836 | sl |
1058648 | science |
1229772 | oww |
2156921 | obama |
2352663 | ads |
2974194 | netsec |
3949442 | politics |
4162920 | bugs |
4594300 | business |
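To put numbers on how much the ids skip around, here is a small sketch that computes the gap between consecutive ids for a handful of rows copied from the table above. The `id_gaps` helper is just a name I made up for this illustration.

```python
# A few (id, subreddit) rows copied from the table above.
rows = [
    (6, "reddit.com"),
    (40718, "nsfw"),
    (113928, "programming"),
    (1058648, "science"),
    (2156921, "obama"),
    (3949442, "politics"),
    (4594300, "business"),
]


def id_gaps(rows: list[tuple[int, str]]) -> list[tuple[str, int]]:
    """Return (subreddit, gap to the previous id) for rows sorted by id."""
    ordered = sorted(rows)
    return [
        (name, current_id - previous_id)
        for (previous_id, _), (current_id, name) in zip(ordered, ordered[1:])
    ]


for name, gap in id_gaps(rows):
    print(f"{name:12} +{gap:,}")
```

In this sample the largest jump is the 1,792,521 ids between obama and politics. Most of the ids in these gaps do not map to a subreddit that appears in the table.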