Pushshift is now actively ingesting Gab posts and making the data available via an API for research purposes.
Details on how to use the API:
Data is updated in the index approximately every 30 seconds. Both historical and new data are updated. If you are interested in toxicity research, this is an excellent data source.
https://gab.pushshift.io will show you the current Elasticsearch (ES) mapping that is being used to index the data. Fields within that mapping can be searched, filtered and aggregated.
Eventually an updated corpus will be published containing the new data (this will be merged with the existing Gab corpus).
Here's a quick example of searching for new posts that contain the search term "trump":
https://gab.pushshift.io/search/?q=body:trump&sort=created_at:desc&size=100&pretty
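For anyone who would rather script this than paste URLs into a browser, here is a minimal Python sketch of the same search. The URL and the `q`, `sort` and `size` parameters come from the example above; the helper names (`build_search_url`, `fetch_posts`) are made up for illustration, and the assumption that results come back in the usual Elasticsearch `hits.hits[]._source` shape is mine, so check the actual response before relying on it.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "https://gab.pushshift.io/search/"


def build_search_url(term, size=100):
    """Build a search URL for posts whose body contains `term`, newest first."""
    params = {
        "q": f"body:{term}",        # field:value query, as in the example above
        "sort": "created_at:desc",  # newest posts first
        "size": size,
    }
    # safe=":" keeps the colons readable instead of encoding them as %3A
    return BASE_URL + "?" + urlencode(params, safe=":")


def fetch_posts(term, size=100, timeout=30):
    """Fetch matching posts (performs a network request).

    Assumption: the response follows the standard Elasticsearch layout,
    with documents under hits.hits[]._source.
    """
    with urlopen(build_search_url(term, size), timeout=timeout) as resp:
        data = json.load(resp)
    return [hit["_source"] for hit in data["hits"]["hits"]]


print(build_search_url("trump"))
```

Calling `fetch_posts("trump", size=10)` would then return up to ten post objects, provided the response layout matches the assumption above.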
Also, I kept the user object in each post. It eats up more space, but it adds a lot more functionality. For instance, you can pull posts for specific authors that have a follower count over a certain number. Here is an example:
https://gab.pushshift.io/search/?sort=account.followers_count:desc&size=10&pretty (This sorts posts by the author's follower count, highest first.) You can combine this with time ranges, etc.
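As a rough sketch of combining a follower threshold with a time range, the snippet below builds a query using Elasticsearch query-string range syntax (`field:>value`). It assumes the `q` parameter is passed through as an Elasticsearch query string and that `created_at` accepts ISO dates such as `2018-10-01`; neither is confirmed here, so check the mapping first. The function name `build_followers_url` is hypothetical.

```python
from urllib.parse import urlencode

BASE_URL = "https://gab.pushshift.io/search/"


def build_followers_url(min_followers, since=None, size=10):
    """Build a URL for posts by authors with more than `min_followers` followers.

    `since` is an optional lower bound on created_at (assumed ISO date string).
    """
    clauses = [f"account.followers_count:>{min_followers}"]
    if since is not None:
        # Assumption: created_at is a date field that accepts range queries
        clauses.append(f"created_at:>={since}")
    params = {
        "q": " AND ".join(clauses),
        "sort": "account.followers_count:desc",  # most-followed authors first
        "size": size,
    }
    # Default encoding percent-escapes ':', '>' and '=' and turns spaces into '+'
    return BASE_URL + "?" + urlencode(params)


print(build_followers_url(10000, since="2018-10-01"))
```

The printed URL can be opened in a browser or fetched with any HTTP client.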
This endpoint is fully compatible with all the Elasticsearch search functions, filters and aggregations.
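To give a feel for the aggregation side, here is a small sketch of an Elasticsearch query body that matches posts containing a term and computes the average follower count of their authors. The body uses standard Elasticsearch query DSL (`match` query, `avg` aggregation) with the `body` and `account.followers_count` fields from the examples above. How this endpoint expects such a body to be submitted (for example as a JSON POST) is not stated in this post, so the sketch only builds and prints the body; `build_aggregation_body` is a made-up helper name.

```python
import json


def build_aggregation_body(term):
    """Build an Elasticsearch request body: match `term`, average follower count."""
    return {
        "size": 0,  # return only the aggregation, no individual posts
        "query": {"match": {"body": term}},
        "aggs": {
            # Average follower count across authors of the matching posts
            "avg_followers": {"avg": {"field": "account.followers_count"}},
        },
    }


print(json.dumps(build_aggregation_body("trump"), indent=2))
```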
Please let me know if you have any questions.
Posted 5 years ago