I am tweaking the mapping files to add a lot more functionality to the new beta testing server. The server itself is actually a Raspberry Pi 4B (since I'm fresh out of servers for the time being). Amazingly, it is extremely fast for basic searches!
The endpoint is:
Both comments and submissions are currently being ingested on these endpoints.
Comments: http://pi4.pushshift.io/rc/_search
Submissions: http://pi4.pushshift.io/rs/_search
I am testing a lot of new functionality, but anyone is welcome to test some code against it. I've increased the rate limit to 600 requests a minute (10 a second). Right now, the API structure is based on Elasticsearch queries. It isn't too difficult to use, and I will provide more detailed examples in the future.
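Until those examples are available, here is a minimal sketch of what a request against these endpoints might look like, using only the Python standard library. The two URLs are the ones listed above. Everything else is an assumption on my part: the helper names (`build_search_request`, `run_search`) are hypothetical, the field names `body` and `title` are guesses at the mapping, and sending the query as a JSON POST is the usual Elasticsearch convention, not something confirmed for this server.

```python
import json
import urllib.request

# Endpoints quoted from the post.
ENDPOINTS = {
    "comments": "http://pi4.pushshift.io/rc/_search",
    "submissions": "http://pi4.pushshift.io/rs/_search",
}


def build_search_request(kind, term, size=10):
    """Build (but do not send) an Elasticsearch-style search request.

    Assumption: comment text is indexed under "body" and submission
    titles under "title". The real mapping may differ.
    """
    field = "body" if kind == "comments" else "title"
    query = {"query": {"match": {field: term}}, "size": size}
    return urllib.request.Request(
        ENDPOINTS[kind],
        data=json.dumps(query).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_search(request, timeout=10):
    """Send a prepared request and return the decoded JSON response."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Usage (performs a network call, so it is left commented out):
# result = run_search(build_search_request("comments", "raspberry pi", size=5))
```

Building the request separately from sending it makes it easy to inspect the query body before it counts against the rate limit.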
New features include emoji search and improved foreign-language search. There are also some new fields available, including:
author_created_utc: The time when the author's account was created.
author_delta: The time difference between when the comment was made and when the author's account was made.
nest_level: The position of the comment in the tree structure. A top-level comment has a nest_level of 1. A reply to a top-level comment has a nest_level of 2, and so on. If nest_level is null, it could not be computed for that comment (this generally happens when the top-level comment is unavailable for whatever reason).
reply_delay: This is the elapsed time in seconds between when the parent object was created and when the comment itself was created. A comment with a nest level of 1 has the reply_delay calculated by subtracting the time the submission was made from the time the comment itself was created. If the comment has a nest_level greater than one, then the reply_delay is the difference between the comment's creation time and that of its parent.
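To make the nest_level and reply_delay definitions above concrete, here is a rough sketch that computes both for an in-memory set of comments. This is not the server's actual code. It assumes Reddit-style records where `parent_id` carries a `t1_` prefix for a parent comment and `t3_` for the submission, and where `created_utc` is in epoch seconds. The function names and the `comments` dict layout are made up for illustration.

```python
def nest_level(comment_id, comments):
    """Return the depth of a comment (1 = top-level).

    Returns None when the comment or any of its ancestors is missing,
    since the chain cannot be followed up to the submission.
    """
    current = comments.get(comment_id)
    if current is None:
        return None
    level = 1
    # Walk up the tree while the parent is itself a comment ("t1_" prefix).
    while current["parent_id"].startswith("t1_"):
        current = comments.get(current["parent_id"][3:])
        if current is None:
            return None
        level += 1
    return level


def reply_delay(comment_id, comments, submission_created_utc):
    """Return seconds between the parent's creation and the comment's."""
    comment = comments[comment_id]
    parent_id = comment["parent_id"]
    if parent_id.startswith("t1_"):
        parent = comments.get(parent_id[3:])
        if parent is None:
            return None  # parent comment unavailable
        return comment["created_utc"] - parent["created_utc"]
    # Top-level comment: the parent object is the submission itself.
    return comment["created_utc"] - submission_created_utc


# Hypothetical sample data: "a" is top-level, "b" replies to "a",
# and "c" replies to a comment that is not available.
comments = {
    "a": {"parent_id": "t3_s1", "created_utc": 100},
    "b": {"parent_id": "t1_a", "created_utc": 130},
    "c": {"parent_id": "t1_zz", "created_utc": 200},
}
```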
The nest_level, reply_delay, author_created_utc and author_delta are very interesting metrics because they can be used to detect bots / spam accounts.
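As one illustration of how these metrics might be combined, the sketch below builds a standard Elasticsearch `bool` query with `range` filters to find top-level comments that were posted very quickly by very young accounts. The thresholds are arbitrary, the function name is hypothetical, and I am assuming author_delta is expressed in seconds like reply_delay (the field description above does not state its unit).

```python
import json


def fast_reply_new_account_query(max_reply_delay, max_author_delta, size=25):
    """Build a query body for quick top-level replies from new accounts.

    Assumption: author_delta is in seconds, the same unit as reply_delay.
    """
    return {
        "query": {
            "bool": {
                "filter": [
                    # Comment arrived within max_reply_delay seconds of its parent.
                    {"range": {"reply_delay": {"lte": max_reply_delay}}},
                    # Account was at most max_author_delta old when commenting.
                    {"range": {"author_delta": {"lte": max_author_delta}}},
                    # Only comments made directly on the submission.
                    {"term": {"nest_level": 1}},
                ]
            }
        },
        "size": size,
    }


# Example: replies within 5 seconds from accounts less than a day old.
query = fast_reply_new_account_query(max_reply_delay=5, max_author_delta=86400)
print(json.dumps(query, indent=2))
```

Filters are used instead of scored clauses because this is a yes/no selection and relevance ranking adds nothing here.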
Another mapping feature I am testing is partial author / subreddit matching. For instance, if the author's account name is throwaway4839583, you can search for "throwaway" and match any comment whose author has "throwaway" anywhere within the full author name (case-insensitive). I am also testing this feature with subreddit names.
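The sketch below shows the matching behavior described above in two forms: a local predicate that mirrors the case-insensitive "contains" semantics, and a guess at the corresponding query body. The query shape is an assumption. It supposes the partial matching is handled by the mapping, so that a plain `match` on a field named `author` is enough, and the exact field name or query type on the beta server may be different. Both function names are hypothetical.

```python
def author_matches(fragment, author):
    """Local illustration: case-insensitive substring match on the name."""
    return fragment.lower() in author.lower()


def partial_author_query(fragment, size=25):
    """Build a query body for a partial author search.

    Assumption: the mapping analyzes "author" so that a plain match
    query on a fragment finds names containing it.
    """
    return {"query": {"match": {"author": fragment}}, "size": size}


print(author_matches("throwaway", "throwaway4839583"))
print(partial_author_query("throwaway"))
```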
The code driving this new beta endpoint is in active development so any suggestions, feature requests, bug reports, etc. are much appreciated.
Feel free to make as many requests as you want against it to test things out, up to the rate limit. I had to set some type of rate limit to prevent abuse of the endpoint.
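If you are scripting against the endpoint, a small client-side throttle is one way to stay under the 600 requests a minute mentioned earlier. This is only a sketch: the `Throttle` class is hypothetical, and it simply spaces calls at least 0.1 seconds apart, which is stricter than a per-minute budget strictly requires. The clock and sleep functions are injectable so the behavior can be checked without actually waiting.

```python
import time


class Throttle:
    """Space calls evenly so they never exceed max_per_minute."""

    def __init__(self, max_per_minute=600, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 60.0 / max_per_minute
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = None  # earliest time the next call may start

    def wait(self):
        """Block until the next request is allowed; return seconds slept."""
        now = self._clock()
        slept = 0.0
        if self._next_allowed is not None and now < self._next_allowed:
            slept = self._next_allowed - now
            self._sleep(slept)
            now = self._next_allowed
        self._next_allowed = now + self.min_interval
        return slept


# Usage: call throttle.wait() immediately before each request.
throttle = Throttle(max_per_minute=600)
```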
The site itself may go down at any time for maintenance so do keep that in mind. Also, I would strongly recommend not having any production system depend on the availability of this endpoint. That said, I will try to keep it up as often as possible.