The new version of the API has taken a bit longer due to a lot of testing on my end. There are a lot of changes in the new API that should make searches more powerful. Here are some of them.
Elasticsearch 7.x will replace the existing 5.x back-end.
The standard analyzer will be replaced with "ICU_analyzer." This analyzer gives better support for Asian languages and much better support for Unicode / emoji searches. It will now be possible to search for emojis within comments, such as 😂, 💕, 🤔, etc. Basically all emojis will be searchable.
Multiple analyzers for body / selftext / title. The current API uses one main analyzer, which limits searches to words. I will be adding additional analyzers so that people can search for non-word characters such as :) :( :-/ etc. I'm still testing various ways to make this as powerful as possible.
Better stemming support. Currently searching for dog or cat won't return results for "dogs" or "cats", so users have to search for dog|dogs to pull comments mentioning dogs. One of the analyzers I'm building will provide stemming support. One remaining issue is figuring out which stemmer is the most appropriate one to use. The "Snowball stemmer" is rather broad, but there are other stemmers that are a bit more specific.
Basic synonym support. Searching for "🍕" would also return comments mentioning pizza, etc. This is experimental and will be in its own analyzer. I'm currently looking for a good synonym file that is SOLR compatible. Searching for "queen" would ideally return comments mentioning "monarch," etc. This would not be the default analyzer so if you only wanted comments matching exactly what you are searching for, it would not interfere with that capability.
High-speed API for requests for recent data (data within the past X days, e.g. 90 or 180). The bulk of requests that hit the current API involve current data (data within the past few months). The goal here is to move the most recent indexes to a high-speed server with a lot of RAM so that the API can return results for recent data much faster than it does now.
The ability to do ngram searches for authors. If you are interested in getting all authors on Reddit where the author name contains 88 or whatever, this will allow you to do that.
Better subreddit search support.
Better support for spam and bot detection.
More to be added soon (this list will evolve over time)
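To make the analyzer items above a bit more concrete, here is a rough sketch of what a multi-analyzer Elasticsearch 7.x index definition could look like, written as a Python dict. This is not the actual Pushshift configuration: every analyzer, filter, and sub-field name (`body_stemmed`, `body_synonyms`, `body_symbols`, `author_ngram`, and so on) is made up for illustration, the synonym rules are inlined rather than loaded from a file, and `icu_analyzer` / `icu_tokenizer` assume the analysis-icu plugin is installed.

```python
import json

# Hypothetical index definition. All custom names here are illustrative
# assumptions, not the real Pushshift setup.
COMMENT_INDEX = {
    "settings": {
        "analysis": {
            "filter": {
                # Snowball is one candidate stemmer (e.g. "dogs" -> "dog").
                "english_snowball": {"type": "snowball", "language": "English"},
                # Solr-format synonym rules, inlined instead of read from a file.
                "basic_synonyms": {
                    "type": "synonym",
                    "synonyms": ["🍕, pizza", "queen, monarch"],
                },
            },
            "tokenizer": {
                # min_gram and max_gram differ by 1, which stays within the
                # default index.max_ngram_diff in 7.x.
                "author_ngrams": {"type": "ngram", "min_gram": 2, "max_gram": 3},
            },
            "analyzer": {
                "body_stemmed": {
                    "type": "custom",
                    "tokenizer": "icu_tokenizer",
                    "filter": ["lowercase", "english_snowball"],
                },
                "body_synonyms": {
                    "type": "custom",
                    "tokenizer": "icu_tokenizer",
                    "filter": ["lowercase", "basic_synonyms"],
                },
                # Whitespace tokenization keeps emoticons such as :) in one piece.
                "body_symbols": {"type": "custom", "tokenizer": "whitespace"},
                "author_ngram": {
                    "type": "custom",
                    "tokenizer": "author_ngrams",
                    "filter": ["lowercase"],
                },
            },
        }
    },
    "mappings": {
        "properties": {
            # The default field matches exactly what was typed; the sub-fields
            # are opt-in, so stemming and synonyms never interfere with it.
            "body": {
                "type": "text",
                "analyzer": "icu_analyzer",
                "fields": {
                    "stemmed": {"type": "text", "analyzer": "body_stemmed"},
                    "synonyms": {"type": "text", "analyzer": "body_synonyms"},
                    "symbols": {"type": "text", "analyzer": "body_symbols"},
                },
            },
            "author": {
                "type": "keyword",
                "fields": {"ngram": {"type": "text", "analyzer": "author_ngram"}},
            },
        }
    },
}


def match_query(field, text):
    """Build a plain match query body against one (sub-)field."""
    return {"query": {"match": {field: text}}}


if __name__ == "__main__":
    print(json.dumps(match_query("body.stemmed", "dog"), ensure_ascii=False))
```

With a layout like this, a search against `body` would behave as an exact-term search, while `body.stemmed` or `body.synonyms` would be chosen explicitly when the broader matching is wanted.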
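For anyone wondering what an ngram search on author names means in practice, the toy example below shows the general idea in plain Python: each name is broken into overlapping character fragments, and a lookup for "88" becomes a lookup of those fragments. It is only a sketch of the technique, not how the API is implemented, and the author names are invented.

```python
def char_ngrams(text, n=2):
    """Return the set of lowercase character n-grams in text."""
    text = text.lower()
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def build_index(authors, n=2):
    """Map each n-gram to the set of author names that contain it."""
    index = {}
    for name in authors:
        for gram in char_ngrams(name, n):
            index.setdefault(gram, set()).add(name)
    return index


def authors_containing(index, fragment, n=2):
    """Find authors whose name contains fragment (needs len(fragment) >= n)."""
    grams = char_ngrams(fragment, n)
    if not grams:
        return set()  # fragment shorter than n: nothing to look up
    # Candidates must contain every n-gram of the fragment...
    candidates = set.intersection(*(index.get(g, set()) for g in grams))
    # ...and the grams must actually be contiguous, so verify the substring.
    needle = fragment.lower()
    return {name for name in candidates if needle in name.lower()}


if __name__ == "__main__":
    # Made-up author names, for illustration only.
    idx = build_index(["user88", "Bob_1988", "alice", "x8y8"])
    print(sorted(authors_containing(idx, "88")))
```

The final substring check matters because sharing all of a fragment's n-grams does not guarantee that they appear next to each other in the name.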
As always, if you have a particular request that would help your research, please feel free to add those requests in the comments. Also, if you are interested in being a beta tester, just leave a comment with "I'm interested" so I can pull the names at a later point and contact you. Thanks!
Posted 5 years ago in r/pushshift.