Advice for developers who are migrating their scripts to the new site.

Hi everyone. I'm the creator of Mturk Engine. I'm putting the finishing touches on the next major version which includes full compatibility with the new site as well as some new features. I aim to release within the next 24 hours barring unforeseen circumstances.

I faced a lot of challenges migrating to the new site, but I believe it is much more secure and ultimately a lot easier to work with. It's easier than ever to write your own scripts to scrape data. I thought I'd share what I've learned about the new site in the hope that it helps other developers update their scripts as soon as possible.

The JSON API.

Most pages (search, dashboard, queue, HITs, status detail pages) will return data as JSON when requested with a format=json parameter and with the request's responseType set to 'json'. I'm maintaining some documentation describing the response returned by each of these pages, which is available here: https://github.com/Anveio/mturk-engine/blob/master/src/worker-mturk-api.d.ts
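As a rough sketch of the request described above: the helper below only adds the format=json parameter to a page URL, and a second function shows one way it might be used with fetch. The function names, the placeholder host worker.example.com, and the choice of fetch with credentials: 'include' are my assumptions for illustration, not something taken from Mturk Engine.

```typescript
// Adds (or overwrites) the format=json query parameter on a page URL.
// Existing query parameters are preserved.
function withJsonFormat(pageUrl: string): string {
  const url = new URL(pageUrl);
  url.searchParams.set('format', 'json');
  return url.toString();
}

// Hypothetical usage: request a page and read the body as JSON.
// Assumes the session cookie is what authenticates the request.
async function fetchPageJson(pageUrl: string): Promise<unknown> {
  const response = await fetch(withJsonFormat(pageUrl), {
    credentials: 'include',
  });
  if (!response.ok) {
    throw new Error(`Request failed with status ${response.status}`);
  }
  return response.json();
}
```

With an XHR-based client, setting responseType to 'json' plays the same role as calling response.json() here.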

An alternative to the JSON API is to query the DOM for the node containing a data-react-props attribute, whose value can be converted into an object via JSON.parse(node.dataset.reactProps) or JSON.parse(node.getAttribute('data-react-props')). Here's the code that does that in Mturk Engine.
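A minimal sketch of the DOM approach, written against a tiny interface so it works with a real Element as well as a stand-in object. The names AttributeSource and parseReactProps are made up for this example, and returning null on a missing or malformed attribute is just one possible choice.

```typescript
// Anything with getAttribute works here, including a DOM Element.
interface AttributeSource {
  getAttribute(name: string): string | null;
}

// Reads data-react-props from a node and parses it as JSON.
// Returns null if the attribute is missing or is not valid JSON.
function parseReactProps<T = unknown>(node: AttributeSource): T | null {
  const raw = node.getAttribute('data-react-props');
  if (raw === null) {
    return null;
  }
  try {
    return JSON.parse(raw) as T;
  } catch {
    return null;
  }
}
```

In a browser you would typically pass the result of document.querySelector('[data-react-props]') after checking that it is not null.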

Search

Fetching search results is easily done via the JSON API. The data you get back is in this format: https://github.com/Anveio/mturk-engine/blob/master/src/worker-mturk-api.d.ts#L3-L8 . A potential stumbling block is formatting your search parameters according to the new site's format. I use the qs library and pass { arrayFormat: 'brackets' } as the second argument to .stringify. The search parameters that the new site accepts (that I know of) are listed here.
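To illustrate what bracket-style serialization looks like, here is a dependency-free sketch that handles nested objects and arrays. It is not the qs library and covers far fewer cases; the function name toBracketQuery and the parameter names in the usage note are placeholders, not a list of what the site accepts.

```typescript
type ParamValue =
  | string
  | number
  | boolean
  | ParamValue[]
  | { [key: string]: ParamValue };

// Serializes nested objects as key[sub]=value and arrays as key[]=value,
// percent-encoding both keys and values.
function toBracketQuery(params: { [key: string]: ParamValue }): string {
  const pairs: string[] = [];

  const walk = (prefix: string, value: ParamValue): void => {
    if (Array.isArray(value)) {
      for (const item of value) {
        walk(`${prefix}[]`, item);
      }
    } else if (typeof value === 'object' && value !== null) {
      for (const key of Object.keys(value)) {
        walk(`${prefix}[${key}]`, value[key]);
      }
    } else {
      pairs.push(
        `${encodeURIComponent(prefix)}=${encodeURIComponent(String(value))}`
      );
    }
  };

  for (const key of Object.keys(params)) {
    walk(key, params[key]);
  }
  return pairs.join('&');
}
```

For example, toBracketQuery({ filters: { min_reward: 1 }, tags: ['a', 'b'] }) produces filters%5Bmin_reward%5D=1&tags%5B%5D=a&tags%5B%5D=b, which decodes to filters[min_reward]=1&tags[]=a&tags[]=b.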

Queue, Dashboard, Status Detail pages.

These are all available through the JSON API. An important difference with Status Detail pages is that the date in their URL is now formatted as YYYY-MM-DD rather than MMDDYYYY as it was before. I use moment.js to massage dates into the format I need, but I plan on switching to date-fns when I get the chance in order to reduce bundle size. Something I noticed is that HITs in status detail pages now have an additional identifier called assignment_id, whereas hit_id is how I uniquely indexed submitted HITs on the old site. I'm not 100% sure what the purpose of assignment_id is, other than being a parameter of an accepted HIT page.
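If your script stored dates in the old MMDDYYYY form, the conversion to the new form is simple enough to do without a date library. A small sketch follows; the function name legacyToIsoDate is made up for this example, and it only checks the shape of the string, not whether the date actually exists on the calendar.

```typescript
// Converts an MMDDYYYY string (old Status Detail format)
// into YYYY-MM-DD (new Status Detail format).
function legacyToIsoDate(legacy: string): string {
  const match = /^(\d{2})(\d{2})(\d{4})$/.exec(legacy);
  if (!match) {
    throw new Error(`Expected a date in MMDDYYYY form, got "${legacy}"`);
  }
  const [, month, day, year] = match;
  return `${year}-${month}-${day}`;
}
```

For example, legacyToIsoDate('09152017') returns '2017-09-15'.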

Returning a HIT

Returning a HIT on the Worker site is a lot different. Instead of sending a GET request, you must send a POST request to the URL of the accepted HIT. Your form data needs to include a _method field with a value of 'delete' as well as an authenticity token. The authenticity token is retrieved from a hidden form located on the return button itself. I wasn't able to retrieve the authenticity token for HITs in the queue, as the return button is not initially rendered. I may be doing something wrong, but returning a HIT also caused a redirect to the sign-in page, so my current implementation assumes that if that redirect happens, the return went through successfully.
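Here is a hedged sketch of building the form data for that POST. The _method value comes from the description above, but the field name authenticity_token is my assumption (it is the usual Rails convention), so check the hidden form on the return button for the actual name. The function name buildReturnHitBody is made up for this example.

```typescript
// Builds the URL-encoded form body for returning an accepted HIT.
// NOTE: 'authenticity_token' is an assumed field name; verify it
// against the hidden form on the return button.
function buildReturnHitBody(authenticityToken: string): URLSearchParams {
  return new URLSearchParams({
    _method: 'delete',
    authenticity_token: authenticityToken,
  });
}
```

The resulting object can be passed as the body of a POST request to the accepted HIT's URL; both fetch and most XHR wrappers will send it as application/x-www-form-urlencoded.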

Accepting a HIT

Accepting a HIT via XHR (like what Turkmaster does) on the Worker site was by far my biggest hurdle. A successful accept from an accept_random link will redirect to a non-HTTPS page, triggering users' browsers to cancel the request for security reasons. I was able to figure out a workaround, which I describe here. I've also submitted feedback requesting that the redirect from accept_random go directly to an HTTPS page, which would be a lifesaver.

With the migration out of the way, I can hopefully focus once again on building new features.

Posted
7 years ago