Amazon’s Mechanical Turk, or MTurk, is a powerful, underappreciated platform that allows you to allocate work to humans programmatically and at scale. Businesses get access to a vast, scalable workforce, and workers can choose from a variety of tasks whenever they want to work. MTurk can get work done very quickly, with tasks performed in parallel by many workers. Part of its power lies in the fact that, while it’s programmable, tasks are written in plain English, so almost anyone can use it.
The use cases are almost infinite. If work can be sent and returned electronically, it can be performed through Mechanical Turk. Lately, companies have been using it to prepare data for machine learning and data science, for example tagging objects in images or cleaning data. Other uses include transcribing audio, translating text, extracting data from documents, and searching for information, such as phone numbers for all the restaurants in Seattle. MTurk can save you from having to buy the data or tie up someone in your organization to find it, which is often a lot slower.
Where does the name come from? The Mechanical Turk was an 18th-century machine that purported to be a chess-playing automaton but was actually an elaborate hoax: a human chess master was hiding inside. Amazon Mechanical Turk is AWS’s modern equivalent: an API with an army of humans behind it. It was launched in 2005, making it one of Amazon’s oldest cloud services, though it’s surprisingly little known.
Mechanical Turk is a highly efficient, cost-effective service that enables you to outsource work 24/7. We’ve been using MTurk at my company for about six years to run software QA tests for our customers. It allows us to automate the unautomatable: we can run tests that require a human to perform them, like clicking a button on a website to make sure it works. And MTurk is very fast. Our tasks are typically completed in under 10 minutes.
MTurk is a two-sided marketplace that connects workers and “requesters” via a web interface. As a requester, you just create an account and choose from a selection of templated tasks, such as tagging images, that make it easy to get started. You can also build your own tasks from scratch. The service allows you to drill down to the worker level to see who performed your task, and you can send messages telling them they did a good job or explaining how they can do it better next time.
Workers visit mturk.com to see a description of the tasks available and the payment offered. They can preview a task to see if they want to do it and, if not, send it back to the pool for someone else. If they accept a task and don’t complete it in the allotted time, it also goes back to the pool.
Task design is critical. If you want good results, tasks need to be designed well and have clear instructions. Workers talk among themselves on message boards, and it’s easy to get a bad reputation if your tasks are consistently hard to perform, so it’s worth getting this part right.
How do you set the right price? This is very important, especially if you’re submitting high volumes of work. The payment can’t be more than you can reasonably afford, but it also can’t be so low that the task isn’t worthwhile for the worker. Iterate in the following manner until you get it right.
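A practical way to ground the iteration is to price against a target hourly rate. Measure how long workers actually take on a small pilot batch, convert that into a per-HIT reward, and then repeat. The helpers below are hypothetical and are not an MTurk feature. The $12/hour target, the pilot timings, and the 20% commission in `batch_cost` are assumptions, so check current MTurk pricing before you budget.

```python
from decimal import Decimal, ROUND_CEILING
from statistics import median

CENT = Decimal("0.01")

def reward_for_rate(target_hourly_usd: str, task_seconds: float) -> Decimal:
    """Per-HIT reward that pays roughly target_hourly_usd, rounded up to the cent."""
    per_task = Decimal(target_hourly_usd) * Decimal(str(task_seconds)) / Decimal(3600)
    return per_task.quantize(CENT, rounding=ROUND_CEILING)

def batch_cost(reward: Decimal, assignments: int, fee_rate: str = "0.20") -> Decimal:
    """Total spend for a batch: rewards plus an ASSUMED commission rate on top."""
    return (reward * assignments * (1 + Decimal(fee_rate))).quantize(CENT)

# One iteration: time a pilot batch, reprice, then post the next batch and re-measure.
pilot_durations = [75, 80, 90, 110, 95]  # seconds per task, hypothetical pilot data
reward = reward_for_rate("12.00", median(pilot_durations))
print(reward)                    # 0.30
print(batch_cost(reward, 1000))  # 360.00
```

Using the median rather than the mean keeps one worker who left a task open over lunch from inflating the price.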
Some people treat the workers on MTurk like an API. Remember that they’re human beings and treat them accordingly. Be fair, transparent, and communicative. Like most workers, they’re motivated by payment, status, and pride in what they do. Don’t hesitate to send an email telling them when they’ve done a good job, and you can even pay them a bonus. If you build a relationship with workers, they’re more likely to choose your work and perform it well. Most workers are aged 25 to 35, but there are older and younger people too.
Workers live all over the world. Not all of them speak English as a first language, which is another reason tasks need to be clear. It also means there are workers online at all hours, so you can get work done whenever you need it.
MTurk offers two special types of worker that you might choose to pay a little extra for. Master workers are those who have “demonstrated excellence” across a range of tasks as determined by Amazon’s statistical monitoring. You’re basically paying extra for what Amazon considers a premium worker.
Workers with Premium Qualifications have specific attributes you can request, such as living in a certain country or having an Android phone or an iPhone.
Retention is key, especially if you plan to use the service heavily. You don’t want to spend time training workers on your tasks only to have them leave, and it can take a while for them to get up to speed and become efficient.
The workers form communities. Identify the leaders in your community; they’re your ear to the ground.
We train our workers with simple YouTube videos created by our company, and we constantly retrain them, such as when they leave the platform for a month and come back. Here are some tips for enabling workers:
How should you handle workers who aren’t doing tasks well? Start by emailing them. Tell them what they can do better and help them improve. If they can’t improve, you can reject them for tasks via the API or the MTurk interface. Qualifications also offer a gentle way to “soft-ban” workers, i.e., add a qualification you know will make a particular worker ineligible.
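Here is a minimal sketch of the soft-ban idea. It assumes you have already created a custom qualification type (the CreateQualificationType operation) and granted it only to the workers you want to exclude (AssociateQualificationWithWorker). Your HITs then carry a requirement that the qualification must *not* exist. The qualification ID is a hypothetical placeholder. The requirement dict follows the shape of MTurk's `QualificationRequirement` structure, and `is_eligible` is only a local simulation of how such a requirement filters workers.

```python
# Hypothetical ID of a custom qualification you grant only to workers you want to exclude.
SOFT_BAN_QUAL_ID = "EXAMPLE_SOFT_BAN_QUAL_ID"

def soft_ban_requirement(qual_id: str) -> dict:
    """QualificationRequirement entry: only workers WITHOUT the qualification qualify."""
    return {
        "QualificationTypeId": qual_id,
        "Comparator": "DoesNotExist",
        # Hide the HIT from excluded workers entirely, not just block accepting it.
        "ActionsGuarded": "DiscoverPreviewAndAccept",
    }

def is_eligible(worker_quals: set, requirement: dict) -> bool:
    """Local simulation of an Exists/DoesNotExist check (not MTurk's implementation)."""
    has_qual = requirement["QualificationTypeId"] in worker_quals
    comparator = requirement["Comparator"]
    if comparator == "DoesNotExist":
        return not has_qual
    if comparator == "Exists":
        return has_qual
    raise ValueError(f"comparator not handled in this sketch: {comparator}")
```

Because the qualification is yours, you can revoke it later and the worker regains access. That is part of why this approach is softer than a block.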
It is also possible to block a worker, but it is not an option to be taken lightly. Blocking will likely affect the worker’s reputation, and potentially yours, so be careful how you use it. We’ve blocked very few people over the years.
Amazon calls these tasks Human Intelligence Tasks, or HITs. A HITType is the highest-level building block: it stores the title, description, reward, and qualifications for a task. Each individual thing you want done then becomes a HIT.
After a worker accepts a HIT, MTurk creates an Assignment to track that task through to completion. Notifications are sent over HTTP or SQS to tell you when the state of a task changes. These are optional unless you’re doing integrations or other real-time work where you need to know the status of tasks.
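To make the vocabulary concrete, here is a toy in-memory model of how a HITType, its HITs, and their Assignments relate. It is not MTurk's API. The class names, the local `ACCEPTED` state, and the event dicts are assumptions. The event dicts carry an `EventType` and an `AssignmentId`, loosely modeled on MTurk's notification event types.

```python
from dataclasses import dataclass, field
from enum import Enum

@dataclass(frozen=True)
class HITType:
    """Settings shared by every HIT of one kind."""
    title: str
    description: str
    reward: str                    # USD as a string, e.g. "0.30"
    qualifications: tuple = ()

class Status(Enum):
    ACCEPTED = "Accepted"          # local-only state; MTurk reports Submitted/Approved/Rejected
    SUBMITTED = "Submitted"
    APPROVED = "Approved"
    REJECTED = "Rejected"

# Approval or rejection is only possible once the worker has submitted.
TRANSITIONS = {
    Status.ACCEPTED: {Status.SUBMITTED},
    Status.SUBMITTED: {Status.APPROVED, Status.REJECTED},
}

@dataclass
class Assignment:
    """One worker's attempt at a HIT, tracked through to completion."""
    assignment_id: str
    worker_id: str
    status: Status = Status.ACCEPTED

    def move_to(self, new: Status) -> None:
        if new not in TRANSITIONS.get(self.status, set()):
            raise ValueError(f"cannot go from {self.status.value} to {new.value}")
        self.status = new

@dataclass
class HIT:
    """An individual piece of work created under a HITType."""
    hit_id: str
    hit_type: HITType
    max_assignments: int = 1
    assignments: dict = field(default_factory=dict)

    def accept(self, assignment_id: str, worker_id: str) -> Assignment:
        if len(self.assignments) >= self.max_assignments:
            raise RuntimeError("no assignments left on this HIT")
        assignment = Assignment(assignment_id, worker_id)
        self.assignments[assignment_id] = assignment
        return assignment

    def on_event(self, event: dict) -> None:
        """Apply a notification-style event; the dict shape is an assumption."""
        kind, assignment_id = event["EventType"], event["AssignmentId"]
        if kind in ("AssignmentReturned", "AssignmentAbandoned"):
            self.assignments.pop(assignment_id, None)  # slot goes back to the pool
        elif kind == "AssignmentSubmitted":
            self.assignments[assignment_id].move_to(Status.SUBMITTED)
```

The returned and abandoned branches mirror the pool behavior described earlier. A task that is sent back or not completed in time frees its slot for another worker.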
You can start off using the Developer Sandbox to experiment with creating and responding to HITs without actually spending money. Useful API operations you might need:
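Here is a minimal sketch of driving the sandbox from the AWS SDK for Python (boto3). It assumes boto3 is installed and AWS credentials are configured. The title, reward, and question XML are placeholders, and pagination (`NextToken`) is ignored for brevity. It uses the CreateHIT, ListAssignmentsForHIT, and ApproveAssignment operations. To see your HITs from the worker side, log in at workersandbox.mturk.com.

```python
SANDBOX_ENDPOINT = "https://mturk-requester-sandbox.us-east-1.amazonaws.com"

def sandbox_client():
    """boto3 MTurk client aimed at the Developer Sandbox (needs boto3 + AWS credentials)."""
    import boto3  # imported here so the pure helpers below work without boto3 installed
    return boto3.client("mturk", region_name="us-east-1", endpoint_url=SANDBOX_ENDPOINT)

def create_hit_params(title: str, description: str, reward: str,
                      question_xml: str, max_assignments: int = 1) -> dict:
    """Keyword arguments for client.create_hit (the CreateHIT operation)."""
    return {
        "Title": title,
        "Description": description,
        "Reward": reward,                        # USD as a string, e.g. "0.05"
        "MaxAssignments": max_assignments,
        "LifetimeInSeconds": 24 * 60 * 60,       # how long the HIT stays listed
        "AssignmentDurationInSeconds": 10 * 60,  # time limit once a worker accepts
        "Question": question_xml,
    }

def publish(client, params: dict) -> str:
    """Create the HIT and return its ID."""
    return client.create_hit(**params)["HIT"]["HITId"]

def approve_submitted(client, hit_id: str, feedback: str = "Thanks, nice work!") -> list:
    """Approve every submitted assignment on a HIT; returns the approved assignment IDs."""
    response = client.list_assignments_for_hit(HITId=hit_id, AssignmentStatuses=["Submitted"])
    approved = []
    for assignment in response["Assignments"]:
        client.approve_assignment(AssignmentId=assignment["AssignmentId"],
                                  RequesterFeedback=feedback)
        approved.append(assignment["AssignmentId"])
    return approved
```

Typical use is `hit_id = publish(sandbox_client(), create_hit_params(...))`. Once a sandbox worker has submitted, follow it with `approve_submitted(client, hit_id)`. Sandbox rewards use play money, so you can repeat this freely.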
MTurk offers templates for common uses of the site, such as image tagging. For other tasks, you can use one of the question types. QuestionForm is the simplest to create: it’s an XML-based form that you can customize for your task. HTMLQuestion allows a bit more customization using HTML.
There is also an ExternalQuestion type. While the previous question types are hosted by AWS, an ExternalQuestion is hosted on your own server and shown to the worker in an iframe, so you can embed whatever content you like. And because it’s hosted on your own server, you can change the order in which tasks are distributed among workers, allowing you to prioritize tasks.
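An HTMLQuestion is your HTML page wrapped in a small XML envelope. The sketch below builds that envelope. The page body is a hypothetical example that assumes Amazon's crowd-html-elements (`crowd-form`, `crowd-input`), which take care of posting the answer back to MTurk. A hand-written form would instead have to submit to MTurk itself.

```python
import xml.etree.ElementTree as ET

HTML_QUESTION_NS = (
    "http://mechanicalturk.amazonaws.com/"
    "AWSMechanicalTurkDataSchemas/2011-11-11/HTMLQuestion.xsd"
)

# Hypothetical task page using crowd-html-elements.
PAGE = """<!DOCTYPE html>
<script src="https://assets.crowd.aws/crowd-html-elements.js"></script>
<crowd-form>
  <p>Open https://example.com and click the "Sign up" button.</p>
  <crowd-input name="result" label="What happened when you clicked it?" required></crowd-input>
</crowd-form>"""

def html_question(html_body: str, frame_height: int = 600) -> str:
    """Wrap an HTML page in the HTMLQuestion XML envelope."""
    # CDATA cannot contain "]]>", so split any occurrence across two CDATA sections.
    safe = html_body.replace("]]>", "]]]]><![CDATA[>")
    return (
        f'<HTMLQuestion xmlns="{HTML_QUESTION_NS}">'
        f"<HTMLContent><![CDATA[{safe}]]></HTMLContent>"
        f"<FrameHeight>{frame_height}</FrameHeight>"
        "</HTMLQuestion>"
    )

question_xml = html_question(PAGE)
```

The resulting string is what you pass as the `Question` parameter when creating a HIT.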
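With an ExternalQuestion, MTurk loads your URL in the worker's iframe and appends query parameters: `assignmentId`, `hitId`, `workerId`, and `turkSubmitTo`. While the worker is only previewing, `assignmentId` is `ASSIGNMENT_ID_NOT_AVAILABLE`. The sketch below builds the XML and parses that request on your server. The task URL is hypothetical, and how you render the page afterward is up to you.

```python
from urllib.parse import parse_qs, urlparse
from xml.sax.saxutils import escape

EXTERNAL_QUESTION_NS = (
    "http://mechanicalturk.amazonaws.com/"
    "AWSMechanicalTurkDataSchemas/2006-07-14/ExternalQuestion.xsd"
)
PREVIEW_ID = "ASSIGNMENT_ID_NOT_AVAILABLE"  # sent while the worker is only previewing

def external_question(url: str, frame_height: int = 600) -> str:
    """ExternalQuestion XML pointing the worker's iframe at your own server."""
    return (
        f'<ExternalQuestion xmlns="{EXTERNAL_QUESTION_NS}">'
        f"<ExternalURL>{escape(url)}</ExternalURL>"  # "&" in query strings must be escaped
        f"<FrameHeight>{frame_height}</FrameHeight>"
        "</ExternalQuestion>"
    )

def read_iframe_request(request_url: str) -> dict:
    """Parse the parameters MTurk appends to ExternalURL when loading the iframe."""
    params = {k: v[0] for k, v in parse_qs(urlparse(request_url).query).items()}
    assignment_id = params.get("assignmentId", PREVIEW_ID)
    return {
        "preview": assignment_id == PREVIEW_ID,  # show a read-only preview, no real task yet
        "assignment_id": assignment_id,
        "worker_id": params.get("workerId"),     # absent during preview
        "hit_id": params.get("hitId"),
        # The page's form POSTs its answers, plus assignmentId, to this URL.
        "submit_to": params.get("turkSubmitTo", "") + "/mturk/externalSubmit",
    }
```

Your server only decides *which* task to show at the moment the worker actually accepts. That late binding is what makes the prioritization described here possible.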
Review policies allow you to evaluate the work performed against a defined set of criteria, which helps identify work that’s not being done properly. There are two types of policies: Assignment-level and HIT-level.
Assignment-level policies can validate responses against known answers. You can specify that one or more questions in your HIT have known answers, and reject the assignment when more than a certain number of those known answers are incorrect.
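MTurk can run this check server-side; its API reference lists an assignment-level policy named ScoreMyKnownAnswers/2011-09-01. The function below is only a local sketch of the same idea, which is handy for testing an answer key before you rely on it. The question IDs are hypothetical, and the case-insensitive, whitespace-trimmed comparison is an assumption.

```python
def score_known_answers(answers: dict, answer_key: dict, max_wrong: int) -> tuple:
    """Return (approve, wrong_count) for one assignment.

    answers    -- question id -> the worker's answer in this assignment
    answer_key -- question id -> expected answer, only for questions with known answers
    max_wrong  -- reject when more than this many known answers are wrong
    """
    wrong = sum(
        1 for qid, expected in answer_key.items()
        # A missing answer counts as wrong; comparison ignores case and surrounding spaces.
        if answers.get(qid, "").strip().lower() != expected.strip().lower()
    )
    return wrong <= max_wrong, wrong
```

Questions that are not in the key, such as free-text observations, are ignored. This lets you mix a few check questions into real work.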
HIT-level policies look for consensus among workers on each HIT. You can automatically compare answers to detect if there’s a majority or consensus answer. You can then optionally reject assignments that don’t match the consensus.
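For HIT-level review, MTurk's built-in option is listed in its API reference as SimplePlurality/2011-09-01. Here is a minimal local sketch of majority consensus for one question. It requires strictly more than half of the assignments to agree, which is an assumed threshold you can tune, and it reports which assignments disagree.

```python
from collections import Counter

def consensus(answers_by_assignment: dict, min_agreement: float = 0.5):
    """Return (answer, dissenting_assignment_ids) for one question.

    Returns (None, []) when no answer is given by more than min_agreement of workers.
    """
    if not answers_by_assignment:
        return None, []
    answer, votes = Counter(answers_by_assignment.values()).most_common(1)[0]
    if votes / len(answers_by_assignment) <= min_agreement:
        return None, []  # no consensus: e.g. extend the HIT or review manually
    dissenters = [aid for aid, ans in answers_by_assignment.items() if ans != answer]
    return answer, dissenters
```

Whether you then reject the dissenters or only flag them for a look is a judgment call. Rejections affect workers' records, so flagging is the gentler default.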
Because we run our business on MTurk, we also use a machine-learning-backed system to apply additional review policies and make sure testers are giving us the right answers.
Publishing HITs in batches can save a lot of time if you’re submitting many HITs of the same type. If you want workers to tag the objects in 1,000 images, for example, you can upload them together in a CSV file, and MTurk will automatically create a separate HIT for each image so they can be done in parallel.
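In the web UI, each CSV column fills a `${column}` placeholder in your task template, and each row becomes one HIT. The local sketch below previews that substitution. The column names `image_id` and `image_url` are hypothetical. Python's `string.Template` happens to use the same `${name}` syntax. If you publish through the API instead, you would create one HIT per rendered row.

```python
import csv
import html
import io
from string import Template

# Same ${column} placeholder syntax as MTurk's batch templates.
TEMPLATE = Template('<img src="${image_url}" alt="Image ${image_id}">')

def render_batch(csv_text: str, template: Template) -> list:
    """Render one task body per CSV row, i.e. one HIT per image."""
    bodies = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Escape values before inserting them into HTML; a missing column raises KeyError.
        safe = {name: html.escape(value) for name, value in row.items()}
        bodies.append(template.substitute(safe))
    return bodies

batch = "image_id,image_url\n1,https://example.com/1.jpg\n2,https://example.com/2.jpg\n"
bodies = render_batch(batch, TEMPLATE)
```

Failing loudly on a missing column is deliberate. A typo in a header is much cheaper to catch here than after 1,000 HITs have gone live.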
This will be fine for most people, but it doesn’t allow you to prioritize the order in which tasks are performed. Using the ExternalQuestion format, where work is queued up on our own servers, we decouple our HITs from the actual assignments that MTurk distributes to workers. That means we can prioritize jobs if they need immediate attention, by changing the order in which our tasks are pushed into MTurk. This also makes it easier to cancel HITs if our customers decide to cancel a job they’ve given us.
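Decoupling work from HITs comes down to keeping your own priority queue behind the ExternalQuestion. MTurk only sees generic HITs. Whenever a worker accepts one, your server hands out the most urgent task it holds at that moment. Here is a minimal in-memory sketch, assuming unique task IDs. A real system would persist the queue, and the class name and priority numbers are hypothetical.

```python
import heapq
import itertools

class TaskQueue:
    """Server-side queue: lower priority number = more urgent; FIFO within a priority."""

    def __init__(self):
        self._heap = []                # entries are (priority, sequence, task_id)
        self._seq = itertools.count()  # tie-breaker preserving insertion order
        self._cancelled = set()

    def push(self, task_id: str, priority: int = 10) -> None:
        heapq.heappush(self._heap, (priority, next(self._seq), task_id))

    def cancel(self, task_id: str) -> None:
        # Lazy deletion: skip the entry when it surfaces instead of searching the heap.
        self._cancelled.add(task_id)

    def next_task(self):
        """Task to render in the worker's iframe, or None if nothing is waiting."""
        while self._heap:
            _, _, task_id = heapq.heappop(self._heap)
            if task_id in self._cancelled:
                self._cancelled.discard(task_id)
                continue
            return task_id
        return None
```

Because tasks only leave the queue when a worker accepts a HIT, an urgent job pushed later still jumps ahead of routine work. A customer cancellation never reaches MTurk at all.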
We want our tasks to be performed as quickly as possible, but if there’s a break in the work and our workers are offline or inactive for a while, it can take a few minutes to bring them back online when work comes in. To get around this, we list HITs on MTurk even when we don’t have any real work that needs to be performed. Instead, we give workers training videos to watch or we do repeat tests of our own software. That means when real work comes in, our workers are online already and they can get started on the real tasks in a matter of seconds. This obviously won’t be important to everyone, but it’s an incredibly useful trick if you need low latency work.
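The keep-warm trick can be expressed as two small decisions: how many generic HITs to keep listed, and what to show when one is opened. Here is a minimal sketch, assuming the ExternalQuestion setup above, where the server picks the content. The filler task names and the `warm_floor` setting are hypothetical. Placeholder HITs are still real, paid HITs, so this trades money for latency.

```python
from collections import deque

# Hypothetical filler work shown when no real task is waiting.
FILLER_TASKS = ["training-video-1", "regression-test-own-app"]

def hits_to_top_up(open_hits: int, warm_floor: int) -> int:
    """Generic HITs to list so at least warm_floor are always available to workers."""
    return max(0, warm_floor - open_hits)

def serve(real_work: deque, filler_round: int) -> tuple:
    """Pick what a worker sees when opening an always-listed HIT.

    Returns (task, is_real). Real work always wins; otherwise rotate through filler.
    """
    if real_work:
        return real_work.popleft(), True
    return FILLER_TASKS[filler_round % len(FILLER_TASKS)], False
```

Since real work always takes precedence, a customer job that arrives while workers are busy with filler goes to the very next worker who opens a HIT.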
Mechanical Turk is a powerful service for all kinds of work, and it has become even more useful with the rise of machine learning and the need to clean and prepare large datasets. The work is done by humans, so we get human results: someone can actually describe the specific outcome they experienced, so we don’t have to dig through data to figure out what happened. But because they’re humans, you need to treat them nicely and train them well. If you do, you’ll find they can be remarkably effective.
Reprinted with permission. ©IDG Communications, Inc., 2017. All rights reserved. https://www.infoworld.com/article/3217674/cloud-computing/how-to-make-the-most-of-mechanical-turk.html