We’ve integrated generative AI features deeply into our no-code test automation platform, Rainforest QA. 

Each of these features is designed to help you avoid the time-consuming and otherwise annoying work of keeping automated test suites up to date — so your software development team can stay focused on shipping, fast.

In this video, our CEO, Fred Stevens-Smith, walks through what some of these genAI features look like in action.

If you prefer to read instead of watching the video, we’ve detailed below the various ways we’re using generative AI for test automation. Plus, we’ve provided a short explainer on how we optimize our AI for software testing and what makes it different from any other AI solution in the quality assurance market. 

Create automated test cases with simple, plain-English prompts

There are a number of test automation tools that allow you to create automated test steps using plain-English prompts. The natural language processing capabilities of genAI models make them ideal for this use case.

But in many cases, test case generation requires a separate prompt for every step you want the test to execute. So, for example, for your web app’s signup flow, you’d have to write step-by-step instructions with prompts like these:

  1. Click on the Sign Up button
  2. Fill the email field with tester@test.com
  3. Fill the password field with 12345
  4. Click the Continue button

That’s more manual effort than necessary.

We’re focused on helping you save time and move faster, especially in your testing workflow. For test case creation in Rainforest, you can create a whole series of steps with just a single prompt. 

In the test scenario above, you could simply enter the prompt “Create an account using dummy data” and Rainforest’s generative AI model would create the full series of test steps for your signup flow.

Rainforest’s AI creates a series of steps based on a single prompt, even including test data generation
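As a sketch of the idea, here’s how one high-level prompt expands into the step-by-step instructions above. The `expand_prompt` stub stands in for the actual model call; the function name and canned output are illustrative, not Rainforest’s API:

```python
# Sketch of single-prompt test generation. The real system calls a
# generative model; this canned lookup is a stand-in for that call.
def expand_prompt(prompt: str) -> list[str]:
    """Expand one plain-English prompt into a series of test steps."""
    canned = {
        "create an account using dummy data": [
            "Click on the Sign Up button",
            "Fill the email field with tester@test.com",  # generated test data
            "Fill the password field with 12345",         # generated test data
            "Click the Continue button",
        ],
    }
    return canned.get(prompt.lower(), [])

steps = expand_prompt("Create an account using dummy data")
```

The point is the interface: one intent-level prompt in, a complete ordered step list (including generated test data) out.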

Tests update themselves with self-healing

Engineering leaders consistently tell us that maintaining automated test scripts is a time-consuming and tedious part of the testing process for their teams. It distracts them from their primary goal: shipping code.

We’re using generative AI to shift the burden of test maintenance from your team to our specialized AI agents. 

When a test fails during execution due to an intended change in your app (and not due to a bug), Rainforest’s artificial intelligence will automatically update — or “heal” — the relevant test steps to reflect your intended changes.

When the failing test steps are from an AI-generated test based on one of your plain-English prompts, the AI will proactively update and save the test steps since it understands your intent. For failing test steps that were created manually with no-code (and not with a prompt), the AI will suggest a fix for you to approve or deny. 

This AI-powered self-healing functionality means you’ll spend a lot less time investigating and addressing false-positive test results stemming from intended changes to your app. When Rainforest does report a test failure, it’s more likely to represent an actual bug or potential issue you’ll want to fix to keep your app high-quality.

A test in Rainforest automatically healed itself using genAI in response to a change in the tested app

Our goal is to strike the ideal balance: helping your software development team move quickly while giving you confidence in your test suite and software quality. 

Any changes the AI makes are completely transparent to you, and you have final control over your tests. The system also records version histories of your tests if you ever want to revert.

Avoid flaky automated tests with reliable element locators

Self-healing isn’t the only way Rainforest uses generative AI to help you avoid false-positive failures in your automated tests. Rainforest also uses genAI to create fallback methods for finding the app elements — like buttons, form fields, and more — indicated in your tests. 

When a test automation tool only uses a single method — like a DOM selector — to locate elements in your app, its tests can be quite brittle. That is, they can fail easily due to minor changes that aren’t even apparent to users. 

For example, if a button’s ID changes from “signup-button” to “sign-up-button,” a test looking for the “signup-button” ID would fail, even if the button’s appearance and functionality hadn’t changed. Someone would need to diagnose the test failure, identify the underlying issue, and update the test to use the correct element ID. It’s not rocket science, but it definitely interrupts workflows and is a low-value use of time.

Rainforest uses up to three different methods to identify elements in your app, which makes its tests a lot more robust. 

The three methods are screenshots, DOM selectors, and AI-generated descriptions. When you or the AI first identify a target element by taking a screenshot, the system automatically captures the element’s DOM selector and generates a description of the element.

During test execution, if the system can’t locate an element based on its screenshot or DOM selector, the AI will search for the element based on the element’s description (e.g., “Pricing” located near the top middle of the screen).
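The three-method fallback works like a locator chain: try each strategy in order and take the first hit. Below is a minimal sketch; the `find_by_*` method names and the `FakePage` stub are hypothetical stand-ins for screenshot matching, DOM lookup, and AI description search:

```python
def locate_element(page, target):
    """Try each locator strategy in order; return the first match.

    `page` is assumed to expose three lookup methods (illustrative names,
    not Rainforest's actual API).
    """
    strategies = [
        ("screenshot", lambda: page.find_by_screenshot(target["screenshot"])),
        ("dom", lambda: page.find_by_selector(target["selector"])),
        ("description", lambda: page.find_by_ai_description(target["description"])),
    ]
    for name, lookup in strategies:
        element = lookup()
        if element is not None:
            return name, element
    raise LookupError(f"Element not found by any method: {target['description']}")

class FakePage:
    """Toy page where the button's ID changed, so only the description matches."""
    def find_by_screenshot(self, shot): return None   # rendering changed slightly
    def find_by_selector(self, sel): return None      # the ID was renamed
    def find_by_ai_description(self, desc):
        return {"tag": "a", "text": "Pricing"} if "Pricing" in desc else None

target = {
    "screenshot": "pricing_link.png",
    "selector": "#pricing-link",
    "description": "'Pricing' link near the top middle of the screen",
}
method, element = locate_element(FakePage(), target)  # falls through to "description"
```

In the ID-rename example from earlier, the first two strategies would miss, but the description-based lookup would still find the element, so the test keeps passing.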

Rainforest uses up to three different methods to identify elements in your app, including an AI-generated description

Having these fallbacks means avoiding brittle tests that interrupt your team with false-positive test failures every time you make minor changes to your app. 

What makes Rainforest’s generative AI different?

The unreliable outputs of generative AI tools like Copilot have made many software developers understandably skeptical. 

We’ve implemented several unique methodologies to make Rainforest’s generative AI more reliable, and especially well-suited for software test automation.

A RAG pipeline makes the AI better at software testing

Like many other AI testing tools, Rainforest works with the large language models (LLMs) available via the API of OpenAI, the makers of ChatGPT. 

But we uniquely have access to over ten years of real-world data from manual test cases executed as part of our old crowd-testing business.

We use the behavioral data sets generated by those thousands of human testers to augment OpenAI’s models and make them more effective for software testing.

We augment the models using something called Retrieval Augmented Generation (RAG). RAG is a type of runtime prompt engineering where our system dynamically adds relevant context to a prompt before asking the agent for an answer. (You’ve probably done some form of prompt engineering in your interactions with ChatGPT, refining your instructions to get a more useful response.)

It’s a way of giving the AI agent contextually relevant info it doesn’t have in its original training data. In this case, our system automatically adds relevant information using the 10+ years of software quality assurance data we’ve collected from the execution of manual testing in our crowd testing platform.

For each prompt a customer enters in the platform, we do a “nearest-neighbor lookup” in our historical data sets, looking for semantically similar prompts. Our system uses those nearest neighbors to formulate helpful guidelines for the agent. E.g., “For prompts like this, the flow usually involves filling user and password fields, and then clicking the login button.”
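A minimal sketch of that nearest-neighbor lookup, using cosine similarity over toy three-dimensional vectors in place of real prompt embeddings (the historical prompts, vectors, and guideline text are all invented for illustration):

```python
import math

# Toy stand-ins for embedded historical prompts and the guidelines distilled
# from them. In the real pipeline, the vectors would come from an embedding
# model over 10+ years of manual-test history (assumption for illustration).
HISTORY = {
    "log in with a test account":         [0.9, 0.1, 0.0],
    "create an account using dummy data": [0.1, 0.9, 0.1],
    "add an item to the cart":            [0.0, 0.1, 0.9],
}
GUIDELINES = {
    "log in with a test account":
        "Flows like this usually fill user and password fields, "
        "then click the login button.",
    # ... one guideline per historical prompt ...
}

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def retrieve_guideline(query_embedding):
    """Find the semantically closest historical prompt; return its guideline."""
    nearest = max(HISTORY, key=lambda p: cosine(query_embedding, HISTORY[p]))
    return GUIDELINES.get(nearest, "")

hint = retrieve_guideline([0.8, 0.3, 0.0])  # closest to the login prompt
```

The retrieved guideline is then prepended to the agent’s prompt at runtime, which is what makes this prompt engineering rather than retraining.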

Notably, this means no historical customer data from these manual tests is ever used verbatim — our RAG pipeline abstracts a set of data into an instruction for the AI agent. 

Complementary agents can work out the big picture and the details

It’s difficult to teach a single AI agent to adopt multiple approaches — this is a common challenge in artificial intelligence. For example, it’s difficult for an agent to be both good at broad, high-level planning and narrow, detailed execution.

Instead, we’ve developed different agents that each specialize in specific things and give feedback to each other. When the agents disagree during a collaboration, they go back and forth iteratively until they reach a decision.
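That feedback loop can be sketched as a propose-and-review cycle between two specialized agents. The agent internals below are stubs invented for illustration; the real approach is described in the linked post:

```python
def collaborate(planner, reviewer, task, max_rounds=5):
    """Iterate between a planning agent and a reviewing agent until the
    reviewer has no objections (or rounds run out). Illustrative sketch."""
    plan = planner.propose(task)
    for round_num in range(max_rounds):
        feedback = reviewer.review(plan)
        if feedback is None:       # no objections: the agents agree
            return plan, round_num + 1
        plan = planner.revise(plan, feedback)
    return plan, max_rounds

class StubPlanner:
    """Broad, high-level planner: starts coarse, refines on feedback."""
    def propose(self, task):
        return ["open signup page", "submit form"]
    def revise(self, plan, feedback):
        return plan[:1] + ["fill email field", "fill password field"] + plan[1:]

class StubReviewer:
    """Detail-oriented reviewer: objects until the form-fill steps appear."""
    def review(self, plan):
        return None if "fill email field" in plan else "missing form-fill details"

plan, rounds = collaborate(StubPlanner(), StubReviewer(), "test the signup flow")
```

Here the planner supplies the big picture and the reviewer enforces the details — neither agent has to be good at both.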

You can read about our novel (patent-pending) “complementary agents” approach in this blog post.

Multi-modal data means testing the actual user experience 

Rainforest’s AI agents use different types of data — including, and notably, visual data — from the app under test to make decisions.

This approach follows the same philosophy that led us to focus on testing the visual layer in our no-code automation platform instead of the DOM (the behind-the-scenes code) — tests should evaluate what users will experience, not what computers see in the code behind the experience.

In that spirit, our visual-processing algorithms have been trained using machine learning to simulate human judgment: when visual changes in the app under test are so small that a human software tester wouldn’t notice or care, the system will ignore those changes. (Though you can toggle on a “strict matching” option.)
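The “ignore changes a human wouldn’t notice” behavior, along with the strict-matching toggle, can be sketched as a simple pixel-difference threshold. The grid representation and tolerance value are invented for illustration; the real system uses trained visual models rather than raw pixel counting:

```python
def images_match(before, after, strict=False, tolerance=0.01):
    """Compare two equally sized pixel grids.

    In normal mode, differences affecting at most `tolerance` of the pixels
    are ignored; strict matching requires pixel-perfect equality.
    (Illustrative stand-in for trained visual comparison.)
    """
    total = sum(len(row) for row in before)
    changed = sum(
        1
        for row_a, row_b in zip(before, after)
        for px_a, px_b in zip(row_a, row_b)
        if px_a != px_b
    )
    if strict:
        return changed == 0
    return changed / total <= tolerance

# A 10x10 grid with a single changed pixel: a 1% difference a human
# tester likely wouldn't notice.
before = [[0] * 10 for _ in range(10)]
after = [row[:] for row in before]
after[0][0] = 1
```

With default settings the single-pixel change is ignored; flipping `strict=True` makes the same comparison fail.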

The AI works inside and outside the browser

Unlike other generative AI tools for test automation, Rainforest’s AI isn’t limited to testing what’s in the browser window — it can evaluate and interact with anything you see on the screen in one of our Windows or macOS virtual machines, like the Start menu, taskbar, or file explorer. 

So, while we’ve optimized the platform to test web apps, it can accommodate testing other types of software products related to web apps, like browser extensions and downloadable files. These are often the edge-case scenarios that other AI testing tools can’t handle. 

If you’re a growing SaaS startup ready to automate your manual testing efforts and level-up your testing strategy, talk to us about checking out a live demo of our AI testing tool.