The Downfall of DOM and the Rise of UI Testing

In our last post, we looked at the multiple layers of testing and where UI tests fit into your overall architecture. In case you didn’t read it, here’s a TLDR:

Testing architecture can be grouped into 3 “layers”:

  1. Unit tests
  2. Integration tests
  3. UI tests

Layer 1 tests tiny chunks of code in complete isolation. Layer 2 tests larger pieces of code in partial isolation. Layer 3 tests your entire application “end-to-end”, meaning that it checks that all the parts of your tech stack are working together as expected.

In this post, we’ll be focusing on Layer 3 specifically.

First, an important distinction: although we’re talking about DOM-based testing as a Layer 3 solution, keep in mind that DOM-based testing is not actually UI testing.

Real users do not interact with the DOM - they interact with a visual representation of the DOM (aka the User Interface). The term “DOM-based UI testing” is misleading and incorrect. Although the DOM and the UI are related, they are different.

Recall our diagram from the previous post that illustrates end-to-end (e2e) testing in the form of a UI test:

This is not what a DOM-based test is actually doing. In reality, it’s short-circuiting this flow by interfacing with the DOM. A more accurate representation of DOM-based testing looks like this:

Our “Testing Agent” is Selenium/Cypress/any DOM-based solution that is executing your tests. We’ve completely omitted the actual user interface, so we are not actually testing our UI at all. Our testing agent takes the place of a real user, but it does not interface with our application in the same way that a human does.

While DOM-based tests and UI tests are functionally similar and both attempt to act as a Layer 3 solution, we will see that true UI testing using Rainforest Automation is a superior solution.

DOM-based testing

Selenium works by emulating a user’s interactions with the browser. A test might look something like:

  1. Navigate to a URL
  2. Locate an HTML element on the page
  3. Perform an action on the HTML element
  4. Check if the UI has been updated as expected

Great, we are now supremely confident that our button (and all the things hooked up to the button) are working as intended! On the surface, this seems like a bullet-proof method for ensuring quality. However, as we dig deeper we start to see that this approach has many holes.

The first red flag is that our Layer 3 solution suffers from the same pitfall as Layers 1 and 2 - we are writing code to test our application. This test might look something like:


# find button
button = driver.find_elements_by_xpath('//*[@id="editor"]/table/tbody/tr[3]/td[3]/input')[0]

# click button
button.click()

# check that the "bar" class is applied to another element
element = driver.find_elements_by_xpath('//table/tbody/tr/td/div')[0]
assert element.get_attribute('class') == 'bar'

Although Selenium has ruled the world of DOM-based testing for over a decade, there are alternative solutions focused on providing a better user experience. A popular choice is Cypress, which claims to fix the shortcomings of Selenium. It’s hard to dispute that Cypress is a much better tool, but it fundamentally cannot escape the same pitfalls that Selenium (or any DOM-based testing solution) falls into.

The Downfalls of DOM-based testing

Skilled Engineers are required

If that tiny code snippet scared you, you are not alone. Keep in mind that it is a very contrived, simplistic example that does not capture all the technical know-how required to adequately test a UI with Selenium. This means you need to hire skilled QA engineers to write your automation scripts - and like most engineers, they are not cheap. Wouldn’t it be nice if your expensive engineers could focus on building things that bring value to your users instead of testing things?

This does not mean QA engineers do not have a place in your organization. Instead of spending their time writing code, they should be empowered to think strategically by doing things like:

  • planning their QA methodologies
  • deciding what should be covered
  • deciding when tests should be run based on the impact to the business

Oftentimes, QA engineers operate separately from the rest of your engineering team. A typical workflow looks like:

  1. The product team creates requirements for a feature
  2. Engineer 1 interprets the requirements and writes the code + unit tests for the feature
  3. Engineer 2 reviews the code, then either rejects it and sends it back to step 2, or approves it and forwards it to step 4
  4. The QA engineer interprets the requirements from the product team, tests that the feature works as intended, writes Selenium scripts, and sends the code + unit tests + Selenium scripts on to whoever is responsible for releasing the code to production

In modern engineering organizations, step 4 should often be unnecessary. Why do we need an additional engineer to spend their time interpreting product requirements and writing an additional layer of code? If Engineer 1 is responsible for unit tests, why aren’t they responsible for UI tests also?

Answer: existing DOM-based tooling is not good enough and is complicated to use. At some point, it’s much cheaper to hire a dedicated QA person to handle these things because Engineer 1 is not an expert in Selenium. Their time is expensive, and we don’t want to eat it up by having them do work they are not efficient at doing.

It’s reasonable to argue that having a person dedicated to QA is ideal, but why do we need an expensive engineer to ensure that quality? Why can’t your QA expert be a non-engineer?

Even modern solutions like Cypress cannot escape this. Their website explicitly calls out the fact that it is “built for Developers and QA engineers.”

Spending hours writing test code prevents your QA experts from leveraging their time to tackle bigger, harder problems that bring more value to the business.

Code that tests code is subject to bugs

Every layer of our testing pyramid consists of code that tests code. But who tests the code that tests the code? Are we doomed to have infinite layers of code to test code?

In practice, we can be reasonably confident that our testing code is relatively bug-free since the tests will usually fail if bugs are introduced in the automation script or the code that’s being tested. But this raises a deeper, philosophical issue: we are not testing what a user actually does and sees.

Users are real humans, not automation scripts

As we move from layer 1 to layer 3, we get closer to “reality” - meaning we want to test how the application runs in real life, not in some simulated environment. We should aim to get as close to reality as possible, which ultimately means a human interacting with your UI with a mouse and keyboard (or a touch screen, voice control, etc).

With DOM-based testing, your tests make assumptions about your code. This makes sense because automation scripts don’t “see” your UI in the same way that humans do - they only understand code - which means you’re testing your code (and the DOM state your code produces), not your UI. Oftentimes when writing unit/DOM tests, you can massage the test to pass even if a bug has been introduced.

For example, say we want to test that clicking a button makes a popup modal appear like in the example below.

Clicking the button makes the modal appear

The DOM test might look like:

  1. click the button with ID "open-modal"
  2. does the element with ID "my-modal" now have its visibility set to visible?
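
A minimal sketch of what that test might look like in Selenium - the element IDs come from the example above, and the driver setup is assumed:

# click the button that should open the modal
driver.find_element_by_id('open-modal').click()

# check that the modal's visibility is now "visible"
modal = driver.find_element_by_id('my-modal')
assert modal.value_of_css_property('visibility') == 'visible'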

You run the test and it passes. Great, no bugs to be found here… Right?

This test assumes that visibility: visible means that the modal is visible to the user. In reality, the fact that the element is now set to visible could mean absolutely nothing! There could be another element on top of our modal, meaning a user cannot see it. Or maybe there’s a bug in the modal’s positioning logic that causes it to render somewhere off the screen.

We can easily get a “false pass” (meaning the test passed but there’s actually a bug in the thing that’s being tested). The automation script failed to catch a bug that would easily be caught by a human performing the same task.

Not only is it easy to get a “false pass”, but we can easily get a “false failure” - meaning the test fails, but there is no bug. Using the same scenario, we check: does the element with ID "my-modal" now have its visibility set to visible? This time the test fails because the visibility attribute is undefined. When a human repeats this test, the functionality seems fine. What gives?

This false red occurs because we make an incorrect assumption: having the visibility attribute set to “visible” means that the modal is visible on the screen. In this case, an engineer may have changed the implementation to use the display attribute to show/hide the modal instead. Or maybe they set height: 0 - the point is that there are a lot of ways to show/hide something, and different circumstances call for different implementation techniques. Checking the implementation is not sufficient; we need to check how the interaction (button click) affects the UI (visibility of the modal).
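
To make that concrete, here is a sketch (again using the hypothetical "my-modal" ID) of just some of the DOM/CSS state a script would have to interrogate to approximate “the user can see the modal” - and even then it cannot detect another element sitting on top of it:

modal = driver.find_element_by_id('my-modal')

# visibility alone proves very little - the same modal can be hidden in other ways
print(modal.value_of_css_property('visibility'))  # 'visible' even when the user sees nothing
print(modal.value_of_css_property('display'))     # 'none' hides it regardless of visibility
print(modal.size)                                 # a height of 0 also hides it
print(modal.location)                             # negative coordinates push it off-screen

# Selenium's is_displayed() combines several of these heuristics, but it still
# inspects DOM/CSS state rather than pixels - an opaque overlay covering the modal
# goes completely undetected
assert modal.is_displayed()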

The only way we can assume that visibility: visible means “the modal is actually visible” is if we have some kind of complex unit testing in place for our CSS. We won’t get deep into why nobody wants to write tests for CSS, but trust me - nobody wants to write tests for CSS. There’s really no good, reliable way to do this.

Here’s an actual example of a bug that made it to production.

Ultimately, this boils down to the fact that DOM-based tests are brittle.

DOM-based tests are brittle

As mentioned above, DOM-based tests make assumptions about your code. These can be granular assumptions (like class “bar” means “styles are correct”) or higher-level assumptions. In particular, DOM tests are tightly coupled to the structure of your code.

Recall from our Selenium example that we find a particular element using an XPath, which is a way of describing where an element lives in the hierarchy of your page. This looks something like:

button = driver.find_elements_by_xpath('//*[@id="editor"]/table/tbody/tr[3]/td[3]/input')[0]

The XPath in question is '//*[@id="editor"]/table/tbody/tr[3]/td[3]/input', which reads as:

  • start at the element with an ID of “editor”
  • get the table from inside that element
  • get the tbody inside the table
  • get the 3rd tr inside the tbody
  • get the 3rd td in the tr
  • get the input

This assumes the page’s HTML structure looks something like:

<div id="editor">
  <table>
    <tbody>
      <tr>...</tr>
      <tr>...</tr>
      <tr>
        <td>...</td>
        <td>...</td>
        <td>
          <input type="text">
        </td>
      </tr>
    </tbody>
  </table>
</div>

What happens when you change the structure of this page? Even the slightest change (including renaming an ID/class or adding/removing elements) can break your test because Selenium can no longer find what it’s looking for. This is true even if the end result looks identical to a user.
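
To illustrate (a hypothetical continuation of the snippet above): suppose a designer adds one more row above the input. The hard-coded XPath now points at the wrong cell, or at nothing at all, and the test fails even though the page looks identical to a user:

# the locator encodes the element's exact position in the DOM tree, so a purely
# structural change either matches nothing at all...
matches = driver.find_elements_by_xpath('//*[@id="editor"]/table/tbody/tr[3]/td[3]/input')
assert matches, 'input not found at the expected XPath - the DOM changed, not the UI'

# ...or silently matches a different element, making every later assertion meaningless
button = matches[0]
button.click()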

Brittle Tests === Technical Debt

Everybody wants maximum output from their engineers. In order to achieve this, we must remove the overhead of writing and maintaining automation scripts.

UI tests should test that the UI works as expected without making assumptions about the underlying code. By using a DOM-based testing approach, we are tightly coupling our tests to our code, which means code changes will routinely break our tests.

Broken tests mean more time spent diagnosing and fixing the tests instead of building features.

Triaging Results

The number of people equipped to triage results from a failed test is limited - sometimes it’s even limited to a single engineer who is responsible for the code being tested and/or the test code. They need to figure out if the test itself is broken, or if the thing being tested is broken. With a DOM-based approach, it’s impossible to disentangle these things without knowing how the test (and the code being tested) actually works.

Impossible Interactions

Automation scripts (and similarly, unit tests) can interact with your UI in ways that an actual human cannot, which leads to false greens/reds. Another way of saying this is “DOM-based tests are flaky.” You might even say that DOM-based tests can’t be trusted.

Consider the following scenario where I’ve introduced a bug:

  1. user opens a popover, which has a semi-transparent overlay that covers the rest of the screen
  2. after closing the popover, the overlay does not disappear like it’s supposed to
  3. user can no longer interact with anything on the page because all clicks are “blocked” by the overlay

If a human tests this, they will trivially catch the bug. In many scenarios, an automation script will not, because scripts can target a specific DOM element and dispatch an event directly to that element - circumventing the stuck overlay entirely. That is not the same as a user moving the mouse over the location where they see the button, and then actually clicking the mouse button.
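
A hedged sketch of the difference, assuming a Selenium setup and a hypothetical "save" button that the stuck overlay is covering:

# dispatching the click via JavaScript goes straight to the element, sailing past
# the overlay - the test passes and the bug goes unnoticed
save_button = driver.find_element_by_id('save')
driver.execute_script('arguments[0].click();', save_button)

# a native click at the element's on-screen location is much closer to what a human
# does; with the overlay on top, Selenium would typically raise
# ElementClickInterceptedException here - which is the failure we actually want
save_button.click()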

It’s worth noting that scripts can be written to emulate human behavior (and it is best practice), but it’s always possible to run into these kinds of problems. To reiterate, we want to be as close to reality as possible.

Testing how UIs work in reality

We’ve seen a lot of shortcomings of DOM-based testing, but how do we solve for them?

To recap, we want a solution that

  • emulates the reality of human interaction as closely as possible by using real mouse/keyboard inputs on the OS level
  • does not make any assumptions about the code
  • does not require code and/or skilled engineers
  • is not brittle in unwanted ways

DOM-based solutions emulate human input (mouse clicks and key presses) well enough, but Rainforest Automation (RFA) takes it a step further - our automation agents control mouse and keyboard input on the OS level. This has the huge advantage of being able to interact with things outside of your browser. For example, you might want to drag-and-drop a file from your desktop into your browser, or even just work across multiple browser windows. Since RFA works on the OS level, you can test things that are impossible with DOM-based testing - including interacting with desktop applications!

Even more importantly, the key departure from reality is how DOM-based solutions interpret output from the UI. In other words, DOM-based solutions only effectively recreate a one-way flow of data. However, when a human uses an interface, there is a two-way data flow: the human interprets information and bases their actions entirely on what they see.

RFA challenges the status quo by using visual matching and text content matching (via optical character recognition, or OCR). Instead of making assertions based on the state of the DOM, it looks at the appearance of the UI to determine if the app is behaving as expected.
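
As a toy illustration of the idea - not Rainforest’s actual implementation, and assuming the Pillow imaging library plus hypothetical screenshot files - visual matching compares what is actually rendered against an approved baseline instead of asking the DOM:

from PIL import Image, ImageChops

# an approved screenshot of the expected UI state, and a fresh screenshot of the app
baseline = Image.open('modal_open_baseline.png').convert('RGB')
current = Image.open('modal_open_current.png').convert('RGB')

# fail if any pixel differs between the two images
diff = ImageChops.diff(baseline, current)
assert diff.getbbox() is None, 'the UI no longer matches the approved screenshot'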

This immediately gives us some huge advantages and allows us to fulfill half our criteria:

  • it emulates reality much more closely than any DOM-based QA technology
  • it does not make any assumptions about the code - it has zero knowledge of the code or the DOM, just like a human. This is true black-box testing, something that cannot be done by interacting with the DOM.

RFA also provides a simple-to-use interface to write all of your tests with our intuitive domain-specific language. In a nutshell, you write your tests in English and the DSL forces a specific structure that can be interpreted by our magic robots.

The test writing interface has a virtual machine with your app loaded into it, allowing you to interact with your app and take screenshots of your UI. These screenshots are how RFA does visual comparisons while executing tests.

There are a lot of powerful features in RFA, but they boil down to one important point - anybody can write RFA tests, meaning no code and no engineers are required to write or maintain your tests.

This gives your organization the flexibility to decide who should own quality. At Rainforest, engineers are responsible for writing/editing RFA tests related to the feature they are working on because they have the most insight into how the feature is supposed to work, they are intimately familiar with the product (we’ve been using Rainforest to test Rainforest since 2013!), and we believe it is the most efficient way for us to operate.

Brittleness

You may have noticed the last criterion has strange verbiage: we want a solution that is not brittle in unwanted ways. This is because we want a certain measure of brittleness - if something small in our app changes, we should expect some tests to break in particular ways.

Recall that by using a DOM-based testing approach, we are tightly coupling our tests to our code. This means code changes can routinely break our tests even if the UI is working perfectly and has no visual changes. In this case, any failures are expected but unwanted. We want our tests to prevent bugs from shipping, not prevent our engineers from shipping quality code.

RFA tests are “brittle” in their own way, but in a manner that is healthy and expected. If your app changes visually, your test will fail. This is of course a tradeoff and a conscious decision to couple your tests to your UI instead of your code, but the upsides bring much more value than the downsides take away.

In particular, coupling your tests to your UI brings three major advantages:

  • It creates an additional checkpoint before code is shipped to production. RFA will surface visual changes to the specific elements that are the subject of the test, and allow you to confirm that these are intended changes.
  • Failed tests due to intended changes can be updated very quickly and easily by either taking new screenshots or by simply accepting Rainforest’s intelligent recommendations for new screenshots.
  • You can decrease brittleness to avoid failures due to minor style changes by using Rainforest’s content matching feature where appropriate.

If visual matching is superior, why are so many organizations still using DOM-based testing?

The TLDR is that DOM-based automation has been around for much longer. It’s relatively mature, cheap, and fast to execute. Many QA engineers have been working with industry-standard tools like Selenium for over a decade. It gets the job done, and most people have just accepted its shortcomings and pain points as “the way things are.”

At Rainforest, we believe the standard tools of the QA industry have fallen short of providing developers with intuitive, easy-to-use tools that test their applications in a realistic way. In order to deliver a product that can bridge the gap between testing and reality, we’ve built our own virtual machine infrastructure that allows our automation to be run on any browser or operating system.

Rainforest Automation is the next evolution of automated testing. It enables your team to move faster without breaking things, all while providing superior testing capabilities and decreasing the burden of maintenance.

We’ve taken the many technological advancements in our industry and combined them with our proprietary VM infrastructure, algorithms, crowd-sourced testers/test authors, and many other features and techniques to create a powerful tool that not only bridges the gap between testing and reality, but will produce a paradigm shift in the way organizations think about quality.

What is Rainforest?

Rainforest is a unified platform for software testing. Quickly build no-code QA tests that can be run with automated or crowd execution. Works across browsers, platforms, and mobile.

TRY RAINFOREST FOR FREE