My struggles with React Testing Library

Open any software development book, and there is probably a section on testing and why it is essential. Testing is a great feedback tool. If you think about it, it is incredible that we can write code and then write other code to check that first piece of code. I can't think of many other professions that can automatically validate and verify their work the way we software engineers can.

Testing JavaScript functions with tools like Jest is easy. Testing React or other rendering libraries/frameworks is a different beast. React Testing Library (RTL) overtook Enzyme in popularity a few years ago and became the "go-to tool" for testing React apps. I'll be covering some of the issues with RTL later, but I admit it is a massive improvement over the test utilities the React team provides out of the box.

In this post, we'll mainly be talking about the React variant of Testing Library. But it's worth pointing out that Testing Library isn't limited to React; it also supports other libraries and frameworks (even Cypress).

Integration testing or unit testing?

To clarify, when I say "integration test" here, I mean a Cypress test that exercises the UI in the browser with mocked responses. When I say "unit test", I mean the sort of test you'd write for a pure JavaScript function.

I struggle with RTL because I can never tell whether I'm writing a unit test or an integration test. When I write tests with RTL, I tell myself they are unit tests, but it always feels like I'm writing integration tests. Components are rendered into a simulated DOM (jsdom, when running under Jest), and we then inspect that DOM to find the elements we need. Isn't this basically Cypress?

So naturally, teams start by writing RTL tests for smaller components. Then they write the same sort of test for a higher-level component that wraps multiple nested components. Then there are page-level or complex components that make API requests or contain side effects. These tests often render even more nested components in the simulated DOM, mixed with data fetching and side effects that trigger re-renders.

You might say: "No, that's not right, we should be testing basic components and leave page-level tests to E2E tools like Cypress".

Well, just about every team I've joined in the past three-plus years (at different companies) decided that Cypress tests were too slow and made CI pipelines take too long, so they migrated them to Jest and rewrote them with RTL. I was on those teams and disagreed with the decision, but I don't blame them. It seemed like the right thing to do, especially when Cypress tests running in parallel were taking over 40 minutes to complete.

So should we call the simpler component tests "unit tests" and the more complex component tests "integration tests"? The distinction is vague, so how would we get all the developers to agree on it? One of the teams I worked on even added "integration" to the file names to tell them apart. Those were the tests no one wanted to touch, since they were slow to run and hard to understand.

Maybe we are doing it all wrong, and we shouldn't be writing integration tests in RTL.

Actually, according to Kent C. Dodds, the creator of RTL, we really should. He challenges the traditional testing pyramid model and promotes the testing trophy in the following tweet.

"The Testing Trophy" 🏆

A general guide for the **return on investment** 🤑 of the different forms of testing with regards to testing JavaScript applications.

- End to end w/ @Cypress_io ⚫️
- Integration & Unit w/ @fbjest 🃏
- Static w/ @flowtype 𝙁 and @geteslint ⬣ pic.twitter.com/kPBC6yVxSA
— Kent C. Dodds (@kentcdodds) February 6, 2018

So we should be writing integration and unit tests in the same format and layout, with the same tools? Let me know what you think in the comment section. I'm still very undecided about this 😆

Stop mocking children

Another problem with writing RTL tests for complex components is the amount of mocking required. RTL renders the component under test, including all of its nested components, in the simulated DOM. Therefore we need to provide all the props the component expects, plus whatever data its nested components need. Otherwise, rendering can throw an error and the test is likely to fail. Mocking is manageable for smaller or simpler components that either don't receive much data or are relatively flat in the component tree.

I've worked on very data-heavy apps. Complex components often required mocked responses of a few hundred lines of JSON (thousands, in extreme cases). So the standard practice is to copy the real response from the browser's Network tab and paste it into a "mock data file". It is probably possible to trim it down a little, but that takes more time. I've also seen, and tried to write, test helper functions that generate mocked data, but these often grew so complex that they were even less maintainable than the static data copied from the browser.
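For reference, this is roughly what such a helper looks like while it is still small: a factory that returns a complete default object and lets each test override only the fields it cares about. The `buildOrder` and `buildLineItem` names and the order shape are hypothetical. The trouble tends to start once the defaults have to mirror a deeply nested API response and the override logic has to merge at several levels, as the `customer` field already hints.

```javascript
// Hypothetical mock-data factories: sensible defaults plus per-test overrides.
function buildLineItem(overrides = {}) {
  return { sku: "SKU-1", quantity: 1, unitPriceCents: 1000, ...overrides };
}

function buildOrder(overrides = {}) {
  const { customer = {}, ...rest } = overrides;
  return {
    id: "order-1",
    status: "pending",
    // Nested objects need their own merge, or an override would replace them wholesale.
    customer: { id: "customer-1", name: "Test Customer", ...customer },
    items: [buildLineItem()],
    ...rest,
  };
}

// A test only spells out the fields it is actually about.
const shipped = buildOrder({ status: "shipped", customer: { name: "Ada" } });
console.log(shipped.status, shipped.customer.name, shipped.customer.id); // prints: shipped Ada customer-1
```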

Maybe these are just extreme cases? Not all complex components consume that much data.

Rendering large or complex components in DOM is slow

If mocking isn't a problem you encounter every day, then perhaps slow tests annoy you enough to rethink this.

Complex components tend to have side effects: for example, they request data in useEffect, store it in component state with useState when it comes back, and then re-render to display it.
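In a test, the data request inside such a component usually has to be answered with canned responses instead of a real network call. Below is a minimal sketch of a hand-rolled `fetch` stub; `createFetchStub` and the route table are hypothetical, and the stub implements only the parts of a fetch response the component is assumed to read (`ok`, `status`, `json()`).

```javascript
// Hypothetical fetch stub: maps URLs to canned JSON bodies and records every call.
// Only implements what the component is assumed to read: ok, status and json().
function createFetchStub(routes) {
  const calls = [];
  async function fetchStub(url) {
    calls.push(url);
    const hasRoute = Object.prototype.hasOwnProperty.call(routes, url);
    const body = hasRoute ? routes[url] : { message: "Not found" };
    return {
      ok: hasRoute,
      status: hasRoute ? 200 : 404,
      // Return a fresh copy so one test cannot mutate data another test relies on.
      json: async () => JSON.parse(JSON.stringify(body)),
    };
  }
  return { fetchStub, calls };
}

const { fetchStub, calls } = createFetchStub({
  "/api/orders/1": { id: "1", status: "shipped" },
});

// In a Jest setup this function might be assigned to globalThis.fetch before rendering.
fetchStub("/api/orders/1")
  .then((res) => res.json())
  .then((data) => console.log(data.status, calls.length)); // prints: shipped 1
```

Even with the stub in place, the component still renders, fires the effect, receives the data, and renders again, so the test has to wait for that second render before asserting.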

One of these components can take up to a few hundred milliseconds to render. Now imagine a modest 20 test cases in a test suite; that already adds up to a few seconds. Let's face it, 20 test cases per page-level component is a very conservative estimate. If you practise Test-Driven Development (TDD) or simply want to modify a few tests, waiting for the tests to run every time you change anything can be painfully slow. I often deal with individual test suites that take 20 to 60 seconds to run (I will cover why in the next section).
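A back-of-the-envelope version of that arithmetic, assuming an illustrative 300 ms per render and one render per test case (neither figure is a measurement):

```latex
20 \times 0.3\,\text{s} = 6\,\text{s}
\qquad
\frac{20\,\text{s}}{20} = 1\,\text{s}, \quad \frac{60\,\text{s}}{20} = 3\,\text{s}
```

Read the other way, a 20-case suite that takes 20 to 60 seconds is spending one to three seconds per case, far more than a single render would explain.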

I have been utilising Jest's .skip and .only to limit the number of test cases I run when writing tests. I also only run Jest on one file at a time; otherwise, waiting for all the tests to finish would drive me mad. And then there are the async tests.

Await, waitFor, what are we actually waiting for?

RTL supports async tests. As mentioned already, it is effectively doing what Cypress does; the only difference is that it does it in a simulated DOM. So when the page re-renders after a user interaction or a data request, the test needs to wait for the page or component to finish re-rendering.

In my experience, when you are testing complex components, you may as well assume something needs waiting for. That's right: start every test case assuming it is async unless proven otherwise.

I've battled with await and waitFor() (RTL's built-in API for waiting for something to happen) a lot recently. A few months ago, we increased the timeout to 10 times the default value because our unit tests were too flaky. Yes, the flakiness we used to get from integration and E2E tests is in our unit tests now. Luckily, since increasing the timeout to 10 seconds, the tests seem more stable. But it also means that when a test fails, it waits the full 10 seconds before it is marked as failing. So now we have an interesting situation: if a test run takes too long to finish, it is probably because some tests are failing.
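Why a failing assertion costs the full timeout becomes clearer with a simplified model of what waitFor() does: keep retrying a callback until it stops throwing or time runs out. The `pollUntil` helper below is hypothetical and written for illustration; it is not RTL's implementation (the real waitFor also re-runs the callback on DOM mutations and supports more options). Its defaults mirror RTL's documented waitFor defaults of a 1000 ms timeout and a 50 ms interval.

```javascript
// Hypothetical, simplified model of a waitFor-style helper (not RTL's code).
// Re-runs a synchronous `callback` every `interval` ms until it stops throwing,
// or rejects with the last error once `timeout` ms have passed.
function pollUntil(callback, { timeout = 1000, interval = 50 } = {}) {
  const start = Date.now();
  return new Promise((resolve, reject) => {
    const attempt = () => {
      try {
        resolve(callback());
      } catch (error) {
        if (Date.now() - start >= timeout) {
          // A genuinely failing assertion only surfaces here, after the full timeout.
          reject(error);
        } else {
          setTimeout(attempt, interval);
        }
      }
    };
    attempt();
  });
}

// Usage: the "element" appears after ~120 ms, so this resolves after a few retries.
let rendered = false;
setTimeout(() => { rendered = true; }, 120);
pollUntil(() => {
  if (!rendered) throw new Error("element not found yet");
  return "found";
}).then(console.log); // logs "found"
```

A passing check resolves as soon as the condition holds, but a check that will never pass has to sit in the retry loop until the timeout expires, so raising the timeout to 10 seconds makes every genuine failure take at least 10 seconds. In RTL the global default can be changed with `configure({ asyncUtilTimeout: ... })`; Jest's own per-test limit (`testTimeout`, 5 seconds by default) is a separate setting.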

RTL tests are hard to write

It shouldn't shock anyone that Cypress tests are laborious and slow to write, because writing selectors for all the different elements on a page is tedious. Now try doing that without being able to see or inspect the page at all. There is no way to do that in RTL; the best you can do is use RTL's debug(), which prints out the contents of the simulated DOM. If you want to be fancy, you can first select an element that wraps the one you need, which we can call wrapperComponent, and then call debug(wrapperComponent) to print only that element and its nested elements.

Recently, I got so stuck on an async RTL test that kept complaining it could not find the element I was looking for, even though I could see it when I printed the DOM with debug(). I went as far as installing jest-image-snapshot to see what RTL was rendering. I think this is evidence enough that writing RTL tests for complex components is challenging.

RTL tests are harder to maintain

If you think writing RTL tests is complicated, now imagine opening an RTL test suite someone else wrote and trying to fix tests that broke because you changed a small thing in a nested component. Hopefully, Jest's error message is helpful enough to spot the issue. Otherwise, there is no option other than using RTL's debug() to check what is going on. Let's take it a step further and say you made an app-wide change. Now several complex component test suites are failing. Well then, have fun!