Bad software tests: learn how to catch them

Posted by Kenan Rhoton

Bad tests can, at best, test nothing at all and, at worst, break good code.

It is, therefore, imperative to avoid them.

So what's a bad test? Here are some examples.

The Nothing Test

The easiest example we can ever find of a bad test is the test that tests nothing.

I know, this seems a little far-fetched, but it actually happens more often than you might think.

Take for example this login test:


describe('Authentication', () => {
    it('can successfully log in a user', () => {
        browser.url("https://mypage.com/login");
        $("#username").setValue("Username");
        $("#password").setValue("Password!1234");
        $("#submit-btn").click();
        // No assertion follows: the test passes whether or not the login succeeded
    });
});

This test actually does make sure that the elements of the form work properly, but it fails to test the most important thing for the scenario: that the login actually completes successfully.

It might seem obvious here, but the problem is much easier to miss once you start moving code into reusable functions. For example, a helper function might be followed by a check in one scenario but go unchecked in another:


describe('Authentication', () => {
    it('can successfully log in a user', () => {
        userLogin(data["successful"]);
        checkLogin(data["successful"]);
    });

    // Some test developed a couple of months later
    it('can successfully log in as an admin', () => {
        userLogin(data["admin"]);
        // forgot to call checkLogin because we assumed userLogin already does that
        checkAdminEmailSent();
    });
});

As we can see, even though this example is very simplistic, it's easy to imagine it happening across a long project.

There are basically three ways to keep this from happening:

Always check explicitly within the test definition
All functions must check after themselves
Mutation testing

The first approach is simple: we assume that any checks not explicitly mentioned in the test definition don't exist, and as such we need to state them there. This is helpful in that it makes whatever we're testing patently clear in the definition itself.
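As a rough sketch of this first approach, the snippet below keeps the action and the check in separate helpers and calls both from the test body. The article never shows the bodies of userLogin and checkLogin, so the versions here, along with the accounts and session objects and the sample credentials, are hypothetical in-memory stand-ins that let the example run in plain Node.js without a browser.

```javascript
// Hypothetical in-memory stand-in for the application under test.
const accounts = { alice: "Password!1234", root: "Admin!5678" };
const session = { user: null };

// Action only: performs the login and deliberately verifies nothing.
function userLogin({ username, password }) {
    session.user = accounts[username] === password ? username : null;
}

// Check only: throws when the expected user is not logged in.
function checkLogin({ username }) {
    if (session.user !== username) {
        throw new Error(`expected ${username} to be logged in, got ${session.user}`);
    }
}

// Hypothetical test data, keyed the same way as in the examples above.
const data = {
    successful: { username: "alice", password: "Password!1234" },
    admin: { username: "root", password: "Admin!5678" },
};

// Each test body states its own checks, so a reader can see what is
// verified without opening any helper.
function testUserLogin() {
    userLogin(data["successful"]);
    checkLogin(data["successful"]);
}

function testAdminLogin() {
    userLogin(data["admin"]);
    checkLogin(data["admin"]); // explicit, not assumed to live inside userLogin
}

testUserLogin();
testAdminLogin();
```

With this convention, a test body that contains no check call is visibly suspicious during review, which is the whole point of the approach.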

The second approach is more involved: make every single function check after itself. This requires rethinking some of the functions, but if done well guarantees that everything being done is meeting expectations.
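One possible shape for this second approach is sketched below: the helper performs the action and then verifies its own outcome before returning. As before, the accounts and session objects and the body of userLogin are assumptions made for illustration, not code from a real project.

```javascript
// Hypothetical in-memory stand-in for the application under test.
const accounts = { alice: "Password!1234" };
const session = { user: null };

// The helper performs the action AND verifies its own outcome.
function userLogin({ username, password }) {
    session.user = accounts[username] === password ? username : null;
    if (session.user !== username) {
        throw new Error(`login failed for ${username}`);
    }
}

// A test that only calls the helper is still protected by the built-in check.
userLogin({ username: "alice", password: "Password!1234" });

// With wrong credentials the helper itself fails loudly.
try {
    userLogin({ username: "alice", password: "wrong" });
} catch (err) {
    console.log(err.message);
}
```

The trade-off is that a helper like this can no longer be reused as-is for scenarios where the login is expected to fail, which is part of the rethinking this approach requires.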

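The last approach requires an external tool, like Stryker, and it is perhaps the most thorough. The tool makes small changes (mutations) to the code under test and runs the test suite against each one; if at least one test fails for every mutation, we can be fairly certain our tests actually check something. It is extremely time-consuming, though, because the suite has to run once per mutation.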
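To make the idea concrete, here is a hand-rolled illustration of a single mutation. It does not use Stryker or its API; the password rule, the mutant and both tiny suites are invented for the example, and a real tool would generate and run many such mutants automatically.

```javascript
// Code under test (hypothetical rule): passwords need at least 8 characters.
const original = (password) => password.length >= 8;

// A hand-written mutant: `>=` changed to `>`, the kind of edit a mutation tool makes.
const mutant = (password) => password.length > 8;

// A weak suite that never probes the boundary.
function weakSuite(isValid) {
    return isValid("a".repeat(12)) === true && isValid("abc") === false;
}

// A stronger suite that also checks a password of exactly 8 characters.
function strongSuite(isValid) {
    return weakSuite(isValid) && isValid("a".repeat(8)) === true;
}

// A mutant is "killed" when a suite passes on the original but fails on the mutant.
function kills(suite) {
    return suite(original) && !suite(mutant);
}

console.log("weak suite kills mutant:", kills(weakSuite));     // false
console.log("strong suite kills mutant:", kills(strongSuite)); // true
```

The weak suite passes on both versions, so the mutant survives and reveals that the boundary is never tested; the stronger suite catches it.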

In the end, choose whichever approach you like, but make certain that every single test you have actually checks what you need it to check.

The Unreadable Test

This type of test "works", in the same sense that you can, with patience, eat a steak with a spoon.

It's mostly fine until, god forbid, somebody needs to change, refactor or expand it, at which point that somebody will lose over 10 hours of their life trying to make sense of the mess.

These sorts of tests usually happen in one of three ways:

Cowboy programming
Brute-force programming
Multiple iterations

The first case is actually the least common: someone just wrote bad code, without following any kind of readability or ordering principles. It happens mostly with inexperienced programmers who don't get their code reviewed, or in personal projects that later turn into a team effort. The cure is very simple: apply code reviews and remember that "I don't understand this code" is always a valid critique.

The second case is more common: someone wrote good code and it didn't work. Then they slightly modified it and it still didn't work. They modified it further and further until it finally did work as intended. And then they forgot to review it, mostly because they were too busy doing a victory dance. The cure is, again, simple: code reviews!

The third case is probably how most unreadable tests happen. A test is written to cover case X, and it's beautiful. Two weeks later, the functionality changes slightly and the test gets tweaked. A month later, there is a large rework of the application that makes the use case behave quite differently. Eventually, the test is a garbled mess. Once again, it is important to have the code reviewed, but to avoid this problem a review should look beyond the changed lines to the surrounding code and make sure the test as a whole remains readable.

So code reviews are the main weapon against readability problems, and we should always strive to review code better.

These are some of the pitfalls we can run into when creating tests, and if we take care to avoid them, we will all become much better developers.