Traveling Coderman

Why I dislike test-driven development

I dislike test-driven development. There are better alternatives. A plea for writing tests later.

I think that with every tool and process, there are use cases where they are more beneficial and use cases where they are less beneficial. While test-driven development as a process is beneficial in some use cases, I found it to be counterproductive in the majority of my use cases.

Why test-driven development

The main goal of test-driven development is shorter feedback loops. I absolutely agree that this is a goal worth pursuing. Coding is a process of continuous learning about the problem you are trying to solve and the solution you are developing. The shorter the feedback loops, the faster the learning.

Test-driven development consists of three phases: writing a test, writing the code that satisfies the test expectation, and refactoring the code.

My main issue with test-driven development is that the additional test code slows down the feedback loops in the refactoring phase instead of accelerating them.

Reasons for slow feedback loops

In the following sections, I explain where the feedback loops of test-driven development are too slow and what is faster instead: utilizing the type system and writing tests later.

Signature refactoring

One of the phases of test-driven development is the refactoring phase. After a test and the code satisfying it have been written, the code is improved to be more maintainable while still satisfying the test expectation.

There are two kinds of refactoring. If the refactoring does not change a function signature, the code can be changed on its own, and the test verifies that the code still solves the problem correctly. If the refactoring does change a function signature, the test case has to change as well. This leaves you with at least two places that require changes, which slows down the feedback loop.

function findBookNamesOfAuthorsWithMultipleBooks(
  books: Book[],
  authorsByBookId: Record<BookId, Author>
): BookName[] {
  // ...
}

it('returns no book name if there are no books', () => {
  expect(
    findBookNamesOfAuthorsWithMultipleBooks(
      [],
      { '1': 'Goethe' }
    )
  ).toEqual([]);
});

Now you might argue that, while slower, this still beats having no test and therefore no feedback loop at all. However, even without a test there is a feedback loop: the type system.

Since the code change includes a function signature change, it also changes the types. As an example, consider the function findBookNamesOfAuthorsWithMultipleBooks, which takes two arguments of types Book[] and Record<BookId, Author>.

If I learn while coding that several places need a book together with its author information, I might decide to pass a single argument of type AuthoredBook[] and to aggregate the two data structures before calling the function.

function findBookNamesOfAuthorsWithMultipleBooks(
  authoredBooks: AuthoredBook[]
): BookName[] {
  // ...
}

This signature change makes the compiler complain both at the call sites of the function and within its body. I react to that fast feedback by fixing the code until it type-checks again.
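Here is a minimal sketch of the code after the refactoring. The article elides the function body, so everything in it is hypothetical: BookId, BookName and Author are assumed to be plain strings, Book and AuthoredBook are given illustrative shapes, and the aggregation helper toAuthoredBooks is invented for the example.

```typescript
// Hypothetical shapes; the article does not define them.
type BookId = string;
type BookName = string;
type Author = string;

interface Book {
  id: BookId;
  name: BookName;
}

interface AuthoredBook {
  name: BookName;
  author: Author;
}

// Hypothetical helper: aggregates the two former arguments before the call.
// Books without a known author are dropped.
function toAuthoredBooks(
  books: Book[],
  authorsByBookId: Record<BookId, Author>
): AuthoredBook[] {
  const authoredBooks: AuthoredBook[] = [];
  for (const book of books) {
    const author: Author | undefined = authorsByBookId[book.id];
    if (author !== undefined) {
      authoredBooks.push({ name: book.name, author });
    }
  }
  return authoredBooks;
}

// One possible body for the new signature: count the books per author,
// then keep the names of books whose author has more than one book.
function findBookNamesOfAuthorsWithMultipleBooks(
  authoredBooks: AuthoredBook[]
): BookName[] {
  const bookCountByAuthor = new Map<Author, number>();
  for (const { author } of authoredBooks) {
    bookCountByAuthor.set(author, (bookCountByAuthor.get(author) ?? 0) + 1);
  }
  return authoredBooks
    .filter(({ author }) => (bookCountByAuthor.get(author) ?? 0) > 1)
    .map(({ name }) => name);
}

// First place that changes: the call site now aggregates before calling.
const books: Book[] = [
  { id: "1", name: "Faust" },
  { id: "2", name: "Werther" },
  { id: "3", name: "Emma" },
];
const authorsByBookId: Record<BookId, Author> = {
  "1": "Goethe",
  "2": "Goethe",
  "3": "Austen",
};
const names = findBookNamesOfAuthorsWithMultipleBooks(
  toAuthoredBooks(books, authorsByBookId)
);
// names is ["Faust", "Werther"]

// Second place that changes: the old two-argument test no longer compiles
// and has to be rewritten, e.g.
// expect(findBookNamesOfAuthorsWithMultipleBooks([])).toEqual([]);
```

The compiler points at both places. The test, though, is a separate artifact that has to be rewritten by hand.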

Note: You might argue that such signature changes are rare. However, the point of fast feedback loops is faster learning and hence faster adaptation. If such changes really are rare, then there is little adaptation due to learning, and fast feedback loops are not necessary at all.

By relying on the type system, we avoid the additional refactoring in the test cases. Also, type checking gives feedback faster than running tests.

Implementation refactoring

Another form of refactoring is breaking down one function into two or more smaller functions. In this case, the signature of the function remains the same and hence the test case does not need to be changed.

As an example, consider a function synchronizeBooks that loads books from an API and saves them to the database.

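// `http` is an HTTP client and `saveBookToDatabase` a persistence helper,
// both defined elsewhere.
async function synchronizeBooks() {
  const books = await http
    .get<Book[]>("https://my.url/books")
    .then((response) => response.payload ?? []);
  for (const book of books) {
    await saveBookToDatabase(book);
  }
}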

The loading is written at a lower level of detail than the saving. Therefore, we decide to extract the loading into a function loadBooksFromAPI.

function loadBooksFromAPI() {
  return http
    .get<Book[]>("https://my.url/books")
    .then((response) => response.payload ?? []);
}

async function synchronizeBooks() {
  const books = await loadBooksFromAPI();
  for (const book of books) {
    await saveBookToDatabase(book);
  }
}

If there is a test for synchronizeBooks, we do not need to change it, because the signature has not changed. However, the reason for breaking down a function is to split its complexity. If the tests stay untouched, the single test keeps all of that complexity: you continue to test the split functions as a single unit.

Continuing down that path, every test that started as a test of a small unit eventually becomes an integration (or big unit) test. Not splitting the tests along with the functions therefore leads to slower feedback loops down the line, with many integration tests and few small unit tests. Splitting the tests, on the other hand, means refactoring them, which slows down the feedback loop as well. Since both options lead to slower feedback loops, I argue for writing the tests later, once the code has stabilized.
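The two options can be contrasted in code. The sketch assumes a hypothetical HttpClient interface, since the article leaves its http client unspecified. It also passes the client and the save function in as parameters so that they can be faked. That dependency injection is an illustrative choice, not the article's code.

The first test keeps testing synchronizeBooks as a whole, so it has to fake both the HTTP layer and the database. The second test targets the extracted loadBooksFromAPI on its own. That is what splitting the tests would buy, at the cost of writing and maintaining the extra test.

```typescript
interface Book {
  name: string;
}

// Hypothetical minimal HTTP client; the article's `http` is not specified.
interface HttpClient {
  get<T>(url: string): Promise<{ payload?: T }>;
}

type SaveBook = (book: Book) => Promise<void>;

function loadBooksFromAPI(http: HttpClient): Promise<Book[]> {
  return http
    .get<Book[]>("https://my.url/books")
    .then((response) => response.payload ?? []);
}

async function synchronizeBooks(
  http: HttpClient,
  saveBookToDatabase: SaveBook
): Promise<void> {
  const books = await loadBooksFromAPI(http);
  for (const book of books) {
    await saveBookToDatabase(book);
  }
}

// Unchanged test of the whole: it fakes both the HTTP layer and the database,
// so loading and saving are exercised together as one big unit.
async function testSynchronizeBooks(): Promise<Book[]> {
  const saved: Book[] = [];
  const http: HttpClient = {
    // The fake ignores the URL and the type parameter.
    get: <T>(_url: string) =>
      Promise.resolve({ payload: [{ name: "Faust" }] as unknown as T }),
  };
  await synchronizeBooks(http, async (book) => {
    saved.push(book);
  });
  return saved;
}

// Split test of the extracted function: only the HTTP layer is faked,
// here for a response without a payload.
async function testLoadBooksFromAPIWithoutPayload(): Promise<Book[]> {
  const http: HttpClient = { get: () => Promise.resolve({}) };
  return loadBooksFromAPI(http);
}
```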

Unnecessary tests due to improper typing

Unnecessary tests are another source of slow feedback loops in test-driven development. These tests verify a property that could also have been verified with more precise types or a stronger type system.

As an example, imagine not using strict mode in TypeScript or, well, using Java.

A function count(books: Book[]): number can then be called with count(null). You could write a test for the null behavior. However, if the type system handled this case, the same benefits would apply as before.
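With TypeScript's strict mode, the type system does handle this case. The following is a minimal sketch. It assumes "strict": true in tsconfig.json, which enables strictNullChecks, and uses a hypothetical Book type. countOrZero is a hypothetical variant for when null is a legitimate input.

```typescript
interface Book {
  name: string;
}

function count(books: Book[]): number {
  return books.length;
}

// With "strict": true, the compiler rejects this call, so no test for the
// null case is needed:
// count(null);
// error TS2345: Argument of type 'null' is not assignable to parameter of type 'Book[]'.

// If null is a legitimate input, the type says so, and the compiler then
// forces the body to handle it before accessing books.length.
function countOrZero(books: Book[] | null): number {
  return books === null ? 0 : books.length;
}
```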

Let's take a look at a more sophisticated example where the type system can replace a test case without a change in the compiler or a switch to a different language.

The function calculate represents a calculator. It is passed a mathematical operation, either addition or negation. An addition sums two values. A negation negates a single value.

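interface MathOperation {
  type: "add" | "negate";
  value1: number;
  value2?: number;
}

function calculate(op: MathOperation): number {
  switch (op.type) {
    case "add":
      // The type allows value2 to be missing, so the code has to handle it;
      // here a missing value2 is treated as 0.
      return op.value1 + (op.value2 ?? 0);
    case "negate":
      return -op.value1;
  }
}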

The mathematical operation is represented imprecisely. It covers all valid instances like { type: "add", value1: 5, value2: 7 } and { type: "negate", value1: 11 }. But it also covers invalid instances like { type: "add", value1: 5 } and { type: "negate", value1: 11, value2: 13 }. Because of that, both the code and the tests have to account for these invalid instances.

it("returns the negation of the first value", () => {
  expect(calculate({ type: "negate", value1: 11 })).toEqual(-11);
});

it("ignores the second value", () => {
  expect(calculate({ type: "negate", value1: 11, value2: 13 })).toEqual(-11);
});

If we represent the types properly, the second test case is no longer needed. It would not even compile.

interface Addition {
  type: "add";
  leftValue: number;
  rightValue: number;
}

interface Negation {
  type: "negate";
  value: number;
}

type MathOperation = Addition | Negation;

function calculate(op: MathOperation): number {
  switch (op.type) {
    case "add":
      return op.leftValue + op.rightValue;
    case "negate":
      return -op.value;
  }
}

We represent a math operation correctly as either an addition or a negation. An addition always requires exactly two values. A negation always requires exactly one value. The second test is not required anymore.
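The union also enables an exhaustiveness check. This common TypeScript idiom pushes the same idea one step further: the compiler, not a test, reports a forgotten case. The sketch assumes strict mode. The helper assertNever is hypothetical and not part of TypeScript's standard library.

```typescript
interface Addition {
  type: "add";
  leftValue: number;
  rightValue: number;
}

interface Negation {
  type: "negate";
  value: number;
}

type MathOperation = Addition | Negation;

// Hypothetical helper: it only accepts a value of type `never`,
// i.e. a value left over after every member of the union has been handled.
function assertNever(value: never): never {
  throw new Error(`Unhandled operation: ${JSON.stringify(value)}`);
}

function calculate(op: MathOperation): number {
  switch (op.type) {
    case "add":
      return op.leftValue + op.rightValue;
    case "negate":
      // Here `op` is narrowed to Negation; op.leftValue would not compile.
      return -op.value;
    default:
      // If a new member is added to MathOperation without a case above,
      // `op` is no longer `never` here and this line fails to compile.
      return assertNever(op);
  }
}

// The invalid instances from before are rejected at compile time, e.g.:
// calculate({ type: "negate", value: 11, value2: 13 }); // unknown property value2
// calculate({ type: "add", leftValue: 5 });             // missing rightValue
```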

But what about the first test?

Since the types are now represented properly and are more restrictive, the function body can also do less and can therefore do less wrong. Within the case block for a negation, the parameter op is narrowed to the type Negation. The value op.type is a string and therefore cannot be returned as a number. The value op.value is a number and can be returned. Apart from the correct -op.value, returning op.value is the only realistic thing the function could return. Anything else would be a bug visible in the code itself, without the need for a test case.

Still, even for this one scenario of returning op.value, a test case would not be beneficial, because the test case would be as complex as the code itself.

Note: Great function signatures hide complex implementations. Great tests are as simple as the function signature and less complex than the implementation.

it("returns the negation of the value", () => {
  expect(calculate({ type: "negate", value: 11 })).toEqual(11);
});

This test case is implemented incorrectly: the - operator is missing. The bug in the test is as likely to occur as the bug in the code. If I wrote the test anyway, it would merely be a double check rather than a test. By that reasoning, one could just as well require every function to be implemented twice instead of requiring test cases. In great tests, bugs are easier to spot than in the code.

Unnecessary tests due to unnecessarily complex code

Let's take a look at an example where the types are precise but the implementation is unnecessarily complex.

interface Book {
  cost: number;
}

function costs(books: Book[]): number[] {
  const result: number[] = [];
  for (let i = 0; i < books.length; ++i) {
    result.push(books[i].cost);
  }
  return result;
}

Writing i <= books.length instead of i < books.length is a common off-by-one bug. Likewise, with zero-based arrays, starting at i = 1 instead of i = 0 can cause bugs. We could write a test for it.

it("returns the cost of each book", () => {
  expect(costs([{ cost: 3 }, { cost: 5 }])).toEqual([3, 5]);
});

Instead, it is better to refactor the implementation towards a more declarative style, which prevents off-by-one errors from occurring in the first place.

function costs(books: Book[]): number[] {
  return books.map((book) => book.cost);
}

Without index handling in the implementation, there is no need to test the index handling in the tests. With the more declarative implementation, we again reached the point that the test would be as complex as the code.
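The same declarative style carries over to other computations over the list. As a sketch, a hypothetical totalCosts that sums the costs needs no index handling either.

```typescript
interface Book {
  cost: number;
}

function costs(books: Book[]): number[] {
  return books.map((book) => book.cost);
}

// Hypothetical companion function: sums the costs with reduce,
// starting from 0 so that an empty list yields 0.
function totalCosts(books: Book[]): number {
  return books.reduce((sum, book) => sum + book.cost, 0);
}
```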

Where I like test-driven development

For every tool, there are cases where it is more beneficial and cases where it is less beneficial. Test-driven development is no exception, and there are cases where I like to use it.

Reproducing bugs

I found test-driven development to be particularly beneficial for reproducing bugs.

Suppose a bug occurs in production and all I have is a bug report and, ideally, some log traces. Then writing a test first gives me three advantages:

  • I confirm that my suspicion is indeed the source of the bug.
  • I know when the bug is fixed.
  • The bug is less likely to reappear in the future.
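Here is a minimal sketch of that workflow, using a hypothetical off-by-one bug in an index-based costs function. Plain functions stand in for a test framework. The names costsBuggy and reproducesFix, as well as the bug itself, are illustrative.

```typescript
interface Book {
  cost: number;
}

// Hypothetical buggy version as it might have shipped: `<=` reads one
// element past the end, so books[i] is undefined and `.cost` throws.
function costsBuggy(books: Book[]): number[] {
  const result: number[] = [];
  for (let i = 0; i <= books.length; ++i) {
    result.push(books[i].cost);
  }
  return result;
}

// Fixed version.
function costs(books: Book[]): number[] {
  return books.map((book) => book.cost);
}

// Reproduction, written first from the input described in the bug report.
// Returns true if the function under test behaves correctly.
function reproducesFix(costsUnderTest: (books: Book[]) => number[]): boolean {
  try {
    const actual = costsUnderTest([{ cost: 3 }, { cost: 5 }]);
    return JSON.stringify(actual) === JSON.stringify([3, 5]);
  } catch {
    return false;
  }
}

// 1. The test fails against the buggy code: the suspicion is confirmed.
console.log(reproducesFix(costsBuggy)); // false
// 2. The test passes against the fix: the bug is fixed.
console.log(reproducesFix(costs)); // true
// 3. Kept in the suite, the test guards against the bug reappearing.
```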

Coding against specification

Sometimes there are testable properties that you know in advance.

  • You want a GET endpoint to return the same result after two immediate calls.
  • You want a merge of two instances of an unordered data structure to be commutative.
  • You want the sorting of a returned list to be deterministic.

Tests of such properties usually verify the more stable signature of a larger unit. It can make sense to write these tests in advance.
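Here is a sketch of two of these properties as tests written before or alongside the code. It assumes a hypothetical multiset of tags, stored as counts in a Map, as the unordered data structure, and a hypothetical Book with an id and a title. The inputs are hand-picked; a property-based testing library could generate them instead.

```typescript
// Hypothetical unordered data structure: a multiset of tags stored as counts.
type TagCounts = Map<string, number>;

function merge(a: TagCounts, b: TagCounts): TagCounts {
  const result = new Map(a);
  for (const [tag, count] of b) {
    result.set(tag, (result.get(tag) ?? 0) + count);
  }
  return result;
}

// Compares two multisets independently of insertion order.
function equalCounts(a: TagCounts, b: TagCounts): boolean {
  if (a.size !== b.size) return false;
  for (const [tag, count] of a) {
    if (b.get(tag) !== count) return false;
  }
  return true;
}

// Property: merge is commutative, checked on every pair of samples.
const samples: TagCounts[] = [
  new Map(),
  new Map([["classic", 1]]),
  new Map([["classic", 2], ["drama", 1]]),
  new Map([["drama", 3], ["poetry", 1]]),
];
const mergeIsCommutative = samples.every((a) =>
  samples.every((b) => equalCounts(merge(a, b), merge(b, a)))
);

// Property: sorting is deterministic, i.e. independent of the input order.
// Ties on the title are broken by the id, so equal titles cannot swap places.
interface Book {
  id: string;
  title: string;
}

function sortBooks(books: Book[]): Book[] {
  return [...books].sort(
    (x, y) => x.title.localeCompare(y.title) || x.id.localeCompare(y.id)
  );
}

const books: Book[] = [
  { id: "2", title: "Faust" },
  { id: "1", title: "Faust" },
  { id: "3", title: "Emma" },
];
const sortIsDeterministic =
  JSON.stringify(sortBooks(books)) ===
  JSON.stringify(sortBooks([...books].reverse()));
```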

Conclusion

Instead of test-driven development, I prefer type-driven development. Types are a more expressive mechanism than tests and give faster feedback, which makes the refactoring phase faster. Still, I find test-driven development helpful for reproducing bugs and for coding against a specification.