Björn Geisemeyer

Testing legacy code with approval tests

December 20, 2023

Exploring the vast expanses of legacy code: where no test has gone before

In the galaxies of software development, ensuring quality is of paramount importance and yet so often neglected. Working in untested, unknown code is as common as it is universally disliked. Who looks forward to legacy code? For years I shuddered at bug fixes or new features that had to be squeezed into an old, messy, incomprehensible pile of code. Refactorings were part of my everyday life, often enough without the absolutely necessary safety net. It is essential to test the code before refactoring, because the existing behavior must not change. At the same time, writing unit tests for legacy code is a challenge: What do my assertions need to look like? How do I capture all use cases, failure cases and edge cases? How do I find out what the code actually does? What kind of setup do I need for my tests? All of this has to be found out first. Or does it? Fortunately, there is an alternative: approval tests.

I will first describe the approach. A worked example follows at the end.

How do approval tests work?

Approval tests are implemented in the code like unit tests. They check the result of a test run against the approved output of a previous run of the same test. This approved output, known by some as the Gold Master, serves as a reference for future test runs. A test fails if the current result is not identical to the approved result. After a failure, a diff tool opens and compares the two results. We can now decide whether the current result meets our expectations or not. If the current result is the desired one, it is approved and thus becomes the new reference.
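The mechanism can be sketched without any framework. The following is only an illustration under assumptions of my own, not how Verify or ApprovalTests are implemented: the class MiniApproval, its methods Check and Approve, and the file names are hypothetical. The overload File.Move(source, destination, overwrite) assumes .NET Core 3.0 or later.

```csharp
using System;
using System.IO;

// Hypothetical helper for illustration only. Real frameworks add diff tool
// integration, naming conventions, scrubbing and much more.
public static class MiniApproval
{
    // Compares the current result with the approved reference file.
    // Returns true if both are identical. Otherwise the result is written
    // to a ".received.txt" file so that it can be compared and approved.
    public static bool Check(string directory, string testName, string received)
    {
        var verifiedPath = Path.Combine(directory, testName + ".verified.txt");
        var receivedPath = Path.Combine(directory, testName + ".received.txt");

        // No reference yet: the very first run always fails.
        var verified = File.Exists(verifiedPath) ? File.ReadAllText(verifiedPath) : null;

        if (verified == received)
        {
            // Remove leftovers from earlier failing runs.
            File.Delete(receivedPath);
            return true;
        }

        File.WriteAllText(receivedPath, received);
        return false;
    }

    // Approving means: the received result becomes the new reference.
    public static void Approve(string directory, string testName)
    {
        var verifiedPath = Path.Combine(directory, testName + ".verified.txt");
        var receivedPath = Path.Combine(directory, testName + ".received.txt");
        File.Move(receivedPath, verifiedPath, overwrite: true);
    }
}
```

In this sketch the first call to Check fails because no reference exists yet. After Approve, the same result passes, and any different result fails again.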

Of course, there are frameworks that support this type of testing. ApprovalTests.com, for example, offers packages for a variety of languages. For our C# example, we use the framework Verify.

How do approval tests differ from unit tests?

Unit tests define expected results in order to check the behavior of code. The assumption is made that the code delivers the expected result. The code should fulfill a desired behavior. The test fails until a certain behavior is achieved. Approval tests, on the other hand, document the actual state. The existing code itself delivers the result that is subsequently tested. The actual state may well be different from the desired behavior. The test is successful until the existing behavior changes.

Unit tests check specific behavior in isolation. Use cases, fail cases and edge cases are analyzed for a piece of code and at least one test is generated for each case. If a test fails, we know exactly where and for what reason.

Approval tests compare a final result. A single test can cover the entire behavior. The noticeable disadvantage compared to unit tests is that a failure does not pinpoint where a change altered the behavior. Strictly speaking, several cases may have been broken at once, so the source of the change has to be tracked down separately. The obvious advantage of approval tests is that they detect changes in behavior that were overlooked even by a detailed unit test suite. Unit tests expect behavior, so every case must be anticipated, and unforeseen behavior is not covered. Approval tests catch every change in behavior that shows up in the verified output.

Unit tests are tests that are written from the perspective of developers. They are fine-grained and test the individual components that belong to the implementation of a behavior. Approval tests can be used in many different ways. They can fulfill the purpose of a unit test, but they serve the perspective of users much better. They can be used excellently to test use cases and thus take the place of an acceptance test.

When should we use approval tests?

In an unknown code base, approval tests help us to capture the behavior. A test suite that covers unknown code with approval tests piece by piece documents the actual state. These tests allow us to understand how the code works. Michael Feathers describes this process in his book Working Effectively with Legacy Code (see our book recommendations). In an article he calls this approach Characterization Testing, because the tests reveal the character of the code base.

Complex data poses a challenge for unit tests because it often takes a large number of assertions to check a specific result. Formats such as JSON, XML, HTML, HTTP requests, JPG, PNG, PDF and others can only be checked with great effort, or an expected result cannot be written down by hand at all. Approval tests can be used effectively here and save a lot of effort, or make certain tests possible in the first place.

Refactoring of legacy code

An absolute rule of complex refactoring is: put it under test. Before we refactor even a single line of legacy code, we must ensure that existing behavior is not changed by our refactorings. Unit tests are often not feasible, or only with a complex setup. We also need to know exactly what the code is doing. Approval tests are an excellent alternative here. As a rule, they require a much smaller setup. And we don't even need to know what the code is processing. We can first record the actual state and then look at what happens during refactoring.

The extended pattern: Arrange, Act, Print, Verify

The traditional test pattern "Arrange, Act, Assert" is adapted for approval tests. The Assert step is replaced by the Verify step, which compares the two results. Before Verify, an additional Print step is added. This step is of crucial importance and fulfills several purposes.

Firstly, we ensure that the comparison between the current state and the approved reference works. To do this, we filter the result set. Volatile data such as timestamps or GUIDs would make the tests fail on every run. This data must therefore be removed from the data set to be compared.

Secondly, we should make sure that we get a result set that is as readable as possible for the evaluation of the comparison. After all, we don't want to struggle through the comparison.

Thirdly, we often have to translate result sets into a textual representation before they can be analyzed using diff tools.

If none of this is necessary or useful, we can skip the print step.
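As an illustration of the filtering part of the Print step, here is a minimal sketch that replaces volatile values with placeholders before the comparison. The class VolatileDataScrubber, the method Scrub and the placeholder texts are hypothetical names chosen for this sketch. The two patterns only cover GUIDs in the common 8-4-4-4-12 notation and ISO 8601 style timestamps, so other formats would need their own patterns.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper for the Print step, for illustration only.
public static class VolatileDataScrubber
{
    // Matches GUIDs in the common 8-4-4-4-12 hexadecimal notation.
    private static readonly Regex GuidPattern = new Regex(
        @"\b[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}\b");

    // Matches ISO 8601 style timestamps such as 2023-12-20T10:15:30.
    private static readonly Regex TimestampPattern = new Regex(
        @"\b\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(\.\d+)?");

    // Replaces volatile values with placeholders instead of deleting them,
    // so that readers of the diff can see that something was filtered.
    public static string Scrub(string text)
    {
        var withoutGuids = GuidPattern.Replace(text, "[Guid]");
        return TimestampPattern.Replace(withoutGuids, "[DateTime]");
    }
}
```

For the input "id=3f2504e0-4f89-11d3-9a0c-0305e82c3301 at 2023-12-20T10:15:30", Scrub returns "id=[Guid] at [DateTime]". Two runs that differ only in these values then produce the same text to compare.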

Conclusion

Approval tests are good where unit tests tend to fail. Be it in a refactoring context, when checking complex data or when exploring unknown code. They document the current behavior of the code and are able to uncover unforeseen changes. At the same time, they require significantly less effort than unit tests for complex test sets. Don't pull your hair out if you have to put an annoying piece of code under test in the next refactoring. Don't stress if someone asks you to write a test to compare large amounts of data. Don't despair if you get lost in unfamiliar code. Think about approval tests.

Example

The other day in the supermarket...

The template for this approval test demonstration is the SupermarketReceipt-Refactoring-Kata (https://github.com/emilybache/SupermarketReceipt-Refactoring-Kata) by Emily Bache. She provides an excellent and comprehensive collection of exercises on the topic of refactoring, both in terms of the number of katas and the supported languages.

My task is to put the code under test in order to prepare a refactoring. I am using the C# version. The starting point is a piece of supermarket software. A look at the project folder shows me the class structure of the production code and a test project with one unit test [Listing 1]. Products, Discounts and Offers can be created. A SupermarketCatalog serves as persistence for products and prices. Products can be placed in a ShoppingCart. The class Teller (the cashier) manages special offers and creates a Receipt for a shopping cart. I get all this information from the existing unit test alone, which checks one use case. According to its name, a 10% discount is to be tested. I ignore the production code for now, because the unit test is documentation enough.

The Arrange part guides me legibly through the necessary preparations, the Act part contains the central method call, and in the Assert part seven assertions check the result. If we do not know the code, questions remain unanswered. We do not know whether the Receipt is fully checked. All we know is that the test is green when executed.

				
[TestCase]
public void TenPercentDiscount()
{
    // ARRANGE
    SupermarketCatalog catalog = new FakeCatalog();
    var toothbrush = new Product("toothbrush", ProductUnit.Each);
    catalog.AddProduct(toothbrush, 0.99);
    var apples = new Product("apples", ProductUnit.Kilo);
    catalog.AddProduct(apples, 1.99);

    var cart = new ShoppingCart();
    cart.AddItemQuantity(apples, 2.5);

    var teller = new Teller(catalog);
    teller.AddSpecialOffer(SpecialOfferType.TenPercentDiscount, toothbrush, 10.0);

    // ACT
    var receipt = teller.ChecksOutArticlesFrom(cart);

    // ASSERT
    Assert.AreEqual(4.975, receipt.GetTotalPrice());
    CollectionAssert.IsEmpty(receipt.GetDiscounts());
    Assert.AreEqual(1, receipt.GetItems().Count);
    var receiptItem = receipt.GetItems()[0];
    Assert.AreEqual(apples, receiptItem.Product);
    Assert.AreEqual(1.99, receiptItem.Price);
    Assert.AreEqual(2.5 * 1.99, receiptItem.TotalPrice);
    Assert.AreEqual(2.5, receiptItem.Quantity);
}

Instead of using a unit test, which requires me to familiarize myself with the code beforehand, I can also use an approval test. It's shorter and I can set it up without having to understand the code. What's more, I don't have to worry about expected results.

Using NuGet, we add the package Verify.NUnit. Verify builds on the functionality of unit test frameworks. Therefore, there are Verify packages for NUnit, xUnit and MSTest, and for F# users also for Expecto. Once we have integrated the package, we can get started.

				
[Test]
public async Task Printed_receipt_should_not_change_during_refactoring()
{
    // ARRANGE
    SupermarketCatalog catalog = new FakeCatalog();
    var toothbrush = new Product("toothbrush", ProductUnit.Each);
    catalog.AddProduct(toothbrush, 0.99);
    var apples = new Product("apples", ProductUnit.Kilo);
    catalog.AddProduct(apples, 1.99);

    var cart = new ShoppingCart();
    cart.AddItemQuantity(apples, 2.5);

    var teller = new Teller(catalog);
    teller.AddSpecialOffer(SpecialOfferType.TenPercentDiscount, toothbrush, 10.0);

    // ACT
    var receipt = teller.ChecksOutArticlesFrom(cart);

    // VERIFY
    await Verifier.Verify(receipt.ToString());
}

I create a new test. Important to note: Verify works asynchronously, so the test method returns a Task and awaits the call. In contrast to the unit test, I do not name the test after a specific test case, but after its current purpose. This test should ensure that our refactoring does not change the existing behavior. I simply copy the code from the Arrange and Act phases. I name the third step Verify, after the method used. The method accepts a string, so I try the simplest string representation of the Receipt. Then I run the test for the first time. The test fails, a VerifyException reports this, and a diff tool opens. On the left is a file named [Testclass].[Testname].received.txt, on the right [Testclass].[Testname].verified.txt.

On the left, I see a single line with the result of Receipt.ToString(). As expected, the right side is empty, because I don't yet have a verified result for comparison. I realize that ToString() has not been overridden for Receipt. I close the window and try a different approach.

The Print step is added. I need a class that gives me a string representation of the Receipt object. I call it ReceiptPrinter. I would like to get all the information out of the Receipt with as little work as possible, so I try to serialize the Receipt as JSON. If it is structured as a data class, this is a one-liner that returns all the data, and in a reasonably readable format at that.

				
using System.Text.Json;

internal static class ReceiptPrinter
{
    // First attempt: let the serializer produce the whole text.
    public static string PrintReceiptAsJson(Receipt receipt)
    {
        return JsonSerializer.Serialize(receipt);
    }
}

In the Print step, I call the method.

				
// ACT
var receipt = teller.ChecksOutArticlesFrom(cart);

// PRINT
var printReceipt = ReceiptPrinter.PrintReceiptAsJson(receipt);

// VERIFY
await Verifier.Verify(printReceipt);

The diff tool opens when the test is run and only shows empty brackets on the left.
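One plausible reason for such an empty result, sketched here under assumptions: by default, System.Text.Json serializes public properties only. Data that is exposed solely through Get methods or fields does not appear in the output. The classes MethodOnlyReceipt and SerializationDemo below are hypothetical stand-ins and not the kata's Receipt class.

```csharp
using System.Collections.Generic;
using System.Text.Json;

// Hypothetical stand-in, NOT the kata's Receipt class.
public class MethodOnlyReceipt
{
    private readonly List<string> _items = new List<string> { "apples" };

    // The data is exposed through a method, not through a property.
    public List<string> GetItems() => _items;
}

public static class SerializationDemo
{
    // The serializer finds no public properties to write.
    public static string SerializeWhole() =>
        JsonSerializer.Serialize(new MethodOnlyReceipt());

    // Serializing the returned list directly works as expected.
    public static string SerializeItems() =>
        JsonSerializer.Serialize(new MethodOnlyReceipt().GetItems());
}
```

In this sketch, SerializeWhole returns {} while SerializeItems returns ["apples"]. Calling the Get methods explicitly and serializing their results is therefore one way to get at the data.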

I obviously cannot avoid taking at least a superficial look at the structure of the Receipt class. Code completion suggestions show me Get methods, a date property and other methods. That's enough for me. Knowledge gained: the Receipt class is definitely a candidate for refactoring. The next attempt to create a print version combines a StringBuilder with serialization.

				
// Requires: using System.Globalization; using System.Text; using System.Text.Json;
private static readonly CultureInfo Culture = CultureInfo.CreateSpecificCulture("de-DE");

public static string PrintReceiptAsJson(Receipt receipt)
{
    var builder = new StringBuilder();
    var items = JsonSerializer.Serialize(receipt.GetItems());
    var discounts = JsonSerializer.Serialize(receipt.GetDiscounts());
    var total = receipt.GetTotalPrice().ToString(Culture);
    var stamp = receipt.CheckoutTimestamp.ToString(Culture);

    builder.AppendLine(items);
    builder.AppendLine(discounts);
    builder.AppendLine(total);
    builder.AppendLine(stamp);

    return builder.ToString();
}

Too bad, no more one-liner. Still pretty clear, though. I simply passed the two lists, which have their own data types, to the serializer again. The rest of the data is formatted via ToString(). Caution: ToString() is culture-dependent! I therefore have to pass an explicit culture to every ToString() call. Otherwise the tests depend on the default culture of the operating system and could fail on another machine. The result now looks quite respectable.
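The culture dependency is easy to demonstrate in isolation. The following sketch formats the same number with two explicitly chosen cultures. The class CultureDemo and its method names are hypothetical, and the sketch assumes that culture data for de-DE is available in the runtime environment.

```csharp
using System.Globalization;

// Hypothetical demo class, for illustration only.
public static class CultureDemo
{
    // German formatting uses a comma as the decimal separator.
    public static string FormatGerman(double value) =>
        value.ToString(CultureInfo.GetCultureInfo("de-DE"));

    // The invariant culture uses a period as the decimal separator.
    public static string FormatInvariant(double value) =>
        value.ToString(CultureInfo.InvariantCulture);
}
```

For the value 4.975, FormatGerman returns "4,975" and FormatInvariant returns "4.975". A reference file approved on a machine with one default culture would therefore no longer match on a machine with the other, unless the culture is fixed explicitly.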

The Items and Discounts can indeed be serialized. I also see a data class Product. The Discounts are empty. There is a TotalPrice and a date. Now I have the data that the Receipt provides, and I can save the result. I transfer the left result to the right. It is saved in the text file with the extension *.verified.txt and now serves as the reference for further runs.

But there is something I don't like. The unit test that I used as a template is called TenPercentDiscount. From this I deduce that such a Discount should also show up in the Receipt. But I don't see any Discount in the result set. Does this mean Discounts are not checked at all? Is there an error in the unit test? I look at the Arrange part of the test and see that a SpecialOffer for toothbrushes was set up in the Teller. The Assert explicitly checks that the list of discounts is empty. At the very least this is an unfortunate test name, as it suggests that a discount is being applied. But there is no toothbrush in the shopping cart. So I add four toothbrushes and start the test again.

The diff tool opens and new information appears on the left. The toothbrushes appear in the first line. The discount is displayed. The TotalPrice has been updated, as has the timestamp. I verify, and the new reference is saved. Now all the data seems to be there. I want to see the test green, run it again... and the diff tool opens. Of course: the timestamp has changed. Volatile data slips into an approval test just as it does into a unit test. I decide that this is not an important element to test. There is no logic behind it, just a call to DateTime.Now somewhere in the generation of the Receipt. I can exclude the element. I adjust the print method, replace the timestamp with a placeholder and label the TotalPrice and the timestamp in the text. This is easier to read for anyone who has to evaluate the diff. It is good practice not to simply remove data, but to include placeholders. This makes it clear to everyone that this data has been deliberately removed. Readability and filtering are the tasks of the Print step.

				
public static string PrintReceiptAsJson(Receipt receipt)
{
    var builder = new StringBuilder();
    var items = JsonSerializer.Serialize(receipt.GetItems());
    var discounts = JsonSerializer.Serialize(receipt.GetDiscounts());
    var total = receipt.GetTotalPrice().ToString(Culture);

    builder.AppendLine(items);
    builder.AppendLine(discounts);
    builder.AppendLine($"TotalPrice: {total}");
    // The volatile timestamp is replaced by a placeholder.
    builder.AppendLine("CheckoutTimestamp: [DateTime]");

    return builder.ToString();
}

The new reference now looks like this:

This completes the test case. It runs green again. We have built an approval test that checks the use case without knowing the logic. We do not yet have 100% test coverage, but reaching it depends only on the Arrange part. For example, if we want to check other discounts, we just have to add them. All added elements are output via the printer. Apart from verifying the new result, no further work is necessary.

This example deals with checking complex data types. These can still be checked reasonably well, albeit laboriously, with assertions. However, if the results are even more complex, the benefits of approval tests become all the more apparent. Generated image files or PDFs, for example, can be tested very easily. With the help of appropriate tools, a lot of data can be converted into a textual form that can be compared. Writing a unit test with assertions for this is difficult and joyless. Supplementing the test suite with approval tests is the sensible alternative.
