Clean code in legacy projects

Clean code in a greenfield project is easy. Or maybe not. But clean code in a legacy project is definitely not easy. If the code has grown over years or decades, the many violations of clean code principles cannot simply be eliminated by a series of refactorings. One reason is the lack of test coverage; the other is the question of purpose.

Lack of test coverage

Typically, test coverage in legacy systems is not particularly good. Because the developers did not write automated tests from the outset, the structure of the software usually does not allow tests to be added "just like that". This is mainly due to dependencies and the unclear responsibilities of methods and classes. The clean code principles Single Responsibility Principle (SRP), Dependency Inversion Principle (DIP) and Integration Operation Segregation Principle (IOSP) are violated.

The purpose of clean code

Before thinking about clean code in a legacy project and applying refactorings, it is important to realize the following: Clean code is not an end in itself. The four values are always at the forefront:

  • Correctness
  • Changeability
  • Production efficiency
  • Continuous improvement

So when someone in the team comes up with the idea of finally cleaning up and applying the clean code principles, the question arises: Why? What value is to be achieved by introducing clean code through refactoring?

Typically, the values of correctness and changeability come into play.

Correctness

If the legacy system is struggling massively with bugs, customers slowly become restless and look for alternatives. This can have unpleasant consequences for a company's economic situation. As a rule, legacy systems are the cash cows, i.e. they contribute the most to the company's revenue. In this situation, tests need to be added as quickly as possible in order to be able to fix the bugs efficiently.

Changeability

The value of changeability is jeopardized when developers are no longer able to efficiently add new features or make changes. This is usually due to methods and classes that have far too many responsibilities. This results in a structure that is difficult to understand. But before changes can be made, tests must also ensure that nothing is broken by the refactorings.

So why should a team responsible for a legacy system concern itself with clean code? Because either bugs need to be fixed or new features need to be implemented.


Clean code in legacy code

Define the goal

Before starting to add tests or refactor the code, the goal of the measure must be defined. The effort required to make the entire codebase "pretty" is too high. Consequently, a goal must be defined in order to sensibly limit the testing and refactoring measures to parts of the code. If the priority is bugs, the places in the code where errors occur particularly frequently must be identified. A look at the version control history helps here: it shows which files are frequently affected by changes. The aim should be to achieve good test coverage in the code areas that are most affected by bugs or feature requests.
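This hotspot analysis can be sketched in a few lines. The following is a minimal sketch, not a finished tool: the function name `change_hotspots` and the file paths are illustrative, and in practice the input would come from something like `git log --name-only --format=`.

```python
from collections import Counter

def change_hotspots(git_log_output: str, top: int = 10):
    """Count how often each file appears in a `git log --name-only` listing.

    Files changed most often are candidates for the first tests and
    refactorings. Names below are purely illustrative.
    """
    files = [line.strip() for line in git_log_output.splitlines() if line.strip()]
    return Counter(files).most_common(top)

if __name__ == "__main__":
    # Sample output as produced by `git log --name-only --format=` (illustrative).
    sample = """
    src/billing/Invoice.java
    src/billing/Invoice.java
    src/util/Dates.java
    src/billing/Invoice.java
    """
    print(change_hotspots(sample))
```

In a real project, one would pipe the actual `git log` output into such a script; dedicated tools for this kind of change-frequency analysis exist as well.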

Tests are just beginning

In both cases, troubleshooting or feature requests, the system must be put under test. Only automated tests lead to the required efficiency. If you continue to try to ensure correctness by testing by hand, you remain in the previous pattern. As a rule, problems are not solved by doing (or not doing) the same thing over and over again.

Since the structure of legacy systems usually does not allow isolated unit tests to be written for small sections of the software, the first step here is to write integration tests. This initially skews the test pyramid, because there should actually be more unit tests than integration tests. In the legacy case, however, the most efficient and safest way is to start with integration tests, as these require no or very few changes to the code. An article on the terms integration test, unit test, etc. can be found here. Furthermore, Docker and Testcontainers help the project.
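The idea can be illustrated with a small sketch. The legacy function below is hypothetical; it mixes SQL and domain logic, so it can only be tested together with a database, i.e. with an integration test. For the sketch, an in-memory SQLite database stands in for the real database; with Testcontainers, one would instead start the actual database engine in Docker.

```python
import sqlite3

# Hypothetical legacy function: SQL access and domain logic are entangled,
# so it cannot be unit-tested in isolation without restructuring.
def total_open_amount(conn):
    rows = conn.execute("SELECT amount FROM invoices WHERE paid = 0").fetchall()
    return sum(r[0] for r in rows)

def make_test_db():
    # In-memory SQLite stands in for the real database in this sketch.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE invoices (amount REAL, paid INTEGER)")
    conn.executemany("INSERT INTO invoices VALUES (?, ?)",
                     [(100.0, 0), (50.0, 1), (25.0, 0)])
    return conn

def test_sums_only_unpaid_invoices():
    # Integration test: pins the current behavior without touching the code.
    conn = make_test_db()
    assert total_open_amount(conn) == 125.0
```

Such tests pin down the current behavior ("characterization tests") and form the safety net for all subsequent refactorings.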

Continuous Integration

The best tests are useless if they are not carried out regularly. On the one hand, this is the responsibility of the developers. On the other hand, it must be ensured that defects are detected as quickly as possible, even if a developer checks their code into version control without running the tests first.

This is where Continuous Integration (CI) helps. If not already in place, a CI process should definitely be set up at this point. With tools such as GitHub or GitLab, this can be done with manageable effort. As primarily integration tests have been created at this stage, the challenge in the CI process lies in executing them on the CI server: on the one hand they take longer to run, on the other hand they require external dependencies such as databases. This is where Docker helps, as it is also supported on the CI server. I have written about this elsewhere as well.

Supplement unit tests

Since integration tests can ensure the correctness of a small section of the software, the structure in this area should be improved with the aim of being able to add isolated unit tests. But beware: the test coverage is not yet high enough for manual refactoring. In this phase, it is imperative to rely primarily on tool-supported refactorings, as they ensure that the semantics are preserved. Useful refactorings for preparing the introduction of unit tests are, for example:

  • Extract Method
  • Introduce Variable
  • Introduce Parameter

After a code snippet has been extracted from a larger method with Extract Method, the new method can be tested in isolation.
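A minimal sketch of this step, with purely hypothetical names and a made-up validation rule: the check was originally buried inside a longer method and has been pulled out with the IDE's Extract Method refactoring.

```python
def is_valid_quantity(value: int) -> bool:
    """Extracted with Extract Method; the rule itself is illustrative.

    As a separate method it can now be unit-tested in isolation.
    """
    return 0 < value <= 1000

def process_order_line(raw: str) -> str:
    # Condensed stand-in for a longer legacy method that used to contain
    # the validation inline; it now calls the extracted method.
    qty = int(raw)
    if not is_valid_quantity(qty):
        raise ValueError(f"invalid quantity: {qty}")
    return f"{qty} pcs"
```

Unit tests for `is_valid_quantity` no longer need to go through `process_order_line` and its parsing and formatting concerns.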

Improve the structure

With increasing test coverage, more extensive refactorings now become possible, which cannot always be carried out with full tool support. The aim here is to structure the code in such a way that it is easier to understand. This is where the Single Responsibility Principle (SRP) comes into play. When responsibilities are shifted, the topic of dependencies inevitably comes up. The earlier Extract Method refactorings were a first step. Now the Dependency Inversion Principle (DIP) and the Integration Operation Segregation Principle (IOSP) come into play.

Through dependency inversion and dependency injection, dependencies can be replaced by dummies in tests. However, this should only be used for external resources, so that they can be replaced in tests where necessary.

The IOSP should be used for the entire domain logic.
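Both principles can be sketched together. All names below are hypothetical: the domain depends on a `Notifier` abstraction rather than a concrete mail server (DIP), the pure domain logic lives in an operation, and the integration method only wires things together without logic of its own (IOSP).

```python
from typing import Protocol

class Notifier(Protocol):
    """Abstraction for an external resource (DIP): the domain depends on
    this protocol, not on a concrete mail or messaging service."""
    def send(self, message: str) -> None: ...

def overdue_reminder_text(customer: str, amount: float) -> str:
    # Operation: pure domain logic, trivially unit-testable in isolation.
    return f"Dear {customer}, please pay {amount:.2f} EUR."

def remind_customer(customer: str, amount: float, notifier: Notifier) -> None:
    # Integration (IOSP): only composes operations and the injected resource;
    # it contains no decision logic that would need its own tests.
    notifier.send(overdue_reminder_text(customer, amount))

class DummyNotifier:
    """Test double that replaces the external resource in tests."""
    def __init__(self):
        self.sent = []
    def send(self, message: str) -> None:
        self.sent.append(message)
```

In a test, `remind_customer` is called with a `DummyNotifier`, so no real external system is needed; only the external resource is faked, while the domain logic runs for real.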

Don't Repeat Yourself (DRY)

Eliminate duplications

Now that the test coverage in the critical areas of the software has been increased, the topic of Don't Repeat Yourself (DRY) can be tackled. Duplication in the code is inefficient, as several places may have to be changed for a single fix. Above all, however, the question arises as to whether there is actually a simple duplication or whether there are differences in the copied code. This archaeological work is also completely inefficient. When in doubt, it is therefore sometimes better to live with duplication.

Before proceeding to eliminate a duplication in the code, it must be clarified whether the two identical or very similar code parts will actually change for the same reasons.

Sometimes the same code exists, but fulfills different requirements that can develop separately. If it is not the same requirement, the code should remain separate.

Helpful refactorings in the context of DRY violations are above all Extract Method and Introduce Parameter. First, the duplication is extracted into a separate method using Extract Method. Then, with Introduce Parameter, an attempt can be made to express the differences between the duplicates via parameters. This often makes it possible to eliminate the duplication even though it is not an exact copy.
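A minimal sketch of this combination, with made-up numbers and names: imagine two near-duplicate price calculations, one with a 19% VAT rate and one with 7%. Extract Method pulls the shared calculation into one method, and Introduce Parameter turns the differing VAT rate into a parameter.

```python
def gross_price(net: float, vat_rate: float) -> float:
    """Result of Extract Method + Introduce Parameter (illustrative example).

    The calculation existed twice with hard-coded rates; the difference
    between the duplicates is now expressed as the vat_rate parameter.
    """
    return round(net * (1 + vat_rate), 2)

# The former near-duplicates collapse into two calls:
standard = gross_price(100.0, 0.19)  # 119.0
reduced = gross_price(100.0, 0.07)   # 107.0
```

The duplication is gone even though the two original code parts were not exact copies, because the difference has been worked out as a parameter.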

Conclusion

Refactorings of existing code must be carried out with care. The Boy Scout Rule can be applied at any time; however, this only covers tool-supported refactorings that can be carried out in just a few minutes. Don't forget to commit! Each of these small refactorings must be traceable in version control and, above all, reversible if something breaks.

The process in brief:

  • Define the goal
  • Selection of the affected areas, e.g. via the change frequency
  • Supplementing integration tests to set up an initial safety net
  • Set up Continuous Integration (CI) if not already in place
  • Tool-supported refactorings with the aim of being able to supplement unit tests
  • Structural changes to dependencies and introduction of DIP and IOSP
  • Eliminate duplications
  • Every complex refactoring (see e.g. https://refactoring.com/catalog/) must have a goal

This process is not set in stone. It is intended to illustrate that refactoring legacy code requires an orderly approach.

We are happy to support you with our training courses for such a project!
