How legacy code is made – part 2

In the previous post, I told you how tight coupling causes systems to be hard to maintain and expand with new features. Today I’m going to show you how developers often take shortcuts which then add up to the technical debt.

I worked on a big e-commerce system for one of the leading Polish education publishers. When I joined the team, there were already many crazy workarounds in the system. I found one particular gem which was really funny and perfectly showcased the absurd of taking ridiculous shortcuts. Let me describe it.

There was a feature for wholesale customers to import a CSV file with all the products they wanted to order. The import script read the CSV line by line and placed every product into the order. Every line started with a product code consisting of several letters and numbers. The Polish alphabet, apart from standard Latin letters, contains 9 letters with diacritics (accents). They are not contained within the basic ASCII set, so at some point the team ran into character encoding issues.

This ticket was found in the issue tracker:

Issue title: Cannot import CSV file with diacritics

Issue description: The import function do not recognize and RŁ2 product codes. Under Windows, CSV files have a default CP-1250 encoding. The database uses UTF-8 encoding. For these two product codes, we need to add special conditions.

Notice that the last sentence of the issue description is a trap; it suggests creating a workaround, some special case in the code. One PHP developer fell into that trap and implemented a fix like this:

while (($csv = fgetcsv($fp, 256, $delimiter)) !== false) {
// for CP-1250 files we convert Ł letter to UTF-8
$csv[0] = str_replace("\xA3", "\xC5\x81", $csv[0]);
// ...

Now imagine that we have some product codes with another diacritic letters. What would an average developer do? Add another special case for converting that character? It is really tempting to do just that, then commit the change and forget about it.

This is the problem with workarounds: people easily follow them instead of solving the problem properly. Not everyone has the courage to modify the code they did not write. People think, “If it’s been around here for several months or years, it must be correct. Someone who did this must have known what he or she was doing.”

In this specific example we could simply use PHP’s iconv function to solve the encoding issue properly. So it’s hard to say that anyone has actually made a shortcut here; in fact, the original solution was a much longer walk than it should be.

Don’t rush! You won’t save time!

We all do bad things when working under pressure of deadlines. But often a hurry makes us actually work slower. Trying to take a shortcut and creating workarounds has its consequences which then are stacked to something we call technical debt. Think, research and ask before you code.

New developers who join the team do not know the project and its practices. They just follow examples in the existing code base. This can lead to funny and awkward situations. We should spend more time on leaving good examples for our future coworkers (and ourselves). If we can’t make a good code before the deadline, we should clearly mark our workarounds and schedule some refactoring as soon as possible.

How a legacy code is made

“The (…) problem with legacy assets is how difficult and risky they are to invoke. They were developed with a particular feature-set in mind, and are highly coupled to those features. This makes it hard to access what you want directly without going through layers designed for very different purposes. The data you want is nearly always intertwined with other data and functionality, so that it is hard to detach just the part you want without dragging along an endless tangle of interactions and associations.”

Eric Evans, “Getting started with DDD when surrounded by legacy systems“, 2013

I received an apparently easy task to do. A head of sales wanted to receive a list of all products sold in one of the departments. She expected a table with just three columns: product name, default rebate name and rebated price. Almost every product had some rebate attached to attract clients.

This task should take maybe 15 minutes. Or maybe I should not even do it at all. Salespeople should have a reporting feature for that. However, it turned out that a mechanism to fetch products with prices was strictly tangled with the way these items were presented in the shop. I couldn’t retrieve all the products at once because I was forced to do it page by page (maximum 50 products per page). The rebate system worked only for a currently logged in user and everyone could have different rebates based on the account settings. I could not easily simulate rebates on another accounts. Eventually, preparing a rather simple products summary took three hours of writing a very dirty code…

The shop was custom-made to meet strict specifications oriented towards an ordinary web browser. From an end user’s standpoint, a shop should have a paginated list of products, a shopping cart, rebates info etc. Just some standard e-shop features. No one thought about additional use cases that might show up some day – like salespeople asking for summaries and reports. No one thought that some day we would need to share data through an API endpoint. Deadlines were coming after us. All developers just wanted to get it done according to specs. A legacy, proprietary PHP framework did not make it any easier.

That way, we created lots of code which became “legacy” and inefficient just after half a year past deployment.

Separating code responsibilities into layers

To handle such different requirements, a multi-tier/multi-layer and hexagonal architectures were invented. Sometimes, simple CRUDs shown in framework’s documentation are not enough; complex applications need a clear separation between a business/domain logic, infrastructure and presentation. We create a network of loosely coupled modules which, thanks to interfaces and dependency injection, can be easily switched and replaced. This approach has a lot of pros:

  • Every layer can be, to some extent, developed independently. Because of abstraction, the business logic does not depend on a particular way of presentation, page rendering, data storage etc.
  • Every layer can have a separate set of automatic tests.
  • While editing the code of one layer, we are not distracted by the code of other layers.
  • The same data can be presented in multiple ways. We just attach a different presentation layer implementation (HTML, PDF, JSON API, CLI…).
  • Business rules are clear. We don’t need to look for them in strange places, like templates.

“Allow an application to equally be driven by users, programs, automated test or batch scripts, and to be developed and tested in isolation from its eventual run-time devices and databases.”

Alistair Cockburn, “Hexagonal architecture

In practice, using a multi-layered, hexagonal architecture with loose coupling needs a lot of experience. I discovered that it’s really hard to convince a team to talk about abstract models. People tend to ask about the details very early, like database, graphics – and there’s nothing wrong about it. People need to imagine the final effect, a real use case – it makes them easier to understand the project.

During project discussions, I suggest not to avoid speaking about implementation. However, we should strive for a transparent and fluent code. It’s worth negotiating some additional time with the client, so that we can create a good project that will save him maintenance costs in the future.