Why we adopted a coding standard and how we enforce it

Everyone might have their own code formatting preferences. Problems arise when a team consisting of a couple of individuals work on a common code base and every developer has different preferences. It’s hard to maintain pull requests in this situation.

The most ridiculous and unproductive quarrels in my career were about whether we should use tabs or spaces; should we place a curly bracket in the same line or another; should we leave a blank line at the end of file; etc. It’s mostly a matter of a personal preference, however both sides had some interesting arguments.

Once I told my team that we should adopt common code formatting rules for PHP. It was clear to us that we had a mess in our repos. I asked if we wanted to waste our time endlessly discussing every formatting detail. This way we adopted PSR-2 standard.

Of course we couldn’t just reformat all our repos at once. Every one of seven PHP developers already worked on their branches. We had new things merged to develop or master a couple of times per day. This is why we were reformatting our code piece by piece. We were choosing the best moments to do a global reformat in PhpStorm. It was easiest when we knew that only one developer at a time was working on a certain repo or module.

It is important to place a code reformatting operation in a separate commit. You shouldn’t mix formatting changes with functional changes because it makes browsing pull requests difficult. Some Git GUIs can hide whitespace changes but they cannot hide syntax changes like array() to [].

Automatic code checks and analysis

To ensure that our coding standards are met, we set up PHP Code Sniffer in some repos. It verifies that the code complies to the PSR-2 coding standard. After every push to the central repository we get a message about formatting errors. Our testing and analysis tools are automatically launched by the CI (Continuous Integration) tool, for example GitLab, Jenkins, Travis, CircleCI, Bamboo.

You should notice that phpcs by default enforces that no classes remain in the global namespace. It’s good because, believe me, code can grow really fast and it becomes more and more cumbersome to understand the sophisticated code base without proper namespace structure – best done according to PSR-4.

We can try some other static code analysis tools as well: PHPStan, PHPMetrics etc. They do very good job of finding basic and most common code flaws. During the pull requests we can focus on checking advanced business logic because we know that the code formatting and basic code smells were already checked automatically. It’s important especially for dynamically-typed, interpreted languages like PHP where there is no compilation process. Also, PHPStan helps us prepare the code for new PHP versions.

Learn how to count money… or you will lose it

Does anyone know where does the following difference come from?

$ php -r "var_dump((int) (4.20 * 100));"
int(420)

$ php -r "var_dump((int) (4.10 * 100));"
int(409)

It sounds weird when I get a ticket like this: When I set the price to $4.20 everything’s fine. But I cannot set the price to $4.10 because the system shows $4.09. I did some research and discovered that a user entered the price in dollars and then we converted it to cents to store in the database as an integer. That’s where the mistake was made.

Those problems arise from the way that CPU stores floating point numbers. PHP does not have a built-in decimal type, so it uses IEEE-754 standard. In this standard, numbers are stored in a binary system; both the mantissa and the exponent.

You can play with converting different numbers here: IEEE-754 Floating Point Converter. You can see that the number 4.10 in fact is stored as 4.099999904632568359375. When you multiply it by 100, you get around 409.99999. Casting to (int) causes dropping the fractional part (not rounding), so we get the number 409.

How to operate with currencies in PHP? The best way is to use decimal types or dedicated currency classesMoney is a classic example of a value object as a Domain-Driven Design building block.

If I had a dime for every time I’ve seen someone use FLOAT to store currency, I’d have $999.997634 – Bill Karwin

How unit tests help changing existing code

You should have some tests. Why? Because developers are afraid to make changes in a code they don’t understand. This is a common problem not only with fresh employees, but with everyone who stands in front of a huge, complex, legacy system. They are afraid to add a new if not to break the other ones. They are afraid to erase code that seems obsolete.

Instead of refactoring, people tend to add classes and methods with suffixes like _new_2 and so on. I saw the same fear of changes in QA and op teams. When I try to refactor things, I often hear: Oh, maybe not now in case if something breaks, maybe later…

Overcoming the fear of changing an unknown code

I had a problem changing a piece of complex code adding symbols to online transactions. The symbols depended on product types, legal issues, invoices etc. The sales dep coined some weird terms which developers didn’t understand, and those terms were very important for them. I was told to add yet another weird symbol on top of that. The original code looked like this:

public function getDescription()
{
if (\in_array($this->getItemsType(), [self::ORDER_MATERIAL, self::ORDER_MIXED], true)) {
$type = 'DW';
} elseif (self::ORDER_COURSE === $this->getItemsType()) {
$type = 'SzO';
} else {
$type = $this->hasInvoice() ? 'M' : 'P';
}

return sprintf('Order #%u / %s', $this->getId(), $type);
}

After analyzing the original code, I wrote a unit test for all cases:

use PHPUnit\Framework\TestCase;
use Piotr\Blog\Entity\Order;

final class OrderTest extends TestCase
{
/**
* Test data will be provided by ordersProvider().
* @dataProvider ordersProvider
*/

public function testDescription(int $orderId, int $itemsType, bool $hasInvoice, string $expected)
{
$order = new Order($orderId);
$order->setItemsType($itemsType)->setHasInvoice($hasInvoice);
$this->assertEquals($expected, $order->getDescription());
}

public function ordersProvider()
{
return [
[123, Order::ORDER_MULTIMEDIA, false, 'Order #123 / P'],
[123, Order::ORDER_MATERIAL, false, 'Order #123 / DW'],
[123, Order::ORDER_MIXED, false, 'Order #123 / DW'],
[123, Order::ORDER_COURSE, false, 'Order #123 / SzO'],
[123, Order::ORDER_MULTIMEDIA, true, 'Order #123 / M'],
[123, Order::ORDER_MATERIAL, true, 'Order #123 / DW'],
[123, Order::ORDER_MIXED, true, 'Order #123 / DW'],
[123, Order::ORDER_COURSE, true, 'Order #123 / SzO'],
];
}
}

I checked code coverage for this method, by curiosity. It was 100% which made me confident that all lines are executed during the test (however, it might not ensure that all cases are checked). Now I was ready to add another condition to the getDescription() method:

public function getDescription()
{
if (self::ORDER_VIDEO === $this->getItemsType) {
$type = $this->hasInvoice() ? 'W' : 'WP';
}
/* ... */
}

I ran my unit test and received no errors. Success – I didn’t break anything! Since I added some extra lines to my class, the code coverage dropped. I needed to add new test cases:

public function ordersProvider()
{
return [
/* ... */
[123, Order::ORDER_VIDEO, false, 'Order #123 / WP'],
[123, Order::ORDER_VIDEO, true, 'Order #123 / W'],
];
}

Test is passing and my code coverage is 100% again. Now I know that another developer who takes this code over will have a trustful test checking all conditions.

Make sure your business works!

The above example is easy. Unfortunately, in every day job writing unit tests is hard if we need to deal with a poor system architecture, tight coupling and… managers refusing the team to spend extra time writing tests. Moreover, unit tests alone might not be sufficient – you would want integration and UI tests also, and it takes time. But, as Robert C. Martin ironically pointed out once:

Can you imagine telling your users: You know, I don’t write tests. I just write the code. Sometimes it even works. And I ship it to you, and if there are bugs, you’re going to tell me, aren’t you?

Your company might not need a 100% code coverage – often it’s very difficult and even unprofitable (some people claim it’s dangerous). From the business perspective, we should write tests that protect the key business issues first – and that’s usually the domain logic. We want to make sure that none of the business rules that we agreed upon will break after deploying new features. We want to make sure that users will still be able do place orders, and all the orders will be properly accounted.

However if you want to write tests for business rules, then those rules have to be clearly written in the code. This is often cumbersome – and I’m going to tell you about it, some day.

How legacy code is made – part 2

In the previous post, I told you how tight coupling causes systems to be hard to maintain and expand with new features. Today I’m going to show you how developers often take shortcuts which then add up to the technical debt.

I worked on a big e-commerce system for one of the leading Polish education publishers. When I joined the team, there were already many crazy workarounds in the system. I found one particular gem which was really funny and perfectly showcased the absurd of taking ridiculous shortcuts. Let me describe it.

There was a feature for wholesale customers to import a CSV file with all the products they wanted to order. The import script read the CSV line by line and placed every product into the order. Every line started with a product code consisting of several letters and numbers. The Polish alphabet, apart from standard Latin letters, contains 9 letters with diacritics (accents). They are not contained within the basic ASCII set, so at some point the team ran into character encoding issues.

This ticket was found in the issue tracker:

Issue title: Cannot import CSV file with diacritics

Issue description: The import function do not recognize and RŁ2 product codes. Under Windows, CSV files have a default CP-1250 encoding. The database uses UTF-8 encoding. For these two product codes, we need to add special conditions.

Notice that the last sentence of the issue description is a trap; it suggests creating a workaround, some special case in the code. One PHP developer fell into that trap and implemented a fix like this:

while (($csv = fgetcsv($fp, 256, $delimiter)) !== false) {
// for CP-1250 files we convert Ł letter to UTF-8
$csv[0] = str_replace("\xA3", "\xC5\x81", $csv[0]);
// ...
}

Now imagine that we have some product codes with another diacritic letters. What would an average developer do? Add another special case for converting that character? It is really tempting to do just that, then commit the change and forget about it.

This is the problem with workarounds: people easily follow them instead of solving the problem properly. Not everyone has the courage to modify the code they did not write. People think, “If it’s been around here for several months or years, it must be correct. Someone who did this must have known what he or she was doing.”

In this specific example we could simply use PHP’s iconv function to solve the encoding issue properly. So it’s hard to say that anyone has actually made a shortcut here; in fact, the original solution was a much longer walk than it should be.

Don’t rush! You won’t save time!

We all do bad things when working under pressure of deadlines. But often a hurry makes us actually work slower. Trying to take a shortcut and creating workarounds has its consequences which then are stacked to something we call technical debt. Think, research and ask before you code.

New developers who join the team do not know the project and its practices. They just follow examples in the existing code base. This can lead to funny and awkward situations. We should spend more time on leaving good examples for our future coworkers (and ourselves). If we can’t make a good code before the deadline, we should clearly mark our workarounds and schedule some refactoring as soon as possible.

How a legacy code is made

“The (…) problem with legacy assets is how difficult and risky they are to invoke. They were developed with a particular feature-set in mind, and are highly coupled to those features. This makes it hard to access what you want directly without going through layers designed for very different purposes. The data you want is nearly always intertwined with other data and functionality, so that it is hard to detach just the part you want without dragging along an endless tangle of interactions and associations.”

Eric Evans, “Getting started with DDD when surrounded by legacy systems“, 2013

I received an apparently easy task to do. A head of sales wanted to receive a list of all products sold in one of the departments. She expected a table with just three columns: product name, default rebate name and rebated price. Almost every product had some rebate attached to attract clients.

This task should take maybe 15 minutes. Or maybe I should not even do it at all. Salespeople should have a reporting feature for that. However, it turned out that a mechanism to fetch products with prices was strictly tangled with the way these items were presented in the shop. I couldn’t retrieve all the products at once because I was forced to do it page by page (maximum 50 products per page). The rebate system worked only for a currently logged in user and everyone could have different rebates based on the account settings. I could not easily simulate rebates on another accounts. Eventually, preparing a rather simple products summary took three hours of writing a very dirty code…

The shop was custom-made to meet strict specifications oriented towards an ordinary web browser. From an end user’s standpoint, a shop should have a paginated list of products, a shopping cart, rebates info etc. Just some standard e-shop features. No one thought about additional use cases that might show up some day – like salespeople asking for summaries and reports. No one thought that some day we would need to share data through an API endpoint. Deadlines were coming after us. All developers just wanted to get it done according to specs. A legacy, proprietary PHP framework did not make it any easier.

That way, we created lots of code which became “legacy” and inefficient just after half a year past deployment.

Separating code responsibilities into layers

To handle such different requirements, a multi-tier/multi-layer and hexagonal architectures were invented. Sometimes, simple CRUDs shown in framework’s documentation are not enough; complex applications need a clear separation between a business/domain logic, infrastructure and presentation. We create a network of loosely coupled modules which, thanks to interfaces and dependency injection, can be easily switched and replaced. This approach has a lot of pros:

  • Every layer can be, to some extent, developed independently. Because of abstraction, the business logic does not depend on a particular way of presentation, page rendering, data storage etc.
  • Every layer can have a separate set of automatic tests.
  • While editing the code of one layer, we are not distracted by the code of other layers.
  • The same data can be presented in multiple ways. We just attach a different presentation layer implementation (HTML, PDF, JSON API, CLI…).
  • Business rules are clear. We don’t need to look for them in strange places, like templates.

“Allow an application to equally be driven by users, programs, automated test or batch scripts, and to be developed and tested in isolation from its eventual run-time devices and databases.”

Alistair Cockburn, “Hexagonal architecture

In practice, using a multi-layered, hexagonal architecture with loose coupling needs a lot of experience. I discovered that it’s really hard to convince a team to talk about abstract models. People tend to ask about the details very early, like database, graphics – and there’s nothing wrong about it. People need to imagine the final effect, a real use case – it makes them easier to understand the project.

During project discussions, I suggest not to avoid speaking about implementation. However, we should strive for a transparent and fluent code. It’s worth negotiating some additional time with the client, so that we can create a good project that will save him maintenance costs in the future.