There are some limits to the YAGNI principle

Have you heard of YAGNI? The acronym stands for “You Ain’t Gonna Need It”. The original meaning, as Extreme Programming guru Ron Jeffries put it, was to “implement things when you actually need them, never when you just foresee that you need them”. But how has this principle evolved over time?

Since the late ’90s, we have watched many frameworks and tools emerge. We love shiny new tools, don’t we? Sometimes we think a new framework will solve all our problems. That’s what the “Hype Driven Development” article described.

Then people started asking: okay, maybe we should care more about the business, not just our own fun? Maybe we don’t need every tool and library everyone else is hyped about? Maybe we shouldn’t do a big rewrite of our systems every six months, after yet another tech conference?

Don’t do everything by yourself

If using the latest hyped tool for every task is one extreme, then doing everything by yourself is the other. The latter was proposed by a guy conducting a microservices training (another hype term, by the way) for my company.

I was on a team using PHP and we wanted to learn more about developing a proper microservices architecture. We had done a lot of analysis of our current system and decided this might be our chance. What we heard during the training was:

  • Our coach would never use Symfony for a microservice because “it’s huge and will cause performance issues”. That was a couple of months before Symfony 4 was released, but the same guy argued that “computing power is cheap and we shouldn’t care”. Umm…
  • Our coach would never use Doctrine for a microservice because “it’s huge and will cause performance issues”.

Instead of showing us some alternatives, the coach asked us to write an example application in vanilla PHP.

By the time the training was conducted, I had already spent years working on home-brew “frameworks”, made by people who did not believe in the popular systems and desperately wanted to do things their own way. Such people tend to leave the company a couple of years later, overwhelmed by all the issues caused by their “ingenious” frameworks.

Pick tools carefully, get to know them well

So you’re afraid of incorporating third-party code into your project? Good: it means you’re a responsible developer. But before you turn down popular, well-tested solutions, get to know them well. How much can you fine-tune them? How far can they be customized and stripped of unnecessary features?

Back to PHP: Symfony is highly customizable, especially since version 4, and both Symfony and Doctrine cache aggressively. You can also optimize Composer’s autoloader.
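For example, Composer’s autoloader can be tuned for production with a couple of flags (a sketch of the common options; check your Composer version’s documentation for details):

```shell
# Build a static class map instead of scanning the filesystem at runtime
composer dump-autoload --optimize

# Additionally skip filesystem checks entirely
# (safe only when the class map is complete)
composer dump-autoload --classmap-authoritative

# Cache the class map in APCu, if the extension is available
composer dump-autoload --apcu
```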

Don’t optimize prematurely

Performance can be easily improved. But business logic can be a real monster.

After a couple of years working on big, sophisticated projects, I realized that performance is a secondary issue. The biggest challenge for me has always been dealing with complex business logic: a huge number of interdependent moving parts, unclear rules and so on. You need good tools to model and test that logic.

So what if you shortened page load time from 300 to 100 milliseconds, if your teammates don’t understand the crazy optimizations you just made? What do you gain by forcing your team to use weird tools they will complain about?

You need to balance these things out.

Defensive coding: Make local objects final

In the very first article on defensive coding we talked about avoiding mutability.

Let’s talk about JavaScript for a while. Every modern tutorial says that you can declare variables either with let or const. You use let if you intend to reassign the variable later (as with the obsolete var keyword), or const to indicate that a binding, once initialized, should never be reassigned. This gives developers great clarity and saves them from accidental reassignments.
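A quick sketch of the difference (variable names are illustrative):

```javascript
let attempts = 0;
attempts += 1; // fine: a `let` binding can be reassigned

const taxRate = 1.23;
try {
  taxRate = 1.5; // throws: a `const` binding cannot be reassigned
} catch (e) {
  console.log(e instanceof TypeError); // true
}

// Caveat: const protects the binding, not the object behind it.
const order = { total: 100 };
order.total = 200; // allowed: the object itself is still mutable
```

Note the caveat: const prevents reassignment, not deep immutability; you need Object.freeze to lock an object’s contents as well.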

What about Java? You don’t have a const keyword, but you have final. I love using it, and many developers in my current project also use final wherever feasible. You can mark fields, local variables and method arguments as final. The cases where we need to reassign a variable are rare, so most of the time I end up with final everywhere:

public BigDecimal getTotalPrice(final Product product,
                                final BigInteger quantity) {
    final BigDecimal totalNetPrice = product
            .getUnitPrice()
            .multiply(new BigDecimal(quantity));

    return totalNetPrice.multiply(product.getTaxRate());
}

Why use final everywhere? It can look strange at first, but it protects us from bugs. How many times have you seen sprawling methods spanning multiple screens, with vague, similar variable names? How many times have you mixed up foo, oldFoo, databaseFoo, foo1 and so on?

In PHP, the final keyword can be applied only to classes and methods. You can’t create local constants, only class constants with the const keyword. Worse, there is no way to clearly distinguish the first assignment to a variable from subsequent reassignments of the same name. This code is perfectly valid in PHP:

$builder = new ProductBuilder();
$builder = new UserBuilder();

Picking a PHP tool to read and manipulate PDF files

In the previous article I described several tools that can be used together with PHP to create PDF files. Back then, the choice was not easy and we had a lot of criteria to consider while picking the best tool. Today we will browse possibilities to read and edit existing PDF files.

Native PHP libraries

Again, we will start by checking whether there are any PHP libraries that can manipulate PDF files without depending on external binary tools.

pdfparser

There is an interesting library called smalot/pdfparser. It has almost 1000 stars on GitHub. It uses the TCPDF parser to parse a PDF file into an array of document objects, which is then processed further to extract what we need.

The library is convenient, as it supports parsing both an existing file and a string with PDF data. It allows you to extract metadata and plain text from a document. You can test the library on its demo page.

The problem is that Sebastien’s library is based on the old TCPDF version 6 parser, which is some day going to be replaced by a newer rewrite called tc-lib-pdf-parser. However, that new parser is still under development, and Sebastien is aware of its existence.

smalot/pdfparser has commercial support from Actualys.

FPDI

I got familiar with this library when I received a bug report for a watermarking module in some e-book system. The module received a PDF, parsed it using FPDI, generated a watermark with FPDF and stamped it over all pages.

The problem is that the free version of FPDI supports only PDF version 1.4 and below. To support newer document versions, you have to buy the full version. And that’s what the bug report was about. We decided to switch to another tool, pdftk, which is described below.

Command-line tools

The first command-line tool I played with was pdftk. I used it to join separate documents into one, apply watermarks and extract basic data, like the number of pages. Unlike the FPDI library, it supports all PDF versions. The only thing missing is a text extraction feature.

The need to extract plain text from a document led me to the Apache PDFBox library. It is written in Java and, as I described before, it offers some very nice features. However, in the PHP world we can only access a CLI wrapper for that library which has a limited set of options.

Later I discovered the Poppler library, which is said to fully support the ISO 32000-1 standard for PDF. This C++ library can be accessed via dedicated CLI tools – poppler-utils, which we can run from PHP. For example, the pdftotext tool gives a lot of control over the plain text dump – you can even preserve a proper document layout while rendering, or crop the document to a specified region. Also, pdfinfo provides comprehensive information about a file, like page format, encryption type etc. You can use it to extract JavaScript too.
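A few examples of what poppler-utils can do from the shell (flag names as documented in Poppler’s man pages; file names are illustrative):

```shell
# Dump plain text, preserving the original physical layout;
# "-" sends the output to stdout
pdftotext -layout document.pdf -

# Extract only pages 2 through 4 into a file
pdftotext -f 2 -l 4 document.pdf out.txt

# Show page count, page size, encryption status and other metadata
pdfinfo document.pdf
```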

Sometimes you might want to create a PNG or JPEG screenshot of a document. You can do it with pdftocairo from Poppler, or use ImageMagick’s convert.

Wrappers

For pdftk, check out this library: mikehaertl/php-pdftk.

PDFBox CLI can be accessed via schmengler/PdfBox.

ImageMagick and Ghostscript are the basis for the spatie/pdf-to-image wrapper.

Poppler has several PHP wrapper libraries:

  • spatie/pdf-to-text only allows extracting text from a PDF. It requires the input PDF to exist in the file system. The library does not wrap additional input arguments, so you have to specify them manually.
  • ncjoes/poppler-php: a library supposed to wrap all of poppler-utils, but at the moment pdftotext is still unsupported. Also, this library is not very convenient, as it forces you to choose an output directory for a file (it does not return processed data as a string).

In fact, these two libraries are wrappers to a wrapper, since poppler-utils are just a collection of CLI wrappers for the Poppler C++ library 😉

Which to pick? Native or CLI?

There are a couple of basic considerations.

Native PHP libraries should work independently of the host environment. They are a lot easier to set up and update. The only dependency-management tool you need is Composer.

CLI tools, especially those written in C/C++, might be faster and use less memory. However, I don’t have hard evidence at the moment. Maybe all the optimizations that came with PHP 7 will make this point obsolete. Also, I believe that C/C++ tools have a wider audience and thus might receive more community support.

You should pick the tool that best fits your specific requirements. Most tools will do a decent job at simply rendering an unencrypted PDF to an image or plain text. But if you need more control over the output file structure, or you want to process encrypted documents, poppler-utils will be a good choice.

Sometimes it occurs to me that many developers are just reinventing the wheel, especially given the multitude of PDF processing libraries for PHP. The Portable Document Format has almost seven hundred pages of specification, and we are all struggling with the same processing issues. That’s why I prefer to pick the best tools across different technologies and connect them with interfaces, rather than doggedly sticking to a single technology.

Check out the List of PDF software at Wikipedia.

Picking a PHP tool to generate PDFs

I spent a lot of time working with different tools to generate PDF files, mainly invoices and reports. Some of these documents were really sophisticated, including multi-page tables, colorful charts, headers and footers.

I know how hard it is to choose between a multitude of libraries and tools, especially when we need to do a non-trivial job. There is no silver bullet; some tools are better for certain jobs and not so good for other jobs. I will try to sum up what I’ve learned through the years.

Two ways of creating a PDF file

A PDF file contains a set of objects which make up a document: pieces of text, images, lines, forms, fonts and so on. Creating a PDF is an act of putting all these pieces together in the proper order and layout. Content is drawn with stream operators that resemble a subset of PostScript, so you can even write simple documents in a text editor.
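For illustration, a fragment of a PDF content stream drawing a single line of text looks roughly like this (the operators resemble PostScript, but form a restricted, declarative subset):

```
BT                  % begin text object
  /F1 12 Tf         % select font F1 at 12 pt
  72 720 Td         % move to x=72, y=720 (points, from the bottom-left corner)
  (Hello, PDF) Tj   % show the string
ET                  % end text object
```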

One way is to create these objects “by hand”: we add every line of text separately, draw all tables manually, calculating cell widths and spacing on our own, and we must know when to split longer content across multiple pages. This approach requires a lot of manual work and very good programming skills, so that we don’t end up with spaghetti code where it is hard to find any meaningful logic among all the drawing commands.

Another way is to convert another document format, for example HTML, LaTeX or PostScript, into PDF.

We used LaTeX for an education app which allowed composing tests for students from existing exercises prepared by professionals. Since LaTeX was a primary tool for our editors, it was natural for us to convert their scripts straight to PDF.

Converting HTML to PDF is far more complex, as today’s web standards keep gaining features; just think of CSS Flexbox or Grid layouts. Let’s see what we can do.

Native PHP libraries

My first experience was with native PHP libraries where you had to do most things by hand, like placing text in the proper position line by line, drawing rectangles, calculating table cells, and so on. It was quite fun at the time, but creating more elaborate documents turned out to be very hard. We used the FPDF and ZendPdf libraries (the latter is discontinued).

At some point, I ended up maintaining multiple-page, sophisticated school reports with tables and charts rendered by ZendPdf. Business wanted to add even more types of reports. I decided to rewrite all reports as HTML documents with stylesheets and then try to make PDFs from that.

There are three PHP libraries capable of parsing HTML/CSS and transforming it into PDF: TCPDF, mPDF and Dompdf.

Rendering HTML and CSS certainly isn’t easy. Modern browser engines are huge projects, and I can’t imagine a fully functional rendering engine written in pure PHP. So you cannot expect these libraries to produce the same output you see in Firefox or Chrome. However, for simple layouts and formatting they should be enough. A plus is that you still don’t depend on any external tools: just plain PHP!

To give you some idea of what to expect from the above libraries, I compiled a comparison of invoice renderings. The three pictures below were made from the same HTML 5 source, which uses CSS Flexbox to position the “Seller” and “Buyer” sections next to each other. It also has some table formatting:

Google Chrome (reference image)
TCPDF

mPDF
Dompdf

As you can see, none of the PHP libraries understood CSS Flexbox. mPDF and TCPDF had some problems painting the table. Dompdf performed the best, and I’m pretty sure that building the “Seller” and “Buyer” sections the old-school way, with float or <table>, would be enough to get a proper result.

External CLI tools

Native PHP solutions were not enough for me, so I decided to use an external tool backed by a fully functional WebKit rendering engine. My employer was already using wkhtmltopdf, which supported everything I needed: SVG images, multi-page tables, headers and footers with page numbers and section names, automatic bookmarks. Having rewritten the old reports in HTML and CSS, I was able to implement all the new features requested by the business.

wkhtmltopdf certainly isn’t bug-free; for example, I had some issues with repeating table headers on consecutive pages. Also, upgrading from 0.12.3 to 0.12.4 broke my document layout which used dynamic headers and footers, so I had to go back to the old version.

Then I got familiar with PhantomJS, which was used mainly to conduct automated browser tests in headless mode (without a browser window). It could also capture PNG and PDF screenshots. PhantomJS used a newer version of the WebKit engine. However, the project is now suspended.

Almost a year before the suspension of PhantomJS, Google announced that Chrome could run in headless mode from version 59 on. This means you can use the latest Blink rendering engine to convert HTML/CSS to PDF from your command line. This is perfect for rendering really complex documents that use the latest web standards. The document looks exactly the same in your browser and in the final PDF file, which makes development a lot easier.

Connecting PHP with external tools

The easiest way would be to execute an external tool as a shell command. You can do it with PHP functions like shell_exec or proc_open, but it’s not very convenient.

I recommend using the symfony/process library and utilizing standard streams whenever applicable. A process should accept the input HTML through STDIN and send the resulting PDF via STDOUT. It can also report errors through STDERR. It might turn out that you won’t need any temporary files to do the job.

There are also several wrapper libraries, like phpwkhtmltopdf or KnpLabs/snappy.

For Chrome, consider using Browserless. You can choose between a free Docker image with a pre-configured Chrome and its dependencies, or a paid SaaS platform to convert your HTML to PDF. With the Docker image, it is really easy to send HTML and receive a PDF via HTTP.

Conclusion

There is a wide choice of PHP libraries and external tools which can be used to dynamically create PDF files. You should choose a combination which suits your business needs. For simple documents, you don’t need a complex rendering engine. Save disk space, CPU and RAM!

Please also remember that many tools are developed by the Open Source community and receive little commercial support. They can be abandoned at any time, or they might not support the newest PHP version from day one (which can impede migrating the rest of your app). Your dependencies have dependencies too, so take a look at composer.json when picking a library.

And if your favorite Open Source tool does not do everything you need properly – maybe try contributing? It’s a community, after all.

Do we still need recruitment agencies?

Article originally published on dev.to

As a candidate looking for jobs, I have never cooperated with any recruitment agencies. But as a senior developer responsible for tech interviews, I was forced to work with some HR companies, and I got into some really weird situations because of that. Sometimes it’s annoying; sometimes it just makes me laugh. Anyway, given my bad experiences, it’s hard for me to find any reason to pay commission to an HR/talent agency. I’ve always disliked “men-in-the-middle”. Yet many companies seek talent through agencies in addition to their own headhunting efforts.

Let me briefly explain the standard recruitment process we developed in our company. We didn’t have an HR department, so the whole process was led by me and the CTO. First, I prepared a job offer reflecting our current needs. Then the CTO posted that offer on various platforms. We received resumes and reviewed them. If you did not have a meaningful GitHub profile, we usually asked you to do our simple recruitment task: write a PHP shopping cart implementation against existing unit tests; kind of TDD, and most people solved it in under 4 hours. We also sent some technical questions to make sure we wouldn’t waste your (and our) time meeting at the office, especially if you lived several hundred miles away. If we liked your answers and your solution to the task, we would invite you to a personal meeting (around 1.5 hours) consisting of a soft interview and a tech interview. Finally, you could receive an offer from us… and turn it down if you did not like it.

Simple, isn’t it? Just two parties doing business with each other. We are the client and we buy the services that you provide. We negotiate the deal, and if everyone’s happy, you just tell us when you can start. The description above is generic; we tried to approach every candidate individually, according to the provided materials, proven experience and many other factors. You didn’t have to solve our coding task if you found another way to show off. Fun fact: we never cared about your formal education.

Unfortunately, the resumes we received from job board users were not enough, and because the company did not want to invest money in greater headhunting endeavors, like going to conferences or recruiting a full-time HR person, it decided to get some help from external HR agencies. We would present our needs to an agency, and it was supposed to find the right people for the job. We received a written recommendation for every candidate. After successfully hiring an employee, the company paid a commission to the agency.

How recruiting with agencies went wrong

I wouldn’t moan if those companies did their job well, but this is what I experienced instead:

  1. The recommendations were generic and useless. Every candidate was described as a positive, pro-active team player who aims for personal development. Blah, blah, blah. If you stripped that BS, you would find out that you’re dealing with a mediocre coder with a boring portfolio. Both the company and the candidate wasted time talking to the agency because one way or another, we had to figure out most things for ourselves. The most interesting facts were those off the record.
  2. One agency used to save these recommendations as DOCX files. I work on Ubuntu and LibreOffice did not properly render those files, throwing some contents away from the page. I had to switch to Windows, launch Microsoft Word, prepare myself a PDF file and switch back. Eventually, the HR guys learned how to create PDF themselves. What a relief.
  3. The same agency stripped down my recruitment task. It was originally a GitHub repo consisting of some important files in the root directory (composer.json, docker-compose.yml, phpunit.xml.dist and – most important – README.md!) and a tests directory. You had to write a simple implementation which passed all of my tests. I was surprised to find out that all candidates sent by that agency rewrote composer.json on their own. I asked them: “Why would you do that? The whole autoloader was defined there!” It turned out that the agency had sent the candidates only the tests directory. They did not send the full repo, of course – the candidate could find out what company made that task. The target company’s name is top secret in the beginning – and I had recklessly used it as the vendor part of the PSR-4 namespaces.
  4. The situation with the missing composer.json also involved one candidate who was so tired of endless meetings with an agency that – after learning which company he was applying to – he declined to cooperate with the agency and sent his resume directly to us. I didn’t know that story, and I was surprised to see that he had already solved our task.
  5. Speaking of endless meetings – that’s the thing I always hear if someone decides to share his or her experience with an HR agency. When you see a fancy job ad like “For our client, a market leader…” and you send your resume, you’re invited to an entry interview in the agency HQ. Then you receive our task to do at home. Then they invite you to another meeting… but not with us! If that meeting succeeds, you eventually make your way to our facility. You go there and probably hear the same questions you’ve already answered in the previous meetings, because the incompetent agency did not properly sum up your answers and we don’t have enough data about you.
  6. Sometimes we don’t receive a full resume, but only a solved task (kind of a blind recruitment). Sometimes we receive a resume with a surname washed out. Sometimes we receive only the recommendation and not the original resume.
  7. Sometimes a candidate is forced to visit the target company together with a guy from HR, who ensures that we don’t make a deal behind their back and we don’t abuse you. To me it’s like coming to an interview with your dad! He’s a man-in-the-middle. It’s not a deal between an employee and an employer. Fortunately, your “dad” stays only for a “soft interview” and he walks away during the tech interview (which I lead). He goes for a coffee and waits there until we’re done. So basically he gets paid for drinking coffee in our office.
  8. It’s funny to see how an extroverted, upbeat HR guy brings a terrified candidate to the meeting. At first glance, I would hire the HR guy, not the shy, scared dev.
  9. My coworker received an offer for my position from an agency when I decided to leave the company.
  10. My boss received an offer from an agency to hire a colleague who was just about to leave the company.
  11. When you get hired and have worked for a while, the agency calls you from time to time to ask how it’s going. That’s what my coworkers told me. As an employee, I would find this annoying.

I have no way to convince my soon-to-be-ex employer to stop wasting money on HR agencies (or to invest in the right one). But I still wonder why good devs in Poland respond to job ads posted by agencies that don’t disclose the target company’s name. The salary range is not specified either. If I invest my time talking to an HR guy, I would like to know in advance who I am going to work for and what salary I can expect!

Carving my own path

As I said in the beginning, I have never looked for a job through an agency. I have some basic googling skills, but let me show you a brief story of my career, which provides some more tips:

  1. I met my first employer during high school. It was a local news company. My friend already worked for them, and he asked if I could take some photos at local events. Still a student, I started my humble photography career in my spare time. After my high school exams (the Polish matura) and before going to university, I got a permanent job offer. They discovered I could code PHP for food, so they hired me to work on their new website during the day AND take photos in the evenings. A weird setup, but this job gave me a lot of life experience.
  2. After five years, I met my future girlfriend. I decided to relocate to Gdańsk which is ~300 km away from my hometown. I started browsing pracuj.pl, a Polish job board and soon I found an offer from an education company. I sent my resume, did a recruitment task and after three weeks I got an offer! I’ve been working there for over four years. During that time, my salary has doubled and I went from a mid developer to a team leader.
  3. My girlfriend and I decided to change our lifestyle and travel more. I needed a fully remote job. I remembered meeting a nice software house at a PHPers conference in Poznań. I sent my resume. Unfortunately, they had just stopped their recruitment process at the time. But after a month I received an e-mail inviting me to a new process, which I passed successfully. I had three interviews via phone and Skype: entry, tech and soft skills. I really liked all those interviews. One of my interviewers said he had seen my PHPers presentation about database optimization and already knew my name. I got an offer with a salary which allows me to live decently, buy more music equipment (oh yeah!) and even save for retirement.

As you can see, my ~10-year career as a developer did not include any deals with recruitment agencies. It is important to say that I’m not an easy-going, upbeat, extroverted person. I’m not good at getting my foot in the door, but at least I’ve learned how to create my resume properly, find a possible employer and get them interested in my offer. I’ve spent a lot of time browsing the Polish nofluffjobs board, where all the offers are plain and simple, with the salary specified upfront.

What’s more to do? I guess I should visit more conferences and give more talks, so that people get to know my name. I should write more blog posts and possibly contribute to Open Source. That way I’ll develop my personal brand and hopefully do business without the HR agencies.

If you work in an HR/talent agency, please improve your skills. I know that IT headhunting is very hard and it might be really frustrating to abuse LinkedIn for the whole day just to receive a bunch of rude replies (or no replies at all). But if you want to do a good job as a headhunter, you need to understand how candidates and companies behave and what they really need. We’re all here to make business happen, right?

Why we adopted a coding standard and how we enforce it

Everyone has their own code formatting preferences. Problems arise when a team consisting of several individuals works on a common code base and every developer has different preferences. It’s hard to review pull requests in that situation.

The most ridiculous and unproductive quarrels in my career were about whether we should use tabs or spaces, whether to place a curly bracket on the same line or the next one, whether to leave a blank line at the end of a file, and so on. It’s mostly a matter of personal preference; however, both sides had some interesting arguments.

Once I told my team that we should adopt common code formatting rules for PHP. It was clear to us that our repos were a mess. I asked whether we wanted to waste our time endlessly discussing every formatting detail. That’s how we adopted the PSR-2 standard.

Of course we couldn’t just reformat all our repos at once. Each of our seven PHP developers was already working on their own branches, and new things were merged into develop or master a couple of times per day. This is why we reformatted our code piece by piece, choosing the best moments to do a global reformat in PhpStorm. It was easiest when we knew that only one developer at a time was working on a given repo or module.

It is important to put a code reformatting operation in a separate commit. You shouldn’t mix formatting changes with functional changes, because it makes reviewing pull requests difficult. Some Git GUIs can hide whitespace changes, but they cannot hide syntax changes like array() to [].

Automatic code checks and analysis

To ensure that our coding standards are met, we set up PHP Code Sniffer in some repos. It verifies that the code complies with the PSR-2 coding standard. After every push to the central repository, we get a message about formatting errors. Our testing and analysis tools are launched automatically by a CI (Continuous Integration) tool such as GitLab, Jenkins, Travis, CircleCI or Bamboo.

Note that phpcs by default enforces that no classes remain in the global namespace. That’s good because, believe me, code can grow really fast, and it becomes more and more cumbersome to understand a sophisticated code base without a proper namespace structure, best organized according to PSR-4.

We can try some other static code analysis tools as well: PHPStan, PHPMetrics and so on. They do a very good job of finding basic and common code flaws. During pull request reviews we can then focus on the advanced business logic, because we know that code formatting and basic code smells have already been checked automatically. This is especially important for dynamically-typed, interpreted languages like PHP, where there is no compilation step. Also, PHPStan helps us prepare the code for new PHP versions.

Learn how to count money… or you will lose it

Does anyone know where the following difference comes from?

$ php -r "var_dump((int) (4.20 * 100));"
int(420)

$ php -r "var_dump((int) (4.10 * 100));"
int(409)

It sounds weird when I get a ticket like this: “When I set the price to $4.20, everything’s fine. But I cannot set the price to $4.10, because the system shows $4.09.” I did some research and discovered that a user entered the price in dollars and then we converted it to cents to store it in the database as an integer. That’s where the mistake was made.

These problems arise from the way the CPU stores floating-point numbers. PHP does not have a built-in decimal type, so it uses the IEEE-754 standard, in which numbers are stored in binary: both the mantissa and the exponent.

You can play with converting different numbers here: IEEE-754 Floating Point Converter. You can see that the number 4.10 is in fact stored as 4.099999904632568359375 (in single precision; double precision gets closer, but is still inexact). When you multiply it by 100, you get roughly 409.99999. Casting to (int) drops the fractional part (it does not round), so we end up with 409.
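The same double-precision arithmetic can be reproduced in any language that uses IEEE-754 doubles; this JavaScript sketch mirrors the PHP one-liners above:

```javascript
// 4.10 cannot be represented exactly in binary, so the product lands
// just below 410 and truncation loses a whole cent.
const cents = 4.10 * 100;
console.log(Math.trunc(cents)); // 409: truncation, like PHP's (int) cast
console.log(Math.round(cents)); // 410: rounding to the nearest integer fixes this case
```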

How should you operate on currencies in PHP? The best way is to use decimal types or dedicated currency classes. Money is a classic example of a value object as a Domain-Driven Design building block.
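To make the value-object idea concrete, here is a minimal sketch of such a Money class in JavaScript (the class shape and names are illustrative, not a real library API). The key points are integer minor units and immutability:

```javascript
class Money {
  constructor(cents, currency) {
    if (!Number.isInteger(cents)) {
      throw new TypeError("Money must be constructed from integer cents");
    }
    this.cents = cents;
    this.currency = currency;
    Object.freeze(this); // value objects are immutable
  }

  static fromString(amount, currency) {
    // Parse "4.10" digit-wise instead of multiplying a float by 100
    const [units, fraction = "0"] = amount.split(".");
    const cents =
      parseInt(units, 10) * 100 + parseInt(fraction.padEnd(2, "0"), 10);
    return new Money(cents, currency);
  }

  add(other) {
    if (other.currency !== this.currency) {
      throw new Error("Cannot add different currencies");
    }
    // Operations return a new instance instead of mutating this one
    return new Money(this.cents + other.cents, this.currency);
  }

  toString() {
    return `${(this.cents / 100).toFixed(2)} ${this.currency}`;
  }
}
```

Because the amount only ever exists as an integer, $4.10 is exactly 410 cents and the truncation bug from the ticket above cannot occur.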

If I had a dime for every time I’ve seen someone use FLOAT to store currency, I’d have $999.997634 – Bill Karwin

How unit tests help changing existing code

You should have some tests. Why? Because developers are afraid to make changes in code they don’t understand. This is a common problem not only for new employees, but for everyone who stands in front of a huge, complex, legacy system. They are afraid to add a new if in case it breaks the other ones. They are afraid to erase code that seems obsolete.

Instead of refactoring, people tend to add classes and methods with suffixes like _new_2 and so on. I have seen the same fear of change in QA and ops teams. When I try to refactor things, I often hear: Oh, maybe not now in case something breaks, maybe later…

Overcoming the fear of changing unknown code

I had a problem changing a complex piece of code that added symbols to online transactions. The symbols depended on product types, legal issues, invoices etc. The sales department coined some weird terms which developers didn't understand, but which were very important to the salespeople. I was told to add yet another weird symbol on top of that. The original code looked like this:

public function getDescription()
{
    if (\in_array($this->getItemsType(), [self::ORDER_MATERIAL, self::ORDER_MIXED], true)) {
        $type = 'DW';
    } elseif (self::ORDER_COURSE === $this->getItemsType()) {
        $type = 'SzO';
    } else {
        $type = $this->hasInvoice() ? 'M' : 'P';
    }

    return sprintf('Order #%u / %s', $this->getId(), $type);
}

After analyzing the original code, I wrote a unit test for all cases:

use PHPUnit\Framework\TestCase;
use Piotr\Blog\Entity\Order;

final class OrderTest extends TestCase
{
    /**
     * Test data will be provided by ordersProvider().
     * @dataProvider ordersProvider
     */
    public function testDescription(int $orderId, int $itemsType, bool $hasInvoice, string $expected)
    {
        $order = new Order($orderId);
        $order->setItemsType($itemsType)->setHasInvoice($hasInvoice);
        $this->assertEquals($expected, $order->getDescription());
    }

    public function ordersProvider()
    {
        return [
            [123, Order::ORDER_MULTIMEDIA, false, 'Order #123 / P'],
            [123, Order::ORDER_MATERIAL, false, 'Order #123 / DW'],
            [123, Order::ORDER_MIXED, false, 'Order #123 / DW'],
            [123, Order::ORDER_COURSE, false, 'Order #123 / SzO'],
            [123, Order::ORDER_MULTIMEDIA, true, 'Order #123 / M'],
            [123, Order::ORDER_MATERIAL, true, 'Order #123 / DW'],
            [123, Order::ORDER_MIXED, true, 'Order #123 / DW'],
            [123, Order::ORDER_COURSE, true, 'Order #123 / SzO'],
        ];
    }
}

Out of curiosity, I checked the code coverage for this method. It was 100%, which made me confident that all lines are executed during the test (although this does not ensure that all cases are checked). Now I was ready to add another condition to the getDescription() method:

public function getDescription()
{
    if (self::ORDER_VIDEO === $this->getItemsType()) {
        $type = $this->hasInvoice() ? 'W' : 'WP';
    }
    /* ... */
}

I ran my unit test and received no errors. Success – I didn't break anything! Since I had added some extra lines to my class, the code coverage dropped. I needed to add new test cases:

public function ordersProvider()
{
    return [
        /* ... */
        [123, Order::ORDER_VIDEO, false, 'Order #123 / WP'],
        [123, Order::ORDER_VIDEO, true, 'Order #123 / W'],
    ];
}

The test passes and my code coverage is 100% again. Now I know that the next developer who takes over this code will have a trustworthy test checking all the conditions.

Make sure your business works!

The above example is easy. Unfortunately, in everyday work writing unit tests is hard if we need to deal with poor system architecture, tight coupling and… managers refusing to let the team spend extra time on tests. Moreover, unit tests alone might not be sufficient – you also want integration and UI tests, and that takes time. But, as Robert C. Martin once ironically pointed out:

Can you imagine telling your users: You know, I don’t write tests. I just write the code. Sometimes it even works. And I ship it to you, and if there are bugs, you’re going to tell me, aren’t you?

Your company might not need 100% code coverage – it's often very difficult and even unprofitable to achieve (some people claim it's dangerous). From the business perspective, we should first write tests that protect the key business processes – and that's usually the domain logic. We want to make sure that none of the business rules we agreed upon will break after deploying new features. We want to make sure that users will still be able to place orders, and that all the orders will be properly accounted for.

However, if you want to write tests for business rules, those rules have to be clearly expressed in the code. This is often cumbersome – and I'm going to write about it some day.

How legacy code is made – part 2

In the previous post, I told you how tight coupling makes systems hard to maintain and extend with new features. Today I'm going to show you how developers often take shortcuts which then add up to technical debt.

I worked on a big e-commerce system for one of the leading Polish education publishers. When I joined the team, there were already many crazy workarounds in the system. I found one particular gem which was really funny and perfectly showcased the absurdity of ridiculous shortcuts. Let me describe it.

There was a feature for wholesale customers to import a CSV file with all the products they wanted to order. The import script read the CSV line by line and placed every product into the order. Every line started with a product code consisting of several letters and numbers. The Polish alphabet, apart from the standard Latin letters, contains 9 letters with diacritics (accents). They are not part of the basic ASCII set, so at some point the team ran into character encoding issues.

This ticket was found in the issue tracker:

Issue title: Cannot import CSV file with diacritics

Issue description: The import function does not recognize RŁ2 product codes. Under Windows, CSV files have a default CP-1250 encoding. The database uses UTF-8 encoding. For these product codes, we need to add special conditions.

Notice that the last sentence of the issue description is a trap; it suggests creating a workaround, some special case in the code. One PHP developer fell into that trap and implemented a fix like this:

while (($csv = fgetcsv($fp, 256, $delimiter)) !== false) {
    // for CP-1250 files we convert the Ł letter to UTF-8
    $csv[0] = str_replace("\xA3", "\xC5\x81", $csv[0]);
    // ...
}

Now imagine that we have some product codes with other diacritic letters. What would an average developer do? Add another special case converting that character? It is really tempting to do just that, commit the change and forget about it.

This is the problem with workarounds: people easily follow them instead of solving the problem properly. Not everyone has the courage to modify code they did not write. People think, “If it's been here for several months or years, it must be correct. Whoever did this must have known what they were doing.”

In this specific example we could simply use PHP's iconv function to solve the encoding issue properly. So it's hard to say that anyone actually took a shortcut here; in fact, the original solution was a much longer walk than it should have been.
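A sketch of the proper fix, assuming the input files are always CP-1250 (the sample data and in-memory stream are made up for illustration): convert every field with iconv instead of patching single characters by hand.

```php
<?php
// Hypothetical setup: an in-memory stream standing in for the uploaded file.
$delimiter = ';';
$fp = fopen('php://memory', 'r+');
fwrite($fp, "R\xA32;10\n"); // "RŁ2;10" encoded in CP-1250
rewind($fp);

while (($csv = fgetcsv($fp, 256, $delimiter)) !== false) {
    // Convert every field from CP-1250 to UTF-8 in one step –
    // no per-character special cases needed.
    $csv = array_map(
        static fn (string $field): string => iconv('CP1250', 'UTF-8', $field),
        $csv
    );
    var_dump($csv[0]); // "RŁ2" as valid UTF-8
}

fclose($fp);
```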

Don’t rush! You won’t save time!

We all do bad things when working under the pressure of deadlines. But hurrying often makes us work slower. Taking shortcuts and creating workarounds has consequences which stack up into something we call technical debt. Think, research and ask before you code.

New developers who join the team do not know the project and its practices. They just follow the examples in the existing code base. This can lead to funny and awkward situations. We should spend more time leaving good examples for our future coworkers (and ourselves). If we can't write good code before the deadline, we should clearly mark our workarounds and schedule refactoring as soon as possible.

How legacy code is made

“The (…) problem with legacy assets is how difficult and risky they are to invoke. They were developed with a particular feature-set in mind, and are highly coupled to those features. This makes it hard to access what you want directly without going through layers designed for very different purposes. The data you want is nearly always intertwined with other data and functionality, so that it is hard to detach just the part you want without dragging along an endless tangle of interactions and associations.”

Eric Evans, “Getting started with DDD when surrounded by legacy systems”, 2013

I received an apparently easy task. The head of sales wanted to receive a list of all products sold in one of the departments. She expected a table with just three columns: product name, default rebate name and rebated price. Almost every product had a rebate attached to attract clients.

This task should have taken maybe 15 minutes. Or maybe I should not have done it at all – salespeople should have a reporting feature for that. However, it turned out that the mechanism for fetching products with prices was tightly tangled with the way these items were presented in the shop. I couldn't retrieve all the products at once because I was forced to do it page by page (a maximum of 50 products per page). The rebate system worked only for the currently logged-in user, and everyone could have different rebates based on their account settings. I could not easily simulate rebates on other accounts. Eventually, preparing a rather simple product summary took three hours of writing very dirty code…

The shop was custom-made to meet strict specifications oriented towards an ordinary web browser. From an end user's standpoint, a shop should have a paginated list of products, a shopping cart, rebate info etc. Just the standard e-shop features. No one thought about additional use cases that might show up some day – like salespeople asking for summaries and reports. No one thought that some day we would need to share data through an API endpoint. Deadlines were chasing us. All the developers just wanted to get it done according to the specs. A legacy, proprietary PHP framework did not make it any easier.

That way, we created lots of code which became “legacy” and inefficient just half a year after deployment.

Separating code responsibilities into layers

To handle such different requirements, multi-tier/multi-layer and hexagonal architectures were invented. Sometimes the simple CRUD examples shown in a framework's documentation are not enough; complex applications need a clear separation between business/domain logic, infrastructure and presentation. We create a network of loosely coupled modules which, thanks to interfaces and dependency injection, can be easily switched and replaced. This approach has a lot of pros:

  • Every layer can be, to some extent, developed independently. Because of abstraction, the business logic does not depend on a particular way of presentation, page rendering, data storage etc.
  • Every layer can have a separate set of automated tests.
  • While editing the code of one layer, we are not distracted by the code of other layers.
  • The same data can be presented in multiple ways. We just attach a different presentation layer implementation (HTML, PDF, JSON API, CLI…).
  • Business rules are clear. We don’t need to look for them in strange places, like templates.

“Allow an application to equally be driven by users, programs, automated test or batch scripts, and to be developed and tested in isolation from its eventual run-time devices and databases.”

Alistair Cockburn, “Hexagonal architecture”
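The separation described above can be sketched as follows. All class and method names here are hypothetical (using PHP 8 syntax): the domain code depends only on an interface (a “port”), and infrastructure adapters are swapped underneath it – in a test we inject an in-memory implementation instead of a real database.

```php
<?php
// Domain layer: business logic depends only on an abstraction.
interface OrderRepository
{
    /** @return Order[] */
    public function findPaidOrders(): array;
}

final class Order
{
    public function __construct(public int $priceCents) {}
}

final class SalesReport
{
    public function __construct(private OrderRepository $orders) {}

    public function totalCents(): int
    {
        $total = 0;
        foreach ($this->orders->findPaidOrders() as $order) {
            $total += $order->priceCents;
        }

        return $total;
    }
}

// Infrastructure layer: an interchangeable adapter. A database-backed
// implementation could replace this one without touching SalesReport.
final class InMemoryOrderRepository implements OrderRepository
{
    /** @param Order[] $paidOrders */
    public function __construct(private array $paidOrders) {}

    public function findPaidOrders(): array
    {
        return $this->paidOrders;
    }
}

$report = new SalesReport(new InMemoryOrderRepository([new Order(410), new Order(420)]));
var_dump($report->totalCents()); // int(830)
```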

In practice, using a multi-layered, hexagonal architecture with loose coupling requires a lot of experience. I discovered that it's really hard to convince a team to talk about abstract models. People tend to ask about the details very early, like the database or the graphics – and there's nothing wrong with that. People need to imagine the final effect, a real use case – it makes it easier for them to understand the project.

During project discussions, I suggest not avoiding talk about implementation. However, we should strive for transparent, fluent code. It's worth negotiating some additional time with the client, so that we can deliver a good design that will save them maintenance costs in the future.