Piotr Horzycki - Java and PHP developer’s blog

8 Programming Myths That Impede Your Career

2022-04-18T11:00:00+00:00

During my 14 years in software development, I’ve seen, and fallen victim to, numerous myths and fads of the IT industry. Common beliefs and misconceptions often impeded my career because I wasted energy on activities that did not bring the expected benefits.

Here’s my guide on how to avoid fighting for a lost cause as a software developer.

We must have Scrum

Most companies have management issues, and once a software developer experiences problems, they believe there must be a clever solution. Some silver bullet, some magic pill to end all pain.

Scrum is often depicted as such a magic wand, but a struggle to adopt it may become even more frustrating than the initial problems. We think that Scrum is a perfect solution, so we assume that it’s all our fault that we can’t get it to work.

Leading people is a lot more than telling them what to do. Building an organization’s culture is a lot more than setting up Jira.

Although Scrum is a “lightweight framework”, it still imposes rules that some organizations will be unable to adhere to:

We work in constant timeframes (“sprints”) and try not to interrupt that work.
We have meetings (daily, review, planning, retro) in constant time and place.

If the company does not respect these two rules, Scrum won’t work. However, there are so many other things you can do to improve the management culture:

Establish a better feedback loop with business people. You need to know each other’s problems. You need to know how your technical solutions perform in real life. They need to know what it takes to build an IT system.
Discuss business priorities. Inform your client that you can’t do everything at once. Explain that multitasking causes bad performance.
Introduce high quality standards in your team. Static code analysis, tests, CI/CD, any kinds of automation.
Insist on transparency and good communication.
Integrate your team. Go out with them, have fun, get to know each other.

100% code coverage

Code coverage is a metric that defines a percentage of lines of code executed during tests. It is a common belief that higher test coverage means better quality.

How about this example:

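import org.junit.jupiter.api.Test;
import static org.junit.jupiter.api.Assertions.assertTrue;

class CalculatorTest {
  // Calculator stands for whatever class is under test
  private final Calculator calculator = new Calculator();

  @Test
  void shouldPerformAddition() {
    int result = calculator.add(2, 2);

    // Phony assertion: it passes no matter what add() returned
    assertTrue(true);
  }
}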

This test verifies only the fact that the code has been executed without errors. But there is so much more to do! We need to check the logic, verify calculations, find edge cases.

In general, the more complex the code is, the more test cases you should have. A single line of code may be covered by multiple tests.

If your team aims only at the highest line coverage possible, it might:

give you a false sense of safety while in fact, the tests may pass serious bugs, and
encourage people to write phony assertions just to boost the overall metric.

A great way to measure the quality of tests is mutation testing. Programs like PITest (Java) or Infection (PHP) generate different versions of your production code, called mutants, for example by altering conditions, removing lines, and so on. If tests do not fail despite these changes, it means they would not catch such bugs. Usually these problems get fixed by writing more precise test cases.

Rewrite is necessary

Developers love greenfield projects because they treat them as playgrounds to try all the fancy techniques, tools and frameworks they crave.

While maintaining an old and messy project, there’s usually a temptation to rewrite it from scratch. “This time we’ll do it better,” everyone thinks. This is rarely the truth.

Apart from obvious technical circumstances, like Adobe Flash being retired, a rewrite usually causes more harm than good. Martin Fowler in his book “Refactoring” tells a story of a software project that went down because of a fatal attempt to rewrite everything. The project was late, over budget and didn’t work properly.

My first action when dealing with a legacy project is to write proper tests, which are often missing. I want to understand the system’s behavior, including all hidden behaviors and side effects. With a good set of unit, integration, API and E2E tests I can proceed to refactor the most annoying parts of the system. Tests make me confident that I don’t break any of the existing behaviors that users actually rely on.

There are other efficient strategies to deal with legacy systems that do not involve a major rewrite: Strangler Pattern, Anti-Corruption Layer, Facade. All of them assume that you start building new modules step by step, but still route traffic to the old code. When a new module is ready, you just switch traffic.

Instead of conducting a costly rewrite that takes months to complete, it’s better to improve the project step by step and have a tight feedback loop. You can release a small fix every week and see if it’s working properly and how it contributes to the overall project quality.

Sophisticated architecture is cool

A lot of engineers believe that complicated things are more professional. When they master a difficult technique, they feel an urge to prove themselves and use the newly acquired skill in real life.

This goes along with trying to build the most flexible, dynamic and abstract system possible. As the source code is split into more and more layers of abstraction, it becomes more difficult to understand and maintain.

The urge to complicate software design is caused by:

The fear of legacy code. Developers traumatized by old, messy codebases are trying to avoid them so hard by using fancy design patterns that they’re actually making a new mess that only looks clever on the outside.
The fear of change. Developers notoriously ask this question: “What if business requests a change?”. This often goes hand in hand with the business imposing strict deadlines. Developers try to anticipate these requests by building a “flexible” system, but development gets delayed because of all the crazy tricks in the code.

I have two principles that help me overcome those fears: Just In Time and Keep It Simple, Stupid. I don’t have to build an empire on day one. Let’s start simple.

There has to be a balance. Not every project requires CQRS. Not every project requires an ORM. Not every project requires a ton of interfaces, abstractions, layers, providers, resolvers, adapters, or whatever fancy design pattern you love.

When I’m forced by business people to deliver working software fast, I always tell them: okay, I can cut corners, but the future changes will take more time. I warn them, and it’s fine. Business sometimes wants just to validate an idea, or quickly solve an issue. If you’re patient enough, they will eventually understand the technical consequences.

Also, remember the more sophisticated your architecture is, the more difficult it will be to onboard new developers.

Must have the latest version

Most software projects depend on external libraries and tools. It can be satisfying to upgrade to all new versions available, but how can you be sure that it’s better, or at least still works?

Despite semantic versioning promises, and all the effort that went into Quality Assurance, even the most professional software vendors make mistakes. Even a tiny update from version 1.2.0 to 1.2.1 can introduce bugs.

On the other hand, regular updates are important because security issues keep being discovered. It’s also cool to work on recent software, and it makes it easier to attract new talent.

You can save yourself from trouble and make updates easier by implementing integration and E2E tests.

Must follow all the trends

The so-called “Hype-Driven Development” (or ironically, “Resume-Driven Development”) has many victims. There can be a strong neophyte effect after reading a popular book or attending a tech conference. People think that their workplace can be improved only by applying all the recent discoveries.

Somewhere between 2015 and 2018, there was a huge hype on microservices. Conference speakers claimed that we should split old monoliths into flexible microservices, just because Netflix does this. They didn’t warn about all the additional problems caused by the new approach: performance, stability, data separation, and so on. Several years later there were voices saying that microservices are not for everyone and you should consider a modular monolith.

It’s good to know what happens in the industry, but you shouldn’t adopt every new buzzword. Carefully analyze whether the new solution fits your project and organization.

We don’t need meetings

Business people focus on meeting people, talking to them, building relationships and a network. Developers love to focus on the code. They depict every meeting as an interruption, “not work.”

How many times have you heard a meeting being concluded with these words: “Ok, let’s go back to work.” Was that meeting not work? Of course it was, but for developers, only coding feels like “real work.” This is a mistake.

Developing software is a team effort, and to build a team (and a product), you have to talk to each other.

If you feel overwhelmed by meetings, possible solutions include:

Having a clear goal and agenda for every meeting. If you receive an invitation without these things specified, ask for details or decline.
Putting all the meetings (like Scrum ceremonies) in one day. This works really well for my team.
Adding “Focus time” or similar items in your calendar, so that other people know you’re busy. You have a right to go offline from time to time!
Picking a moderator for every meeting. That person is responsible for making sure a meeting is effective and comfortable. It can be a Scrum Master, but doesn’t have to be.
Utilizing every tool possible to make communication better: webcams, Miro, Notion, Google Workspace. Instead of just talking, make everyone collaborate on a document, diagram, drawing.

Business doesn’t understand us

It’s common among developers to think that the “ordinary” business people neither understand nor appreciate how “clever” the developers are. Whether business is requesting crazy features, imposing deadlines, or just complaining about broken software - I can often hear this voice inside development teams: “oh, they don’t understand.”

Business people focus on getting clients and making money. Developers focus on technology. It’s important to both parties to explain difficult topics to each other. Why is the business doing all these pivots? Why are the developers talking about a rewrite again? Just talk to each other, and it will already solve a lot of problems.

Another important thing to do is to get out of your room and gain a wider perspective. When you put so much effort into solving a small coding problem, it can be totally insignificant from an overall company’s standpoint. Relax and focus on something that matters more.

How to set a font in a PDF document

2021-07-17T14:00:00+00:00

In this article, you will learn how to set custom fonts when converting HTML to PDF. We will cover several conversion tools, including Headless Chrome, WeasyPrint, Prince, wkhtmltopdf and PHP libraries: mPDF, TCPDF and Dompdf.

Some theory about fonts and text

Before we start, there are some terms you should familiarize yourself with.

Most documents are based on text. To build a piece of text you need characters that will make letters and words.

A character set defines mappings between numeric codes and characters: letters, digits, symbols, and so on. For example, in the ASCII table, the decimal number 65 represents a Latin letter A. This is an abstract representation; we still don’t know how this letter should be drawn on screen or printed.

An encoding specifies how the character codes will be represented as bytes. For single-byte encodings such as ASCII this is simple: a byte value 65 (decimal) is equal to ASCII code 65, which represents the capital letter A. However, if a character set exceeds the 256 possible values of a single byte, we dive into the world of multi-byte encodings. The most popular ones for the Unicode standard are UTF-8, UTF-16 and UTF-32, along with the older UCS-2 and UCS-4.
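
To make the difference between a character code and its encoded bytes more tangible, here is a small shell sketch. It assumes a POSIX shell with printf, od, tr and wc available; the variable names are only illustrative. The letter a-ogonek (U+0105) is written with octal escapes so that the script does not depend on the encoding of the script file itself.

```shell
# ASCII letter "A": a single byte with the value 0x41 (decimal 65).
ascii_bytes=$(printf 'A' | od -An -tx1 | tr -d ' \n')

# Polish letter a-ogonek (U+0105) encoded in UTF-8 takes two bytes: 0xC4 0x85.
# \304\205 are those two bytes written as octal escapes.
utf8_bytes=$(printf '\304\205' | od -An -tx1 | tr -d ' \n')
utf8_length=$(printf '\304\205' | wc -c | tr -d ' ')

echo "A        -> $ascii_bytes"
echo "a-ogonek -> $utf8_bytes ($utf8_length bytes)"
```

One character can therefore occupy more than one byte, which is why a PDF tool has to know the encoding before it can match characters to glyphs.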

A font is a set of glyphs - readable characters and other symbols that represent a character set. A font data file contains either bitmaps or vectors that make up all the character shapes.

The first thing you need to properly render your HTML code to PDF is a character set declaration:

<html>
  <head>
    <meta charset="utf-8">
  </head>
  <body>
    ...

Having these basics described, we can start using fonts and typing!

Picking a proper font

To use a custom font, first you have to choose one that covers all the characters you need in your document or its part. This should be common sense, but sometimes we (or the client) forget about it.

For example, if you pick a fancy header font and your language includes characters beyond the basic Latin alphabet (accents, umlauts, ogonki, Cyrillic alphabet etc.), check if the font contains glyphs for them! Either use a website that allows testing fonts or download the font files and try them in some text editor or graphics program.

Usually, there are no “one-size-fits-all” solutions. Some fonts do not have “italic” or “bold italic” versions on purpose. Some fonts contain only uppercase letters (capitals). Other fonts, like fancy handwriting-like ones, are not readable in small sizes.

Font types supported by PDF

The most common font file formats are OpenType, TrueType and Type 1. They differ in features and the way of describing shapes. All of them can be used in a PDF document.

The so-called “web fonts” are usually compressed with a WOFF2 format which is not supported by PDF. Google Fonts, a popular web font provider, fortunately offers a “Download family” feature which gives you the full TrueType archive.

However, if you only have a WOFF2 font file, you can still convert it to TrueType or OpenType. Either use an online tool, or the Linux terminal:

sudo apt install fontforge woff2
woff2_decompress font.woff2

Selecting a font in CSS

Let’s remind ourselves how to pick a font in CSS. The most basic syntax looks like this:

body {
  font-family: Verdana, Arial, sans-serif;
}

The example above means that we prefer the Verdana font, but if it’s not available, we recommend substituting it either with Arial or any sans-serif font. We depend only on fonts available in a certain system. Every OS has a basic set of fonts, but you can also install your own.

Moreover, every PDF reader provides standard Type 1 fonts, including Times-Roman, Helvetica, Courier and Symbol.

You might want to use a custom font in your document without installing it globally in the operating system. In the example below, we import a font file and assign a local name Lato. We declare this is a normal (not italic) font of a regular weight:

@font-face {
  font-family: 'Lato';
  font-style: normal;
  font-weight: normal;
  src: url('file:///path/to/my/project/lato.ttf') format('truetype');
}

body {
  font-family: Lato;
}

The @font-face syntax works fine with any Chromium-based tools, and also WeasyPrint and Prince. Other tools make selecting a font a bit harder.

Providing a font to wkhtmltopdf

For security reasons, wkhtmltopdf blocks any access to remote font files. It cannot even read a font file from a local drive.

To pick a custom font, we will use a data URL trick. First we have to encode the font file with Base64. We can use either the PHP function base64_encode(), the Linux console command base64 or any Base64 encoder available online.

Then we copy the encoded file contents and paste into the CSS:

@font-face {
  font-family: 'CaslonItalic';
  src: url(data:font/truetype;charset=utf-8;base64,PASTE_IT_HERE) format("truetype");
}

body {
  font-family: CaslonItalic;
}

Because an encoded font file can be very long, it’s more convenient to move the @font-face declaration to a separate CSS file and then use @import to attach it to the main stylesheet. You can decide if you want to include that encoded file in your repository, or generate it on-demand in some build script.
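
If you go for the build script option, the generation step could look roughly like the shell sketch below. The font_to_css function name and the file names in the usage line are my own placeholders, and I assume a base64 command is available (it ships with GNU coreutils and with macOS). Line breaks are stripped from the encoded data because some base64 implementations wrap their output.

```shell
# Print an @font-face rule with the given font file embedded as a data URL.
# Usage: font_to_css <path-to-ttf> <font-family-name>
font_to_css() {
    font_file=$1
    family=$2

    # Encode the font and remove any line wrapping added by base64.
    encoded=$(base64 < "$font_file" | tr -d '\n')

    printf "@font-face {\n  font-family: '%s';\n  src: url(data:font/truetype;charset=utf-8;base64,%s) format(\"truetype\");\n}\n" \
        "$family" "$encoded"
}
```

A call such as font_to_css CaslonItalic.ttf CaslonItalic > caslon-font.css would then produce the separate stylesheet.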

Providing a font to Dompdf

The Dompdf PHP library has its internal font metrics engine which incorporates local caching. The mechanism is cumbersome because you have to manually register the font before using it.

Below, I assume that you’ve installed Dompdf with Composer, hence the vendor directory.

This can be done with a load_font.php script which is available in the dompdf/utils package. Since it would require copying another repo to the vendor/dompdf/dompdf directory, I don’t really like this method.

Another way is to extend your PDF rendering code. During the first round, Dompdf will create cache files in the vendor/dompdf/dompdf/lib/fonts directory - which means your script must have write access there. Next time, those cached resources will be used to embed the font in a PDF:

use Dompdf\Dompdf;
use Dompdf\Options;

$fontDirectory = '/home/someuser/fonts';

$options = new Options();
$options->setChroot($fontDirectory);

$pdf = new Dompdf($options);
$pdf->getFontMetrics()->registerFont(
    ['family' => 'CaslonItalic', 'style' => 'italic', 'weight' => 'normal'],
    $fontDirectory . '/CaslonItalic.ttf'
);
$pdf->loadHtml($html);
$pdf->render();
file_put_contents('output.pdf', $pdf->output());

The setChroot() call is necessary for security purposes, so that Dompdf won’t access any system files.

Note that when adding a font file you must specify its corresponding style and weight.

Setting a custom font in mPDF

mPDF has a decent documentation which explains a lot of nuances related to international font handling.

To use your own font you have to register it. There is one major drawback: you have to invent a font family name that’s all lowercase and without any spaces or other special characters. So instead of font-family: 'DejaVu Sans' you have to enter font-family: dejavusans.

You can register as many font directories as you need. Moreover, you’ll need a temporary directory to store font cache. By default it’s vendor/mpdf/mpdf/tmp/mpdf/ttfontdata (assuming you’ve installed mPDF with Composer) and your script must have write permissions for that. Fortunately you can set another cache path:

use Mpdf\Config\ConfigVariables;
use Mpdf\Config\FontVariables;
use Mpdf\Mpdf;

$fontDirectory = '/home/someuser/fonts';

$defaultConfig = (new ConfigVariables())->getDefaults();
$fontDirs = $defaultConfig['fontDir'];

$defaultFontConfig = (new FontVariables())->getDefaults();
$fontData = $defaultFontConfig['fontdata'];

$mpdf = new Mpdf([
    'fontDir' => \array_merge($fontDirs, [
        $fontDirectory,
    ]),
    'fontdata' => $fontData + [
        'caslon' => [
            'I' => 'CaslonItalic.ttf',
        ],
    ],
    'tempDir' => $fontDirectory . '/tmp',
]);
$mpdf->WriteHTML($html);
$mpdf->Output('output.pdf', 'F');

When registering font files, you have to declare their style with R, B, I and BI identifiers, corresponding to “regular”, “bold”, “italic” and “bold italic” styles, respectively.

Custom fonts in TCPDF

TCPDF follows a similar font registration pattern to the previous two libraries. You can do it in two ways - either in the command line, or directly in PHP code.

Thanks to the command line you can embed the conversion commands in some Continuous Delivery pipeline that builds your application. Instead of committing the temporary font files, you can rebuild them every time with a simple command like this (assuming you’re using Composer):

php ./vendor/tecnickcom/tcpdf/tools/tcpdf_addfont.php -b -f 32 -o /home/someuser/fonts/tmp/ -i CaslonItalic.ttf

If you don’t use the command line, you can still do the same conversion thing in PHP using the TCPDF_FONTS class:

$fontDirectory = '/home/someuser/fonts/';

// The trailing slash is mandatory here
$tempDirectory = $fontDirectory . 'tmp/';

$fontname = TCPDF_FONTS::addTTFfont(
    $fontDirectory . 'CaslonItalic.ttf', 'TrueTypeUnicode', '', 32, $tempDirectory
);

$pdf = new TCPDF('P', 'mm', 'LETTER');
$pdf->AddPage();
$pdf->AddFont($fontname, 'I', $tempDirectory . $fontname . '.php');
$pdf->writeHTML($html);
file_put_contents('output.pdf', $pdf->Output('', 'S'));

The addTTFfont() method parses the original font file and creates three temporary files in the directory of your choice. Obviously, the script must have write access to that path. The return value holds a font name which is usually a lowercase string. With the AddFont() method you register the PHP font definition file created earlier.

Now you can use the font inside the document like this (remember about the lowercase font family name):

body {
  font-family: 'caslon';
  font-size: 72pt;
  font-style: italic;
}

Instead of using CSS, you can also set the current font with PHP:

$pdf->SetFont($fontname, 'I', 72);

The mysterious number 32 which appears both in the command line call and the addTTFfont() method is the font descriptor flag from the PDF specification. Fixed and italic fonts are usually autodetected, but for other types you have to specify an exact flag value:

Font descriptor flag	Meaning
1	fixed font
4	symbol font
8	script (handwriting)
32	non-symbol (standard) font
64	italic font
65,536	all caps (no lowercase letters)
131,072	small caps
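
The values in the table are individual bits of the font descriptor’s Flags field, so, as far as I can tell from the PDF specification, a font with several characteristics uses the sum of the matching values. Here is a minimal shell sketch of that arithmetic; the variable names are mine and are not part of TCPDF:

```shell
# Flag values taken from the table above.
FLAG_FIXED=1
FLAG_NONSYMBOLIC=32
FLAG_ITALIC=64

# A standard (non-symbol) italic font: combine the bits with bitwise OR.
flags=$(( FLAG_NONSYMBOLIC | FLAG_ITALIC ))
echo "Combined flags: $flags"

# Check whether a particular bit is set in the combined value.
if [ $(( flags & FLAG_ITALIC )) -ne 0 ]; then
    is_italic="yes"
else
    is_italic="no"
fi
if [ $(( flags & FLAG_FIXED )) -ne 0 ]; then
    is_fixed="yes"
else
    is_fixed="no"
fi
echo "italic: $is_italic, fixed: $is_fixed"
```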

TCPDF supports neither OpenType nor WOFF2 fonts.

My book “Mastering PDF with PHP” is out now on Leanpub!

Learn how to create, read and edit PDF files in your PHP applications!

How to encrypt a PDF document in PHP

2021-06-16T16:00:00+00:00

If your business uses Portable Document Format to send private and sensitive data like bank documents, you might need to use password protection. In this article you’ll see how to encrypt PDFs with tools available for PHP.

Types of encryption

To protect document contents, an encryption algorithm has to be used. PDF supports symmetric ciphers which use a password specified by the document creator to build an encryption key. The person who receives the document has to enter that password in order to decrypt the document.

(Screenshot: Document Viewer on Ubuntu asking for a password.)

We have two algorithms to choose from, with different key lengths. The longer the encryption key used, the harder it is to crack the code:

RC4. The first algorithm supported by PDF. Unfortunately it is perceived as insecure because multiple vulnerabilities were discovered. Still, it’s the only algorithm implemented by most free PDF generators. Available key lengths are 40 and 128 bits.
Advanced Encryption Standard. This algorithm is approved even by the U.S. government to protect classified information. There is no foreseeable possibility to crack the AES cipher in a reasonable time; with modern hardware it would take billions of years. If you receive password-protected bank documents, they’re most likely encrypted with AES. Available key lengths are 128 and 256 bits.

User permissions

When encrypting a PDF document, you can specify two passwords. One of them is for you as the document owner, so you can perform any editing and printing tasks. You can also set a user password that gives limited access to the document.

You decide what privileges you give to other people. For example, you might disallow full quality prints, so that users have only a preview. You might disable editing, disassembling pages structure, filling forms, and so on.

It is, however, up to the PDF reader to enforce these rules. A hacker could implement their own reader to disobey the limitations. Once a document is decrypted with a password, a reader has full access to it and can perform any operations.

In the examples below we will set separate owner and user passwords, but the latter is always optional.

Encryption with TCPDF

The TCPDF library is the only free tool I know which supports all ciphers, including the strongest 256-bit AES. Internally, TCPDF uses PHP’s OpenSSL and Hash extensions to perform encryption.

$pdf = new TCPDF('P', 'mm', 'LETTER');
$pdf->SetProtection(
    ['print', 'modify', 'copy', 'annot-forms', 'fill-forms', 'extract', 'assemble', 'print-high'],
    'test123', 'test456', 3
);
$pdf->AddPage();
$pdf->writeHTML('Hello world');
file_put_contents('output.pdf', $pdf->Output('', 'S'));

The last argument to the SetProtection() method is a number specifying the algorithm and key length. The numbers start with 0 for a 40-bit RC4, and end with 3 representing 256-bit AES.

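Note that, according to the TCPDF documentation, the first argument lists the permissions you want to block rather than grant. The example above therefore leaves document users with little more than displaying the document on a screen, while an empty array does not block any operations.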

Encryption with mPDF

The mPDF library is better in HTML rendering than TCPDF. Unfortunately it doesn’t support such a wide range of encryption algorithms. You can only choose between 40-bit and 128-bit RC4 ciphers. The key length is specified as the fourth argument to the SetProtection() method below:

use Mpdf\Mpdf;

$pdf = new Mpdf(['format' => 'LETTER', 'orientation' => 'P']);
$pdf->SetProtection(
    ['print', 'modify', 'copy', 'annot-forms', 'fill-forms', 'extract', 'assemble', 'print-highres'],
    'test123', 'test456', 128
);
$pdf->writeHTML('Hello world');
$pdf->Output('output.pdf', 'F');

Encryption with Dompdf

The Dompdf library is quite good at rendering HTML and CSS code into a PDF. When it comes to encryption, it supports only the weak 40-bit RC4 cipher:

use Dompdf\Dompdf;

$pdf = new Dompdf();
$pdf->getCanvas()
    ->get_cpdf()
    ->setEncryption('test123', 'test456', ['print', 'modify', 'copy', 'add']);
$pdf->loadHtml($html);
$pdf->render();
file_put_contents('output.pdf', $pdf->output());

Also, Dompdf supports only basic four permissions from an older PDF standard.

Encryption with FPDF

The FPDF library does not have built-in encryption, but there’s a separate code snippet to implement 40-bit RC4.

Setasign offers a commercial library that supports encryption up to 256-bit AES.

Encrypting an existing file with command line tools

If your favorite PDF generator does not offer encryption, you can use PDFtk Server to encrypt an existing file with 128-bit RC4. PDFtk does not support AES.

With PDFtk, protecting a file is as simple as running the command below in your terminal:

pdftk input.pdf output encrypted.pdf owner_pw test123 user_pw test456

To use AES, you need to pick a commercial tool, for example Coherent PDF.

Summary

Document encryption is a good way to protect the document contents from being accessed by an unauthorized person. Banking documents are often sent via email and they could be stolen from a person’s account. With strong encryption, it’s not possible to read them.


Executing shell commands from a PHP script

2021-04-02T19:00:00+00:00

If you need to call an external program from your PHP script, for example to create a PDF file or convert images, there are several ways to do that.

I strongly recommend using the Symfony Process Component. It wraps around native PHP functions like proc_open() and provides an extra level of security. It is also very convenient thanks to its object-oriented interface. Take a look at this example:

use Symfony\Component\Process\Exception\ProcessFailedException;
use Symfony\Component\Process\Process;

// call the wkhtmltopdf program with two arguments; specify input and output
$process = new Process(['wkhtmltopdf', '-', '-']);

// send something to the command's input
$process->setInput($html);

try {
    // wait for process execution
    $process->mustRun();

    // get output from the command
    $pdf = $process->getOutput();
} catch (ProcessFailedException $exception) {
    echo $exception->getMessage();
}

Most guides around the web will simply tell you about functions like exec(), but that’s not how you should do it. You can either end up with security issues in your application, or just lack features.

This has been a short introduction. If you have more time to read, let me show you all the details of calling an external process from a PHP script.

Input, output and exit codes

A program running under an operating system is a process. This is the word we are going to use. We can run processes for example by entering commands inside a terminal.

A command usually consists of the program file name followed optionally by a set of arguments. Every program expects different arguments, and if we don’t know them, we simply ask the program for help:

ls --help

A process has several connectors to the surrounding environment. It’s using them to transfer data over streams.

A stream is just a sequence of bytes. There are three default streams in a terminal:

standard input (STDIN), connected to the keyboard
standard output (STDOUT), connected to the screen
standard error output (STDERR), either displayed on the screen or written to a log file

Using streams gives us great flexibility. Instead of operating on a real console or real files in PHP, we can send a string variable to a process and then read the output into another variable. We don’t need to remember to delete a file. This will help a lot for example with PDF conversions.

When calling processes that operate on files by default, sometimes a hyphen (-) is used in place of a file name argument to indicate that the process should read from STDIN instead, or write to STDOUT instead of a real file.
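
As a quick illustration of the hyphen convention, the shell sketch below asks cat to print a regular file followed by whatever arrives on STDIN. The file name and the texts are made up for the demo, and not every program supports the hyphen, so check the documentation of the tool you call.

```shell
# Prepare a temporary file to combine with data coming from STDIN.
tmp_dir=$(mktemp -d)
printf 'line from a file\n' > "$tmp_dir/header.txt"

# cat reads header.txt first, then "-" tells it to continue with STDIN.
combined=$(printf 'line from stdin\n' | cat "$tmp_dir/header.txt" -)
echo "$combined"

rm -r "$tmp_dir"
```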

Additional input to a process consists of environment variables. These can be some user-specific data stored in their home directories, or variables provided at the process startup. They make the arguments list shorter because we don’t have to specify common settings on every command call. Perhaps the best-known environment variable is PATH, which stores a list of directories where commands are searched for.
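
Here is a small shell sketch of both ideas: reading PATH and providing a variable only at the startup of one process. DEMO_REPORT_FORMAT is a name I made up for the example, and I assume it is not already set in your environment.

```shell
# PATH is a colon-separated list; take the first directory from it.
first_path_entry=${PATH%%:*}
echo "Commands are searched first in: $first_path_entry"

# Provide a variable only for this one child process.
child_value=$(DEMO_REPORT_FORMAT=pdf sh -c 'echo "$DEMO_REPORT_FORMAT"')

# The parent shell never had the variable set.
parent_value=${DEMO_REPORT_FORMAT:-unset}

echo "child sees: $child_value, parent sees: $parent_value"
```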

A process can also return a special code - exit code - which indicates success or failure. The convention is to use 0 in case of success and any other code from 1 to 255 to signal an error or another special situation.
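
You can try this in a terminal. In the sketch below, $? holds the exit code of the last command; the path passed to ls is deliberately one that should not exist. The exact non-zero code differs between ls implementations, so I only check that it is not 0.

```shell
# A successful command exits with 0.
true
ok_status=$?

# A failing command exits with a non-zero code and explains itself on STDERR.
# Here STDERR is captured into a variable and STDOUT is discarded.
error_message=$(ls /this/path/should/not/exist 2>&1 >/dev/null)
fail_status=$?

echo "success: $ok_status"
echo "failure: $fail_status ($error_message)"
```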

You can connect several processes in a chain using the pipe operator. This means that the output of the first process is tied to the input of the second process, and so on. Such mechanism is commonly used in terminals, for example to paginate long output:

ls -al | less

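In the example above, the output of the ls command was sent to the less command. Note that both commands in a pipe are started together: if ls failed, less would still run and simply receive whatever output (possibly none) was produced. To run the second command only when the first one succeeds, use the && operator instead of a pipe.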

By default, we have to wait until each process terminates. This means our PHP script will also be paused when executing an external command. If you add an ampersand (&) at the end of the command, the command will run independently. It won’t block your script and will last even after the script stops. You might need this especially when launching lengthy processes like generating a 100-page report.

It’s good to have some control over such a background action. Fortunately, every process receives an identifier after being opened. The process identifier (PID) can be used later for example to check if the process is still running or to shut it down.
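
In a shell, that workflow can be sketched like this. The sleep command stands in for a lengthy job; $! and kill -0 are standard shell features, while the variable names are just for the demo.

```shell
# Start a long-running command in the background; the shell does not wait for it.
sleep 30 &
pid=$!    # $! holds the PID of the most recently started background process

# kill -0 sends no signal; it only checks whether the process exists.
if kill -0 "$pid" 2>/dev/null; then
    running_before="yes"
else
    running_before="no"
fi

# Ask the process to terminate, then wait to collect its exit status.
kill "$pid"
exit_status=0
wait "$pid" 2>/dev/null || exit_status=$?

echo "was running: $running_before, exit status after kill: $exit_status"
```

The exit status of a process killed by a signal is non-zero; shells typically report 128 plus the signal number.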

Now that we have the general rules covered, we can switch to the PHP world.

Basic execution from a PHP script

There are four (!) PHP functions whose purpose is to run an external command and return output:

exec() accepts command as input and returns the last line from the result of the command. Optionally, it can fill a provided array with every line of the output and also assign the return code to the variable. On failure, the function returns false.
passthru() executes a command and passes the raw output directly to the browser. The PHP documentation recommends it when binary output has to be sent without interference.
shell_exec() executes a command and returns the complete output as a string. It does not provide the exit code. The function return value is confusing because it can be null both if an error occurred or if the command produced no output.
system() acts like passthru(), but it also returns the last line of the output. This function works well only with text output.

To confuse you even more, PHP has a backtick operator which works just like shell_exec():

$output = `ls -al`;

I don’t use any of these functions because none of them provides full control over streams.

Escaping arguments

Sometimes the full command is made from several parts, for example a file name coming from a user. We have to filter such input data properly to make sure it does not contain any unescaped special characters like spaces, quotes, backticks, slashes, and so on. They could either break the command or cause security issues.

An attacker could inject any other command and perhaps access protected data:

// this is terribly unsafe
system('touch ' . $_POST['filename']);

/*
 * $_POST['filename'] could be equal to something like:
 *   a; cat /etc/passwd
 * so the full command would become:
 *   touch a; cat /etc/passwd
 * The shell would run both commands, one after another,
 * which would reveal the contents of a protected file.
 */

Every command argument should be filtered by the escapeshellarg() function. This will ensure that your input data will be properly treated as a single safe argument:

// this is safer, but still ugly
system('touch ' . escapeshellarg($_POST['filename']));

Of course, the filename parameter should be filtered further to make sure an attacker cannot reach directories outside the current one:

// this is safe assuming we are in a special directory for uploaded content
system('touch ' . escapeshellarg(basename($_POST['filename'])));
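If you are curious what these two functions actually do to the input, the sketch below shows it on a hypothetical malicious value. The results in the comments are what escapeshellarg() produces on Unix-like systems; Windows quoting differs:

```php
<?php
// Hypothetical malicious input.
$input = 'a; cat /etc/passwd';

// On Unix-like systems escapeshellarg() wraps the value in single quotes,
// so the shell sees one literal argument instead of two commands.
$escaped = escapeshellarg($input);
// $escaped is now: 'a; cat /etc/passwd' (including the quotes)

// An embedded single quote is closed, escaped and reopened.
$tricky = escapeshellarg("it's");
// $tricky is now: 'it'\''s'

// basename() drops any directory part, which blocks path traversal.
$name = basename('../../etc/passwd');
// $name is now: passwd
```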

Opening and controlling a process

The proc_open() function offers the most control over process execution. It requires a lot more code than the one-liners mentioned earlier, but it pays off.

Here’s an example of calling a wkhtmltopdf program which converts an input HTML document to a PDF. We’ll supply the HTML contents to STDIN and read the output from STDOUT:

$html = 'Test';

$descriptors = [
    0 => ['pipe', 'r'],  // we will write to stdin
    1 => ['pipe', 'w'],  // we will read from stdout
    2 => ['pipe', 'w'],  // we will also read from stderr
];

// this array will contain three pointers to all three pipes
$pipes = [];

// we're starting the process now
$process = proc_open('wkhtmltopdf - -', $descriptors, $pipes);
if (is_resource($process)) {
    // the process has been opened, we can send input data
    fwrite($pipes[0], $html);

    // you have to close the stream after use
    fclose($pipes[0]);

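    // now we're reading binary output
    // PHP will wait until the stream is complete
    // (a program that writes a lot to stderr in the meantime could stall here;
    // in that case read both pipes in turns, for example with stream_select())
    $pdf = stream_get_contents($pipes[1]);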
    fclose($pipes[1]);

    $errors = stream_get_contents($pipes[2]);
    fclose($pipes[2]);

    // all pipes must be closed now to avoid a deadlock
    $exitCode = proc_close($process);
}

Here you can see how we invoked the wkhtmltopdf process and told it to operate on standard input/output streams instead of real files (notice the two hyphens). Our script was halted until the external program returned full output and terminated. If everything went fine, $exitCode should equal 0.

There are three optional arguments to proc_open(), in consecutive order:

$cwd - current working directory; if not specified, the process will operate in the same directory as the current PHP process.
$env - an array of environment variables. If not provided, the child process will inherit all the environment of the PHP process.
$other_options - at the moment this can only contain Windows-specific console options. Nothing to see here.
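Here is a short sketch of the first two optional arguments in action. It assumes a Unix-like shell; GREETING is a made-up variable name used only for this illustration:

```php
<?php
// Assumptions: a Unix-like shell; GREETING is a made-up variable name.
$descriptors = [
    1 => ['pipe', 'w'],  // we only read from stdout here
];
$pipes = [];

$process = proc_open(
    'pwd; echo "$GREETING"',
    $descriptors,
    $pipes,
    sys_get_temp_dir(),        // $cwd: run the command in the temp directory
    ['GREETING' => 'hello']    // $env: the child starts with only this variable
);

$output = '';
$exitCode = null;
if (is_resource($process)) {
    $output = stream_get_contents($pipes[1]);
    fclose($pipes[1]);
    $exitCode = proc_close($process);
}

// First line: the working directory, second line: the variable's value.
[$directory, $greeting] = explode("\n", trim($output));
```

Note that passing $env replaces the inherited environment instead of extending it, so variables like PATH are gone unless you pass them explicitly.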

If you only need a unidirectional pipe, you can use the popen() function (isn’t PHP function naming confusing?). One-way communication is easier to handle:

// we will send HTML contents to STDIN and save PDF output to a file
$process = popen('wkhtmltopdf - output.pdf', 'w');
fwrite($process, $html);
pclose($process);
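The opposite direction works the same way: open the pipe in 'r' mode and read the command's output like a regular file. A minimal sketch, assuming a Unix-like shell, with printf standing in for a real command:

```php
<?php
// Assumption: a Unix-like shell; `printf` stands in for a real command.
$process = popen('printf "first\nsecond\n"', 'r');

$lines = [];
$status = null;
if (is_resource($process)) {
    // read the command's output line by line as it arrives
    while (($line = fgets($process)) !== false) {
        $lines[] = rtrim($line, "\n");
    }

    // pclose() returns the termination status of the process
    $status = pclose($process);
}
```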

The most convenient solution: The Process Component

All the code demonstrated above looks like low-level C programming. It’s not really comfortable in the modern age of object-oriented programming and abstractions. Today you shouldn’t worry about resources, pointers and streams.

To include the Symfony Process component in your project, use Composer on the command line:

composer require symfony/process

As a reminder, the basic usage consists of just creating an instance of the Process class, providing input arguments and getting output:

use Symfony\Component\Process\Exception\ProcessFailedException;
use Symfony\Component\Process\Process;

$process = new Process(['wkhtmltopdf', '-', '-']);
$process->setInput($html);

try {
    // wait for process execution
    $process->mustRun();

    $pdf = $process->getOutput();
} catch (ProcessFailedException $exception) {
    echo $exception->getMessage();
}

Notice that we pass input arguments as an array. We no longer create a long command invocation by hand; the Process component assembles the call, taking care of proper argument escaping. Instead of exit codes, we use exceptions, as you would expect in a modern object-oriented environment.

An alternative to $process->mustRun() is just using $process->run() and then checking the result of $process->isSuccessful().

The process will inherit all environment variables from the PHP process running the script. You can provide additional variables (or override them) at runtime:

$process = new Process(['ls', '-al']);
$process->run(null, ['SOME_VARIABLE' => 'value']);

Refer to the documentation of the specific process you are calling to know all the input rules.

Asynchronous and background processes

As we know, running a child process blocks the parent process by default. This is the easiest and safest behavior.

Some processes take a considerable amount of time and it would be good to let the user know what is happening. You can provide an anonymous function which is going to receive every piece of output coming from a child process:

// we need to receive the current unbuffered output of a process
ini_set('output_buffering', 0);

$process = new Process(['wkhtmltopdf', '-', '-']);
$process->setInput($veryLongHtml);
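// both buffers have to exist before the callback appends to them
$mainOutput = '';
$errorOutput = '';

// the variables are imported by reference so the callback can modify them
$process->run(function ($type, $buffer) use (&$mainOutput, &$errorOutput) {
    if (Process::ERR === $type) {
        $errorOutput .= $buffer;
    } else {
        $mainOutput .= $buffer;
    }
});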

If our main script logic does not strictly depend on the complete child process execution, we can run things in parallel. While a child process does its job, we can do other things in the meantime and occasionally check on that process:

$process = new Process(['wkhtmltopdf', '-', '-']);
$process->setInput($veryLongHtml);
$process->start();
$pid = $process->getPid();

// do some other things here...

$process->wait();
echo $process->getOutput();

Timeouts

To prevent a process from hanging forever, two types of timeout mechanisms were introduced:

a general timeout, measured from the process start,
an idle timeout, measured from the last output received from a process.

By default, a process has a general timeout of 60 seconds. You can change it with the setTimeout() method of the Process class. The other clock can be adjusted with setIdleTimeout().

When running a lengthy command asynchronously, you must call checkTimeout() regularly to see if a timeout has been reached; the method throws an exception when it has.
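To see what a general timeout boils down to, here is a rough sketch in plain PHP, without the Process component. runWithTimeout() is a hypothetical helper written for this example, it assumes a Unix-like system with /dev/null and a sleep binary, and it only covers the general timeout, not the idle one:

```php
<?php
// Hypothetical helper, not part of any library: runs a command and
// terminates it when it takes longer than $timeout seconds.
// Passing the command as an array (PHP 7.4+) skips the shell,
// so the termination signal reaches the program itself.
function runWithTimeout(array $command, float $timeout): array
{
    $devNull = ['file', '/dev/null', 'w'];
    $pipes = [];
    $process = proc_open($command, [1 => $devNull, 2 => $devNull], $pipes);
    if (!is_resource($process)) {
        throw new RuntimeException('Could not start the process');
    }

    $start = microtime(true);
    $timedOut = false;

    do {
        usleep(50000); // check on the process every 50 ms
        $status = proc_get_status($process);
        if ($status['running'] && microtime(true) - $start > $timeout) {
            proc_terminate($process);
            $timedOut = true;
            break;
        }
    } while ($status['running']);

    proc_close($process);

    return ['timedOut' => $timedOut, 'seconds' => microtime(true) - $start];
}

$quick = runWithTimeout(['sleep', '1'], 3.0); // finishes in time
$slow = runWithTimeout(['sleep', '5'], 0.3);  // gets terminated early
```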

Remember there are plenty of other timeouts in the surrounding environment. PHP also has its own maximum script execution time, which differs between a web server and the CLI (Command-line Interface).

If your child process stops unexpectedly, this might mean that you’re exceeding some timeout, either set by yourself, the PHP environment, an operating system, a database or anything else.

Reporting progress of time-consuming tasks

When a user requests a report, a package or any other piece of data whose preparation takes a long time, you should not make people stare at the loading icon forever. They will think that either their internet connection is broken or the server went down. In a panic, they might hit the “Refresh” button and make even more trouble by starting multiple requests. They might even hang your server.

The basic solution is to add a message which says “This might take a few minutes.” However, a user might hit a browser timeout while watching that loading icon.

It’s better to send the results in an e-mail, or send a push notification which says “ok, you can download your file here.” The user knows that they don’t have to wait until the process is done; they can just leave the computer for a while and come back later.

If the process produces a series of files, let’s say a hundred PDFs, it is fairly easy to track progress. The worker process simply has to report the number of finished items, for example by writing it to a file. Your frontend will simply read that file and render a nice progress bar. You can also track how long each item takes to prepare and use that to estimate the remaining time.
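A file-based progress report could look roughly like this. Both helpers, the file name and the JSON layout are assumptions made up for this sketch:

```php
<?php
// Hypothetical helpers; the file name and JSON layout are assumptions.

// Called by the worker after each finished item.
function reportProgress(string $file, int $done, int $total): void
{
    $data = json_encode(['done' => $done, 'total' => $total]);
    // write to a temporary file and rename it, so that readers
    // never see a half-written file
    file_put_contents($file . '.tmp', $data);
    rename($file . '.tmp', $file);
}

// Called by the endpoint that feeds the progress bar; returns a percentage.
function readProgress(string $file): int
{
    if (!is_file($file)) {
        return 0;
    }
    $data = json_decode((string) file_get_contents($file), true);
    if (!is_array($data) || empty($data['total'])) {
        return 0;
    }

    return (int) floor(100 * $data['done'] / $data['total']);
}

$file = sys_get_temp_dir() . '/report-progress-' . getmypid() . '.json';
reportProgress($file, 25, 100);
$percent = readProgress($file);   // 25
reportProgress($file, 100, 100);
$finished = readProgress($file);  // 100
unlink($file);
```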

Why not go even further and have anonymous statistics of all user requests? You can then tell users: “this usually takes 3 to 5 minutes.”

Queueing tasks

Your server has limited resources. What happens if a thousand users suddenly request a freshly generated PDF document?

In bars, restaurants and shops, people wait in line to be served. There might be, for example, three people behind the counter, and every one of them can serve one customer at a time.

The same rules apply to computing. A processor has a finite number of cores. Running more processes than the number of cores requires your processor to switch between tasks. Your PHP installation, when using FPM, also has pool settings which define how many requests can be handled simultaneously.

When you know how many tasks your system can take at once, you should enforce limits and use a queue system. It can be RabbitMQ or Kafka, for example. These are battle-tested tools which are going to control your queues. I’m not covering them in this book.

Having a queue means that your users will wait even longer. You should take this into account when informing users about estimated delivery time. However, this is a basic way to ensure that your customers will be served at all, eventually. If you let everyone in at the same time, chances are no one will be served successfully and you’ll get bad reviews.
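The “three people behind the counter” rule can be sketched in a few lines of plain PHP. This is only an in-memory illustration, not a replacement for a real queue system; runQueue() is a hypothetical helper and the tasks are dummy PHP one-liners that just sleep. It assumes PHP 7.4+ on a Unix-like system:

```php
<?php
// A sketch only: a real deployment would use a proper queue system.
// runQueue() is a hypothetical helper that never runs more than
// $limit commands at the same time. It returns the highest number
// of processes that were running simultaneously.
function runQueue(array $commands, int $limit): int
{
    $waiting = new SplQueue();
    foreach ($commands as $command) {
        $waiting->enqueue($command);
    }

    $running = [];
    $peak = 0;
    $devNull = ['file', '/dev/null', 'w'];

    while (!$waiting->isEmpty() || $running !== []) {
        // fill free slots with waiting tasks
        while (count($running) < $limit && !$waiting->isEmpty()) {
            $pipes = [];
            $process = proc_open($waiting->dequeue(), [1 => $devNull, 2 => $devNull], $pipes);
            if (is_resource($process)) {
                $running[] = $process;
            }
        }
        $peak = max($peak, count($running));

        // drop finished processes so the next ones can start
        foreach ($running as $key => $process) {
            if (!proc_get_status($process)['running']) {
                proc_close($process);
                unset($running[$key]);
            }
        }
        usleep(20000);
    }

    return $peak;
}

// five dummy tasks, each sleeping for 0.2 s, at most two at a time
$tasks = array_fill(0, 5, [PHP_BINARY, '-r', 'usleep(200000);']);
$peak = runQueue($tasks, 2);
```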

Wrapping up

There are a lot of low-level PHP functions to call external commands, but today the best option is to use a wrapper library like Symfony Process.

When running other processes from your PHP scripts, you need to know how a process works, how to read and write data, and how many tasks your server can handle at once. Try queueing time-consuming tasks.

Too much REST will harm you: don’t blindly follow it!

2021-03-10T19:00:00+00:00

The screenshot above shows an example REST API described by Swagger (from petstore.swagger.io).

REST, or Representational State Transfer, is a set of web architecture best practices. Perhaps it is best known for associating resources and actions in order to create clean APIs. Although REST works perfectly fine in most situations, I will show you how it can cause security issues where security matters most: the payment industry.

The principles of REST were described in 2000 by Roy Fielding in his doctoral dissertation. He also co-founded the Apache HTTP Server project and he chaired the Apache Software Foundation. Who am I to disagree with such an accomplished man?

I’ve been working as a developer in the payment industry for two years and I’ll tell you one thing: this is different from most software development jobs. You can get away with a lot of bad behaviors in other industries, but here they are just unacceptable. For example, I know a case of a person who was fired from a payment company after editing a production database without permission.

Security and reliability are obviously the apple of the payment industry’s eye. Companies have to pass external security audits which confirm compliance with several standards like the Payment Card Industry Data Security Standard. Otherwise, if hackers compromise their systems, these companies might go out of business.

When you start working in a high-profile industry like this, you suddenly have to face challenges you were previously unaware of.

A verb and a resource: the essence of REST?

What Roy Fielding basically did was to promote best practices for the World Wide Web architecture and “identify architectural mismatches.” This became important as he witnessed engineers around the world rapidly pushing the web in multiple directions, often making design mistakes or deviating from initial concepts.

As we know, the web consists of “resources” identified by their “locators” (URLs). A locator identifies a resource and tells you where you can find it. You can perform several tasks with these resources. The most common task is to GET it. You can also PUT a resource on the server, or even DELETE it. A POST method is commonly used to submit forms and also binary files, JSON or other data structures.

This simple and clever concept allows you to build a clean and understandable interface of a web service. Let’s say we have a blogging platform:

GET /article/too-much-rest will simply retrieve the article identified by a string “too-much-rest.”
PUT /article/too-much-rest will either create a new article or update an existing one. The server expects the request body to contain the article contents.
DELETE /article/too-much-rest will remove the article “too-much-rest” if it exists.

As I said before, a URL serves two purposes. First, it identifies a resource. The string article/too-much-rest appears in all of the URLs above and it suggests that there is only one such resource on the server. Additionally, an absolute URL like https://example.com/article/too-much-rest will also tell us where the article is stored and what protocol is used to communicate with that server.

Mr. Fielding has put a lot of emphasis on using proper verbs for certain actions, for example:

It isn’t RESTful to use POST for information retrieval when that information corresponds to a potential resource, because that usage prevents safe reusability and the network-effect of having a URI.

Avoid exposing sensitive data in URLs!

So far we’ve discussed an example public API for a blogging platform. Nothing sensitive there. Blogging is public most of the time.

Now, imagine you are developing a top-secret system which handles customers’ names, their account numbers, passwords, SSNs, transaction IDs, session IDs, and so on. Imagine billions of dollars flowing through that system.

You should never create even an internal, private API like this:

GET /customers/social-security-number
GET /customers?name=John&surname=Doe&city=London
POST /customers/data?sessionId=1234

The reason is simple. Always treat a URL like public data because:

It is stored in the browser’s history, sent over the network to synchronize your account between devices, submitted to search engines, and so on.
It can be copy-pasted over various communication platforms. Especially if the URL is long, it’s easy to ignore sensitive data it contains.
It can be sent in a Referer header to other sites.
It is logged by multiple network devices, servers and proxies. These logs can be aggregated by third parties, exchanged over insecure channels, etc.

Pentesters are going to report this

I once had a lot of work redesigning a robust system just because a pentester discovered sensitive data sent over GET requests. Earlier, a developer simply wanted to design a “proper” REST API, so they put customers’ data in the URLs.

The Common Weakness Enumeration (CWE) calls this vulnerability “Use of GET Request Method With Sensitive Query Strings”:

The query string can be saved in the browser’s history, passed through Referers to other web sites, stored in web logs, or otherwise recorded in other sources. (…) At a minimum, attackers can garner information from query strings that can be utilized in escalating their method of attack.

Exposure of sensitive data is also listed as #3 in the OWASP Top Ten Web Application Security Risks. URLs are pointed out as one of the ways data can be exposed.

How to protect your system?

Analyze what data the system processes and how. Limit the amount of processed data to the minimum. Don’t send more fields than needed, “just in case someone needs them in the future.”

Use the POST method to send sensitive, private data inside a request body. This is important even if the requests flow only inside an internal company network.

You can even perform “POST redirections” if needed. Instead of sending a mere Location header, prepare an HTML form and submit it automatically with JavaScript. Even IFRAMEs can be loaded with POST. Many payment platforms work this way.
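A “POST redirection” page can be generated with a few lines of PHP. Treat this as a sketch: renderPostRedirect() is a hypothetical helper, and the URL and field name are made up for illustration:

```php
<?php
// Hypothetical helper: renders a page that re-sends $fields with POST.
// The URL and field names used below are made up for illustration.
function renderPostRedirect(string $url, array $fields): string
{
    $inputs = '';
    foreach ($fields as $name => $value) {
        // escape everything that ends up inside HTML attributes
        $inputs .= sprintf(
            '<input type="hidden" name="%s" value="%s">',
            htmlspecialchars((string) $name, ENT_QUOTES),
            htmlspecialchars((string) $value, ENT_QUOTES)
        );
    }

    // the form submits itself on load; the button is a fallback
    // for browsers with JavaScript disabled
    return '<!DOCTYPE html><html><body onload="document.forms[0].submit()">'
        . '<form method="post" action="' . htmlspecialchars($url, ENT_QUOTES) . '">'
        . $inputs
        . '<noscript><button type="submit">Continue</button></noscript>'
        . '</form></body></html>';
}

$html = renderPostRedirect('https://payments.example.com/checkout', [
    'sessionId' => '1234',
]);
```

The session identifier travels in the request body, so it never shows up in the address bar, the browser history or a Referer header.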

Another solution is to encrypt URL parameters. However, there are different opinions on URL encryption. This isn’t an easy task because encryption algorithms get cracked sooner or later. There are many nuances to think about, like padding. Consider if the benefits are worth the effort.

“Form follows function”

Remember that in the 1990s, when the World Wide Web was born, its creators dreamed about a publicly accessible repository of knowledge and services for everyone. In contrast, most modern industries require utmost privacy and security.

I believe that Roy Fielding and other wise creators of the web standards did an awesome job. However, please keep in mind the web is such a dynamic environment that anything can happen. Always try to choose the right tools for the job. Don’t stick to buzzwords.

It is important to realize that the web still grows rapidly all around the world and many of its core technologies are used in a way not predicted by their makers. It is your responsibility to use them wisely.

Article originally published on Medium

Picking a PHP tool to read and manipulate PDF files

2021-03-01T16:00:00+00:00

TL;DR For simple PDF text and metadata extraction, use pdfparser. For advanced options, try pdftotext and pdfinfo from Poppler. To join or split PDF files, encrypt them or apply watermarks, use pdftk. To make a JPEG or PNG screenshot of a PDF, use ImageMagick or pdftocairo.

In the previous article I described several tools that can be used together with PHP to create PDF files. Back then, the choice was not easy and we had a lot of criteria to consider while picking the best tool. Today we will browse possibilities to read and edit existing PDF files.

Native PHP libraries

Again, we will start by checking if there are any PHP libraries to manipulate PDF files without depending on external binary tools.

pdfparser

There is an interesting library called smalot/pdfparser. It has over 1500 stars on GitHub. It parses a PDF file into an array of document objects which is further processed to get what we need.

The library is convenient as it supports parsing either an existing file or a string with PDF data. It allows you to extract metadata and plain text from a document along with other objects (images, fonts). However, encrypted files are not yet supported. You can test the library at its demo page.

$parser = new Smalot\PdfParser\Parser();
$document = $parser->parseFile('test.pdf');

// creator, date of creation, number of pages etc.
print_r($document->getDetails());

// text dump
echo $document->getText();

smalot/pdfparser has commercial support from Actualys.

tc-lib-pdf-parser

This is a library made by the creator of TCPDF, a well-known library generating PDF files. This parser draws less interest than the first one, though the author has over 15 years of experience handling PDFs.

You can compare both libraries by parsing different documents. They can differ especially in terms of processing corrupted files.

FPDI

I got familiar with this library when I received a bug report for a watermarking module in some e-book system. The module received a PDF, parsed it using FPDI, generated a watermark with FPDF and stamped it over all pages.

The problem is that the free version of FPDI supports only PDF version 1.4 and below. To support higher document versions, you have to buy the full version. And that’s what the bug report was about. We decided to switch to another tool, pdftk, which is described below.

Command-line tools

The first command-line tool I played with was pdftk. I used it to join separate documents into one, apply watermarks and extract basic metadata, like the number of pages. Unlike the FPDI library, it supports all PDF versions. The only thing that’s missing is a text extraction feature.

The need to extract plain text from a document led me to the Apache PDFBox library. It is written in Java and, as I described before, it offers some very nice features. However, in the PHP world we can only access a CLI wrapper for that library which has a limited set of options.

Later I discovered the Poppler library, which is said to fully support the ISO 32000-1 standard for PDF. This C++ library can be accessed via dedicated CLI tools – poppler-utils, which we can run from PHP. For example, the pdftotext tool gives a lot of control over the plain text dump – you can even preserve a proper document layout while rendering, or crop the document to a specified region. Also, pdfinfo provides comprehensive information about a file, like page format, encryption type etc. You can use it to extract JavaScript too.
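Calling these tools from PHP could look roughly like the sketch below. Both helpers are hypothetical and the sample text only imitates the “Key: value” shape of pdfinfo output; it was not produced by a real file. The -layout flag and the trailing hyphen (print to stdout) are regular pdftotext options:

```php
<?php
// Hypothetical helper: builds a pdftotext call that keeps the layout
// and prints the text to stdout (the trailing "-").
function pdfToTextCommand(string $pdfPath): string
{
    return 'pdftotext -layout ' . escapeshellarg($pdfPath) . ' -';
}

// Hypothetical helper: turns the "Key: value" lines printed by pdfinfo
// into an associative array.
function parsePdfInfo(string $output): array
{
    $info = [];
    foreach (preg_split('/\R/', trim($output)) as $line) {
        $parts = explode(':', $line, 2);
        if (count($parts) === 2) {
            $info[trim($parts[0])] = trim($parts[1]);
        }
    }

    return $info;
}

$command = pdfToTextCommand('my report.pdf');
// with poppler-utils installed: $text = shell_exec($command);

// Sample text shaped like pdfinfo output, not taken from a real file.
$sample = "Title:          Test\nPages:          3\nEncrypted:      no\n";
$info = parsePdfInfo($sample);
// $info['Pages'] is "3", $info['Encrypted'] is "no"
```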

Sometimes you might want to create a PNG or JPEG screenshot of a document. You can do it with pdftocairo from Poppler, or use ImageMagick’s convert. At the time of writing, there are no native PHP libraries to render a PDF.

Wrappers

For pdftk, check out this library: mikehaertl/php-pdftk.

PDFBox CLI can be accessed via schmengler/PdfBox.

ImageMagick and Ghostscript are the basis for the spatie/pdf-to-image wrapper.

Poppler has several PHP wrapper libraries:

spatie/pdf-to-text only allows you to extract text from a PDF. It requires an input PDF to exist in the file system. The library does not wrap additional input arguments, so you have to specify them manually.
ncjoes/poppler-php: a library supposed to wrap all poppler-utils, but at the moment pdftotext is still unsupported. Also, this library is not very convenient as it forces you to choose an output directory for a file (it does not return processed data as string).

In fact, these two libraries are wrappers to a wrapper, since poppler-utils are just a collection of CLI wrappers for the Poppler C++ library 😉

Which to pick? Native or CLI?

There are a couple of basic considerations.

Native PHP libraries should work independently from the host environment. They are a lot easier to set up and update. The only dependency tool you use is Composer.

CLI tools, especially those written in C/C++, might be faster and use less memory. However, I don’t have hard evidence at the moment. Maybe all the optimizations that came with PHP 7 will make this point obsolete. Also, I believe that C/C++ tools have a wider audience and thus might receive more community support.

You should pick a tool that’s best for your specific requirements. Most tools will do a decent job while simply rendering an unencrypted PDF to an image or some plain text. But if you need to have more control over the output file structure or you want to process encrypted documents, poppler-utils will be a good choice.

Sometimes it occurs to me that many developers are just reinventing the wheel, especially when it comes to the multitude of PDF processing libraries for PHP. The Portable Document Format has almost seven hundred pages of specification. We are all struggling with the same processing issues. That’s why I prefer to choose the best tools in different technologies and connect them with interfaces rather than doggedly sticking to a single technology.

Check out the List of PDF software at Wikipedia.

My book “Mastering PDF with PHP” is out now on Leanpub!

Learn how to create, read and edit PDF files in your PHP applications!

I moved my WordPress blog to Jekyll. Here’s why and how

2021-02-26T19:00:00+00:00

I remember the times around 2000 when most websites were static. We edited them locally on our computers and then uploaded them to an FTP server. There were plenty of free hosting services. Building your own site was very easy.

Then things became complicated. We were fascinated by the possibilities of PHP - a dynamic script interpreter, coupled with a MySQL database system. Of course, such a setup requires more server resources and page serving time is longer, but we were so thrilled we didn’t care.

Today I know it was stupid to run the whole PHP ecosystem only to join some text files together. Until February 2021 my blog was still built with WordPress. If only I had been clever enough to design and use a Static Site Generator in 2000…

Why use a Static Site Generator?

No need for a backend

I no longer need a full LAMP stack to host my blog. This allowed me to ditch my Lightsail instance which cost me several bucks a month. All pages are generated upfront and then served as static files.

I can code myself

WordPress and other Content Management Systems are good for people who can’t code. They log into the admin section and write posts with a WYSIWYG editor.

I found the WordPress editor bad for technical writing. This is where Markdown comes into play. I get syntax highlighting for my code snippets out of the box.

Free hosting

It’s a lot easier to find a free static site hosting than a server with a fully functional backend. I’m using GitHub Pages which integrates Jekyll, so all I need to do is to push changes into the repository and the site is regenerated automatically. No additional CI/CD pipelines are needed.

Also, GitHub Pages allows you to use your custom domain and automatically provides a TLS certificate for that domain from Let’s Encrypt. Refer to GitHub documentation to learn more.

Page loading speed

After moving to a static site, I noticed an increase in PageSpeed Insights from 70-80% to 100%. The site loads faster because there is no backend and because I have full control over all the stylesheets, scripts and images that are loaded.

No cookies

Using a full backend usually comes with some cookies, for example for user session handling. I also had a Google Analytics script attached.

However, with cookies you are obliged to add that annoying and distracting warning for the EU. Moreover, a worldwide discussion about privacy is getting bigger every year. Some people stand against the so-called “surveillance capitalism”, where by using “free” products we ourselves become a product.

I moved my site statistics to Plausible.io. It turns out you can do decent analytics without cookies.

Building with Jekyll

The installation process and setting up a basic site with Jekyll is well documented. I will guide you step by step through how I adjusted the default setup to my needs. You are also invited to browse my blog repository.

Note that in order to host your site on GitHub Pages, the repository name must follow the format: username.github.io.

The initial commit consisted of just the default files and filling _config.yml with basic details:

name: Piotr Horzycki
title: Piotr Horzycki - Java and PHP developer's blog
email:
description: >-
  Software engineer since 2008. Experienced with complex systems for payments, media, advertising and education.
  Been a scrum master and a team leader. I love fintech, data processing and SQL optimization. Sometimes I talk at meetups.
url: "https://peterdev.pl"
twitter_username: peterdevpl
github_username:  peterdevpl
theme: minima

After typing jekyll serve in the console, I was able to view my site at http://localhost:4000.

Writing posts

I spent a couple of days reformatting over 30 posts using Markdown. I store all of them in the _posts directory. Names follow one pattern: YYYY-MM-DD-slugged-post-title.markdown. All files start with a front matter which contains metadata, for example:

layout: post
title: "Picking a PHP tool to generate PDFs (2021 update)"
date: 2019-01-11 17:00:00 +0100
last_modified_at: 2021-01-10 17:00:00 +0100
description: "Comparison of HTML to PDF conversion tools: mPDF, TCPDF, Dompdf, wkhtmltopdf and Headless Chrome."
excerpt: I spent a lot of time working with different tools to generate PDF files, mainly invoices and reports. Some of these documents were really sophisticated, including multi-page tables, colorful charts, headers and footers. I tried generating documents by hand and converting HTML to PDF, or even LaTeX to PDF.
image: /assets/generating_pdf_files.jpg
permalink: /2019/01/11/picking-a-php-tool-to-generate-pdfs/
tags: pdf php

I don’t always use all metadata, but in the example above I really need to tell people that the article has been updated. I also specify a short SEO description, an excerpt to be published on the home page and an illustrative image.

During migration, I took great care to preserve all existing links and thus avoid trouble with redirecting pages indexed by search engines. For a permalink like the one above, Jekyll automatically generates the whole directory structure.

Read more about the jekyll-seo-tag plugin to know how to take care of your metadata and SEO.

Changing permalinks and making redirects

By default, WordPress creates links that include the post date. I decided I no longer want to have the full date in my URLs. But I can’t just change all the links on my blog all of a sudden. This would destroy my presence in the search engines, social media and people’s bookmarks.

The jekyll-redirect-from plugin automatically creates redirects for me. All I need to do after installing and enabling the plugin is make a small change in the post’s front matter:

permalink: /picking-a-php-tool-to-generate-pdfs/
redirect_from: /2019/01/11/picking-a-php-tool-to-generate-pdfs/

Jekyll will compile the post to the new directory without a date prefix. In the old path, Jekyll creates a small HTML file which redirects the browser to the new link. That way, everyone having the old link can smoothly jump to the new version.

Customizing post layout

By default, Jekyll provides a theme called Minima. You can customize it by either adjusting some configuration options, or copy-pasting the layout files into your blog project. Either way you have to inspect Minima’s source code to check the file and variable names. Be sure to browse the correct version of the repository.

I created my own _layouts/post.html, so it includes additional metadata I’ve been using in my articles:

---
layout: default
---
<article class="post h-entry" itemscope itemtype="http://schema.org/BlogPosting">
  <header class="post-header">
    {%- if page.image -%}
      <img src="{{ page.image }}" width="740" height="340" alt="Featured illustrative image" class="featured-image">
    {%- endif -%}
    <h1 class="post-title p-name" itemprop="name headline">{{ page.title | escape }}</h1>
    <p class="post-meta">
      {%- if page.last_modified_at -%}
          Last updated <time class="dt-modified" datetime="{{ page.last_modified_at | date_to_xmlschema }}" itemprop="dateModified">{%- assign date_format = site.minima.date_format | default: "%b %-d, %Y" -%}{{ page.last_modified_at | date: date_format }}</time>, first published
          <time class="dt-published" datetime="{{ page.date | date_to_xmlschema }}" itemprop="datePublished">
            {%- assign date_format = site.minima.date_format | default: "%b %-d, %Y" -%}
            {{ page.date | date: date_format }}
          </time>
        {%- else -%}
          <time class="dt-published" datetime="{{ page.date | date_to_xmlschema }}" itemprop="datePublished">
          {%- assign date_format = site.minima.date_format | default: "%b %-d, %Y" -%}
          {{ page.date | date: date_format }}
          </time>
      {%- endif -%}
    </p>
  </header>

  <div class="post-content e-content" itemprop="articleBody">
    {{ content }}
  </div>
</article>

The layout files are parsed by the Liquid template engine. Refer to its documentation to know the syntax.

Customizing header and footer

I simplified both the header and footer by creating my own _includes/header.html and _includes/footer.html files. I initially copied the original Minima code and then adjusted it. You are welcome to browse the blog’s repository to see the result.

Customizing home page

I wanted to remove the “Posts” header, show post excerpts and more metadata. First, I added show_excerpts: true to my _config.yml. Then I copy-pasted Minima’s _layouts/home.html and adjusted the posts list:

{%- for post in site.posts -%}
<li>
  {%- assign date_format = site.minima.date_format | default: "%b %-d, %Y" -%}
  <span class="post-meta">
    {%- if post.last_modified_at -%}
      {{ post.date | date: date_format }}, last updated {{ post.last_modified_at | date: date_format }}
    {%- else -%}
      {{ post.date | date: date_format }}
    {%- endif -%}
  </span>
  <h3>
    <a class="post-link" href="{{ post.url | relative_url }}">
      {{ post.title | escape }}
    </a>
  </h3>
  {%- if site.show_excerpts -%}
    {{ post.excerpt }}
  {%- endif -%}
</li>
{%- endfor -%}

Linking social media accounts

The default theme, Minima, will embed links to your social media. You just need to specify account names in _config.yml. I already did this for Twitter and GitHub, but Minima also accepts others, including LinkedIn and Facebook:

twitter_username: peterdevpl
github_username:  peterdevpl
linkedin_username: piotr-horzycki
facebook_username: piotr.horzycki

The social media icons are rendered inside the _includes/social.html file, which you can override if you really need to.

Grouping posts by tags

In my WordPress instance I had all the tags under /tag/something URLs. In Jekyll, I can recreate that structure with the jekyll-archives plugin. First I install it from the command line by typing gem install jekyll-archives. Then I go to _config.yml and set the plugin up:

plugins:
  - jekyll-archives

jekyll-archives:
  enabled:
    - tags
  layouts:
    tag: tag
  permalinks:
    tag: '/tag/:name/'

In the configuration we mentioned a layout called tag. It will contain the markup needed to list all posts for a given tag. Let’s create a file _layouts/tag.html:

---
layout: default
---

<h1>Tag: {{ page.title }}</h1>

<div class="main">
  <ul>
    {% for post in page.posts %}
      <li><a href="{{ post.url }}">{{ post.title }}</a></li>
    {% endfor %}
  </ul>
</div>

The layout inherits from a default one, so there’s no need to attach all the surrounding markup. …

Now we need to add the list of tags to the post layout. I added the following code to the meta paragraph:

{%- if page.tags -%}
  •
  {% for tag in page.tags %}
    {% assign tag_slug = tag | slugify: "raw" %}
    <a href="/tag/{{ tag_slug }}/">#{{ tag }}</a>
  {% endfor %}
{%- endif -%}

The # character in front of the tag name is printed literally as a hash sign; it’s not part of the Liquid syntax.

The jekyll-archives plugin can also list your posts by months. See the full guide

Setting ATOM feed

I have my blog aggregated in some lists, so I need to maintain either an RSS or ATOM feed. WordPress did this automatically under the path /feed/. Jekyll by default creates a feed.xml file in the root directory. I wanted to keep my old feed URL, so I did this in configuration:

feed:
  path: /feed/index.xml

More feed options

Extending the style sheets

I needed some extra CSS rules for image figures and to change some colors. Minima uses SASS to write and compile style sheets. Let’s start from _config.yml:

sass:
  sass_dir: _sass
  style: compressed

Then I copy-pasted _sass/minima.scss and linked my two new files:

$very-light-grey: #F8F8F8;

@import
  "minima/base",
  "minima/layout",
  "minima/syntax-highlighting",
  "figures",
  "layout"
;

The first file, figures.scss, contains some image-related rules:

figure {
  text-align: center;
}

figcaption {
  color: $grey-color;
}

.featured-image {
  height: auto;
  margin-bottom: 1ex;
}

The layout.scss provides just some eye-candy:

.site-header, .site-footer {
   background: $very-light-grey;

   .built {
      color: $grey-color-dark;
      font-size: 90%;
      margin-bottom: 0;
   }
}

Jekyll automatically compiles the style sheets, so I don’t need any other tools.

Using a custom domain

My blog has been working under https://peterdev.pl, so I wanted to keep it that way. I already had around 200 visitors from search engines every day.

GitHub Pages allows setting a custom domain. The official guide is a bit messy, so I had to do some more digging and experiments. I decided to use only the apex domain (peterdev.pl), as the www. subdomain didn’t work. I logged in to my domain registrar and set my A and CNAME records like this:

A	peterdev.pl.	185.199.111.153
A	peterdev.pl.	185.199.110.153
A	peterdev.pl.	185.199.109.153
A	peterdev.pl.	185.199.108.153
CNAME	www.peterdev.pl.	peterdevpl.github.io.

I also had to open my GitHub blog repository, go to the Settings page, set the Custom domain to peterdev.pl and tick Enforce HTTPS. As the DNS changes might take several hours to propagate, GitHub will initially complain about bad DNS configuration. You have to wait until GitHub gets your new DNS records and generates the TLS certificate. Then your blog should work under your domain and with HTTPS enforced.

I wish I had done this earlier

Keep it simple! Don’t use over-engineered solutions for simple problems. I wish I had optimized my blog as a static site from day one and hadn’t paid for additional hosting.

Jekyll has many alternatives, and so does GitHub. For me, this combo works perfectly fine, but you’re free to discover other options.

You can also browse the entire source repository for my blog.

All you need to know about Java’s BigDecimal

2021-02-11T16:00:00+00:00

Popular programming languages do not natively support decimal numbers. This is because CPUs operate on binary numbers. Even though the IEEE 754-2008 standard defines decimal floating-point types, most CPUs still don’t support them in hardware. So every time we see a notation like 0.1 in the code, it’s not what it seems. Our calculations might be inaccurate.

Most modern languages have dedicated libraries to handle decimals. Internally, they use either a long integer type or a string to store the number. They implement their own arithmetic engines. In Java, there is a BigDecimal class.

The safest way to create a new number is to use a string as an input:

final BigDecimal number = new BigDecimal("123.45");

To save memory, special BigDecimal instances already exist: BigDecimal.ZERO, BigDecimal.ONE and BigDecimal.TEN. You should reuse them instead of creating your own.

It is not recommended to use the double type when creating a BigDecimal object. Even if we enter a value like 0.1, the actual representation is approximately 0.10000000000000000555, which definitely does not look like a monetary amount or anything else that we would expect. This is because double is a binary (base-2) floating-point type. Try running this code to see it for yourself:

System.out.println(0.20 + 0.10);
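
To make the difference between the constructors visible, here is a small sketch that creates the same value in three ways. The class and method names are made up for this illustration; only the BigDecimal calls come from the JDK.

```java
import java.math.BigDecimal;

final class DoubleVsStringDemo {

    /** The double constructor exposes the exact binary value behind the literal 0.1. */
    static String fromDouble() {
        return new BigDecimal(0.1).toString();
    }

    /** The string constructor keeps the digits exactly as written. */
    static String fromString() {
        return new BigDecimal("0.1").toString();
    }

    /** valueOf(double) goes through Double.toString(), so it yields the short form. */
    static String fromValueOf() {
        return BigDecimal.valueOf(0.1).toString();
    }

    public static void main(String[] args) {
        System.out.println(fromDouble());  // 0.1000000000000000055511151231257827021181583404541015625
        System.out.println(fromString());  // 0.1
        System.out.println(fromValueOf()); // 0.1
        System.out.println(0.20 + 0.10);   // 0.30000000000000004
    }
}
```

If a double is all you have, BigDecimal.valueOf() is usually the less surprising choice, but a string input avoids the binary detour entirely.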

The BigDecimal class offers several methods for basic operations like addition, subtraction, multiplication and division. Before we go into calculations, we need to talk more about the internals.

Precision and scale

A BigDecimal is described by two parameters. Precision is the total number of significant digits in the number, and scale is the number of digits behind the decimal point.

It is very important that you understand what happens to these parameters because they affect rounding and the string representation.

If you use the simplest string constructor like in the examples above, no precision limit is applied (in a MathContext, a precision setting of 0 means unlimited) and scale is set to the number of digits behind the decimal point. For 123.45, scale will be 2 and precision() will return 5.

You can use the setScale() method to increase scale if you want to show the exact number of digits in a fraction, even if these will be zeros. The number 123.45 with a scale of 4 would be represented as 123.4500.
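
A quick way to get a feeling for these parameters is to print them. The sketch below uses a hypothetical describe() helper; precision(), scale(), unscaledValue() and setScale() are regular BigDecimal methods.

```java
import java.math.BigDecimal;

final class ScaleDemo {

    /** Shows the number together with its internal parameters. */
    static String describe(BigDecimal n) {
        return n + " unscaled=" + n.unscaledValue()
                + " precision=" + n.precision()
                + " scale=" + n.scale();
    }

    public static void main(String[] args) {
        BigDecimal number = new BigDecimal("123.45");
        System.out.println(describe(number));
        // 123.45 unscaled=12345 precision=5 scale=2

        // Increasing the scale never loses information, so no rounding mode is needed
        System.out.println(describe(number.setScale(4)));
        // 123.4500 unscaled=1234500 precision=7 scale=4
    }
}
```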

Scale can change when you add, subtract, multiply or divide numbers with fractions. This matters especially when you try to calculate taxes. For example, multiplying 123.45 by 1.23 gives us 151.8435, but this is not a proper monetary amount. You have to perform rounding using the second argument for setScale():

final BigDecimal net = new BigDecimal("123.45");
final BigDecimal tax = new BigDecimal("1.23");
final BigDecimal gross = net.multiply(tax).setScale(2, RoundingMode.HALF_UP);
System.out.println(gross);
// output is 151.84

Some numbers do not have a finite decimal representation, like 1/3. They cannot be stored exactly as BigDecimal, so rounding has to be applied. It’s your responsibility to specify the target precision or scale; otherwise, a division with a non-terminating result throws an ArithmeticException:

final BigDecimal result = BigDecimal.ONE.divide(
    new BigDecimal("3"), 5, RoundingMode.HALF_EVEN);
System.out.println(result);
// output is 0.33333
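
For comparison, this is roughly what happens when no scale or rounding mode is given. The divideExactly() helper is only an illustration; it turns the exception into a string so that both cases can be printed side by side.

```java
import java.math.BigDecimal;

final class DivisionDemo {

    /** Divides without any rounding information and reports what happened. */
    static String divideExactly(String dividend, String divisor) {
        try {
            return new BigDecimal(dividend).divide(new BigDecimal(divisor)).toString();
        } catch (ArithmeticException e) {
            // Thrown when the exact quotient has a non-terminating decimal expansion
            return "ArithmeticException";
        }
    }

    public static void main(String[] args) {
        System.out.println(divideExactly("1", "4")); // 0.25
        System.out.println(divideExactly("1", "3")); // ArithmeticException
    }
}
```

Since you rarely know upfront whether a quotient terminates, it is safer to always pass a scale and a rounding mode (or a MathContext) to divide().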

Another operation that involves changing scale is removing trailing zeros. Sometimes, for example after several subtractions, you don’t want to leave zeros at the end. The stripTrailingZeros() method will return the same number without trailing zeros.

Rounding modes

The previous examples already involved rounding. The most popular mode is called HALF_UP and it is commonly taught at school. You round up when the discarded fraction is greater than or equal to 0.5; you round down when the fraction is below 0.5. So for example, assuming a target scale of 2, the number 1.234 will be rounded to 1.23, and the number 1.235 will be rounded to 1.24.

However, different taxation laws might require different rounding modes, for example always rounding up. They are listed in an enum called RoundingMode.

Below are some facts that can help you distinguish between the modes:

UP never decreases the magnitude of the calculated value.
DOWN never increases the magnitude of the calculated value.
HALF_UP is commonly taught at school.
FLOOR never increases the calculated value.
CEILING never decreases the calculated value.
HALF_EVEN is also known as “banker’s rounding” because it reduces cumulative error when performing multiple operations on rounded numbers. If the discarded fraction is exactly one half, we round towards the even neighbor. Otherwise, the standard rules apply.
UNNECESSARY asserts that the result is exact and no rounding is needed; if rounding would be necessary, an ArithmeticException is thrown.
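
The differences are easiest to see on a value that lies exactly halfway, with both signs. The following sketch rounds 2.5 and -2.5 to whole numbers in each mode; the round() helper is just a shortcut written for this example.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

final class RoundingModesDemo {

    /** Rounds a decimal string to the given scale using the given mode. */
    static String round(String value, int scale, RoundingMode mode) {
        return new BigDecimal(value).setScale(scale, mode).toString();
    }

    public static void main(String[] args) {
        RoundingMode[] modes = {
            RoundingMode.UP, RoundingMode.DOWN, RoundingMode.CEILING, RoundingMode.FLOOR,
            RoundingMode.HALF_UP, RoundingMode.HALF_DOWN, RoundingMode.HALF_EVEN
        };
        for (RoundingMode mode : modes) {
            System.out.println(mode + ": " + round("2.5", 0, mode) + " / " + round("-2.5", 0, mode));
        }
        // UP: 3 / -3
        // DOWN: 2 / -2
        // CEILING: 3 / -2
        // FLOOR: 2 / -3
        // HALF_UP: 3 / -3
        // HALF_DOWN: 2 / -2
        // HALF_EVEN: 2 / -2

        // Banker's rounding picks the even neighbor only for exact halves
        System.out.println(round("1.235", 2, RoundingMode.HALF_EVEN)); // 1.24
        System.out.println(round("1.245", 2, RoundingMode.HALF_EVEN)); // 1.24
    }
}
```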

Always consult an accounting or taxation expert about the rounding mode and other assumptions. It is their responsibility to make decisions according to the law, and your responsibility is only to write reliable software that implements these rules.

Understanding MathContext

The BigDecimal class uses rules defined by a MathContext to perform numerical operations. In most cases you won’t need to worry about it. However, we should get back to the example of dividing 1 by 3.

By default, BigDecimal numbers have “unlimited” precision. In fact, the magnitude of the unscaled value has to stay below 2^Integer.MAX_VALUE, according to the BigInteger documentation. This looks like more than enough to represent any finite number you need.

Nevertheless, we don’t want to run out of memory when doing a simple division of 1 by 3. Earlier, we just specified a desired scale and a rounding mode, but you should also be aware that you can control the precision of such an operation.

There are three MathContext objects that correspond to the IEEE 754R decimal formats. DECIMAL32, DECIMAL64 and DECIMAL128 allow a maximum number of 7, 16 and 34 digits, respectively. They all use the HALF_EVEN rounding mode. You can use these contexts to control division:

final BigDecimal result = BigDecimal.ONE.divide(
    new BigDecimal("3"), MathContext.DECIMAL32);
System.out.println(result);
// output is 0.3333333
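
You are not limited to the predefined contexts. As a sketch, the code below also builds a custom MathContext with 10 digits and a different rounding mode; the divide() helper is a made-up wrapper. Note that precision counts significant digits, not digits behind the decimal point.

```java
import java.math.BigDecimal;
import java.math.MathContext;
import java.math.RoundingMode;

final class MathContextDemo {

    /** Divides two decimal strings under the given math context. */
    static BigDecimal divide(String dividend, String divisor, MathContext context) {
        return new BigDecimal(dividend).divide(new BigDecimal(divisor), context);
    }

    public static void main(String[] args) {
        System.out.println(divide("2", "3", MathContext.DECIMAL32)); // 0.6666667
        System.out.println(divide("2", "3", MathContext.DECIMAL64)); // 0.6666666666666667

        // Custom context: 10 significant digits, always truncating
        MathContext truncating = new MathContext(10, RoundingMode.DOWN);
        System.out.println(divide("2", "3", truncating));            // 0.6666666666

        // Seven significant digits in total, so only five of them are fractional here
        System.out.println(divide("200", "3", MathContext.DECIMAL32)); // 66.66667
    }
}
```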

Immutability

A very important property of Java’s BigDecimal type is immutability. It means that once an object is instantiated, its state cannot be changed. The only way to obtain a modified value is to create a new instance. Methods like add() return that new instance, so ignoring the return value has no effect:

final BigDecimal number1 = new BigDecimal("99");
number1.add(BigDecimal.ONE);
System.out.println(number1);
// number1 is still 99
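
In practice this means you always have to capture the return value. Here is a minimal sketch of a running total; the sum() helper is hypothetical and exists only to show the reassignment.

```java
import java.math.BigDecimal;

final class ImmutabilityDemo {

    /** Adds up the values; each add() returns a new object that must be kept. */
    static BigDecimal sum(BigDecimal... values) {
        BigDecimal total = BigDecimal.ZERO;
        for (BigDecimal value : values) {
            total = total.add(value); // without the assignment, total would stay 0
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(sum(new BigDecimal("99"), BigDecimal.ONE)); // 100
    }
}
```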

This behavior prevents many bugs that could occur if we passed an object to other methods and they unexpectedly altered the object’s state.

String representation

A standard way to output a BigDecimal object on the screen is to just use the toString() method. There are two other methods though, and they are worth knowing.

The difference is visible when we operate on numbers written using scientific notation, like 1.23E+3, which is equal to 1230. The toString() method will create a string in that notation, while toPlainString() will always return the full number. toEngineeringString() is a variation where the exponent is always a multiple of three (if an exponent is needed at all).

Input number	toString()	toEngineeringString()	toPlainString()
1.23E2	123	123	123
1.23E3	1.23E+3	1.23E+3	1230
1.23E4	1.23E+4	12.3E+3	12300
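
The table can be reproduced with a few lines of code. The forms() helper below is only an illustration that joins the three representations into one line.

```java
import java.math.BigDecimal;

final class StringFormsDemo {

    /** Returns toString(), toEngineeringString() and toPlainString() side by side. */
    static String forms(String input) {
        BigDecimal number = new BigDecimal(input);
        return number.toString()
                + " | " + number.toEngineeringString()
                + " | " + number.toPlainString();
    }

    public static void main(String[] args) {
        System.out.println(forms("1.23E2")); // 123 | 123 | 123
        System.out.println(forms("1.23E3")); // 1.23E+3 | 1.23E+3 | 1230
        System.out.println(forms("1.23E4")); // 1.23E+4 | 12.3E+3 | 12300
    }
}
```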

As a reminder, you can use stripTrailingZeros() to strip unnecessary zeros from fractions:

final BigDecimal numberWithZeros = new BigDecimal("1.000");
System.out.println(numberWithZeros);
// output is 1.000

final BigDecimal strippedNumber = numberWithZeros.stripTrailingZeros();
System.out.println(strippedNumber);
// output is 1
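
One thing to watch out for: stripTrailingZeros() also removes zeros in front of the decimal point by making the scale negative, so toString() may switch to scientific notation. A short sketch with two made-up helper methods shows the effect and how toPlainString() avoids it.

```java
import java.math.BigDecimal;

final class StripZerosDemo {

    /** Strips trailing zeros and prints with toString(). */
    static String stripped(String input) {
        return new BigDecimal(input).stripTrailingZeros().toString();
    }

    /** Strips trailing zeros and prints without an exponent. */
    static String strippedPlain(String input) {
        return new BigDecimal(input).stripTrailingZeros().toPlainString();
    }

    public static void main(String[] args) {
        System.out.println(stripped("1.000"));       // 1
        System.out.println(stripped("100.00"));      // 1E+2
        System.out.println(strippedPlain("100.00")); // 100
    }
}
```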

The only problem with all the examples above is that they don’t conform to language rules other than English. What if we want to make an international application?

Using locale for number formatting

Most programming languages assume English notation for numbers. They use a dot to separate the fractional part from the integer part. When presenting a number to a user, we can optionally separate thousands with a comma.

However, many languages and countries have different conventions. If our application is intended for international markets, localization is a very important matter we should take into account.

To make localization easier, a concept of a locale was introduced. A locale is a “set of parameters that defines the user’s language, region and any special variant preferences that the user wants to see in their user interface.” (Wikipedia)

A locale identifier combines language and country code. So for British English we have en_GB, American English is en_US, and Swiss German will be de_CH.

Let’s analyze how a sample number would be formatted using some of the world’s locales. We’ll pick twelve thousand three hundred forty five point sixty seven, which can be written as 12345.67 in the code:

Language	Country	Locale code	Formatted value
English	United States	en_US	12,345.67
Polish	Poland	pl_PL	12 345,67
Spanish	Spain	es_ES	12.345,67
Spanish	Mexico	es_MX	12,345.67

Notice the difference for Spanish language. In Spain, people use a dot to separate thousands and a comma as a decimal separator. In Mexico, it’s the other way around, just like in the U.S. It means that it’s not enough to localize your application for a specific language; the region is important too.

Formatting and parsing numbers with NumberFormat

An abstract class called NumberFormat has multiple getInstance()-like methods that we can use to create a localized number format, depending on our needs. As the only argument, we should specify a desired locale. If we skip this, the default system locale will be used.

final BigDecimal result = new BigDecimal("12345.67");
// forLanguageTag() expects a BCP 47 tag with a hyphen ("en-US"), not the underscore form
final NumberFormat numberFormat = NumberFormat.getInstance(Locale.forLanguageTag("en-US"));
System.out.println(numberFormat.format(result));
// output is 12,345.67

The number format can be further customized. For example, you can turn grouping off by calling numberFormat.setGroupingUsed(false).

You can also use NumberFormat.getPercentInstance() to create a percentage format. This way, a number like 0.51 will be presented as 51%. Such format is useful to print a tax rate.
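
Formatting is only half of the story, because user input has to be parsed with the same rules. Below is a minimal sketch of both directions. It assumes that NumberFormat.getInstance() returns a DecimalFormat, which is the case for the locales used here but is not guaranteed by the API, and it uses German as an extra example of a locale with a decimal comma. The helper names are invented for this example.

```java
import java.math.BigDecimal;
import java.text.DecimalFormat;
import java.text.NumberFormat;
import java.text.ParseException;
import java.util.Locale;

final class LocaleFormatDemo {

    /** Formats a number for the given locale, with or without thousands grouping. */
    static String format(BigDecimal value, Locale locale, boolean grouping) {
        NumberFormat numberFormat = NumberFormat.getInstance(locale);
        numberFormat.setGroupingUsed(grouping);
        return numberFormat.format(value);
    }

    /** Parses localized text straight into a BigDecimal, without a detour through double. */
    static BigDecimal parse(String text, Locale locale) {
        // Assumption: the returned format is a DecimalFormat (true for these locales)
        DecimalFormat decimalFormat = (DecimalFormat) NumberFormat.getInstance(locale);
        decimalFormat.setParseBigDecimal(true);
        try {
            return (BigDecimal) decimalFormat.parse(text);
        } catch (ParseException e) {
            throw new IllegalArgumentException("Not a number in this locale: " + text, e);
        }
    }

    public static void main(String[] args) {
        BigDecimal value = new BigDecimal("12345.67");
        System.out.println(format(value, Locale.US, true));       // 12,345.67
        System.out.println(format(value, Locale.US, false));      // 12345.67
        System.out.println(format(value, Locale.GERMANY, true));  // 12.345,67
        System.out.println(parse("12.345,67", Locale.GERMANY));   // 12345.67
    }
}
```

Be aware that parse() stops at the first character it does not understand, so production code should validate the input more strictly than this sketch does.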

Here’s an extended version of the code to calculate tax and gross values – typical data on every sales invoice:

final BigDecimal net = new BigDecimal("123.45");
final BigDecimal taxRate = new BigDecimal("0.23");
final BigDecimal tax = net.multiply(taxRate).setScale(2, RoundingMode.HALF_UP);
final BigDecimal gross = net.add(tax);

// forLanguageTag() expects a hyphen ("en-US"); "en_US" is ill-formed and is not parsed as American English
final Locale locale = Locale.forLanguageTag("en-US");
final NumberFormat numberFormat = NumberFormat.getCurrencyInstance(locale);
numberFormat.setCurrency(Currency.getInstance("USD"));
final NumberFormat percentFormat = NumberFormat.getPercentInstance(locale);

System.out.println("Net value:   " + numberFormat.format(net));
System.out.println("Tax value:   " + numberFormat.format(tax));
System.out.println("Tax rate:    " + percentFormat.format(taxRate));
System.out.println("Gross value: " + numberFormat.format(gross));

/* output is:
Net value:   $123.45
Tax value:   $28.39
Tax rate:    23%
Gross value: $151.84
 */

Wrapping up

Decimal calculations need extra care. Computers do not support decimal numbers natively, so we have to use dedicated libraries like BigDecimal.

Accuracy is especially important for monetary calculations. I recommend using the Java Money library as it also introduces handling currencies. However, knowing the BigDecimal class can still be useful.

PHP: How to take a screenshot of a PDF page

2021-01-28T16:00:00+00:00

If your application allows uploading PDF files, it’s likely that you need to prepare screenshots or thumbnails for these documents – at least the first page.

You can’t do this with a pure PHP setup. You’re going to need an external application to read the PDF and save an image, like ImageMagick, Ghostscript, Poppler or Inkscape. Before you start coding, check which one is installed on your server.

Sometimes you might need to check how different tools work for your documents. There can be slight differences in font rendering, handling alpha channel in images, speed and output file size.

For all cases we’re going to use the Symfony Process library to safely call external commands. Simply run composer require symfony/process in your project.

Using ImageMagick

ImageMagick has a handy tool called convert. Under the hood, it uses Ghostscript to parse a PDF file. The simplest usage below extracts the first page of a PDF file and saves it as PNG:

use Symfony\Component\Process\Process;

$process = new Process([
    'convert',
    'input.pdf[0]',
    'output.png'
]);
$process->run();
if (!$process->isSuccessful()) {
    die('Error');
}

Note that pages in a document are zero-indexed, so [0] selects the first page. You can convert multiple pages if you wish.

The Process constructor takes an array of command-line arguments. The first element is always the command’s name or path to a program. I decided to put every argument in a separate line for clarity.

You might want to adjust some options for ImageMagick. For example, -alpha off and -background white will always set a solid white background even if the input document has a transparent background. With -density 200 you can increase the resolution:

use Symfony\Component\Process\Process;

$process = new Process([
    'convert',
    '-alpha', 'off',
    '-background', 'white',
    '-density', '200',
    'input.pdf[0]',
    'output.png'
]);
$process->run();

You can also create a JPEG thumbnail, for example 150 pixels wide with a quality set to 90%:

use Symfony\Component\Process\Process;

$process = new Process([
    'convert',
    '-alpha', 'off',
    '-background', 'white',
    '-resize', '150',
    '-quality', '90',
    'input.pdf[0]',
    'thumbnail.jpg'
]);
$process->run();

If you want to operate on variables and not on real files, you can use STDIN and STDOUT to deliver the PDF and receive an image. Enter a hyphen (-) instead of the input and output file names, then supply custom input and retrieve output from the process:

use Symfony\Component\Process\Process;

$process = new Process([
    'convert',
    '-alpha', 'off',
    '-background', 'white',
    '-[0]',
    'png:-'
]);
$process->setInput($pdf);
$process->run();

if (!$process->isSuccessful()) {
    echo $process->getErrorOutput();
    die('Error');
} else {
    $png = $process->getOutput();
}

More examples can be found on the ImageMagick site.

Using Ghostscript

You might want to interact with Ghostscript directly, especially if for some reason ImageMagick is not installed or you need to fine-tune some rendering details:

use Symfony\Component\Process\Process;

$process = new Process([
    'gs',
    '-dFirstPage=1',    // process only 1st page
    '-dLastPage=1',
    '-dNOPAUSE',        // don't pause after processing a page
    '-dBATCH',          // exit when done instead of entering the interactive prompt
    '-r144',            // resolution: 144 pixels per inch
    '-q',               // suppress messages
    '-sDEVICE=png16m',  // 24-bit PNG without alpha channel
    '-sOutputFile=test.png',
    'input.pdf'
]);
$process->run();

Ghostscript is a PostScript interpreter. By default, it offers an interactive console and pauses after each page, so we’re using some options to change that behavior. We chose a 24-bit PNG here, but there are other formats available, for example pngalpha or jpeg. Run gs -h in the console to see a full list of available formats (devices).

More Ghostscript command-line options

Using Poppler

There is a nice set of PDF tools called Poppler-Utils. One of them, pdftocairo, can convert a PDF to PNG, JPEG, TIFF, SVG or EPS. Usage is very simple:

use Symfony\Component\Process\Process;

$process = new Process([
    'pdftocairo',
    '-png',
    '-singlefile',
    'input.pdf',
    'output'
]);
$process->run();

See the pdftocairo man page for more options.

Using Inkscape

Some people report Inkscape as the best application for exporting PDF files to bitmaps. This robust vector graphics editor can also be used in the command line:

use Symfony\Component\Process\Process;

$process = new Process([
    'inkscape',
    'input.pdf',
    '--export-dpi=600',
    '--export-area-page',
    '--export-background=#FFFFFF',
    '--export-type=png',
    '--export-filename=output.png'
]);
$process->run();

By default, the background is transparent, so I explicitly requested a white background. Also, instead of --export-area-page you might want to use --export-area-drawing to get only the contents and not a full page.

You can use the --pipe switch to make Inkscape read data from STDIN. If you omit the --export-filename option, the output will be sent to STDOUT.

Refer to the Inkscape man page for more options.

My book “Mastering PDF with PHP” is out now on Leanpub!

Learn how to create, read and edit PDF files in your PHP applications!

Secure generation of random IDs and passwords in Java

2021-01-10T16:00:00+00:00

The Apache Commons Lang library has a handy set of random string generators, enclosed inside the RandomStringUtils class. However, these are not cryptographically secure generators by default, which can trigger warnings in platforms like Veracode (for example CWE-331: Insufficient Entropy).

It’s even more important when you think about what the random strings are used for. Most of the time these are session, token or debugging identifiers, or even passwords. Such strings shouldn’t be predictable.

The default java.util.Random implementation is not cryptographically secure, and yet it is used by default in shorthand RandomStringUtils methods. However it is possible to pass a custom generator as the last argument, for example java.security.SecureRandom:

final SecureRandom random = new SecureRandom();
final String id = RandomStringUtils.random(10, 0, 0, true, true, null, random);
System.out.println(id);  // prints 10 random alphanumeric characters
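
If you would rather not depend on a library for this, the same idea can be sketched with the JDK alone. The class, the alphabet constant and the randomId() method below are my own illustration, not part of any library; the only real API used is SecureRandom.nextInt().

```java
import java.security.SecureRandom;

final class SecureIdDemo {

    // Hypothetical alphabet: letters and digits, similar to the example above
    private static final String ALPHABET =
            "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789";

    // SecureRandom is thread-safe, so one shared instance is enough
    private static final SecureRandom RANDOM = new SecureRandom();

    /** Builds an identifier by picking each character with a secure generator. */
    static String randomId(int length) {
        StringBuilder id = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            // nextInt(bound) returns a uniformly distributed index in [0, bound)
            id.append(ALPHABET.charAt(RANDOM.nextInt(ALPHABET.length())));
        }
        return id.toString();
    }

    public static void main(String[] args) {
        System.out.println(randomId(10)); // 10 random alphanumeric characters
    }
}
```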

A more sophisticated yet cleaner way might be to use RandomStringGenerator from Apache Commons Text together with SecureTextRandomProvider from Apache Syncope. Unfortunately, the latter class was removed in Syncope 2.1 and I couldn’t find any alternative.

Looks like Apache doesn’t like providing cryptographically secure random generators or even interfaces for them. The Apache Commons Random Numbers Generators documentation says: “The current design has made no provision for features generally needed for cryptography applications (e.g. strong unpredictability).”

One more library worth checking out is Passay. Its primary responsibility is to enforce a company’s password policy, and the library can also be used to generate random passwords according to the company rules. Of course you can provide SecureRandom as the source of randomness:

final SecureRandom random = new SecureRandom();
final PasswordGenerator generator = new PasswordGenerator(random);
final CharacterRule alphabet = new CharacterRule(EnglishCharacterData.Alphabetical);
final CharacterRule digits = new CharacterRule(EnglishCharacterData.Digit);
final String id = generator.generatePassword(10, alphabet, digits);
System.out.println(id);

This was just a basic example; Passay accepts more complex rules, for example a minimum number of letters, digits or special characters in a password.

Final thoughts

Use SecureRandom whenever you need to generate a random string.

Never use regular expressions to validate passwords against the company policy. Just don’t. Or you will end up with monsters like this (true story):

String PASSWORD_REGEX = "^[A-Za-z0-9!@#$%^&*()\-_=+:;'\"<>,.\\]{8,}$";
