Why I didn’t end up being a database engineer

Once upon a time, looking for new career opportunities, I considered becoming a database engineer or architect. I was pretty successful at optimizing MySQL databases that grew up to 65 GB. I liked it. However, I noticed that almost no one was hiring MySQL database engineers; MySQL was considered too simple. The real challenges awaited those familiar with Oracle and big systems in finance, aviation and so on. So I bought a thick book about Oracle and… never used it.

The only profit I got from buying an Oracle guide was another book offered in the same bundle: “SQL Antipatterns: Avoiding the Pitfalls of Database Programming” by Bill Karwin. It gave me a lot of valuable tips which I still eagerly share with my coworkers.

Is the database the center of every app?

Anyway, I used to believe that the database is at the center of every application. Most applications process data, right? So I started every project by conceiving a database model: relations, data types, queries, and so on. I spent a lot of time optimizing data structures.

My approach had several pros. First of all, it was intuitive for other developers: a good database model let them immediately understand the purpose of the system and navigate easily through the growing codebase. Secondly, a good model let datasets grow significantly (to hundreds of millions of records) without performance loss. Moreover, with correct foreign keys and other constraints, the data remained consistent.

I have a funny story about a team of developers that ended up refusing to use foreign key constraints at all. They had put an ON DELETE CASCADE clause on every foreign key, and one day they accidentally wiped out half of the test database. It was such a shock for them that they removed all the foreign keys. Soon the data became terribly inconsistent, and a lot of bugs surfaced in the apps depending on that database.

Carving a SQL giant

Optimizing databases was cool, but there was one bad practice I really disliked in some DBAs’ work. After I read about the SOLID principles, Domain-Driven Design and other ideas, I started to hate SQL stored procedures, especially the overly long ones.

I saw some MS SQL devs trying to handle all the business logic with SQL scripts. These systems already had many bugs, and my fellow devs just kept adding more CASE and IF blocks whenever the business people requested a change. After some time, it was nearly impossible to add or change anything without breaking something.

Such situations made me realize that you have to choose the right tools for the job, not bend the job to the tools you already know. Of course data is a huge asset for every company, but behavior is important too. Is SQL the right tool to model complex business behavior?
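To show the contrast, here is a deliberately tiny, made-up example of a business rule expressed as a plain PHP object instead of yet another branch in a stored procedure. The VolumeRebate class, its threshold and its numbers are invented purely for illustration:

```php
<?php

// A made-up business rule kept in code, where it can be named, unit-tested
// and refactored, instead of being buried in a growing CASE/IF block
// inside a stored procedure.
final class VolumeRebate
{
    public function __construct(
        private float $threshold,  // order value that triggers the rebate
        private float $percentage  // e.g. 10.0 means 10% off
    ) {}

    public function applyTo(float $orderValue): float
    {
        if ($orderValue < $this->threshold) {
            return $orderValue; // below the threshold, no rebate
        }

        return $orderValue * (1 - $this->percentage / 100);
    }
}

$rebate = new VolumeRebate(threshold: 1000.0, percentage: 10.0);

echo $rebate->applyTo(500.0) . PHP_EOL;  // 500 (no rebate)
echo $rebate->applyTo(2000.0) . PHP_EOL; // 1800 (10% off)
```

Whether such a rule lives in PHP or in SQL, the point is the same: complex, frequently changing behavior deserves a home where it can be tested and refactored safely.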

Getting to know the big picture

For me, data has never been the biggest challenge. It is always the sophisticated business logic that needs to be modelled very carefully. It evolves over time, and we have to make sure all developers (especially the new ones) grasp it quickly and correctly.

I decided to improve my process modelling skills with OOP and FP, and to learn SOLID, DDD and other widely adopted principles and design patterns. I also decided to share business knowledge with all my teammates, to make sure they understand the purpose of their work and the nature of the surrounding business. Of course SQL knowledge is valuable too, but it’s not enough.

How legacy code is made

“The (…) problem with legacy assets is how difficult and risky they are to invoke. They were developed with a particular feature-set in mind, and are highly coupled to those features. This makes it hard to access what you want directly without going through layers designed for very different purposes. The data you want is nearly always intertwined with other data and functionality, so that it is hard to detach just the part you want without dragging along an endless tangle of interactions and associations.”

Eric Evans, “Getting started with DDD when surrounded by legacy systems”, 2013

I received a seemingly easy task. The head of sales wanted a list of all products sold in one of the departments. She expected a table with just three columns: product name, default rebate name and rebated price. Almost every product had some rebate attached to attract clients.

This task should have taken maybe 15 minutes. Or maybe I should not have had to do it at all; salespeople should have a reporting feature for that. However, it turned out that the mechanism for fetching products with prices was tightly tangled with the way these items were presented in the shop. I couldn’t retrieve all the products at once, because I was forced to do it page by page (50 products per page at most). The rebate system worked only for the currently logged-in user, and everyone could have different rebates based on their account settings, so I could not easily simulate rebates on other accounts. Eventually, preparing a rather simple product summary took three hours of writing very dirty code…
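To give a feel for what that dirty code looked like, here is a rough, hypothetical reconstruction. ShopCatalog, RebateCalculator and their methods are invented names standing in for the real legacy classes, not the actual API:

```php
<?php

// Hypothetical sketch of the workaround: walk the presentation-oriented,
// paginated catalog and ask the rebate engine for each product separately.
function buildProductSummary(ShopCatalog $catalog, RebateCalculator $rebates): array
{
    $rows = [];
    $page = 1;

    do {
        // The catalog only exposes products the way the shop page shows them:
        // 50 items at a time, mixed with presentation concerns.
        $products = $catalog->getProductsPage($page, 50);

        foreach ($products as $product) {
            // Rebates were calculated for the currently logged-in user,
            // so the default rebate has to be fished out product by product.
            $rebate = $rebates->getDefaultRebateFor($product);

            $rows[] = [
                'product' => $product->getName(),
                'rebate'  => $rebate ? $rebate->getName() : 'none',
                'price'   => $rebate ? $rebate->apply($product->getPrice()) : $product->getPrice(),
            ];
        }

        $page++;
    } while (count($products) === 50); // a short page means we reached the end

    return $rows;
}
```

Three hours of code like this, for a three-column table.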

The shop was custom-made to meet strict specifications oriented towards an ordinary web browser. From an end user’s standpoint, a shop should have a paginated list of products, a shopping cart, rebate info and so on. Just the standard e-shop features. No one thought about additional use cases that might show up some day, like salespeople asking for summaries and reports. No one thought that some day we would need to share data through an API endpoint. Deadlines were chasing us. All developers just wanted to get it done according to the specs. A legacy, proprietary PHP framework did not make it any easier.

That is how we created lots of code which became “legacy” and inefficient barely half a year after deployment.

Separating code responsibilities into layers

To handle such different requirements, multi-tier/multi-layer and hexagonal architectures were invented. Sometimes the simple CRUDs shown in a framework’s documentation are not enough; complex applications need a clear separation between business/domain logic, infrastructure and presentation. We create a network of loosely coupled modules which, thanks to interfaces and dependency injection, can easily be swapped and replaced. This approach has a lot of pros:

  • Every layer can be, to some extent, developed independently. Thanks to abstraction, the business logic does not depend on a particular presentation technology, page rendering, data storage and so on.
  • Every layer can have a separate set of automatic tests.
  • While editing the code of one layer, we are not distracted by the code of other layers.
  • The same data can be presented in multiple ways. We just attach a different presentation layer implementation (HTML, PDF, JSON API, CLI…).
  • Business rules are clear. We don’t need to look for them in strange places, like templates.

“Allow an application to equally be driven by users, programs, automated test or batch scripts, and to be developed and tested in isolation from its eventual run-time devices and databases.”

Alistair Cockburn, “Hexagonal architecture”
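To make this less abstract, here is a minimal, hypothetical sketch of the ports-and-adapters idea in PHP. All the names (ProductRepository, ProductSummaryService, InMemoryProductRepository) are invented for illustration; the point is only that the business logic depends on an interface, while storage and presentation are plugged in from the outside:

```php
<?php

// Port: the domain talks to storage only through this interface.
interface ProductRepository
{
    /** @return array<int, array{name: string, price: float}> */
    public function allProducts(): array;
}

// Domain/application layer: pure business logic, no SQL, no HTML.
final class ProductSummaryService
{
    public function __construct(private ProductRepository $products) {}

    /** @return array<int, array{name: string, price: float}> */
    public function cheapestFirst(): array
    {
        $products = $this->products->allProducts();
        usort($products, fn (array $a, array $b) => $a['price'] <=> $b['price']);
        return $products;
    }
}

// Infrastructure adapter: one possible implementation of the port
// (a SQL-backed repository could replace it without touching the service).
final class InMemoryProductRepository implements ProductRepository
{
    public function __construct(private array $products) {}

    public function allProducts(): array
    {
        return $this->products;
    }
}

// Presentation adapter: CLI output today, HTML, PDF or a JSON API tomorrow.
$service = new ProductSummaryService(new InMemoryProductRepository([
    ['name' => 'Chair', 'price' => 120.0],
    ['name' => 'Desk',  'price' => 80.0],
]));

foreach ($service->cheapestFirst() as $product) {
    printf("%s: %.2f\n", $product['name'], $product['price']);
}
```

Swapping the in-memory repository for a database-backed one, or the CLI loop for a JSON controller, does not require touching ProductSummaryService at all; that is the loose coupling the list above promises.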

In practice, using a multi-layered, hexagonal architecture with loose coupling requires a lot of experience. I discovered that it’s really hard to convince a team to talk about abstract models. People tend to ask about the details very early, like the database or the graphics, and there’s nothing wrong with that. People need to imagine the final result, a real use case; it makes the project easier for them to understand.

During project discussions, I suggest not avoiding talk about implementation. However, we should strive for transparent and fluent code. It’s worth negotiating some additional time with the client, so that we can create a good design that will save them maintenance costs in the future.