Basics of Backward Compatibility

Maxilect
Dec 28, 2022 · 9 min read


Recently we discussed the issues associated with backward compatibility in systems. On the one hand, supporting backward compatibility complicates projects considerably: component interactions become multi-stage. On the other hand, it is the only way to ensure that services keep running smoothly.

In this article, we want to talk about backward compatibility for those who have not yet dealt with it from an architectural point of view. We will introduce the basic concepts and discuss the pros and cons.

The article is based on a talk given by Andrey Burov (Maxilect) at an internal meetup.

By definition, backward compatibility is the ability of software to work with older versions of a system. As users, we run into backward compatibility issues when we install an Android update and some application stops working.

Let's start with the simplest example: how to avoid stepping on a rake during an update so that users do not face a service outage.

Suppose we have a classic architecture: a database, a backend that works with it, and a frontend. The frontend talks to the backend via some API v1. Then a new release adds features that change the API significantly, and it becomes API v2 (assume that API v2 differs substantially from API v1).

According to a classic misconception, it is enough to update the backend and the frontend simultaneously and forget about backward compatibility: the frontend speaks API v2, the backend does too, so why complicate things?

But not everything is so simple.

Let's start with the question: what does a simultaneous update actually mean?

The frontend and backend can be deployed separately, and different teams are often responsible for them. Even the path to production is different (for example, through Docker or Nginx). Even if both updates start at the same time, one of them finishes first. Between the first and the second update there will be a time window during which the frontend and backend APIs are incompatible, and users will get errors whenever they try to do anything with the service.

The problem also affects users who already have a copy of the frontend open in the browser. Say it is a SPA that interacts with the backend via a REST API. A user has loaded the page and is calmly filling out a form. By updating the service on the servers, we take away their ability to submit the entered data: the old frontend sends a request to the backend via API v1, and it fails with an error. The user has to reload the page and fill in the data all over again.

You can update the backend in two steps to prevent this from happening.

Let’s divide the update into two parts

An intermediate backend release that supports both API versions, v1 and v2 at once, helps avoid the situation described above. When we roll out the new backend version, the frontend keeps working as before over API v1. At the next stage we update the frontend: it switches to API v2, and nothing unexpected happens.
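
To make the idea concrete, here is a minimal sketch of such an intermediate release. It assumes a Python backend built with Flask; the endpoints, payload fields, and handler logic are invented for illustration and are not taken from the talk.

```python
# A sketch of a backend that serves API v1 and API v2 at the same time,
# so old and new frontends can both keep working during the transition.
from flask import Flask, jsonify, request

app = Flask(__name__)

def create_user(name: str, email: str) -> dict:
    # Shared business logic used by both API versions (dummy implementation).
    return {"id": 42, "name": name, "email": email}

@app.route("/api/v1/users", methods=["POST"])
def create_user_v1():
    # Old contract: flat payload with a "username" field, minimal response.
    data = request.get_json()
    user = create_user(data["username"], data["email"])
    return jsonify({"user_id": user["id"]})

@app.route("/api/v2/users", methods=["POST"])
def create_user_v2():
    # New contract: nested "profile" object and a richer response.
    profile = request.get_json()["profile"]
    user = create_user(profile["name"], profile["email"])
    return jsonify({"user": user})

if __name__ == "__main__":
    app.run(port=8000)
```

Once every client has switched to the v2 endpoints, the v1 routes can be deleted in a later backend release.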

This scheme has an additional advantage. If for some reason we have rolled out the backend but the frontend update is delayed (say, because of a critical bug), we do not need to roll anything back. The new frontend version can go out next week, and users will be fine with that.

After a successful front update, you can remove support for the old API in one of the subsequent versions of the backend.

This example is about a frontend calling a backend, but the same situation arises in any interaction between systems. The same idea works when updating the API between two independent services: it is convenient if one of the services supports both API versions for part of the time, so that it does not break its callers.

Let's return to the update issues. We reduced the downtime caused by the non-simultaneous update of the backend and frontend, but we did not get rid of it completely. While a new backend version is being rolled out, it does not accept requests for a certain period. That period may be long or short, but it is not zero. And purely theoretically, even a well-tested new version may fail right at the start. What do we do when we cannot afford such downtime?

Now we will divide the backend into two parts

To reduce downtime, bring up a second backend instance and update the two in stages. First take down and update backend 1; once it works normally, do the same with backend 2.
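
As a rough sketch (ours, not from the talk), the staged update might be orchestrated like this; the instance names and helper functions are placeholders for whatever load balancer and deployment tooling the project actually uses.

```python
# Update two backend instances one at a time so that at least one of them
# keeps serving traffic at every moment.
import time

BACKENDS = ["backend-1", "backend-2"]  # hypothetical instance names

def remove_from_load_balancer(instance: str) -> None:
    print(f"draining {instance}")  # placeholder for a real load balancer call

def deploy_new_version(instance: str) -> None:
    print(f"deploying new version to {instance}")  # placeholder for the real deploy step

def health_check(instance: str) -> bool:
    print(f"checking health of {instance}")
    return True  # assume the check passes in this sketch

def add_to_load_balancer(instance: str) -> None:
    print(f"returning {instance} to the pool")

for instance in BACKENDS:
    remove_from_load_balancer(instance)  # the other instance keeps handling requests
    deploy_new_version(instance)
    while not health_check(instance):    # wait until the new version responds correctly
        time.sleep(5)
    add_to_load_balancer(instance)
```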

Database issues

Let’s complicate the problem.

Let's say we don't want to change the interaction API but rather rename a column in the database: there was a typo in the column name that needs to be corrected.

How would a "spherical developer in a vacuum" (an idealized developer who ignores real-world constraints) do this?

They would add the column rename to their backend update, migrate backend 1, thereby renaming the column and breaking backend 2, which should still be processing user requests (backend 2 knows nothing about the rename and keeps working with the old column name). Users will get errors until we update backend 2, and as in the previous examples, that may take a while.

Here it makes sense to apply the same logic of splitting the update into two stages. In the first stage we add a new column with the correct name to the database, so that it exists in parallel with the old one. The old column can then be deleted in the next release.
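
Here is a minimal sketch of that two-release rename, using SQLite in memory and an invented "usrname" typo purely for illustration; a real project would run the same steps through its migration tool.

```python
# Rename a column in two releases so that old and new backend versions
# can run side by side against the same database.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, usrname TEXT)")  # typo we want to fix
conn.execute("INSERT INTO users (usrname) VALUES ('alice')")

# Release N: add the correctly named column and backfill it. The old column
# stays, so backends that still read "usrname" keep working.
conn.execute("ALTER TABLE users ADD COLUMN username TEXT")
conn.execute("UPDATE users SET username = usrname")

# ...both columns are kept in sync until every backend reads "username"...

# Release N+1 (only after all backends use the new column): drop the old one.
conn.execute("ALTER TABLE users DROP COLUMN usrname")  # DROP COLUMN needs SQLite 3.35+

print(conn.execute("SELECT id, username FROM users").fetchall())  # [(1, 'alice')]
```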

Other changes to database fields are handled in the same way.

Adding new features without breaking anything

It is easy to add a new feature in an ideal situation like this:

Consider a system you like and know well. You have been working on it for more than a year, you know all its features and pitfalls, and you can guess what needs to be added and where. If you need to add a new feature, there is no problem: you just add it, and that is the end of the story.

But such ideal situations are rare, except perhaps on pet projects or recent developments that have not yet been sold. If a dozen or more developers have been working on a project for several years, then to a newcomer who has just arrived and has not had time to figure everything out, the system will look like an impenetrable tangle (even if the architecture is well thought out and the code is written correctly).

Before adding a new feature, you need to get the system straight in your head and only then plan what to write and where. But it is hard to study a large project if you were not there from the start; you cannot do it in a reasonable time (and certainly not if you have one or two days to implement the feature). Therefore, in most cases, when a developer hired onto an existing project is asked to implement some new functionality, they bolt it on from the side so as not to break anything, implementing it separately and attaching it to the project by a thin "thread".

Why a feature must be easy to disable

From the point of view of stability, it is better if new functionality is disabled by default and is controlled through some documented lever (a feature flag).

Such controllable features are quite natural. Features land in the code and go out with a release, but who will turn them on, and when, is unknown; it may happen a month or two later. This is logical, since the release may contain far more critical things, such as fixes for important bugs. If the release falls over because secondary functionality was switched on, something has gone wrong, because:

Stable production is much more valuable than even the coolest new feature. That is why, when adding new functionality, the main emphasis is on not breaking the existing one.

Feature flags are a helpful tool in general. They can be used to test new functionality: you can configure access rights so that only some users can enable the new functionality for themselves. Once they have made sure it works correctly, the feature can be enabled for everyone.
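
A minimal sketch of such a flag might look like the following; the flag name, user identifiers, and checkout example are made up, and a real system would usually keep this configuration outside the code.

```python
# A feature flag that is off by default and can be enabled for a subset of
# users before being rolled out to everyone.
FEATURE_FLAGS = {
    "new_checkout_flow": {  # hypothetical flag name
        "enabled": False,   # off by default: the release stays safe
        "allowed_users": {"qa-team", "beta-tester-1"},
    },
}

def is_feature_enabled(flag: str, user: str) -> bool:
    config = FEATURE_FLAGS.get(flag)
    if config is None:
        return False        # unknown flags are treated as off
    return config["enabled"] or user in config["allowed_users"]

def checkout(user: str) -> str:
    if is_feature_enabled("new_checkout_flow", user):
        return "new checkout flow"  # the functionality under test
    return "old checkout flow"      # stable behaviour for everyone else

print(checkout("beta-tester-1"))  # new checkout flow
print(checkout("regular-user"))   # old checkout flow
```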

However, when a project accumulates a lot of disabled functionality, it becomes confusing, especially if the developers of those features did not take care of proper documentation. It is sometimes unclear which feature flags need to be re-enabled after a system restart. When studying someone else's code, it is not always obvious whether a given piece is currently participating in the work or is switched off by some flag because something did not work out. There are also obsolete features that are disabled but, for some reason, never cut out of the code.

All this code makes the project harder for a newcomer to understand, which only aggravates the situation with adding new functionality described above.

Feature flags are an element that must be tracked separately. Once the functionality has been rolled out to production and the decision has been made to keep it, the corresponding feature flag should be cut out of the project rather than left as dead weight in the code. When functionality is deprecated, it must likewise be removed rather than merely disabled as a temporary measure.

Conclusion

The backward compatibility issue is not only about the code but also about the project's infrastructure. In other words, a developer who writes code that ships to production must understand how it is all deployed and run; otherwise they will be unable to maintain backward compatibility across updates. In the examples above, updating the API and fixing the database, communication between backend developers and DevOps is critical; if it breaks down, developers have no chance of writing the correct code.

However, backward compatibility is not always worth supporting. Sometimes the cost of implementing it is much higher than the losses from user issues during an update. The question of whether backward compatibility is worthwhile should therefore be answered by the business. The answer depends not only on the overall view of customers and the product's value but also on practical matters, such as how quickly updates have to ship. The scale of the project also plays a role: the larger the project, the more backward compatibility it needs. In the extreme case of a "bloody enterprise," nobody knows in advance what a change to the API or the data schema might break, and then we are talking not only about user comfort but also about the stability of the system.

In critical systems, for example financial ones, the question of backward compatibility has an unambiguous answer: it is necessary. There it is normal for any database change to require four or more releases (for example: create a new column, start reading from it, stop reading from the old one, and delete the old column). For a startup this is simply an unbearable amount of work; the company will not be able to enter the market quickly if every change requires so many transformations.
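
Written out as a sketch, that sequence might look like this; the table name "orders" and the column names are invented for illustration.

```python
# The four-or-more-release sequence for a single database change.
RELEASES = [
    # Release 1: create the new column next to the old one and keep both filled.
    "ALTER TABLE orders ADD COLUMN total_cents INTEGER",
    # Release 2: switch the application to reading from the new column.
    "-- application change: read total_cents instead of total",
    # Release 3: stop reading from and writing to the old column entirely.
    "-- application change: the old total column is no longer touched",
    # Release 4: delete the old column once nothing depends on it.
    "ALTER TABLE orders DROP COLUMN total",
]

for number, step in enumerate(RELEASES, start=1):
    print(f"Release {number}: {step}")
```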

The situation with feature flags is similar. Managing them always complicates an already complex project; on the other side of the scale is the stability of the entire system. Whether the customer is ready to pay for that stability is an individual question and depends on the project's specifics: the larger the project, the greater that readiness. In large financial projects it is normal for the team to give presentations about which feature flags exist in the code and how to handle them. A startup most likely simply does not need this.

PS. Subscribe to our social networks: Twitter, Telegram, FB to learn about our publications and Maxilect news.
