Do You Really Need Asynchronous Processing in Your Projects? A Few Observations on Spring and Non-Blocking APIs for Beginners
Hello everyone! I work on one of Maxilect's own projects: a high-load AdTech platform that includes an Ad Exchange server and related components. However, this article isn't really about the project itself. Instead, I'd like to discuss asynchronous processing in tasks of this scale, using examples from the project as convenient reference points.
I won’t go into detail about what asynchronous processing is — the concept is old, and there’s an overwhelming amount of information available online. Instead, I’ll share some observations that might give you food for thought when considering whether you should introduce it into your own projects.
The article is based on issues we discussed during an internal technical meetup.
Let's assume we have a data stream. Observers subscribe to this stream, can read data from it, and react to it, either processing the data in some way or generating their own. In the classic (synchronous) approach, a thread makes a request to an external system and blocks until the response arrives. In the asynchronous approach, the thread dispatches the request and immediately moves on to other tasks; when the response comes back from the external system, processing resumes, typically via a callback running on whichever thread is available.
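To make the contrast concrete, here is a minimal sketch using the standard java.net.http.HttpClient, which supports both styles out of the box. This is an illustration rather than code from our project, and the URL is a placeholder:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class SyncVsAsync {
    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest
                .newBuilder(URI.create("https://example.com/data")) // placeholder URL
                .build();

        // Synchronous: the calling thread blocks until the response arrives.
        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println("blocking: " + response.statusCode());

        // Asynchronous: the call returns immediately; the callback runs later
        // on a pooled thread, while this thread is free to do other work.
        client.sendAsync(request, HttpResponse.BodyHandlers.ofString())
              .thenAccept(r -> System.out.println("async: " + r.statusCode()))
              .join(); // wait here only so the demo doesn't exit early
    }
}
```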
Asynchronous processing is primarily needed in a microservices architecture, where a project involves many interactions between small system components via APIs. In such an architecture, services spend most of their time waiting for responses from their neighbors. To use threads more efficiently, asynchronous calls and reactive programming were introduced.
For implementing the asynchronous approach in our project, we use Spring. Under the hood, it is built on the excellent Project Reactor library, which implements the Reactive Streams specification. The library offers a wide range of features and has great documentation. Spring WebFlux, Spring's framework for building reactive APIs, is fully based on Reactor.
The reactive programming paradigm looks elegant: no one waits for anyone. The code is written in a declarative rather than an imperative style; you describe how you want data streams to be processed, which results in a clean and expressive implementation. When it comes to performance, however, things are not so straightforward. Below, I'll share a few examples we encountered in our project.
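As a taste of that declarative style, here is a toy Reactor pipeline (again, an illustration, not code from our project). We describe the transformations; Reactor drives the execution:

```java
import java.time.Duration;

import reactor.core.publisher.Flux;

public class ReactiveStyleDemo {
    public static void main(String[] args) throws InterruptedException {
        // Declarative pipeline: we state *what* to do with each element;
        // Reactor decides when and on which thread to execute it.
        Flux.interval(Duration.ofMillis(100))   // emit 0, 1, 2, ... every 100 ms
            .filter(n -> n % 2 == 0)            // keep even values
            .map(n -> "event-" + n)             // transform each value
            .take(5)                            // complete after five elements
            .subscribe(System.out::println);    // react to the data

        Thread.sleep(1500); // keep the demo alive long enough to see the output
    }
}
```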
Example 1
We have a high-load service — a REST API that processes incoming HTTP requests. It, in turn, makes multiple requests to external systems, meaning it spends a lot of time waiting. At first glance, this seems like the perfect use case for a reactive approach.
Initially, the service was implemented in a blocking manner. A thread would send a request to an external service and wait for a response. As a result, the service had a massive number of threads — around 2000 — most of which were simply waiting. However, at any given moment, only about a hundred were actually performing useful work.
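In sketch form, that blocking shape looked roughly like this, using Spring's RestTemplate as the blocking client. The class name, the endpoint, and the pool size are illustrative:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

import org.springframework.web.client.RestTemplate;

public class BlockingStyleSketch {

    // One thread per in-flight request; with ~2000 threads, most of them
    // spend their time parked, waiting on socket reads.
    private static final ExecutorService pool = Executors.newFixedThreadPool(2000);
    private static final RestTemplate restTemplate = new RestTemplate();

    static void handle(String id) {
        pool.submit(() -> {
            // The worker thread blocks here until the external service responds.
            String body = restTemplate.getForObject(
                    "https://external.example/api/" + id, String.class);
            process(body);
        });
    }

    private static void process(String body) {
        // stand-in for the real response handling
    }
}
```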
The service operated in this mode for several years.
At some point, I started optimizing it and decided to refactor it into a non-blocking style to eliminate the idle waiting. Threads are not free, after all: they consume memory and introduce context-switching overhead. So I expected this change to speed up the service.
As a result, the number of threads dropped from 2000 to 500, and none of them were idle — each one was actively doing something. It seemed like the system should now consume fewer resources. However, in reality, 2000 threads weren’t a significant load for the server to begin with. We noticed no difference in CPU load between 500 and 2000 threads. The only tangible outcome was the satisfaction of applying a trendy concept.
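The non-blocking version, again as an illustrative sketch rather than our actual code, replaces RestTemplate with Spring's reactive WebClient; the endpoint and names are hypothetical:

```java
import org.springframework.web.reactive.function.client.WebClient;

import reactor.core.publisher.Mono;

public class NonBlockingStyleSketch {

    private static final WebClient webClient = WebClient.create();

    static Mono<Void> handle(String id) {
        // No thread parks here: the request is dispatched, the thread is
        // released, and the rest of the chain runs when the response arrives.
        return webClient.get()
                .uri("https://external.example/api/" + id) // hypothetical endpoint
                .retrieve()
                .bodyToMono(String.class)
                .doOnNext(NonBlockingStyleSketch::process)
                .then();
    }

    private static void process(String body) {
        // stand-in for the real response handling
    }
}
```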
One might think that reducing the number of threads would improve scalability, since the thread count would no longer be the limiting factor. But in practice, each outgoing request consumes a file descriptor, and those are limited as well: we cannot accumulate an unbounded number of in-flight outgoing requests.
In our case, file descriptors were exhausted whenever our own service slowed down for any reason: the incoming request rate stayed the same, and we ended up with as many as 100,000 open file descriptors. To prevent this from happening, we had to introduce a semaphore-based limit on the number of concurrent external requests.
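A minimal sketch of such a limiter might look like this; the permit count is an illustrative number, and the error type is just an example of a fail-fast signal:

```java
import java.util.concurrent.Semaphore;

import reactor.core.publisher.Mono;

public class OutboundLimiter {

    // Upper bound on in-flight outgoing requests; 5000 is an illustrative
    // figure, chosen well below the process's file-descriptor limit.
    private final Semaphore permits = new Semaphore(5000);

    <T> Mono<T> limit(Mono<T> call) {
        return Mono.defer(() -> {
            if (!permits.tryAcquire()) {
                // Shed load instead of opening yet another socket.
                return Mono.error(new IllegalStateException("outgoing request limit reached"));
            }
            // Each in-flight request holds one permit, and therefore at most one
            // extra file descriptor; the permit is returned on any terminal signal.
            return call.doFinally(signal -> permits.release());
        });
    }
}
```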
Example 2
We conducted our second experiment on an even more heavily loaded service. Unlike the previous case, this service makes no external calls, and it was initially implemented in a non-blocking style, using Tomcat's non-blocking API via Spring. Essentially, a non-blocking approach had been applied where it wasn't needed.
The service's load was so high that even micro-optimizations, the kind that usually have no noticeable effect, mattered. When I ran out of ideas for further improvements, I decided to rewrite it in blocking mode on top of a plain thread pool. As a result, the service started running 15% faster. In other words, using Spring with the non-blocking Tomcat API introduced a measurable performance overhead.
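To illustrate the difference in shape (a schematic contrast, not our production code), here is the same CPU-bound handler in the two styles:

```java
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RestController;

import reactor.core.publisher.Mono;

@RestController
public class BidController {

    // Non-blocking (WebFlux) variant: the result is wrapped in a Mono,
    // which adds allocations and scheduling machinery to the hot path
    // even though there is nothing to wait for.
    @GetMapping("/bid-reactive")
    public Mono<String> bidReactive() {
        return Mono.fromSupplier(this::computeBid);
    }

    // Blocking (Spring MVC) variant: a plain return value on a worker
    // thread from the pool. For purely CPU-bound work this is the
    // shorter code path.
    @GetMapping("/bid")
    public String bid() {
        return computeBid();
    }

    private String computeBid() {
        return "..."; // stand-in for the real CPU-bound business logic
    }
}
```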
In most cases — when handling only hundreds of requests per second — this overhead is negligible. However, when dealing with tens of thousands of requests per second, an asynchronous implementation consumes additional resources.
Key Takeaways
Our experiments suggest that an asynchronous approach may pay off when a system makes many external calls, or when you simply prefer the reactive programming paradigm. However, it can also introduce performance overhead of its own.
Disclaimer: We might have achieved different results if we had used a different tech stack instead of Spring and Tomcat. But for this project, with this particular setup, these were our findings.