AWS Engineer Suggests Replacing Nagle with TCP_NODELAY to Fix Latency Issues


A few days ago, Marc Brooker, an AWS engineer, criticized the Nagle algorithm and addressed the long-standing problems it causes when transferring small messages.

He points out that in today's distributed systems latency is a critical factor that can significantly affect performance and user experience, and that one of the key remedies is setting socket options such as TCP_NODELAY.

Marc mentions latency issues that were fixed quickly by enabling this simple socket option, TCP_NODELAY. Yet the option is not enabled by default, because the TCP/IP stack applies the Nagle algorithm unless told otherwise.

Nagle's algorithm, formulated by John Nagle in RFC 896 in 1984, was designed to make data transmission over TCP networks more efficient. It coalesces small messages to reduce traffic: the sender holds back new small TCP segments until it receives acknowledgment of the data it has already sent. Without this aggregation, sending 1 byte of payload costs an additional 40 bytes of TCP and IP headers. Under modern conditions, however, the algorithm causes a noticeable increase in latency, which is unacceptable for interactive and distributed applications.
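The header overhead and the hold-back rule described above can be sketched in a few lines of Python. This is a simplified illustration, not the kernel's implementation: the function names are hypothetical, the 40-byte figure assumes a 20-byte IPv4 header plus a 20-byte TCP header without options, and the 1460-byte segment size is an assumed typical value.

```python
TCP_IP_HEADER_BYTES = 40  # assumption: 20-byte IPv4 header + 20-byte TCP header, no options


def header_overhead(payload_bytes: int) -> float:
    """Return the fraction of the bytes on the wire that are TCP/IP headers."""
    return TCP_IP_HEADER_BYTES / (TCP_IP_HEADER_BYTES + payload_bytes)


def nagle_can_send(pending_bytes: int, mss: int, unacked_bytes: int) -> bool:
    """Simplified Nagle rule (hypothetical helper, not a real API).

    A full-sized segment may be sent at once. A small segment may be sent
    only when no previously sent data is still waiting for an ACK.
    """
    if pending_bytes >= mss:
        return True
    return unacked_bytes == 0


if __name__ == "__main__":
    mss = 1460  # assumed maximum segment size
    print(f"1-byte payload: {header_overhead(1):.1%} of the packet is headers")
    print(f"{mss}-byte payload: {header_overhead(mss):.1%} of the packet is headers")
    print("small write, nothing in flight:", nagle_can_send(1, mss, 0))
    print("small write, data in flight:   ", nagle_can_send(1, mss, 1))
```

The second call to `nagle_can_send` is the case that adds latency: the small write stays in the send buffer until the earlier data is acknowledged.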

One complication of Nagle's algorithm in practice is its interaction with delayed ACKs. While Nagle's algorithm tries to optimize the size of the packets sent, delayed ACKs postpone the acknowledgment of received packets. The combination can cause latency problems in applications that are sensitive to it.
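A rough timing model makes the cost of this combination visible. The sketch below assumes a request that the application writes in two small pieces, a receiver that cannot reply until it has both, and a 40 ms delayed-ACK timer. The timer value and the function name are assumptions for illustration only; real timers vary by operating system and configuration.

```python
def request_latency_ms(rtt_ms: float, delayed_ack_ms: float, nodelay: bool) -> float:
    """Time until the receiver holds both halves of a two-write request.

    Model assumptions: both writes are smaller than one segment, the receiver
    has no response data to piggyback an ACK on, and processing time is zero.
    """
    one_way = rtt_ms / 2
    if nodelay:
        # Both segments leave immediately and arrive after one one-way trip.
        return one_way
    # With Nagle, the first segment leaves at t=0 and the second waits for its ACK.
    first_arrives = one_way
    ack_sent = first_arrives + delayed_ack_ms  # receiver holds the ACK back
    ack_arrives = ack_sent + one_way
    return ack_arrives + one_way  # the second segment finally arrives


if __name__ == "__main__":
    rtt = 1.0  # assumed data-center round-trip time in milliseconds
    print("TCP_NODELAY:", request_latency_ms(rtt, 40.0, nodelay=True), "ms")
    print("Nagle + delayed ACK:", request_latency_ms(rtt, 40.0, nodelay=False), "ms")
```

Under these assumptions the delayed-ACK timer, not the network, dominates the latency of the request.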

Marc also gives three main reasons for making TCP_NODELAY the default and disabling Nagle's algorithm:

  1. Incompatibility with the “delayed ACK” optimization: Nagle's algorithm conflicts with the “delayed ACK” strategy, in which the receiver does not send the ACK immediately but holds it back so that it can travel together with the response data. In Nagle's algorithm, the arrival of an ACK is the signal to send the accumulated data. This creates a cycle: the receiver does not send the ACK because it is waiting for data that is still accumulating on the sender's side, and the sender does not send that data because it is waiting for the ACK. The stall ends only when a timeout expires.
  2. Age of the Nagle algorithm RFC: The RFC was published in 1984 and was not designed for modern high-speed networks and data-center servers, where it causes responsiveness problems. In modern networks the round-trip time (RTT), the delay between sending a request and receiving a response, is very short, and modern servers can do an enormous amount of work in that interval, so waiting even one RTT is expensive.
  3. Changes in data-sending patterns: Modern distributed applications no longer send individual bytes, and aggregation of small writes is typically implemented at the application level. Even when the payload is minimal, the amount of data actually sent grows considerably after serialization, JSON API framing, and TLS encryption. Saving 40 bytes therefore matters less than the performance and responsiveness gained by disabling Nagle's algorithm.

For these reasons, Brooker has called for disabling Nagle's algorithm by default. This is done by setting the TCP_NODELAY option on network sockets with the setsockopt call, as projects like Node.js and curl have long done.
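As a minimal sketch of that setsockopt call, here is how the option can be set with Python's standard `socket` module. The helper name `make_nodelay_socket` is hypothetical; an application would normally set the option on whatever socket it already creates, before or after connecting.

```python
import socket


def make_nodelay_socket() -> socket.socket:
    """Create a TCP socket with Nagle's algorithm disabled (hypothetical helper)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # A non-zero TCP_NODELAY value tells the stack to send small segments
    # immediately instead of waiting for outstanding ACKs.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return sock


if __name__ == "__main__":
    with make_nodelay_socket() as sock:
        enabled = sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY) != 0
        print("TCP_NODELAY enabled:", enabled)
```

The option applies per socket, so code that wants it everywhere has to set it on every TCP socket it opens.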

Finally, if you want to know more, you can find the details in Marc Brooker's original post.

