Delayed ACK and Nagle’s Algorithm

In this article, I am taking a shot at trying to explain the interaction between Delayed ACK and Nagle’s algorithm and how this could add latency during TCP session that requires transmission of small packets.

MSS:

Maximum Segment Size or MSS denotes the data that is being sent in the “Segment” utilized in the TCP layer of the OSI model. Default MSS is 536 Bytes. Default MTU is 576 Bytes.

MTU = MSS + 20B (TCP Header) + 20B (IP Header)

TCP Transmission:

RFC 1122 talks about the conditions under which data can be transmitted in TCP implementations. Data is transmitted when any of the following 3 conditions is met.

1. Immediately, if a full MSS or more can be sent.

2. {[No unacknowledged data] && 
    [(PSH flag is set) || 
     (Buffered data > 1/2*(SND Window))]}

3. {(PSH flag set) && (Override timeout expired)}

1/2*(SND Window) is implementation dependent and can differ across Operating System and within different versions of Operating System. Override timeout is roughly 200ms. This value could change between OS too. ACK Number represents bytes and not packets.

Delayed ACK:

Delayed ACK helps in avoiding “Silly-Window-Syndrome” (SWS) at the Receiver. The Receiver will delay sending an ACK in response to data received when all the 3 conditions match.

1. When there are no 2 packets / 2*MSS received.

2. When the client has no data to send.

3. When the Delayed ACK timer has not yet expired.

In the UNIX World, 2*MSS has to be received by Receiver in order for it to send an ACK and in the Windows World, 2 packets of any size has to be received by Received in order for it to send an ACK.

Nagle’s Algorithm (RFC896):

The goal of Nagle’s algorithm is to lower the number of small packets exchanged during a TCP session. This helps in avoiding “Silly-Window-Syndrome” (SWS) at the Transmitter.

Nagles algorithm can be summarized as follows:

A. If there are unacknowledged data 
   (i.e., data in flight > 0 Bytes), 
   new data is buffered.

B. If data to be sent is less than MSS, it is 
   buffered till the data to be sent is 
   greater than or equal to MSS.

Problem:

Under the right conditions, the 1-3 points outlined under Delayed ACK and A-B points outlined in Nagle’s Algorithm will freeze the interaction between the sender and the receiver for the duration of timeout which is roughly 200ms. This is often seen in applications that rely on smaller packet sizes.

A Simple Scenario:

Sender is a client machine that updates the Receiver with information. Receiver could be some kind of a data warehouse which stores information on financial transactions. In this case, the Sender has data to send to the Receiver and the Receiver acknowledges the data received and does not transmit any data to the client in response other than a simple ACK.

During the course of a TCP session, Sender has just sent 500B of data to the Receiver and this matches condition A outlined in Nagle’s algorithm. The application at the Sender side moves 400B of data to the TCP stack. At this point, Sender has not yet received an ACK from the Receiver and because the next 400B of data meets the B condition outlined in Nagle’s algorithm.

Thus, the 400B of data will be buffered till either one of the Nagle’s condition is met:

A. ACK is received from Receiver for the 
   previously sent 500B of data.

B. Application sends the TCP stack more data 
   that will push the existing buffered data 
   (400B) more than MSS i.e., application needs 
   to send 136B or more to the TCP stack in 
   order to push the buffered data to or beyond
   the MSS limit.

On the Receiver side, the receiver will refrain from sending an ACK after receiving the first 500B of data because the 1-3 conditions outlined under Delayed ACK hasn’t been met.

1. Only 1 packet of 500B (less than MSS) has been 
   received.

2. Receiver does not have any data to transmit, 
   other than ACK.

3. Delayed ACK timer has not yet expired.

Sender keeps the 400B of data buffered. Receiver will not ACK the previously sent 500B of data. Effectively, there will be a communication freeze between the Sender and the Receiver till the timeout expires. Usually this timeout is 200ms in different OS implementations.

For further understanding, I would highly recommend this youtube video on this subject by Hansang Bae.

F5 iRule – URI Masking

Requirement:

Client sends request to http://xyz.com/

Server needs to process http://xyz.com/append but client should only see http://xyz.com/ i.e., the URI  /append should not be visible to the client.

when HTTP_REQUEST { 
if { ([HTTP::host] equals "www.xyz.com") and ([HTTP::uri] eq "/") } { 
HTTP::uri "/append" 
} 
} 
when HTTP_RESPONSE {
if { [HTTP::header values Location] contains "/append" } {
HTTP::header replace Location [string map {/append /} [HTTP::header value Location]]
}
}

The F5 will complete the following steps using the iRule provided above:

F5 will add URI “/append” to the incoming request.

F5 will replace “/append” with “/” in the response from the server to the client.

Reference:

Mask URI – Devcentral Thread

Github

F5 – SSL Cert Expiration

K14318 – Identifying expired certs and certs about to expire in 30 days.

K15288 – Email reminder for cert expiration.

A few one-liners from bash to identify the cert expiration date:

Identifying the expiration date from the certificate name:

~ # tmsh list sys file ssl-cert domain.crt | grep expiration
    expiration-date 1505951999
    expiration-string "Sep 20 23:59:59 2017 GMT"

 

Identifying the Client SSL profile for a certificate:

~ # tmsh list ltm profile client-ssl one-line | grep domain.crt | awk '{print $3,$4}'
    client-ssl CLIENTSSL-domain.com

 

Identifying the Virtual Server from Client SSL profile:

~ # tmsh list ltm virtual one-line | grep CLIENTSSL-domain.com | awk '{print $2,$3}'
    virtual VS-10.10.10.10-Public

 

Identifying the expiration date for cert associated with VS:

~ # echo | openssl s_client -connect 10.10.10.10:443 2> /dev/null | openssl x509 -noout -dates
notBefore=Nov 21 00:00:00 2016 GMT
notAfter=Nov 22 23:59:59 2017 GMT

 

F5 Virtual Server – Order of Precedence

The VS order of predence differs with code version and the tm.continuematching db variable. This tm.continuematching db variable is set to false by default and hence, a lower predence VS does not handle the traffic if there exists an higher predence VS in a disabled state. If the traffic has to be handled by lower precedence VS when the higher precedence VS is disabled, we would have to set this db variable as true:

11.x Code Version:

(tmos)# modify /sys db tm.continuematching value true
(tmos)# save /sys config

9.x – 10.x Code Version:

bigpipe db TM.ContinueMatching true
bigpipe save all

The order of predence for VS processing for different code version is provided below.

Order of Precedence for code version: 9.2 – 11.2.x

<address>:<port>
 <address>:*
 <network>:<port>
 <network>:*
 *:<port>
 *:*

Order of Precedence for code version: 11.3 and later

Order Destination Source Service port
1 <host address> <host address> <port>
2 <host address> <host address> *
3 <host address> <network address> <port>
4 <host address> <network address> *
5 <host address> * <port>
6 <host address> * *
7 <network address> <host address> <port>
8 <network address> <host address> *
9 <network address> <network address> <port>
10 <network address> <network address> *
11 <network address> * <port>
12 <network address> * *
13 * <host address> <port>
14 * <host address> *
15 * <network address> <port>
16 * <network address> *
17 * * <port>
18 * * *

F5 – Bleeding Active Connections

Scenario:

A Virtual Server is load balancing connections to a pool with 2 pool members. During maintenance window, one of the two pool members is disabled and maintenance is completed followed by the other pool member.

However, as the users make continuous API calls every 5 seconds, the existing TCP connection never bleeds out. Even after waiting for 24 hours, connections still exist on the disabled pool member.

Solution:

By default, F5 makes load balancing decision when the 1st HTTP request within a TCP connection is received. Subsequent HTTP request within the TCP connection are sent to the same pool member as the very 1st HTTP request.

By enabling OneConnect profile with a /32 netmask (255.255.255.255), we were able to force the F5 to make load balancing decision for every HTTP request instead of its default behavior.

The OneConnect profile used along with disabled or forced-offline setting will move the connection from the failed pool member to the active pool member.

Reference Link.

Sub-Domain Delegation GTM/DNS

 

Lets say that you have domain.com hosted with a 3rd party DNS provider and you would like to create GTM (BigIP-DNS) DNS load balancing by utilizing Sub-Domain Delegation.

In this scenario, there are 2 GTM. One in each DC (DC-1 & DC-2). The basic set up has been completed and the GTMs are in a common sync-group.

Create A-Records for the 2 GTM using their Listener IP addresses:

 gtm1.wip.domain.com. IN A 100.100.100.100
 gtm2.wip.domain.com. IN A 200.200.200.200

gtm1 and gtm2 exist in DC-1 and DC-2 respectively and 100.100.100.100 & 200.200.200.200 are the listener IP address configured on gtm1 and gtm2.

Delegate the sub-domain to the GTM using NS Records:

 wip.domain.com. IN NS gtm1.wip.domain.com.
 wip.domain.com. IN NS gtm2.wip.domain.com.

Use CNAME records:

www.domain.com. IN CNAME www.wip.domain.com.

The above DNS records (A, NS & CNAME) will be added to the 3rd party DNS records that is hosting domain.com. Any request for

www.domain.com

will be sent to the 3rd party DNS provider which will then resolve to

www.wip.domain.com

because of the CNAME and that will be handled by the GTMs because of the NS & A records.

SOL277 – Sub-domain delegation.

OneConnect & HTTP Requests

This is a copy/paste of a Q&A in devcentral. I didn’t change it as it is quite descriptive and gets the point across.

Current Setup:

We are using Cookie Insert method for session persistence. So LTM adds “BigipServer*” Cookie in the http response header with value as encoded IP address and port name. Subsequent requests from the client (in our case browser) will have this cookie in the request header and this helps LTM to send the request to same server. This LTM cookie’s expiry is set to session, so this cookie will be cleared when we close the browser or we expire it using iRule.

Use Case:

We have set of servers configured as pool members serving traffic to users who are logged in. During release time, we will release the code to new set of servers and add those servers also to the LTM pool. LTM will now have servers with both old code as well as new code. We disable all servers which has the old code so that LTM routes only the requests which already has “BigipServer*” Cookie value pointing to those servers. This will not interrupt the users who are already logged in and doing some work. All new requests (new users) will be load balanced to any of the active servers which has new code. We will ask our already logged in users to logout and login back again once they are done with the current work. We have an iRule configured to expire the LTM cookie during logout, so our expectation is that users will be connected to new servers when they are logging in again.

Problem:

Even though iRule expires the LTM cookie during logout and the cookie is not present the request header of login, users are still routed to the same disabled server when they are logging in again. Ideally, LTM should have load balanced the request to any of the active servers.

Root Cause:

Upon analyzing this further with network traffic, we found that, whenever the browser has a persistent TCP connection open with LTM after logout, browser uses that existing TCP connection for sending the login request. LTM routes this login request to the same disabled server which handled the previous request even though LTM cookie is not present in the request header. If we close the TCP connection manually after logout (using CurrPosts or some other tool), the browser establishes a new connection with LTM during login and LTM load balances this requests to any active server. One option for us is to send “Connection: close” in the response header during logout, but the browser may hold multiple persistent TCP connections (I have seen browser holding even three connections) and hence closing a single TCP connection will not help. Other option is to close the browser, but we don’t have that choice for reasons I cannot explain here (trust me).

SOLUTION:

Try using the following:

  1. OneConnect Profile in VS with netmask of /32.
  2. Action on Service Down in the Pool set to Reselect.

(1) will force the load balancing decision to be made for every HTTP request instead of the the default of lb decision being made only for the 1st HTTP request within a TCP connection.

(2) will force the HTTP Request to be sent to a new pool member when the selected member is down as the load balancing decision is made for every HTTP request instead of the very 1st HTTP request within a persistent/keep-alive connection.

Keep-alive Connection (also referred to as Persistent Connection) is used to refer to the same feature provided by HTTP1.1 where you can utilize a single TCP connection in order to send multiple HTTP requests within a single TCP connection.

 

Adding a Blade to a Viprion

Normally, when you add the new blade, the current master blade will synch it’s configuration onto the new blade. Make sure that the existing blade is master. Backup all relevant configuration on the device and off the device before adding new blade.

Make sure that the blades are same model and see K16992

Look at K13965 🙂 to identify the master blade.

Considerations when moving blade between chassis: K10541271

F5 Code Upgrade Steps

This is a rough template of F5 Code Upgrade steps that could be of help for your maintenance work.

  1. Before performing any F5 code upgrade, make sure that the “Service Check Date” on the device is AFTER the License Check Date for the new code version as listed here in SOL7727
  2. Upload the new code to the partition that you prefer on the F5.
  3. cpcfg to the new code version location – Example: cpcfg HD1.2

    Although “cpcfg HD1.x” has worked most of the times, I would recommend backing up the .UCS file in a remote location and also saving a copy in “/shared/tmp/<UCS File>“. After saving the UCS file in the “/shared/tmp/” location, you can utilize “load /sys ucs <path/to/UCS> no-license” to load the configuration as noted in SOL12880

  4. Reboot.This will take about 5-10 minutes for Hotfix updates and about 15-20 minutes when migrating major code version.

Recommended maintenance window is about 1 hour. This could change depending on any application level testing that you would like to incorporate within your maintenance window.

Reference:

F5 Code Upgrade – 10.x to 11.x

Viprion Chassis – Adding New Blade

Normally, when you add the new blade, the current master blade will synch it’s configuration onto the new blade. Make sure that the existing blade is master. Backup all relevant configuration on the device and off the device before adding new blade.

Make sure that the blades are same model and see SOL16992

Look at SOL13965 to identify the master blade.

Considerations when moving blade between chassis: SOL10541271