NSX non-distributed routing with VMware Cloud Director


While working with NSX Version 3.2.X together with Avi (ALB) and VMware Cloud Director, I encountered an unexpected behavior. Clients were not receiving responses from the configured load balancer VIP, and I was curious as to why.

Preserve Client IP Architecture

During my investigation using NSX Traceflow, I observed an unusual traffic pattern in the return path to the client. Although the NSX service insertion behaved as expected, I questioned why SNAT (Source Network Address Translation) was also triggered. It is worth noting that the load balancer virtual service is configured to preserve the client’s IP address.

Under normal circumstances, if the service insertion rule is triggered, there should be no SNAT. SNAT rewrites the source IP address of the return traffic from the Avi Service Engine, which would explain why the client drops it.

Let us take a step back and briefly explain how preserve client IP works with the Avi load balancer:

Preserve Client IP Architecture Example

The client sends a request to the load balancer VIP, and the load balancer forwards it to a member of the appropriate pool. Because the virtual service is configured to preserve the client’s IP address, the pool member sees the client’s address as the source of the request. The pool member therefore replies to the client, but a redirect rule established during the configuration of ‘preserve client IP’ steers that reply to the Avi Service Engine at the service router (SR) on the T1. Finally, the load balancer sends the response back to the client, and both we and the client are happy.

However, if the IP address changes due to SNAT on its way back to the client, we will lose the packet. Let us dive into why there is SNAT happening on the T1 SR.
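The effect of SNAT on the return path can be sketched as a toy model. This is not how NSX or Avi are implemented: the addresses, the `Packet` type, and the `client_accepts` helper are hypothetical, and the only assumption is that a client matches a reply to its connection by the address it originally contacted.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Packet:
    src: str
    dst: str


# Hypothetical addresses, chosen only for illustration.
CLIENT = "198.51.100.10"
VIP = "203.0.113.20"
SNAT_ADDRESS = "203.0.113.99"


def client_accepts(request: Packet, reply: Packet) -> bool:
    """A reply belongs to the client's connection only if it comes from the
    address the client contacted and is addressed back to the client."""
    return reply.src == request.dst and reply.dst == request.src


def apply_snat(packet: Packet, translated_src: str) -> Packet:
    """Rewrite the source address, as a SNAT rule would."""
    return replace(packet, src=translated_src)


request = Packet(src=CLIENT, dst=VIP)
reply_from_se = Packet(src=VIP, dst=CLIENT)  # reply as sent by the Service Engine
reply_after_snat = apply_snat(reply_from_se, SNAT_ADDRESS)

print(client_accepts(request, reply_from_se))     # reply with the VIP as source
print(client_accepts(request, reply_after_snat))  # reply with a translated source
```

In this model the untranslated reply matches the client's connection, while the reply whose source was rewritten does not, which corresponds to the lost packet described above.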

To put it simply, it is a known issue. Let us discuss what is happening and how we can prevent it.

An exception occurs when a SNAT rule and a DNAT rule are configured simultaneously on the logical router (T1 SR). In our case, the rules were configured via the VMware Cloud Director UI. The result of this configuration is that NAT will be enforced on the downlink traffic.
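As a rough pre-check, the condition described above can be expressed as a small predicate over a gateway's NAT rules. This is a simplified reading of the behavior, not an NSX or VMware Cloud Director API: the `NatRule` type and the rule names are hypothetical, and in practice the rules would have to be exported from your own environment first.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NatRule:
    name: str
    action: str  # "SNAT" or "DNAT" (hypothetical, simplified representation)


def nat_enforced_on_downlink(rules) -> bool:
    """Return True when both a SNAT and a DNAT rule are present on the
    gateway, which is the combination the article identifies as the trigger."""
    actions = {rule.action for rule in rules}
    return {"SNAT", "DNAT"} <= actions


# Hypothetical rule sets for illustration.
only_snat = [NatRule("outbound", "SNAT")]
snat_and_dnat = [NatRule("outbound", "SNAT"), NatRule("publish-web", "DNAT")]

print(nat_enforced_on_downlink(only_snat))
print(nat_enforced_on_downlink(snat_and_dnat))
```

A gateway whose rule set makes this predicate true is a candidate for the behavior described here when preserve client IP is also in use.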

To alter this behavior, we need to disable the NSX distributed routing feature in VMware Cloud Director, as shown in the following example:

VMware Cloud Director disable-enable distributed routing on T1

Allow non-distributed routing on T1 Edge Gateway

VMware Cloud Director disable-enable distributed routing on tenant segment

Disable distributed routing on tenant network

This seems like a straightforward fix, but let us discuss the potential drawbacks. With distributed routing disabled, all traffic is directed through the gateway’s service router. In some use cases this is not desirable and may complicate routing. East-west traffic between networks connected to the same T1 gateway also becomes more isolated.

Example of non-distributed routing with NSX service interface

Non-distributed routing: segment attached to service interface

Remember, without distributed routing, traffic goes via the service router and the T1 firewall rules apply. We may also need to adjust the route advertisement accordingly to ensure east-west traffic functions correctly.
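The difference in traffic path can be illustrated with a minimal model. It is only a sketch under stated assumptions: the `Segment` type, the hop labels, and the segment names are hypothetical, and the model assumes that traffic crosses the service router as soon as at least one of the two segments is non-distributed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    name: str
    distributed: bool  # False means the segment uses non-distributed routing


def east_west_path(src: Segment, dst: Segment) -> list:
    """Return the hops for traffic between two segments on the same T1."""
    if src.distributed and dst.distributed:
        # Routed by the distributed router; the service router is not involved.
        return [src.name, "T1 DR", dst.name]
    # At least one segment is non-distributed, so traffic crosses the SR.
    return [src.name, "T1 SR", dst.name]


def gateway_firewall_applies(src: Segment, dst: Segment) -> bool:
    """T1 firewall rules apply only when the path includes the SR."""
    return "T1 SR" in east_west_path(src, dst)


# Hypothetical tenant segments for illustration.
web = Segment("web", distributed=True)
app = Segment("app", distributed=True)
db = Segment("db", distributed=False)

print(east_west_path(web, app), gateway_firewall_applies(web, app))
print(east_west_path(web, db), gateway_firewall_applies(web, db))
```

The model makes the trade-off visible: the non-distributed segment gains firewall enforcement on the T1 for east-west traffic, at the cost of sending that traffic through the service router.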

Let us summarize the advantages of non-distributed routing when considering its general use:

  1. Isolation of East-West Traffic
    • Non-distributed routing enables the isolation of traffic between organization VDC networks.
    • This ensures controlled and secure communication within your organization.
  2. Efficient Control
    • Deactivating distributed routing allows precise control over east-west traffic.
    • When distributed routing is disabled, VM traffic is directed through the edge gateway’s service router, which simplifies management and monitoring.

Note

updated on Mar 1, 2024

  • observed in version: 3.2.X
  • last time tested in version: 4.1.0
  • issue still persists: yes