As we discussed in the previous blog, a network virtualization system must satisfy specific properties such as equivalence, efficiency, and resource control. It is interesting to review some of the common network virtualization approaches discussed in the industry these days, and to understand to what extent these approaches actually meet those requirements.
Let us start with the simple form of VXLAN, as it has been defined in the IETF. The VXLAN proposal suggests L2 segments overlaid on top of an IP network by encapsulating Ethernet frames in IP/UDP packets. From a forwarding perspective this is indeed a virtualization method (or one level of indirection) that hides the existence of Virtual Machines (or tenants) from the core network. The IP network does not need to worry about VM addresses and in theory can scale much better.
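To make the encapsulation concrete, here is a minimal sketch using Scapy; the VTEP addresses, tenant MAC/IP addresses, and VNI value are illustrative assumptions, not values mandated by the VXLAN proposal.

```python
# Minimal sketch of VXLAN encapsulation (illustrative addresses and VNI).
# The tenant Ethernet frame is wrapped in an outer IP/UDP header, so the
# core network only ever sees the hypervisor VTEP addresses.
from scapy.all import Ether, IP, UDP, VXLAN

# Inner frame: traffic between two tenant VMs, invisible to the core.
inner = (Ether(src="00:00:00:aa:00:01", dst="00:00:00:aa:00:02") /
         IP(src="10.0.0.1", dst="10.0.0.2"))

# Outer encapsulation: VTEP-to-VTEP IP/UDP, with the 24-bit VNI
# identifying the tenant segment.
vxlan_frame = (Ether() /
               IP(src="192.168.1.10", dst="192.168.1.20") /  # VTEP addresses
               UDP(sport=49152, dport=4789) /                # IANA VXLAN port
               VXLAN(vni=5001) /
               inner)

vxlan_frame.show()
```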
However, the devil is in the details. Part of the VXLAN proposal, and of most of the first implementations, is also a simple control plane structure that relies on IP multicast. In order to emulate the broadcast/multicast capabilities of an L2 segment, the proposed approach relies on IP multicast groups and assumes that one multicast group is associated with each segment. This has several implications:
- The core network must now be aware of the overlay virtual networks, and the amount of state that must be maintained in the core is linear in the number of virtual networks (see the sketch after this list). This introduces tremendous inefficiency and has significant scalability limits.
- The multicast requirement restricts VXLAN to the boundaries of a single administrative domain. The requirement that tenant identifiers be global and that there be a one-to-one mapping between tenant networks and multicast groups introduces significant operational complexity. If two different administrative domains have already used the same multicast group for different purposes, achieving interoperability in this global name space is almost impossible.
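The scaling concern can be made concrete with a small back-of-the-envelope sketch; the group addressing scheme and the number of segments below are illustrative assumptions, not part of the VXLAN specification.

```python
# Rough sketch of the one-to-one VNI-to-multicast-group mapping implied by
# the multicast-based VXLAN control plane. All values are illustrative.

def vni_to_group(vni: int) -> str:
    """Map a 24-bit VNI onto a multicast group address (assumed scheme)."""
    return "239.{}.{}.{}".format((vni >> 16) & 0xFF, (vni >> 8) & 0xFF, vni & 0xFF)

tenant_segments = range(1, 10001)                     # 10,000 virtual networks
groups = {vni: vni_to_group(vni) for vni in tenant_segments}

# Every group forces forwarding state in the core routers, so core state
# grows linearly with the number of virtual networks.
print(len(groups), "multicast groups the core must now maintain")
print("VNI 5001 maps to", groups[5001])
```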
So, even though VXLAN starts with the excellent idea of hiding per-tenant addresses and VMs from the core of the network, its control plane implementation quickly undermines that goal, and it does not provide full network virtualization.
But the limitations of the VXLAN control plane are only the beginning. The main service VXLAN aims to provide is a simple emulated layer-2 segment. This is far from sufficient for enterprise applications, which are typically based on multi-tier architectures. Let’s consider one of the most common multi-tier, DMZ-type applications, as shown in Figure 1. Internet traffic is passed through a firewall and then routed to a set of web-servers that implement the presentation aspects of the application. The web-servers rely on business logic servers that sit behind another firewall on a separate layer-2 segment.
First, let us consider the question: why do IT organizations group applications together in layer-2 segments? In most cases, it is not because the servers in the same segment often communicate with each other; indeed, web-servers will hardly ever exchange any information with one another. Servers are organized into layer-2 segments in order to group and manage security policies and to interoperate with firewalls and load balancers. In other words, even the most simple and commonly deployed enterprise application requires much more than a layer-2 segment. It requires layer-3 routing and often layer-4 separation in order to enhance security.
A standard implementation of such a multi-tier application using technologies like VXLAN is illustrated in Figure 2. The approach relies on routers that are deployed as virtual machines. These VM routers are responsible for interconnecting traffic between the different layer-2 segments (red and blue VMs). Such an approach introduces several issues:
- These VM routers can quickly become traffic choke points. If routed traffic is a significant percentage of application traffic, as is the case in multi-tier applications, then most traffic passes through this router VM (a rough capacity estimate is sketched after this list). This is clearly a bottleneck in the architecture, and even though the application developer expects the application to scale horizontally, it is fundamentally limited in capacity. The data center and cloud fail to deliver a seamless experience to application developers.
- In addition to bandwidth bottlenecks, the VM routers are reliability risks. If the VM router fails, the application fails. This means that the application developer will have to deploy multiple such routers, but then managing the traffic between virtual machines and routers is a significant challenge. Supporting redundancy protocols such as VRRP between the routers introduces yet another protocol in the core network and hence additional complexity. In some architectures, the recommendation is to rely on the hypervisor HA capabilities in order to achieve redundancy. What this suggestion hides, though, is that HA operates within a cluster, which is a localized set of physical servers. If the cluster, or the network that connects the cluster to the data center, fails, the application will fail again.
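A back-of-the-envelope calculation makes the choke point concrete; the per-server demand and router throughput figures below are illustrative assumptions, not measurements of any particular product.

```python
# Illustrative estimate of the VM-router bottleneck. All figures are
# assumptions made for the sake of the example.
import math

web_servers = 40
per_server_demand_gbps = 2.0        # traffic each web-tier VM sends to the app tier
vm_router_throughput_gbps = 10.0    # assumed capacity of a single software router VM

# Every inter-tier packet must traverse a router VM, so the aggregate routed
# demand dictates how many router VMs are needed just to keep up.
inter_tier_demand = web_servers * per_server_demand_gbps
routers_needed = math.ceil(inter_tier_demand / vm_router_throughput_gbps)

print(f"Aggregate inter-tier demand: {inter_tier_demand:.0f} Gbps")
print(f"Router VMs needed to sustain it: {routers_needed}")
```

Scaling out by adding router VMs then reintroduces exactly the VRRP and traffic-steering complexity described above.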
In both of the above cases, the bandwidth and reliability bottlenecks introduced by the VM router approach have a direct impact on the application developer. They put the burden of resolving networking issues and dealing with the inefficiencies of the approach on the application developers, who have to manage routers, bandwidth between routers, load balancing, and high availability, instead of being able to focus on the development of their application.
At the same time, though, there are significant impacts on the cloud operator. In most cloud management implementations, cloud operators would like to have complete freedom in placing their workloads around the data center without any limitations imposed by the network. The phrase “take the network out of the way” has been coined to express exactly this requirement. If one distributes the VMs and associated VM routers in different zones of the data center in order to achieve maximum reliability, one will quickly observe the “traffic tromboning” phenomenon. As is shown in Figure 3, even for communication between two VMs that are placed in the same hypervisor, traffic might need to flow across the data center in order to be routed by the VM router that is placed in a different rack. This can dramatically decrease the effective bandwidth capacity of the DC fabric and/or introduce latencies and bottlenecks. For a given set of demands, the cloud operator will require a much higher capacity fabric, and this leads to additional expenses.
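The tromboning penalty can be quantified with a simple hop count; the leaf/spine layout and the VM placements below are illustrative assumptions.

```python
# Illustrative hop-count comparison for traffic tromboning in a leaf/spine
# fabric. The topology and placements are assumptions, not measurements.

def fabric_hops(src: tuple, dst: tuple) -> int:
    """Fabric link traversals between two VMs placed as (rack, hypervisor)."""
    if src == dst:
        return 0        # same hypervisor: traffic never leaves the host
    if src[0] == dst[0]:
        return 2        # same rack: host -> ToR -> host
    return 4            # different racks: host -> ToR -> spine -> ToR -> host

vm_a = (1, 1)           # rack 1, hypervisor 1
vm_b = (1, 1)           # placed on the same hypervisor as vm_a
vm_router = (4, 7)      # router VM placed in a different rack for "reliability"

direct = fabric_hops(vm_a, vm_b)
tromboned = fabric_hops(vm_a, vm_router) + fabric_hops(vm_router, vm_b)

print(f"Direct path between the two VMs: {direct} fabric link traversals")
print(f"Path via the remote VM router:   {tromboned} fabric link traversals")
```

Traffic that could have stayed inside a single hypervisor ends up crossing the spine twice, which is exactly the capacity and latency penalty the cloud operator pays.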
What we can see from the above discussion is that layer-2 virtualization technologies, as currently defined in the industry, fail to satisfy most of the virtualization properties that we discussed earlier:
- Network Equivalence: A VM router based approach does not provide the same behavior as a dedicated physical network, since it introduces artificial bottlenecks and reliability risks.
- Network Core Efficiency: Traffic tromboning can have a significant impact on the data center network, and traffic between VMs does not always follow the shortest (or best) path.
This simplistic form of network virtualization throws the problem over the fence to another domain (IP multicasting and orchestration), puts the burden of reliable networking on the application developer, and significantly limits the efficient utilization of network resources.