Some time ago, we discussed the principles of overlay networks and the design requirements for the physical infrastructure that supports them. As discussed on this blog, almost every communication channel we use today relies on one or more overlays: IP over Ethernet, IP over optics, wireless access over GRE tunnels over IP over optics, and so on.
The idea of decoupling services from the core network, keeping the core simple and pushing intelligence to the edge, has been fundamental to the design of the Internet and is often referred to as the "end-to-end principle". Network virtualization applies the same ideas in the context of enterprise and datacenter networks.
However, the introduction of these technologies to enterprises is often met with valid skepticism. Network operations teams wonder whether the operational tools exist to provide the same visibility and control over a network virtualization deployment as they had with more integrated technologies. Indeed, some vendors have treated full vertical integration as their main value proposition and have tried to convince network teams that the only way to achieve visibility is to rely on a closed architecture.
What we are demonstrating today is that not only is it possible to achieve visibility in an overlay deployment, but that this is doable without any hardware lock-in to any vendor's strategy and without reinventing new protocols or APIs. Simple solutions exist that are based on open protocols and standards, and by combining them with an intelligent correlation layer, one can achieve full operational visibility.
Before we explain the technology, though, let us consider some examples of the key requirements that operations teams are looking to address.
- The ability to understand how a virtual network is overlaid onto the physical infrastructure, or in other words, which physical hardware nodes are involved in implementing a given service.
- The ability to predict the effect of hardware maintenance on the health of these services, by enumerating the services that a given physical node implements.
- The ability to perform root cause analysis, that is, to understand which event in the network caused an outage, whether temporary or permanent, so that it can be fixed and similar problems avoided in the future.
- A need for visualization tools that help operators sift through the thousands of events of a large network in order to figure out how a misconfiguration has affected their services.
The reality, though, is that these are all standard capabilities that have existed in service provider networks for quite some time. When one considers an MPLS deployment, for example, there are plenty of similar tools that service providers have used for years to answer exactly these questions. What we need to do is learn from those tools and extend the technology to address datacenter and enterprise needs.
The technology behind this is conceptually quite simple.
Step 1: A simple probe is instantiated in the network and peers with one or more of the routing systems as a passive protocol listener. The "route monitor", as we call it, uses standard routing protocols such as BGP, OSPF, and IS-IS to construct the physical topology of the network, just as any other router would.
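The idea behind this step can be sketched as follows. The link-state record format below is a simplified, hypothetical stand-in for real OSPF/IS-IS data; an actual route monitor would decode the wire protocol after passively peering with a router.

```python
# Minimal sketch of a route monitor assembling an L3 topology from
# link-state advertisements learned over a passive routing adjacency.
# The (router, neighbor, cost) tuples are a hypothetical simplification.
from collections import defaultdict

def build_l3_topology(lsas):
    """Turn (router_id, neighbor_id, cost) tuples into an adjacency map."""
    topology = defaultdict(dict)
    for router_id, neighbor_id, cost in lsas:
        topology[router_id][neighbor_id] = cost
    return dict(topology)

# Example: three routers, as a passive listener might learn them.
lsas = [
    ("r1", "r2", 10), ("r2", "r1", 10),
    ("r2", "r3", 5),  ("r3", "r2", 5),
]
topology = build_l3_topology(lsas)
# topology["r1"] -> {"r2": 10}
```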
Step 2: Once the route monitor has a view of the L3 topology, it needs to reconstruct as much of the L2 topology as possible. For example, if a Top of Rack (ToR) switch is connected to a set of servers over an L2 network, the route monitor must identify the relationship between the servers and the physical ports of the ToR switch. This can be achieved through direct API calls between the route monitor and individual ToR switches, using standard protocols such as SNMP or vendor-specific calls based on CLI commands.
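As a rough illustration of this step, a monitor that has retrieved a ToR switch's MAC forwarding table (for instance from the standard BRIDGE-MIB via SNMP) could join it against known server MAC addresses to learn which physical port each server sits behind. The data values and function below are hypothetical:

```python
# Hypothetical sketch: correlating a ToR switch's MAC forwarding table
# (e.g. retrieved over SNMP) with known server MACs to discover which
# physical switch port each server is attached to.
def map_servers_to_ports(fdb_entries, server_macs):
    """fdb_entries: (mac, port) pairs; server_macs: mac -> server name."""
    return {server_macs[mac]: port
            for mac, port in fdb_entries
            if mac in server_macs}

fdb = [("aa:bb:cc:00:00:01", "eth1/1"), ("aa:bb:cc:00:00:02", "eth1/2")]
servers = {"aa:bb:cc:00:00:01": "server-a", "aa:bb:cc:00:00:02": "server-b"}
# map_servers_to_ports(fdb, servers) -> {"server-a": "eth1/1", "server-b": "eth1/2"}
```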
Step 3: When both the network and server topologies have been discovered, the information can be correlated with the virtual networks. A vswitch resides on a server, and a "correlation engine" can correlate the physical and virtual topologies by combining the virtual network topology made available by the network virtualization solution with the physical topology built by the route monitor. Because the correlation engine has exactly the same view of the physical topology as the routers themselves, it can run the same shortest path algorithms and discover the exact route that a packet will take, without being part of the physical infrastructure.
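The path-prediction part of this step can be sketched with a standard Dijkstra shortest-path computation, the same class of algorithm OSPF and IS-IS routers run during SPF. The topology and node names below are illustrative:

```python
# Sketch of the correlation engine predicting a packet's path by running
# the same shortest-path computation as the routers (Dijkstra) over its
# own copy of the discovered topology. Nodes and costs are hypothetical.
import heapq

def shortest_path(topology, src, dst):
    """topology: {node: {neighbor: cost}}. Returns the list of hops, or None."""
    queue = [(0, src, [src])]
    visited = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == dst:
            return path
        if node in visited:
            continue
        visited.add(node)
        for neighbor, link_cost in topology.get(node, {}).items():
            if neighbor not in visited:
                heapq.heappush(queue, (cost + link_cost, neighbor, path + [neighbor]))
    return None

topology = {"r1": {"r2": 10}, "r2": {"r1": 10, "r3": 5}, "r3": {"r2": 5}}
# shortest_path(topology, "r1", "r3") -> ["r1", "r2", "r3"]
```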
Step 4: Once all these capabilities are in place, the correlation engine can start maintaining a history of events. Every route modification is an event that can cause service changes or, potentially, outages; every interface that comes up or goes down triggers a route update that translates into some service event. By mining this valuable record of protocol events, the correlation engine can analyze faults and recommend to operators how to address them.
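A minimal sketch of the fault-analysis idea: given a link-down event, enumerate the services whose computed paths traversed that link. The event format, service names, and paths are illustrative assumptions, not the platform's actual data model:

```python
# Hypothetical sketch of correlating a link-down event with the virtual
# services it affects, by checking which service paths cross the link.
def services_affected_by(event, service_paths):
    """event: ("link_down", node_a, node_b);
    service_paths: service name -> list of hops on its current path."""
    _, a, b = event
    hit = {(a, b), (b, a)}  # the link, in either direction
    return [svc for svc, path in service_paths.items()
            if any(hop in hit for hop in zip(path, path[1:]))]

paths = {"tenant-blue": ["r1", "r2", "r3"], "tenant-red": ["r1", "r4"]}
# services_affected_by(("link_down", "r2", "r3"), paths) -> ["tenant-blue"]
```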
Not only does this approach provide full visibility into the relationship between physical and virtual networks, but we would argue that it extends beyond what has been available to enterprises until now. Datacenters built with multi-tier L2 architectures do not offer nearly the same visibility as the approach described here. Controlling ACLs and security rules at the network virtualization edge and distributing intelligence, while at the same time maintaining visibility, is far more powerful than any current architecture.
And all of this is possible in a multi-vendor environment, without locking enterprises into single-vendor solutions. The Nuage Networks Virtualized Services Assurance Platform achieves this today.
For more info:
– Press Release: Nuage Networks solves critical operational challenge to accelerate adoption of SDN-powered enterprise cloud and wide area network (WAN) services
– Nuage Networks showcases VSAP at Tech Field Day Extra @ ONUG