Software Defined WAN (SD-WAN) technology provides many benefits compared with traditional approaches, such as building networks on private infrastructure or building network overlays manually. Greater control and easy integration with cloud services, delivered through application awareness, automated policy control, value-added services, zero-touch secure provisioning, and other capabilities, allow operators to build more dynamic and powerful networks faster than ever before.
Connecting data centers, branch offices, and public and private clouds through automated overlay Virtual Private Networks (VPNs) is the essence of SD-WAN. However, there are obstacles to overcome: many operators need to deploy SD-WAN alongside existing customer transport networks and network segments that can be barriers to seamless SD-WAN connectivity. This article focuses on the challenges of building these so-called hybrid networks and the key features that enable seamless interworking.
Interworking up and down the stack
Network interworking requirements for a seamless SD-WAN service appear at two levels of the networking stack: the underlying transport network (the underlay network) and the overlay VPNs (the overlay network):
- Overlay interworking – overlays need to provide end-to-end connectivity that is transparent to the underlying transport, and need to deal with many existing network problems (e.g. connecting to extranets or other overlays). In addition, overlay networks need to support traffic optimization by breaking traffic out at the optimal location (e.g. Internet breakout)
- Underlay interworking – the underlying transport network supports the overlay network. The main challenge here is to enable overlays to work seamlessly over any type of underlay, or any combination of underlays. For example, many operators would like to use an existing MPLS L2VPN or L3VPN in conjunction with an Internet underlay
Overlay Interworking
An overlay network starts life as a self-contained VPN: any site that belongs to the overlay can communicate with any other site in the overlay. However, users and applications within the overlay network often need to communicate with the outside world, or with services and users in other overlays or extranets.
From central breakout to local breakout
A typical way to enable communication with the outside world is to direct all traffic from the overlay to a central site, where a router provides connectivity to the Internet. This is known as centralized “Internet breakout”. While this simplifies policy and control, centralization does not always produce optimal traffic flow, and can add unnecessary latency as well as create dimensioning issues at the central site.
This is where local Internet breakout comes in. An ideal SD-WAN solution allows traffic to “break out” at any site that has the requisite access to the external resource. For example, if the Internet is being used as the underlay network, it is highly desirable to break traffic out locally when it is destined for the Internet itself. The solution should make local breakout easy to provision, with the option to apply security policies and features such as Network Address Translation (NAT) at the point of breakout. Route-to-underlay, enabled by running routing protocols such as BGP on a branch device’s uplinks, is another highly useful tool, particularly when NAT is not desired or needed.
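To make the breakout choice concrete, here is a minimal Python sketch of the per-destination decision a branch device might make. It assumes a deliberately simple model: overlay routes are a list of prefixes, and anything that matches none of them is treated as Internet-bound. `SitePolicy` and `forwarding_decision` are hypothetical names for illustration only; real SD-WAN devices express this through their own policy engines.

```python
import ipaddress
from dataclasses import dataclass, field


@dataclass
class SitePolicy:
    """Hypothetical per-site breakout policy (illustrative only)."""
    # Prefixes reachable through the SD-WAN overlay (other sites, data centers).
    overlay_prefixes: list = field(default_factory=list)
    # Whether this site has its own Internet uplink.
    has_local_internet: bool = False
    # Whether security policy permits breaking out locally.
    allow_local_breakout: bool = True


def forwarding_decision(policy: SitePolicy, dst: str) -> str:
    """Pick overlay, local breakout, or central breakout for a destination."""
    addr = ipaddress.ip_address(dst)
    # Longest-prefix match against the overlay routes.
    matches = [p for p in policy.overlay_prefixes if addr in p]
    if matches:
        best = max(matches, key=lambda p: p.prefixlen)
        return f"overlay:{best}"
    # No overlay route: treat the destination as Internet-bound.
    if policy.has_local_internet and policy.allow_local_breakout:
        return "local-breakout"  # NAT/security policy would be applied here
    return "central-breakout"    # tunnel to the hub, which breaks out instead


branch = SitePolicy(
    overlay_prefixes=[ipaddress.ip_network("10.0.0.0/8"),
                      ipaddress.ip_network("10.1.0.0/16")],
    has_local_internet=True,
)
print(forwarding_decision(branch, "10.1.2.3"))  # overlay:10.1.0.0/16
print(forwarding_decision(branch, "8.8.8.8"))   # local-breakout
```

Setting `has_local_internet=False` (or `allow_local_breakout=False`) on the same site turns the second case into `central-breakout`, which is the centralized model described above.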
Easing interworking at central locations
An operator may need to continue providing central access to external resources for customers, even while moving remote sites to an SD-WAN solution. In this case, one or more SD-WAN devices can be placed at central locations to hand off traffic from the overlay network. Ideally, the SD-WAN device interacts with the routers connected to the legacy underlay networks and exchanges routing information with them using BGP. Because these central locations are so important, the SD-WAN solution needs to scale well across customer domains and have an extremely robust routing implementation, including full-featured routing policy control.
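Routing policy control at a central hand-off point largely comes down to deciding which routes learned from legacy routers are accepted, and with what preference. The Python sketch below models a simplified, prefix-list-style BGP import policy (ordered terms, `ge`/`le` prefix-length bounds, first match wins, implicit deny at the end), similar in spirit to what many router operating systems offer. It is not any vendor's actual syntax or behavior; `PolicyTerm`, `apply_import_policy`, and the example prefixes are hypothetical.

```python
import ipaddress
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PolicyTerm:
    """One hypothetical import-policy term, in prefix-list style."""
    prefix: ipaddress.IPv4Network  # routes must fall inside this prefix...
    ge: int                        # ...with a prefix length >= ge
    le: int                        # ...and <= le
    action: str                    # "permit" or "deny"
    local_pref: Optional[int] = None  # value to set when permitted


DEFAULT_LOCAL_PREF = 100  # commonly used BGP default local preference


def apply_import_policy(route: str, terms: List[PolicyTerm]) -> Optional[int]:
    """Return the local preference for an accepted route, or None if rejected.

    Terms are evaluated in order; the first match wins, and a route that
    matches no term is rejected (implicit deny).
    """
    net = ipaddress.ip_network(route)
    for term in terms:
        if (net.version == term.prefix.version
                and net.subnet_of(term.prefix)
                and term.ge <= net.prefixlen <= term.le):
            if term.action == "deny":
                return None
            return term.local_pref if term.local_pref is not None else DEFAULT_LOCAL_PREF
    return None


# Refuse a default route, prefer data center routes, accept one exact prefix.
policy = [
    PolicyTerm(ipaddress.ip_network("0.0.0.0/0"), 0, 0, "deny"),
    PolicyTerm(ipaddress.ip_network("10.0.0.0/8"), 8, 24, "permit", local_pref=200),
    PolicyTerm(ipaddress.ip_network("192.168.0.0/16"), 16, 16, "permit"),
]
print(apply_import_policy("10.20.0.0/16", policy))    # 200
print(apply_import_policy("192.168.0.0/16", policy))  # 100
print(apply_import_policy("0.0.0.0/0", policy))       # None
```

Ordering matters: because the first match wins, the default-route deny is placed before the broader permits.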
Connecting 1000s of branch sites without needing to reconfigure existing routers
A key concern when deploying an SD-WAN solution is the complexity of managing interconnectivity with existing routers at customer sites. These routers have often already been configured with OSPF or BGP to exchange routing information with the rest of the network. If an SD-WAN solution and its branch device do not support these protocols with full functionality, the existing customer routers may need to be reconfigured, introducing increased risk, cost, and delay.
We can illustrate this with an OSPF example. Imagine that a large financial services customer has a router at every site that collects the site’s routes using OSPF and redistributes them into BGP. Deploying an SD-WAN-capable device at such a site means the device must not only speak OSPF, but also support the wide range of OSPF features that may be in use there, such as authentication, area types and options, and redistribution policy. Therefore, it is not enough for the SD-WAN solution and branch device to support these routing protocols; they must also fully support the breadth of routing functionality that existing networks commonly use.
Linking SD-WAN solutions to the data center
There are often multiple data centers, with one or more customer routing management domains present. SD-WAN solutions often need to provide access to these multiple domains concurrently, creating a seamless overlay network that connects all data centers and SD-WAN sites. Without support built into the SD-WAN solution, such a network is tedious and time-consuming to build because it spans multiple management domains, which again introduces risk, cost, and deployment delays.
The ideal SD-WAN solution should provide a “domain linking” feature, whereby a customer’s SD-WAN domain can be linked to a domain at a data center using the same management system. This requires the SD-WAN solution to be built with data center domain linking in mind. A device managed by the overall SDN solution can sit as a Border Router (BR) at (or connected to) a data center, operating simultaneously as part of the SD-WAN domains and the data center domains that need overlay connectivity to each other. On the SD-WAN side of the network, the BR should also provide features such as IPsec termination to protect traffic crossing untrusted parts of the underlay network. On the data center side, the BR directs traffic to the appropriate domain within the data center. The BR should also support policy and other features, such as NAT, which leads us to the next important aspect of SD-WAN solutions: extranet support.
Extranet support – enabling pre-existing networks to communicate securely
The term extranet can be broadly understood to mean a separate network that is “external” to another network. It is external in the sense that it already exists, may be operated by another entity, and may be trusted only to a limited degree (depending on the relationship between the network operators), yet has resources that need to be accessed from outside. Such networks have normally been built as private networks, and as such often use private IP addressing schemes. Extranet scenarios can arise when companies need to communicate with each other for a limited set of services, or through acquisitions, among other reasons.
Two major problems arise when networks need to communicate with extranets: overlapping IP address space, and reconciling existing security policies (e.g. firewalling). In existing networks, these problems have been solved with complicated NAT configurations on routers, and by adding and amending firewall policies that often belong to a different administrative domain. This requires close coordination between the various parties operating the networks.
As some of these disjoint networks migrate to SD-WAN, it is critical that the chosen SD-WAN solution provide the same level of functionality, but in a much simpler manner. This can be achieved with the Border Router concept introduced earlier, since a BR can straddle different domains to create a seamless and transparent SD-WAN overlay.
The SD-WAN solution therefore needs to provide a single administrative domain to solve both overlapping IP space and security issues, reducing both risk and cost while significantly accelerating deployment. Overlapping IP space issues can be solved with different tools depending on the extranet communication required:
- SNAT/PAT – Source Network Address Translation with Port Address Translation is a simple way to allow remote networks to access a private network (e.g. to reach a central resource such as a server in a data center), even when the address spaces overlap
- Bi-directional NAT – needed when two or more networks have overlapping IP space and need “two-way” communication (e.g. each has servers that the other needs to access)
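The overlapping-address problem and the bi-directional NAT fix can be sketched with Python's standard `ipaddress` module. This is a minimal illustration, assuming static 1:1 prefix translation between equal-sized prefixes: each side's real prefix is presented to the other side under a different, non-overlapping prefix. The mapped prefixes `100.64.1.0/24` and `100.64.2.0/24` and all function names are hypothetical, and the sketch does not model SNAT/PAT, which also rewrites ports.

```python
import ipaddress


def overlapping(a_prefixes, b_prefixes):
    """Return every (a, b) prefix pair that overlaps between two networks."""
    return [(a, b) for a in a_prefixes for b in b_prefixes if a.overlaps(b)]


def translate(addr, real, mapped):
    """1:1 prefix translation: keep the host offset, swap the network part."""
    if real.prefixlen != mapped.prefixlen:
        raise ValueError("1:1 translation needs equal-sized prefixes")
    ip = ipaddress.ip_address(addr)
    if ip not in real:
        raise ValueError(f"{ip} is not in {real}")
    return mapped.network_address + (int(ip) - int(real.network_address))


def nat_a_to_b(src, dst, a_real, a_mapped, b_real, b_mapped):
    """Bi-directional NAT for a packet from network A to network B.

    A reaches B's hosts via b_mapped, and B sees A's hosts as a_mapped,
    so both the source and the destination addresses are rewritten.
    """
    return translate(src, a_real, a_mapped), translate(dst, b_mapped, b_real)


# Both sites use 10.0.0.0/24 internally (hypothetical example prefixes).
site_a = ipaddress.ip_network("10.0.0.0/24")
site_b = ipaddress.ip_network("10.0.0.0/24")
a_as_seen_by_b = ipaddress.ip_network("100.64.1.0/24")
b_as_seen_by_a = ipaddress.ip_network("100.64.2.0/24")

print(overlapping([site_a], [site_b]))
new_src, new_dst = nat_a_to_b("10.0.0.5", "100.64.2.20",
                              site_a, a_as_seen_by_b, site_b, b_as_seen_by_a)
print(new_src, new_dst)  # 100.64.1.5 10.0.0.20
```

The reply from B uses the same function with the roles swapped, which is what makes the translation “two-way”: either side can initiate a connection without knowing the other's real addresses.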
Underlay Interworking
Now that we’ve discussed some important aspects of overlay interworking, we turn our attention to underlay interworking.
Underlay networks (or underlay transport networks) are a little like “The Force” in Star Wars; to paraphrase Yoda’s description of the Force, “they bind everything together and they’re everywhere”. It would be great if that meant seamless intercommunication for any device connected to a network but, unfortunately, we know that is not the case. Before moving on to the issues that a good SD-WAN solution needs to solve, let’s list the major types of underlay network we might encounter:
- The Internet – ubiquitously available
- LTE – leveraging mobility for the underlay
- Layer 3 VPNs – normally built on MPLS
The wild west of IPv4 and Network Address Translation on the Internet
The scarcity of IPv4 address space has driven the wide deployment of Network Address Translation (NAT) over the years. To use the Internet as an underlay transport network, SD-WAN solutions today must cope with NAT in the path between deployed SD-WAN devices.
NAT has many different forms and implementations, and as a result there is a plethora of different behaviors. Further complicating matters, different types of NAT implementations interact with each other in different ways, which makes it harder to establish communication between sites reliably. In the worst case, traffic is simply black-holed between two sites, leaving the customer with a difficult issue to debug.
Manually configuring port forwarding (opening holes) on each NAT device is one solution, but it is cumbersome and, for all intents and purposes, impractical when building networks at scale.
An ideal SD-WAN solution must therefore provide a scalable, easy-to-deploy, and automated NAT Traversal (NAT-T) implementation. The key capabilities it needs are:
- Out-of-the-box deployment – by default, devices automatically try to establish connectivity with each other through any NAT type or combination of NAT types. We can liken this to “STUN-like” behavior (see https://en.wikipedia.org/wiki/STUN).
- Intelligent fallback behavior – some NAT type combinations will never work. When this happens, traffic should automatically be routed via an intermediary, for example an Underlay Border Router (UBR). Although UBRs exist primarily to enable traversal of disjoint underlays (more on that below), they are often well placed to “help” NAT pairs that cannot connect directly. We can liken this to “TURN-like” behavior (see https://en.wikipedia.org/wiki/Traversal_Using_Relays_around_NAT).
- Visualization and troubleshooting tools – because NAT can introduce many different issues, excellent diagnosis and troubleshooting tools need to be available
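The out-of-the-box and fallback behaviors can be illustrated with a small decision function. This Python sketch assumes each device has already classified its NAT using the classic cone/symmetric categories from the original STUN specification (RFC 3489; later RFCs such as RFC 4787 and RFC 5780 describe NAT behavior in terms of mapping and filtering instead), and applies the commonly cited rule of thumb for which pairs can be hole-punched directly. Real NATs vary, and techniques such as port prediction can sometimes beat the rule; `choose_path` and the relay name `ubr-1` are hypothetical.

```python
from enum import Enum


class NatType(Enum):
    """Classic NAT classification (as in the original STUN, RFC 3489)."""
    OPEN = "open"                                  # public address, no NAT
    FULL_CONE = "full-cone"
    RESTRICTED_CONE = "restricted-cone"            # filters on remote IP
    PORT_RESTRICTED_CONE = "port-restricted-cone"  # filters on remote IP+port
    SYMMETRIC = "symmetric"                        # new mapping per destination


# Peers that can still be reached directly when the other side is symmetric:
# they must not filter on the remote port the symmetric NAT will choose.
_OK_WITH_SYMMETRIC = {NatType.OPEN, NatType.FULL_CONE, NatType.RESTRICTED_CONE}


def direct_possible(a: NatType, b: NatType) -> bool:
    """Rough rule of thumb for whether UDP hole punching can succeed."""
    if a is NatType.SYMMETRIC and b is NatType.SYMMETRIC:
        return False
    if a is NatType.SYMMETRIC:
        return b in _OK_WITH_SYMMETRIC
    if b is NatType.SYMMETRIC:
        return a in _OK_WITH_SYMMETRIC
    return True  # any cone/open combination


def choose_path(a: NatType, b: NatType, relay: str = "ubr-1") -> str:
    """STUN-like direct path first; TURN-like relay through a UBR otherwise."""
    return "direct" if direct_possible(a, b) else f"relay:{relay}"


print(choose_path(NatType.PORT_RESTRICTED_CONE, NatType.FULL_CONE))  # direct
print(choose_path(NatType.SYMMETRIC, NatType.PORT_RESTRICTED_CONE))  # relay:ubr-1
```

In practice a device would attempt the direct path and fall back on timeout rather than trust a static table, but the table captures why some pairs need a relay at all.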
LTE – a great option for a primary or backup connection, but not without issues
LTE is fantastic for SD-WAN, providing a cost-effective primary or backup connection while shortening deployment times.
However, supporting LTE in an SD-WAN solution is not always simple, due to the wide range of LTE devices, regions, and operators. A strong SD-WAN solution must offer comprehensive support for LTE devices and a robust implementation that operators can trust. Operators should test LTE support thoroughly before selecting an SD-WAN solution.
Also, remember that LTE is not just for the Internet. It may provide access to an L3VPN, and therefore appear to be a separate underlay. We cover this in more detail later.
Handling intermittent underlay transport issues – controller-less mode
It goes without saying that communication over a given underlay network can be lost at any time. If the only available underlay happens to be the Internet, outages need to be planned for. The key concern is what happens when the on-premises SD-WAN device loses connectivity to its controller. In the worst case, the device simply stops forwarding traffic altogether or enters an unknown state.
During any such outage, there is most likely traffic at the site that still needs to be delivered; for example, a user may wish to use a local printer. Therefore, the SD-WAN solution and device need to support a “controller-less” mode: continuing to forward traffic using the last known configuration, continuing to speak local routing protocols such as OSPF or BGP, and so forth. When the controller becomes reachable again, the solution needs to bring the device and site back into normal operation seamlessly.
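The key property of controller-less mode is that losing the controller changes only the device's management state, never its forwarding state. The Python sketch below models that separation, assuming a simple heartbeat timeout; `EdgeDevice`, its methods, and the 30-second default are hypothetical and do not reflect any particular product.

```python
import time
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class EdgeDevice:
    """Hypothetical model of an SD-WAN edge device's management state."""
    heartbeat_timeout: float = 30.0  # seconds without controller contact
    last_heartbeat: float = field(default_factory=time.monotonic)
    running_config: dict = field(default_factory=dict)
    mode: str = "managed"

    def on_controller_message(self, config: Optional[dict] = None,
                              now: Optional[float] = None) -> None:
        """Record controller contact; accept new config if one is pushed."""
        self.last_heartbeat = time.monotonic() if now is None else now
        if config is not None:
            self.running_config = config
        # Contact restored: leave controller-less mode.
        self.mode = "managed"

    def tick(self, now: Optional[float] = None) -> str:
        """Periodic check: enter controller-less mode after a timeout."""
        now = time.monotonic() if now is None else now
        if now - self.last_heartbeat > self.heartbeat_timeout:
            self.mode = "controller-less"
        return self.mode

    def forwarding_config(self) -> dict:
        """Forwarding and local routing always use the last known config,
        whatever the management mode, so site traffic keeps flowing."""
        return self.running_config


dev = EdgeDevice(last_heartbeat=0.0)
dev.on_controller_message({"ospf": {"area": "0.0.0.0"}}, now=0.0)
print(dev.tick(now=10.0))       # managed
print(dev.tick(now=45.0))       # controller-less
print(dev.forwarding_config())  # {'ospf': {'area': '0.0.0.0'}}
dev.on_controller_message(now=60.0)
print(dev.mode)                 # managed
```

A real implementation would also reconcile any configuration changes made centrally during the outage when contact is restored; the sketch only flips the mode back.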
Which Underlay to choose, and the issue of “disjoint” Underlays
As we discussed earlier, there are several different types of underlay networks: LTE/mobile, private L3VPNs, and the Internet. Remember that LTE may provide an L3VPN rather than the Internet, and/or it may provide Internet access as an underlay but with different performance characteristics.
These underlays are not homogeneous. Imagine that one SD-WAN device (site #1) is connected only to an L3VPN, while another device (site #2) is connected only to the Internet. The operator and the customer want these devices to provide a seamless L3 domain service. How do we do this?
The solution is to introduce the concept of an Underlay Border Router (UBR), whose purpose is to allow an overlay service such as an L3 domain to run seamlessly across one or more underlays. This sounds simple, but in reality there are multiple issues that such a solution needs to solve:
- UBR selection – intelligent selection criteria need to be in place so that the system can easily identify where a given SD-WAN device is and which underlay networks it has access to, while also taking into account performance considerations such as latency and load on the UBR
- Scaling – the UBR design needs to scale with both physical and virtual devices, and be capable of “scaling sideways” (adding more UBRs as demand grows)
- Resilience – related to scaling, the solution needs to allow for a fully resilient and self-healing design, so that no UBR is a single point of failure
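The selection, scaling, and resilience requirements can be tied together in a short Python sketch. It assumes the system already knows which underlays each device and each UBR is attached to, plus a measured latency and a load figure per UBR. The scoring formula (latency multiplied by 1 + load), the `max_load` cut-off, and all names are hypothetical choices for illustration, not a prescribed algorithm.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List


@dataclass(frozen=True)
class Ubr:
    """Hypothetical view of one Underlay Border Router."""
    name: str
    underlays: FrozenSet[str]  # underlays this UBR is attached to
    latency_ms: float          # measured from the requesting device
    load: float                # 0.0 = idle, 1.0 = saturated
    healthy: bool = True


def select_ubrs(device_underlays: set, peer_underlays: set,
                ubrs: Iterable[Ubr], count: int = 2,
                max_load: float = 0.9) -> List[str]:
    """Return up to `count` UBRs that bridge both underlays, best first.

    Returning more than one gives the device a ready backup, so no single
    UBR is a point of failure.
    """
    candidates = [
        u for u in ubrs
        if u.healthy and u.load < max_load
        and u.underlays & device_underlays  # reachable from this device
        and u.underlays & peer_underlays    # reachable from the peer
    ]

    def score(u: Ubr) -> float:
        # Lower is better: latency, inflated as the UBR gets busier.
        return u.latency_ms * (1.0 + u.load)

    return [u.name for u in sorted(candidates, key=score)[:count]]


ubrs = [
    Ubr("ubr-a", frozenset({"l3vpn", "internet"}), latency_ms=20, load=0.5),
    Ubr("ubr-b", frozenset({"l3vpn", "internet"}), latency_ms=25, load=0.1),
    Ubr("ubr-c", frozenset({"internet"}), latency_ms=5, load=0.0),
    Ubr("ubr-d", frozenset({"l3vpn", "internet"}), latency_ms=10, load=0.95),
]
# Site #1 is on the L3VPN only; site #2 is on the Internet only.
print(select_ubrs({"l3vpn"}, {"internet"}, ubrs))  # ['ubr-b', 'ubr-a']
```

Here `ubr-c` is skipped because it cannot reach the L3VPN, and `ubr-d` because it is overloaded. Returning a ranked list rather than a single UBR is what links selection to resilience, and scaling sideways then amounts to adding entries to `ubrs`.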
Conclusion
We’ve touched on a wide range of issues and solutions related to interworking with existing networks, at both the overlay and the underlay layers. Although there is a strong focus (rightly so) on value-added services, cloud services integration, and other advanced SD-WAN features, there is an equally strong need to ensure that underlay and overlay interworking features are well designed and implemented.
An SD-WAN solution that enables customers to migrate smoothly from existing networks, and to co-exist with them for as long as necessary, is the foundation for building new and more powerful networks. Many areas of this interworking can create issues, and the best SD-WAN solutions will have taken all of them into account, ensuring that scalable, stable, resilient, and easy-to-use solutions are in place.