This seems to be the week of “Federation” (see here and here). Since we first discussed the need for SDN federation when we launched Nuage Networks back in 2013, it is only fair to jump into the discussion.
The first observation we made when we started Nuage Networks was that when talking about networks we couldn’t restrict ourselves to islands or administrative domains. The Internet, after all, is built as a “network of networks”: an interconnection of technologies and networks that together solve the much larger problem of information distribution and interaction at the most massive scale known.
It’s refreshing to see the industry slowly recognizing that solving a networking problem within simple domains is not really solving any problem. The reality is that there is no magic wand that will replace every network with SDN technologies overnight. Old and new technologies, different administrative domains, and different types of control will need to co-exist for a while. Greenfield installations of homogeneous networks where everyone follows the same API structures are nice on paper, but do not reflect real-life deployments.
So, when we talk about “federation”, and at the risk of creating more “federation definitions” than “SDN definitions”, we should focus on the set of problems that we are trying to solve. Merriam-Webster always comes in handy for starting such conversations:
“Federation: an encompassing political or societal entity formed by using smaller or more localized entities”
In other words, federation is about creating larger entities, with the goal of solving a problem, by combining smaller independent entities. First and foremost, this definition carries no requirement that the entities be “homogeneous” or “similarly structured”. In most federations, what is defined are the rules of interaction between entities and the goals of the larger entity, rather than the structure of the individual entities themselves.
One can organize these federations by leveraging loosely coupled or tightly coupled mechanisms with top-down or distributed controls. At the risk of a political undertone, there is the “Russian Federation” and the “United States of America” (or the Delian and Arcadian Leagues). Different approaches, different philosophies, and different results.
To a certain extent, though, a federation immediately calls for a “promise” between the participating entities. One can think of each network, or each entity managing a part of the network, as an autonomous agent that makes “promises” to the other agents about the type of service it can offer. Readers familiar with Promise Theory will immediately see the connection. (For details, please see the book “Promise Theory” by J. Bergstra and M. Burgess.) As defined there, agents are autonomous and have a view only of their local information, not complete information about the whole world. Concepts of imposition and obligation are also defined, and these apply to a certain extent to every form of federation, including the networking scenarios that are of interest to us.
In such a distributed environment, these agents can be cooperative or non-cooperative, and the final behavior of the system can often be better understood by modeling it with a game-theoretic approach. Every agent has its own utility function, and its goal is to maximize its own utility. In certain situations agents will choose to lie or to influence the final outcome in order to maximize their own utility, whereas in other situations they might be interested only in optimizing the global system. Very often agents are adversaries trying to minimize the utility of other agents. The design of the mechanism of interaction between the agents determines whether the global system will reach stability and whether the outcome will be optimal for the federated entity.
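To make the distinction between a stable outcome and an optimal one concrete, here is a minimal sketch of a two-agent game. The action names and payoff numbers are invented purely for illustration; they do not model any real network. The code checks which action profiles are stable, meaning no agent gains by deviating alone, and compares them with the profile that maximizes total utility.

```python
from itertools import product

ACTIONS = ("cooperate", "defect")

# PAYOFF[(a0, a1)] = (utility of agent 0, utility of agent 1).
# These values are hypothetical and chosen only to illustrate the point.
PAYOFF = {
    ("cooperate", "cooperate"): (3, 3),
    ("cooperate", "defect"): (0, 4),
    ("defect", "cooperate"): (4, 0),
    ("defect", "defect"): (1, 1),
}


def is_stable(profile):
    """Return True if no agent can raise its own utility by deviating alone."""
    for agent in (0, 1):
        for alternative in ACTIONS:
            deviated = list(profile)
            deviated[agent] = alternative
            if PAYOFF[tuple(deviated)][agent] > PAYOFF[profile][agent]:
                return False
    return True


profiles = list(product(ACTIONS, repeat=2))
stable = [p for p in profiles if is_stable(p)]
best_total = max(profiles, key=lambda p: sum(PAYOFF[p]))

print("stable profiles:", stable)
print("profile with highest total utility:", best_total)
```

With these particular payoffs the only stable profile is mutual defection, while mutual cooperation gives the highest total utility. A different set of rules, for example repeated interaction or penalties for defecting, would change the payoffs and therefore the stable outcome, which is the sense in which the mechanism design matters.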
To better illustrate this, let us take the example of the Internet. The network is formed from a set of autonomous systems that make independent decisions and use a federation mechanism, BGP in this case, to exchange promises. Autonomous system A promises its neighboring autonomous systems that it has a route for given destinations, and provides a “cost” for that route. The other systems can choose whether to use that route, depending on the relative costs and the peering agreements they have with other autonomous systems. Every autonomous system has an implicit incentive to avoid carrying packets that are of no utility to it. But, because of its interactions with many other systems, a network that does not help the other networks becomes less valuable in the federation, and other agents might choose not to serve it at all. In other words, being selfish does not always help, and cooperating is most often the best way for an agent to maximize its utility. The dynamics of these interactions are complex, and I would point interested readers to the excellent work by Feigenbaum and Shenker for more details (Distributed Algorithmic Mechanism Design: Recent Results and Future Directions).
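The route-promise exchange described above can be sketched in a few lines. This is a deliberately simplified illustration and not the actual BGP decision process: the `RoutePromise` type, the `choose_route` function, the AS names, and the single integer cost are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RoutePromise:
    """A promise from one autonomous system: 'I can reach destination at cost'."""
    advertiser: str
    destination: str
    cost: int


def choose_route(promises, destination, peers):
    """Pick the cheapest promise for a destination among ASes we peer with.

    Returns None when no acceptable promise exists. The receiver stays
    autonomous: it is free to ignore any promise it does not want to use.
    """
    candidates = [
        p for p in promises
        if p.destination == destination and p.advertiser in peers
    ]
    return min(candidates, key=lambda p: p.cost, default=None)


promises = [
    RoutePromise("AS-A", "10.0.0.0/8", cost=3),
    RoutePromise("AS-B", "10.0.0.0/8", cost=2),
    RoutePromise("AS-C", "10.0.0.0/8", cost=1),  # cheapest, but no peering agreement
]

best = choose_route(promises, "10.0.0.0/8", peers={"AS-A", "AS-B"})
print(best)
```

Here the cheapest promise comes from AS-C, but the receiver has no peering agreement with it, so it selects the promise from AS-B. No agent needs to know why AS-C advertised that cost or what the internals of any other AS look like.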
Another interesting aspect of federation is that in general it does not require fully consistent state information distribution. Indeed, in most cases, agents do not know (or do not need to know) the exact state or the utility function of the other agents. The desired outcome can be achieved by the independent operation of the agents as opposed to a global knowledge center that tries to micro-manage every action in the system. Indeed, there are several parallels we can point to, where the concentration of knowledge is neither feasible nor optimal for the system.
Let us try now to translate these (rather abstract) observations to the SDN world, and the discussions around SDN federation that have become more popular recently.
First, the idea of “super SDN controllers” that micro-manage smaller entities can be immediately equated to the centralized mode of federation described earlier: a centralized knowledge base, homogeneous systems, single administrative domains, and to a certain extent an offer of the “least common denominator”. The central SDN controller is limited by the capabilities of the weakest link and becomes the bottleneck.
A distributed mechanism design, where individual controllers can “promise” and offer services to other controllers, can lead to a much more scalable system (provided that it is properly designed). And since engineering is a series of compromises, and we have to deal with the existence of a humongous infrastructure that we cannot change overnight, leveraging existing mechanisms of federation is the wise choice for the immediate future. There is an argument that such distributed control is not optimal. This might or might not be the case, depending on the design of the distributed system. And even though a centralized controller managing everything is conceptually easier to understand, it is not necessarily scalable or operable.
Second, there is the concept of different layers of the network “federating” to offer a service. In a virtualized network world, the overlay and underlay need to work together to offer a service. As we have discussed in more detail in a previous blog, there are several ways to achieve that. One can choose tight coupling by combining the two, or a loosely coupled mechanism where the “underlay” makes specific “promises” to the “overlay” about the types of services it can offer, and notifies the “overlay” when it can no longer offer these services.
Moving forward, we expect to see richer mechanisms for interactions between overlays and underlays. This does not mean a centralized controller, or the formation of northbound APIs that limit the services the network can provide to the lowest common denominator. We envision a network where multiple overlays, offering different types of services and different service APIs, can utilize the same or multiple underlays, and where they can work together to achieve a common end-to-end service.
The way to achieve these goals is neither by Super Controllers nor by Super APIs. What is needed is a promise-theory-based model that allows independent agents in overlays and underlays to interact with each other to deliver a service.
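The loosely coupled option can be sketched as a small publish-and-notify interaction. Everything here is hypothetical: the `Underlay` and `Overlay` classes, the service name, and the `max_latency_ms` property are illustrative assumptions, not an existing API.

```python
class Underlay:
    """An underlay that promises services and announces when it withdraws them."""

    def __init__(self, name):
        self.name = name
        self._promises = {}    # service name -> promised properties
        self._listeners = []   # callbacks invoked when a promise is withdrawn

    def subscribe(self, callback):
        """Register an overlay callback taking (underlay_name, service)."""
        self._listeners.append(callback)

    def promise(self, service, **properties):
        """Promise a service, described by arbitrary properties."""
        self._promises[service] = properties

    def withdraw(self, service):
        """Withdraw a promise and notify every subscriber."""
        if self._promises.pop(service, None) is not None:
            for callback in self._listeners:
                callback(self.name, service)

    def offers(self, service):
        return service in self._promises


class Overlay:
    """An overlay that records which promised services it has lost."""

    def __init__(self):
        self.lost = []

    def on_withdrawal(self, underlay_name, service):
        # A real overlay would re-route or degrade gracefully here.
        self.lost.append((underlay_name, service))


underlay = Underlay("fabric-1")
overlay = Overlay()
underlay.subscribe(overlay.on_withdrawal)

underlay.promise("low-latency-path", max_latency_ms=5)
print(underlay.offers("low-latency-path"))

underlay.withdraw("low-latency-path")
print(overlay.lost)
```

The overlay never inspects the underlay's internal state. It relies only on what was promised and on the withdrawal notification, which keeps the two layers independently operable.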
Dimitri Stiliadis