At Nuage Networks, we use a large cluster of machines for R&D and for various day-to-day operations. This means that we face many of the same challenges our customers experience when it comes to operationalizing our internal data center.
One of my near-daily tasks is spawning new “Nuage sandboxes” for internal use.
A “Nuage sandbox” is essentially a complete end-to-end Nuage Networks setup that can be used for ad hoc testing of a feature, as a demonstration environment, or for quick validation of a code change in one of our modules. Sandboxes are also useful for giving our extended team in the field access to our newest end-to-end setups. The goal of a sandbox is to replicate a typical Nuage deployment scenario.
Depending on the setup being emulated, a sandbox consists of at least 7 independent virtual machines. Some of these VMs host Nuage Networks components such as the VSD and VSC, while others serve as the OpenStack controller or OpenStack compute nodes.
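Purely for illustration, the inventory of one sandbox can be thought of as a small data structure; the role counts below are hypothetical and the actual composition depends on the scenario being emulated.

```python
# Hypothetical sandbox inventory; actual role counts vary per scenario.
SANDBOX_TOPOLOGY = {
    "vsd": 3,            # Virtualized Services Directory nodes
    "vsc": 2,            # Virtualized Services Controllers
    "os-controller": 1,  # OpenStack controller
    "os-compute": 1,     # OpenStack compute node(s)
}

print(sum(SANDBOX_TOPOLOGY.values()), "VMs in this example")  # 7
```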
Initially, we used scripts on top of the libvirt API to quickly spawn those virtual setups from “golden” snapshots.
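Those scripts looked roughly like the following sketch, written against the libvirt Python bindings; the image path, domain XML, VM names and bridge name are simplified placeholders rather than our actual tooling.

```python
# Sketch: spawn a sandbox VM as a copy-on-write clone of a "golden" image.
# Paths, names and the domain XML are placeholders, not our real scripts.
import subprocess
import libvirt

GOLDEN_IMAGE = "/var/lib/libvirt/images/golden/base.qcow2"   # placeholder path

DOMAIN_XML = """
<domain type='kvm'>
  <name>{name}</name>
  <memory unit='GiB'>8</memory>
  <vcpu>4</vcpu>
  <os><type arch='x86_64'>hvm</type></os>
  <devices>
    <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2'/>
      <source file='{disk}'/>
      <target dev='vda' bus='virtio'/>
    </disk>
    <interface type='bridge'>
      <source bridge='{bridge}'/>
      <model type='virtio'/>
    </interface>
  </devices>
</domain>
"""

def spawn_vm(conn, name, bridge):
    # Thin qcow2 overlay on top of the golden image, so every sandbox VM
    # starts from the same baseline without copying the full disk.
    disk = f"/var/lib/libvirt/images/{name}.qcow2"
    subprocess.check_call(
        ["qemu-img", "create", "-f", "qcow2",
         "-b", GOLDEN_IMAGE, "-F", "qcow2", disk])
    dom = conn.defineXML(DOMAIN_XML.format(name=name, disk=disk, bridge=bridge))
    dom.create()          # start the freshly defined domain
    return dom

if __name__ == "__main__":
    conn = libvirt.open("qemu:///system")   # or qemu+ssh://<hypervisor>/system
    spawn_vm(conn, "sandbox42-vsd1", bridge="NuageNet")
    conn.close()
```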
Paradoxically, one of the challenges with these “Nuage sandboxes” was managing the network in our own lab. In order to make the setups available to our extended team, we have to plug them into our internal network (called “NuageNet”). At the same time, we wanted all the VMs to be able to handle heavy traffic, such as VxLAN tunnel encapsulation, on a separate private plane (referred to here as the “data path”). One of the requirements was that the data path be completely isolated for each setup.
In order to achieve this in our initial infrastructure, we cabled each hypervisor with two interfaces:
– eth0 was connected to the company-wide “NuageNet” and terminated on the hypervisor in a “NuageNet” LinuxBridge.
– eth1 was connected to a completely private switch shared by all the blades of the cluster.
With this initial sandbox infrastructure, when a new “Nuage sandbox” was requested, a VLAN ID was allocated for its private data path, and that VLAN was terminated on a new LinuxBridge on every hypervisor hosting part of the new setup. In theory, this ensured complete separation between the data paths of the different setups.
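The per-hypervisor provisioning step amounted to something like the following sketch, driving iproute2 from Python; the interface and bridge names are placeholders, and a real script would also need to handle idempotency, errors and cleanup.

```python
# Sketch: terminate a sandbox's data-path VLAN on a hypervisor by creating
# a VLAN sub-interface on eth1 and attaching it to a dedicated Linux bridge.
import subprocess

def run(*cmd):
    subprocess.check_call(cmd)

def provision_datapath(vlan_id, trunk_if="eth1"):
    vlan_if = f"{trunk_if}.{vlan_id}"     # e.g. eth1.204
    bridge = f"br-dp{vlan_id}"            # placeholder naming scheme
    run("ip", "link", "add", "link", trunk_if, "name", vlan_if,
        "type", "vlan", "id", str(vlan_id))
    run("ip", "link", "add", "name", bridge, "type", "bridge")
    run("ip", "link", "set", vlan_if, "master", bridge)
    run("ip", "link", "set", vlan_if, "up")
    run("ip", "link", "set", bridge, "up")
    return bridge

# e.g. provision_datapath(204) on every hypervisor hosting part of the sandbox
```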
This scheme worked, but it required centralized VLAN management, which became increasingly difficult because we had to keep track of all the VLAN IDs used on every hypervisor in the cluster.
The next evolution was to use OpenStack to automate the whole process. Nova gave us a single API to manage our virtual machines across the entire cluster, so we no longer had to worry about which hypervisor was the most suitable and least crowded.
On the networking side, we started by using Neutron with the LinuxBridge plug-in to automatically provision the VLAN IDs on all our blades.
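A sketch of the kind of calls this boils down to, shown here with the openstacksdk Python library; the cloud name, image and flavor names, network names and CIDR are placeholders.

```python
# Sketch: one API to carve out an isolated data-path network (Neutron) and
# one to place VMs (Nova). With the LinuxBridge plug-in, Neutron picks a
# free VLAN ID from its configured range, so no manual VLAN bookkeeping.
import openstack

conn = openstack.connect(cloud="nuage-lab")        # placeholder cloud entry

# Isolated data path for this sandbox; the VLAN ID is auto-provisioned.
net = conn.network.create_network(name="sandbox42-datapath")
conn.network.create_subnet(network_id=net.id, name="sandbox42-datapath-sub",
                           ip_version=4, cidr="192.168.42.0/24")

# Nova chooses a suitable hypervisor; we no longer pick blades by hand.
image = conn.compute.find_image("sandbox-golden-image")   # placeholder name
flavor = conn.compute.find_flavor("m1.large")             # placeholder flavor
server = conn.compute.create_server(
    name="sandbox42-vsc1",
    image_id=image.id,
    flavor_id=flavor.id,
    networks=[{"uuid": net.id}],
)
conn.compute.wait_for_server(server)
```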
In its latest evolution, and with the development of our Nuage plug-in for Neutron (then Quantum), we moved the entire OpenStack-based infrastructure on top of the Nuage VSP.
Neutron now runs with the Nuage plug-in, connected to our Virtualized Services Directory (VSD) and Virtualized Services Controllers (VSCs). Each server was re-cabled so that it connects directly, through a single 10 GigE interface, to our 7850 Virtualized Services Gateway (VSG), which in turn connects to NuageNet at 40 GigE.
Now, all the networks defined in Neutron use VxLAN encapsulation over the Nuage Networks infrastructure. We defined a public network in Neutron with an uplink to NuageNet. Whenever a VM needs access to NuageNet, the 7850 VSG terminates the VxLAN encapsulation and sends the packet out on the NuageNet port/VLAN ID defined in our VSD.
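At the Neutron level, giving a sandbox access to NuageNet looks roughly like the following sketch (again using openstacksdk, with placeholder names); the VxLAN-to-VLAN termination itself is handled by the 7850 VSG as configured in the VSD.

```python
# Sketch: attach a sandbox subnet to a router whose gateway is the shared
# public network. Names are placeholders; the public network's uplink to
# NuageNet is defined once on the VSD.
import openstack

conn = openstack.connect(cloud="nuage-lab")

public = conn.network.find_network("public")              # external network
sandbox_subnet = conn.network.find_subnet("sandbox42-datapath-sub")

router = conn.network.create_router(
    name="sandbox42-router",
    external_gateway_info={"network_id": public.id},
)
conn.network.add_interface_to_router(router, subnet_id=sandbox_subnet.id)
# Traffic leaving via this gateway stays VxLAN-encapsulated up to the
# 7850 VSG, which terminates the tunnel and forwards onto the NuageNet
# port/VLAN.
```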
The Nuage Networks VSP components on this cluster are updated with our internal beta releases prior to each official release. Using Nuage Networks VSP on our own critical infrastructure not only gives us a valuable beta-test environment, it also confronts us with the same types of operational challenges and issues our customers experience. The very act of “eating our own dog food” provides insights that help us design new features to make network provisioning and automation as seamless as possible.
Follow me on Twitter: @bvandewa