vCloud Director Networking for Dummies

During the Beta phase of vCloud Director (aka Redwood) I put together a small deck called "Redwood Networking for Dummies". I have received a good deal of positive feedback so I decided to turn that document into a blog post. Networking in vCloud Director is certainly a controversial matter. I believe it is fair to describe it as both complex and rich at the same time. There have been many attempts lately to describe it from the likes of Duncan Epping and Hany Michael on their own blogs. They have done a great job in getting into the details. However I'd like to try to give a different perspective on the subject. While I won't be able to avoid all of the technicalities, I'd like to give you a sense of the philosophy behind what we have built into the product. Last but not least, note that there are a couple of approaches to describing networking in vCloud Director. The first approach starts with the cloud end-user in mind and describes how networking works in support of certain application deployment use cases. From there you can walk all the way down to describe what happens at the vSphere platform level. The second approach starts with the vSphere administrator in mind and describes how networking works building up from the vSphere constructs, all the way to what gets exposed to the cloud end-user.

In this post I am going to use the second approach. This is not because I believe it is the right one but simply because it is the one I am more comfortable with and the one that may better serve the readers of this blog. So let's get started.

Introduction to vCloud Director Networking

Before we get into the matter, you need to step back and think about the vCloud Director philosophy for a moment. Cloud is all about giving the end-users an unprecedented level of flexibility that allows them to do things that were only available to vSphere administrators before. In a way you can think of vCloud Director as an interface (or a proxy) into the virtual infrastructure. This allows vSphere administrators to give end-users a lot more flexibility, but at the very same time it allows them to keep full control of what end-users can do.

Achieving this level of cooperation and flexibility in the networking subsystem is no trivial task. Think about how difficult it is to implement something that allows an end-user to create, in self-service mode, separate layer 2 network segments, define custom layer 3 IP policies, configure services such as DHCP, NAT and Firewall... all without having to ask the vSphere / cloud administrator to do that for them, all without messing up the cloud-wide setup, all without causing conflicts with the other tenants on the cloud. This is a titanic effort, believe me.

Explaining how networking in vCloud Director (vCD from now on) works is really like peeling an onion. If I were to explain it with the cloud end-user in mind I'd start from the outer part and work toward the middle. Since I am going to explain it from the vSphere administrator point of view, I will have to start from the inner part of the onion, building up the abstraction levels that the end-user will see in the end. This document will try to explain the three major networking levels within vCD. They are External Networks, Organization Networks and vApp Networks. These are in fact the types of networks you can instantiate.

Before we start discussing these three network layers we need to introduce another concept that is of paramount importance for vCloud Director operations: Network Pools. Think about it for a minute. How can we give an end-user a controlled way to deploy layer 2 networks? Layer 2 networks are usually vSphere PortGroups with an associated VLAN. How can you keep control of that? How can you let different tenants deploy these PortGroups while keeping track of what's going on in the cloud (and in turn on the vSphere layer) and, in doing so, avoid conflicts with similar deployments in other tenants (aka Organizations)? vCloud Director solves this problem using what we call Network Pools. A Network Pool is in fact a set of layer 2 networks that the cloud administrator has declared as "available". Think about it: in the old days when the end-user needed something like this he/she had to go to the vSphere admin, who would in turn look into his/her VLAN CMDB (typically an Excel spreadsheet :-) ), choose an available VLAN and create a PortGroup based on that. He/she would then connect the vNIC of this end-user's VM to the newly created PortGroup and advise the end-user that the change was made to the VM. In a cloud self-service model it doesn't work like this. If the end-user can deploy layer 2 networks there must be a CMDB that can be accessed programmatically under the covers (by the end-user). Yes, Network Pools are, in a way, that CMDB. More on this later.

External Networks

The innermost vCD networking component is called External Networks. If you want your Organization (and in turn your vApps) to have connectivity to the external world you need to have External Networks. As the name implies, these are networks that are managed by someone who is typically external to the vCD environment and are identified by a vSphere PortGroup. That's in fact what you do when you create a vCD External Network: you point to an existing vSphere PortGroup. Essentially you are telling vCloud Director that there is a PortGroup that is able to provide external connectivity to your cloud environment. The typical example is a PortGroup with VLAN 233 (for instance) which can support native Internet traffic. As a naming convention you would call this External Network something like Internet or Ext-Net-Internet. I usually suggest naming the vCD External Network after the vSphere PortGroup for ease of tracking. This is a picture that shows what it is. It's easy:

One of the most confusing points about the creation of the External Network is that the wizard asks for some layer 3 configuration parameters. In particular the wizard asks for a subnet mask, a default gateway, a DNS address and a pool of IP addresses. What are these parameters for? Well, remember that we said External Networks are networks that are built and maintained by an external entity; we are just "registering" these networks into vCD. What we are doing while filling in this wizard is essentially telling vCD what layer 3 information to use when VMs are connected to this network directly. In particular the IP pool that you need to configure is a pool of IP addresses that vCD will use to distribute IP addresses (and related layer 3 info) to VMs connecting to this network. So how do you fill that pool? You have to turn to the folks that administer that specific network segment and ask them something like "can you reserve for me a set of IP addresses that no one else will be using on your network and that I can dedicate to the vApps I will be instantiating on vCD?". In other words, how would you be able to instantiate vApps directly onto that network if you don't know which IP address to use? That's what that Static IP pool is. This doesn't tell the whole story on how VMs get their IPs when deployed. For this you need to be patient. We will get there.
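To make the wizard parameters a bit more concrete, here is a minimal Python sketch of the information being "registered". This is not the vCD API: the ExternalNetwork class, its field names and the validate() helper are all hypothetical, and the addresses come from documentation ranges. The only point is that the Static IP pool must consist of addresses that really live on the externally managed segment.

```python
import ipaddress
from dataclasses import dataclass


@dataclass
class ExternalNetwork:
    """Hypothetical record of what the External Network wizard collects."""
    name: str          # vCD name, ideally the same as the vSphere PortGroup
    portgroup: str     # existing vSphere PortGroup this network points to
    gateway: str
    netmask: str
    dns: str
    static_pool: list  # addresses reserved for vCD by the segment's owners

    def subnet(self):
        # Derive the subnet from gateway + mask; strict=False tolerates host bits.
        return ipaddress.ip_network(f"{self.gateway}/{self.netmask}", strict=False)

    def validate(self):
        """Return the pool addresses that cannot work on this segment."""
        net = self.subnet()
        gateway = ipaddress.ip_address(self.gateway)
        bad = []
        for ip in self.static_pool:
            addr = ipaddress.ip_address(ip)
            if addr not in net or addr == gateway:
                bad.append(ip)
        return bad


internet = ExternalNetwork(
    name="Ext-Net-Internet",
    portgroup="Ext-Net-Internet",
    gateway="203.0.113.1",
    netmask="255.255.255.0",
    dns="203.0.113.53",
    static_pool=["203.0.113.10", "203.0.113.11", "198.51.100.7"],
)
print(internet.validate())  # the last address is outside the subnet
```

In this made-up example the third address is flagged because it does not belong to the 203.0.113.0/24 segment, which is exactly the kind of mistake you avoid by asking the network owners for the reserved range.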

Organization Networks

External Networks are easy. With Organization Networks things start to become more "interesting". In the previous section we created cloud-wide external connectivity (i.e. External Networks). Now we are zooming inside an Organization. An Organization (or Org) is a logical construct within vCD that describes a tenant or a customer. Cloud end-users are defined inside each Organization. Each tenant can have three types of networks configured, as you can see from the picture below (you may not immediately get some of the acronyms and colored labels - no worries - it will become clearer later):

The first network is called External Organization Network (Direct Connect) and it's the simplest way to connect to the external world through an External Network. In this case nothing happens at the vSphere layer, this type of Org Network is a logical construct created inside vCD but doesn't really have any counterpart in the vSphere world: if you connect a vApp to this Org Network, the vNIC gets configured to connect to the Internet PortGroup (VLAN 233) in the example above. Not a big deal.

The second network is called External Organization Network (NAT / Routed). This network really represents a dedicated layer 2 segment that has its own private IP schema that the Organization can choose arbitrarily (for example 192.168.x.x). This private network is then routed to the External Network you have chosen to route to. Note that in this case you are still asked for the layer 3 IP information; however, this time you can define it based on your specific needs because this segment is private and its layer 3 info is not going to overlap with, or be shared with, anything else anyway. So how is this implemented at the vSphere layer? When you create such a network a good learning exercise is to switch to the vSphere client interface. There you will see a number of things happening: first a new PortGroup is deployed on-the-fly; this is the dedicated layer 2 segment that will support the Org Network. Next, a new vShield Edge appliance is automatically deployed by vShield Manager. This appliance is effectively the routing device connecting your dedicated layer 2 network to the External Network (see picture). The Edge device is then configured with the appropriate layer 3 info you filled in the wizard when creating this Organization Network. The Edge license provided with vCD supports NAT, Firewall and DHCP functions to protect and serve this dedicated layer 2 segment. At this point you may wonder how vCD and vSphere can "deploy a new PortGroup on the fly" to back this dedicated layer 2 network we need to create. These segments come from the Network Pools we briefly mentioned above. When creating this network, in fact, the wizard will ask you for the layer 3 private schema as well as the Network Pool from which to grab an available layer 2 network (think of it as an available VLAN for the moment).

The third network that can be created within an Organization is called Internal Organization Network. As the name implies this network is only available internally to the Organization. vApps that are deployed to connect to this network cannot go outside through the External Network. In fact this type of network is similar to the External Organization Network (NAT / Routed) with the only exception that it doesn't connect to the external world. At this point you may wonder why there would be an Edge deployed onto that PortGroup since there is no need to do routing. Remember that Edge also provides DHCP service to that segment, so Edge is optionally used if the Organization Administrator decides to enable DHCP on that segment (note DHCP is disabled by default).
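The three Organization Network types described above can be summarized in a small Python sketch of what each one leaves behind at the vSphere layer. The OrgNetType enum and the vsphere_footprint() function are hypothetical names made up for illustration; they simply encode the behavior described in this section and are not part of any vCD interface.

```python
from enum import Enum


class OrgNetType(Enum):
    DIRECT = "External Organization Network (Direct Connect)"
    ROUTED = "External Organization Network (NAT / Routed)"
    INTERNAL = "Internal Organization Network"


def vsphere_footprint(net_type, dhcp_enabled=False):
    """Return what gets created at the vSphere layer for an Org Network."""
    if net_type is OrgNetType.DIRECT:
        # Purely logical: vNICs land on the External Network's PortGroup.
        return {"new_portgroup": False, "edge": False, "uses_network_pool": False}
    if net_type is OrgNetType.ROUTED:
        # Dedicated segment from a Network Pool plus an Edge to route/NAT.
        return {"new_portgroup": True, "edge": True, "uses_network_pool": True}
    # Internal: dedicated segment; an Edge appears only if DHCP is enabled.
    return {"new_portgroup": True, "edge": dhcp_enabled, "uses_network_pool": True}


for net_type in OrgNetType:
    print(net_type.value, "->", vsphere_footprint(net_type))
```

Note how the Internal case differs from the NAT / Routed case only in whether an Edge is needed, which depends on the DHCP setting (off by default).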

So, in summary, this is how you can connect your VM vNIC:

Note that, for simplicity, the picture shows a VM that can connect to different Organization Networks. Most of the time VMs will have only one vNIC connected to either one of the Org Networks. However it is possible for a VM to have two or more vNICs. Also consider that vCD treats everything as a vApp. A single VM is in fact a vApp with one VM in it. Sometimes you will be using more VMs in a single vApp. Which brings us to the third type of network.

vApp Networks

So far we have seen cloud-wide networks (aka External Networks), Organization-wide networks (aka Organization Networks) and now we are going to investigate what we call vApp Networks which are, guess what, networks that are only available within a single vApp. This is something that you may want to do either to create and support secure n-tier application deployments or to fence a vApp to an Organization Network. Fencing a vApp allows you to instantiate the same vApp many times onto an Organization Network while preserving layer 2 and layer 3 information. In a way, fencing is a shortcut given to end-users in the vCD user interface to achieve this cloning operation transparently. From a vSphere perspective creating a vApp Network explicitly or taking the "fence shortcut" in the UI translates into the deployment of Edge devices as well as separate layer 2 networks from a vCD pre-defined Network Pool. Note I am oversimplifying a matter that is more complex than what I am trying to picture. That's because, right now, I am focusing more on what happens at the vSphere layer rather than on the different end-user options vApp Networks and Fenced vApps have to offer.

While the configuration wizards may seem to be slightly different, note that the relationship between vApp Networks and Organization Networks is similar to the relationship between Organization Networks and External Networks. By this I mean that a vApp Network can connect directly to an Org Network (in which case the VM connects to the Organization Network PortGroup), it can connect using NAT technologies to the Org Network (in which case a new layer 2 network is deployed from a specified Network Pool and a new Edge is instantiated to connect to the Organization Network) or it can be left isolated from the rest of the world (in which case a new layer 2 network is deployed from a specified Network Pool and a new Edge is instantiated only if DHCP gets enabled). This sounds familiar if you think of the different Organization Network options. As a matter of fact we are effectively creating a similar stack at the vApp level and we could then plug this stack on top of the other stacks we created at the Org level. You remember the onion?

The picture below shows a VM that connects to a vApp Network where DHCP was enabled (note the presence of the Edge device):

The picture below, on the other hand, shows a VM connected to a vApp Network with external connectivity. Don't be confused by this picture: an Edge can Route/NAT to one and only one network at any point in time. In fact the Edge system VM always has a maximum of two vNICs: one that connects to the network to be protected and the other one connected to the network it needs to route/NAT to. The picture below shows all the possible configurations for the second Edge vNIC: External Org Net (Direct Connect), External Org Net (NAT/Routed) and Internal Org Net. Again: only one of these three connections can be active at any point in time. Note how a VM can potentially be NATted twice if connected to a NATted vApp Network which in turn connects to a NATted Organization Network.
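The double NAT remark can be illustrated with a tiny Python sketch that counts how many Edge NAT hops sit between a VM and the External Network for a given combination of vApp Network and Organization Network. The nat_hops() function and the short mode labels ("direct", "nat", "isolated", "internal") are hypothetical shorthand for the options discussed in this post, not vCD terminology.

```python
from typing import Optional

VAPP_MODES = {"direct", "nat", "isolated"}
ORG_TYPES = {"direct", "nat", "internal"}


def nat_hops(vapp_mode: str, org_type: str) -> Optional[int]:
    """Count Edge NAT hops between a VM and the External Network.

    Returns None when there is no path to the External Network at all.
    """
    if vapp_mode not in VAPP_MODES or org_type not in ORG_TYPES:
        raise ValueError("unknown network mode")
    if vapp_mode == "isolated" or org_type == "internal":
        # Either layer is cut off from the outside world.
        return None
    # Each NATted layer contributes one Edge, and each Edge has one uplink.
    return int(vapp_mode == "nat") + int(org_type == "nat")


print(nat_hops("nat", "nat"))        # NATted vApp Network on a NATted Org Network
print(nat_hops("direct", "direct"))  # straight onto the External Network PortGroup
```

A NATted vApp Network plugged into a NATted Organization Network yields two hops, which is the "NATted twice" case called out above.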

It may be interesting to call out that there are a few philosophical differences between how you create, configure and deploy vApp Networks compared to Organization Networks. Org Networks are created by the cloud administrator (on behalf of the Organization administrator) and when the cloud admin starts the creation process the wizard asks interactively for "which Network Pool to use to grab an available layer 2 segment". We do not want to expose that question to the end-user when he/she creates a vApp Network. After all this end-user may not even have a clue what a Network Pool is and perhaps he/she may not even know what a layer 2 network is. To overcome this we associate a Network Pool to the Organization vDC. In this case when a user creates a vApp Network a layer 2 network is grabbed from the Network Pool associated to the Organization vDC the user is deploying the vApp to. This also comes in handy to keep control and keep track of layer 2 network usage. When you associate a Network Pool to an Organization vDC you can set a limit on the number of segments any user in that organization can grab. In fact if you associate a Network Pool with 100 networks in it, you don't want someone creating 100 vApp Networks in half a day and consuming the entire Network Pool immediately. This is helpful to set limits on what an Organization can do (and possibly charge accordingly).
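The limit described above can be sketched in a few lines of Python. The NetworkPool and OrgVdc classes, the network_quota parameter and the QuotaExceeded exception are hypothetical names chosen for illustration; the sketch only shows the idea of a per-Organization-vDC cap sitting in front of a shared pool, not how vCD implements it.

```python
class QuotaExceeded(Exception):
    """Raised when an Org vDC has used up its allowed number of segments."""


class NetworkPool:
    def __init__(self, size):
        self.free = size  # number of layer 2 segments still available

    def grab(self):
        if self.free == 0:
            raise RuntimeError("Network Pool exhausted")
        self.free -= 1


class OrgVdc:
    def __init__(self, name, pool, network_quota):
        self.name = name
        self.pool = pool                    # pool associated to this Org vDC
        self.network_quota = network_quota  # cap set by the cloud administrator
        self.used = 0

    def create_vapp_network(self):
        # The end-user never picks a pool; it comes from the Org vDC.
        if self.used >= self.network_quota:
            raise QuotaExceeded(f"{self.name} reached its limit of {self.network_quota}")
        self.pool.grab()
        self.used += 1


pool = NetworkPool(size=100)
vdc = OrgVdc("Org-A-vDC", pool, network_quota=10)
for _ in range(10):
    vdc.create_vapp_network()
print(pool.free)  # 90 segments left for everybody else
```

With a quota of 10 on a pool of 100, the eleventh request from this Org vDC fails while 90 segments remain available to other tenants.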

I am not going to cover use cases of where and how to use combinations of vApp and Org Networks to create secure deployments because in this post I wanted to give you more the sense of what happens from a vSphere and cloud administrator perspective rather than from a cloud end-user perspective.

Network Pools

At this point you may have an overall understanding of what a Network Pool is and why it is used. In summary it is a small CMDB that contains layer 2 segments available to vCD administrators and end-users. Note Network Pools need to be created before we start deploying the actual networks we have described above (with the exception of the External Networks because they don't use Network Pools).

So far we kept referring to a "layer 2 segment" as a PortGroup with an associated VLAN ID. This is correct but it doesn't tell the whole story. There are really three different types of Network Pools one can create.

VLAN-backed Network Pools: this is the easiest to get. You can, for example, create a Network Pool and give it a range of VLAN ID 100 to 199. Whenever you grab one of these IDs because you need to deploy a new layer 2 segment, vCD will tell vCenter "please create on the fly a PortGroup, and give it VLAN ID 100". The next time there is a need for another layer 2 segment vCD will tell vCenter "please create on the fly a PortGroup, and give it VLAN ID 101". And so on. Of course if one of these networks is destroyed during the lifecycle of the cloud, the corresponding VLAN ID gets put back into the pool of available networks to be deployed.

PortGroup-backed Network Pools: it is similar to the VLAN-backed. The difference is that the PortGroups need to be pre-provisioned on the vSphere infrastructure and they need to be imported into vCloud Director. So vCD won't tell vCenter to create these on the fly; they are already there, pre-provisioned. Why use this? Well, there are some circumstances where vCenter cannot easily (programmatically) create PortGroups on the fly. This is the case when you use vSphere Standard Switches (as opposed to Distributed Switches) or when you use the Nexus 1000v (at the moment vCD cannot programmatically manipulate Port Profiles).

vCloud Director Network Isolation Network Pools: This is when things start to get interesting (again). We use a technique called MAC-in-MAC to create separate layer 2 networks without using VLANs. Yeah, that's right. This is extremely useful for big environments where VLAN management is problematic, either because there is a limited number of VLANs available or because keeping track of VLANs is a big management overhead (especially if you use an Excel spreadsheet to do that :-) ). When you create such a Network Pool you only specify how many of these layer 2 networks you want this Network Pool to have and you are done. When vCD starts to deploy PortGroups from this Network Pool you won't see any VLAN associated to them but they are indeed different layer 2 segments.

Now the acronym VCD-NI and the labels Preprovisioned and Created-on-the-fly in the pictures above should make more sense to you. Try to go back and have a look at them again.
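The "small CMDB" idea behind the three pool types can be sketched in Python as follows. Everything here is hypothetical (the NetworkPool class and the vlan_backed, portgroup_backed and vcdni_backed helpers), and handing out the lowest free segment first is an assumption made to match the "VLAN ID 100, then 101" example above. What differs between pool types is only what a "segment" is: a VLAN ID, a pre-provisioned PortGroup, or just one of N isolated networks.

```python
import heapq


class NetworkPool:
    """Hypothetical model: a set of layer 2 segments declared available."""

    def __init__(self, kind, segments):
        self.kind = kind
        self._free = list(segments)
        heapq.heapify(self._free)  # lowest free segment is handed out first
        self._in_use = set()

    def allocate(self):
        if not self._free:
            raise RuntimeError(f"{self.kind} pool exhausted")
        segment = heapq.heappop(self._free)
        self._in_use.add(segment)
        return segment

    def release(self, segment):
        # A destroyed network puts its segment back into the pool.
        if segment not in self._in_use:
            raise ValueError(f"{segment!r} is not allocated from this pool")
        self._in_use.remove(segment)
        heapq.heappush(self._free, segment)


def vlan_backed(first_id, last_id):
    # Segments are VLAN IDs; PortGroups would be created on the fly.
    return NetworkPool("VLAN-backed", range(first_id, last_id + 1))


def portgroup_backed(portgroup_names):
    # Segments are PortGroups that already exist on the vSphere side.
    return NetworkPool("PortGroup-backed", portgroup_names)


def vcdni_backed(count):
    # Only the number of networks is specified; no VLANs are involved.
    return NetworkPool("VCD-NI", range(1, count + 1))


pool = vlan_backed(100, 199)
first = pool.allocate()   # 100
second = pool.allocate()  # 101
pool.release(first)       # VLAN 100 goes back into the pool
print(first, second, pool.allocate())
```

Releasing VLAN 100 and asking again returns 100 in this sketch, because the lowest free ID is always picked first.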

Virtual Machines IP management

First of all note you cannot connect a vNIC to an External Network directly. You can however connect the vNIC to either an Organization Network or a vApp Network.

Now the question is: what happens when you connect a vNIC to either an Organization Network or a vApp Network? How do you control the layer 3 behavior? As we said, you have a choice of connecting each vNIC of the VM to an Organization Network or a vApp Network, or leaving the vNIC not connected. In the example below I have connected it to a vApp Network as you can tell from the name (vAppInternal). If you choose to connect it to a network you have three choices on how to get an IP. See the "IP Mode" drop-down in the picture:

Static IP Pool: this is the pool of IP addresses that you have configured when you created the network you are connecting to. This is the private IP Pool range you had to configure when creating a vApp Network, an External Organization Network Routed/NAT or an Internal Organization Network. In case of an External Organization Network Direct Connect the IP Pool range configured when creating the External Network it connects to will be used. It is important to understand that from a VM perspective this is considered a Static IP Address, it only happens to come from a pool that vCD controls. The first IP available in the Static IP Pool gets "plugged" into the VM (as a static address) at Guest Customization time.

DHCP: I guess this is self-explanatory. In that case the vNIC will search for a DHCP lease on the network it connects to. If it's a vApp Network, an External Organization Network Routed/NAT or an Internal Organization Network, this will have to come from the Edge DHCP service. If it's an External Organization Network Direct Connect it will have to be a DHCP that is available on that PortGroup associated to the External Network (in which case this would be out of the scope of vCloud Director).

Static Manual: This is used in those situations where you do not want to, or cannot, use either of the two above. You have to manually enter the IP address into the vCD interface and make sure it is the same one you have entered into the Guest OS of the VM you are working on. It goes without saying that this manual IP address must not fall within either the DHCP scope or the Static IP Pool if you want to avoid potential IP conflicts.
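Two of the rules above lend themselves to a short Python sketch: the Static IP Pool hands out the first available address, and a Static Manual address should sit outside both the Static IP Pool and the DHCP scope. The StaticIpPool class and the manual_ip_is_safe() function are hypothetical helpers written for illustration, and the 192.168.1.x ranges are made-up values.

```python
import ipaddress


class StaticIpPool:
    """Hypothetical model of the Static IP Pool attached to a network."""

    def __init__(self, first, last):
        self.first = ipaddress.ip_address(first)
        self.last = ipaddress.ip_address(last)
        self.assigned = set()

    def __contains__(self, ip):
        return self.first <= ipaddress.ip_address(ip) <= self.last

    def allocate(self):
        # Hand out the first available address; the VM sees it as static.
        addr = self.first
        while addr <= self.last:
            if addr not in self.assigned:
                self.assigned.add(addr)
                return str(addr)
            addr += 1
        raise RuntimeError("Static IP Pool exhausted")


def manual_ip_is_safe(ip, static_pool, dhcp_first, dhcp_last):
    """True if a Static Manual address avoids the pool and the DHCP scope."""
    addr = ipaddress.ip_address(ip)
    in_dhcp = ipaddress.ip_address(dhcp_first) <= addr <= ipaddress.ip_address(dhcp_last)
    return not in_dhcp and ip not in static_pool


pool = StaticIpPool("192.168.1.10", "192.168.1.12")
print(pool.allocate())  # first address in the pool
print(pool.allocate())  # next one
print(manual_ip_is_safe("192.168.1.50", pool, "192.168.1.100", "192.168.1.199"))
```

In this example 192.168.1.50 is a safe manual choice because it belongs neither to the pool (.10 to .12) nor to the DHCP scope (.100 to .199).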

Conclusions

In conclusion I hope I managed to give you a different perspective on how vCD networking works and especially the logic behind it. I covered the three main network layers and I have then focused a bit on the concept of the Network Pools and how Virtual Machines can be configured to connect to the available networks inside the Organization (including vApp Networks). Remember that complexity, in this case, is directly proportional to the richness of configurations and options available to the cloud end-user to consume "self-service".

Massimo.