Backup and Restore of vCloud Director Consumer Workloads

Backup and restore (of consumer workloads) in a vCloud Director environment is a hot topic. When you deal with pets (vs. cattle) it is important that you take good care of your lovely little friends. Part of taking care of them is backing them up regularly and, more importantly, restoring them when needed.

This industry has achieved a high level of maturity in terms of best practices (and tooling) for backing up and restoring workloads running on vSphere virtual infrastructures. As we introduced an additional layer on top of vSphere (vCD) we broke, so to speak, some of the tools and many of the best practices. Even more challenging, we introduced concepts that didn't exist before in a virtualization scenario (cloud providers and cloud consumers).

People tend to give a crisp yes / no when faced with the question "can you back up / restore workloads running in vCloud Director?" I think the matter is more complex than that. It really boils down to what you want to do (more on this later).

I was tasked (I actually volunteered) to double-click on this. Admittedly I started this effort with a short-sighted view along the lines of "let's find out which backup and restore tools integrate with vCloud Director". As I started to lay out the content it became very clear that I was chasing the micro-details without having a clear view of the potential macro-architectures and the big picture. So I started to lay out the context, and I thought that making it public would help me gather more feedback and valuable input on how to proceed. What you will see next is (more or less) part of the content I am working on. It goes without saying that these are the informal rants of a single cloud architect. This is not a VMware paper (as is) and you shouldn't refer to it as such when pointing to this blog post.

Introduction to the vCloud Director Storage Layout

The figure below shows a high level view of the vCloud Director storage architecture.

There are a lot of considerations missing in the picture above in terms of how the storage stack is constructed in vCloud Director 5.1 (for example Storage Profiles, Provider vDCs, vSphere clusters, etc.) but there is enough information to describe the backup and restore process (and associated challenges).

First of all, the figure shows the multi-tenant nature of vCloud Director, where a single datastore/LUN (and host, for that matter) can be securely shared among different tenants (aka organizations).

vCloud Director presents a certain amount of (abstracted) storage to the tenant as a property of the organization vDC (aka Org vDC) the tenant has been assigned to. The tenant can consume that storage by creating VM disks as a property of a VM. The tenant does not care where that abstracted pool of storage resources comes from.

Another important thing to notice in this simplified diagram is the fact that different actors can access the same resources at different levels. For example:

  • A tenant can access and can manipulate resources in its organization vDC whereas a cloud administrator can manipulate all resources across all tenants
  • A tenant can access a file on the VM file system by means of a Guest OS operation whereas a cloud administrator can access the same file mounting the VMDK at the ESXi host level
  • A tenant can perform limited manipulation on VMDK files via the vCloud APIs (e.g. independent disks, new in vCD 5.1) whereas the cloud administrator can fully manipulate them using traditional vSphere mechanisms

Infrastructure Visibility

This parameter, later used to characterize backup and recovery solutions, describes the level of access a given individual may have in a vCloud Director stack.

vCD uses a role-based model to assign proper rights to users. In the context of this document we will divide the cloud world in two macro roles: providers and consumers.

In vCD language, they are the cloud administrator and the organization administrator.

Note: We will consider roles like vApp user and vApp author to be subsets of the organization administrator role and, as such, to have slightly more limited visibility than the latter. We will just consider the organization administrator as the cloud consumer.

We introduce here two key concepts in cloud operations. These may be relevant in general for cloud but they are indeed very relevant for vCD cloud deployments.

These concepts are above-water visibility and below-water visibility. The water line alluded to here is the line that separates cloud tenants from cloud administrators.

It is important for cloud administrators and cloud consumers to pay attention to this parameter (visibility) because it determines whether a given backup solution they are (respectively) building or consuming is available out of the box, without customizations, on any vCloud Director deployment.

“Above-water” Visibility

With above-water visibility (or consumer space) we refer to all of those operations that can be performed by a vCD tenant (specifically by an organization administrator) with an out of the box vCD. The emphasis here is on vanilla and out of the box.

These are all standard operations that any vCD tenant can perform regardless of the vCloud Director implementation (private or public that is).

This is a list of operations that, for example, an organization administrator can do above-water:

  • Creating a “backup server” inside the tenant to back up locally the files (inside the OS) of the production VMs

  • Manually copying vApps either in the same PvDC or in different PvDCs

  • Programmatically copying vApps either in the same PvDC or in different PvDCs

  • Leveraging independent disks to attach / detach VMDK files to stateless VMs

  • Leveraging independent disks (through attach / detach) to create Guest OS mirrors of production VMs.
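To make the "programmatically copying vApps" item a bit more concrete, here is a minimal sketch of how an organization administrator might describe such a copy request. It only builds the request and never sends it. The host name, vDC id, vApp href and the function name are hypothetical. The `cloneVApp` action, the `CloneVAppParams` body and the media type reflect my reading of the vCloud API 5.1, so treat them as assumptions and check them against the API reference before relying on them. A real call would also need the session token obtained at login.

```python
from xml.etree import ElementTree as ET

# Namespace and media type as I understand them from the vCloud API 5.1
# documentation (assumptions: verify against the API reference).
VCLOUD_NS = "http://www.vmware.com/vcloud/v1.5"
CLONE_CONTENT_TYPE = "application/vnd.vmware.vcloud.cloneVAppParams+xml"


def build_clone_vapp_request(api_base, target_vdc_id, source_vapp_href, copy_name):
    """Describe (but do not send) a request that copies a vApp into a vDC."""
    ET.register_namespace("", VCLOUD_NS)
    params = ET.Element(
        f"{{{VCLOUD_NS}}}CloneVAppParams",
        {"name": copy_name, "deploy": "false", "powerOn": "false"},
    )
    description = ET.SubElement(params, f"{{{VCLOUD_NS}}}Description")
    description.text = "Copy taken for backup purposes"
    ET.SubElement(params, f"{{{VCLOUD_NS}}}Source", {"href": source_vapp_href})
    # "false" keeps the source vApp in place: this is a copy, not a move.
    ET.SubElement(params, f"{{{VCLOUD_NS}}}IsSourceDelete").text = "false"
    return {
        "method": "POST",
        "url": f"{api_base}/vdc/{target_vdc_id}/action/cloneVApp",
        "headers": {
            "Accept": "application/*+xml;version=5.1",
            "Content-Type": CLONE_CONTENT_TYPE,
            # A real request also carries the session token returned at login.
        },
        "body": ET.tostring(params, encoding="unicode"),
    }


# Hypothetical values, for illustration only.
request = build_clone_vapp_request(
    "https://vcd.example.com/api",
    "1234",
    "https://vcd.example.com/api/vApp/vapp-42",
    "web-backup",
)
print(request["url"])
```

The point is that nothing here requires anything beyond what a tenant already has: the target can be an Org vDC in the same or in a different PvDC, as long as the tenant has access to it.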

Many of these approaches are typical of “design for fail” cloud models and don’t usually fly very well with customers with an Enterprise mindset.

Also, the lack of an out of the box object storage service in vCD limits the above-water backup and recovery use cases. A workaround would be to set up a proxy inside the tenant that can back up to a third party public object storage service.

For example, an object store can be configured as a target in some traditional backup and restore tools, and some third party public object storage services provide appliances (aka storage gateways) that act as a proxy between a private set of servers and the public object storage service.

All of the above is considered above-water since this is something the tenant can implement without any interaction with the cloud provider and, more importantly, without any particular vCloud Director customization or extension.

This applies to any vCloud Director based cloud instance.

“Below-water” Visibility

Describing below-water visibility (or provider space) is fairly easy because it is, essentially, full visibility into the cloud stack. This is only available to the cloud administrator and, assuming the vCloud Director administrator is also the administrator of the infrastructure underpinning it (which is often the case), this includes visibility into a variety of tools and layers including, obviously, vCenter Servers.

The cloud administrator is the owner of the entire stack and can perform any operation at any level in the stack. This is obviously true within the boundaries of what is supported by the integration of the various products in the vCloud Suite.

There are, for example, tasks that the cloud administrator can perform at a lower level but that are not supported because they may break the layers above. Some of these tasks include (source: vCAT 3.0.2):

  • Editing virtual machine properties

  • Renaming virtual machines

  • Disabling DRS

  • Deleting or renaming resource pools

  • Changing networking properties

  • Renaming datastores

  • Changing or renaming folders.

In the context of backup and recovery of consumer workloads, operating at this level of the stack requires careful planning by the cloud administrator.

This is a list of operations that, for example, a cloud administrator can theoretically do below-water:

  • Backing up / restoring files inside tenants via VMware VADP

  • Backing up / restoring VMDKs inside tenants via VMware VADP

  • Backing up / restoring VMs inside tenants via VMware VADP

  • Backing up / restoring vCloud vApps inside tenants via VMware VADP

  • Other object manipulations aimed at saving the state of those objects using vCenter administrator-level access.

Some of the operations above, particularly the restore of vCloud objects, require particular attention and best practices.

Most vCloud implementations will vary below-water. This is true for many other operations but it is certainly true for backup and recovery operations. While there is a set of basic core functionalities a cloud admin can perform using VMware tools at this layer, most implementations will be complemented by specific backup and restore software products and, perhaps, particular configurations of those products.

So while we consider the above-water zone to be consistent and standard across all vCloud Director deployments, we anticipate the below-water zone to be specific to each deployment.

Backup and Restore levels

This is the second parameter that we will use later to characterize and segment backup and recovery solutions.

This is straightforward and describes the “what” in the backup and restore equation. What objects do tenants need to back up (and be able to restore)?

These objects and levels are discussed below in this section. The following picture summarizes them graphically.

File Level

This is the most atomic thing in the cloud consumer space that the tenant may want to (and can) back up and restore. It can’t get more granular than that. There isn’t a lot to say about it. A file inside a Guest OS file system is just a file.

Disk Level

This refers to the VMDK file associated with a given VM. It’s fair to see the VMDK as the drive of the VM. Note that by backing up the VMDK you are essentially backing up the entire on-disk state of that Guest OS. In Microsoft Windows parlance, it’s like backing up the entire c:\ drive. The relationship between the VMDK and the files discussed above is 1:many.

VM level

This object includes the VMDK content as well as the metadata describing the virtual machine. A VM is really the collection of the content of the (virtual) disk as well as the surrounding data that describes the characteristics of the VM (number of vCPUs, amount of memory, number of vNICs, etc.). This information is saved in the vmx file (which sits next to the VMDK file, in the same folder). The relationship between the VM and the VMDK can be 1:many (limits apply), although it is often 1:1.

vApp level

This object describes the service (or the workload). A vApp is usually referred to as a collection of VMs but there is more to it than that. A vApp includes information such as vApp Networks (and associated network and security levels), VM start and stop order, etc. vCD vApp metadata and vCD VM metadata are also part of the properties of the vApp. The relationship between the vApp and the VM can be 1:many (limits apply).
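The four levels nest inside each other, and a small data model may make the 1:many relationships easier to see. This is just a sketch for illustration: the class names, the fields and the `objects_captured` helper are my own and do not correspond to any vCloud Director or vSphere API.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class GuestFile:
    """File level: a file inside the Guest OS file system."""
    path: str


@dataclass
class Disk:
    """Disk level: one VMDK, holding many guest files."""
    name: str
    files: List[GuestFile] = field(default_factory=list)


@dataclass
class VM:
    """VM level: the disks plus the metadata kept in the vmx file."""
    name: str
    vcpus: int
    memory_mb: int
    disks: List[Disk] = field(default_factory=list)


@dataclass
class VApp:
    """vApp level: the VMs plus vApp networks, start order, etc."""
    name: str
    networks: List[str] = field(default_factory=list)
    start_order: List[str] = field(default_factory=list)
    vms: List[VM] = field(default_factory=list)


def objects_captured(vapp: VApp) -> dict:
    """Count what a vApp-level backup implicitly contains at each level."""
    disks = [disk for vm in vapp.vms for disk in vm.disks]
    files = [f for disk in disks for f in disk.files]
    return {"vapp": 1, "vm": len(vapp.vms), "disk": len(disks), "file": len(files)}


web = VM("web", 2, 4096, [Disk("web.vmdk", [GuestFile("/etc/hosts")])])
db = VM("db", 4, 8192, [Disk("db-os.vmdk"), Disk("db-data.vmdk", [GuestFile("/data/orders.db")])])
shop = VApp("shop", ["frontend-net"], ["db", "web"], [web, db])
print(objects_captured(shop))  # {'vapp': 1, 'vm': 2, 'disk': 3, 'file': 2}
```

Backing up at a higher level captures everything below it, but what can be restored individually (a single file vs. the whole vApp) depends on the solution, which is exactly why the level is a separate parameter.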

Managed Service Vs. Self Service

This is the last parameter that we will use to characterize a backup and restore solution for vCloud Director consumer workloads.

At first this may sound like a duplicate of the above-water and below-water segmentation but it is not.

The infrastructure visibility parameter speaks more to the implementation of the cloud environment and the out of the box capabilities.

This segmentation speaks more to the operational aspect of performing backup and recovery of consumer workloads.

While it would be easy to map the above-water concept to self-service and the below-water concept to managed services, the reality may be more complex.

For example a given cloud service provider may offer managed services using above-water capabilities.

Or, even more interesting, a cloud consumer could experience a self-service experience using below-water capabilities (by means of third party portals or API extensions that the cloud administrator can expose to the tenant and that are not available out of the box with a vanilla vCloud Director setup).

Cloud Provider Managed Service

This is the scenario where the cloud administrator owns the operational aspects of backing up (regularly) and restoring (as needed) consumer workloads on behalf of the cloud consumer.

This is true regardless of:

  • Whether the cloud administrator uses an above-water (less likely) or a below-water (more likely) strategy

  • What level of backup and restore is required (file, disk, VM or vApp)

In this scenario the cloud administrator usually has a set of policies in place to back up the consumer workloads (depending on the agreed SLAs) and the cloud provider's personnel perform the restore. Depending on the contract in place, this could happen without consumer interaction, or the consumer could trigger the restore by opening a ticket with the cloud service provider. In this scenario the self-service aspect of cloud is not exploited.

Cloud Consumer Self Service

In this scenario the tenant is fully in control of the backup and restore operations.

This is true regardless of:

  • Whether the cloud consumer uses an above-water or a below-water strategy

  • What level of backup and restore is required (file, disk, VM or vApp)

There is typically no interaction between the cloud administrator and the tenant and every backup and restore operational aspect is available to the cloud consumer.

Note that the nature of backup operations may vary depending on the implementation details.

For example in an above-water backup and restore strategy the tenants are responsible for building and consuming their own solution.

However, when a tenant is consuming, in self-service, a below-water solution implemented by the cloud service provider, backup operations may be driven by:

  • Pre-defined policies (e.g. all vApps placed in a given virtual datacenter will have a pre-defined backup policy)

  • Self-service policies (e.g. the tenant can interactively assign vApps to particular policies interacting with the cloud via third party service portals or API extensions)
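The two policy models above can coexist, and a tiny sketch may help picture how. Everything here is hypothetical (the function, the policy names, the vDC names), and the rule that a tenant's explicit choice wins over the vDC default is an assumption of this sketch, not something vCloud Director or any backup product prescribes.

```python
from typing import Dict, Optional


def resolve_backup_policy(
    vapp_name: str,
    vdc_name: str,
    vdc_defaults: Dict[str, str],
    tenant_choices: Dict[str, str],
) -> Optional[str]:
    """Return the backup policy that applies to a vApp, or None if none does."""
    # Self-service policy: the tenant explicitly assigned this vApp to a policy.
    # Assumption of this sketch: an explicit choice overrides the vDC default.
    if vapp_name in tenant_choices:
        return tenant_choices[vapp_name]
    # Pre-defined policy: every vApp placed in the vDC inherits its policy.
    return vdc_defaults.get(vdc_name)


# Hypothetical policies, for illustration only.
vdc_defaults = {"gold-vdc": "daily-30d", "silver-vdc": "weekly-4w"}
tenant_choices = {"erp-vapp": "hourly-7d"}

print(resolve_backup_policy("erp-vapp", "silver-vdc", vdc_defaults, tenant_choices))
print(resolve_backup_policy("wiki-vapp", "silver-vdc", vdc_defaults, tenant_choices))
```

In a real deployment the tenant-side mapping would be fed by whatever third party portal or API extension the provider exposes.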

Backup and Restore: Solutions Characterization

Why is this important? Ideally every backup and restore solution we will discuss in the context of this document can be characterized by this triplet we have defined:

  • Where? (above-water or below-water)

  • What? (files, disks, VMs or vApps)

  • Who? (tenant self-service or provider managed services)

The triplet above isn’t useful to describe the inner technical details of any backup and restore product. However it is very useful to describe the outer characteristics of any backup and restore solution.

Ideally, before talking about the actual implementation, cloud architects should be able to characterize a solution by the where / what / who parameters.

This is true for architects building clouds (e.g. “our vCloud Director based backup and restore strategy will allow tenants to restore VMs and vApps by opening a ticket with us. We will then leverage some of our below-water features not exposed to the tenants”).

Similarly, architects consuming clouds should be able to query potential cloud service providers about their backup and restore services using this framework (e.g. “we are looking for a vCloud Director based service that would allow us to restore files, disks and VMs in self-service leveraging below-water features”).
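For those who think better in code, the where / what / who triplet can be written down as a small data structure, and the two quotes above become two instances of it. This is only a sketch: the type names and the `satisfies` rule (same where, same who, and the offering covers every requested level) are my own simplification, not part of any product.

```python
from dataclasses import dataclass
from enum import Enum
from typing import FrozenSet


class Where(Enum):
    ABOVE_WATER = "above-water"
    BELOW_WATER = "below-water"


class What(Enum):
    FILE = "file"
    DISK = "disk"
    VM = "vm"
    VAPP = "vapp"


class Who(Enum):
    SELF_SERVICE = "tenant self-service"
    MANAGED_SERVICE = "provider managed service"


@dataclass(frozen=True)
class Characterization:
    where: Where
    what: FrozenSet[What]
    who: Who


def satisfies(offering: Characterization, requirement: Characterization) -> bool:
    """True if the offering matches on where and who and covers every level asked for."""
    return (
        offering.where == requirement.where
        and offering.who == requirement.who
        and requirement.what <= offering.what
    )


# The provider quote: restore VMs and vApps via ticket, using below-water features.
provider_offering = Characterization(
    Where.BELOW_WATER, frozenset({What.VM, What.VAPP}), Who.MANAGED_SERVICE
)
# The consumer quote: restore files, disks and VMs in self-service, below-water.
consumer_requirement = Characterization(
    Where.BELOW_WATER, frozenset({What.FILE, What.DISK, What.VM}), Who.SELF_SERVICE
)

print(satisfies(provider_offering, consumer_requirement))  # False
```

The two examples don't match: the provider offers a managed service and doesn't cover files or disks, which is the kind of gap this framework is meant to surface before anyone talks about products.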

Note that, for the most part, the infrastructure visibility aspect (below-water, above-water) isn’t usually something a consumer may want to call out as a “requirement”. Ideally the consumer would always want something to be “above-water” because that means the solution could be implemented on any vCD based cloud should they choose another cloud provider. However, the reason a tenant may specifically ask for below-water functionality is that they have enough know-how of the vCloud stack to require a particular and more efficient solution than what they could achieve above-water.

In summary, we have introduced the concepts of above-water and below-water.

We have then introduced the list of objects that could be a target for backup and restore operations.

Last but not least we have introduced the notion of self-service and managed services.

The following picture represents a self-service solution.

The following picture represents a managed services solution.

That's all (I can disclose). This is the framework I have been working on lately. As often happens to me, I can't tackle a very simple problem without having to put it into the bigger picture to contextualize it. Sorry about that.

While I do understand that many people are interested in "does backup product xyz talk to the vCloud APIs", I fear a simple yes or no doesn't cut it and doesn't put those people in a position to build a proper backup and restore solution for their vCloud Director based cloud.

Now, the next challenge is how to lay out (in a meaningful way) the research and unstructured work I have been doing to double-click on actual solutions. What I have in mind right now (subject to change) is to describe in greater detail a certain number of solutions and architectures (4? 6? 10?) that could be considered the most common ones and best practices, and characterize each of them with the "where" / "what" / "who" framework I discussed above.

This would let VMware customers and partners come up with their own additional solutions / combinations that they could characterize with the same framework. Just a thought at the moment.

If you have any comments or feedback, I am all ears.

Massimo.