Exposing Physical Layouts to Virtual Machines

There have been quite a lot of discussions lately on the VMware forums about exposing physical hardware layouts to virtual environments. Specifically, I am referring to things like RDMs, NPIV and virtualization of I/O. There might be other stuff being discussed, but these are the three on which I'd like to throw in my two cents.

I am sure most readers know what an RDM is: a Raw Device Mapping is a method by which you expose an entire SAN LUN to a virtual machine instead of letting ESX create a VMFS volume on it. People usually do this for performance reasons (is there really a performance gain?) and also to use SAN-specific tools, so that your virtual machines can interact directly with your storage area network for things like snapshot commands and similar tasks.
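
For the curious, this is roughly what attaching a LUN as an RDM looks like through the VI API. The sketch below uses the pyVmomi Python bindings; the device name, controller key and unit number are placeholders for your environment, and error handling is omitted.

```python
# Rough sketch: attaching a SAN LUN to a VM as a Raw Device Mapping via
# the VI API (pyVmomi bindings). Device name, controller key and unit
# number are placeholders; error handling omitted.
from pyVmomi import vim

def make_rdm_spec(lun_device_name, controller_key, unit_number):
    """Build a device-change spec that adds the LUN as a virtual-mode RDM."""
    backing = vim.vm.device.VirtualDisk.RawDiskMappingVer1BackingInfo()
    backing.deviceName = lun_device_name       # e.g. '/vmfs/devices/disks/naa.6006...'
    backing.compatibilityMode = 'virtualMode'  # 'physicalMode' passes SCSI commands through
    backing.diskMode = 'persistent'
    backing.fileName = ''                      # mapping file placed alongside the VM

    disk = vim.vm.device.VirtualDisk()
    disk.backing = backing
    disk.controllerKey = controller_key
    disk.unitNumber = unit_number
    disk.key = -1                              # server assigns the real device key

    change = vim.vm.device.VirtualDeviceSpec()
    change.operation = vim.vm.device.VirtualDeviceSpec.Operation.add
    change.fileOperation = vim.vm.device.VirtualDeviceSpec.FileOperation.create
    change.device = disk
    return change

# vm is a vim.VirtualMachine looked up elsewhere:
# vm.ReconfigVM_Task(spec=vim.vm.ConfigSpec(deviceChange=[make_rdm_spec(...)]))
```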

NPIV (N_Port ID Virtualization) is a new SAN concept/technology that allows you to associate more than a single ID with a physical port. Some IHVs (and VMware) are enabling NPIV-type features so that a virtual machine can become a visible object for SAN administrators, giving the SAN folks a virtual-machine-aware deployment (whereas today they usually only have knowledge of the ESX servers).
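
To picture what this changes on the fabric, here is a toy Python model (the WWPNs are made up): one physical N_Port registers extra virtual port IDs, one per virtual machine, so zoning and masking can target individual VMs instead of the ESX host as a whole.

```python
# Toy model of what NPIV changes on the fabric: one physical N_Port can
# register extra virtual port IDs (via FDISC logins), one per VM, so
# zoning can target individual virtual machines. WWPNs are made up.

class FCPort:
    def __init__(self, physical_wwpn):
        self.physical_wwpn = physical_wwpn
        self.virtual_wwpns = {}               # VM name -> virtual WWPN

    def npiv_login(self, vm_name, virtual_wwpn):
        """Register an additional N_Port ID on the same physical link."""
        self.virtual_wwpns[vm_name] = virtual_wwpn

    def visible_initiators(self):
        """What the SAN admin can zone against: host port plus one per VM."""
        return [self.physical_wwpn] + sorted(self.virtual_wwpns.values())

hba = FCPort('20:00:00:e0:8b:05:05:04')
hba.npiv_login('vm-sql01', '28:f3:00:0c:29:00:00:01')
hba.npiv_login('vm-web02', '28:f3:00:0c:29:00:00:02')
print(hba.visible_initiators())               # three initiators instead of one
```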

Virtualization of I/O is the next step for hardware-assist technologies (such as Intel VT and AMD-V) that will allow virtual infrastructure users to map physical I/O adapters directly into a given virtual machine. The idea would be to map a physical Ethernet/SCSI/SAN adapter directly into a Windows/Linux virtual machine, and the reason for doing so is primarily a performance increase.

I strongly believe that these three (and other potentially similar technologies/ideas) will not be very relevant for the future of virtual infrastructures. This is of course my own opinion, and it comes from a very simple concept: the main idea of a virtual infrastructure is to decouple the various subsystems, so that, in practice, you can deploy a workload without being bound to the specific hardware resource layout. All the advantages of the virtual infrastructure (D/R, flexibility, consolidation, fast deployment of new workloads, portability, etc.) derive from this simple concept. So why would you want to expose, within a virtual machine (that is, "the workload"), how your physical resources are laid out?

To understand why I am so cold about these technologies aimed at exposing the physical layout within virtual machine objects, we also need to touch briefly on a new concept that is being discussed: virtual appliances. Without even getting into the details, it is enough to say that virtual infrastructures are going to evolve in a way where most of the "infrastructure" functions we are used to seeing today in standard OSes (clustering, security, resource scheduling, etc.) drain into the virtual infrastructure, leaving the OS inside the virtual machine to provide only application API support. So my very basic idea is that this industry is moving (or should be moving) towards technologies that inject into the core virtual infrastructure the functions that are today available in standard Windows/Linux operating systems, rather than towards technologies that let you manage a virtual machine the way you manage a physical Windows/Linux host today.

If you will, think of the virtual infrastructure as a sort of Datacenter OS, and of your virtual machines as the business logic or the applications. This is where, in my opinion, we are heading. So let me ask my question again: why would you want to expose (to your applications) how you have laid out your physical resources?

Back to the point: in reality there is no huge performance advantage in using RDMs vs. minidisks on VMFS LUNs. It's interesting to notice that many think a big file system on top of which you put big files (minidisks) cannot perform as well as raw disks. What most are missing is that the vast majority of the performance overhead comes from the virtualized OS/driver stack, not from the presence of the VMFS file system per se. And even assuming you get a 0-10% performance advantage, would you be prepared to trade off the flexibility that encapsulating a VM into a minidisk provides? Not to mention that using RDMs requires a direct relationship between the virtual machines and the storage layout, which simply means breaking rule #1 of virtualization: decoupling the dependencies of the various objects and subsystems. I do appreciate that today an organization might need to use RDMs because of tighter storage integration (which is not available directly in the virtual infrastructure), but if things go as they should, in the future these integration features will be available from within the virtual infrastructure and there won't be any need to expose the physical storage APIs inside each guest OS to take advantage of them.
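
To make this concrete, here is some back-of-the-envelope arithmetic. The numbers below are made-up assumptions, not measurements, but they show the shape of the argument: if the virtualized OS/driver stack dominates the overhead, removing the VMFS layer barely moves the needle.

```python
# Back-of-the-envelope: where does virtual disk I/O overhead come from?
# All numbers are illustrative assumptions, not measurements.

native_service_time_us = 100.0   # assumed raw LUN service time per I/O
virt_stack_overhead_us = 20.0    # assumed cost of the virtualized OS/driver stack
vmfs_overhead_us = 2.0           # assumed extra cost of the VMFS layer itself

vmfs_total = native_service_time_us + virt_stack_overhead_us + vmfs_overhead_us
rdm_total = native_service_time_us + virt_stack_overhead_us  # RDM removes only VMFS

gain = (vmfs_total - rdm_total) / vmfs_total
print(f"RDM gain under these assumptions: {gain:.1%}")  # ~1.6%
```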

By the same logic you wouldn't even (ideally) want your SAN administrators to be dealing with virtual machines. The fact that SAN administrators today get frustrated because they "can't see which virtual Windows/Linux servers use what" is a legacy of the physical world, where they would put a couple of HBAs into each Windows/Linux server and see every mapping. I don't think that, because we did it that way yesterday, we should do it that way tomorrow as well. I would want my SAN admins to concentrate on, and only see, the nodes that comprise my virtual infrastructure; whatever runs above it is a matter of creating the right policies and priorities to make better use of the SAN bandwidth they provide us with.

Last but not least, virtualization of I/O. Certainly this could boost the performance of a given I/O-bound VM, but in this case "more performance" really means "less overhead", which in turn means fewer CPU cycles wasted virtualizing an I/O device rather than exposing it directly to a virtual machine. While, ideally, I find it very appealing that, at the same performance rate, this solution would let me spend a third of the CPU cycles... do you really care, now that AMD and Intel have started shipping cores like one would give away peanuts? When you get to 8-core CPUs in a couple of years (perhaps less), will your primary concern be "how can I save CPU cycles"? Don't get me wrong: I am not stupid, and I am not saying that consuming 20% of one core and consuming 100% of two cores are the same thing to me. My point is: what will that low CPU consumption cost me in terms of flexibility? If I want to use this, my virtual machine would have to be bound to the hardware I put it on (unless I start doing odd things such as populating all hosts with the very same adapter in case I want to move my VM around). Quite frankly, if this is the trade-off, I would rather leave this application on a physical host than move it to a virtual machine that has constraints similar to those of a physical host.
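
The flexibility cost is easy to model. In the sketch below (host and adapter names are made up), a fully virtualized VM can move to any host, while a VM with a directly mapped adapter can only land on hosts that carry that exact piece of hardware.

```python
# Sketch of the flexibility cost of direct I/O mapping. Host and adapter
# names are made up: a VM bound to a specific physical adapter can only
# move to hosts that carry that exact adapter.

hosts = {
    'esx01': {'qla2432', 'e1000'},
    'esx02': {'e1000'},
    'esx03': {'lpe11000', 'e1000'},
}

def migration_targets(required_adapters):
    """Hosts whose adapter set covers everything the VM is bound to."""
    return [h for h, adapters in hosts.items() if required_adapters <= adapters]

print(migration_targets(set()))          # fully virtualized VM: all three hosts
print(migration_targets({'qla2432'}))    # directly mapped HBA: only esx01
```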

Closing this post, I want to share my own rule of thumb: whenever possible, use technologies that allow you to fully decouple the workloads you are going to host on your virtual infrastructure from the physical layouts and technologies used to implement it. I guess that was already clear, though. Just think about how the VMotion constraints limit the potential flexibility. As you know, the only thing that, in a virtual environment, is not "virtualizable" by design is the CPU (i.e., the CPU gets physically exposed to the virtual machine). This causes all sorts of compatibility issues, because you can't migrate a virtual machine from CPU A to CPU B unless they belong to the same compatibility group. This is just an example of what happens when you expose the physical layout of a device all the way up to the virtual machine. It's interesting that everybody is now looking at how to make these different CPU technologies compatible, so that we can achieve transparent migrations without having to buy the same CPU we bought two years ago.
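
At its core this is a subset check, as the little sketch below shows (the feature sets are made-up stand-ins for CPUID flags): a VM that has been exposed to a newer CPU's features cannot safely migrate to a host whose CPU lacks them.

```python
# Sketch of why an exposed CPU constrains migration: a VM that has seen
# a feature flag cannot safely land on a CPU that lacks it. The feature
# sets are made-up stand-ins for CPUID flags.

host_features = {
    'esx-old': {'sse2', 'sse3'},
    'esx-new': {'sse2', 'sse3', 'ssse3', 'sse4_1'},
}

def can_migrate(vm_visible_features, target_host):
    """Safe only if the VM's visible feature set is a subset of the target's."""
    return vm_visible_features <= host_features[target_host]

vm_features = host_features['esx-new']          # VM powered on on the newer host
print(can_migrate(vm_features, 'esx-old'))      # False: can't migrate "backwards"
print(can_migrate(host_features['esx-old'], 'esx-new'))  # True: subset holds
```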

Exposing physical hardware technologies and/or physical hardware layouts to virtual environments is not, in my opinion, a very good practice, and it should be avoided unless tactical deployments to achieve specific results today require you to do so. So, in summary, I usually suggest not using RDMs (unless strictly necessary for tactical integrations), and I am very cold regarding future technologies such as NPIV (at least in a virtual environment) and virtualized I/O (at least in the way it is being proposed to be implemented today).

This is of course just my personal opinion.

Massimo.