Tuesday, June 16, 2009

Thinking about Cloud to Cloud Interoperability Use Cases

Cloud computing is a term applied to large, hosted datacenters, usually geographically distributed, which offer various computational services on a “utility” basis. Most typically, the configuration and provisioning of these datacenters, as far as the services for subscribers go, is highly automated, to the point that a service is delivered within seconds of the subscriber request. Additionally, the datacenters typically use hypervisor-based virtualization as a technique to deliver these services. The concept of a cloud operated by one service provider or enterprise interoperating with a cloud operated by another is a powerful idea. So far, interoperability is limited to use cases where code running on one cloud explicitly references a service on another cloud; there is no implicit and transparent interoperability. In this article, I write about use cases for interoperability and an architecture for Intercloud standards.

Of course, from within one cloud, explicit instructions can be issued over the Internet to another cloud. For example, code executing within Google AppEngine can also reference storage residing on AWS. However, there are no implicit ways that cloud resources and services can be exported or caused to interoperate.
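As a concrete illustration of the explicit case, here is a minimal sketch in Python, assuming the boto3 library and hypothetical bucket, key, and credential values; code running on another provider's cloud could issue the same call over the Internet, but only by knowing AWS's API, endpoint, and credential scheme in advance.

    # Explicit cross-cloud access: code running anywhere reads an
    # object from AWS S3. Bucket, key, and credentials are hypothetical.
    import boto3

    s3 = boto3.client(
        "s3",
        aws_access_key_id="AKIA-EXAMPLE",       # subscriber credentials
        aws_secret_access_key="EXAMPLE-SECRET",
    )

    # Nothing here is implicit or transparent: the caller must name the
    # provider, the service, and the object explicitly.
    obj = s3.get_object(Bucket="example-bucket", Key="data/input.csv")
    payload = obj["Body"].read()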

In this blog I develop two main use cases for cloud interoperability. The first involves a physical metaphor (servers, disks, network segments, etc.); the second involves an abstract metaphor (blob storage functions, message queues, email functions, multicast functions, etc.). Together, these use cases illustrate the interoperability challenges posed by the two major personality types of clouds.

Virtual Machine Instantiation and Mobility

One of the most basic resources which cloud computing delivers is the Virtual Machine, which is a physical-metaphor type of resource. One way or another, a subscriber requests the provisioning of a particularly configured virtual machine with certain quantities of resources, such as memory size and processor speed and count. The format of this request varies widely by cloud computing platform and is also somewhat specific to the type of hypervisor (the virtualization layer of the operating system inside the cloud computing platform). Within a few seconds the subscriber receives pointers and credentials with which to access the VM. The pointers are usually the MAC and IP addresses and sometimes a DNS name given to the VM. The credentials are usually a pair of RSA keys (a public key and a private key, which one uses in the API to speak with the VM). Most often, the VM presents an x86 PC machine architecture. On that VM, one boots a system image, yielding a running system, and uses it much as one would use a running system in one's own datacenter.
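To make the shape of such a request concrete, here is a minimal sketch against the EC2 API using boto3; the image ID, instance type, and key pair name are placeholder assumptions, and, as noted, other platforms express the same request quite differently.

    import boto3

    ec2 = boto3.client("ec2", region_name="us-east-1")

    # Ask for one VM of a given size, booted from a given image; the
    # ImageId and KeyName below are hypothetical.
    result = ec2.run_instances(
        ImageId="ami-12345678",
        InstanceType="m1.small",
        MinCount=1,
        MaxCount=1,
        KeyName="my-keypair",       # names the RSA key pair credential
    )

    instance = result["Instances"][0]
    # The "pointers" come back in the response: an instance ID and,
    # once the VM is running, its addresses.
    print(instance["InstanceId"], instance.get("PrivateIpAddress"))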

VM Mobility is the feature in a particular hypervisor which allows a running system to be moved from one VM to another VM. As far as the running system is concerned, it does not need to be reconfigured: all of the elements such as MAC address, IP address, and DNS name stay the same, and any of the ways storage may be referenced (such as a World Wide Name in a SAN) stay the same. Whatever needs to happen to make this work is not the concern of the running system.
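Within a single administrative domain, this is an ordinary operation today. Here is a minimal sketch using libvirt's Python bindings; the host names and domain name are hypothetical, and the two hosts are assumed to share storage and a Layer 2 segment.

    import libvirt

    # Connect to the source and destination hypervisors (hypothetical hosts).
    src = libvirt.open("qemu+ssh://host-a.example.net/system")
    dst = libvirt.open("qemu+ssh://host-b.example.net/system")

    dom = src.lookupByName("running-system-1")

    # Live-migrate the running system; it keeps its MAC address, IP
    # address, and storage references throughout.
    dom.migrate(dst, libvirt.VIR_MIGRATE_LIVE, None, None, 0)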

VM Mobility has been implemented in several hypervisors, but there are limitations. Usually these limitations are a result of the “scope” of applicability of the network and storage addressing. Typically, VM Mobility is restricted to a Layer 3 subnet and a Layer 2 domain (for VLANs), because the underlying network will not support the VM operating outside of the local scope of those addresses. Needless to say, the network addressing scheme in a cloud operated by an entirely different service provider is not only a different subnet but a different class B or class A network altogether. Routers and switches simply would not know how to cope with the “rogue” running system.

Another aspect is that the instantiation instructions for the VM hosting the running system are very specific to that cloud computing platform and the hypervisor which it uses. We would want to re-issue some of these instructions to the new cloud, so that the VM it delivers, onto which the running system would move, is as suitable as the first VM which was provisioned for us. If the new cloud takes an entirely different set of instructions, this is another barrier to VM Mobility.
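Here is a sketch of that translation problem: the same logical request rendered for two providers. Both payload shapes are invented stand-ins for real, mutually incompatible provisioning formats.

    # One logical VM request...
    request = {"cpus": 2, "memory_mb": 4096, "image": "ubuntu-9.04"}

    def to_provider_a(req):
        # ...as provider A's flat, query-style parameters...
        return {"InstanceType": "m1.large", "ImageId": req["image"]}

    def to_provider_b(req):
        # ...and as provider B's nested JSON document. Without a common
        # standard, every pair of clouds needs a translator like this.
        return {"server": {"vcpus": req["cpus"],
                           "ram_mb": req["memory_mb"],
                           "imageRef": req["image"]}}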

All of this assumes that, in the universe of cloud computing systems out there, we are able to find another cloud which is ready, willing, and able to accept a VM mobility transaction with us; that we are able to have a reliable conversation with that cloud, perhaps exchanging whatever subscription or usage related information might be needed as a precursor to the transaction; and finally that we have a reliable transport on which to move the VM itself.
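A toy sketch of the discovery step, with every name invented for illustration, might look like the following; the negotiation and transport steps would build on whatever it returns.

    from dataclasses import dataclass

    @dataclass
    class Advertisement:
        cloud: str
        price_per_hour: float
        accepts_vm_mobility: bool

    # Stand-in for a real naming/discovery service.
    DIRECTORY = [
        Advertisement("cloud-a.example.net", 0.12, True),
        Advertisement("cloud-b.example.net", 0.08, False),
    ]

    def find_willing_cloud(max_price):
        # Step 1: find a cloud ready, willing, and able.
        for ad in DIRECTORY:
            if ad.accepts_vm_mobility and ad.price_per_hour <= max_price:
                return ad
        return None

    ad = find_willing_cloud(max_price=0.15)
    if ad:
        # Steps 2 and 3 would follow: exchange subscription and usage
        # terms, then open a reliable transport to move the VM itself.
        print("negotiate with", ad.cloud)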

Storage Interoperability and Federation

Now let us consider an interoperability use case involving an abstract metaphor. In this case, we are running script or code, in our own datacenter or in a cloud, which utilizes cloud-based storage functions. In cloud computing, storage is not like disk access; there are several parameters around the storage which are inherent to the system, and one decides whether or not they meet one's needs. For example, object storage is typically replicated to several places in the cloud; in AWS and in Azure it is replicated to three places. The storage API is not explicit about this, but implicitly we know that a write will return as successful when one replicate of the storage has been affected, and then a “lazy” internal algorithm is used to replicate the object to two additional places. If one or two of the object replicates are lost, the cloud platform will replicate it to another place or two, such that it is again in three places. A user has some control over where the storage is physically; for example, one can restrict the storage to replicate entirely within North America or within Europe.
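That geographic restriction is one of the few knobs today's APIs do expose. As a minimal sketch, assuming boto3 and a hypothetical bucket name, an S3 bucket can be pinned to a European region; replication within that geography then happens on the platform's own schedule.

    import boto3

    s3 = boto3.client("s3", region_name="eu-west-1")

    # The bucket name is hypothetical; the location constraint pins all
    # replicates of its objects to a European region.
    s3.create_bucket(
        Bucket="example-eu-bucket",
        CreateBucketConfiguration={"LocationConstraint": "eu-west-1"},
    )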

There is no ability to vary these parameters; that is simply what the storage system provides. One might have expected several APIs, each with a different underlying characteristic, where one could always use a “better” service implementation than the API demanded. To this end, we do envision other providers implementing, say, five replicates, or a deterministic replication algorithm, or a replicated (DR) write which does not return until and unless n replicates are persisted. One can create a large number of variations around “quality of storage” for the cloud.
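To suggest what such a “quality of storage” interface might look like, here is a hypothetical sketch; no current cloud exposes exactly this, and the toy store below only mimics the synchronous, n-replicate write described above.

    from dataclasses import dataclass

    @dataclass
    class StoragePolicy:
        replicas: int = 3
        synchronous: bool = False    # True: block until all replicates persist
        geography: str = "any"       # e.g. "north-america", "europe"

    class ToyStore:
        """In-memory stand-in for a cloud object store."""
        def __init__(self):
            self.objects = {}

        def put(self, key, data, policy):
            if policy.synchronous:
                # A DR-style write: all n replicates persist before returning.
                copies = [data] * policy.replicas
            else:
                # A lazy write: one replicate now, the rest trickle in later.
                copies = [data]
            self.objects[key] = copies

    store = ToyStore()
    store.put("report.pdf", b"payload", StoragePolicy(replicas=5, synchronous=True))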

In the interoperability scenario, suppose AWS is running short of storage, or wants to provide a geographic storage location for an AWS customer where AWS does not have a datacenter; it would then be sub-contracting the storage to another service provider. In either of these scenarios, AWS would need to find another cloud which was ready, willing, and able to accept a storage subcontracting transaction with it. AWS would have to be able to have a reliable conversation with that cloud, again exchanging whatever subscription or usage related information might be needed as a precursor to the transaction, and finally have a reliable transport on which to move the storage itself.

Although the addressing issues are not as severe in this case, where an abstract metaphor is used, the naming, discovery, and conversation setup challenges all remain.