Virtualization The Future: How Does vSphere Replication Work?

Tuesday 29 April 2014


With SRM 5 we introduced a new alternative for replication of virtual machines called "vSphere Replication" or "VR" for short. There has been some excellent conversation about VR generated by presentations at VMworld and the release of SRM on the 15th of September.

We've also received a lot of questions about the details of VR, and thought this would be an excellent venue and opportunity to give you some more detail on how it actually works behind the scenes to protect your VMs.

What is vSphere Replication?

It is a replication engine for virtual machine disk files. It tracks changes to VMs and ensures that changed blocks are replicated to a remote site within a specified recovery point objective (RPO).

How does VR work?

Fundamentally, VR is designed to continually track I/O destined for a VMDK file and keep track of which blocks are being changed. There is a user-configured Recovery Point Objective (RPO) for every VMDK, and the job of VR is to ensure that the changed blocks are copied across the network to the remote site at a rate sufficient to keep the replica in sync with the primary in accordance with the configured RPO.

If VR is successful in doing so, the replica at the remote site can be recovered as part of a recovery plan within SRM.

How do you configure VR?

This is not very difficult at all! There are a few places you can configure VR for a VM or set of VMs, either from within SRM or even by directly editing the properties of the VM from within the vSphere Client.

For example, you can right-click on a VM and select "vSphere Replication" as one of the popup menu items:

Once you select VR properties you can choose an RPO, a source VMDK, a target folder, or even a pre-seeded copy of the VM at the remote site to act as the replica!

How VR determines what is different and what needs to be replicated

There are two forms of synchronization that VR will use to keep systems synchronized. When VR is first configured for a virtual machine you can choose a primary disk file or set of disk files and a remote target location to hold the replica. This can be an empty folder, or it can be a copy of the VMDK that has the same UUID as the primary protected system.

The first thing VR will do when synchronizing is read the entire disk at both the protected and recovery sites and generate a checksum for each block. It then compares the checksum maps of the two disk files to build the initial bundle of blocks that must be replicated on the first pass to bring the block checksums into alignment. This happens on port 31031.

This is called a "full sync" and happens only rarely: usually just on the first pass when the VM is configured for VR, but occasionally in other situations, such as when recovering from a crash.
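The checksum comparison behind a full sync can be illustrated with a small sketch. This is only an illustration of the idea, not VR's actual implementation: the block size, the choice of SHA-256, and the function names block_checksums and blocks_to_replicate are all assumptions made for the example.

```python
import hashlib
from typing import List

BLOCK_SIZE = 4096  # hypothetical block size, chosen only for illustration


def block_checksums(disk: bytes, block_size: int = BLOCK_SIZE) -> List[str]:
    """Return one checksum per fixed-size block of a disk image."""
    return [
        hashlib.sha256(disk[offset:offset + block_size]).hexdigest()
        for offset in range(0, len(disk), block_size)
    ]


def blocks_to_replicate(primary: bytes, replica: bytes,
                        block_size: int = BLOCK_SIZE) -> List[int]:
    """Compare the two checksum maps and list the block indexes that differ."""
    primary_sums = block_checksums(primary, block_size)
    replica_sums = block_checksums(replica, block_size)
    differing = []
    for index, checksum in enumerate(primary_sums):
        # A block that does not exist on the replica yet (for example an
        # empty target folder) always has to be copied.
        if index >= len(replica_sums) or replica_sums[index] != checksum:
            differing.append(index)
    return differing
```

With a pre-seeded replica that already matches most of the primary, only the differing block indexes end up in the initial bundle, which is why seeding can shorten the first pass.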

Ongoing replication uses an agent and a vSCSI filter that reside within the kernel of an ESXi 5.0 host. Together they track I/O and keep an in-memory bitmap of changed blocks, backed by a "persistent state file" (.psf) in the home directory of the VM. The .psf file contains only pointers to the changed blocks. When it is time to replicate the changes to the remote site, as dictated by the RPO set for the VMDK, a bundle is created with the changed blocks and sent to the remote site to be committed to disk. This replication happens on port 44046.
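To make the changed-block bitmap more concrete, here is a minimal sketch of the idea. It is not the real vSCSI filter: the class name ChangedBlockTracker, the 4096-byte block size, and the drain method are hypothetical, and the persistent state file that backs the bitmap is left out entirely.

```python
from typing import List

BLOCK_SIZE = 4096  # hypothetical block size, for illustration only


class ChangedBlockTracker:
    """In-memory bitmap with one bit per disk block (hypothetical helper)."""

    def __init__(self, disk_blocks: int) -> None:
        self.disk_blocks = disk_blocks
        self.bitmap = bytearray((disk_blocks + 7) // 8)

    def record_write(self, offset: int, length: int) -> None:
        """Mark every block touched by a write of `length` bytes at `offset`."""
        if length <= 0:
            return
        first = offset // BLOCK_SIZE
        last = (offset + length - 1) // BLOCK_SIZE
        if offset < 0 or last >= self.disk_blocks:
            raise ValueError("write falls outside the disk")
        for block in range(first, last + 1):
            self.bitmap[block // 8] |= 1 << (block % 8)

    def drain(self) -> List[int]:
        """Return the changed block indexes for the next bundle, then reset."""
        changed = [
            block for block in range(self.disk_blocks)
            if self.bitmap[block // 8] & (1 << (block % 8))
        ]
        self.bitmap = bytearray(len(self.bitmap))
        return changed
```

Note that the tracker only remembers which blocks changed, not what was written, so a block that is overwritten many times between two replication passes is still shipped only once.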

How is the schedule for replication determined? Can I create my own schedules?

You cannot create your own schedules for replication. VR sets the schedule itself, because there is a lot of intelligence built into the algorithm it uses to ship blocks.

Using the RPO as the outside window for replication, VR dynamically computes how aggressively it needs to send data.

If, for example, the RPO is set to 1 hour and the historical change rate to blocks is very small, VR does not need to act aggressively. We take into account the last 15 transfers to the remote site to calculate, on average, how much data is likely to be shipped in the current bundle. If the data took, for example, 10 minutes on average to ship and commit, we estimate that we will not need more than 10 minutes for the next set of data, and can schedule the next transfer to start at any point before the 50-minute mark and still stay within the 1 hour RPO.

If, however, the RPO is set to 1 hour and we historically are taking 35 minutes to ship and commit, then we know that we will eventually exceed our RPO. Each transfer runs 5 minutes beyond the half-way point of the window, and that overrun will eventually catch up with the RPO even if we start shipping blocks immediately on completion of the previous bundle!

So VR sets its own schedule for shipping changed block bundles based on factors such as how large the transfer is, how much change is taking place, and how long past transfers have taken, and it will adjust the schedule or raise alerts accordingly.
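The arithmetic in the two examples above can be written out as a short sketch. This is only a reading of the description in this post, not VR's real scheduling algorithm: the class TransferScheduler and its method names are hypothetical, and the "more than half the RPO" test is simply the rule of thumb implied by the 35-minute example.

```python
from collections import deque


class TransferScheduler:
    """Estimate the next transfer from recent history (hypothetical sketch)."""

    def __init__(self, rpo_minutes: float, history_size: int = 15) -> None:
        self.rpo_minutes = rpo_minutes
        # Only the most recent `history_size` transfer durations are kept.
        self.durations = deque(maxlen=history_size)

    def record_transfer(self, minutes: float) -> None:
        """Remember how long a completed ship-and-commit took."""
        self.durations.append(minutes)

    def expected_duration(self) -> float:
        """Average duration of the remembered transfers, in minutes."""
        if not self.durations:
            return 0.0
        return sum(self.durations) / len(self.durations)

    def latest_start(self) -> float:
        """Minutes into the RPO window by which the next transfer must begin."""
        return max(0.0, self.rpo_minutes - self.expected_duration())

    def rpo_at_risk(self) -> bool:
        """True when transfers take more than half the RPO on average."""
        return self.expected_duration() > self.rpo_minutes / 2
```

With a 60-minute RPO and a history of 10-minute transfers, latest_start() gives 50 minutes and rpo_at_risk() is false. With a history of 35-minute transfers, rpo_at_risk() is true, which is the situation where an alert would be appropriate.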

How data gets transferred and how it gets written

Because the VR agent works with a passive filter that tracks changes, all we worry about is changed blocks, not the format of the disk or file system or anything.

At the recovery site you will need to deploy a virtual appliance called the "vSphere Replication Server" (VRS) that acts as the target for the VR agent. The VRS receives the blocks from the agents at the protected site and waits until the bundle is completely received and consistent. It then passes the bundle to the ESXi host's network file copy (NFC) service, which writes the blocks to the target storage we specified when we configured VR for the protected VM.

The result is that the entire process is abstracted from the storage until the blocks are given to the NFC, which means we can mix and match storage: we can have thick or thin provisioned VMDKs at either site, and use any type of storage we choose at either site. The NFC of the host the VRS interacts with just writes to a VMDK. In essence, the VRS receives the block transfer and the NFC writes it out.

It's important to note that the traffic from the VR agent is sent across the vmkernel management NICs of your ESXi hosts, so expect to see a lot more traffic on those switches.
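The "receive the whole bundle first, then write" behaviour can be sketched as follows. This is an illustration under stated assumptions, not how the VRS is actually built: the class BundleReceiver is hypothetical, the bundle is assumed to announce how many blocks it contains, and the write_block callback merely stands in for the host's NFC service.

```python
from typing import Callable, Dict


class BundleReceiver:
    """Buffer a bundle until it is complete, then write it out (hypothetical)."""

    def __init__(self, expected_blocks: int,
                 write_block: Callable[[int, bytes], None]) -> None:
        self.expected_blocks = expected_blocks
        self.write_block = write_block  # stand-in for the NFC write path
        self.pending: Dict[int, bytes] = {}

    def receive(self, index: int, data: bytes) -> bool:
        """Store one block; commit everything once the bundle is complete."""
        self.pending[index] = data
        if len(self.pending) < self.expected_blocks:
            return False  # bundle incomplete, replica disk left untouched
        for block_index in sorted(self.pending):
            self.write_block(block_index, self.pending[block_index])
        self.pending.clear()
        return True
```

The design point is that the replica is never touched by a partially received bundle, so an interrupted transfer does not leave the replica disk in an inconsistent state.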

Hopefully this gives you a little more insight into how vSphere Replication works. If you've got questions or want more detail, please leave a note in the comments! If you think vSphere Replication is great or 'not' please let me know that as well, and let's talk about why you think what you do.

We've got high hopes that VR will give our smaller customers a new capability to approach DR, and our larger customers the ability to tier out their replication offerings.

Info taken from blogs.vmware.com

Thanks to Ken Werneburg
