VMware vCenter Site Recovery Manager Release Notes
VMware vCenter Site Recovery Manager 5.0 | 14 SEP 2011 | Build 474459
Last updated: 09 MAR 2016
Check for additions and updates to these release notes.
What's in the Release Notes
VMware vCenter Site Recovery Manager 5.0 enhances your ability to build, manage and execute reliable disaster recovery plans for your virtual environment. With the release of version 5.0, VMware has expanded the capabilities of Site Recovery Manager to provide unprecedented levels of protection. New use cases have been made possible through the addition of the following capabilities:
vSphere Replication. When used in conjunction with VMware vSphere 5.0, Site Recovery Manager 5.0 can use a vSphere 5.0 host to replicate powered-on virtual machines over the network to another vSphere 5.0 host, without requiring storage array-based replication. As virtual machines change with use, the changed blocks are replicated to a shadow copy of the virtual machine at the recovery site, in accordance with a Recovery Point Objective (RPO) set as a property of the virtual machine itself.
Planned Migration. A new workflow designed to migrate virtual machines while minimizing the risk of data loss. If an error is encountered, planned migration stops the workflow from continuing, giving you an opportunity to fix the problem and ensuring that systems are properly quiescent and that all data changes have been completely replicated.
Automated Re-Protection. Re-protection is a new extension to recovery plans for use only with array-based replication. Automated re-protect enables the environment at the recovery site to establish replication and protection of the environment back to the original protected site with a single click.
Automated Failback. Automated failback returns the entire environment to the originally protected primary site. Failback can occur only after re-protection has ensured that data replication and synchronization have been established to the original primary site. Failback runs the same workflow that was used to migrate the environment to the protected site, ensuring that the critical systems encapsulated by the recovery plan are returned to their original environment. Like re-protection, automated failback is available only for virtual machines protected by array-based replication.
Enhanced Dependency Definition. This release increases the number of priority groups to five and adds the ability to set virtual machine dependencies within a priority group. Virtual machine dependencies can be defined to ensure that required systems and services are available before dependent virtual machines are powered on, enabling highly organized workflow control.
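To make the ordering rules concrete, here is a minimal Python sketch of how five priority groups and intra-group dependencies combine into a power-on sequence. It is an illustration only, not SRM's scheduler: the function name `power_on_order`, its input shapes, and the choice to ignore dependencies that cross priority groups are assumptions. It uses the standard-library `graphlib` module (Python 3.9+) for the topological sort.

```python
from graphlib import TopologicalSorter

MAX_PRIORITY_GROUPS = 5  # SRM 5.0 provides five priority groups


def power_on_order(
    priority_groups: dict[int, list[str]],
    dependencies: dict[str, set[str]],
) -> list[str]:
    """Hypothetical helper: priority 1 powers on first; dependencies are honored within a group."""
    order: list[str] = []
    for priority in sorted(priority_groups):
        if not 1 <= priority <= MAX_PRIORITY_GROUPS:
            raise ValueError(f"priority {priority} is outside 1..{MAX_PRIORITY_GROUPS}")
        members = priority_groups[priority]
        sorter = TopologicalSorter()
        for vm in members:
            # Assumption: only dependencies on VMs in the same priority group are considered.
            sorter.add(vm, *(dependencies.get(vm, set()) & set(members)))
        # static_order() raises graphlib.CycleError if the dependencies form a cycle.
        order.extend(sorter.static_order())
    return order
```

For example, `power_on_order({1: ["dns"], 2: ["web", "app", "db"]}, {"app": {"db"}, "web": {"app"}})` returns `["dns", "db", "app", "web"]`: the priority 1 group finishes first, and within priority 2 each virtual machine waits for the one it depends on.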
VMware vCenter Site Recovery Manager 5.0 is available in the following languages:
- Simplified Chinese
SRM Compatibility Matrix
For the current interoperability and product compatibility matrix, please see the Compatibility Matrixes for VMware vCenter Site Recovery Manager.
Compatible Storage Arrays and Storage Replication Adapters
For the current list of supported compatible storage arrays and SRAs, please see the Site Recovery Manager Storage Partner Compatibility Matrix.
VMware VSA Support
Virtual machines that reside on the vSphere Storage Appliance (VSA) can be protected by SRM 5.0 using vSphere Replication (VR). VSA does not require a Storage Replication Adapter (SRA) to work with SRM 5.0.
Installation and Upgrade Notes
For administration, installation, and upgrade documentation, please see the Site Recovery Manager Administration Guide.
For an evaluation guide to assist with a technical walkthrough of major features and capabilities of Site Recovery Manager 5.0, please see the VMware vCenter Site Recovery Manager Resources for Business Continuity.
Upgrading from a Previous Release:
SRM 5.0 can be upgraded in place from SRM 4.1 and 4.1.1 only. For the supported upgrade paths of SRM 5.0 update releases, see the release notes for those update releases. An in-place upgrade is recommended over a fresh installation because it preserves all history reports, recovery plans, protection groups, and recovery plan customizations.
SRM 5.0 can run with previous versions of vSphere (4.0, 4.1) and Virtual Infrastructure (3.5) only if using array-based replication. If vSphere Replication is to be used (either alone or in conjunction with array-based replication) then vSphere hosts must be upgraded to version 5.0 as part of the upgrade process.
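The two version rules above can be expressed as a pre-upgrade check. This is a hypothetical helper, not a VMware tool: `upgrade_blockers` and its inputs (a source SRM version string and a host-to-version map that you would gather yourself) are assumptions for illustration, and the check covers only the rules stated in this section.

```python
SUPPORTED_UPGRADE_SOURCES = {"4.1", "4.1.1"}  # in-place upgrade sources for SRM 5.0


def upgrade_blockers(
    srm_version: str,
    host_versions: dict[str, str],
    use_vsphere_replication: bool,
) -> list[str]:
    """Hypothetical check: list reasons an in-place SRM 5.0 upgrade plan breaks the rules above."""
    blockers: list[str] = []
    if srm_version not in SUPPORTED_UPGRADE_SOURCES:
        blockers.append(f"SRM {srm_version} cannot be upgraded in place to SRM 5.0")
    if use_vsphere_replication:
        # vSphere Replication requires every host to run version 5.0.
        old_hosts = sorted(
            host for host, version in host_versions.items()
            if version.split(".")[:2] != ["5", "0"]
        )
        blockers.extend(f"host {host} must be upgraded to ESXi 5.0" for host in old_hosts)
    return blockers
```

With array-based replication only, older hosts (4.0, 4.1, 3.5) do not block the upgrade; enabling vSphere Replication turns each pre-5.0 host into a blocker.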
The high-level upgrade steps for SRM 5.0 are:
- Uninstall the SRM plugin from the vSphere Client.
- Stop and upgrade the protected site vCenter Server service to version 5.0.
- If you intend to use vSphere Replication, you must upgrade vSphere hosts to ESXi 5.0 at this point.
- Stop and upgrade the protected site SRM service to version 5.0.
- Upgrade or install a Storage Replication Adapter compatible with SRM 5.0. The SRA must be certified by VMware for use with SRM 5.0.
- Repeat this process for the recovery site after ensuring base vSphere and vCenter Server functionality is working correctly. SRM will not function until both sites have been upgraded completely.
- Download and install the SRM plugin to the vSphere Client.
- Launch SRM and ensure the SRA is correctly refreshed.
- Pair the sites, array managers, and devices. Enable array pairs.
- Execute the srm-migration command line utility to import previous recovery plans, protection groups, history reports, scripts, and IP customization.
For full documentation on the upgrade process, please see pp. 33-37 of the Site Recovery Manager Administration Guide.
SRM 5.0 includes 33 new API operations, many of which are site-specific and allow for automation of processes unique to each site. New API operations in SRM 5.0 include:
Note that VMware will publish an updated, comprehensive guide to using the SOAP-based SRM API at VMware vCenter Site Recovery Manager API.
Open Source Components
The copyright statements and licenses applicable to the open source software components distributed in vCenter Site Recovery Manager 5.0 are available at Download VMware vCenter Site Recovery Manager for IT Disaster Recovery. You can also download the source files for any GPL, LGPL, or other similar licenses that require the source code or modifications to source code to be made available for the most recent generally available release of vCenter Site Recovery Manager.
Caveats and Limitations
Interoperability with Storage vMotion and Storage DRS
Due to some specific and limited cases where recoverability can be compromised during storage movement, Site Recovery Manager 5.0 is not supported for use with Storage vMotion (SVmotion) and is not supported for use with the Storage Distributed Resource Scheduler (SDRS) including the use of datastore clusters.
Interoperability with vCloud Director
Site Recovery Manager 5.0 offers limited support for vCloud Director environments. Using SRM to protect virtual machines within vCloud resource pools (virtual machines deployed to an Organization) is not supported. Using SRM to protect the management structure of vCD is supported. For information about how to use SRM to protect the vCD Server instances, vCenter Server instances, and databases that provide the management infrastructure for vCloud Director, see VMware vCloud Director Infrastructure Resiliency Case Study.
Re-protect and automated failback not supported with vSphere Replication
Re-protect and automated failback are supported only with array-replicated virtual machines. Virtual machines configured with vSphere Replication cannot be failed back automatically to the original site using existing recovery plans.
SRM conversion from per-CPU to per-VM license count is incorrect
Some customers who purchased SRM 1.x and SRM 4.0 may still be using per-CPU allocated licenses. There is a known issue in the conversion process from these older per-CPU licenses to per-VM SRM 5.0 licenses: the formula for counting how many CPU licenses are used is too lenient, so the conversion may grant too many per-VM licenses for SRM 5.0. This behavior is incorrect and will be fixed in an SRM patch. Check your converted license count carefully to ensure that you comply with the terms and conditions of your license agreement and have not exceeded the number of protected VMs for which you are licensed.
Array Based Replication (ABR) & vSphere Replication (VR) Scalability Limits
The scalability limits listed in the Site Recovery Manager Administrator's Guide are incorrect, but will be updated soon. The correct scalability limits are as follows:
Array Based Replication (ABR) Scalability Limits
|Protected VMs per protection group
|Protection groups per recovery plan
vSphere Replication (VR) Scalability Limits
|Protected VMs per single protection group
|Protection groups per single recovery plan
The following known issues have been discovered through rigorous testing and will help you understand some behavior you might encounter in this release.
- NEW A vulnerability in the glibc library allows remote code execution
Your vSphere Replication appliance might be affected by a vulnerability in the glibc library that allows remote code execution.
Workaround: For more information about how to work around the issue, see http://kb.vmware.com/kb/2144289
- SRM May Encounter Errors Mounting Datastores During Recoveries
During a test failover or actual failover, SRM waits for recovered datastores to become available. After datastores become available, SRM attempts to mount any datastores that are not mounted. In rare instances, these datastores are automatically mounted before SRM can mount them. If this occurs during a test failover, the failover does not complete. If this occurs during an actual failover, the failover completes with an error. To resolve this issue, retry the failover.
- Virtual Machines Replicated to Sub-Folders Create Unexpected Folders
When configuring vSphere Replication (VR) for a virtual machine, it is possible to configure the virtual machine to be replicated to any directory on a datastore at the recovery site. If the selected directory is at the root level, replication completes as expected. If the selected directory is below the root level, the virtual machine is not replicated to the specified directory. Instead, a new directory is created with a name composed of the different folder and sub-folder names. For example, /path1/path2/path3 becomes /path1path2path3. The virtual machine is replicated, but not to the intended location. To avoid this issue, replicate virtual machines to folders at the root level.
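The following Python sketch models the folder behavior described above so that a target path can be checked before replication is configured. The helper names are hypothetical, and the model assumes datastore paths are written as POSIX-style paths from the datastore root, as in the /path1/path2/path3 example.

```python
from pathlib import PurePosixPath


def observed_replica_dir(requested: str) -> str:
    """Model of the issue: nested folder names are joined into one root-level folder."""
    path = PurePosixPath(requested)
    if not path.is_absolute():
        raise ValueError("expected a path starting at the datastore root, e.g. /path1")
    # parts[0] is "/" for an absolute path; the remaining names are concatenated.
    return "/" + "".join(path.parts[1:])


def is_root_level(requested: str) -> bool:
    """True when the target folder sits directly under the datastore root (safe to use)."""
    return len(PurePosixPath(requested).parts) <= 2
```

With this model, `observed_replica_dir("/path1/path2/path3")` returns `"/path1path2path3"`, and only targets for which `is_root_level` is true end up where you asked.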
- Stopping the SRM Service While Mapping Resources Causes vSphere Client Failure
If the SRM service is stopped while a user is configuring resource mappings using the vSphere Client, the client should present a connection error message. Instead, the client enters an unresponsive state. To resolve this issue, terminate the vSphere Client process using Windows Task Manager. To avoid this problem, ensure the SRM service is running while configuring resource mappings.
- Users With Insufficient Privileges May Create Misleading Replication Conditions
The vSphere Replication Wizard can be used to configure replication for virtual machines. Use of this wizard is restricted to users with the appropriate permissions. If a user with insufficient permissions attempts to reconfigure replication using the wizard, the following error appears at the end of the wizard, indicating that the operation is not permitted:
Unable to determine the controller type for the disk '[Disk Name] VMDK name.' Permission to perform this operation was denied.
Replication for the virtual machine continues as expected, despite the vSphere Client indicating problems. To resolve this issue, a user with sufficient permissions can re-run the operation to resolve the messages in the client.
- Pairing or Breaking Pairing of vSphere Replication Management Servers (VRMS) Fails With LockingFailedException
In rare cases, when pairing or breaking pairing between VRMS servers, if the vSphere service is being stopped at the same time, the operation fails with the following exception:
LockingFailedException: Failed write-locking object: com.vmware.hms.db.entities.HmsLocalServerEntity:VRM Server GUID
To resolve this issue, restart VRMS.
- Temporary Loss of vCenter Server Connections May Create Recovery Problems for Virtual Machines with Raw Disk Mappings
If the connection to the vCenter Server is lost during a failover, one of the following may occur:
- If the vCenter Server remains unavailable, the failover fails. To resolve this issue, re-establish the connection to the vCenter Server and re-run the recovery.
- In rare cases, the vCenter Server becomes available again and the virtual machine is recovered. In such a case, if the virtual machine has raw disk mappings (RDMs), the RDMs may not be mapped properly. As a result of the failure to properly map RDMs, it may not be possible to power on the virtual machine or errors related to the guest operating system or applications running on the guest operating system may occur.
- If this is a test failover, complete a cleanup operation and run the test again.
- If this is an actual failover, you must manually attach the correct RDM to the recovered virtual machine.
Refer to the vSphere documentation about editing VM settings for more information on adding raw disk mappings.
- Cancellation of Recovery Plan not Completed
When a recovery plan is executed, an attempt is made to synchronize virtual machines. It is possible to cancel the recovery plan, but attempts to cancel the recovery plan execution do not complete until the synchronization either completes or expires. The default expiration is 60 minutes. The following options can be used to complete cancellation of the recovery plan:
- Pause vSphere Replication, causing synchronization to fail. After failover enters an error state, use the vSphere Client to restart vSphere Replication in the vSphere replication tab. After replication is restarted, the recovery plan can be run again, if desired.
- Wait for synchronization to complete or time out. This may take considerable time, but it does eventually finish. After synchronization finishes or expires, cancellation of the recovery plan continues.
- Valid Certificates Produce Warnings
When uploading and installing certificates to the vSphere Replication Management Server (VRMS), the following error occurs:
The certificate installed with warnings. Remote VRM systems with the 'Accept only SSL certificate signed by a trusted CA'
option enabled may be unable to connect to this site for the following reason:
The certificate was not issued for use with the given hostname: VRM hostname
This error can be ignored, or you can avoid this error by using a supported browser other than Internet Explorer.
- vSphere Replication Management Servers Fail to Pair Due to Certificate Hostname Mismatch
vSphere Replication Management Servers (VRMS) fail to pair due to issues with certificates when both of the following conditions are met:
- The server's certificate has a hostname value that does not match the VRMS address.
Hostname values might include subject name or alternative subject names. The VRMS address is specified in the Virtual Appliance Management Infrastructure (VAMI) during application configuration.
- The peer VRMS server has a valid chain of trust to a certificate authority (CA) for the certificate.
This happens if the certificate was issued by a trusted CA such as Verisign or Go Daddy or by an unknown CA whose certificate was added to the peer VRMS server's hms-truststore.jks file, thereby establishing trust.
To resolve this issue, use one of the following techniques:
- Use the VAMI's Generate and Install functionality to create and use a new certificate.
- Use the VAMI to upload a certificate with hostname values that match the VRMS server.
- Remove the self-signed certificate authority's certificate from the peer VRMS's hms-truststore.jks file.
- Per-Plan Option to Ignore VMware Tools Heartbeat not Available
Recovery plans typically wait for virtual machines to issue VMware Tools heartbeats. In previous versions of SRM, it was possible to disable this check on a per-recovery plan basis. This recovery plan-level option is no longer available. You can now disable waiting for VMware Tools heartbeats for the entire system using advanced settings, or for individual virtual machines.
- Installing Multiple vSphere Client SRM Plug-Ins Creates Problems
Only one version of the vSphere Client SRM plug-in should be installed at any given time. Users are not prevented from installing the client plug-in for version 4.0.x or 4.1.0 over the plug-in for SRM 5.0. Before installing any new versions of the client plug-in, uninstall existing SRM plug-ins. If multiple vSphere Client plug-ins have been installed, uninstall all client plug-ins, download the plug-in from the server to be managed, and reinstall.
- Protection Groups Report Errors Due to Unsynchronized Virtual Machines
When a virtual machine is protected by vSphere Replication, that virtual machine is replicated to the recovery site. Replication operations have timeout values that cancel operations that appear not to be making progress. If a virtual machine is not replicated before the timeout, the protection group it belongs to enters an error state. In some cases, virtual machines cannot be replicated before the timeout even though the process is proceeding as expected. For example, when a large virtual machine is created or many changes are made to a virtual machine, the required replication may take longer than the timeout allows. To resolve this issue:
- Increase the recovery point objective until replication completes.
- Increase the timeout in the SRM configuration file.
For example, to change the replication timeout to two hours or 7200 seconds, the configuration file would be modified as follows:
- SRM Service Stops After Waiting for Database Response
SRM waits for responses to database queries. If network bandwidth is insufficient or the database is overloaded, a response may not be returned before a timeout occurs. In such cases, one of the following errors is entered into the SRM log:
- Panic: Timed out while waiting for 'DrReplicationProviderInterface' (300 seconds)
- Panic: Timed out while waiting for 'DrReplicationRecoveryInterface' (300 seconds)
- Panic: Timed out while waiting for 'DrStorageManager' (300 seconds)
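If you want to confirm that a server stopped for this reason, the three messages above can be matched in the SRM log. A minimal Python sketch follows; the function name is hypothetical, and because this note does not give the log file location or line prefix format, the pattern searches anywhere in each line.

```python
import re

# Matches the database-timeout panics quoted above, for example:
# Panic: Timed out while waiting for 'DrStorageManager' (300 seconds)
PANIC_RE = re.compile(r"Panic: Timed out while waiting for '([^']+)' \((\d+) seconds\)")


def find_db_timeouts(log_lines):
    """Return (component, timeout_seconds) for every matching panic line."""
    hits = []
    for line in log_lines:
        match = PANIC_RE.search(line)
        if match:
            hits.append((match.group(1), int(match.group(2))))
    return hits
```

For example, `find_db_timeouts(open(log_path, encoding="utf-8"))`, where `log_path` points at an SRM log file on your server, returns pairs such as `("DrStorageManager", 300)`. A timeout value that matches the default suggests the wait limit, rather than a crash, stopped the service.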
To avoid this issue, you can increase the amount of time that SRM waits for responses to queries. For example, to change the timeout to fifteen minutes or 900 seconds, the vmware-dr.xml configuration file would be modified as follows:
- Licensing For Evaluation Mode SRM Displays Incorrect Information
Licenses can be managed using the Manage vSphere Licenses wizard. When this wizard is used in an SRM installation running in Evaluation Mode, the information displayed about the number of licenses may be incorrect. In evaluation mode, the vSphere Client may indicate that there are unlimited licenses available. vCenter Server tracks the number of licenses available and used, and restricts licensing appropriately.
- vSphere Client Displays Outdated Virtual Machine Recovery Status
As a recovery plan executes, the virtual machine recovery status changes are not automatically reflected in the vSphere Client. To resolve this issue, click Refresh.
- Users Not Notified About Rejected Operations
If a user attempts to remove a virtual machine from a protection group, but does not have sufficient privileges, the operation fails, but the user is not notified of this failure. The system functions as expected, but not seeing an error message may lead some users to believe the task completed.
- Reprotect May Allow Users to Complete Unpermitted Tasks
When users run a reprotect operation, they may be able to complete tasks that exceed their granted privileges. Such tasks include creating virtual machines on the reprotect destination site and protecting new devices on the reprotect source site. This may occur if a reprotect is initiated after other users have deleted virtual machines on the old production site or modified the virtual machines to which they were failed over.
- User Interface Changes Unexpectedly During Planned Migrations
During planned migrations, virtual machines are deactivated and synchronized to the recovery site. During this process, protection groups and their virtual machines are temporarily displayed in the Partially Recovered state. The vSphere Client enables buttons such as Edit Protection Group and Remove Protection Group, as would be appropriate for these states, but operations triggered by these buttons do not succeed. These varying states and button availability may be confusing to users. These changes are normal, should be expected during any planned migration, and can be ignored.
- Recovery Histories May Be Lost After Upgrade to SRM 5.0
During a server migration upgrade, the installer offers the default site name, rather than the site name currently in use. If you do not replace the default site name with the current site name, recovery histories may be lost. To avoid this issue, change the site name to the name used in your deployment. If you do complete an upgrade with the default site name and the recovery history is lost, you can restore the 4.1 database from backup and then perform the upgrade again, this time providing the correct value for the site name.
- VRM server Does Not Support PKCS#12 Files That Are Not Password Protected
PKCS#12 files are typically protected with a password. VRM server does not support certificates that are not password protected. It is possible to browse for and select a file that is not password protected, but the certificates are not installed. To avoid this issue, use password protected PKCS#12 files.
- License Information Uninformative with Default Site Names
SRM license information is labeled in vCenter using the SRM site name. If the default site name is used and multiple SRM servers are registered in the same vCenter group, the default SRM server naming makes the licensing information indistinguishable. To avoid this issue, specify unique site names during installation.
- Getting Started Pages not Discoverable
The Getting Started pages include links and information that will help you configure SRM and VR. If you have previously closed all Getting Started tabs, you can restore them by selecting the option in the Client Settings dialog under the Edit menu.
- Installing SRM on Machines with non-ASCII Characters Causes Access Problems
If SRM is installed on a machine with a name that includes non-ASCII characters, problems with user access may occur. Because the machine name is used to construct URLs, those URLs that contain non-ASCII characters may not be valid. To avoid this issue, install SRM on machines with names composed entirely of ASCII characters.
- Non-ASCII Passwords Not Accepted For Log In To Virtual Appliance Management Infrastructure (VAMI)
Users can manage the vSphere Replication Management Server (VRMS) using VAMI. Attempts to log in to VAMI with an account whose password contains non-ASCII characters fail, even when correct authentication information is provided. This issue occurs in all cases where non-ASCII passwords are used with VAMI. To avoid this issue, use ASCII passwords or connect using SSH.
- Non-ASCII DNS Suffixes Are Not Set Correctly After Customizing Windows XP And Windows 2003
If you enter a non-ASCII DNS suffix in the DNS tab of the IP Settings section of VM Recovery Properties dialog to customize Windows XP or Windows 2003, the customization is reported as successful but the non-ASCII DNS suffix is not set correctly. To avoid this issue, set the DNS suffix manually in Windows XP and Windows 2003.
- Custom Recovery Steps with Non-ASCII Character Output Cause SRM Servers to Crash
Do not create custom recovery steps that produce non-ASCII output. If you add such steps to a plan and then complete a test or recovery, the SRM server crashes. If a plan with a step that produces non-ASCII output is used, the plan enters a state where it cannot be modified or deleted. Create a new plan whose steps do not produce non-ASCII output, use that recovery plan for your protection groups, and do not use the non-ASCII plan in future tests or recoveries. Restart any servers that have crashed due to the non-ASCII character output.
- Powering Off Virtual Machines During Test Cleanup Fails
During test cleanup, virtual machines are powered off, which can be a storage intensive operation. In rare cases, the power off may take 30 minutes or more, exceeding the 15 minutes that SRM waits for virtual machines to power off. If the 15 minute timeout is exceeded, the message Error - Operation timed out: 900 seconds appears. To resolve this issue, wait for the power off to complete, and then run the cleanup again. You may also manually power off the test virtual machines.
- Running vSphere Clients Unable to Manage vSphere Replication Servers After Installation
The vSphere Replication (VR) Server is managed using the vSphere Client. vSphere Clients enable VR management for installed VR Servers when the SRM plug-in is enabled. When a vSphere Client with the SRM plug-in starts, it finds all available VR Servers and presents them as possible choices to be managed. If VR Servers are configured after a vSphere Client is started, those newly configured servers are not automatically discovered. To ensure all available VR Servers are presented as choices for administering VR, after installing or configuring a VR Server, do one of the following:
- Activate the SRM plug-in.
- Restart the client.
Either action causes the client to refresh the list of available VR Servers.
- Recovery Plan Configuration Interface Displays Invalid Network Options
When configuring a recovery plan, networks must be selected for association with virtual machines. The recovery plan user interface displays DVS uplink port groups as possible selections, even though these are not valid options. Do not select DVS uplink port groups.
- Outdated Replication Status Displayed if Datastore Becomes Unavailable
It is possible that after virtual machine synchronization begins, the target datastore becomes unavailable. In such a case, the group status should display information about this failure, but the status remains unchanged. To identify issues related to datastore unavailability, use the events generated by the target datastore. The following events are generated in such a case:
- Datastore is not accessible for VR Server... Generated immediately after datastore becomes inaccessible
- Virtual machine vSphere Replication RPO is violated... Replica cannot be generated within the specified RPO
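Because the group status is not updated, the RPO-violation condition can be tracked separately. This is a minimal sketch, assuming you can obtain the time of each virtual machine's most recent replica and its configured RPO from a source of your own; the function names and inputs are hypothetical, and the check simply compares replica age against the RPO.

```python
from datetime import datetime, timedelta


def rpo_violated(last_replica_at: datetime, now: datetime, rpo: timedelta) -> bool:
    """True when the most recent replica is older than the RPO allows."""
    return now - last_replica_at > rpo


def vms_violating_rpo(
    last_replica_at: dict[str, datetime],
    rpo_by_vm: dict[str, timedelta],
    now: datetime,
) -> list[str]:
    """Names of VMs whose newest replica is outside their RPO, sorted for stable output."""
    return sorted(
        vm for vm, replicated_at in last_replica_at.items()
        if rpo_violated(replicated_at, now, rpo_by_vm[vm])
    )
```

Since the RPO is a per-virtual machine property, the sketch looks up each virtual machine's own RPO rather than using a single global value.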
- Virtual Machine Status Does not Update After vSphere Replication Management Servers Disconnection
vSphere Replication Management Servers (VRMS) exchange information between sites about virtual machine replication. If the VRMS connection is broken by the Break VRMS connection option, exchange of information about replicated virtual machines is not automatically re-established when you reconfigure the VRMS connection. This may result in incomplete information about virtual machine replication, including which virtual machines are available to add to protection groups. To resolve this issue, restart the SRM service.
- Information Displayed in vSphere Client May Be Outdated
During modification of SRM data, such as the contents of recovery plans, information displayed may not update as SRM information is modified. For example, after adding a non-critical virtual machine to suspend to a recovery plan, the information about this change is not automatically displayed. To resolve these sorts of issues, click Refresh, if available, or navigate away from the page with the outdated information and then navigate back. This causes the contents of the screen to be refreshed, providing the latest information.
- Stopping Datastore Replication for Protected Virtual Machines Produces Incorrect Error Messages
It is possible to protect a virtual machine that has disks on multiple datastores and then subsequently disable replication for one of the datastores. In such a case, the virtual machine's status in the protection group changes to Invalid: Virtual machine 'VM' is no longer protected. Internal error: Cannot create locator for disk'2001'... This information is incorrect. The status should change to Datastore '[datastore name]' is no longer replicated.
- Invalid Certificates Are Rejected During Authentication
SRM supports certificate-based authentication. If an SRM server has an invalid certificate, when the vSphere Client uses the SRM plug-in to connect to the SRM server, errors occur. Invalid certificates can occur if any certificate in the certificate chain is expired or not yet valid. In such a case, the following message appears: The SSL connection to the remote host has terminated. The remote host certificate has these problems: A certificate in the host's chain is not time-valid. To resolve this issue, install a valid certificate on the SRM Server.
- Rapid Deletion and Recreation of Placeholders Creates Problems
If you delete all the placeholder virtual machines from a datastore, unmount the datastore, and then remount the datastore, you may have to wait for 10 minutes before recreating the placeholder virtual machines. Recreating the placeholders within 10 minutes of unmounting the datastore may cause the operation to fail and the fault NoCompatibleHostFound to occur. To avoid or resolve this issue, wait more than 10 minutes before attempting to recreate the placeholder virtual machines.
- Installation Fails Due to Policy Restrictions
Installation of SRM may fail with the error The system administrator has set policies to prevent this installation. This is typically the result of new settings applied either during operating system upgrade or patch application. Resolve this issue using the solution that is relevant to your situation:
- If updates to Windows have resulted in increased restrictions that prevent the installation, complete the installation with an account that has sufficient permissions. Either log in to such an account and complete the installation with it, or use it to grant the required permissions to another account, and then complete the installation with that account.
- If errors are due to the file being rejected due to the digital signature policy, install hotfix 925336.
- Attempts To Use User-Generated Certificate During Modify Operation Fail
Attempts to perform an installation's Modify operation fail when using a certificate with a subject alternative name (SAN) other than the SRM machine address used during the initial installation. The SRM machine address in the certificate's SAN must exactly match the value entered during SRM installation. The value could be the machine name, an FQDN, or an IP address.
If this problem occurs, generate a certificate with a SAN that matches the value used during the initial installation, and then use this certificate during the installation's Modify operation. If a certificate cannot be generated with a matching value, you must reinstall the server preserving database contents.
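Because the comparison is exact, a machine name, an FQDN, and an IP address are not interchangeable. A minimal Python sketch of that check follows; the function name is hypothetical, and it assumes the SAN entries are written in OpenSSL configuration syntax (DNS:name or IP:address).

```python
import ipaddress


def san_matches_install_value(san_entries: list[str], install_value: str) -> bool:
    """Check that the SAN list contains exactly the address entered at installation."""
    try:
        ipaddress.ip_address(install_value)
    except ValueError:
        # Machine names and FQDNs are both DNS entries, but "srm01" != "srm01.example.com".
        expected = f"DNS:{install_value}"
    else:
        expected = f"IP:{install_value}"
    return expected in san_entries
```

For a certificate whose SAN is `["DNS:srm01.example.com", "IP:10.0.0.5"]`, the check passes for an installation that used `srm01.example.com` or `10.0.0.5`, but fails for one that used the short name `srm01`.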
- Recovery of Powered Off Virtual Machines May Fail
Before you can recover a virtual machine, SRM must replicate that virtual machine from the protected site to the recovery site. In certain cases, virtual machines protected by vSphere Replication (VR) are not replicated and, as a result, cannot be recovered. These cases are as follows:
- When replication is first established, there is a brief time before replication is completed. If a disaster occurs before the virtual machine is replicated, it is not possible to recover the virtual machine. There is no way to avoid this issue.
- During a test recovery, only powered on virtual machines are synchronized when the Replicate recent changes to recovery site option is checked. This means virtual machines that have not been powered on since VR protection was established or those that have not completed their initial synchronization cannot be recovered using a test recovery. To avoid this issue, make sure the initial synchronization is complete before running a test recovery.
- For a powered off virtual machine that has not completed its initial synchronization, if Planned Migration is run without the Replicate recent changes to recovery site option checked, that Planned Migration fails because there is no virtual machine data on the recovery site. To avoid this issue always enable Replicate recent changes to recovery site for such virtual machines during Planned Migration.
- Virtual Machine Recovery Fails Due to Disk Configuration Error
It is possible to place the disks and configuration files for a single protected virtual machine on multiple datastores. During recovery, SRM must have access to Raw Disk Mapping (RDM) mapping files and parent disk files. Without this access, SRM cannot determine disk types during recovery and may treat an RDM disk as a non-RDM disk, resulting in a failed reconfiguration. To avoid this issue, ensure that all hosts that can access recovered virtual machine configuration files can also access RDM mapping files and any parent disks, if such disks exist.
- User Supplied Certificates for VRMS Must Include a Subject Alternative Name That Is the FQDN of VRMS
When using a user-provided certificate, the Subject Alternative Name must be the fully qualified domain name (FQDN) of the vSphere Replication Management Server (VRMS). If you use an OpenSSL certificate authority, modify the OpenSSL configuration file to include this information. For example:
subjectAltName = DNS:VRMS.example.com
- SRM Server Fails to Start After Network Reconfiguration
Changing the SRM server name or IP address is not supported. The SRM server machine name and IP address are saved during the initial installation, and these values cannot be changed within the SRM system. To resolve this issue, uninstall SRM, ensuring that the SRM database contents are preserved, and then reinstall SRM using the current machine name, address, and the saved database.
- Pairing Sites Fails Due to Different Certificate Trust Methods
When pairing SRM sites, the error Local and Remote servers are using different certificate trust methods appears. This occurs when the root certificate for the Certificate Authority (CA) signing the certificate is missing on the SRM server. To resolve this issue, install the root certificate for the SRM certificate's signing Certificate Authority using Microsoft Management Console. After installing the certificate, perform an SRM installation Modify operation to provide the user-generated certificate again.
- Operations May Fail When Many vSphere Clients Connect to SRM
SRM uses a large fixed-size pool of connections to communicate with vCenter Servers and with the SRM server at the other site to carry out tasks. The number of available connections decreases as the number of connected vSphere Clients increases. In many cases, this does not affect system performance, but problems may occur during many simultaneous operations, such as protecting or unprotecting a large number of virtual machines, or running or testing a large recovery plan. Operations may time out or fail with various errors. If this issue occurs, close extraneous client connections to the SRM server. Note that connections that are abandoned but not cleanly closed may take up to 30 minutes to time out. To terminate abandoned connections more quickly, restart the SRM server.
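The effect described above can be illustrated with a small conceptual model of a fixed-size pool. This Python sketch is not SRM's implementation: the class name, capacity, and holder IDs are hypothetical, and only the 30-minute abandonment timeout comes from this note. It shows why a cleanly closed client frees its slot immediately, while an abandoned one holds it until the timeout expires.

```python
ABANDON_TIMEOUT_S = 30 * 60  # abandoned connections may take up to 30 minutes to time out


class FixedConnectionPool:
    """Conceptual model of a fixed-size connection pool (not SRM's code)."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.last_used: dict[str, float] = {}  # holder ID -> last activity (seconds)

    def available(self) -> int:
        return self.capacity - len(self.last_used)

    def acquire(self, holder: str, now: float) -> bool:
        """Take (or refresh) a slot; False means the operation would stall or fail."""
        if holder in self.last_used:
            self.last_used[holder] = now
            return True
        if self.available() == 0:
            return False
        self.last_used[holder] = now
        return True

    def release(self, holder: str) -> None:
        """A clean close frees the slot immediately."""
        self.last_used.pop(holder, None)

    def reap_abandoned(self, now: float) -> int:
        """Free slots idle for at least the timeout; return how many were freed."""
        stale = [h for h, t in self.last_used.items() if now - t >= ABANDON_TIMEOUT_S]
        for holder in stale:
            del self.last_used[holder]
        return len(stale)


if __name__ == "__main__":
    pool = FixedConnectionPool(capacity=2)
    pool.acquire("vsphere-client-a", now=0)
    pool.acquire("vsphere-client-b", now=0)        # this client is later abandoned
    print(pool.acquire("recovery-task", now=10))   # False: pool exhausted
    pool.release("vsphere-client-a")               # client closed cleanly
    print(pool.acquire("recovery-task", now=20))   # True
```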
- Incorrect Icon May Be Used for Placeholder Virtual Machines
Under some circumstances, the vSphere Client fails to download the placeholder virtual machine icon. If this happens, a generic solution virtual machine icon is displayed instead. Restarting the vSphere Client may resolve this issue.
- Recovery Fails to Progress After Connection to Protected Site Fails
If the protected site becomes unreachable during a deactivate operation, or during the RemoteOnlineSync or RemotePostReprotectCleanup steps of a reprotect operation, the recovery plan may fail to progress. In such a case, the system waits for the virtual machines or groups that were part of the protected site to complete those interrupted tasks. If this issue occurs during a reprotect operation, you must reconnect the original protected site and then cancel and restart the recovery plan. If this issue occurs during a failover, it is sufficient to cancel and restart the recovery plan.
- Attempts to Reprotect Mixed Mode Recovery Plans Result In Reprotect Incomplete State
Protection groups may be SAN-based or vSphere Replication (VR)-based, and recovery plans may contain both types of protection groups. Reprotect cannot be executed on a plan that includes a mix of SAN-based and VR-based protection groups, but the reprotect operation does not check whether a plan's protection groups are all of one type. If you attempt to reprotect a plan that mixes protection group types, the process fails, leaving the recovery plan in a Reprotect Incomplete state. To avoid this issue, do not create recovery plans that mix protection types. To resolve this issue for an existing plan, remove the VR-based protection groups before reprotecting.
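A plan's eligibility for reprotect can be expressed as a simple check over its protection group types. In this hypothetical Python sketch, each protection group is labeled `"array"` (SAN-based) or `"vr"` (VR-based); the labels and function names are illustrative only. Because re-protection is available only for array-based replication, the sketch treats a plan as eligible only when every group is array-based.

```python
from collections.abc import Iterable

ARRAY_BASED = "array"  # SAN/array-based replication
VR_BASED = "vr"        # vSphere Replication


def is_mixed_mode(group_types: Iterable[str]) -> bool:
    """True if the plan contains both array-based and VR-based groups."""
    return {ARRAY_BASED, VR_BASED} <= set(group_types)


def can_reprotect(group_types: Iterable[str]) -> bool:
    """Reprotect supports only array-based replication, so every group must be array-based."""
    return set(group_types) == {ARRAY_BASED}


if __name__ == "__main__":
    plan = ["array", "vr", "array"]
    if is_mixed_mode(plan):
        print("Mixed-mode plan: remove the VR-based groups before reprotecting.")
    print(can_reprotect(plan))  # False
```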
- VRM Server Generic Error
The following error message may appear: VRM Server generic error. Please check the documentation for any troubleshooting information. The detailed exception is: hostname. This error occurs when the vSphere Replication Management Server cannot resolve the IP address of the host listed in the error message. To resolve this issue, check for and correct any DNS configuration issues. Also check whether non-fully qualified domain names (non-FQDNs) are used where FQDNs are required.
- User Enabled Alarm Generates no Results
Users can enable the Reconfigured protection settings for VM alarm, but this alarm is never triggered. This occurs because users cannot edit protection settings in this release. To track user-initiated changes to virtual machine settings in SRM, administrators must enable the Reconfigured recovery location settings for VM alarm.
- User Attempts to Configure vSphere Replication Fail
Users with administrative privileges may still be unable to configure vSphere Replication. This is often because they have not been granted the VRM Datastore Mapper > View privilege. To resolve this issue, grant this privilege to the appropriate user accounts at the protected site.
- vSphere Replication Management Server (VRMS) Fails to Support Valid ESX Hosts
During vSphere Replication (VR) configuration, when a datastore is selected on a supported version of ESX, the message VR server Server Name has no hosts through which to access destination datastore ... appears. This occurs because the host has been tagged as UNSUPPORTED even though it is a valid selection. During VR Server registration, the event Host is configured for vSphere Replication is not triggered on the host. Hosts are tagged as unsupported because of an error in, or interruption of, communication between VRM Servers and VR Servers during VRM Server registration or when a new host is connected to the vCenter Server inventory. Communication problems typically arise from a temporary loss of connectivity or from the server services being stopped.
To resolve this issue and enable hosts for use with VR, you must complete the following steps:
- Browse the VRM database at the recovery site.
- Remove any records from the HostEntity table with state 4 (UNSUPPORTED), which are not referenced from the HbrHostEntity table (vcMoId value). If there are existing references from the HbrHostEntity table (the vcMoId value), do not remove the record, but update the state value from 4 to 1 (ACTIVE).
- Disconnect the host from the vCenter inventory and then re-connect it.
- Verify that the event Host is configured for vSphere Replication is triggered on the host.
Examples of queries you might use to fix this issue in the VRM Server database are as follows:
- To query all hosts tagged UNSUPPORTED, which are not associated with a VR Server:
select * from HostEntity h where state=4 and not exists (select * from HbrHostEntity where h.vcMoId=HbrHostEntity.vcHost_vcMoId)
- To query all hosts tagged UNSUPPORTED, which are associated with a VR Server:
select * from HostEntity h where state=4 and exists (select * from HbrHostEntity where h.vcMoId=HbrHostEntity.vcHost_vcMoId)
- To clean up UNSUPPORTED records for hosts which are not associated with a VR Server:
delete from HostEntity where state=4 and not exists (select * from HbrHostEntity where HostEntity.vcMoId=HbrHostEntity.vcHost_vcMoId)
- To change state to ACTIVE of hosts tagged UNSUPPORTED, which are associated with a VR Server:
update HostEntity set state=1 where state=4 and exists (select * from HbrHostEntity where HostEntity.vcMoId=HbrHostEntity.vcHost_vcMoId)
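To try the cleanup logic before touching a real database, the following Python sketch runs it against an in-memory SQLite database. The two-table schema is a hypothetical minimum: only the `HostEntity`, `HbrHostEntity`, `state`, `vcMoId`, and `vcHost_vcMoId` names come from this note, and the actual VRM database has more columns and may run on a different database engine. The subquery qualifies `vcMoId` with the outer table name so that the correlation is explicit.

```python
import sqlite3

UNSUPPORTED = 4
ACTIVE = 1

# Correlated subquery: a HostEntity row is "associated with a VR Server"
# when some HbrHostEntity row references its vcMoId.
REFERENCED = ("EXISTS (SELECT * FROM HbrHostEntity "
              "WHERE HostEntity.vcMoId = HbrHostEntity.vcHost_vcMoId)")


def fix_unsupported_hosts(conn: sqlite3.Connection) -> tuple[int, int]:
    """Delete unreferenced UNSUPPORTED hosts and reactivate referenced ones.

    Returns (rows_deleted, rows_reactivated).
    """
    with conn:  # commits on success, rolls back on error
        deleted = conn.execute(
            f"DELETE FROM HostEntity WHERE state = ? AND NOT {REFERENCED}",
            (UNSUPPORTED,),
        ).rowcount
        reactivated = conn.execute(
            f"UPDATE HostEntity SET state = ? WHERE state = ? AND {REFERENCED}",
            (ACTIVE, UNSUPPORTED),
        ).rowcount
    return deleted, reactivated


def make_demo_db() -> sqlite3.Connection:
    """Hypothetical minimal schema with three hosts; only host-2 has a VR reference."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE HostEntity (vcMoId TEXT PRIMARY KEY, state INTEGER)")
    conn.execute("CREATE TABLE HbrHostEntity (vcHost_vcMoId TEXT)")
    conn.executemany("INSERT INTO HostEntity VALUES (?, ?)",
                     [("host-1", UNSUPPORTED), ("host-2", UNSUPPORTED), ("host-3", ACTIVE)])
    conn.execute("INSERT INTO HbrHostEntity VALUES ('host-2')")
    conn.commit()
    return conn


if __name__ == "__main__":
    db = make_demo_db()
    print(fix_unsupported_hosts(db))  # (1, 1)
    print(db.execute("SELECT vcMoId, state FROM HostEntity ORDER BY vcMoId").fetchall())
```

In the demo, host-1 (UNSUPPORTED, unreferenced) is deleted, host-2 (UNSUPPORTED, referenced) is set back to ACTIVE, and host-3 is left untouched.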
- Stopping and Restarting Replication for Virtual Machine Disks Leaves Disks Unreplicated
After rapidly disabling and enabling replication for vSphere Replication-protected disks on a virtual machine, their replication status may be Not Active. To resolve this issue, unconfigure and reconfigure replication for the virtual machine.
- User-Customized Advanced Settings Lost During Upgrade
When upgrading from SRM 4.1 to SRM 5.0, all user-customized advanced settings are replaced with initial defaults. To address this issue, reapply customized settings after the upgrade. For in-place upgrades, reference information about pre-upgrade custom settings can be found in C:\ProgramData\VMware\VMware vCenter Site Recovery Manager\vmware-dr.xml.bak.
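When reapplying settings, it may help to list the values that differ between the backup and the current configuration. This Python sketch flattens two XML documents into element-path/value pairs and reports the differences. It assumes that vmware-dr.xml is ordinary XML with settings stored as element text. The element names in the example are hypothetical, attributes are ignored, and repeated sibling elements with the same tag collapse into one entry. Because SRM 5.0 may name or structure settings differently from SRM 4.1, treat the output as a list of candidates to review, not as changes to apply automatically.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def leaf_settings(elem: ET.Element, prefix: str = "") -> dict[str, str]:
    """Flatten an XML tree into {"Root/child/leaf": text} for leaf elements.

    Simplifications: attributes are ignored, and repeated sibling elements
    with the same tag collapse into a single entry (the last one wins).
    """
    path = f"{prefix}/{elem.tag}" if prefix else elem.tag
    children = list(elem)
    if not children:
        return {path: (elem.text or "").strip()}
    settings: dict[str, str] = {}
    for child in children:
        settings.update(leaf_settings(child, path))
    return settings


def changed_settings(
    backup: ET.Element, current: ET.Element
) -> dict[str, tuple[Optional[str], Optional[str]]]:
    """Map each differing path to (backup_value, current_value); None means absent."""
    old, new = leaf_settings(backup), leaf_settings(current)
    return {
        path: (old.get(path), new.get(path))
        for path in sorted(old.keys() | new.keys())
        if old.get(path) != new.get(path)
    }


if __name__ == "__main__":
    # For real files, use ET.parse(path).getroot() for the .bak file and the
    # current file; the element names below are hypothetical.
    before = ET.fromstring("<Config><timeout>900</timeout><level>info</level></Config>")
    after = ET.fromstring("<Config><timeout>300</timeout><level>info</level></Config>")
    print(changed_settings(before, after))  # {'Config/timeout': ('900', '300')}
```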
- NFS Volumes Mounted Using Different Identifiers Are Not Consistently Protected
NFS volumes are mounted as datastores using an IP address or hostname. vCenter uses the IP address or hostname to track which NFS mount maps to which datastore. As a result, the same volume from the same NFS host can be registered as a different datastore when it is mounted with a different IP address or hostname.
Storage replication adapters (SRAs) return a list of addresses during device discovery, and SRM makes a best effort to match these addresses with mounts and datastores. SRAs return a group of settings to SRM. SRM presents these values as elements of the array configuration but does not control them in any way. The storage address and port are among these values, and they are used when configuring and detecting mount points. If an SRA presents only a storage port value, SRM uses that port value alone when looking for matches, even though other valid values may exist.
Where the identifier used by an array manager differs from the identifier used by SRM, datastores are not protected as expected. Virtual machines on such shares are not protected and are not shut down during failover.
To avoid this issue, always use the same IP address or hostname for mounting NFS datastores and for storage port filtering by the array manager.
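The mismatch can be illustrated with a short Python sketch. It models an NFS datastore identity as the (server identifier, export path) pair used at mount time, and it flags mounts whose server identifier is not among the addresses that an array manager reports. The class and function names and the sample addresses are hypothetical, and the exact string comparison is a simplification of SRM's best-effort matching.

```python
from collections.abc import Iterable
from typing import NamedTuple


class NfsMount(NamedTuple):
    server: str  # IP address or hostname used when the volume was mounted
    export: str  # exported path on the NFS server


def datastore_identities(mounts: Iterable[NfsMount]) -> set[tuple[str, str]]:
    """The same export mounted through two identifiers yields two identities."""
    return {(m.server, m.export) for m in mounts}


def unmatched_mounts(mounts: Iterable[NfsMount], array_addresses: Iterable[str]) -> list[NfsMount]:
    """Mounts whose server identifier the array manager does not report.

    Exact string matching is a simplification of SRM's best-effort matching.
    """
    reported = set(array_addresses)
    return sorted(m for m in mounts if m.server not in reported)


if __name__ == "__main__":
    mounts = [NfsMount("10.0.0.20", "/vol/ds1"), NfsMount("filer01.example.com", "/vol/ds1")]
    print(len(datastore_identities(mounts)))        # 2, although it is one volume
    print(unmatched_mounts(mounts, ["10.0.0.20"]))  # the hostname-based mount
```

Mounting every NFS datastore with the same identifier that the array manager reports keeps both lists consistent.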
- Network Information Fields Are not Populated with Valid Information When IPv6 is Required
If the vSphere Replication Management Server (VRMS) is configured to use IPv6 alone, the following problems occur:
- In the Virtual Appliance Management Infrastructure (VAMI) startup page, the vCenter Server address field is pre-populated with an IPv4 literal address. This address is unusable in IPv6-only appliances. To resolve this issue, edit that field and enter a valid DNS name or IPv6 literal address.
- The Site Name field is not pre-populated. This field is typically pre-populated with the virtual machine name of the appliance in the vCenter Server inventory. To resolve this issue, enter a valid site name. A valid site name must differ from the site name of the peer site.
- Datastores Fail to Unmount When on Distributed Power Management (DPM) Enabled Clusters
Planned migrations and disaster recoveries fail to unmount datastores from hosts that are attached to a DPM cluster if a host enters standby mode. The error Error: Cannot unmount datastore datastorename from host hostname. Unable to communicate with the remote host, since it is disconnected may appear. To resolve this issue, turn off DPM at the protected site before completing planned migrations or disaster recoveries. You can choose to turn DPM back on after completing recovery tasks.
- Synchronize Storage During a Recovery Produces Errors
In rare cases, if synchronization is enabled, the following error may occur during the Synchronize Storage step: VR synchronization failed for VRM group groupname. VRM Server generic error. Please check the documentation for any troubleshooting information. The detailed exception is: 'Optimistic locking failure'. This is more likely to occur when there is a large amount of data to transmit while running a recovery plan, such as when there are many virtual machines, a large amount of data to synchronize, or both. Despite the error, the synchronization succeeds. Running the recovery plan again should complete without errors.
- Failure to Connect to NFC or Copy Error During Recovery
During a recovery, the following error messages may occur:
- Failed to connect to NFC service at host.
- Failed to copy file filePath to newFilePath: errorCode - errorMsg.
This occurs when a large number of operations run simultaneously, typically under the following conditions:
- When recovering 40 or more virtual machines that are protected using vSphere Replication (VR).
- When recovering 10 or more datastores on an ESX 3.5 host.
- When running a large number of recovery plans when the recovery plans affect virtual machines that have Raw Disk Mapping (RDM) disks. This typically occurs when running 20 or more recovery plans on 4.x hosts or when running 40 or more recovery plans on 5.0 hosts.
To resolve this issue, re-run the affected recovery plans.
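The conditions listed above can be encoded as a quick pre-flight check. This Python sketch is illustrative only. The field and function names are hypothetical, and the thresholds are the typical values quoted in this note rather than hard limits, so the errors may occur below them and may not occur above them.

```python
from dataclasses import dataclass


@dataclass
class RecoveryWorkload:
    """Hypothetical summary of what a set of recovery plans will do."""
    vr_protected_vms: int = 0          # VMs recovered that use vSphere Replication
    datastores_on_esx35_host: int = 0  # datastores recovered on a single ESX 3.5 host
    rdm_plans_on_4x_hosts: int = 0     # concurrent plans with RDM disks, ESX 4.x hosts
    rdm_plans_on_50_hosts: int = 0     # concurrent plans with RDM disks, ESX 5.0 hosts


def nfc_error_risks(w: RecoveryWorkload) -> list[str]:
    """List the typical conditions from the release note that the workload meets."""
    checks = [
        (w.vr_protected_vms >= 40, "40 or more VR-protected virtual machines"),
        (w.datastores_on_esx35_host >= 10, "10 or more datastores on an ESX 3.5 host"),
        (w.rdm_plans_on_4x_hosts >= 20, "20 or more RDM recovery plans on 4.x hosts"),
        (w.rdm_plans_on_50_hosts >= 40, "40 or more RDM recovery plans on 5.0 hosts"),
    ]
    return [reason for hit, reason in checks if hit]


if __name__ == "__main__":
    for reason in nfc_error_risks(RecoveryWorkload(vr_protected_vms=45)):
        print("Possible NFC or copy errors:", reason)
```

If a check fires and the errors do appear, the resolution above still applies: re-run the affected recovery plans.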
- Next Button on Configure Replication Wizard May Not Advance the Wizard
The Configure Replication wizard contains a page titled VR Server. This page includes a Next button that is always enabled, but which may not advance the wizard to the next page. This occurs when SRM is configured with a single VR Server, and that server is in the disconnected state. To resolve this issue, register another VR Server or change the status of the current server to Connected.
- vSphere Replication Servers Do not Have Specified IPv6 Addresses
During the deployment of the vSphere Replication (VR) Server and the vSphere Replication Management (VRM) Server, users can specify IPv6 addresses to be used by those servers. If those addresses are enclosed in brackets, they are not applied correctly. To resolve this issue, provide the IPv6 addresses without brackets.
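If addresses are prepared in a script before deployment, they can be normalized first. This Python sketch uses the standard library `ipaddress` module to strip surrounding brackets and validate the result; the function name is hypothetical. The returned string is in compressed, lowercase form.

```python
import ipaddress


def normalize_ipv6_for_deployment(value: str) -> str:
    """Return an IPv6 address without surrounding brackets.

    Raises ValueError (ipaddress.AddressValueError) if the text is not a
    valid IPv6 address.
    """
    candidate = value.strip()
    if candidate.startswith("[") and candidate.endswith("]"):
        candidate = candidate[1:-1]
    return str(ipaddress.IPv6Address(candidate))


if __name__ == "__main__":
    print(normalize_ipv6_for_deployment("[2001:db8::10]"))  # 2001:db8::10
```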
- vSphere Replication Management (VRM) Server Does not Support Replicating Virtual Machine Templates
VRM Server does not support replicating virtual machine templates. The option to configure replication for templates is not available. Virtual machines that are replicated by VR but are subsequently converted to templates do not continue to be replicated. To avoid ongoing replication issues for virtual machines that are converted to templates, complete the following process:
- Connect the vSphere Client to the vCenter Server and click the Site Recovery Manager plug-in.
- Select vSphere Replication in the left pane, select the remote VRM server, and click the Virtual Machines tab.
- Select the virtual machine that has been converted to a template, click Remove Replication, and confirm this choice.
- Virtual Machines Remain Selectable After Replication Configuration Fails
When using the wizard to configure replication for multiple virtual machines, some virtual machines may not be configured successfully. Even though their configuration failed, these virtual machines are listed as choices in the Create Protection Group wizard, because the configuration failure is not communicated. To resolve this issue, restart the SRM server at the protected site.
- Custom Recovery Step with Start Command Fails to Complete
Custom recovery steps can execute commands. If a custom recovery step executes a Start command with no parameters, the command fails to complete. For example, the command c:\windows\system32\cmd.exe /C start does not complete. To resolve this issue, manually close the command window opened by the start command.
- vSphere Replication (VR) Servers Deployed With an Unspecified Network Configuration Malfunction
VR Servers are deployed from an OVF file using the OVF deployment wizard. The deployment wizard includes a page for specifying the VR Server's network configuration. If no network settings are specified for the network configuration, DHCP addressing is used, but VR Servers do not support DHCP addressing. To avoid this issue, specify valid network settings for the VR Server during deployment.
- Large Disks Perform Full Sync Unnecessarily
When disks larger than 256 GB are protected using vSphere Replication (VR), any operation that causes an internal restart of the virtual disk device causes the disk to undergo a full sync. Internal restarts occur whenever:
- A virtual machine is restarted
- A virtual machine is vMotioned
- A virtual machine is reconfigured
- A snapshot is taken of a virtual machine
- Replication is paused and resumed
The full sync is initiated by ESX, and any resolution to this issue would involve an update to ESX. These syncs involve additional I/O to both the protected and recovery site disks, which often takes longer than the Recovery Point Objective (RPO), resulting in a missed RPO target. To avoid this issue, use smaller disks.
NOTE: This issue has been fixed in ESXi 5.0 Update 1.
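To identify the affected disks, compare disk sizes against the 256 GB threshold. This Python sketch assumes the threshold means 256 GiB and that disks strictly larger than that size are affected. The function name, the `esxi_has_fix` flag, and the disk names are hypothetical; set the flag when the host runs ESXi 5.0 Update 1.

```python
GIB = 1024 ** 3
# Assumption: the note's 256 GB threshold is read as 256 GiB.
FULL_SYNC_THRESHOLD_BYTES = 256 * GIB


def disks_at_risk_of_full_sync(disk_sizes: dict[str, int], esxi_has_fix: bool = False) -> list[str]:
    """Names of VR-protected disks larger than the threshold.

    disk_sizes maps a disk label to its capacity in bytes. Pass
    esxi_has_fix=True for hosts running ESXi 5.0 Update 1, where the
    unnecessary full sync was fixed.
    """
    if esxi_has_fix:
        return []
    return sorted(name for name, size in disk_sizes.items() if size > FULL_SYNC_THRESHOLD_BYTES)


if __name__ == "__main__":
    disks = {"Hard disk 1": 40 * GIB, "Hard disk 2": 300 * GIB}
    print(disks_at_risk_of_full_sync(disks))  # ['Hard disk 2']
```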
- Recovery Plan Fails to Complete After Disconnecting Storage
While testing or running a recovery plan, the recovery plan may fail with the error Error creating test bubble image for group... at the steps where virtual machines are being powered on. This error occurs after a LUN that stores replicated data is disconnected from the ESX server and then reconnected. After the reconnection occurs, replication continues as before, but the LUN is assigned a new internal identifier. This new identifier is not associated with protected virtual machines. Because of this, SRM can neither synchronize the storage nor create an image of the replicated virtual machine. As a result, the entire power on step fails. To resolve this issue, you must reconfigure replication using the following procedure:
- Clean up the recovery plan that has just failed.
- Start the Reconfigure wizard for the affected protection group.
- Change the location of files that are on the datastore that has been disconnected. Select the same datastore and folder locations.
- Agree to reuse the existing disks, as suggested by the wizard. Reconfigure the virtual machine.
The protection group enters a full sync state, during which data consistency is checked. Wait for the process to complete.
- Planned Migration May Result in Slowed ESX Hosts
During planned migration, SRM first instructs ESX hosts to unmount replicated datastores and detach the LUNs backing these datastores. Next, SRM instructs storage array software to make the detached LUNs read-only. This process helps ensure that devices on ESX hosts do not encounter an All Paths Down (APD) condition for the datastores and LUNs being migrated. However, migrating a virtual machine with RDMs may result in the RDM LUNs entering an APD condition. After RDMs enter an APD condition, ESX hosts keep attempting to re-establish connectivity with the lost RDM LUNs. As the number of unavailable RDMs increases, the number of reconnection attempts by ESX hosts increases correspondingly. As this continues, the ESX host may become slow to respond, and vCenter Server may eventually find the hosts unresponsive. This is more likely to occur with certain storage arrays, for example, when an SRA supports one iSCSI target per LUN. To resolve these issues, reboot the ESX host.
- Failed Recovery Results in Repeating SRM Server Failures
After a recovery plan fails to make progress, users may choose to re-run the recovery plan. In rare cases, rerunning the recovery plan causes the SRM server to crash. After restarting the server, the recovery plan continues to cause the server to crash. To resolve this issue, begin by reviewing the contents of the logs to find the virtual machine that was being modified just before the crash. The entry typically takes the form IP customization succeeded for VM VM Name. Once you know which virtual machine completed IP customization just before the crash, take one of the following steps:
- If this failure occurred during a test, unregister the placeholder virtual machine that was affected.
- If this failure occurred during a real recovery, unregister the recovered virtual machine, and then manually customize the network settings.
After resolving the issue for the virtual machine, restart the SRM server and re-run the recovery plan. In rare cases, other virtual machines may be in the same state, meaning that even after one virtual machine is repaired, failures may continue and further action is required. If failures continue, repeat the process of reviewing the logs, finding the affected virtual machine, and resolving the issue. If there are errors during the cleanup workflow, select the Force cleanup checkbox to return the system to the ready state.
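The log review step can be partly automated. This Python sketch scans log lines in order and returns the virtual machine name from the last IP customization succeeded for VM entry. Only that message text comes from this note. The surrounding log line format, the sample lines, and the function name are assumptions, and the sketch assumes the log is in chronological order. Confirm the result by reading the log entries around the crash.

```python
import re
from collections.abc import Iterable
from typing import Optional

# Only the message text comes from the release note; anything before it on
# the line (timestamps, thread IDs) is skipped because re.search is used.
CUSTOMIZED = re.compile(r"IP customization succeeded for VM (?P<vm>.+?)\s*$")


def last_customized_vm(log_lines: Iterable[str]) -> Optional[str]:
    """Return the VM named in the last matching entry, assuming chronological order."""
    last = None
    for line in log_lines:
        match = CUSTOMIZED.search(line)
        if match:
            last = match.group("vm")
    return last


if __name__ == "__main__":
    sample = [  # hypothetical log lines
        "[info] IP customization succeeded for VM web01",
        "[info] IP customization succeeded for VM db01",
        "[error] unrelated message",
    ]
    print(last_customized_vm(sample))  # db01
```

To scan a real log, pass an open file object, for example one opened with `open(path, encoding="utf-8", errors="replace")`, because iterating over a file yields its lines.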
- Events not Properly Displayed for Korean Operating Systems
When the vSphere Client starts, it determines the locale on which it is running and chooses the set of messages to display based on that locale. When the vSphere Client is installed on a Korean operating system, the client requests messages from the ko folder of the vCenter Server installation, because the vCenter Server and the vSphere Client are localized for Korean. SRM, however, is not localized for Korean. Therefore, XXX messages are displayed instead of SRM server messages. To resolve this issue, create a copy of the en folder, which is in C:\Program Files\VMware\Infrastructure\VirtualCenter Server\extensions\com.vmware.vcDr\locale\. Rename the copy from en to ko, and restart the vCenter Server and SRM services.
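The folder copy can also be scripted. This Python sketch uses `shutil.copytree` to copy the en folder to a new ko folder, which has the same effect as copying the folder and renaming the copy. The default path is the one given in this note; the function name is hypothetical, and the sketch refuses to overwrite an existing ko folder. Restart the vCenter Server and SRM services afterward, as described above.

```python
import shutil
from pathlib import Path

# Default location from the release note; adjust if vCenter Server is installed elsewhere.
LOCALE_DIR = Path(r"C:\Program Files\VMware\Infrastructure\VirtualCenter Server"
                  r"\extensions\com.vmware.vcDr\locale")


def add_korean_fallback(locale_dir: Path = LOCALE_DIR) -> Path:
    """Copy the en folder to a new ko folder so Korean clients get English SRM messages."""
    source = locale_dir / "en"
    target = locale_dir / "ko"
    if target.exists():
        raise FileExistsError(f"{target} already exists; not overwriting it")
    shutil.copytree(source, target)  # copy and "rename" in one step
    return target
```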
- Generic Error Message Is Displayed When Server Pairing Fails Due to Certificate Policy Strictness
Attempts to pair servers between sites may fail, displaying the following error message: Site pairing or break operation failed. Details: VRM Server generic error. This error may occur when one site is configured to use a strict certificate policy and the other site is configured to use a lenient certificate policy. In this case, the pairing failure is expected. To resolve it, change the lenient certificate policy to a strict certificate policy and provide a valid certificate.
- Using SRM 5.0 With Older Versions of ESX May Result In Permanent Device Loss
Permanent Device Loss (PDL) may occur during planned migrations and test failovers. When SRM is protecting virtual machines running on ESX 5.0 hosts, many cases where PDL could occur are properly handled, and PDL is avoided. Hosts running older versions of ESX do not provide this handling, so PDL may occur.
- Some SRAs handle certain timezones incorrectly during failover
Test and real failovers can stop with the error:
Failed to create snapshots of replica devices for group 'protection-group-999' using array pair 'array-pair-999': Vmacore::SystemException "The parameter is incorrect. " (87). This error is caused by mishandling of the time zone returned by the storage array to the SRA. All timestamps earlier than January 1, 1970 are affected. For details and a workaround, see KB 2018597.
- A recovery or test workflow fails for a virtual machine with the following message: Error - Unexpected error '3008' when communicating with ESX or guest VM: Cannot connect to the virtual machine.
Under rare circumstances this error might occur when you configure IP customization or an in-guest callout for the virtual machine and the recovery site cluster is in fully-automated DRS mode. An unexpected vMotion might cause a temporary communication failure with the virtual machine, resulting in the customization script error.
Workaround: Rerun the recovery plan. If the error persists, configure the recovery site cluster DRS to manual mode and rerun the recovery plan.
- Recovery fails with Error creating test bubble image for group ... The detailed exception is Error while getting host mounts for datastore:managed-object-id... or The object has already been deleted or has not been completely created.
If you run a test recovery or a planned recovery and the recovery plan fails with this exception, the LUN used for storing replication data has been temporarily disconnected from ESXi. When the LUN is reconnected, replication continues as normal and no replication data is lost. The exception occurs in these scenarios:
- vSphere Replication cannot locate the LUN because the LUN has changed its internal ID.
- The target datastore internal ID changes when the host containing the target datastore is removed from vCenter inventory and later added.
You must manually reconfigure the replication to refresh the new ID.
Workaround: If the primary site is no longer available, contact VMware Support for instructions on manually updating the VRMS database with the new datastore managed object id. If the primary site is still available:
- Run a cleanup operation on the recovery plan that failed.
- In the Virtual Machines tab of the vSphere Replication view, right-click a virtual machine and select Configure Replication.
- Click Next, and click Browse to change the location of the files on the datastore that has been disconnected and then reconnected, and select the same datastore and folder locations as before.
- Reuse the existing disks and reconfigure the replication of the virtual machine. The vSphere Replication management server picks up the changed datastore identity (managed object ID) in vCenter Server.
- Wait for the initial sync to finish. This sync uses existing disks and checks for data consistency.