VMware

VMware vCenter Site Recovery Manager 5.1 Release Notes

VMware vCenter Site Recovery Manager 5.1.0.1 | 20 DEC 2012 | Build 941848
VMware vCenter Site Recovery Manager 5.1 (ISO only) | 10 SEP 2012 | Build 820150

Last updated: 19 NOV 2013

Check for additions and updates to these release notes.

What's New in SRM 5.1.0.1

SRM 5.1.0.1 provides the following improvements:

  • Resolves critical issues in SRM 5.1. If you have installed SRM 5.1 (build 820150), you must upgrade your installation to SRM 5.1.0.1 (build 941848).
  • Includes vSphere Replication 5.1.0.1. If you have installed vSphere Replication 5.1, after you upgrade SRM to 5.1.0.1, you must also upgrade vSphere Replication to vSphere Replication 5.1.0.1.

See Installation and Upgrade Notes for instructions about upgrading.

Resolved Issues in SRM 5.1.0.1

  • Installing SRM 5.1 or upgrading to SRM 5.1 using an imported certificate fails
    If you attempt to install SRM 5.1 or upgrade to SRM 5.1 using an imported PKCS12 certificate rather than an auto-generated certificate, the installer runs to completion but then fails with the error Failed to install certificate. See KB 2036909. This issue has been fixed in SRM 5.1.0.1.
  • SRM Server on the recovery site fails during cleanup of recovery plans
    SRM Server on the recovery site fails repeatedly during cleanup if there is nothing to clean up, for example if there are no LUNs to detach, or no datastores to unmount. This problem occurs when the command to start the test recovery from SRM Server to the SRA reports success with at least one LUN, but finds no LUNs when the ESXi hosts on the recovery site run a rescan. This issue has been fixed in SRM 5.1.0.1.

What's New in SRM 5.1

VMware vCenter Site Recovery Manager 5.1 adds the following new features and improvements.

  • SRM 5.1 supports reprotect and failback with vSphere Replication. Previously, you could only perform reprotect and failback on array-based protection groups. In SRM 5.1 you can perform reprotect and failback on vSphere Replication protection groups.
  • The SRM Server in SRM 5.1 is now a fully 64-bit application.
  • Improved handling of datastores in the all paths down (APD) state. If SRM detects that a datastore on the protected site is in the all paths down (APD) state and is preventing a virtual machine from shutting down, SRM waits for a period before attempting to shut down the virtual machine again. The APD state is usually transient, so by waiting for a datastore in the APD state to come back online, SRM can gracefully shut down the protected virtual machines on that datastore.
  • Improved disk resignaturing for VMFS disks.
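
The wait-and-retry behavior described above for datastores in the APD state can be pictured with a short Python sketch. This is not SRM code: the function name, the callables passed in, the wait period, and the attempt limit are all hypothetical values chosen for illustration, because these release notes do not state the actual wait period or retry count.

```python
import time

def shut_down_with_apd_retry(try_shutdown, datastore_in_apd,
                             wait_seconds=1.0, max_attempts=3,
                             sleep=time.sleep):
    """Retry a shutdown while the datastore is in the (usually transient) APD state.

    try_shutdown and datastore_in_apd are hypothetical callables supplied by
    the caller; wait_seconds and max_attempts are illustrative assumptions.
    Returns True if a shutdown attempt succeeded, False otherwise.
    """
    for attempt in range(1, max_attempts + 1):
        if try_shutdown():
            return True
        # Give up if this was the last attempt or APD is not the cause.
        if attempt == max_attempts or not datastore_in_apd():
            return False
        # APD is usually transient, so wait before trying again.
        sleep(wait_seconds)
    return False
```

Passing `sleep` as a parameter keeps the sketch easy to exercise without real delays.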

Localization

VMware vCenter Site Recovery Manager 5.1 is available in the following languages:

  • English
  • French
  • German
  • Japanese
  • Korean
  • Simplified Chinese

Compatibility

SRM Compatibility Matrix

For the current interoperability and product compatibility matrix, see the Compatibility Matrixes for VMware vCenter Site Recovery Manager 5.1.

Compatible Storage Arrays and Storage Replication Adapters

For the current list of supported compatible storage arrays and SRAs, see the Site Recovery Manager Storage Partner Compatibility Guide.

VMware VSA Support

SRM 5.1 can protect virtual machines that reside on the vSphere Storage Appliance (VSA) by using vSphere Replication. VSA does not require a Storage Replication Adapter (SRA) to work with SRM 5.1.

Installation and Upgrade Notes

For an evaluation guide to assist with a technical walkthrough of major features and capabilities of Site Recovery Manager 5.1, see the VMware vCenter Site Recovery Manager Resources for Business Continuity.

Install SRM 5.1.0.1

To create a new installation of SRM 5.1.0.1, download and run the installer VMware-srm-5.1.0-941848.exe. See Installing SRM in Site Recovery Manager Installation and Configuration.

Upgrade an Existing SRM 4.1.x Installation to SRM 5.1.0.1

Upgrade versions of SRM earlier than 5.0 to SRM 5.0 or 5.0.1 before you upgrade to SRM 5.1.0.1.

IMPORTANT: Upgrading vCenter Server directly from 4.1.x to 5.1 is a supported upgrade path. However, upgrading SRM directly from 4.1.x to 5.1 is not a supported upgrade path. When upgrading a vCenter Server 4.1.x instance that includes an SRM 4.1.x installation, you must upgrade vCenter Server to version 5.0 or 5.0 u1 before you upgrade SRM to 5.0 or 5.0.1. If you upgrade vCenter Server from 4.1.x to 5.1 directly, when you attempt to upgrade SRM from 4.1.x to 5.0 or 5.0.1, the SRM upgrade fails. SRM 5.0.x cannot connect to a vCenter Server 5.1 instance.

Upgrade an Existing SRM 5.0 or 5.0.1 Installation to SRM 5.1.0.1

To upgrade an existing SRM 5.0 or 5.0.1 installation to SRM 5.1.0.1, download and run the installer VMware-srm-5.1.0-941848.exe. See Upgrading SRM in Site Recovery Manager Installation and Configuration. Upgrading from SRM 5.0.2 to 5.1.0.1 is not supported.
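
The upgrade rules stated so far can be summarized as a small lookup. The following Python sketch is illustrative only: the function name `next_upgrade_step` and the version strings it accepts are assumptions made for this example and are not part of any SRM tool. It covers only the versions that these release notes name explicitly.

```python
# Illustrative sketch only: encodes the SRM upgrade rules stated in these
# release notes. The function name and version strings are hypothetical.

TARGET = "5.1.0.1"

def next_upgrade_step(installed: str) -> str:
    """Return the next SRM version to upgrade to.

    Raises ValueError if these release notes do not describe a supported
    path from the installed version.
    """
    if installed == TARGET:
        raise ValueError("already at SRM " + TARGET)
    if installed.startswith("4.1"):
        # 4.1.x cannot go directly to 5.1.x: upgrade to 5.0 or 5.0.1 first.
        return "5.0 or 5.0.1"
    if installed in ("5.0", "5.0.1", "5.1"):
        return TARGET
    if installed == "5.0.2":
        raise ValueError("upgrading from SRM 5.0.2 to 5.1.0.1 is not supported")
    raise ValueError("no upgrade path described for SRM " + installed)
```

For example, `next_upgrade_step("4.1.2")` returns the intermediate step, and calling the function again with `"5.0.1"` returns the final target.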

Upgrade an Existing SRM 5.1 Installation to SRM 5.1.0.1

Perform the following steps to upgrade an existing SRM 5.1 installation to SRM 5.1.0.1.

  1. Log into the machine on which you are running SRM Server on the protected site.
  2. Back up the SRM database using the tools that your database software provides.
  3. Download and run the installer VMware-srm-5.1.0-941848.exe.
  4. Click Yes when prompted for confirmation that you want to upgrade SRM.
  5. Click Yes to confirm that you have backed up the SRM database.
  6. Click Finish when the installation completes.
  7. Repeat the upgrade process on the recovery site.

After you have upgraded SRM Server, you must reinstall the SRM client plug-in.

  1. Log into a machine on which you are running a vSphere Client instance that you use to connect to SRM.
  2. Uninstall the SRM 5.1 client plug-in.
  3. Log into a vSphere Client instance and connect to the vCenter Server to which SRM Server is connected.
  4. Select Plug-ins > Manage Plug-ins.
  5. Click Download and Install to install the SRM 5.1.0 client plug-in.
  6. When the plug-in installation completes, log into SRM and verify that the configuration from the previous version has been retained.
  7. Repeat the process for all vSphere Client instances that you use to connect to SRM Server.

If you have installed vSphere Replication 5.1, you must also upgrade vSphere Replication to vSphere Replication 5.1.0.1. See Upgrade vSphere Replication in Site Recovery Manager Installation and Configuration.

Operational Limits for SRM and vSphere Replication

For the operational limits of SRM 5.1 and vSphere Replication 5.1, see http://kb.vmware.com/kb/2034768.

SRM SDKs

For a guide to using the SRM SOAP-based API, see VMware vCenter Site Recovery Manager API.

Open Source Components

The copyright statements and licenses applicable to the open source software components distributed in Site Recovery Manager 5.1 are available at Download VMware vCenter Site Recovery Manager. You can also download the source files for any GPL, LGPL, or other similar licenses that require the source code or modifications to source code to be made available for the most recent generally available release of vCenter Site Recovery Manager.

Caveats and Limitations

  • Installing or upgrading to SRM 5.1 using an imported certificate fails
    If you attempt to install SRM 5.1 or upgrade to SRM 5.1 using an imported PKCS12 certificate rather than an auto-generated certificate, the installer runs to completion but then fails with the error Failed to install certificate. See KB 2036909. This is fixed in the SRM 5.1.0.1 release. To use an imported PKCS12 certificate, upgrade to SRM 5.1.0.1.

  • Interoperability with Storage vMotion and Storage DRS
    Due to some specific and limited cases where recoverability can be compromised during storage movement, Site Recovery Manager 5.1 is not supported for use with Storage vMotion (SVmotion) and is not supported for use with the Storage Distributed Resource Scheduler (SDRS) including the use of datastore clusters.

  • Interoperability with vCloud Director
    Site Recovery Manager 5.1 offers limited support for vCloud Director environments. Using SRM to protect virtual machines within vCloud resource pools (virtual machines deployed to an Organization) is not supported. Using SRM to protect the management structure of vCD is supported. For information about how to use SRM to protect the vCD Server instances, vCenter Server instances, and databases that provide the management infrastructure for vCloud Director, see VMware vCloud Director Infrastructure Resiliency Case Study.

  • Interoperability with vSphere Replication
    vSphere Replication supports a maximum disk size of 2032GB.

  • SRM 5.1 supports 2 Microsoft Cluster Server (MSCS) nodes
    vSphere 5.1 supports up to 5 MSCS nodes. SRM 5.1 supports 2 MSCS nodes.
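
The two numeric limits above lend themselves to a simple pre-check. The Python sketch below is a hypothetical helper, not a VMware utility: the function name and its inputs are assumptions. The disk size limit applies to disks protected with vSphere Replication, and the sketch treats sizes as plain GB values exactly as the limit is written above.

```python
# Illustrative sketch only: checks values against the limits listed above.
# The function name and argument layout are hypothetical.

MAX_VR_DISK_GB = 2032      # vSphere Replication maximum disk size
MAX_SRM_MSCS_NODES = 2     # MSCS nodes supported by SRM 5.1

def check_limits(disk_sizes_gb, mscs_nodes=0):
    """Return a list of human-readable limit violations (empty if none)."""
    problems = []
    for index, size in enumerate(disk_sizes_gb):
        if size > MAX_VR_DISK_GB:
            problems.append(
                f"disk {index}: {size} GB exceeds {MAX_VR_DISK_GB} GB")
    if mscs_nodes > MAX_SRM_MSCS_NODES:
        problems.append(
            f"{mscs_nodes} MSCS nodes exceeds {MAX_SRM_MSCS_NODES}")
    return problems
```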

Resolved Issues in SRM 5.1

The following issues from SRM 5.0 have been resolved in this release.

  • Pairing or Breaking Pairing of vSphere Replication appliances Fails With LockingFailedException

    In rare cases, when pairing or breaking pairing between vSphere Replication appliance servers, if the vSphere service is being stopped at the same time, the operation fails with the following exception:

    LockingFailedException: Failed write-locking object: com.vmware.hms.db.entities.HmsLocalServerEntity:VRM Server GUID

    This has been fixed.

  • Valid Certificates Produce Warnings

    vSphere Replication appliances fail to pair due to issues with certificates when both of the following conditions are met:

    • The server's certificate has a hostname value that does not match the vSphere Replication appliance address.
      Hostname values might include subject name or alternative subject names. The vSphere Replication appliance address is specified in the Virtual Appliance Management Infrastructure (VAMI) during application configuration.
    • The peer vSphere Replication appliance server has a valid chain of trust to a certificate authority (CA) for the certificate.
      This happens if the certificate was issued by a trusted CA such as Verisign or Go Daddy or by an unknown CA whose certificate was added to the peer vSphere Replication appliance server's hms-truststore.jks file, thereby establishing trust.

    This has been fixed.

  • Installing Multiple vSphere Client SRM Plug-Ins Creates Problems

    Only one version of the vSphere Client SRM plug-in should be installed at any given time. Users are not prevented from installing the client plug-in for version 4.0.x or 4.1.0 over the plug-in for SRM 5.0. This has been fixed.

  • Reprotect Might Allow Users to Complete Unpermitted Tasks

    When users run a reprotect operation, tasks might be completed that exceed what granted privileges should allow them to do. Tasks that users might complete despite their lack of permissions include creating virtual machines on the reprotect destination site or protecting new devices on the reprotect source site. This might occur if a reprotect is initiated after other users have deleted virtual machines on the old production site or modified those virtual machines to which they were failed over. This has been fixed.

  • User Interface Changes Unexpectedly During Planned Migrations

    During planned migrations, virtual machines are deactivated and synchronized to the recovery site. During this process, protection groups and their virtual machines are temporarily displayed in the Partially Recovered state. The vSphere Client enables buttons such as Edit Protection Group and Remove Protection Group, as would be appropriate for these states, but operations triggered by these buttons do not succeed. These varying states and button availability might be confusing to users. This has been fixed.

  • Recovery Histories Might Be Lost After Upgrade to SRM 5.0

    During a server migration upgrade, the installer offers the default site name, rather than the site name currently in use. If you do not replace the default site name with the current site name, recovery histories might be lost. This has been fixed.

  • License Information Uninformative with Default Site Names

    SRM license information is labeled in vCenter using the SRM site name. If the default site name is used and multiple SRM Servers are registered in the same vCenter group, the default SRM Server naming makes the licensing information indistinguishable. This has been fixed.

  • Installing SRM on Machines with non-ASCII Characters Causes Access Problems

    If SRM is installed on a machine with a name that includes non-ASCII characters, problems with user access might occur. Because the machine name is used to construct URLs, those URLs that contain non-ASCII characters might not be valid. This has been fixed.

  • Non-ASCII DNS Suffixes Are Not Set Correctly After Customizing Windows XP And Windows 2003

    If you enter a non-ASCII DNS suffix in the DNS tab of the IP Settings section of the virtual machine Recovery Properties dialog to customize Windows XP or Windows 2003, the customization is reported as successful but the non-ASCII DNS suffix is not set correctly. This has been fixed.

  • Custom Recovery Steps with Non-ASCII Character Output Cause SRM Servers to Crash

    Do not create custom recovery steps that produce non-ASCII output. If you add such steps to a plan and then complete a test or recovery, SRM Server crashes. If a plan with a step that produces non-ASCII output is used, the plan enters a state where it cannot be modified or deleted. This has been fixed.

  • Recovery Plan Configuration Interface Displays Invalid Network Options

    When configuring a recovery plan, networks must be selected for association with virtual machines. The recovery plan user interface displays DVS uplink port groups as possible selections, even though these are not valid options. This has been fixed.

  • Virtual Machine Status Does not Update After vSphere Replication appliances Disconnection

    vSphere Replication appliances exchange information between sites about virtual machine replication. If the vSphere Replication appliance connection is broken by the Break vSphere Replication appliance connection option, exchange of information about replicated virtual machines is not automatically re-established when you reconfigure the vSphere Replication appliance connection. This might result in incomplete information about virtual machine replication, including which virtual machines are available to add to protection groups. This has been fixed.

  • Information Displayed in vSphere Client Might Be Outdated

    During modification of SRM data, such as the contents of recovery plans, information displayed might not update as SRM information is modified. For example, after adding a non-critical virtual machine to suspend to a recovery plan, the information about this change is not automatically displayed. This has been fixed.

  • Invalid Certificates Are Rejected During Authentication

    SRM supports certificate-based authentication. If an SRM Server has an invalid certificate, when the vSphere Client uses the SRM plug-in to connect to SRM Server, errors occur. Invalid certificates can occur if any certificate in the certificate chain is expired or not yet valid. In such a case, the following message appears: The SSL connection to the remote host has terminated. The remote host certificate has these problems: A certificate in the host's chain is not time-valid. This has been fixed.

  • Operations Might Fail When Many vSphere Clients Connect to SRM

    SRM uses a large fixed-size pool of connections to communicate with vCenter Servers and with SRM Server at the other site to carry out tasks. The number of available connections decreases as the number of vSphere Clients connected increases. In many cases, this does not negatively affect system performance, but in the event of many simultaneous operations, such as protecting or unprotecting a large number of virtual machines, or running or testing a large recovery plan, problems might occur. Operations might time out or fail with various explanations. This has been fixed.

  • Attempts to Reprotect Mixed Mode Recovery Plans Result In Reprotect Incomplete State

    Protection groups might be SAN-based or vSphere Replication (VR) based, and recovery plans might be composed of both types of protection groups. Reprotect cannot be run on a plan that includes a mix of both SAN-based and VR-based protection groups. Reprotection functionality does not check that the protection groups in a plan are solely SAN-based or solely VR-based. If an attempt is made to reprotect protection groups that are composed of a mix of protection group types, the process fails, leaving the recovery plan in a Reprotect Incomplete state. To avoid this issue, do not create recovery plans that mix protection types. This has been fixed.
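
The mixed-mode condition described in this item is a simple set test. The Python sketch below is illustrative only: the labels `"san"` and `"vr"` and the function name are hypothetical and do not correspond to any SRM API.

```python
# Illustrative sketch only: detects a recovery plan that mixes protection
# group types. The "san" and "vr" labels are hypothetical.

KNOWN_TYPES = {"san", "vr"}

def is_mixed_mode(group_types):
    """Return True if the plan contains both SAN-based and VR-based groups."""
    kinds = set(group_types)
    unknown = kinds - KNOWN_TYPES
    if unknown:
        raise ValueError(f"unknown protection group types: {sorted(unknown)}")
    return kinds == KNOWN_TYPES
```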

  • User Enabled Alarm Generates no Results

    Users can enable the Reconfigured protection settings for VM alarm, but this alarm is never triggered. This occurs because users cannot edit protection settings in this release. To track user-initiated changes to virtual machine settings in SRM, administrators must enable the Reconfigured recovery location settings for VM alarm. This has been fixed.

  • User Attempts to Configure vSphere Replication Fail

    Users with administrative privileges might still be unable to configure vSphere Replication. This is often the result of not being granted the VRM Datastore Mapper > View privilege. This has been fixed.

  • Synchronize Storage During a Recovery Produces Errors

    In rare cases, if synchronization is enabled, during the Synchronize Storage step, the following error might occur: VR synchronization failed for VRM group groupname. VRM Server generic error. Please check the documentation for any troubleshooting information. The detailed exception is: 'Optimistic locking failure'. This is more likely to occur when there is a large amount of data to transmit while running a recovery plan, such as when there is some combination of a greater number of virtual machines and a greater amount of data to synchronize. This has been fixed.

  • Failure to Connect to NFC or Copy Error During Recovery

    During a recovery, the following error messages might occur:

    • Failed to connect to NFC service at host.
    • Failed to copy file filePath to newFilePath: errorCode - errorMsg.

    This occurs when completing a large number of simultaneous operations. This typically occurs under the following conditions:

    • When recovering 40 or more virtual machines that are protected using vSphere Replication (VR).
    • When recovering 10 or more datastores on an ESX 3.5 host.
    • When running a large number of recovery plans when the recovery plans affect virtual machines that have Raw Disk Mapping (RDM) disks. This typically occurs when running 20 or more recovery plans on 4.x hosts or when running 40 or more recovery plans on 5.0 hosts.

    This has been fixed.

  • Virtual Machines Choosable After Replication Configuration Fails

    When using the wizard to configure replication for multiple virtual machines, some virtual machines might not be configured successfully. Despite the fact that their configuration failed, the virtual machines are listed as choices in the Create Protection Group wizard. This is because the information that the configuration failed is not communicated. This has been fixed.

  • Network Information Fields Are not Populated with Valid Information When IPv6 is Required

    If the vSphere Replication appliance is configured to use IPv6 alone, the following problems occur:

    • In the Virtual Appliance Management Infrastructure (VAMI) startup page, the vCenter Server address field is pre-populated with an IPv4 literal address. This address is unusable in IPv6-only appliances. To resolve this issue, edit that field and enter a valid DNS name or IPv6 literal address.
    • The Site Name field is not pre-populated. This field is typically pre-populated with the virtual machine name of the appliance in the vCenter Server inventory. To resolve this issue, enter a valid site name. A valid site name is different from the site name of the peer site.

    This has been fixed.

  • Large Disks Perform Full Sync Unnecessarily

    When disks larger than 256GB are protected using vSphere Replication (VR), any operation that causes an internal restart of the virtual disk device causes the disk to complete a full sync. Internal restarts occur any time:

    • A virtual machine is restarted
    • A virtual machine is vMotioned
    • A virtual machine is reconfigured
    • A snapshot is taken of a virtual machine
    • Replication is paused and resumed

    The full sync is initiated by ESX, and any resolution to this issue would involve an update to ESX. These syncs involve additional I/O to both the protected and recovery site disks, which often takes longer than the Recovery Point Objective (RPO), resulting in a missed RPO target. This has been fixed.

  • Planned Migration Might Result in Slowed ESX Hosts

    During planned migration, SRM first instructs ESX hosts to unmount replicated datastores and detach the LUNs backing these datastores. Next, SRM instructs storage array software to make the detached LUNs read-only. This process helps ensure that devices on ESX hosts do not encounter an All Paths Down (APD) condition for the datastores and LUNs being migrated. Migrating a virtual machine with RDMs might result in the RDM LUNs entering an APD condition. After RDMs enter an APD condition, ESX hosts continue to reattempt to establish connectivity with the lost RDM LUNs. As the number of unavailable RDMs increases, the number of ESX host attempts to reconnect to the lost RDMs increases correspondingly. As this proceeds, the ESX host might become slow to respond and vCenter Server might eventually find the hosts unresponsive. This is more likely to occur with certain storage arrays. For example, this is more likely when an SRA supports one iSCSI target per LUN. To resolve these issues, reboot the ESX host. This has been fixed.

Known Issues in SRM 5.1

The following known issues have been discovered through rigorous testing and will help you understand some behavior you might encounter in this release.

  • SRM Might Encounter Errors Mounting Datastores During Recoveries

    During a test recovery or actual failover, SRM waits for recovered datastores to become available. After datastores become available, SRM attempts to mount any datastores that are not mounted. In rare instances, these datastores are automatically mounted before SRM can mount them. If this occurs during a test failover, the failover does not complete. If this occurs during an actual recovery, the recovery completes with an error. To resolve this issue, retry the recovery.

  • Temporary Loss of vCenter Server Connections Might Create Recovery Problems for Virtual Machines with Raw Disk Mappings

    If the connection to the vCenter Server is lost during a recovery, one of the following might occur:

    • If the vCenter Server remains unavailable, the recovery fails. To resolve this issue, re-establish the connection with the vCenter Server and re-run the recovery.
    • In rare cases, the vCenter Server becomes available again and the virtual machine is recovered. In such a case, if the virtual machine has raw disk mappings (RDMs), the RDMs might not be mapped properly. As a result of the failure to properly map RDMs, it might not be possible to power on the virtual machine or errors related to the guest operating system or applications running on the guest operating system might occur.
      • If this is a test recovery, complete a cleanup operation and run the test again.
      • If this is an actual recovery, you must manually attach the correct RDM to the recovered virtual machine.

    Refer to the vSphere documentation about editing virtual machine settings for more information on adding raw disk mappings.

  • Cancellation of Recovery Plan Not Completed

    When a recovery plan is run, an attempt is made to synchronize virtual machines. It is possible to cancel the recovery plan, but attempts to cancel the recovery plan run do not complete until the synchronization either completes or expires. The default expiration is 60 minutes. The following options can be used to complete cancellation of the recovery plan:

    • Pause vSphere Replication, causing synchronization to fail. After recovery enters an error state, use the vSphere Client to restart vSphere Replication in the vSphere Replication tab. After replication is restarted, the recovery plan can be run again, if desired.
    • Wait for synchronization to complete or time out. This might take considerable time, but does eventually finish. After synchronization finishes or expires, cancellation of the recovery plan continues.

  • Valid Certificates Produce Warnings

    When uploading and installing certificates to the vSphere Replication appliance, the following error occurs:

    The certificate installed with warnings. Remote VRM systems with the 'Accept only SSL certificate signed by a trusted CA' option enabled might be unable to connect to this site for the following reason: The certificate was not issued for use with the given hostname: VRM hostname

    This error can be ignored, or you can avoid this error by using a supported browser other than Internet Explorer.

  • Non-ASCII Passwords Not Accepted For Log In To Virtual Appliance Management Infrastructure (VAMI)

    Users can manage the vSphere Replication appliance using VAMI. Attempts to log on to VAMI with an account whose password contains non-ASCII characters fail. This occurs even when correct authentication information is provided. This issue occurs in all cases where non-ASCII passwords are used with VAMI. To avoid this issue, use ASCII passwords or connect using SSH.
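
To check in advance whether a password consists only of ASCII characters, a one-line test is enough. The Python sketch below is a hypothetical helper, not part of VAMI or SRM. It treats any character with a code point of 128 or higher as non-ASCII.

```python
# Illustrative sketch only: the function name is hypothetical.

def is_ascii_password(password: str) -> bool:
    """Return True if every character in the password is ASCII."""
    return all(ord(ch) < 128 for ch in password)
```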

  • Outdated Replication Status Displayed if Datastore Becomes Unavailable

    It is possible that after virtual machine synchronization begins, the target datastore becomes unavailable. In such a case, the group status should display information about this failure, but the status remains unchanged. To identify issues related to datastore unavailability, use the events generated by the target datastore. The following events are generated in such a case:

    • Datastore is not accessible for VR Server... Generated immediately after datastore becomes inaccessible
    • Virtual machine vSphere Replication RPO is violated... Replica can not be generated within the specified RPO

  • Stopping Datastore Replication for Protected Virtual Machines Produces Incorrect Error Messages

    It is possible to protect a virtual machine that has disks on multiple datastores and then subsequently disable replication for one of the datastores. In such a case, the virtual machine's status in the protection group changes to Invalid: Virtual machine 'VM' is no longer protected. Internal error: Cannot create locator for disk'2001'... This information is incorrect. The status should change to Datastore '[datastore name]' is no longer replicated.

  • Virtual Machine Recovery Fails Due to Disk Configuration Error

    It is possible to place different disks and configuration files for a single protected virtual machine on multiple datastores. During recovery, SRM must have access to raw disk mapping and parent disk files. Without this access, SRM cannot determine disk types during recovery. In such a case, SRM might assume that a Raw Disk Mapping (RDM) disk is a non-RDM disk, resulting in a failed reconfiguration. To avoid this issue, ensure all hosts that can access recovered virtual machine configuration files can also access RDM mapping files and any parent disks, if such disks exist.
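
The access requirement in this item can be checked with set arithmetic over each host's datastores. The Python sketch below is illustrative only: the function name, the host and datastore names, and the input layout are hypothetical, and the inventory data would have to be collected separately. For simplicity, it assumes the configuration files reside on a single datastore.

```python
# Illustrative sketch only: names and data layout are hypothetical.

def hosts_missing_rdm_access(host_datastores, config_datastore, rdm_datastores):
    """Find hosts that can reach the VM configuration files but not the
    RDM mapping files or parent disks.

    host_datastores: dict mapping host name -> set of accessible datastores
    config_datastore: datastore holding the recovered VM configuration files
    rdm_datastores: set of datastores holding RDM mapping files and parent disks
    Returns a dict mapping host name -> set of datastores it cannot access.
    """
    missing = {}
    for host, reachable in host_datastores.items():
        if config_datastore in reachable:
            gap = set(rdm_datastores) - set(reachable)
            if gap:
                missing[host] = gap
    return missing
```

An empty result means every host that can see the configuration files can also see the RDM mapping files and parent disks.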

  • Pairing Sites Fails Due to Different Certificate Trust Methods

    When pairing SRM sites, the error Local and Remote servers are using different certificate trust methods appears. This occurs when the root certificate for the Certificate Authority (CA) signing the certificate is missing on SRM Server. To resolve this issue, install the root certificate for the SRM certificate's signing Certificate Authority using Microsoft Management Console. After installing the certificate, perform an SRM installation Modify operation to provide the user-generated certificate again.

  • Recovery Fails to Progress After Connection to Protected Site Fails

    If the protection site becomes unreachable during a deactivate operation or during RemoteOnlineSync or RemotePostReprotectCleanup, both of which occur during reprotect, then the recovery plan might fail to progress. In such a case, the system waits for the virtual machines or groups that were part of the protection site to complete those interrupted tasks. If this issue occurs during a reprotect operation, you must reconnect the original protection site and then cancel and restart the recovery plan. If this issue occurs during a recovery, it is sufficient to cancel and restart the recovery plan.

  • vSphere Replication Appliance Fails to Support Valid ESX Hosts

    During vSphere Replication configuration, when a datastore is being selected on a supported version of ESX, the message VR server Server Name has no hosts through which to access destination datastore ... appears. This occurs when adding a new host to vCenter Server or during registration of vSphere Replication server, if there is a temporary interruption of communication between the vSphere Replication appliance and the vSphere Replication server. Communication problems typically arise due to temporary loss of connectivity or to the server services being stopped.

    To resolve this issue, restart the vSphere Replication management server service.

    1. Log into the virtual appliance management interface (VAMI) of the vSphere Replication appliance at https://vr_appliance_address:5480.
    2. Click Configuration > Restart under Service Status.

  • Datastores Fail to Unmount When on Distributed Power Management (DPM) Enabled Clusters

    Planned migrations and disaster recoveries fail to unmount datastores from hosts that are attached to a DPM cluster if the host enters standby mode. The error Error: Cannot unmount datastore datastorename from host hostname. Unable to communicate with the remote host, since it is disconnected might appear. To resolve this issue, turn off DPM at the protected site before completing planned migrations or disaster recoveries. You can choose to turn DPM back on after completing recovery tasks.

  • vSphere Replication Servers Deployed With an Unspecified Network Configuration Malfunction

    vSphere Replication servers are deployed from an OVF file using the OVF deployment wizard. The deployment wizard includes a page for specifying the vSphere Replication server's network configuration. If no network settings are specified for the network configuration, DHCP addressing is used, but vSphere Replication servers do not support DHCP addressing. To avoid this issue, specify valid network settings for the vSphere Replication server during deployment.
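
Before deployment, the intended static settings can be sanity-checked with Python's standard `ipaddress` module. The sketch below is a hypothetical helper: the three fields it accepts are assumptions about what the network configuration contains, and the rule that the gateway must lie inside the address's subnet is an assumption of this example, not a requirement stated in these release notes.

```python
import ipaddress

# Illustrative sketch only: field names and the gateway rule are assumptions.

def validate_static_settings(address, netmask, gateway):
    """Return True if the values form a consistent static IP configuration."""
    if not (address and netmask and gateway):
        # Blank fields would leave the server on DHCP, which is unsupported.
        return False
    try:
        network = ipaddress.ip_network(f"{address}/{netmask}", strict=False)
        gateway_ip = ipaddress.ip_address(gateway)
    except ValueError:
        return False
    return gateway_ip in network
```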

  • Generic Error Message Is Displayed When Server Pairing Fails Due to Certificate Policy Strictness

    Attempts to pair servers between sites might fail, displaying the following error message: Site pairing or break operation failed. Details: VRM Server generic error. This error might occur when one site is configured to use a strict certificate policy and the other site is configured to use a lenient certificate policy. In such a case, the pairing should fail, as it does. After such a failure, modify the lenient certificate policy to use strict certificate policy and provide a valid certificate.

  • Including a percent (%) symbol in a folder name on the recovery site creates a new folder during replication.

    If you include a percent (%) symbol in the folder name on the recovery site and try to configure replication to that folder, the replication might be created in an incorrect folder with additional encoding. For example, if you create the folder %3dTest, vSphere Replication creates a new folder %253dTest and places the replication in this folder.
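
    The extra encoding in this example matches ordinary percent-encoding applied to a name that already contains a percent sign: the % character itself is encoded as %25. The following Python sketch reproduces the folder name from the example using the standard library. It illustrates the encoding only and makes no claim about how vSphere Replication is implemented.

```python
from urllib.parse import quote, unquote

# Folder name as created on the recovery site (taken from the example above).
folder = "%3dTest"

# Percent-encoding the name once more turns "%" into "%25".
encoded = quote(folder, safe="")

print(encoded)           # %253dTest
print(unquote(encoded))  # %3dTest
```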

  • Context-sensitive help is not accessible in Internet Explorer 7.

    See KB 1009801.

  • SRM fails to recover virtual machines after RDM failures.

    Raw Disk Mapping (RDM) LUNs might fail while LUNs that back datastores are unaffected. In such a case, SRM cannot recover virtual machines with RDMs.

    Workaround: Recover affected virtual machines manually. Fail over the RDM LUN and reattach it as an RDM disk on the recovered virtual machine.

  • vSphere Replication appliance status is Disconnected when running the SRM client plug-in on Windows XP or Windows 2003.

    The status of the vSphere Replication appliance shows as Disconnected in the Summary tab for a vSphere Replication site. Attempting to reconfigure the connection results in the error Lost connection to local VRMS server at server_address:8043. (The client could not send a complete request to the server 'server_address'. (The underlying connection was closed: An unexpected error occurred on a send.)). This problem occurs because the SRM client plug-in and vSphere Client cannot negotiate cryptography when the SRM client plug-in runs on older versions of Windows. If you run the desktop version of vSphere Client and SRM client plug-in on Windows XP 64-bit or Windows Server 2003 SP2, you might encounter incompatibilities between server and client cryptography support.

    Workaround: Download and install the Microsoft Hotfix from Microsoft KB 948963. This hotfix is not applied in any regular Windows updates so you must manually download and apply the fix.

  • Recovery takes a long time to finish and reprotect fails with error Cannot check login credentials. Authentication service infrastructure failed.

    This error occurs due to the exhaustion of ephemeral ports in vCenter Server running on Windows 2003 server. The SRM Server cannot communicate with vCenter Server.

    Workaround:

    1. Install the Microsoft hotfix from KB 979230 to fix a problem in the tcpip.sys driver.
    2. Set the following regedit values, either by making the changes manually or by importing the following .reg file:

       Windows Registry Editor Version 5.00

       [HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters]
       "MaxUserPort"=dword:00002710
       "TcpTimedWaitDelay"=dword:0000001E
    3. If the registry values do not exist, create them.
    4. Restart the Windows 2003 Server machine after making the changes.
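
    The dword values in the .reg file are hexadecimal. As a quick check, the following Python sketch converts them to decimal. The interpretation in the comments (MaxUserPort as the highest ephemeral port number, TcpTimedWaitDelay as a delay in seconds) is the usual meaning of these Windows TCP/IP parameters and is stated here as an assumption, not taken from these release notes.

```python
# Registry values from the .reg file above, as hexadecimal dword strings.
reg_values = {
    "MaxUserPort": "00002710",        # assumed: highest ephemeral port number
    "TcpTimedWaitDelay": "0000001E",  # assumed: TIME_WAIT delay in seconds
}

# Convert each hexadecimal string to its decimal value.
decoded = {name: int(value, 16) for name, value in reg_values.items()}

for name, value in decoded.items():
    print(f"{name} = {value}")
# MaxUserPort = 10000
# TcpTimedWaitDelay = 30
```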

  • Error in recovery plan when shutting down protected virtual machines: Error - Operation timed out: 900 seconds during Shutdown VMs at Protected Site step.

    If you use SRM to protect datastores on arrays that support dynamic swap, for example Clariion, running a disaster recovery when the protected site is partially down or running a force recovery can lead to errors when re-running the recovery plan to complete protected site operations. One such error occurs when the protected site comes back online, but SRM is unable to shut down the protected virtual machines. This error usually occurs when certain arrays make the protected LUNs read-only, making ESXi unable to complete I/O for powered on protected virtual machines.

    Workaround: Reboot ESXi hosts on the protected site that are affected by read-only LUNs.

  • Generating support bundles on a heavily loaded environment might disrupt ongoing vSphere Replication operations.

    Generating support bundles in heavily loaded environments can cause vSphere Replication connection problems during recovery operations. This specifically occurs if the storage for the vSphere Replication virtual machine is overloaded.

    Workaround: If an operation fails to start when the vSphere Replication server is blocked by generation of the support bundle, attempt to rerun the operation. Re-evaluate the expected storage bandwidth requirements of the cluster, as well as the network bandwidth if the storage is NAS.

  • Rerunning reprotect fails with error: Protection Group '{protectionGroupName}' has protected VMs with placeholders which need to be repaired.

    If a ReloadFromPath operation does not succeed during the first reprotect, the corresponding protected virtual machines enter a repairNeeded state. When SRM runs a reprotect on the protection group, SRM cannot repair the protected virtual machines nor restore the placeholder virtual machines. The error occurs when the first reprotect operation fails for a virtual machine because the corresponding ReloadFromPath operation failed.

    Workaround: Rerun reprotect with the force cleanup option enabled. This option completes the reprotect operation and enables the Recreate placeholder option. Click Recreate placeholder to repair the protected virtual machines and to restore the placeholder virtual machines.

  • Protect virtual machine task appears to remain at 100%.

    The VI Client Recent Tasks pane shows a virtual machine stuck at 100% during the Protect VM task. SRM marks the virtual machine as Configured, indicating that it was protected. You do not need to take action as SRM successfully protected the virtual machine.

  • Cleanup fails if attempted within 10 minutes after restarting recovery site ESXi hosts from maintenance mode.

    The cleanup operation attempts to swap placeholders and relies on the host resiliency cache, which has a 10-minute refresh period. If you attempt a swap operation on ESXi hosts that have been restarted within the 10-minute window, SRM does not update the information in the SRM host resiliency cache, and the swap operation fails. The cleanup operation also fails.

    Workaround: Wait for 10 minutes and attempt cleanup again.

  • SRM stops during an attempt to protect an already reprotected array-based virtual machine using vSphere Replication.

    If you run a recovery, then try to use vSphere Replication to protect a virtual machine already protected by an array-based protection group, SRM Server asserts.

    Workaround: Restart SRM Server and unprotect the array-based protected virtual machine first before protecting with vSphere Replication. Alternatively, continue with array-based protection and do not protect with vSphere Replication. SRM does not support protecting with both providers.

  • Reprotect fails with error: Operation timed out: 3600 seconds VR synchronization failed for VRM group <Unavailable>. Operation timed out: 3600 seconds.

    When you run reprotect, SRM performs an online sync for the replication group, which might cause the operation to time out. The default timeout value is 2 hours.

    Workaround: Increase the timeout value in Advanced Settings in SRM.

  • Cannot configure a virtual machine with physical mode RDM disk even if the disk is excluded from replication.

    If you configure replication for a virtual machine with a physical mode RDM disk, you might see the following error:

    VRM Server generic error. Check the documentation for any troubleshooting information. The detailed exception is: HMS can not set disk UUID for disks of VM : MoRef: type = VirtualMachine, value = , serverGuid = null'.

    Workaround: None.

  • Planned migration fails with Error: Unable to copy the configuration file...

    If there are two ESXi hosts in a cluster and one host loses connectivity to the storage, the other host can usually recover replicated virtual machines. In some cases the other host might not recover the virtual machines and recovery fails with the following error: Error: Unable to copy the configuration file...

    Workaround: Rerun recovery.

  • While reprotecting a virtual machine, the following error might occur during the "Configure protection to reverse direction" step: Error - The operation was only partially completed for the protection group 'pg_name' since a protected VM belonging to it was not successful in completing the operation. VM 'vm_name' is not replicated by VR.

    This error occurs during the second reprotect run if the first run failed with Operation Timed out error during "Configure storage to reverse direction" step.

    Workaround: Manually configure reverse replication for the affected virtual machines and rerun reprotect. For information on reverse replication, see vSphere Replication Administration: Failback of Virtual Machines in vSphere Replication.

  • vSphere Replication cannot access datastores through hosts with multiple management virtual NICs and posts DatastoreInaccessibleEvent in vCenter Server: vSphere Replication cannot access datastore.

    If a host is configured with multiple virtual NICs and you select more than one NIC for management traffic, vSphere Replication registers only the first NIC and uses it to access target datastores. If the vSphere Replication server address is not on the first management network of the host, vSphere Replication does not communicate with the host.

    Workaround: Use a host with a single virtual NIC selected for management traffic for datastores at the secondary site. You can also reconfigure the host networking so that the address of the first management virtual NIC is from a network that vSphere Replication can access.

  • A virtual machine cannot power off due to a pending question error.

    If you create a permanent device loss (PDL) situation, accidentally or deliberately, by dropping an initiator from the SAN to the host where the virtual machine is registered, you might see the following error:

    Error: The operation cannot be allowed at the current time because the VM has a question pending...

    This error occurs if hardware fails on the recovery site during PDL while running a clean up after you ran a recovery plan in test recovery mode.

    Workaround: Answer the question in the virtual machine Summary tab. Then rerun clean up in force clean up mode. After the clean up operation completes, the virtual machine might still exist on the recovery site, in which case, remove it manually.

  • SRM version 5.0 can communicate with upgraded SRM Server version 5.1 while running recovery.

    If you upgrade the recovery site from version 5.0 to version 5.1 and attempt a disaster recovery on the upgraded site, SRM Server version 5.0 on the protected site and SRM Server version 5.1 on the recovery site can communicate with each other and can perform operations on the protected site. If you run a reprotect operation before you upgrade the protected site, the operation runs for a very long time without any progress.

    Before running a recovery on an upgraded site, stop all SRM 5.0 services that are still running on the remote site. Otherwise, SRM Servers with incompatible versions can still communicate with each other.

  • Internal error occurs during recovery.

    SRM retrieves various information from vCenter during the recovery process. If it does not receive critical information required to proceed, an internal error CannotFetchVcObjectProperty can occur. This error might occur when vCenter is under heavy stress or an ESXi host becomes unavailable due to heavy stress. This error might also occur when SRM tries to look up information of an ESXi host that is in a disconnected state or has been removed from vCenter inventory.

    Workaround: Rerun the recovery plan.

  • Virtual machine VNIC's MAC address is usually preserved during recovery.

    Under very rare circumstances, test or recovery might fail to recover a specific virtual machine because vCenter unexpectedly assigns a new MAC address to the virtual machine's VNIC on the recovery site. The error message in the result column in the recovery steps is the following: Error - Cannot complete customization, possibly due to a scripting runtime error or invalid script parameters (Error code: 255). IP settings might have been partially applied. The SRM logs contain a message: Error finding the specified NIC for MAC address = xx::xx:xx:xx:xx where xx::xx:xx:xx:xx is the expected MAC address.

    Workaround: Modify the affected virtual machine's MAC address manually in the vSphere Client virtual machine Properties to "xx::xx:xx:xx:xx" and restart the recovery plan.

  • vSphere Replication reports "Datastore is not accessible" for datastores at a host added to vCenter Server inventory while registering vSphere Replication server.

    vSphere Replication selects all supported hosts from vCenter inventory and enables them as part of vSphere Replication registration. If you add a host to vCenter while vSphere Replication is still being registered, vSphere Replication does not select this host and it cannot access datastores on the recovery site.

    Workaround: Disconnect and reconnect the host in the vCenter inventory for vSphere Replication to enable it.

  • Synchronize virtual machine, recovery, or reprotect operations fail with vSphere Replication generic error: The requested instance with Id=<...> was not found on the remote site.

    Although the operation reports failure, vSphere Replication successfully synchronizes the virtual machine state to the remote site. This error can occur when you request a synchronize operation or when you run main operations such as recovery or reprotect which use this operation.

    Workaround: Rerun the failed operation.

  • Recovered VMFS volume fails to mount with error: Failed to recover datastore.

    This error might occur due to a latency between vCenter, ESXi and SRM Server.

    Workaround: Rerun the recovery plan.

  • vSphere Replication server registration might take a long time depending on the number of hosts in the vCenter Server inventory.

    If the vCenter Server inventory contains a few hundred or more hosts, the Register VR server task takes an hour or more to complete, as vSphere Replication updates each host's SSL thumbprint registry. The vCenter Server Events pane displays Host is configured for vSphere Replication for each host as the vSphere Replication server registration task progresses.

    Workaround: Wait for the registration task to complete. After it finishes, you can use vSphere Replication for incoming replication traffic.

  • vSphere Replication registration might fail with error: VRM server generic error ... Row was updated or deleted by another transaction ... HostEntity #<host-managed-object-id>.

    The Register VR server operation might fail with this error if vCenter Server has a large number of hosts in its inventory and you perform the following actions while registration is in progress:

    • Remove a host from the vCenter Server inventory.
    • Remove and reconnect a host from the inventory.
    • Change the host's SSL thumbprint.

    Workaround: Retry the Register VR server operation.

  • Test recovery, planned migration, or re-protect workflow operations might fail with error: Operation timed out.

    This error can occur when running multiple operations with multiple primary sites.

    Workaround: Re-run the failed operation.

  • A recovery or test workflow fails for a virtual machine with the following message: Error - Unexpected error '3008' when communicating with ESX or guest VM: Cannot connect to the virtual machine.

    Under rare circumstances this error might occur when you configure IP customization or an in-guest callout for the virtual machine and the recovery site cluster is in fully-automated DRS mode. An unexpected vMotion might cause a temporary communication failure with the virtual machine, resulting in the customization script error.

    Workaround: Rerun the recovery plan. If the error persists, configure the recovery site cluster DRS to manual mode and rerun the recovery plan.

  • Some SRM-initiated tasks that fail with a NoPermission error display Internal Error: vim.fault.NoPermission instead of Permission to perform this operation was denied.

    The vSphere Client asserts if a mirrored task contains a MoRef to an object that is not a vCenter Server or SRM object.

    Workaround: If the failed SRM task is a recovery task, consult the recovery task pane for a more specific error. For a vCenter Server task failure, see the subtasks which contain more information.

  • Reprotect operation for multiple virtual machines targeting multiple remote sites fails with Unable to reverse replication for the virtual machine vm_name. Operation timed out.

    vSphere Replication stops responding to SRM requests when reprotecting multiple virtual machines to multiple remote sites.

    Workaround: Change several vSphere Replication parameters:

    1. Stop the vSphere Replication management server: /etc/init.d/hms stop
    2. Edit /opt/vmware/hms/conf/hms-configuration.xml and change hms-db-max-connections from 99 to 500.
    3. Edit /var/lib/vrmsdb/postgresql.conf and change max_connections from 100 to 501.
    4. Restart the embedded vPostgres database:
       /etc/init.d/hms-vpostgres stop
       /etc/init.d/hms-vpostgres start
    5. Change hms-vlsi-server thread pool size:
       /opt/vmware/vpostgres/1.0/bin/psql -U vrmsdb vrmsdb
       update ConfigEntryEntity set configValue='250' where configKey = 'hms-vlsi-server-threadpool-size'
    6. Increase heap for vSphere Replication management server process: edit /etc/init.d/hms and add -Xmx1536M in JAVA_TOOL_OPTIONS.
    7. Start vSphere Replication management server: /etc/init.d/hms start
    8. Rerun the failed operation.
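
    Step 3 can be scripted. The following Python sketch rewrites the max_connections setting in postgresql.conf-style text. It assumes the setting appears as an uncommented line of the form max_connections = 100. The helper name set_conf_value is hypothetical, and the sketch operates on a string so that you can review the result before writing it back to /var/lib/vrmsdb/postgresql.conf.

```python
import re


def set_conf_value(conf_text: str, key: str, value: str) -> str:
    """Replace the value of an uncommented `key = value` line.

    Hypothetical helper: leaves comments and other settings untouched and
    raises ValueError if the key is not found.
    """
    pattern = re.compile(
        rf"^([ \t]*{re.escape(key)}[ \t]*=[ \t]*)(\S+)(.*)$", re.MULTILINE
    )
    # A function replacement avoids ambiguity between group references
    # and a value that starts with digits.
    new_text, count = pattern.subn(
        lambda m: f"{m.group(1)}{value}{m.group(3)}", conf_text
    )
    if count == 0:
        raise ValueError(f"{key} not found")
    return new_text


# Sample text standing in for the real file contents (assumed format).
sample = "# connection settings\nmax_connections = 100\t# default\nport = 5432\n"
updated = set_conf_value(sample, "max_connections", "501")
print(updated)
```

    Back up the original file before replacing it, and apply the change only while the vSphere Replication management server is stopped, as in step 1.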

  • Last Sync Size value for a virtual machine protected by vSphere Replication is the amount of data that has changed since the last synchronization.

    Even if you perform a full synchronization on a virtual machine that vSphere Replication protects, the Last Sync Size value shows the amount of data that has changed since the last synchronization, and not the size of the full virtual machine. This can be misinterpreted as meaning that the synchronization was not complete. After the initial synchronization, during a full synchronization of a virtual machine, vSphere Replication compares entire disks, but only transfers data that has changed, not the entire disk.

    To see the size and duration of the initial synchronization, you can check the Events that vSphere Replication posts to vCenter Server. This issue only occurs on ESXi 5.0.x hosts. This behavior has been clarified on ESXi 5.1 hosts.

  • Recovery or test recovery might fail with the error "No host with hardware version '7' and datastore 'ds_id' which are powered on and not in maintenance mode are available..." in cases in which very recent changes occur in the host inventory.

    SRM Server keeps a cache of the host inventory state. Sometimes when there are recent changes to the inventory, for example if a host becomes inaccessible, is disconnected, or loses its connection to some of the datastores, SRM Server can require up to 15 minutes to update its cache. If SRM Server has the incorrect host inventory state in its cache, a recovery or test recovery might fail.

    Workaround: Wait for 15 minutes before running a recovery if you have made changes to the host inventory. If you observe the error above, wait for 15 minutes then re-run the recovery.
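
    The 15-minute wait can be computed from the time of the last host inventory change. The following Python sketch is a minimal illustration. The function names, and the idea of recording the change time yourself, are assumptions for illustration; these release notes do not describe a way to query the state of the SRM Server cache.

```python
from datetime import datetime, timedelta

# Upper bound on the cache update delay quoted in the text above.
CACHE_REFRESH = timedelta(minutes=15)


def earliest_safe_run(last_inventory_change: datetime) -> datetime:
    """Return the earliest time at which the cache should be up to date."""
    return last_inventory_change + CACHE_REFRESH


def can_run_recovery(last_inventory_change: datetime, now: datetime) -> bool:
    """Return True once the full refresh period has elapsed."""
    return now >= earliest_safe_run(last_inventory_change)


changed = datetime(2013, 11, 19, 10, 0)
print(earliest_safe_run(changed))                                  # 2013-11-19 10:15:00
print(can_run_recovery(changed, datetime(2013, 11, 19, 10, 5)))    # False
print(can_run_recovery(changed, datetime(2013, 11, 19, 10, 20)))   # True
```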

  • Reprotect fails with an error message that contains Unable to communicate with the remote host, since it is disconnected.

    This error might occur if the protected site cluster is configured to use Distributed Power Management (DPM) and one of the ESX hosts required for the operation was put into standby mode. This can happen if DPM detects that the host has been idle and puts it into standby mode. SRM must communicate with the host to access the replicated datastore that the host manages. SRM does not manage the DPM state on the protected site, but it does manage the DPM state during recovery, test, and cleanup on the recovery site.

    Workaround: If the error persists, temporarily turn off DPM and ensure the ESX hosts managing the replicated datastores on the protected side are turned on before attempting to run reprotect.

  • Test recovery cleanup might fail if one of the hosts loses connection to a placeholder datastore.

    If you ran a test recovery on a cluster with two hosts on a recovery site and one of the hosts in the cluster loses connection to a placeholder datastore, cleanup of the test recovery might fail.

    Workaround: Run cleanup in force mode. On the recovery site, manually remove placeholder virtual machines created on the host that lost connection to the placeholder datastore. Remove the virtual machine replication configuration and reconfigure the replication. Reconfigure virtual machine protection from protection group properties.

  • Reprotect fails with an error when running multiple recovery plans concurrently.

    When running multiple recovery plans concurrently, reprotect can fail with the error Error - The operation was only partially completed for the protection group 'protection_group' since a protected VM belonging to it was not successful in completing the operation.

    Workaround: Run the reprotect operation again.

  • After restarting vCenter Server, when using vSphere Replication, reprotect operations fail with Error - Unable to reverse replication for the virtual machine 'virtual_machine'. The session is not authenticated.

    After vCenter Server restarts, it fails to refresh some sessions that SRM uses to communicate with vSphere Replication and causes reprotect to fail.

    Workaround: Restart the SRM services on both the sites.

  • When protection site LUNs encounter All Paths Down (APD) or Permanent Device Loss (PDL), SRM might not recover raw disk mapping (RDM) LUNs in certain cases.

    During the first attempt at planned migration you might see the following error message when SRM attempts to shut down the protected virtual machine:

    Error - The operation cannot be allowed at the current time because the virtual machine has a question pending: 'msg.hbacommon.askonpermanentdeviceloss:The storage backing virtual disk VM1-1.vmdk has permanent device loss. You might be able to hot remove this virtual device from the virtual machine and continue after clicking Retry. Click Cancel to terminate this session.

    If the protected virtual machines have RDM devices, in some cases SRM does not recover the RDM LUN.

    Workaround:

    1. When LUNs enter APD/PDL, ESXi Server marks all corresponding virtual machines with a question that blocks virtual machine operations.
      1. In the case of PDL, click Cancel to power off the virtual machine.
      2. In the case of APD, click Retry.

      If you run planned migration, SRM fails to power off production virtual machines.
    2. If the virtual machines have RDM devices, SRM might lose track of the RDM device and not recover it. Rescan all HBAs and make sure that the status for all of the affected LUNs has returned from the APD/PDL state.
    3. Check the vCenter Server inventory and answer the PDL question that is blocking the virtual machine.
    4. If you answer the PDL question before the LUNs come back online, SRM Server on the protected site incorrectly detects that the RDM device is no longer attached to this virtual machine and removes the RDM device. The next time you run a recovery, SRM does not recover this LUN.
    5. Rescan all HBAs to make sure that all LUNs are online in vCenter Server inventory and power on all affected virtual machines. vCenter Server associates the lost RDMs with protected virtual machines.
    6. Check the Array Managers tab in the SRM interface. If all the protected datastores and RDM devices do not display, click Refresh to discover the devices and recompute the datastore groups.
    7. Make sure that Edit Group Settings shows all of the protected datastores and RDM devices and that the virtual machine protection status does not show any errors.
    8. Start a planned migration to recover all protected LUNs, including the RDM devices.

  • Recovery fails with Error creating test bubble image for group ... The detailed exception is Error while getting host mounts for datastore:managed-object-id... or The object has already been deleted or has not been completely created.

    If you run a test recovery or a planned recovery and the recovery plan fails with the specific exception, the LUN used for storing replication data has been temporarily disconnected from ESXi. When reconnected, replication continues as normal and no replication data is lost. The exception occurs during these scenarios:

    • vSphere Replication cannot locate the LUN as the LUN has changed its internal ID.
    • The target datastore internal ID changes when the host containing the target datastore is removed from vCenter inventory and later added.

    You must manually reconfigure the replication to refresh the new ID.

    Workaround: If the primary site is no longer available, contact VMware Support for instructions about adding a special configuration entry in the vSphere Replication appliance database that triggers an automatic fix of the changed internal datastore ID to allow recovery. If the primary site is still available:

    1. Run a cleanup operation on the recovery plan that failed.
    2. In the Virtual Machines tab of the vSphere Replication view, right-click a virtual machine and select Configure Replication.
    3. Click Next, and click Browse to change the location of the files on the datastore that has been disconnected and then reconnected, and select the same datastore and folder locations as before.
    4. Reuse the existing disks and reconfigure the replication of the virtual machine. The vSphere Replication management server picks up the changed datastore identity (managed object ID) in vCenter Server.
    5. Wait for the initial sync to finish. This sync uses existing disks and checks for data consistency.

  • Running the SRM installer in Modify mode from the command line with the CUSTOM_SETUP option results in an error.

    If you installed SRM by using the CUSTOM_SETUP option, for example to create a shared recovery site setup, attempting to run the SRM installer in Modify mode from the command line with the CUSTOM_SETUP option results in the error CUSTOM_SETUP command line not supported when standard installation already exists.

    Workaround: Use Windows control panel to start the SRM installer in Modify mode.

  • SRM stops unexpectedly during planned migration if ESXi Server is disconnected from vCenter Server on the protected site.

    If the ESXi Server on the protected site is disconnected from vCenter Server or if it loses its connection to vCenter Server due to a problem, SRM stops unexpectedly if you attempt to perform a planned migration. The planned migration fails with an error.

    Workaround: Reconnect the ESXi Server.