VMware vCenter Site Recovery Manager 5.1.1 Release Notes

VMware vCenter Site Recovery Manager 5.1.1 | 25 APR 2013 | Build 1082082

Last updated: 21 OCT 2013

Check for additions and updates to these release notes.

What's in the Release Notes

These release notes cover the following topics:

What's New in SRM 5.1.1

VMware vCenter Site Recovery Manager 5.1.1 adds the bug fixes described in Resolved Issues.

Localization

VMware vCenter Site Recovery Manager 5.1.1 is available in the following languages:

  • English
  • French
  • German
  • Japanese
  • Korean
  • Simplified Chinese

Compatibility

SRM Compatibility Matrix

For interoperability and product compatibility information, including supported guest operating systems and support for guest operating system customization, see the Compatibility Matrixes for VMware vCenter Site Recovery Manager 5.1.

Compatible Storage Arrays and Storage Replication Adapters

For the current list of supported compatible storage arrays and SRAs, see the Site Recovery Manager Storage Partner Compatibility Guide.

VMware VSA Support

SRM 5.1.1 can protect virtual machines that reside on the vSphere Storage Appliance (VSA) by using vSphere Replication. VSA does not require a Storage Replication Adapter (SRA) to work with SRM 5.1.1.

Installation and Upgrade

For an evaluation guide to assist with a technical walkthrough of major features and capabilities of Site Recovery Manager 5.1.1, see the VMware vCenter Site Recovery Manager Resources for Business Continuity.

For the supported upgrade paths for SRM, see the VMware Product Interoperability Matrixes and select Solution Upgrade Path and VMware vCenter Site Recovery Manager.

Install SRM 5.1.1

To create a new installation of SRM 5.1.1, download and run the installer VMware-srm-5.1.1-1082082.exe.

See Installing SRM in Site Recovery Manager 5.1 Installation and Configuration.

Upgrade an Existing SRM 4.1.2 Installation to SRM 5.1.1

Upgrade SRM 4.1.2 to SRM 5.0.2 before you upgrade to SRM 5.1.1.

See Upgrading SRM in the Site Recovery Manager 5.0 Administration Guide.

IMPORTANT: Upgrading vCenter Server directly from 4.1.2 to 5.1 or 5.1u1 is a supported upgrade path. However, upgrading SRM directly from 4.1.2 to 5.1.1 is not a supported upgrade path. When upgrading a vCenter Server 4.1.2 instance that includes an SRM 4.1.2 installation, you must upgrade vCenter Server to version 5.0 or 5.0u1 before you upgrade SRM to 5.0.2. If you upgrade vCenter Server directly from 4.1.2 to 5.1 or 5.1u1, the subsequent attempt to upgrade SRM from 4.1.2 to 5.0.2 fails, because SRM 5.0.2 cannot connect to a vCenter Server 5.1 instance.

Upgrade an Existing SRM 5.0.2 Installation to SRM 5.1.1

To upgrade an existing SRM 5.0.2 installation to SRM 5.1.1, download and run the installer VMware-srm-5.1.1-1082082.exe.

See Upgrading SRM in Site Recovery Manager 5.1 Installation and Configuration.

Upgrade an Existing SRM 5.1 or 5.1.0.1 Installation to SRM 5.1.1

Perform the following steps to upgrade an existing SRM 5.1 or 5.1.0.1 installation to SRM 5.1.1.

  1. Log into the machine on which you are running SRM Server on the protected site.
  2. Back up the SRM database using the tools that your database software provides.
  3. Download and run the installer VMware-srm-5.1.1-1082082.exe.
  4. Click Yes when prompted for confirmation that you want to upgrade SRM.
  5. Click Yes to confirm that you have backed up the SRM database.
  6. Click Finish when the installation completes.
  7. Repeat the upgrade process on the recovery site.

After you have upgraded SRM Server, you must reinstall the SRM client plug-in.

  1. Log into a machine on which you are running a vSphere Client instance that you use to connect to SRM.
  2. Uninstall the SRM 5.1 client plug-in.
  3. Log into a vSphere Client instance and connect to the vCenter Server to which SRM Server is connected.
  4. Select Plug-ins > Manage Plug-ins.
  5. Click Download and Install to install the SRM 5.1.1 client plug-in.
  6. When the plug-in installation completes, log into SRM and verify that the configuration from the previous version has been retained.
  7. Repeat the process for all vSphere Client instances that you use to connect to SRM Server.
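
The SRM upgrade paths described in the preceding sections can be summarized in a small lookup. The following Python sketch is illustrative only: the table and the function name srm_upgrade_path are hypothetical helpers, not part of SRM, and the table contains only the version pairs named in these release notes.

```python
# Direct SRM upgrades named in these release notes (installed version -> next version).
# This table is an illustration, not an exhaustive compatibility matrix.
DIRECT_SRM_UPGRADES = {
    "4.1.2": "5.0.2",
    "5.0.2": "5.1.1",
    "5.1": "5.1.1",
    "5.1.0.1": "5.1.1",
}


def srm_upgrade_path(installed, target="5.1.1"):
    """Return the chain of SRM versions to install, starting from `installed`."""
    path = [installed]
    while path[-1] != target:
        next_version = DIRECT_SRM_UPGRADES.get(path[-1])
        if next_version is None:
            raise ValueError(
                f"no upgrade path from SRM {installed} to {target} in these notes"
            )
        path.append(next_version)
    return path
```

For example, calling srm_upgrade_path("4.1.2") walks through the intermediate 5.0.2 release rather than jumping directly to 5.1.1. For authoritative upgrade paths, consult the VMware Product Interoperability Matrixes.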

Upgrade vSphere Replication to vSphere Replication 5.1.1

If you have installed vSphere Replication with a previous release and you upgrade to SRM 5.1.1, you must also upgrade vSphere Replication to version 5.1.1. You must also upgrade vSphere Replication servers to version 5.1.1.

See Upgrade vSphere Replication in Site Recovery Manager Installation and Configuration.

To upgrade the vSphere Replication appliance and vSphere Replication server to version 5.1.1 via the virtual appliance management interface (VAMI), use the following URL:

http://vapp-updates.vmware.com/vai-catalog/valm/vmw/05d561bc-f3c8-4115-bd9d-22baf13f7178/5.1.1.0

IMPORTANT: Do not select the option in Update > Settings in the VAMI to automatically update vSphere Replication. If you select automatic updates, VAMI updates vSphere Replication to the latest 5.x version, which is incompatible with SRM and vCenter Server 5.1.x. Leave the update setting set to No automatic updates.

Operational Limits for SRM and vSphere Replication

For the operational limits of SRM 5.1.x and vSphere Replication 5.1.x, see http://kb.vmware.com/kb/2034768.

SRM SDKs

For a guide to using the SRM SOAP-based API, see VMware vCenter Site Recovery Manager API.

Open Source Components

The copyright statements and licenses applicable to the open source software components distributed in Site Recovery Manager 5.1.1 are available at Download VMware vCenter Site Recovery Manager. You can also download the source files for any GPL, LGPL, or other similar licenses that require the source code or modifications to source code to be made available for the most recent generally available release of vCenter Site Recovery Manager.

Caveats and Limitations

  • Interoperability with Storage vMotion and Storage DRS
    Due to some specific and limited cases where recoverability can be compromised during storage movement, Site Recovery Manager 5.1.1 is not supported for use with Storage vMotion (SVmotion) and is not supported for use with the Storage Distributed Resource Scheduler (SDRS) including the use of datastore clusters.

  • Interoperability with vCloud Director
    Site Recovery Manager 5.1.1 offers limited support for vCloud Director environments. Using SRM to protect virtual machines within vCloud resource pools (virtual machines deployed to an Organization) is not supported. Using SRM to protect the management structure of vCD is supported. For information about how to use SRM to protect the vCD Server instances, vCenter Server instances, and databases that provide the management infrastructure for vCloud Director, see VMware vCloud Director Infrastructure Resiliency Case Study.

  • Interoperability with vSphere Replication
    vSphere Replication supports a maximum disk size of 2032GB.

  • SRM 5.1.1 supports 2 Microsoft Cluster Server (MSCS) nodes
    vSphere 5.1.x supports up to 5 MSCS nodes. SRM 5.1.x supports 2 MSCS nodes. See Protecting MSCS and Fault Tolerant Virtual Machines in Site Recovery Manager Administration.

  • vSphere Replication appliance and vSphere Replication server appliances are subject to Novell Security Advisory CVE-2008-5161
    Novell Security Advisory CVE-2008-5161 relates to the SUSE Linux Enterprise Server (SLES) SP1, the operating system for the vSphere Replication appliance and vSphere Replication server appliances. Novell states in the advisory document at http://support.novell.com/security/cve/CVE-2008-5161.html that the security risks are low. If necessary, to further mitigate the security risks, you can follow the Novell advisory to modify your SSH configurations. For the vSphere Replication appliance and vSphere Replication server appliances, you can retain just the AES ciphers by adding the following directive in the sshd_config and ssh_config files:

      Ciphers aes128-ctr,aes256-ctr

    Disable RC4 and all CBC ciphers, including arcfour256, arcfour, aes128-cbc, and aes256-cbc.
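
    If you script the change to the SSH configuration files, the edit might look like the following Python sketch. It is a minimal illustration under stated assumptions: set_ciphers is a hypothetical helper, it works on the text of a configuration file rather than on the file itself, and it places the directive at the top of the file so that it applies globally. Back up the original files and restart the SSH service yourself.

```python
def set_ciphers(config_text, ciphers=("aes128-ctr", "aes256-ctr")):
    """Return config text with a single Ciphers directive listing `ciphers`.

    Existing Ciphers lines are dropped (the keyword is matched
    case-insensitively); commented-out lines are left untouched.
    """
    directive = "Ciphers " + ",".join(ciphers)
    kept = []
    for line in config_text.splitlines():
        words = line.split()
        if words and words[0].lower() == "ciphers":
            continue  # drop any existing Ciphers directive
        kept.append(line)
    # Put the directive first so it sits outside any Host or Match block.
    return "\n".join([directive] + kept) + "\n"
```

    Apply the function to the contents of both sshd_config and ssh_config, then write the results back to the files.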

  • Recovering a Windows Server 2012 Domain Controller (DC) virtual machine breaks the DC safeguard mechanism when using vSphere Replication.
    Recovery of a Windows Server 2012 DC virtual machine is possible by using array-based replication. However, when performing recovery by using vSphere Replication in a recovery plan that you run from the SRM UI, vSphere Replication does not remove the vm.genid and vm.genidX parameters from the VMX file. Performing recovery by using vSphere Replication from the vSphere Web Client succeeds.

Resolved Issues

The following issues from previous releases have been resolved in this release.

  • Test recovery, planned migration, or re-protect workflow operations might fail with error: Operation timed out.

    This error can occur when running multiple operations with multiple primary sites. This has been fixed.

  • Reprotect operation for multiple virtual machines targeting multiple remote sites fails with Unable to reverse replication for the virtual machine vm_name. Operation timed out.

    vSphere Replication stops responding to SRM requests when reprotecting multiple virtual machines to multiple remote sites. This has been fixed.

  • IP customization fails when testing a recovery plan with a name in Japanese.

    If the name of a recovery plan uses Japanese characters, and if you configure IP customization on a Windows virtual machine that is running in the Japanese locale, the customization step fails with a scripting error. This has been fixed.

  • Duplicate volumes appear in the Devices tab in the Array Managers view in the SRM UI.

    This problem occurs when the target number of a LUN is the same as the source number of another LUN. This UI problem has been fixed.

  • Performing a recovery by using the vSphere Replication interface fails with the error Processing recovered virtual machine ... configuration file failed ... Http request failed: org.apache.http.conn.HttpHostConnectException: Connection to https://vCenter_Server_hostname refused.

    When vCenter Server is installed with a custom port for HTTPS, for example port 444, recovery fails when trying to update the replicated VMX file. This has been fixed.

  • Configuring replication in advanced mode when the target location is on an NFS datastore results in the error message Select virtual disk format.

    This has been fixed.

  • Rescans of datastores on recovery site fail due to storage devices not being ready.

    SRAs can send responses to SRM before a promoted storage device on the recovery site has become available to the ESXi hosts. When SRM receives a response from an SRA, it performs a rescan of the storage devices. If the storage devices are not fully available yet, ESXi Server does not detect them and SRM does not find the replicated devices when it performs rescans. Datastores are not created and recovered virtual machines cannot be found.

    Workaround: If you experience problems with unavailable datastores, SRM 5.1.1 provides a new setting to allow you to delay the start of rescans after an SRA promotes a storage device.

    1. Right-click an SRM site and select Advanced Settings.
    2. Click storageProvider.
    3. Set the storageProvider.hostRescanDelaySec parameter to delay the start of storage rescans by a number of seconds. A value from 20 to 180 is reasonable.
    4. Restart the SRM service.

    NOTE: In previous releases, you might have used the storageProvider.hostRescanRepeatCnt parameter to introduce a delay in recoveries. Use the new storageProvider.hostRescanDelaySec parameter instead.
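
    If you manage the setting through your own tooling, a simple range check can catch values outside the range suggested above. The following Python sketch is an assumption-laden illustration: is_reasonable_rescan_delay is a hypothetical helper, the bounds are taken from the text above, and the code does not read or write any SRM setting.

```python
# Range that these release notes describe as reasonable for
# storageProvider.hostRescanDelaySec (inclusive bounds, in seconds).
MIN_DELAY_SEC = 20
MAX_DELAY_SEC = 180


def is_reasonable_rescan_delay(seconds):
    """Return True if `seconds` is a whole number within the suggested range."""
    return isinstance(seconds, int) and MIN_DELAY_SEC <= seconds <= MAX_DELAY_SEC
```

    Values outside this range are not necessarily invalid; the check only flags them for review.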

  • SRM stops unexpectedly during planned migration if ESXi Server is disconnected from vCenter Server on the protected site.

    If the ESXi Server on the protected site is disconnected from vCenter Server or if it loses its connection to vCenter Server due to a problem, SRM stops unexpectedly if you attempt to perform a planned migration. This has been fixed so that SRM does not stop when you run a planned migration. Instead, the planned migration fails with an error.

    Workaround:

    1. Remove the ESXi Server from vCenter Server inventory.
      If the host was powered off intentionally or was put into maintenance mode, it should not have any virtual machines running on it. In this case, only resource pools and folders are lost. If the host was part of a cluster, resource pools and folders are recreated when you restore the host.
    2. Restore the host and reconnect it to vCenter Server.
    3. Run a Disaster Recovery to migrate the virtual machines to the recovery site.

  • Timeouts occur with the error "Cannot find replicated datastore due to timeout of HBA rescan operation".

    This has been fixed to improve detection of the error and the error message for timeouts during host rescan operations.

  • Installing or upgrading to SRM 5.1 using an imported certificate fails.

    If you attempt to install SRM 5.1 or upgrade to SRM 5.1 using an imported PKCS12 certificate rather than an auto-generated certificate, the installer runs to completion but then fails with the error Failed to install certificate. This has been fixed.

  • SRM Server on the recovery site stops unexpectedly during cleanup of an array-based recovery plan test.

    When running a cleanup operation after testing an array-based recovery plan, SRM Server on the recovery site stops unexpectedly. The logs contain the error Panic: Win32 exception: Access Violation. This has been fixed.

  • Custom recovery steps do not stop virtual machines from powering on before IP customization.

    If you insert a custom recovery step after the "create writable storage snapshot" step in a test recovery or after the "change recovery site storage to writable" step in a real recovery, and if you have configured the virtual machine for IP customization, the recovery plan does not wait for the custom recovery step to complete before powering on the virtual machine. This has been fixed.

  • Pairing SRM Servers fails when using custom certificates with VCVA.

    Pairing SRM Servers when using custom certificates for SRM Server and the vCenter Server virtual appliance fails with the error: Permission to perform this operation was denied. This has been fixed.

  • If SRM stops unexpectedly while testing a recovery plan, SRM stops again when you attempt to rerun the test.

    SRM stopping unexpectedly when testing a recovery plan results in SRM always stopping when you attempt to rerun the plan. This is due to an assertion check on the state of a virtual machine, which, as the result of the prematurely terminated test recovery, is in an invalid state. This has been fixed.

  • Performing reprotect by using vSphere Replication fails with an authentication error.

    Performing reprotect by using vSphere Replication completes to approximately 89%, waits, then fails with the error runHbrReprotect: com.vmware.vim.binding.vim.fault.NotAuthenticated. The virtual machines on the secondary site remain in an invalid state. This has been fixed.

  • SRM fails during recovery while preparing virtual machines for migration on the protected site.

    If a protected virtual machine is on a datastore that is shared between multiple hosts, and if those hosts are in different datacenters, SRM can fail during a recovery at the step Prepare Protected Site VMs for Migration. The logs show the message SRM Panic: Assert Failed: "ok" @ path/deactivateStorage.cpp:1170. This has been fixed.

Known Issues

The following known issues have been discovered through rigorous testing and will help you understand some behavior you might encounter in this release.

  • Virtual machine VNIC's MAC address is usually preserved during recovery.

    Under very rare circumstances, a test or recovery might fail to recover a specific virtual machine because vCenter unexpectedly assigns a new MAC address to the virtual machine's VNIC on the recovery site. The error message in the result column in the recovery steps is the following: Error - Cannot complete customization, possibly due to a scripting runtime error or invalid script parameters (Error code: 255). IP settings might have been partially applied. The SRM logs contain a message: Error finding the specified NIC for MAC address = xx:xx:xx:xx:xx:xx where xx:xx:xx:xx:xx:xx is the expected MAC address.

    Workaround: Modify the affected virtual machine's MAC address manually in the vSphere Client virtual machine Properties to "xx:xx:xx:xx:xx:xx" and restart the recovery plan.
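
    When you compare the expected MAC address from the SRM logs with the address shown in the vSphere Client, differences in case or separators can hide a match. The following Python sketch normalizes both values before comparing them. The helper names normalize_mac and mac_changed are hypothetical, and the code does not query vCenter Server; you supply both addresses yourself.

```python
import re


def normalize_mac(mac):
    """Return a MAC address as lowercase, colon-separated hex pairs."""
    digits = re.sub(r"[^0-9A-Fa-f]", "", mac)  # strip ':', '-', '.' and spaces
    if len(digits) != 12:
        raise ValueError(f"not a 48-bit MAC address: {mac!r}")
    digits = digits.lower()
    return ":".join(digits[i:i + 2] for i in range(0, 12, 2))


def mac_changed(expected, actual):
    """Return True if the two addresses differ after normalization."""
    return normalize_mac(expected) != normalize_mac(actual)
```

    If mac_changed returns True for the protected and recovered virtual machine, the workaround above applies.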

  • SRM Might Encounter Errors Mounting Datastores During Recoveries

    During a test recovery or actual failover, SRM waits for recovered datastores to become available. After datastores become available, SRM attempts to mount any datastores that are not mounted. In rare instances, these datastores are automatically mounted before SRM can mount them. If this occurs during a test failover, the failover does not complete. If this occurs during an actual recovery, the recovery completes with an error. To resolve this issue, retry the recovery.

  • Temporary Loss of vCenter Server Connections Might Create Recovery Problems for Virtual Machines with Raw Disk Mappings

    If the connection to the vCenter Server is lost during a recovery, one of the following might occur:

    • If the vCenter Server remains unavailable, the recovery fails. To resolve this issue, re-establish the connection with the vCenter Server and re-run the recovery.
    • In rare cases, the vCenter Server becomes available again and the virtual machine is recovered. In such a case, if the virtual machine has raw disk mappings (RDMs), the RDMs might not be mapped properly. As a result of the failure to properly map RDMs, it might not be possible to power on the virtual machine or errors related to the guest operating system or applications running on the guest operating system might occur.
      • If this is a test recovery, complete a cleanup operation and run the test again.
      • If this is an actual recovery, you must manually attach the correct RDM to the recovered virtual machine.

    Refer to the vSphere documentation about editing virtual machine settings for more information on adding raw disk mappings.

  • Cancellation of Recovery Plan Not Completed

    When a recovery plan is run, an attempt is made to synchronize virtual machines. It is possible to cancel the recovery plan, but attempts to cancel the recovery plan run do not complete until the synchronization either completes or expires. The default expiration is 60 minutes. The following options can be used to complete cancellation of the recovery plan:

    • Pause vSphere Replication, causing synchronization to fail. After recovery enters an error state, use the vSphere Client to restart vSphere Replication in the vSphere Replication tab. After replication is restarted, the recovery plan can be run again, if desired.
    • Wait for synchronization to complete or time out. This might take considerable time, but does eventually finish. After synchronization finishes or expires, cancellation of the recovery plan continues.

  • Non-ASCII Passwords Not Accepted For Log In To Virtual Appliance Management Infrastructure (VAMI)

    Users can manage the vSphere Replication appliance using VAMI. Attempts to log on to VAMI with an account whose password contains non-ASCII characters fail, even when the correct authentication information is provided. This issue occurs in all cases where non-ASCII passwords are used with VAMI. To avoid this issue, use ASCII passwords or connect using SSH.
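
    To check in advance whether a password is affected, you can test it for non-ASCII characters. The following Python sketch uses the standard str.isascii method (Python 3.7 or later); the function name is a hypothetical helper, and the code does not contact VAMI.

```python
def is_ascii_password(password):
    """Return True if every character in `password` is an ASCII character."""
    return password.isascii()
```

    A password for which the function returns False is one that, according to the description above, VAMI does not accept.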

  • Stopping Datastore Replication for Protected Virtual Machines Produces Incorrect Error Messages

    It is possible to protect a virtual machine that has disks on multiple datastores and then subsequently disable replication for one of the datastores. In such a case, the virtual machine's status in the protection group changes to Invalid: Virtual machine 'VM' is no longer protected. Internal error: Cannot create locator for disk'2001'... This information is incorrect. The status should change to Datastore '[datastore name]' is no longer replicated.

  • Virtual Machine Recovery Fails Due to Disk Configuration Error

    It is possible to place different disks and configuration files for a single protected virtual machine on multiple datastores. During recovery, SRM must have access to raw disk mapping and parent disk files. Without this access, SRM cannot determine disk types during recovery. In such a case, SRM might assume that a Raw Disk Mapping (RDM) disk is a non-RDM disk, resulting in a failed reconfiguration. To avoid this issue, ensure all hosts that can access recovered virtual machine configuration files can also access RDM mapping files and any parent disks, if such disks exist.

  • Recovery Fails to Progress After Connection to Protected Site Fails

    If the protection site becomes unreachable during a deactivate operation or during RemoteOnlineSync or RemotePostReprotectCleanup, both of which occur during reprotect, then the recovery plan might fail to progress. In such a case, the system waits for the virtual machines or groups that were part of the protection site to complete those interrupted tasks. If this issue occurs during a reprotect operation, you must reconnect the original protection site and then cancel and restart the recovery plan. If this issue occurs during a recovery, it is sufficient to cancel and restart the recovery plan.

  • vSphere Replication Appliance Fails to Support Valid ESX Hosts

    During vSphere Replication configuration, when a datastore is being selected on a supported version of ESX, the message VR server Server Name has no hosts through which to access destination datastore ... appears. This occurs when adding a new host to vCenter Server or during registration of vSphere Replication server, if there is a temporary interruption of communication between the vSphere Replication appliance and the vSphere Replication server. Communication problems typically arise due to temporary loss of connectivity or to the server services being stopped.

    To resolve this issue, restart the vSphere Replication management server service.

    1. Log into the virtual appliance management interface (VAMI) of the vSphere Replication appliance at https://vr_appliance_address:5480.
    2. Click Configuration > Restart under Service Status.

  • Datastores Fail to Unmount When on Distributed Power Management (DPM) Enabled Clusters

    Planned migrations and disaster recoveries fail to unmount datastores from hosts that are attached to a DPM cluster if the host enters standby mode. The error Error: Cannot unmount datastore datastorename from host hostname. Unable to communicate with the remote host, since it is disconnected might appear. To resolve this issue, turn off DPM at the protected site before completing planned migrations or disaster recoveries. You can choose to turn DPM back on after completing recovery tasks.

  • SRM fails to recover virtual machines after RDM failures.

    Raw Disk Mapping (RDM) LUNs might fail while LUNs that back datastores are unaffected. In such a case, SRM cannot recover virtual machines with RDMs.

    Workaround: Recover affected virtual machines manually. Fail over the RDM LUN and reattach it as an RDM disk on the recovered virtual machine.

  • Error in recovery plan when shutting down protected virtual machines: Error - Operation timed out: 900 seconds during Shutdown VMs at Protected Site step.

    If you use SRM to protect datastores on arrays that support dynamic swap, for example Clariion, running a disaster recovery when the protected site is partially down or running a force recovery can lead to errors when re-running the recovery plan to complete protected site operations. One such error occurs when the protected site comes back online, but SRM is unable to shut down the protected virtual machines. This error usually occurs when certain arrays make the protected LUNs read-only, making ESXi unable to complete I/O for powered on protected virtual machines.

    Workaround: Reboot ESXi hosts on the protected site that are affected by read-only LUNs.

  • Protect virtual machine task appears to remain at 100%.

    The VI Client Recent Tasks pane shows a virtual machine stuck at 100% during the Protect VM task. SRM marks the virtual machine as Configured, indicating that it was protected. You do not need to take action as SRM successfully protected the virtual machine.

  • Cleanup fails if attempted within 10 minutes after restarting recovery site ESXi hosts from maintenance mode.

    The cleanup operation attempts to swap placeholders and relies on the host resiliency cache, which has a 10-minute refresh period. If you attempt a swap operation on ESXi hosts that have been restarted within the 10-minute window, SRM does not update the information in the SRM host resiliency cache, and the swap operation fails. The cleanup operation also fails.

    Workaround: Wait for 10 minutes and attempt cleanup again.
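
    If you automate cleanup, you can compute the earliest time at which to attempt it from the time the hosts were restarted. The following Python sketch assumes that you record the restart time yourself; the names CACHE_REFRESH, earliest_cleanup, and can_run_cleanup are hypothetical, and the 10-minute value is taken from the description above.

```python
from datetime import datetime, timedelta

# Refresh period of the host resiliency cache, per the description above.
CACHE_REFRESH = timedelta(minutes=10)


def earliest_cleanup(host_restart_time):
    """Return the earliest time at which cleanup should be attempted."""
    return host_restart_time + CACHE_REFRESH


def can_run_cleanup(host_restart_time, now):
    """Return True once the refresh period has elapsed since the restart."""
    return now >= earliest_cleanup(host_restart_time)
```

    When several hosts were restarted, use the most recent restart time.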

  • SRM stops during an attempt to protect an already reprotected array-based virtual machine using vSphere Replication.

    If you run a recovery, then try to use vSphere Replication to protect a virtual machine already protected by an array-based protection group, SRM Server asserts.

    Workaround: Restart SRM Server and unprotect the array-based protected virtual machine before protecting it with vSphere Replication. Alternatively, continue with array-based protection and do not protect with vSphere Replication. SRM does not support protecting with both providers.

  • Cannot configure a virtual machine with physical mode RDM disk even if the disk is excluded from replication.

    If you configure replication for a virtual machine with a physical mode RDM disk, you might see the following error:

    VRM Server generic error. Check the documentation for any troubleshooting information. The detailed exception is: HMS can not set disk UUID for disks of VM : MoRef: type = VirtualMachine, value = , serverGuid = null'.

    Workaround: None.

  • Planned migration fails with Error: Unable to copy the configuration file...

    If there are two ESXi hosts in a cluster and one host loses connectivity to the storage, the other host can usually recover replicated virtual machines. In some cases the other host might not recover the virtual machines and recovery fails with the following error: Error: Unable to copy the configuration file...

    Workaround: Rerun recovery.

  • While reprotecting a virtual machine, the following error might occur during the "Configure protection to reverse direction" step: Error - The operation was only partially completed for the protection group 'pg_name' since a protected VM belonging to it was not successful in completing the operation. VM 'vm_name' is not replicated by VR.

    This error occurs during the second reprotect run if the first run failed with an Operation timed out error during the "Configure storage to reverse direction" step.

    Workaround: Manually configure reverse replication for the affected virtual machines and rerun reprotect. For information on reverse replication, see vSphere Replication Administration: Failback of Virtual Machines in vSphere Replication.

  • Internal error occurs during recovery.

    SRM retrieves various information from vCenter during the recovery process. If it does not receive critical information required to proceed, an internal error CannotFetchVcObjectProperty can occur. This error might occur when vCenter is under heavy stress or an ESXi host becomes unavailable due to heavy stress. This error might also occur when SRM tries to look up information of an ESXi host that is in a disconnected state or has been removed from vCenter inventory.

    Workaround: Rerun the recovery plan.

  • Recovered VMFS volume fails to mount with error: Failed to recover datastore.

    This error might occur due to latency between vCenter, ESXi, and SRM Server.

    Workaround: Rerun the recovery plan.

  • A recovery or test workflow fails for a virtual machine with the following message: Error - Unexpected error '3008' when communicating with ESX or guest VM: Cannot connect to the virtual machine.

    Under rare circumstances this error might occur when you configure IP customization or an in-guest callout for the virtual machine and the recovery site cluster is in fully-automated DRS mode. An unexpected vMotion might cause a temporary communication failure with the virtual machine, resulting in the customization script error.

    Workaround: Rerun the recovery plan. If the error persists, configure the recovery site cluster DRS to manual mode and rerun the recovery plan.

  • Some SRM-initiated tasks that fail with a NoPermission error display Internal Error: vim.fault.NoPermission instead of Permission to perform this operation was denied.

    The vSphere Client asserts if a mirrored task contains a MoRef to an object that is not a vCenter Server or SRM object.

    Workaround: If the failed SRM task is a recovery task, consult the recovery task pane for a more specific error. For a vCenter Server task failure, see the subtasks which contain more information.

  • Reprotect fails with an error message that contains Unable to communicate with the remote host, since it is disconnected.

    This error might occur if the protected site cluster is configured to use Distributed Power Management (DPM) and one of the ESX hosts required for the operation was put into standby mode. This can happen if DPM detects that the host is idle and puts it into standby mode. SRM must communicate with the host to access the replicated datastore that the host manages. SRM does not manage the DPM state on the protected site but does manage the DPM state during recovery, test, and cleanup on the recovery site.

    Workaround: If the error persists, temporarily turn off DPM and ensure the ESX hosts managing the replicated datastores on the protected side are turned on before attempting to run reprotect.

  • When protection site LUNs encounter All Paths Down (APD) or Permanent Device Loss (PDL), SRM might not recover raw disk mapping (RDM) LUNs in certain cases.

    During the first attempt at planned migration you might see the following error message when SRM attempts to shut down the protected virtual machine:

    Error - The operation cannot be allowed at the current time because the virtual machine has a question pending: 'msg.hbacommon.askonpermanentdeviceloss:The storage backing virtual disk VM1-1.vmdk has permanent device loss. You might be able to hot remove this virtual device from the virtual machine and continue after clicking Retry. Click Cancel to terminate this session.

    If the protected virtual machines have RDM devices, in some cases SRM does not recover the RDM LUN.

    Workaround:

    1. When LUNs enter APD/PDL, ESXi Server marks all corresponding virtual machines with a question that blocks virtual machine operations.
      1. In the case of PDL, click Cancel to power off the virtual machine.
      2. In the case of APD, click Retry.

      If you run planned migration, SRM fails to power off production virtual machines.
    2. If the virtual machines have RDM devices, SRM might lose track of the RDM device and not recover it. Rescan all HBAs and make sure that the status for all of the affected LUNs has returned from the APD/PDL state.
    3. Check the vCenter Server inventory and answer the PDL question that is blocking the virtual machine.
    4. If you answer the PDL question before the LUNs come back online, SRM Server on the protected site incorrectly detects that the RDM device is no longer attached to this virtual machine and removes the RDM device. The next time you run a recovery, SRM does not recover this LUN.
    5. Rescan all HBAs to make sure that all LUNs are online in vCenter Server inventory and power on all affected virtual machines. vCenter Server associates the lost RDMs with protected virtual machines.
    6. Check the Array Managers tab in the SRM interface. If all the protected datastores and RDM devices do not display, click Refresh to discover the devices and recompute the datastore groups.
    7. Make sure that Edit Group Settings shows all of the protected datastores and RDM devices and that the virtual machine protection status does not show any errors.
    8. Start a planned migration to recover all protected LUNs, including the RDM devices.
  • After restarting vCenter Server, when using vSphere Replication, reprotect operations fail with Error - Unable to reverse replication for the virtual machine 'virtual_machine'. The session is not authenticated.

    After vCenter Server restarts, it fails to refresh some sessions that SRM uses to communicate with vSphere Replication and causes reprotect to fail.

    Workaround: Restart the SRM services on both the sites.

  • Test recovery cleanup might fail if one of the hosts loses connection to a placeholder datastore.

    If you ran a test recovery on a cluster with two hosts on a recovery site and one of the hosts in the cluster loses connection to a placeholder datastore, cleanup of the test recovery might fail.

    Workaround: Run cleanup in force mode. On the recovery site, manually remove placeholder virtual machines created on the host that lost connection to the placeholder datastore. Remove the virtual machine replication configuration and reconfigure the replication. Reconfigure virtual machine protection from protection group properties.

  • vSphere Replication reports "Datastore is not accessible" for datastores at a host added to vCenter Server inventory while registering vSphere Replication server.

    vSphere Replication selects all supported hosts from vCenter inventory and enables them as part of vSphere Replication registration. If you add a host to vCenter while vSphere Replication is still being registered, vSphere Replication does not select this host and it cannot access datastores on the recovery site.

    Workaround: Disconnect and reconnect the host in the vCenter inventory for vSphere Replication to enable it.

  • vSphere Replication server registration might take a long time depending on the number of hosts in the vCenter Server inventory.

    If the vCenter Server inventory contains a few hundred or more hosts, the Register VR server task takes an hour or more to complete, as vSphere Replication updates each host's SSL thumbprint registry. The vCenter Server Events pane displays Host is configured for vSphere Replication for each host as the vSphere Replication server registration task progresses.

    Workaround: Wait for the registration task to complete. After it finishes, you can use vSphere Replication for incoming replication traffic.

  • vSphere Replication registration might fail with error: VRM server generic error ... Row was updated or deleted by another transaction ... HostEntity #<host-managed-object-id>.

    The Register VR server operation might fail with this error if vCenter Server has a large number of hosts in its inventory and you perform the following actions while registration is in progress:

    • Remove a host from the vCenter Server inventory.
    • Remove and reconnect a host from the inventory.
    • Change the host's SSL thumbprint.

    Workaround: Retry the Register VR server operation.

  • Recovery fails with Error creating test bubble image for group ... The detailed exception is Error while getting host mounts for datastore:managed-object-id... or The object has already been deleted or has not been completely created.

    If you run a test recovery or a planned recovery and the recovery plan fails with the specific exception, the LUN used for storing replication data has been temporarily disconnected from ESXi. When reconnected, replication continues as normal and no replication data is lost. The exception occurs during these scenarios:

    • vSphere Replication cannot locate the LUN as the LUN has changed its internal ID.
    • The target datastore internal ID changes when the host containing the target datastore is removed from vCenter inventory and later added.

    You must manually reconfigure the replication to refresh the new ID.

    Workaround: If the primary site is no longer available, contact VMware Support for instructions about adding a special configuration entry in the vSphere Replication appliance database that triggers an automatic fix of the changed internal datastore ID to allow recovery. If the primary site is still available:

    1. Run a cleanup operation on the recovery plan that failed.
    2. In the Virtual Machines tab of the vSphere Replication view, right-click a virtual machine and select Configure Replication.
    3. Click Next, and click Browse to change the location of the files on the datastore that has been disconnected and then reconnected, and select the same datastore and folder locations as before.
    4. Reuse the existing disks and reconfigure the replication of the virtual machine. The vSphere Replication management server picks up the changed datastore identity (managed object ID) in vCenter Server.
    5. Wait for the initial sync to finish. This sync uses existing disks and checks for data consistency.

  • Including a percent (%) symbol in a folder name on the recovery site creates a new folder during replication.

    If you include a percent (%) symbol in the folder name on the recovery site and try to configure replication to that folder, the replication might be created in an incorrect folder with additional encoding. For example, if you create the folder %3dTest, vSphere Replication creates a new folder %253dTest and places the replication in this folder.
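
    The folder name in the example is what results when the percent sign itself is percent-encoded as %25. The following minimal Python sketch reproduces that name with the standard library's urllib.parse.quote. The helper name encode_folder_name is hypothetical, and the assumption that vSphere Replication applies an equivalent encoding internally is an inference from the example, not something these notes state.

```python
from urllib.parse import quote, unquote

def encode_folder_name(name: str) -> str:
    """Percent-encode a folder name; a literal '%' becomes '%25'."""
    return quote(name, safe="")

original = "%3dTest"
encoded = encode_folder_name(original)

print(encoded)           # %253dTest
print(unquote(encoded))  # %3dTest (decoding once restores the original name)
```

    Decoding the resulting name once gives back the folder name that you originally typed, which is a quick way to recognize folders created by this issue.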

  • Context-sensitive help is not accessible in Internet Explorer 7.

    See KB 1009801.

  • Generating support bundles on a heavily loaded environment might disrupt ongoing vSphere Replication operations.

    Generating support bundles in heavily loaded environments can cause vSphere Replication connection problems during recovery operations. This specifically occurs if the storage for the vSphere Replication virtual machine is overloaded.

    Workaround: If an operation fails to start when the vSphere Replication server is blocked by generation of the support bundle, attempt to rerun the operation. Re-evaluate the expected storage bandwidth requirements of the cluster, as well as the network bandwidth if the storage is NAS.

  • Last Sync Size value for a virtual machine protected by vSphere Replication is the amount of data that has changed since the last synchronization.

    Even if you perform a full synchronization on a virtual machine that vSphere Replication protects, the Last Sync Size value shows the amount of data that has changed since the last synchronization, and not the size of the full virtual machine. This can be misinterpreted as meaning that the synchronization was not complete. After the initial synchronization, during a full synchronization of a virtual machine, vSphere Replication compares entire disks, but only transfers data that has changed, not the entire disk.

    To see the size and duration of the initial synchronization, you can check the Events that vSphere Replication posts to vCenter Server. This issue only occurs on ESXi 5.0.x hosts. This behavior has been clarified on ESXi 5.1 hosts.

  • Reprotect fails with an error when running multiple recovery plans concurrently.

    When running multiple recovery plans concurrently, reprotect can fail with the error Error - The operation was only partially completed for the protection group 'protection_group' since a protected VM belonging to it was not successful in completing the operation.

    Workaround: Run the reprotect operation again.

  • Recovery or test recovery might fail with the error "No host with hardware version '7' and datastore 'ds_id' which are powered on and not in maintenance mode are available..." in cases in which very recent changes occur in the host inventory.

    SRM Server keeps a cache of the host inventory state. Sometimes when there are recent changes to the inventory, for example if a host becomes inaccessible, is disconnected, or loses its connection to some of the datastores, SRM Server can require up to 15 minutes to update its cache. If SRM Server has the incorrect host inventory state in its cache, a recovery or test recovery might fail.

    Workaround: Wait for 15 minutes before running a recovery if you have made changes to the host inventory. If you observe the error above, wait for 15 minutes then re-run the recovery.

  • Reprotect fails with error: Operation timed out: 7200 seconds VR synchronization failed for VRM group <Unavailable>. Operation timed out: 7200 seconds.

    When you run reprotect, SRM performs an online sync for the replication group which might time out the operation. The default timeout value is 2 hours.

    Workaround: Increase the timeout value in Advanced Settings in SRM.
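
    To judge how far to raise the timeout, it can help to estimate how long the online sync might take. The following Python sketch is only a back-of-the-envelope calculation. The function name, the 500 GB of changed data, and the 400 Mbit/s effective throughput are hypothetical values chosen for illustration, not figures from SRM.

```python
DEFAULT_TIMEOUT_S = 7200  # 2 hours, the default value reported in the error

def estimated_sync_seconds(changed_gb: float, throughput_mbit_s: float) -> float:
    """Rough transfer time for changed_gb (decimal gigabytes) at a given throughput."""
    bits = changed_gb * 8 * 1000**3
    return bits / (throughput_mbit_s * 1000**2)

# Hypothetical workload: 500 GB of changes over a 400 Mbit/s effective link.
needed = estimated_sync_seconds(changed_gb=500, throughput_mbit_s=400)
print(f"estimated sync time: {needed:.0f} s")
if needed > DEFAULT_TIMEOUT_S:
    print("estimate exceeds the default timeout")
else:
    print("estimate fits within the default timeout")
```

    With these assumed numbers the estimate is 10000 seconds, which is longer than the 7200-second default, so the reprotect sync would be expected to time out unless the value is increased.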

  • Recovery takes a long time to finish and reprotect fails with error Cannot check login credentials. Authentication service infrastructure failed.

    This error occurs due to the exhaustion of ephemeral ports in vCenter Server running on Windows 2003 server. The SRM Server cannot communicate with vCenter Server.

    Workaround:

    1. Install the Microsoft hotfix from KB 979230 to fix a problem in the tcpip.sys driver.
    2. Set the following registry values, either by making the changes manually or by importing the following .reg file:
      Windows Registry Editor Version 5.00
      
      [HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters]
      "MaxUserPort"=dword:00002710
      "TcpTimedWaitDelay"=dword:0000001E
    3. If the registry values do not exist, create them.
    4. Restart the Windows 2003 Server machine after making the changes.
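
    The dword values in the .reg file above are hexadecimal. The following Python sketch, with a hypothetical helper named parse_dword, decodes them to decimal so that they can be compared with what Registry Editor shows when you enter the values manually.

```python
import re
from typing import Tuple

# The two value lines from the .reg file in step 2.
REG_LINES = [
    '"MaxUserPort"=dword:00002710',
    '"TcpTimedWaitDelay"=dword:0000001E',
]

def parse_dword(line: str) -> Tuple[str, int]:
    """Return (name, decimal value) for a line of the form "Name"=dword:XXXXXXXX."""
    match = re.fullmatch(r'"([^"]+)"=dword:([0-9A-Fa-f]{8})', line.strip())
    if match is None:
        raise ValueError(f"not a dword entry: {line!r}")
    return match.group(1), int(match.group(2), 16)

values = dict(parse_dword(line) for line in REG_LINES)
print(values)  # {'MaxUserPort': 10000, 'TcpTimedWaitDelay': 30}
```

    In decimal, MaxUserPort is 10000 (the highest port number used for ephemeral ports) and TcpTimedWaitDelay is 30 (the number of seconds a closed connection holds its port).
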
  • vSphere Replication appliance status is Disconnected when running the SRM client plug-in on Windows XP or Windows 2003.

    The status of the vSphere Replication appliance shows as Disconnected in the Summary tab for a vSphere Replication site. Attempting to reconfigure the connection results in the error Lost connection to local VRMS server at server_address:8043. (The client could not send a complete request to the server 'server_address'. (The underlying connection was closed: An unexpected error occurred on a send.)). This problem occurs because the SRM client plug-in and vSphere Client cannot negotiate cryptography when the SRM client plug-in runs on older versions of Windows. If you run the desktop version of vSphere Client and SRM client plug-in on Windows XP 64-bit or Windows Server 2003 SP2, you might encounter incompatibilities between server and client cryptography support.

    Workaround: Download and install the Microsoft Hotfix from Microsoft KB 948963. This hotfix is not applied in any regular Windows updates so you must manually download and apply the fix.

  • Synchronize virtual machine, recovery, or reprotect operations fail with vSphere Replication.

    This error can occur when you request a synchronize operation or when you run operations such as recovery or reprotect. The errors reported are similar to the following:

    • VR synchronization failed for VRM group group. VRM Server generic error. Please check the documentation for any troubleshooting information. The detailed exception is: 'The requested instance with Id=ID was not found on the remote site.'.
    • Error - VR synchronization failed for VRM group group. Storage is locked for datastore path '[path] *.vmdk.vmdk'.

    This error is more likely to occur when running virtual machines with a high workload on the protected site.

    Workarounds:

    • Retry the operation. This might not succeed.
    • As this issue is related to the load running on the protected site, schedule the recoveries for times outside of business hours.
    • If attempting a test recovery, do not enable the "Replicate recent changes to recovery site" option.
    • Upgrade to SRM 5.5. SRM 5.5, in conjunction with vSphere 5.5, includes a number of updates that resolve this issue.
  • Pairing Sites Fails Due to Different Certificate Trust Methods

    When pairing SRM sites, the error Local and Remote servers are using different certificate trust methods appears. This occurs when the root certificate for the Certificate Authority (CA) signing the certificate is missing on SRM Server. To resolve this issue, install the root certificate for the SRM certificate's signing Certificate Authority using Microsoft Management Console. After installing the certificate, perform an SRM installation Modify operation to provide the user-generated certificate again.

  • Outdated Replication Status Displayed if Datastore Becomes Unavailable

    It is possible that after virtual machine synchronization begins, the target datastore becomes unavailable. In such a case, the group status should display information about this failure, but the status remains unchanged. To identify issues related to datastore unavailability, use the events generated by the target datastore. The following events are generated in such a case:

    • Datastore is not accessible for VR Server... Generated immediately after datastore becomes inaccessible
    • Virtual machine vSphere Replication RPO is violated... Replica cannot be generated within the specified RPO

  • Generic Error Message Is Displayed When Server Pairing Fails Due to Certificate Policy Strictness

    Attempts to pair servers between sites might fail, displaying the following error message: Site pairing or break operation failed. Details: VRM Server generic error. This error might occur when one site is configured to use a strict certificate policy and the other site is configured to use a lenient certificate policy. In such a case, the pairing should fail, as it does. After such a failure, modify the lenient certificate policy to use a strict certificate policy and provide a valid certificate.

  • Rerunning reprotect fails with error: Protection Group '{protectionGroupName}' has protected VMs with placeholders which need to be repaired.

    If a ReloadFromPath operation does not succeed during the first reprotect, the corresponding protected virtual machines enter a repairNeeded state. When SRM runs a reprotect on the protection group, SRM cannot repair the protected virtual machines nor restore the placeholder virtual machines. The error occurs when the first reprotect operation fails for a virtual machine because the corresponding ReloadFromPath operation failed.

    Workaround: Rerun reprotect with the force cleanup option enabled. This option completes the reprotect operation and enables the Recreate placeholder option. Click Recreate placeholder to repair the protected virtual machines and to restore the placeholder virtual machines.

  • vSphere Replication cannot access datastores through hosts with multiple management virtual NICs and posts DatastoreInaccessibleEvent in vCenter Server: vSphere Replication cannot access datastore.

    If a host is configured with multiple virtual NICs and you select more than one NIC for management traffic, vSphere Replication registers only the first NIC and uses it to access target datastores. If the vSphere Replication server address is not on the first management network of the host, vSphere Replication does not communicate with the host.

    Workaround: Use a host with a single virtual NIC selected for management traffic for datastores at the secondary site. You can also reconfigure the host networking so that the address of the first management virtual NIC is from a network that vSphere Replication can access.
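
    Before reconfiguring host networking, you can check whether the vSphere Replication server address falls within the network of the host's first management virtual NIC. The following Python sketch uses the standard ipaddress module. The function name and all addresses are hypothetical, and the same-subnet test is a simplification: a routed network might also be reachable.

```python
import ipaddress

def in_first_nic_network(first_mgmt_nic: str, vr_server: str) -> bool:
    """True if vr_server lies in the subnet of the first management NIC.

    first_mgmt_nic is given in address/prefix form, for example "10.10.1.15/24".
    """
    network = ipaddress.ip_interface(first_mgmt_nic).network
    return ipaddress.ip_address(vr_server) in network

# Hypothetical addresses for illustration only.
print(in_first_nic_network("10.10.1.15/24", "10.10.1.40"))  # True
print(in_first_nic_network("10.10.1.15/24", "10.20.5.40"))  # False
```

    A False result for the first management NIC suggests that the host matches the situation described above, even if a second management NIC is on the correct network.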

  • A virtual machine cannot power off due to a pending question error.

    If you create a permanent device loss (PDL) situation, accidentally or deliberately, by dropping an initiator from the SAN to the host where the virtual machine is registered, you might see the following error:

    Error: The operation cannot be allowed at the current time because the VM has a question pending...

    This error occurs if hardware fails on the recovery site during PDL while running a clean up after you ran a recovery plan in test recovery mode.

    Workaround: Answer the question in the virtual machine Summary tab. Then rerun clean up in force clean up mode. After the clean up operation completes, the virtual machine might still exist on the recovery site, in which case, remove it manually.

  • vSphere Replication appliance on a shared recovery site shows as disconnected in the SRM UI during reprotect.

    During reprotect, after configuring replications in the reverse direction, SRM and the vSphere Replication appliance wait for the initial syncs to complete in the reverse direction. If you have more than 100 virtual machines in the recovery plan, the monitoring tasks of the initial syncs cause the vSphere Replication appliance to become unresponsive to calls from the SRM UI.

    Workaround: Wait for the initial sync of the reversed replications to complete. The replication status in the SRM UI eventually changes from Sync to OK. Re-try connecting from the SRM UI to the vSphere Replication appliance on the shared site.

  • Logging out of the SRM C# client during reprotect causes the sync step of a reprotect operation to fail with the error Error getting local VC view.

    During reprotect, after configuring replications in the reverse direction, SRM and the vSphere Replication appliance wait for the initial syncs to complete in the reverse direction. SRM triggers a separate sync operation to verify that there are no issues with the replication. If you log out of the SRM UI during this sync operation, the reprotect operation continues, but vSphere Replication cannot impersonate the user and the sync operation fails.

    Workaround: Ignore the error shown in the sync step of the reprotect workflow, or use the SRM UI to manually trigger a sync after the reprotect is complete.

  • Modifying or repairing an SRM Server installation requires user Administrator or disabling of UAC.

    If you are a member of the Administrators group but you are not the Administrator user, you must disable Windows User Account Control (UAC) before you attempt to modify or repair an SRM Server installation from the Windows control panel. If you have installed SRM Server on a Windows Server 2012 host, you disable UAC by modifying the registry.

    Workaround: Log in to the Windows Server 2012 machine as Administrator when you run the SRM installer in Modify or Repair mode, or disable User Account Control (UAC). To disable UAC on Windows Server 2012, see http://social.technet.microsoft.com/wiki/contents/articles/13953.windows-server-2012-deactivating-uac.aspx.

  • Running the SRM installer in Modify mode from the command line with the CUSTOM_SETUP option results in an error.

    If you installed SRM by using the CUSTOM_SETUP option, for example to create a shared recovery site setup, attempting to run the SRM installer in Modify mode from the command line with the CUSTOM_SETUP option results in the error CUSTOM_SETUP command line not supported when standard installation already exists.

    Workaround: Use Windows control panel to start the SRM installer in Modify mode.

  • SRM stops unexpectedly with a panic during a test recovery. You can restart the SRM service but it stops again when you continue the test.

    On rare occasions, SRM stops unexpectedly during test recovery with the error Panic: Assert Failed: "this->_childJobs.size() == this->_jobsToRemove.size()" (Invalid state, job cannot complete while child jobs are pending.). Attempts to rerun the test after restarting the SRM service result in SRM stopping with the same error. This issue occurs if the SRM database is in an incorrect state.

    Workaround: Contact VMware support.

  • Performing simultaneous test recoveries on a shared recovery site setup shows a duplicate name error in the Recent Tasks view on the recovery site.

    Performing multiple simultaneous test recoveries in a shared recovery site setup (N:1 environment) can report the following errors in different circumstances:

    • The specified key, name, or identifier already exists, when adding a virtual switch or port group.
    • The object or item referred to could not be found, when removing a virtual switch or port group during test cleanup.
    • The resource 'resource_name' is in use, when removing a virtual switch or port group during test cleanup.

    This problem is due to naming collisions.

    Workaround: Re-run test recoveries or cleanup operations sequentially on the same vCenter Server instance. Run one workflow at a time for all registered SRM Server instances on the same vCenter Server instance. This issue does not occur in all shared recovery site environments and it is not usually necessary to run only one operation at a time. Naming collisions are relatively rare.

  • Reprotect fails after removing the disconnected host on the protected site.

    If you remove the disconnected host from the protected site and then run reprotect, the reprotect operation might fail with the error Internal error: std::exception 'class Vmacore::Exception'.

    Workaround: Rerun Reprotect with the Force Cleanup option selected.