VMware vCenter Site Recovery Manager 5.1.3.x Release Notes

VMware vCenter Site Recovery Manager 5.1.3.1a | 07 APR 2016

VMware vCenter Site Recovery Manager 5.1.3.1 | 15 OCT 2015 | Build 3013964

VMware vCenter Site Recovery Manager 5.1.3 | 04 DEC 2014 | Build 2318417

Last updated: 07 APR 2016

Check for additions and updates to these release notes.

For information about the Site Recovery Manager 5.1.3.x patch releases, including details of any required vSphere Replication patches, see the corresponding knowledge base articles.

What's in the Release Notes

These release notes cover the following topics:

  • What's New in SRM 5.1.3
  • Localization
  • Compatibility
  • Installation and Upgrade
  • Operational Limits for SRM and vSphere Replication
  • SRM SDKs
  • Open Source Components
  • Caveats and Limitations
  • Resolved Issues
  • Known Issues

What's New in SRM 5.1.3

VMware vCenter Site Recovery Manager 5.1.3 adds the following improvements.

  • Support for the following databases:
    • SQL Server 2014
    • Oracle 12C
  • IP customization support for the following guest operating systems:
    • CentOS 5.10
    • CentOS 6.5
    • Oracle Linux 6.5
    • Oracle Linux 5.10
    • Ubuntu 14.04
    • Red Hat Enterprise Linux 7.0
    • SUSE Linux Enterprise 12 SP0
  • Bug fixes described in Resolved Issues.

Localization

VMware vCenter Site Recovery Manager 5.1.3 is available in the following languages:

  • English
  • French
  • German
  • Japanese
  • Korean
  • Simplified Chinese

Compatibility

SRM Compatibility Matrix

For interoperability and product compatibility information, including supported guest operating systems and support for guest operating system customization, see the Compatibility Matrixes for VMware vCenter Site Recovery Manager 5.1.

Compatible Storage Arrays and Storage Replication Adapters

For the current list of compatible storage arrays and supported SRAs, see the Site Recovery Manager Storage Partner Compatibility Guide.

VMware VSA Support

SRM 5.1.3 can protect virtual machines that reside on the vSphere Storage Appliance (VSA) by using vSphere Replication. VSA does not require a Storage Replication Adapter (SRA) to work with SRM 5.1.3.

Installation and Upgrade

For an evaluation guide to assist with a technical walkthrough of major features and capabilities of Site Recovery Manager 5.1.3, see the VMware vCenter Site Recovery Manager Resources.

For the supported upgrade paths for SRM, see the VMware Product Interoperability Matrixes and select Solution Upgrade Path and VMware vCenter Site Recovery Manager.

Install SRM 5.1.3

To create a new installation of SRM 5.1.3, download and run the installer VMware-srm-5.1.3-2318417.exe.

See Installing SRM in Site Recovery Manager 5.1 Installation and Configuration.

Upgrade an Existing SRM 4.1.2 Installation to SRM 5.1.3

Upgrade SRM 4.1.2 to SRM 5.0.x before you upgrade to SRM 5.1.3.

See Upgrading SRM in the Site Recovery Manager 5.0 Administration Guide.

IMPORTANT: Upgrading vCenter Server directly from 4.1.2 to 5.1.x is a supported upgrade path. However, upgrading SRM directly from 4.1.2 to 5.1.3 is not, and you must upgrade to SRM 5.0.x before you can upgrade to SRM 5.1.3. Consequently, when you upgrade a vCenter Server 4.1.2 instance that includes an SRM 4.1.2 installation, you must upgrade vCenter Server to version 5.0.x, not directly to 5.1.x, before you upgrade SRM to 5.0.x. If you upgrade vCenter Server from 4.1.2 to 5.1.x directly, the subsequent SRM upgrade from 4.1.2 to 5.0.x fails, because SRM 5.0.x cannot connect to a vCenter Server 5.1.x instance.
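In summary, upgrade vCenter Server before SRM at each step:

  vCenter Server: 4.1.2 -> 5.0.x -> 5.1.x
  SRM:            4.1.2 -> 5.0.x -> 5.1.3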

Upgrade an Existing SRM 5.0.x Installation to SRM 5.1.3

To upgrade an existing SRM 5.0.x installation to SRM 5.1.3, download and run the installer VMware-srm-5.1.3-2318417.exe.

See Upgrading SRM in Site Recovery Manager 5.1 Installation and Configuration.

Upgrade an Existing SRM 5.1.x Installation to SRM 5.1.3

Perform the following steps to upgrade an existing SRM 5.1.x installation to SRM 5.1.3.

  1. Log into the machine on which you are running SRM Server on the protected site.
  2. Back up the SRM database using the tools that your database software provides.
  3. Download and run the installer VMware-srm-5.1.3-2318417.exe.
  4. Click Yes when prompted for confirmation that you want to upgrade SRM.
  5. Click Yes to confirm that you have backed up the SRM database.
  6. Click Finish when the installation completes.
  7. Repeat the upgrade process on the recovery site.

After you have upgraded SRM Server, you must reinstall the SRM client plug-in.

  1. Log into a machine on which you are running a vSphere Client instance that you use to connect to SRM.
  2. Uninstall the SRM 5.1 client plug-in.
  3. Log into a vSphere Client instance and connect to the vCenter Server to which SRM Server is connected.
  4. Select Plug-ins > Manage Plug-ins.
  5. Click Download and Install to install the SRM 5.1.3 client plug-in.
  6. When the plug-in installation completes, log into SRM and verify that the configuration from the previous version has been retained.
  7. Repeat the process for all vSphere Client instances that you use to connect to SRM Server.

Upgrade vSphere Replication to vSphere Replication 5.1.3

If you installed vSphere Replication with a previous release and you upgrade to SRM 5.1.3, you must also upgrade the vSphere Replication appliance and any vSphere Replication servers to version 5.1.3. Make sure that you have upgraded to SRM 5.1.3 before you upgrade vSphere Replication to version 5.1.3.

See Upgrade vSphere Replication in Site Recovery Manager Installation and Configuration.

To upgrade the vSphere Replication appliance and any additional vSphere Replication servers to version 5.1.3 via the virtual appliance management interface (VAMI), paste the following URL into the upgrade page of the VAMI:

https://vapp-updates.vmware.com/vai-catalog/valm/vmw/05d561bc-f3c8-4115-bd9d-22baf13f7178/5.1.3.0.latest

IMPORTANT: Do not select the option in Update > Settings in the VAMI to automatically update vSphere Replication. If you select automatic updates, VAMI updates vSphere Replication to the latest 5.x version, which might be incompatible with SRM and vCenter Server 5.1.x. Leave the update setting set to No automatic updates.

Operational Limits for SRM and vSphere Replication

For the operational limits of SRM 5.1.x and vSphere Replication 5.1.x, see http://kb.vmware.com/kb/2034768.

SRM SDKs

For a guide to using the SRM SOAP-based API, see VMware vCenter Site Recovery Manager API.

Open Source Components

The copyright statements and licenses applicable to the open source software components distributed in Site Recovery Manager 5.1.3 are available at Download VMware vCenter Site Recovery Manager. You can also download the source files for any GPL, LGPL, or other similar licenses that require the source code or modifications to source code to be made available for the most recent generally available release of vCenter Site Recovery Manager.

Caveats and Limitations

  • Interoperability with Storage vMotion and Storage DRS
    Due to some specific and limited cases where recoverability can be compromised during storage movement, Site Recovery Manager 5.1.3 is not supported for use with Storage vMotion (SVmotion) and is not supported for use with the Storage Distributed Resource Scheduler (SDRS) including the use of datastore clusters.

  • Interoperability with vCloud Director
    Site Recovery Manager 5.1.3 offers limited support for vCloud Director (vCD) environments. Using SRM to protect virtual machines within vCloud resource pools (virtual machines deployed to an Organization) is not supported. Using SRM to protect the management structure of vCD is supported. For information about how to use SRM to protect the vCD Server instances, vCenter Server instances, and databases that provide the management infrastructure for vCloud Director, see VMware vCloud Director Infrastructure Resiliency Case Study.

  • Interoperability with vSphere Replication
    vSphere Replication supports a maximum disk size of 2032 GB.

  • SRM 5.1.3 supports 2 Microsoft Cluster Server (MSCS) nodes
    vSphere 5.1.x supports up to 5 MSCS nodes. SRM 5.1.x supports 2 MSCS nodes. See Protecting MSCS and Fault Tolerant Virtual Machines in Site Recovery Manager Administration.

  • vSphere Replication appliance and vSphere Replication server appliances are subject to Novell Security Advisory CVE-2008-5161
    Novell Security Advisory CVE-2008-5161 relates to SUSE Linux Enterprise Server (SLES) SP1, the operating system for the vSphere Replication appliance and vSphere Replication server appliances. Novell states in the advisory document at http://support.novell.com/security/cve/CVE-2008-5161.html that the security risks are low. If necessary, to further mitigate the security risks, you can follow the Novell advisory to modify your SSH configurations. For the vSphere Replication appliance and vSphere Replication server appliances, you can retain just the AES ciphers by adding the following directive to the sshd_config and ssh_config files:

      Ciphers aes128-ctr,aes256-ctr

    Disable RC4 and all other CBC ciphers, including arcfour256, arcfour, aes128-cbc, and aes256-cbc.
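
    For example, you can append the directive to both files on each appliance and then restart the SSH daemon so that the reduced cipher list takes effect. This is a minimal sketch that assumes the standard SLES file locations and init script; verify the paths and service name on your appliance version:

      # append to /etc/ssh/sshd_config and /etc/ssh/ssh_config
      Ciphers aes128-ctr,aes256-ctr

      # restart the SSH daemon to apply the new cipher list
      service sshd restart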

  • Windows Server 2003 does not support SHA256RSA certificates. To install SRM 5.1.3 on Windows Server 2003 and use custom, signed SHA256RSA certificates, you must first install a Microsoft hotfix that adds SHA-2 support to Windows Server 2003, such as the hotfixes that Microsoft Knowledge Base articles 938397 and 968730 describe.

  • ESXi Server 5.0.0 and 5.0.1 and ESXi Server 5.1.0 do not support guest operating system customization of virtual machines that run Windows 8 or Windows Server 2012
    Support for guest operating system customization of Windows 8 and Windows Server 2012 virtual machines is available with ESXi Server 5.0 U2 and later and with ESXi Server 5.1 U1 and later. If you run ESXi Server 5.0.0, 5.0.1, or 5.1.0 on the recovery site, test recoveries and real recoveries fail because the ESXi Server on the recovery site cannot complete the Windows 8 or Windows Server 2012 boot process. The virtual machine on the recovery site does not display the Windows login screen. SRM displays an error message in the recovery steps stating that VMware Tools timed out during power on, before the IP customization step.

    Workaround: Upgrade ESXi Server on the recovery site to version 5.0 U2 or later or to version 5.1 U1 or later.

  • The vSphere Replication appliance does not need the rpcbind service, so it is disabled in this release.

Resolved Issues

The following issues from previous releases have been resolved in this release.

  • NEW The Site Recovery Manager 5.1.3.1a patch release fixes a vulnerability in the glibc library that allows remote code execution

    Your vSphere Replication appliance might be affected by a vulnerability in the glibc library that allows remote code execution.

    For more information about how to install the Site Recovery Manager 5.1.3.1a patch release, see http://kb.vmware.com/kb/2144834.

  • Incorrect per-CPU license counts.

    Site Recovery Manager used per-CPU licenses in the 1.0.x and 4.0.x releases. It was possible for fewer per-CPU licenses to be granted than SRM 5.1 requires. This has been fixed.

  • IPv6 guest OS customization passes extra parameters, resulting in customization failure.

    Running recovery with IPv6 guest customization results in an error while enumerating IPv6 addresses to be removed. An erroneous %value is appended to the address, leading to the error Invalid address parameter (IPv6_address%value). It should be a valid IPv6 address. This has been fixed.

  • The SQL Server 2014 ODBC driver, ODBC Driver 11 for SQL Server, is now supported.
  • Upgraded OpenSSL Library

    The OpenSSL library has been upgraded from 0.9.8 to 0.9.8zb and from 1.0.0 to 1.0.0n. See the OpenSSL security advisory at https://www.openssl.org/news/secadv_20140806.txt for information about the issues that these updates resolve.

  • Embedded ActivePerl implementation uses an incorrect version of OpenSSL.

    The ActivePerl implementation embedded in Site Recovery Manager previously used an incorrect version of OpenSSL. This has been fixed and Site Recovery Manager now uses ActivePerl 5.14.4, which includes OpenSSL 1.0.1h.

  • Setting Reserve all guest memory on a virtual machine results in the virtual machine on the recovery site still being managed by Site Recovery Manager after a recovery.

    If you set Reserve all guest memory (All locked) on a virtual machine, the virtual machine on the recovery site retains the flag This entity is managed by solution VMware vCenter Site Recovery Manager Extension. After a recovery, the virtual machine on the recovery site should not be flagged as managed by Site Recovery Manager. This has been fixed.

  • Site Recovery Manager fails to collect logs from guest OS during guest OS customization.

    SRM fails to collect logs from the guest OS because of an extra trailing space in an environment variable in the IP customization script. Attempts to access the logs result in the error Error accessing guestcust.log at error code: 1 in the log output file. This has been fixed.

  • Site Recovery Manager stops unexpectedly after upgrading from 5.0 to 5.1.x

    Site Recovery Manager might stop unexpectedly after upgrading from 5.0 to 5.1.x, generating a core dump with the error Panic: TerminateHandler called. This can occur if ProviderDetails in the ProtectionGroupPeer table in the Site Recovery Manager database contains messages of the type hms.fault.datastore.StorageFault. This has been fixed.

  • Test recovery fails with error: Error creating test bubble image from group instance. The detailed exception is: 'java.net.SocketTimeoutException: Read timed out'.

    The default timeout value of the HTTP connection between the vSphere Replication appliance and the vSphere Replication server is 20 minutes. The error occurs if you trigger a test recovery while a pruning task is in progress; pruning can take more than 20 minutes if the disk is very large.

    Workaround: On a freshly installed setup, increase the value of hms-default-vlsi-client-timeout in the table configentryentity. The timeout value is in milliseconds. On an upgraded setup, add a record for hms-default-vlsi-client-timeout to the table configentryentity, as sketched below. Restart vSphere Replication after the change.
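
    For example, when connected to the vSphere Replication database, statements similar to the following set a 60-minute timeout (3600000 milliseconds). This is a sketch only: the column names name and value, and the 60-minute figure, are assumptions, so verify the configentryentity schema and choose a value appropriate for your environment before running anything:

      -- freshly installed setup: the entry exists, so raise its value
      UPDATE configentryentity SET value = '3600000'
        WHERE name = 'hms-default-vlsi-client-timeout';

      -- upgraded setup: the entry is missing and must be inserted;
      -- the table might require additional columns, so check its definition first
      INSERT INTO configentryentity (name, value)
        VALUES ('hms-default-vlsi-client-timeout', '3600000');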

  • vSphere Replication cannot create an image for a replication instance because another image already exists.

    VRMS does not receive a response from the vSphere Replication server and assumes that the operation has failed. Subsequent attempts to create a test image fail because the image is already present in the vSphere Replication server database. The server incorrectly reports a NullPointerException because the image is not present in the VRMS database, and vSphere Replication throws an InstanceHasImage fault.

    This problem has been fixed in this release.

  • Recovery with vSphere Replication fails when there is an inactive datastore at the secondary site. This datastore can be any datastore other than the one to which the virtual machine is failing over, and the server throws a NullPointerException.

    This problem has been fixed and the server no longer throws the exception.

  • If datastore or datacenter names contain unexpected characters, such as "&", recovery with vSphere Replication fails.

    This problem has been fixed and recovery is successful.

  • Replication fails due to error "Unable to determine the controller type for disk '[datastore name] VM Name/VM Name.vmdk'. Disk file not found."

    vSphere Replication ignored the controller type of a virtual RDM disk and returned a static value. This error no longer occurs.

Known Issues

The following known issues have been discovered through rigorous testing and will help you understand some behavior you might encounter in this release.

  • Running recovery on multiple LUNs simultaneously results in errors and timeouts.

    If you have a large-scale SRM 5.1.x environment that involves between 50 and 255 Fibre Channel LUNs, and you run recovery on more than 50 LUNs simultaneously, you might notice recovery timeouts, errors, and failures related to the LUNs and sometimes to the virtual machines. In some cases, you might have to run the recovery plan multiple times before it succeeds. This occurs whether you are protecting the LUNs in a single recovery plan or in multiple recovery plans.

    Workaround: See KB 2059498.

  • Virtual Machine Recovery Fails Due to Disk Configuration Error

    It is possible to place different disks and configuration files for a single protected virtual machine on multiple datastores. During recovery, SRM must have access to raw disk mapping and parent disk files. Without this access, SRM cannot determine disk types during recovery. In such a case, SRM might assume that a Raw Disk Mapping (RDM) disk is a non-RDM disk, resulting in a failed reconfiguration. To avoid this issue, ensure all hosts that can access recovered virtual machine configuration files can also access RDM mapping files and any parent disks, if such disks exist.

  • Cannot configure a virtual machine with a physical mode RDM disk even if the disk is excluded from replication.

    If you configure replication for a virtual machine with a physical mode RDM disk, you might see the following error:

    VRM Server generic error. Check the documentation for any troubleshooting information. The detailed exception is: HMS can not set disk UUID for disks of VM : MoRef: type = VirtualMachine, value = , serverGuid = null'.

    Workaround: None.

  • Temporary Loss of vCenter Server Connections Might Create Recovery Problems for Virtual Machines with Raw Disk Mappings

    If the connection to the vCenter Server is lost during a recovery, one of the following might occur:

    • If the vCenter Server remains unavailable, the recovery fails. To resolve this issue, re-establish the connection with the vCenter Server and rerun the recovery.
    • In rare cases, the vCenter Server becomes available again and the virtual machine is recovered. In such a case, if the virtual machine has raw disk mappings (RDMs), the RDMs might not be mapped properly. As a result, it might not be possible to power on the virtual machine, or errors related to the guest operating system or to applications running on the guest operating system might occur.
      • If this is a test recovery, complete a cleanup operation and run the test again.
      • If this is an actual recovery, you must manually attach the correct RDM to the recovered virtual machine.

    Refer to the vSphere documentation about editing virtual machine settings for more information on adding raw disk mappings.

  • Cancellation of Recovery Plan Not Completed

    When a recovery plan runs, SRM attempts to synchronize virtual machines. You can cancel the recovery plan, but the cancellation does not complete until the synchronization either completes or expires. The default expiration is 60 minutes. You can use the following options to complete the cancellation of the recovery plan:

    • Pause vSphere Replication, causing synchronization to fail. After recovery enters an error state, use the vSphere Client to restart vSphere Replication in the vSphere Replication tab. After replication is restarted, the recovery plan can be run again, if desired.
    • Wait for synchronization to complete or time out. This might take considerable time, but does eventually finish. After synchronization finishes or expires, cancellation of the recovery plan continues.

  • Non-ASCII Passwords Not Accepted for Login to the Virtual Appliance Management Interface (VAMI)

    Users can manage the vSphere Replication appliance by using the VAMI. Attempts to log in to the VAMI with an account whose password uses non-ASCII characters fail, even when correct authentication information is provided. This issue occurs in all cases where non-ASCII passwords are used with the VAMI. To avoid this issue, use ASCII passwords or connect by using SSH.

  • vSphere Replication Appliance Fails to Support Valid ESX Hosts

    During vSphere Replication configuration, when a datastore is selected on a supported version of ESX, the message VR server Server Name has no hosts through which to access destination datastore ... appears. This occurs when you add a new host to vCenter Server, or during registration of the vSphere Replication server, if there is a temporary interruption of communication between the vSphere Replication appliance and the vSphere Replication server. Communication problems typically arise due to temporary loss of connectivity or to the server services being stopped.

    To resolve this issue, restart the vSphere Replication management server service.

    1. Log into the virtual appliance management interface (VAMI) of the vSphere Replication appliance at https://vr_appliance_address:5480.
    2. Click Configuration > Restart under Service Status.
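
    Alternatively, if the VAMI is not reachable, you might be able to restart the service from an SSH session on the appliance. This is a sketch only; the service name hms is an assumption based on the HMS references elsewhere in these notes, so confirm it on your appliance before relying on it:

      # restart the vSphere Replication management server service
      service hms restart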

  • Error in recovery plan when shutting down protected virtual machines: Error - Operation timed out: 900 seconds during Shutdown VMs at Protected Site step.

    If you use SRM to protect datastores on arrays that support dynamic swap, for example CLARiiON, running a disaster recovery when the protected site is partially down, or running a forced recovery, can lead to errors when you rerun the recovery plan to complete protected site operations. One such error occurs when the protected site comes back online but SRM is unable to shut down the protected virtual machines. This error usually occurs when certain arrays make the protected LUNs read-only, which prevents ESXi from completing I/O for powered-on protected virtual machines.

    Workaround: Reboot ESXi hosts on the protected site that are affected by read-only LUNs.

  • SRM stops during an attempt to protect an already reprotected array-based virtual machine using vSphere Replication.

    If you run a recovery, then try to use vSphere Replication to protect a virtual machine already protected by an array-based protection group, SRM Server asserts.

    Workaround: Restart SRM Server and unprotect the array-based protected virtual machine before protecting it with vSphere Replication. Alternatively, continue with array-based protection and do not protect the virtual machine with vSphere Replication. SRM does not support protecting a virtual machine with both providers.

  • Planned migration fails with Error: Unable to copy the configuration file...

    If there are two ESXi hosts in a cluster and one host loses connectivity to the storage, the other host can usually recover replicated virtual machines. In some cases the other host might not recover the virtual machines and recovery fails with the following error: Error: Unable to copy the configuration file...

    Workaround: Rerun recovery.

  • Recovered VMFS volume fails to mount with error: Failed to recover datastore.

    This error might occur due to latency between vCenter Server, ESXi, and SRM Server.

    Workaround: Rerun the recovery plan.

  • Some SRM-initiated tasks fail with a NoPermission error and display Internal Error: vim.fault.NoPermission instead of Permission to perform this operation was denied.

    The vSphere Client asserts if a mirrored task contains a MoRef to an object that is not a vCenter Server or SRM object.

    Workaround: If the failed SRM task is a recovery task, consult the recovery task pane for a more specific error. For a vCenter Server task failure, see the subtasks, which contain more information.

  • When protection site LUNs encounter All Paths Down (APD) or Permanent Device Loss (PDL), SRM might not recover raw disk mapping (RDM) LUNs in certain cases.

    During the first attempt at planned migration you might see the following error message when SRM attempts to shut down the protected virtual machine:

    Error - The operation cannot be allowed at the current time because the virtual machine has a question pending: 'msg.hbacommon.askonpermanentdeviceloss:The storage backing virtual disk VM1-1.vmdk has permanent device loss. You might be able to hot remove this virtual device from the virtual machine and continue after clicking Retry. Click Cancel to terminate this session.

    If the protected virtual machines have RDM devices, in some cases SRM does not recover the RDM LUN.

    Workaround:

    1. When LUNs enter APD/PDL, ESXi Server marks all corresponding virtual machines with a question that blocks virtual machine operations.
      1. In the case of PDL, click Cancel to power off the virtual machine.
      2. In the case of APD, click Retry.

      If you run planned migration, SRM fails to power off production virtual machines.
    2. If the virtual machines have RDM devices, SRM might lose track of the RDM device and not recover it. Rescan all HBAs and make sure that the status for all of the affected LUNs has returned from the APD/PDL state.
    3. Check the vCenter Server inventory and answer the PDL question that is blocking the virtual machine.
    4. If you answer the PDL question before the LUNs come back online, SRM Server on the protected site incorrectly detects that the RDM device is no longer attached to this virtual machine and removes the RDM device. The next time you run a recovery, SRM does not recover this LUN.
    5. Rescan all HBAs to make sure that all LUNs are online in vCenter Server inventory and power on all affected virtual machines. vCenter Server associates the lost RDMs with protected virtual machines.
    6. Check the Array Managers tab in the SRM interface. If not all of the protected datastores and RDM devices appear, click Refresh to discover the devices and recompute the datastore groups.
    7. Make sure that Edit Group Settings shows all of the protected datastores and RDM devices and that the virtual machine protection status does not show any errors.
    8. Start a planned migration to recover all protected LUNs, including the RDM devices.
  • Context-sensitive help is not accessible in Internet Explorer 7.

    See KB 1009801.

  • Last Sync Size value for a virtual machine protected by vSphere Replication is the amount of data that has changed since the last synchronization.

    Even if you perform a full synchronization on a virtual machine that vSphere Replication protects, the Last Sync Size value shows the amount of data that has changed since the last synchronization, and not the size of the full virtual machine. This can be misinterpreted as meaning that the synchronization was not complete. After the initial synchronization, during a full synchronization of a virtual machine, vSphere Replication compares entire disks, but only transfers data that has changed, not the entire disk.

    To see the size and duration of the initial synchronization, you can check the Events that vSphere Replication posts to vCenter Server. This issue only occurs on ESXi 5.0.x hosts. This behavior has been clarified on ESXi 5.1 hosts.

  • Recovery or test recovery might fail with the error "No host with hardware version '7' and datastore 'ds_id' which are powered on and not in maintenance mode are available..." in cases in which very recent changes occur in the host inventory.

    SRM Server keeps a cache of the host inventory state. Sometimes when there are recent changes to the inventory, for example if a host becomes inaccessible, is disconnected, or loses its connection to some of the datastores, SRM Server can require up to 15 minutes to update its cache. If SRM Server has the incorrect host inventory state in its cache, a recovery or test recovery might fail.

    Workaround: Wait for 15 minutes before running a recovery if you have made changes to the host inventory. If you observe the error above, wait for 15 minutes, then rerun the recovery.

  • Recovery takes a long time to finish and reprotect fails with error Cannot check login credentials. Authentication service infrastructure failed.

    This error occurs due to the exhaustion of ephemeral ports in vCenter Server running on Windows Server 2003. SRM Server cannot communicate with vCenter Server.

    Workaround:

    1. Install the Microsoft hotfix from KB 979230 to fix a problem in the tcpip.sys driver.
    2. Set the following regedit values, either by making the changes manually or by importing the following .reg file:
      Windows Registry Editor Version 5.00
      
      [HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters]
      "MaxUserPort"=dword:00002710
      "TcpTimedWaitDelay"=dword:0000001E
    3. If the registry values do not exist, create them.
    4. Restart the Windows Server 2003 machine after making the changes.
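
    After the restart, you can confirm that the values are in place from a command prompt, for example:

      reg query HKLM\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters /v MaxUserPort
      reg query HKLM\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters /v TcpTimedWaitDelay

    reg query reports the data in hexadecimal: 0x2710 corresponds to 10000 user ports, and 0x1e corresponds to a 30-second TIME_WAIT delay.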
  • Synchronize virtual machine, recovery, or reprotect operations fail with vSphere Replication.

    This error can occur when you request a synchronize operation or when you run operations such as recovery or reprotect. The errors reported are similar to the following:

    • VR synchronization failed for VRM group group. VRM Server generic error. Please check the documentation for any troubleshooting information. The detailed exception is: 'The requested instance with Id=ID was not found on the remote site.'.
    • Error - VR synchronization failed for VRM group group. Storage is locked for datastore path '[path] *.vmdk.vmdk'.

    This error is more likely to occur when running virtual machines with a high workload on the protected site.

    Workarounds:

    • Retry the operation. This might not succeed.
    • As this issue is related to the load running on the protected site, schedule the recoveries for times outside of business hours.
    • If attempting a test recovery, do not enable the "Replicate recent changes to recovery site" option.
    • Upgrade to SRM 5.5. SRM 5.5, in conjunction with vSphere 5.5, includes a number of updates that resolve this issue.
  • Rerunning reprotect fails with error: Protection Group '{protectionGroupName}' has protected VMs with placeholders which need to be repaired.

    If a ReloadFromPath operation does not succeed during the first reprotect, the corresponding protected virtual machines enter a repairNeeded state. When SRM runs a reprotect on the protection group, SRM can neither repair the protected virtual machines nor restore the placeholder virtual machines. The error occurs when the first reprotect operation fails for a virtual machine because the corresponding ReloadFromPath operation failed.

    Workaround: Rerun reprotect with the force cleanup option enabled. This option completes the reprotect operation and enables the Recreate placeholder option. Click Recreate placeholder to repair the protected virtual machines and to restore the placeholder virtual machines.

  • vSphere Replication appliance on a shared recovery site shows as disconnected in the SRM UI during reprotect.

    During reprotect, after configuring replications in the reverse direction, SRM and the vSphere Replication appliance wait for the initial syncs to complete in the reverse direction. If you have more than 100 virtual machines in the recovery plan, the monitoring tasks of the initial syncs cause the vSphere Replication appliance to become unresponsive to calls from the SRM UI.

    Workaround: Wait for the initial sync of the reversed replications to complete. The replication status in the SRM UI eventually changes from Sync to OK. Retry connecting from the SRM UI to the vSphere Replication appliance on the shared site.

  • Logging out of the SRM C# client during reprotect causes the sync step of a reprotect operation to fail with the error Error getting local VC view.

    During reprotect, after configuring replications in the reverse direction, SRM and the vSphere Replication appliance wait for the initial syncs to complete in the reverse direction. SRM triggers a separate sync operation to verify that there are no issues with the replication. If you log out of the SRM UI during this sync operation, the reprotect operation continues, but vSphere Replication cannot impersonate the user and the sync operation fails.

    Workaround: Ignore the error shown in the sync step of the reprotect workflow, or use the SRM UI to manually trigger a sync after the reprotect is complete.

  • Modifying or repairing an SRM Server installation requires the Administrator user or disabling of UAC.

    If you are a member of the Administrators group but are not the Administrator user, you must disable Windows User Account Control (UAC) before you attempt to modify or repair an SRM Server installation from the Windows control panel. If you have installed SRM Server on a Windows Server 2012 host, you disable UAC by modifying the registry.

    Workaround: Log into the Windows Server 2012 machine as Administrator when you run the SRM installer in Modify or Repair mode, or disable User Account Control (UAC). To disable UAC on Windows Server 2012, see http://social.technet.microsoft.com/wiki/contents/articles/13953.windows-server-2012-deactivating-uac.aspx.

  • IP Customization fails due to a timeout when uploading customization scripts to virtual machines via the VIX API.

    Uploading IP customization scripts to virtual machines by using VIX when running recovery plans fails with a timeout.

    Workaround: None.

  • SRM Server stops unexpectedly if you run test cleanup after upgrading to SRM 5.1.3 without upgrading the SRAs.

    If you use array-based replication and upgrade SRM to version 5.1.3 but do not upgrade the SRAs, SRM Server stops unexpectedly when you run a test cleanup.

    Workaround: Upgrade the SRAs to the appropriate version for SRM 5.1.3.

  • Test cleanup fails with a datastore unmounting error.

    Running cleanup after a test recovery can fail with the error Error - Cannot unmount datastore 'datastore_name' from host 'hostname'. The operation is not allowed in the current state.. This problem occurs if the host has already unmounted the datastore before you run the cleanup operation.

    Workaround: Rerun the cleanup operation.

  • Virtual machine VNIC's MAC address is usually preserved during recovery.

    Under very rare circumstances, test or recovery might fail to recover a specific virtual machine because vCenter unexpectedly assigns a new MAC address to the virtual machine's VNIC on the recovery site. The error message in the result column in the recovery steps is the following: Error - Cannot complete customization, possibly due to a scripting runtime error or invalid script parameters (Error code: 255). IP settings might have been partially applied. The SRM logs contain a message: Error finding the specified NIC for MAC address = xx::xx:xx:xx:xx where xx::xx:xx:xx:xx is the expected MAC address.

    Workaround: Modify the affected virtual machine's MAC address manually in the vSphere Client virtual machine Properties to "xx::xx:xx:xx:xx" and restart the recovery plan.

  • vSphere Replication reports "Datastore is not accessible" for datastores at a host added to vCenter Server inventory while registering vSphere Replication server.

    vSphere Replication selects all supported hosts from vCenter inventory and enables them as part of vSphere Replication registration. If you add a host to vCenter while vSphere Replication is still being registered, vSphere Replication does not select this host and it cannot access datastores on the recovery site.

    Workaround: Disconnect and reconnect the host in the vCenter inventory for vSphere Replication to enable it.

  • vSphere Replication server registration might take a long time depending on the number of hosts in the vCenter Server inventory.

    If the vCenter Server inventory contains a few hundred or more hosts, the Register VR server task takes an hour or more to complete, as vSphere Replication updates each host's SSL thumbprint registry. The vCenter Server Events pane displays Host is configured for vSphere Replication for each host as the vSphere Replication server registration task progresses.

    Workaround: Wait for the registration task to complete. After it finishes, you can use vSphere Replication for incoming replication traffic.

  • vSphere Replication registration might fail with error: VRM server generic error ... Row was updated or deleted by another transaction ... HostEntity #<host-managed-object-id>.

    The Register VR server operation might fail with this error if vCenter Server has a large number of hosts in its inventory and you perform the following actions while registration is in progress:

    • Remove a host from the vCenter Server inventory.
    • Remove and reconnect a host from the inventory.
    • Change the host's SSL thumbprint.

    Workaround: Retry the Register VR server operation.

  • Recovery fails with Error creating test bubble image for group ... The detailed exception is Error while getting host mounts for datastore:managed-object-id... or The object has already been deleted or has not been completely created.

    If you run a test recovery or a planned recovery and the recovery plan fails with the specific exception, the LUN used for storing replication data has been temporarily disconnected from ESXi. When reconnected, replication continues as normal and no replication data is lost. The exception occurs during these scenarios:

    • vSphere Replication cannot locate the LUN as the LUN has changed its internal ID.
    • The target datastore internal ID changes when the host containing the target datastore is removed from vCenter inventory and later added.

    You must manually reconfigure the replication to refresh the new ID.

    Workaround: If the primary site is no longer available, contact VMware Support for instructions about adding a special configuration entry in the vSphere Replication appliance database that triggers an automatic fix of the changed internal datastore ID to allow recovery. If the primary site is still available:

    1. Run a cleanup operation on the recovery plan that failed.
    2. In the Virtual Machines tab of the vSphere Replication view, right-click a virtual machine and select Configure Replication.
    3. Click Next, and click Browse to change the location of the files on the datastore that has been disconnected and then reconnected, and select the same datastore and folder locations as before.
    4. Reuse the existing disks and reconfigure the replication of the virtual machine. The vSphere Replication management server picks up the changed datastore identity (managed object ID) in vCenter Server.
    5. Wait for the initial sync to finish. This sync uses existing disks and checks for data consistency.

  • Outdated Replication Status Displayed if Datastore Becomes Unavailable

    It is possible that after virtual machine synchronization begins, the target datastore becomes unavailable. In such a case, the group status should display information about this failure, but the status remains unchanged. To identify issues related to datastore unavailability, use the events generated by the target datastore. The following events are generated in such a case:

    • Datastore is not accessible for VR Server... Generated immediately after datastore becomes inaccessible
    • Virtual machine vSphere Replication RPO is violated... Replica can not be generated within the specified RPO

  • Generic Error Message Is Displayed When Server Pairing Fails Due to Certificate Policy Strictness

    Attempts to pair servers between sites might fail, displaying the following error message: Site pairing or break operation failed. Details: VRM Server generic error. This error might occur when one site is configured to use a strict certificate policy and the other site is configured to use a lenient certificate policy. In such a case, the pairing is expected to fail. After such a failure, modify the lenient certificate policy to use a strict certificate policy and provide a valid certificate.

  • A virtual machine cannot power off due to a pending question error.

    If you create a permanent device loss (PDL) situation, accidentally or deliberately, by dropping an initiator from the SAN to the host where the virtual machine is registered, you might see the following error:

    Error: The operation cannot be allowed at the current time because the VM has a question pending...

    This error occurs if hardware fails on the recovery site during PDL while you run a cleanup after running a recovery plan in test recovery mode.

    Workaround: Answer the question in the virtual machine Summary tab. Then rerun the cleanup in force cleanup mode. After the cleanup operation completes, the virtual machine might still exist on the recovery site, in which case, remove it manually.

  • Datastores Fail to Unmount When on Distributed Power Management (DPM) Enabled Clusters

    Planned migrations and disaster recoveries fail to unmount datastores from hosts that are attached to a DPM cluster if the host enters standby mode. The error Error: Cannot unmount datastore datastorename from host hostname. Unable to communicate with the remote host, since it is disconnected might appear. To resolve this issue, turn off DPM at the protected site before completing planned migrations or disaster recoveries. You can choose to turn DPM back on after completing recovery tasks.

  • Reprotect fails with an error message that contains Unable to communicate with the remote host, since it is disconnected.

    This error might occur because the protected-side cluster is configured to use Distributed Power Management (DPM) and one of the ESX hosts required for the operation was put into standby mode. This can happen if DPM detects that the host is idle and puts it into standby mode. SRM must communicate with the host to access the replicated datastore that the host manages. SRM does not manage the DPM state on the protected site; it does, however, manage the DPM state during recovery, test, and cleanup on the recovery site.

    Workaround: If the error persists, temporarily turn off DPM and ensure the ESX hosts managing the replicated datastores on the protected side are turned on before attempting to run reprotect.

  • Test recovery of a virtual machine with RDM fails at the Configure Storage step while powering on the virtual machine.

    Test recovery fails in the following situations:

    • A virtual machine with RDM configured is protected on the primary site.
    • In Sites > Resource Mappings, the protected site resource that contains the virtual machine is mapped to a vApp as the secondary site resource.

    Workaround: Map the virtual machine to a type of resource that is not a vApp, such as a host, on the secondary site.

  • Stopping Datastore Replication for Protected Virtual Machines Produces Incorrect Error Messages

    It is possible to protect a virtual machine that has disks on multiple datastores and then subsequently disable replication for one of the datastores. In such a case, the virtual machine's status in the protection group changes to Invalid: Virtual machine 'VM' is no longer protected. Internal error: Cannot create locator for disk'2001'... This information is incorrect. The status should change to Datastore '[datastore name]' is no longer replicated.

  • Internal error occurs during recovery.

    SRM retrieves various information from vCenter Server during the recovery process. If it does not receive critical information required to proceed, an internal error CannotFetchVcObjectProperty can occur. This error might occur when vCenter Server is under heavy stress or when an ESXi host becomes unavailable due to heavy stress. This error might also occur when SRM tries to look up information about an ESXi host that is in a disconnected state or has been removed from the vCenter Server inventory.

    Workaround: Rerun the recovery plan.

  • After restarting vCenter Server, when using vSphere Replication, reprotect operations fail with Error - Unable to reverse replication for the virtual machine 'virtual_machine'. The session is not authenticated.

    After vCenter Server restarts, it fails to refresh some sessions that SRM uses to communicate with vSphere Replication and causes reprotect to fail.

    Workaround: Restart the SRM services on both sites.

  • SRM fails to recover virtual machines after RDM failures.

    Raw Disk Mapping (RDM) LUNs might fail while LUNs that back datastores are unaffected. In such a case, SRM cannot recover virtual machines with RDMs.

    Workaround: Recover the affected virtual machines manually. Fail over the RDM LUN and reattach it as an RDM disk on the recovered virtual machine.

  • Protect virtual machine task appears to remain at 100%.

    The VI Client Recent Tasks pane shows a virtual machine stuck at 100% during the Protect VM task. SRM marks the virtual machine as Configured, indicating that it was protected. You do not need to take action as SRM successfully protected the virtual machine.

  • Cleanup fails if attempted within 10 minutes after restarting recovery site ESXi hosts from maintenance mode.

    The cleanup operation attempts to swap placeholders and relies on the host resiliency cache, which has a 10-minute refresh period. If you attempt a swap operation on ESXi hosts that have been restarted within the 10-minute window, SRM does not update the information in the SRM host resiliency cache, and the swap operation fails. The cleanup operation also fails.

    Workaround: Wait for 10 minutes and attempt cleanup again.

  • Reprotect fails after removing the disconnected host on the protected site.

    If you remove the disconnected host from the protected site and run reprotect, the reprotect operation might fail with the error Internal error: std::exception 'class Vmacore::Exception'.

    Workaround: Rerun Reprotect with the Force Cleanup option selected.

  • Reprotect fails with an error when running multiple recovery plans concurrently.

    When running multiple recovery plans concurrently, reprotect can fail with the error Error - The operation was only partially completed for the protection group 'protection_group' since a protected VM belonging to it was not successful in completing the operation.

    Workaround: Run the reprotect operation again.

  • Test recovery cleanup might fail if one of the hosts loses connection to a placeholder datastore.

    If you ran a test recovery on a cluster with two hosts on a recovery site and one of the hosts in the cluster loses connection to a placeholder datastore, cleanup of the test recovery might fail.

    Workaround: Run cleanup in force mode. On the recovery site, manually remove placeholder virtual machines created on the host that lost connection to the placeholder datastore. Remove the virtual machine replication configuration and reconfigure the replication. Reconfigure virtual machine protection from protection group properties.

  • SRM stops unexpectedly with a panic during a test recovery. You can restart the SRM service but it stops again when you continue the test.

    On rare occasions, SRM stops unexpectedly during a test recovery with the error Panic: Assert Failed: "this->_childJobs.size() == this->_jobsToRemove.size()" (Invalid state, job cannot complete while child jobs are pending). Attempts to rerun the test after restarting the SRM service result in SRM stopping with the same error. This issue occurs if the SRM database is in an incorrect state.

    Workaround: Contact VMware support.