VMware

VMware Site Recovery Manager Release Notes

Site Recovery Manager 1.0 | 06/19/08 | Build 97878


VMware Site Recovery Manager 1.0 is the initial release.

These release notes contain the following sections:

Features

Site Recovery Manager (SRM) 1.0 is a new disaster recovery, workflow-automation product. Leveraging array replication between a protected site and a recovery site, SRM installs into VMware VirtualCenter as a plugin. SRM simplifies disaster recovery, increases reliability and reduces costs. SRM provides the following features.

Provides Simple Setup and Configuration
  • Uses VirtualCenter to protect virtual machines.
  • Automatically discovers which virtual machines are replicated by your underlying Fibre Channel or iSCSI replication systems and are available to be configured for recovery.
  • Simplifies grouping virtual machines and assigning recovery policies based on business requirements.
  • Automates the management of replicated virtual machines into the VirtualCenter Server at the recovery site.
  • Enables quality of service guarantees for recovered virtual machines.
  • Manages network connectivity and configuration of recovered virtual machines.

Press One Button to Start the Recovery Process

  • Shuts down non-essential virtual machines at the recovery site.
  • Automatically stops replication and promotes the read-only replica at the recovery site to read-write.
  • Rescans the VMware ESX Servers at the recovery site.
  • Registers the replicated virtual machines.
  • Completes power-up of replicated virtual machines in the order specified during creation of the disaster recovery plan.

Testing Automation for Increased Reliability

  • Allows testing to be done at any time.
  • Creates a test environment that is isolated from the production environment so testing requires no down time for production systems.
  • Provides a report of test results.
  • Resets everything in preparation for a disaster.

Supports Bi-Directional Protection Between Two Sites

  • Reduces the need for costly hardware to sit idle at recovery site.
  • Reduces maintenance costs and power consumption with less hardware at the recovery site.

Compatibility

SRM requires VMware Infrastructure 3 Standard or Enterprise. The Site Recovery Manager Compatibility Matrixes guide provides details on the compatibility of versions of VMware Infrastructure 3 components, and for supported databases, guest operating systems, and storage arrays.

This release version requires or functions with the listed version of the following servers and software:

ESX

  • ESX Server 3.0.2 Update 1
  • ESX 3.5 Update 1

VirtualCenter
  • VirtualCenter Server 2.5 Update 1
  • VI Client 2.5 Update 1

No Support for Microsoft Cluster Service (MSCS)
SRM cannot be configured as an application for failover with MSCS. Virtual machines running MSCS cannot be used for SRM protection.

Experimental Support for RDMs
SRM supports RDMs experimentally. VMware encourages you to try features in test environments and report issues, but do not put RDMs in production SRM workflows.

Installation Notes for This Release

Read the Getting Started with Site Recovery Manager for a high-level architectural overview and workflow.

Read the Administration Guide for Site Recovery Manager for step-by-step guidance on installing and configuring SRM. This guide contains information about all requirements and procedures to set up SRM and also addresses minimum requirements and scaling limits.

Read the Site Recovery Manager Evaluation Guide for a conceptual overview as well as step-by-step workflows describing planning for using SRM, setting up protected and recovery sites, testing failover, the failover and failback process, alarms and status monitoring, and a discussion of roles and privileges.

Known Issues

The following are known issues and configuration notes with Site Recovery Manager:

Database

  • SRM Supports Oracle 10.02.x.x Client Drivers
    SRM 1.0 supports Oracle 10.02.x.x client drivers. If you use other versions of the Oracle client drivers, you may be unable to pair sites or configure array managers using your Oracle database.
  • Mixed mode SQL Server Authentication
    When configuring a database connection to a SQL Server database that is not on the same host as the SRM Server, select mixed mode rather than Windows authentication.
  • Do Not Reinitialize the Database
    Do not reinitialize the database as doing so can cause SRM to stop working.

Installation

  • Non-ASCII Characters are Not Supported
    SRM disallows non-ASCII characters during installation. Credentials must consist of standard ASCII characters. These are typically any characters that can be typed on a US/English keyboard. Special characters from other languages and ALT characters will cause the installation to fail.
  • Certificate Warnings when Connecting to SRM
    After you install the SRM plugin and connect the VI Client to the VirtualCenter server, SRM reports a certificate warning when the SRM plugin is enabled. This occurs when the VirtualCenter Server and SRM Server are installed on different machines.
    Workaround: Correct the information in the certificate so that the fully-qualified domain name on the certificate matches the address of the server you are trying to connect to.
  • Credentials for Data Source Name (DSN)
    During installation, SRM requests the data source name (DSN) and its credentials. SRM encrypts and stores these internally in a credentials store. If the credentials for the data source change, you must update the credentials store to match by running the installcreds.exe utility found in the installation directory.
  • Installer Reports the Error: "Failed to Register Extension"
    During the SRM installation, the error message "failed to register extension" appears.
    Workaround: Click Retry to complete the installation.
  • SRM Installation Hangs on Windows XP if User Does Not Have Administrator Privileges
    Installing SRM requires administrator privileges. If you install SRM without administrator privileges on Windows XP, the installation hangs and the error "Failed to find installer window" is written to the installation log file.
    Workaround: If this error occurs, use Windows Task Manager to terminate the msiexec.exe process manually. Log in as administrator (or a user with administrator privileges) to perform the installation.
  • SRM Plugin is Not Visible on Windows Vista Business
    On Windows Vista Business host machines, the VI Client fails to display the SRM plugin in the Installed tab after the plugin is installed.
    Workaround: Close and reopen the VI Client. The SRM plugin appears in the list of available plugins as enabled.
  • Enabling and Disabling the SRM Plugin
    The VI Client fails to display the SRM user interface if the SRM plugin is disabled and then enabled.
    Workaround: Close and reopen the VI Client after you enable the SRM plugin.
  • SRM Server Installation Fails and Reports the Error: "Failed to Register Extension"
    During the SRM Server installation, the installation fails and reports the error message: "failed to register extension." SRM reports this error if VirtualCenter Server has license issues. For example, if the VirtualCenter Server isn't licensed, or it lost connection to its license server, registration of the extension fails during installation.
  • SRM Plugin Requires Windows 2000 Professional SP4 Update Rollup 1
    You must install Windows 2000 SP4 Update Rollup 1 using MSI installer version 3.1.4000.2435 in order to successfully install the Site Recovery Manager plugin.
  • SRM Does Not Support Using DSN Names with a Space at the End
    Do not use a DSN name with a space at the end (for example, "SRM DB "). If you enter an extra space at the end of the DSN name in the database configuration screen of the SRM installer, the installation fails.
  • SRM Reports "msiexec.exe memory" Error During Installation
    SRM may report that the installer is no longer responding near the end of the SRM Server installation. This error is very rare and does not cause any issues with SRM.
    Workaround: Close the error message.
  • Static IP addresses for Servers are Strongly Recommended
    VMware strongly recommends that you use static IP addresses for the machines hosting the SRM Server, VirtualCenter Server, and database server. Problems have occurred when IP addresses change during testing.
  • A Non-Specific Error Message Displays if the SRM Server is Down During SRM Plugin Installation
    If the SRM Server is down or unavailable when installing the SRM plugin in the VI Client, the VI Client displays the message "The remote server returned an error: (503) Server Unavailable."
    Workaround: Start the SRM Server.
  • Installation Fails When Using User-Provided PKCS12 Certificates
    Sometimes, SRM installation fails when using user-provided certificates in the PKCS12 format. The SRM installer expects PKCS12 files to contain exactly one server certificate and its private key.
    Workaround: Choose the auto-generate certificate option during installation.
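
The input restrictions noted in this section (ASCII-only credentials, no trailing space in the DSN name) can be checked before running the installer. The following Python sketch is illustrative only and is not part of SRM. The function name check_installer_inputs and its parameters are hypothetical, and it covers only these two restrictions.

```python
def check_installer_inputs(dsn_name, username, password):
    """Return a list of problems found in values to be typed into the installer.

    Hypothetical helper, not part of SRM. It checks only the two input
    restrictions described in these release notes.
    """
    problems = []
    # A DSN name that ends with a space causes the installation to fail.
    if dsn_name != dsn_name.rstrip(" "):
        problems.append("DSN name ends with a space")
    # Credentials must consist of standard ASCII characters.
    for label, value in (("user name", username), ("password", password)):
        if not value.isascii():
            problems.append(label + " contains non-ASCII characters")
    return problems


if __name__ == "__main__":
    # Example values are made up for illustration.
    for problem in check_installer_inputs("SRM DB ", "srmadmin", "p\u00e4ssword"):
        print(problem)
```

An empty list means that neither restriction is violated. It does not guarantee that the installation will succeed.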

Roles and Permissions

  • The Default SRM Roles Do Not Include Specific Privileges for VirtualCenter
    To configure SRM, a user must have both VirtualCenter and SRM permissions. The default SRM roles (SRM Protection Administrator, SRM Recovery Administrator, and so forth) do not have specific privileges for VirtualCenter and therefore do not have adequate permissions to perform all SRM operations. The converse is also true; VirtualCenter roles do not provide any SRM privileges.
    Workaround: Ensure that SRM users have both VirtualCenter and SRM specific roles as appropriate. For more information, see "Managing Permissions and Roles" in the "Administration Guide" for Site Recovery Manager 1.0.
  • Vmomi.Fault.NoPermission Error Displayed if User Does Not Have Appropriate Permissions
    If a user does not have the appropriate permissions to perform an operation, the error text "Vmomi.Fault.NoPermission" displays.

SRM Service Failure

  • Do Not Change the Host Name
    After SRM is installed, do not change the host name of the machine where SRM is installed or the host name of the machine where VirtualCenter is installed. SRM will not start if either name is changed.
    Workaround: Change the name of the host machine back to the original name.
  • SRM Might Not Function Correctly if It is Not on the Same Active Directory Domain as VirtualCenter
    If SRM is installed on a different machine from VirtualCenter, local machine accounts cannot be used to create permissions on the SRM Server. SRM monitors the permissions for protected virtual machines on the protected site. If a user does not have permission to access the machine that SRM runs on, but is given permissions for an SRM-protected virtual machine, then SRM's permissions could become out of sync until the unknown user is removed from the virtual machine's permissions.
    Workaround: SRM and VirtualCenter should reside in the same Active Directory domain, and only users who are part of that domain should be given permissions for protected virtual machines.
  • SRM Will Not Start
    SRM will not start if VirtualCenter is not running.
    Workaround: Ensure that VirtualCenter is running before trying to start SRM.

VI Client and SRM Plugin

  • VI Client Does Not Display Current Information if the SRM Service Fails
    If the SRM service fails and then reconnects to the SRM Server, the VI Client does not display current information for Site Recovery.
    Workaround: Restart the SRM Service and then restart the VI Client.
  • VI Client Must Be Restarted if it Loses Connection with SRM
    Site connection is not updated if the local SRM server loses connectivity with the remote SRM server.
    Workaround: Restart the VI Client after the recovery SRM Server restarts.
  • Some Operations Should be Grayed Out
    Operations for which the user does not have privileges should be grayed out in the user interface. However, some of these operations can be selected. When they are selected, the operation fails due to an authorization failure.
  • VI Client Does Not Download the SRM Plugin Over a Secure Connection
  • SRM Plugin is Still Present After the VI Client is Uninstalled
    The SRM plugin is not uninstalled when the VI Client is uninstalled. After reinstalling the VI Client, the SRM plugin is still present.
    Workaround: Using the VI Client, uninstall the SRM plugin before uninstalling the VI Client.

Site Pairing

  • SRM Reports Error Messages When Breaking Site Connection
    After attempting to break the protected and recovery site connection, SRM reports the errors: "Unable to break the connection with remote site because it is currently used by other users" and "The request refers to an object that no longer exists or has never existed." These errors appear if the recovery user permissions are changed to "No Access" when the VI Client is connected to the protected site.
    Workaround: Do not change user permissions to "No Access" from the recovery site while the protected site's VI Client is connected to the remote site as this user. If you receive these errors, restart the protected site's SRM service and the VI Client.
  • Accepting Thumbprints for Secondary Servers During Site Pairing Reports "Incompatible Authentication Method" Error
    During site pairing, SRM prompts you to accept thumbprints for the secondary server. Thumbprint certificate validation during pairing is not a valid option if SRM and VirtualCenter authenticate using trusted certificates.
  • VI Client Displays "Loading..." in the SRM Tab if the SRM Server is Unavailable
    If the SRM Server is not installed or available, the "Connect To VMware SRM Server" button displays and the SRM tab displays "Loading..." for the status of each SRM component.
    Workaround: Start the SRM service if it is not running.
  • Configure Array Managers Display is Not Refreshed After Connecting the Protected and Recovery Sites
    After reconnecting the protected and recovery sites, the Configure Array Managers summary information in the VI Client is not refreshed and the information is out of sync.
    Workaround: Restart the VI Client, then launch Site Recovery Manager.
  • Protected Site Shows "Unable to Connect" After Successful Connection
    After successful connection between protected and recovery sites, the protected site reports "Unable to Connect" and eventually reports the error: "Low Resources on Pair..."
    Workaround:
    1. Restart the SRM Service.
    2. Close the VI Client for the recovery site.
    3. Break the connection and configure the connection from the protected site's Summary page.
    4. Start the VI Client and log in to the recovery site.
    5. Select Site Recovery and configure the connection from the remote site.
  • During Site Connection, an SSL Exception error reports: "The host certificate chain is not complete"
    When trying to connect the protected and recovery sites, an SSL Exception error reports: "The host certificate chain is not complete." This error occurs if the certificate on the protected site is changed before pairing with the recovery site.
    Workaround: Restart the SRM service on the protected site before pairing with the recovery site.
  • Error Message Displays While Breaking Recovery Site Connection
    Breaking the connection from the protected site to the recovery site displays the error: "Object reference not set to an instance of an object" after the sites are disconnected.
    Workaround: Acknowledge the error message.
  • Cannot Break Connection After the VI Client Process is Terminated Abnormally
    You cannot break the connection with the recovery site from the protected site if the vpxClient.exe process is not running. The error message: "Unable to break the connection with the remote site because it is currently used by other users" is reported.
    Workaround: Restart both SRM Servers, then break the connection between the recovery site and the protected site.
  • Inventory Mappings Information is Incorrect
    After breaking and reconnecting site pairing, the VI Client at the protected site might not display correct information in Inventory Mappings.
    Workaround: Refresh the Inventory Mappings to display the actual mappings. Click the Refresh button from the Inventory Mappings tab.
  • Pairing Site to Itself Doesn't Fail in the Correct Step
    If you select Site Recovery > Configure and enter the local VirtualCenter Server IP address, SRM continues to the next connection step and asks for user credentials. The connection should fail when the local VirtualCenter Server's IP address is entered.
    Workaround: Do not attempt to connect to a local VirtualCenter Server.
  • Do Not Break Connection with the Recovery Site While VI Clients are Connected
    Before breaking the connection with the recovery site from the protected site, ensure that no VI Clients are actively connected to the recovery site.
  • Pairing Status is Not Updated
    During site pairing or after breaking site connection, the VI Client at the recovery site might not update the status of the pairing.
    Workaround: Restart the VI Client at the recovery site to obtain the latest status.
  • When Pairing Sites, Use Trusted Certificates
    If the certificates of the recovery-site VirtualCenter Server and SRM Server are not trusted by the protected-site SRM Server when you pair sites, yellow warning triangles, rather than green check boxes, appear to the left of the Certificate Validation steps. The triangles indicate that the certificates did not pass validation, which requires that each certificate be signed by a trusted Certificate Authority (CA) and have a DNS value matching the address of the server. Pairing can still proceed if the user chooses to accept the certificates based on their SHA-1 thumbprints. It is a serious security violation to accept certificates based on their thumbprints without verifying that the thumbprints are correct.
    Workaround: Ensure that both VirtualCenter Servers and both SRM Servers use trusted certificates.
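
If a certificate must be accepted by thumbprint, the thumbprint shown during pairing can be compared against one computed independently from the certificate file. The following is a minimal Python sketch, assuming the certificate has been obtained through a trusted channel as a DER file or as a PEM file containing a single certificate. The function names are illustrative and are not part of SRM.

```python
import hashlib
import ssl


def sha1_thumbprint(der_bytes):
    """Return the SHA-1 digest of DER certificate bytes as colon-separated uppercase hex."""
    digest = hashlib.sha1(der_bytes).hexdigest().upper()
    return ":".join(digest[i:i + 2] for i in range(0, len(digest), 2))


def thumbprint_of_file(path):
    """Read a certificate file (PEM or DER) and return its SHA-1 thumbprint."""
    with open(path, "rb") as f:
        data = f.read()
    if b"-----BEGIN CERTIFICATE-----" in data:
        # The thumbprint is computed over the DER encoding, so convert PEM first.
        # Assumes the file holds exactly one certificate and no extra text.
        data = ssl.PEM_cert_to_DER_cert(data.decode("ascii").strip())
    return sha1_thumbprint(data)


def thumbprints_match(a, b):
    """Compare two thumbprints, ignoring case, colons, and spaces."""
    def normalize(value):
        return value.replace(":", "").replace(" ", "").lower()
    return normalize(a) == normalize(b)
```

Pass the thumbprint shown in the pairing dialog and the result of thumbprint_of_file to thumbprints_match, and accept the certificate only if it returns True. This check does not replace the recommended use of trusted certificates.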

Protection Group

  • VI Client Inventory Reports the Error: "The request refers to an object that no longer exists or has never existed"
    After removing a protection group, the VI Client Inventory view on the recovery site is not refreshed. Attempting to select an object from the Inventory reports the error: "The request refers to an object that no longer exists or has never existed."
    Workaround: Restart the VI Client.
  • Protection Groups Display is Not Refreshed After Connecting the Protected and Recovery Sites
    After reconnecting the protected and recovery sites, the Protection Groups display in the VI Client is not refreshed and the information is out of sync.
    Workaround: Restart the VI Client.
  • SanGroupVmRemoveEvent is Posted to VirtualCenter Only if a Virtual Machine is Moved From a Replicated Datastore to a Non-Replicated Datastore

Recovery Plan

  • SRM Reports the Error: "Cannot execute scripts" When Customizing Windows Virtual Machines During Recovery
    During test recovery or recovery, when Windows guests are customized, occasionally the virtual machines attempt to shut down gracefully and SRM reports the error "Cannot execute scripts." This results in a hard shut down after customization is complete and the virtual machine remains powered off regardless of its recovery plan priority.
    Workaround: Manually power on the Windows virtual machines that report this error.
  • Shutdown Protected Virtual Machine Fails and Reports the Error: "ERROR:Operation timed out" During Recovery Plan
    During test recovery or recovery, if a recovery plan has multiple virtual machines configured with different power states (for example, powered on, powered off, and suspended), SRM reports the error "Error:Operation timed out" when virtual machines are shut down at the recovery site.
    Workaround: Don't mix virtual machine power states in a recovery plan.
  • Recovery Plan Wizard Page is Missing the Datacenter Column
    When creating a recovery plan using the Recovery Plan Wizard, the Datacenter column is missing. This happens the first time the Recovery Plan Wizard is opened. The datacenter information reappears when the column is populated with datacenter names, if you open the wizard again, or if you click Back and Next to redisplay the network page.
  • Recovery Plan Wizard Information is Reset if Clicking the Back Button
    The Test Network option resets to "Auto" if you choose a network option, click Next, and then click Back to return to the Networks page.

Test Recovery

  • Virtual Machines Report the Error: "vim.fault.GenericVmConfigFault"
    During test recovery, a virtual machine fails to power on and reports the error "vim.fault.GenericVmConfigFault." This error occurs when a snapshot is taken of a virtual machine whose delta disks are in a different directory than the base disk.
  • Running a Recovery Test and Verifying HA
    When running recovery tests, SRM is unable to create a test bubble vSwitch for hosts in HA clusters. To run a test recovery and verify HA, use the "Test Networks" feature to connect the network used for HA. This known issue does not affect the recovery test. During a real recovery, SRM will map all networks.
  • A Stop Button Appears After Starting a Recovery Plan Test
    Occasionally, after you start a recovery test for the first time, a "Stop" button appears with the message: "Stop Recovery. Are you sure you want to stop this recovery plan? This process may take several minutes."
    Workaround: Click "No." The test proceeds and completes successfully.
  • Recovery Plan Test Status is "Running Test" After the Test is Canceled
    Canceling a recovery plan test from the task list cancels the recovery plan test, but the VI Client displays the status as "Running Test" under Recovery Plans.
  • Virtual Machine Power State is Not Preserved if Converting From a Template
    If you test a recovery plan containing a template that is placed in the "Recover No Power On Virtual Machines" step, and you convert the template to a virtual machine and turn it on, the template does not become unregistered in the recovery plan. To unregister the virtual machine, you must manually power off, unregister, and re-register the virtual machine.
  • Prepare Storage Step Remains at Zero Percent for a Long Period of Time While the Test Recovery Plan Runs
    When running a recovery test containing several protection groups, the Prepare Storage step could remain at zero percent (0%) for a long period of time even though progress is being made in the background.
    Workaround: None. Let the plan run and wait for the recovery plan to finish.
  • If a Linux Virtual Machine is Set to Use DHCP, its Networking May Not Work Correctly During Tests
    Consider the following scenario: The test network into which virtual machines are placed does not contain a DHCP server. When Linux virtual machines that are set for DHCP networking boot up and cannot find a DHCP server, they can take a very long time to time out and may not boot properly. Windows virtual machines behave differently. They are able to assign themselves an address when they cannot find a DHCP server.
    Workaround: Create your own test network that contains a DHCP server and configure the shadow virtual machine to use this network as the test network. This will allow the Linux virtual machine to find the DHCP server and work correctly.

Miscellaneous Issues

  • SRM is Incompatible with DRS Clusters That Mix ESX 3.5 and ESX 3.0.x Hosts
    SRM does not support mixing ESX 3.5 and ESX 3.0.x hosts in the same DRS cluster. In such clusters, virtual machines fail and report errors during customization, and resource pool configuration fails.
    Workaround: Create DRS clusters using ESX hosts of the same version.
  • SRM Alarms Appear in the VI Client After SRM is Uninstalled
    SRM Alarm Status (if any) is kept after SRM is uninstalled. If the VirtualCenter Server is not reinstalled and you install SRM again, the previous SRM Alarm Status is applied.
  • The srm-config Tool Exits and Reports the Error: "Error [2]: OSERROR [0x80090016] Failed to open crypto key container for certificate"
    During the SRM certificate replacement process, a Windows API can fail with the error message: "Failed to open crypto key container for certificate." This is caused by one of the following:
    • A missing operating system internal file in the following folder:
      C:\Documents and Settings\All Users\Application Data\Microsoft\Crypto\RSA\MachineKeys
    • Incorrect permissions of one of the following folders:
      C:\Documents and Settings\All Users\Application Data\Microsoft\Crypto
      C:\Documents and Settings\All Users\Application Data\Microsoft\Crypto\RSA\S-1-5-18
      C:\Documents and Settings\All Users\Application Data\Microsoft\Crypto\RSA\MachineKeys

    Workaround: Run the command again or fix the permissions.
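
As a first diagnostic step, the folders listed above can be checked for existence and basic readability. The following Python sketch is a rough aid only and is not part of SRM or the srm-config tool. The function name check_folders is hypothetical, and os.access does not inspect Windows ACLs in detail, so an "ok" result does not prove that the permissions are correct.

```python
import os

# Folder paths copied from the list in these release notes.
CRYPTO_FOLDERS = [
    r"C:\Documents and Settings\All Users\Application Data\Microsoft\Crypto",
    r"C:\Documents and Settings\All Users\Application Data\Microsoft\Crypto\RSA\S-1-5-18",
    r"C:\Documents and Settings\All Users\Application Data\Microsoft\Crypto\RSA\MachineKeys",
]


def check_folders(paths):
    """Return a dict mapping each path to 'missing', 'not readable', or 'ok'."""
    report = {}
    for path in paths:
        if not os.path.isdir(path):
            report[path] = "missing"
        elif not os.access(path, os.R_OK):
            # Coarse check only; it does not examine the full permission set.
            report[path] = "not readable"
        else:
            report[path] = "ok"
    return report


if __name__ == "__main__":
    for folder, status in check_folders(CRYPTO_FOLDERS).items():
        print(status + ": " + folder)
```

Run the script on the SRM Server host with the same account that runs srm-config, so that the result reflects what that account can see.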

Configuration Notes

  • Changing the ESX Hosts Advanced Option to Improve Rescan Time
    If rescan times on ESX hosts are longer than 10 minutes, you can improve them by setting the host Advanced Option Scsi.RescanAllHbas to 1.
  • Changing the Command Line Timeout on ESX Hosts
    You can change the command line timeout default using the <calloutCommandLineTimeout></calloutCommandLineTimeout> parameter in the vmware-dr.xml file. The value has a unit of seconds and the default value is set to 300 seconds. By default, SRM kills callout scripts that take more than 300 seconds to complete. You may want to increase the timeout value if using custom callout scripts that take more than five minutes to complete.
  • RDM Descriptors Must be Placed on a Replicated Datastore
    In order to protect virtual machines that use raw disk mapping (RDM) devices, ensure that the RDM descriptor files reside on replicated datastores. Placing the RDM descriptor files in the same datastore as the .vmx file that refers to them is highly recommended.
  • Custom Scripts in Recovery Plans are Executed as the Local Administrator
    SRM callouts to custom scripts run as the local administrator and not as the user logged into the VI Client.
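
The callout timeout change described above can be scripted. The following Python sketch assumes that vmware-dr.xml is well-formed XML containing a calloutCommandLineTimeout element. The sample document and the function name set_callout_timeout are hypothetical, and the layout of the real file may differ.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for vmware-dr.xml; the real file's structure may differ.
SAMPLE = """<Config>
  <recovery>
    <calloutCommandLineTimeout>300</calloutCommandLineTimeout>
  </recovery>
</Config>"""


def set_callout_timeout(xml_text, seconds):
    """Return xml_text with every calloutCommandLineTimeout element set to seconds."""
    root = ET.fromstring(xml_text)
    # Search the whole tree because the element's location is not assumed.
    elements = list(root.iter("calloutCommandLineTimeout"))
    if not elements:
        raise ValueError("calloutCommandLineTimeout element not found")
    for element in elements:
        element.text = str(int(seconds))
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Raise the timeout from the default 300 seconds to 600 seconds.
    print(set_callout_timeout(SAMPLE, 600))
```

Back up the original file before writing any change, and note that ElementTree discards XML comments when it re-serializes a document. These notes do not state whether the SRM service must be restarted for a new value to take effect.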