What is Dual Controller HA?

Dual Controller HA™ is an extension to our existing SoftNAS Cloud® high availability solution, SNAP HA™. It is designed to provide high availability only for shared pools of object storage.

Adding a device to a dedicated storage pool results in the pool being replicated in the usual way, via SyncImage and asynchronous SnapReplicate ZFS send/receive once per minute, so that a copy of the pool’s data is maintained on the target node. HA failover operates as always: the dedicated storage devices and pools on each node hold their own distinct, non-shared data, which must be replicated for use in HA (the original design of SNAP HA). SoftNAS SNAP HA™ provides NFS, CIFS and iSCSI services via redundant storage controllers. One controller is active while the other is on standby. Because only one controller is active at a time, this can be considered single-controller HA.

Dual Controller HA™, on the other hand, applies only if a shared pool of object storage, such as AWS S3 or Azure Hot or Cool blob storage, is specified at storage pool creation. After you add object storage 'disks' via Disk Devices and select Create in Storage Pools, the following dialog appears. If Shared Storage is selected, Dual Controller HA™ is automatically applied to the shared pool after SNAP HA™ is configured.

[Image: storage pool creation dialog with the Shared Storage option]

Shared pools operate very differently from dedicated pools from an HA perspective. First, the underlying storage devices are shared across nodes. Such shared devices (e.g., S3 cloud disks, Azure Hot and Cool Blob storage) include their own data redundancy and are typically accessed over a network connection, which allows them to be shared across two or more nodes (only two nodes are currently supported).

A second major difference is the take-over process for shared pools. Volume configuration files are replicated between the primary and secondary controllers (hence Dual Controller). Failover is initiated when the primary controller fails to reply to an I/O request within the expected time frame.
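The failover trigger described above can be illustrated with a small timeout check. This is only a sketch of the idea, not the SNAP HA implementation: the function name, the timestamps, and the timeout value are all hypothetical, and the real expected time frame is not stated here.

```python
import time
from typing import Optional


def primary_is_unresponsive(
    last_reply_at: float,
    timeout_seconds: float,
    now: Optional[float] = None,
) -> bool:
    """Return True when the primary has not replied within the allowed window.

    last_reply_at and now are monotonic-clock readings in seconds.
    timeout_seconds is a hypothetical value; the actual threshold is not documented here.
    """
    current = time.monotonic() if now is None else now
    return (current - last_reply_at) > timeout_seconds
```

A monitor on the secondary controller could call a check like this on each polling cycle and begin the take-over only when it returns True.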

During a take-over event, the devices associated with a shared pool must first be mounted by the target node (and sometimes disconnected or unmounted from the original node, if required by the device type). Next, the shared pool is imported using the ZFS pool import command (zpool import), and the import is verified to confirm that the pool was imported successfully and is not degraded or faulted. Both debug/trace and info/error logging are written to the existing HA log files, so that errors or issues arising in the field can be troubleshot and supported.
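The import-and-verify portion of the take-over can be sketched as follows. This is an illustration under stated assumptions, not the product's code: it assumes the pool's devices are already attached to the target node, the function names are hypothetical, and it treats any health state other than ONLINE as a failure, which is stricter than only rejecting degraded or faulted pools. It uses the standard `zpool import` and `zpool list -H -o health` commands.

```python
import subprocess
from typing import Callable, List

# A runner takes an argv list and returns the command's stdout.
Runner = Callable[[List[str]], str]


def run_command(argv: List[str]) -> str:
    """Run a command and return stdout; raises CalledProcessError on a non-zero exit."""
    result = subprocess.run(argv, check=True, capture_output=True, text=True)
    return result.stdout


def take_over_pool(pool: str, run: Runner = run_command) -> str:
    """Import a shared pool on this node, then verify its health.

    Assumption: the pool's devices have already been mounted on this node.
    """
    # Step 1: import the pool by name.
    run(["zpool", "import", pool])
    # Step 2: read the pool health (-H drops headers, -o selects the column).
    health = run(["zpool", "list", "-H", "-o", "health", pool]).strip()
    # Step 3: reject anything that is not fully healthy (e.g. DEGRADED, FAULTED).
    if health != "ONLINE":
        raise RuntimeError(f"pool {pool!r} was imported but its health is {health}")
    return health
```

Passing the runner as a parameter keeps the sequence testable without a real ZFS pool; in practice the default `run_command` would be used.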

With this method of failover:

  • Very little data needs to be transferred for fail-over to occur.
  • There is no need to create duplicate pools of already resilient object storage.
  • No transactional data is at risk of loss from the asynchronous replication delays of standard SNAP HA.

To determine if Dual Controller HA is right for your deployment, see /wiki/spaces/SD/pages/92995970.

No change to Dedicated Pools

As stated above, Dual Controller HA does not change the way SNAP HA is configured, nor does it change how SNAP HA operates for dedicated pools. SoftNAS has worked very hard to ensure that this feature is a seamless addition, with little to no change to existing functionality or configuration.

Regardless of whether it is a shared pool or dedicated, the customer must first define a SnapReplicate™ relationship between the primary and secondary node, then add the SNAP HA relationship. In other words, there is no change to the SnapReplicate/SNAP HA process shown below.

Adding a device to a shared storage pool results in the pool being excluded (skipped) by SnapReplicate: the data on the underlying device is already shared across nodes, so there is no need to replicate shared storage pools. This involves a change in SnapReplicate’s “pool discovery” logic, which now first reads the sharedpools.xml file to get the list of shared pool names, then excludes those pools from the list of pools to be replicated (similar to how pool names not found on the target node are excluded).
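The pool discovery change described above can be sketched as a simple filter. The layout of sharedpools.xml is not documented here, so the XML structure below (a `pool` element with a `name` attribute) is purely an assumption for illustration, as are the function names and pool names.

```python
import xml.etree.ElementTree as ET
from typing import Iterable, List, Set

# Hypothetical file contents; the real sharedpools.xml schema may differ.
SAMPLE_SHARED_POOLS_XML = """<sharedpools>
  <pool name="s3pool"/>
  <pool name="blobpool"/>
</sharedpools>"""


def read_shared_pool_names(xml_text: str) -> Set[str]:
    """Collect shared pool names from the (assumed) XML layout."""
    root = ET.fromstring(xml_text)
    return {p.get("name") for p in root.iter("pool") if p.get("name")}


def pools_to_replicate(
    local_pools: Iterable[str],
    target_pools: Iterable[str],
    shared_pools: Set[str],
) -> List[str]:
    """Keep pools that exist on the target node and are not shared pools."""
    target_set = set(target_pools)
    return [p for p in local_pools if p in target_set and p not in shared_pools]
```

With this sample, calling `pools_to_replicate(["pool0", "s3pool"], ["pool0", "s3pool"], read_shared_pool_names(SAMPLE_SHARED_POOLS_XML))` leaves only `pool0` to be replicated.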

This allows SnapReplicate and SNAP HA to function across both types of pools and to differentiate between them. Existing SNAP HA customer installations continue to operate uninterrupted, and new SoftNAS instances can be paired with Dual Controller HA shared storage pools and, at the same time, with dedicated pools replicating asynchronously via "standard" SNAP HA. This also ensures that, regardless of which type of pool is selected, the customer can confidently set up SNAP HA with the same documentation.

Configuring SnapReplicate™

Having prepared the environment on both SoftNAS Cloud AWS instances, we can now set up high availability. The first step towards high availability in SoftNAS is to establish replication. SnapReplicate™ makes this as simple as completing a quick wizard.

...

  1. Using a web browser, log in to the SoftNAS StorageCenter administrator interface of the source controller (the first instance, on which you created the CIFS-enabled volume).
  2. In the Left Navigation Pane, select the SnapReplicate™/SNAP HA™ option.

    The SnapReplicate/SNAP HA page will be displayed.

  3. Click the Add Replication button in the Replication Control Panel.



    The Add Replication wizard will be displayed. Read the instructions on the screen and then click the Next button.

  4. In the next step, enter the IP address or DNS name of the remote, target SoftNAS Cloud® controller node in the Hostname or IP Address text entry box. Note that by specifying the replication target's IP address, you are specifying the network path the SnapReplicate™ traffic will take. 



    The source node must be able to connect via HTTPS to the target node (similar to how the browser user logs in to StorageCenter using HTTPS). HTTPS is used to create the initial SnapReplicate configuration. Next, several SSH sessions are established to ensure that two-way communication between the nodes is possible. When connecting two Amazon EC2 nodes, it is best to use the internal instance IP addresses, as traffic between instances in EC2 is routed internally by default.

    Note: If you have not yet done so, the Security Group on each instance should be configured with the internal IP addresses of the paired instance (the source instance should recognize traffic from the target instance, and the target instance should recognize traffic from the source) to ensure both HTTPS and SSH traffic between instances is recognized. See Configuring Security Groups to learn more.

  5. Next, provide the username (softnas) and the password (by default, the instance ID) of the target instance. Type the password again to verify, then click Next.


    The IP address/DNS name and login credentials of the target node will be verified. If there is a problem, an error message will be displayed. Click the Previous button to make the necessary corrections and then click the Next button to continue. 

  6. Read the final instructions and then click the Finish button.


    The SnapReplicate relationship between the two SoftNAS Cloud® controller nodes will be established. The corresponding SyncImage of the SnapReplicate will be displayed.




    After the data from the volumes on the source node has been mirrored to the target, SnapReplicate transfers run once per minute and keep the target node hot with data block changes from the source volumes.

    The tasks and an event log will be displayed in the Replication Control Panel section. This indicates that a SnapReplicate relationship is established and that replication should be taking place.
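If the wizard reports a connection problem in step 4, a quick reachability check from the source node can help narrow it down. The sketch below is not part of SoftNAS; it assumes the standard default ports (443 for HTTPS, 22 for SSH), and the function name and the example address are hypothetical.

```python
import socket
from typing import Dict, Iterable


def check_ports(
    host: str,
    ports: Iterable[int] = (443, 22),  # assumed defaults: HTTPS and SSH
    timeout: float = 3.0,
) -> Dict[int, bool]:
    """Try a TCP connection to each port and report whether it succeeded."""
    results: Dict[int, bool] = {}
    for port in ports:
        try:
            # create_connection raises OSError on refusal, timeout, or no route.
            with socket.create_connection((host, port), timeout=timeout):
                results[port] = True
        except OSError:
            results[port] = False
    return results
```

For example, `check_ports("10.0.0.12")` run on the source node returns a mapping such as port 443 and port 22 to True or False, where 10.0.0.12 stands in for the target's internal IP address. A False entry usually points to the Security Group rules described in the note under step 4. Because replication needs two-way communication, the same check should also be run from the target node toward the source.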


Configuring SNAP HA™

SnapReplicate™ establishes a replication relationship, one that can be manually triggered or scheduled, but is not automated. For true high availability in a failover situation, SNAP HA™ must be configured as well.

...