Overview

Once SoftNAS® SnapReplicate™ and SNAP HA™ have been configured, day-to-day operations are automated. Automatic Failover is one of the features included with SNAP HA; once SNAP HA is set up, no additional configuration is required to make it work. Automatic Failover relies on the SoftNAS health monitor. When the health monitor detects a failure or is unable to reach a SoftNAS node, it automatically fails over to the other node and moves all NAS services to it.

However, some actions still require administrative intervention.

This document shows how to perform each of these actions using the SoftNAS StorageCenter™ Administrative Interface.

Setting Up Manual Takeover and Giveback

When a takeover is initiated, the SNAP HA™ Controller ensures that data is not being written to a node that is in the process of a switch over. This avoids the split-brain condition.

The HA controller will authorize the switch over, reassign the IPs, and change the primary/secondary designation for the SoftNAS® instances. As part of the takeover, the problematic instance is also shut down.

Note
In the case of any failure, ensure that all data has been fully synced and all synchronization activities have completed before performing a takeover or giveback.

Warning

In the example shown, the current state is listed as "DELTASYNC-UNDERWAY". Under NO circumstances should you perform a Takeover or Giveback operation until DeltaSync appears as completed.
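
The rule in this warning can be written down as a small guard. The following is a minimal Python sketch, not part of SoftNAS: the function name `deltasync_finished` is hypothetical, and the only status strings it knows are the two that appear in this article. How the status string is obtained (for example, read by an operator from the Replication Control Panel) is left open.

```python
# Status strings as they appear in this article.
DELTASYNC_UNDERWAY = "DELTASYNC-UNDERWAY"
DELTASYNC_COMPLETE = "DELTASYNC-COMPLETE"


def deltasync_finished(status: str) -> bool:
    """Return True only when the displayed status is DELTASYNC-COMPLETE.

    This is a necessary check, not a sufficient one: any other status,
    including DELTASYNC-UNDERWAY or an unknown value, is treated as
    "do not perform a Takeover or Giveback yet".
    """
    return status.strip().upper() == DELTASYNC_COMPLETE
```

Treating every unrecognized status as unsafe is a deliberate choice here, so that a typo or an unexpected state never reads as permission to proceed.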

Takeover

Note
Ensure all synchronization activities have been completed before performing the operation.

From the SoftNAS StorageCenter™ interface of the good node, navigate to the SnapReplicate™ panel and select Actions > Takeover.

Click the Yes button on the Confirm Action prompt.
The takeover process will begin. This process will shut down the source node and allow the target to take over as primary. After the process has completed successfully, the good node will display as the HA Primary.

Info
After the problematic node has been fixed, bring the node back up.

Giveback

Note
Ensure all synchronization activities have been completed before performing the operation.

After rebooting the node shut down by the takeover process, perform a Giveback from the secondary instance to allow the SNAP HA™ controller to safely and securely perform the switch over to protect data integrity.

From the SoftNAS StorageCenter™ interface of the good node, navigate to the SnapReplicate™ panel and select Actions > Giveback.

Confirm the action by clicking Yes.

Recovering from a High Availability Failure

To properly recover from a node failure without risking data loss, internal processes must be allowed to complete, and tasks must be performed in a particular order. In this article, we will simulate a failover in order to cover the necessary steps to recover from an HA failure, as well as the cues SoftNAS provides to confirm that required processes are complete before you move on to the next task.

First, log into both of your HA nodes (in separate browser tabs) via the IP addresses you provided to them. For the purposes of this article, we will call the source node SoftNAS01, and the target node SoftNAS02.

Simulating the Failure

From the source node (SoftNAS01), open SoftNAS' SnapReplicate/SNAP HA menu from the Storage Administration pane to check on system progress.

In the SnapReplicate/SNAP HA pane, a healthy configuration shows replication as ongoing and active between the nodes, ensuring that in the event of a failure, the secondary node is fully ready for active duty.

From the secondary node, you would see the same status, but it would say HA Secondary under the center HA symbol.

In order to simulate a failure, we will initiate a takeover from the target node, SoftNAS02.

From the Actions menu, select Takeover.

Click Yes on the Confirm Action prompt, which asks you to confirm that you wish to proceed.

The simulated node failure begins immediately. You will notice that HA is deactivated, and that the target node (SoftNAS02) is now listed as the primary node.

Verifying Failover is complete

In the event of an actual failure, this is also what you would see. Likewise, from SoftNAS01, the former primary node, you would note that it is listed as secondary.

External servers and applications continue to have access to the data residing in SoftNAS, but now retrieve this data from SoftNAS02. Now let's dig a little deeper and look at the Replication Control Panel.
Here is where you will see the status of the takeover/replication process. In the event of an actual failure, this will provide the statuses you will use to determine that the takeover process is complete. You will first notice the DeltaSync process. This process tracks changes occurring to the data from the primary node while HA is in a degraded state.

Once the former primary node (SoftNAS01) has been fenced off, you will see the status COMMFAIL appear between the nodes. This is because SoftNAS01 is stopped and no longer accessible.
Info
If this were an actual failure, this status would require human intervention to resolve. In that case, the primary node would need to be rebooted.
To reboot your node, go to the host platform, find the relevant instance or virtual machine, and restart it.

For AWS:

Open the AWS EC2 dashboard.
Right-click the instance in question (SoftNAS01 in this case) and select Start instance.
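
For readers who prefer to script the AWS step, the same start operation can be sketched in Python with boto3. This is an illustration only, under stated assumptions: the helper name `start_stopped_instance` is hypothetical, the EC2 client is passed in by the caller, and the instance ID in the usage comment is a placeholder rather than a value from this article.

```python
def start_stopped_instance(ec2_client, instance_id: str) -> str:
    """Start a stopped EC2 instance and block until it is running.

    ec2_client is expected to be a boto3 EC2 client, for example
    boto3.client("ec2"). Returns the state name reported by the
    start call (typically "pending").
    """
    response = ec2_client.start_instances(InstanceIds=[instance_id])
    # Wait until EC2 reports the instance as running before continuing.
    ec2_client.get_waiter("instance_running").wait(InstanceIds=[instance_id])
    return response["StartingInstances"][0]["CurrentState"]["Name"]


# Example usage (requires boto3 and AWS credentials; the ID is a placeholder):
# import boto3
# start_stopped_instance(boto3.client("ec2"), "i-0123456789abcdef0")
```

A running instance does not guarantee that SoftNAS itself is ready, so logging in to the node to confirm it is up is still needed afterwards.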

For Azure:

In the Azure portal, select All resources and search for the virtual machine by name.
Double click the virtual machine to open it.

Select Start to reboot the virtual machine.

For VMware:

In VMware, find the virtual machine by name, and power it up.

Because the data is still accessible and changing on the second node, the data on SoftNAS01 is outdated. It will need to be resynced with the content of SoftNAS02 before high availability can be re-established.
Once SoftNAS01 is rebooted, log in to the original primary node again to ensure it is up and running.
After ensuring both nodes are running and ready, return to SoftNAS02 and the SnapReplicate/SNAP HA tab. It still shows the COMMFAIL status.
From Actions, select Activate to re-establish the link between the nodes, and high availability.

If activation is successful, a prompt will appear stating Activate completed successfully. Click OK.

The first thing that occurs is a DeltaSync operation to restore all data changes that occurred during the high availability outage on the surviving node (SoftNAS02).

Note
High Availability is re-established, but SoftNAS02 is now the primary node.

You can see the data changes between the systems by investigating Volumes and LUNs on each node.
Here we see that SoftNAS02, now serving as the primary node, shows a total used space of 143 GB.

SoftNAS01 shows 146 GB of data under Total Used Space.

Note
This data discrepancy must be resolved before any giveback operation is performed.

Even though the option is available to perform a giveback operation on SoftNAS02, or a takeover operation from SoftNAS01 to re-establish the original HA configuration, do not perform either action while DeltaSync is underway or you risk data loss.

Warning
If either action is performed while DeltaSync is underway, all data configuration changes that occurred while HA was in a degraded state will be lost.

Note
In the near future, SoftNAS will make Takeover and Giveback actions unavailable until all data synchronization is complete.

On SoftNAS01, no status will be displayed. This is by design, as all operations should be performed on the current primary node.
On SoftNAS02, return to Volumes and LUNs from the Storage Administration pane.
In Volumes and LUNs on SoftNAS02 you will notice a second volume has appeared in the list, labelled EBSvol_DELTACLONE. This volume is created to manage the data changes between volumes on each node.

Continue to refresh until you see a status of DELTASYNC-COMPLETE.
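
The refresh-and-check loop can be sketched as a small polling helper. This is a hedged illustration, not a SoftNAS API: `wait_for_status` and its parameters are hypothetical, and `get_status` stands in for whatever the operator uses to read the current status string. The default target is the DELTASYNC-COMPLETE status named in this article.

```python
import time
from typing import Callable


def wait_for_status(
    get_status: Callable[[], str],
    target: str = "DELTASYNC-COMPLETE",
    interval_s: float = 30.0,
    timeout_s: float = 3600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll get_status() until it returns target or the timeout expires.

    Returns True if the target status was seen, False on timeout.
    The status is always checked at least once.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if get_status().strip().upper() == target:
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval_s)
```

A False result means only that the target status was not observed in time; it is not evidence of a failure, and a giveback should still wait until the status is confirmed.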

With DeltaSync complete, return to SnapReplicate™/SNAP HA on SoftNAS02, and refresh once more. A SnapReplicate operation is now underway. This too needs to complete fully before a giveback operation is performed.

Once the SnapReplicate operation has completed, the failover is fully completed and HA is in a fully healthy state, with SoftNAS02 as the primary node. Operations can resume as normal in this configuration. However, should you wish to re-establish the original configuration with SoftNAS01 as the primary, you can now safely perform a giveback operation.

Performing the Giveback Operation

Once all synchronization operations have been completed, as verified above, you can perform a giveback operation to make SoftNAS01 once again the primary node.
In the Actions menu on the current primary node (SoftNAS02) select Giveback.

Confirm the action by clicking Yes.

Once again, HA will be deactivated, and this time, SoftNAS02 will have to be rebooted in the same manner as before.
