High Availability

Enterprise-Grade High-Availability with Low Complexity and Low Cost

SoftNAS™ SNAP HA™ High Availability delivers a low-cost, low-complexity solution for high-availability clustering that is easy to deploy and manage. A robust set of HA capabilities protects against data center, availability zone, server, network and storage subsystem failures to keep business running without downtime.

SNAP HA™ monitors all critical storage components to ensure they remain operational. When a system component suffers an unrecoverable failure, another storage controller detects the problem and automatically takes over, so that no downtime or business impact occurs.

SNAP HA™ works hand in hand with SoftNAS® data protection features, including RAID and automatic error detection and recovery. As a result, it reduces operational costs and boosts storage efficiency.

High Availability protects companies from the revenue they would otherwise lose when access to their data resources and critical business applications is disrupted, with features that:

  • Protect against unplanned storage outages 24 x 7 x 365
  • Provide disaster recovery capabilities to quickly resume mission-critical operations in the event of a disaster (e.g., across availability zones or data centers)
  • Ensure failed components are quickly and automatically identified and isolated, so they do not cause data loss, application errors or downtime
  • Replicate data so there is an up-to-the-minute copy of any changes that have taken place
  • Prevent outdated or incorrect data from being made available due to multiple failures across the storage environment
  • Assure business owners that applications and IT infrastructure continue operating uninterrupted by unexpected failures in the storage environment
  • Enable IT administrators to take storage systems offline for maintenance and repair, without disrupting production IT systems or applications

SoftNAS SnapReplicate™ and SNAP HA™ are not in themselves a replacement for an enterprise-ready backup solution. SoftNAS strongly recommends that important enterprise data be backed up separately and on a regular basis, in addition to the replication provided by our high availability solution.

High-Integrity Data Protection

SNAP HA™ offers two methods of data protection, based on the storage type selected at creation (Standard or Shared Pools):

SNAP HA™

Several measures have been taken to ensure the highest possible data integrity of your highly available block storage system. An independent "witness" HA controller function ensures there is never a condition that can result in what is known as "split-brain", where a controller with outdated data is accidentally brought online. SNAP HA™ prevents split-brain using a number of industry-standard best practices, including the use of a third-party witness HA control function that tracks which node contains the latest data. On AWS, shared data stored in highly redundant S3 storage is used. On Azure, blob storage is leveraged. On VMware, a separate HA Controller VM is used.
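As a rough illustration of the witness idea, the following Python sketch models a record held outside both controllers that tracks which node has the latest data. The `Witness` class, the generation counter, and the function names are assumptions made for this example; the actual SNAP HA™ mechanism is not described here and may work differently.

```python
from dataclasses import dataclass


@dataclass
class Witness:
    """Hypothetical witness record kept outside both controllers."""
    latest_node: str        # node known to hold the newest data
    latest_generation: int  # assumed counter that grows as data advances


def may_become_primary(node_generation: int, witness: Witness) -> bool:
    """Permit promotion only if the node's data is at least as new as the witness record."""
    return node_generation >= witness.latest_generation


def promote(node: str, node_generation: int, witness: Witness) -> None:
    """Record the new primary, or refuse when the candidate's data is stale."""
    if not may_become_primary(node_generation, witness):
        raise RuntimeError(f"{node} holds outdated data; refusing takeover")
    witness.latest_node = node
    witness.latest_generation = node_generation
```

In this sketch a controller whose generation is behind the witness record can never be promoted, which is the condition that split-brain protection is meant to enforce.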

Another HA feature is "fencing". In the event of a node failure or takeover, the downed controller is shut down and fenced off, preventing it from participating in the cluster until any potential issues can be analyzed and corrected, at which point the controller can be admitted back into the cluster.
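Fencing can be pictured as a small state machine. The sketch below is a minimal model under assumed names (`NodeState`, `Cluster`, `readmit`); it only illustrates the rule that a fenced controller stays out of the cluster until its issues have been corrected, and it is not the product's implementation.

```python
from enum import Enum
from typing import Dict, Iterable


class NodeState(Enum):
    ACTIVE = "active"
    FENCED = "fenced"


class Cluster:
    """Hypothetical cluster membership tracker."""

    def __init__(self, nodes: Iterable[str]) -> None:
        self.states: Dict[str, NodeState] = {n: NodeState.ACTIVE for n in nodes}

    def fence(self, node: str) -> None:
        """Shut the failed controller out of the cluster."""
        self.states[node] = NodeState.FENCED

    def can_participate(self, node: str) -> bool:
        return self.states[node] is NodeState.ACTIVE

    def readmit(self, node: str, issues_resolved: bool) -> bool:
        """Re-admit a fenced node only after its issues are analyzed and corrected."""
        if not issues_resolved:
            return False
        self.states[node] = NodeState.ACTIVE
        return True
```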

Finally, data synchronization integrity checks prevent accidental failover or manual takeover by a controller that contains out-of-date data.

The combination of high-integrity features built into SNAP HA™ ensures data is always protected and safe, even in the face of unexpected types of failures or user error. 

Even with these strong measures in place, limited data loss (approximately 5 seconds' worth) can occur at the moment of failure if the default settings for SoftNAS' implementation are used. This risk is present to a varying degree in any high availability solution that relies on the real-time transfer of active data between two nodes. SoftNAS' default settings provide a balance between performance and data integrity. Measures can be taken when creating pools and volumes for high availability to limit or eliminate this potential loss. Sync mode settings can be used to further enforce data integrity, at some cost to performance. SoftNAS strongly recommends creating a write log (ZIL) to cache bursts of write activity, which further protects data integrity and boosts performance.
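For readers familiar with ZFS, the two measures mentioned above correspond to the standard `sync` dataset property and a separate log device. The Python sketch below only builds the equivalent ZFS command lines and does not run them. It assumes direct access to the `zfs` and `zpool` utilities; SoftNAS may expose these settings through StorageCenter™ instead, and the pool, volume, and device names are hypothetical.

```python
from typing import List


def sync_always_command(dataset: str) -> List[str]:
    """Build the ZFS command that forces synchronous writes on a dataset."""
    return ["zfs", "set", "sync=always", dataset]


def add_log_device_command(pool: str, device: str) -> List[str]:
    """Build the ZFS command that attaches a separate write log device to a pool."""
    return ["zpool", "add", pool, "log", device]


if __name__ == "__main__":
    # Hypothetical names, shown for illustration only.
    print(" ".join(sync_always_command("pool1/vol1")))
    print(" ".join(add_log_device_command("pool1", "/dev/sdx")))
```

Forcing synchronous writes trades throughput for durability, which is why pairing it with a fast log device is the usual recommendation.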

Minimize Downtime from Host and Storage Failures

SoftNAS™ SNAP HA™ High Availability delivers the availability required by mission-critical applications running in virtual machines and cloud computing environments, independent of the operating system and application running on it. HA provides uniform, cost-effective failover protection against hardware and operating system outages within virtualized IT and cloud computing environments. SNAP HA™:

  • Monitors SoftNAS® storage servers to detect hardware and storage system failures
  • Automatically detects network and storage outages and re-routes NAS services to keep NFS and Windows servers and clients operational
  • Restarts SoftNAS® storage services on other hosts in the cluster without manual intervention when a storage outage is detected
  • Reduces application and IT infrastructure downtime by quickly switching NAS clients over to another storage server when an outage is detected
  • Maintains a fully-replicated copy of live production data for disaster recovery for block storage
  • Is quick and easy to install by any IT administrator, with just a few mouse clicks using the automatic setup wizard

Extend and Enhance Data Protection Across Enterprise Infrastructure

Most availability solutions are tied to specialized hardware or require complex setup and configuration. In contrast, an IT administrator configures SoftNAS SNAP HA™ with a few clicks from within the SoftNAS StorageCenter™ client interface. With simple configuration and minimal resource requirements, SNAP HA™ allows administrators to:

  • Provide uniform, automated data protection and availability for all applications without modifications to the application or guest operating system
  • Establish a consistent first line of data protection defense for an entire IT infrastructure
  • Protect data and applications that have no other failover options, which might otherwise be left unprotected and subject to extended outages and downtime
  • Provide storage HA across AWS availability zones in Amazon Web Services cloud environments
  • Use SNAP HA™ with SoftNAS® advanced NFS file servers, Windows CIFS file servers, and iSCSI SAN servers

Highly-Available NAS Services

SoftNAS SNAP HA™ provides NFS, CIFS and iSCSI services via redundant storage controllers. One controller is active, while the other is a standby controller. Block replication transmits only the changed data blocks from the source (primary) controller node to the target (secondary) controller. The ZFS copy-on-write filesystem keeps data in a consistent state on both controllers, preserving data integrity. In effect, this provides a near real-time copy of all production data (kept current within 1 to 2 minutes).
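The idea of sending only changed blocks can be sketched in a few lines of Python. This is a simplified illustration that compares fixed-size blocks directly; ZFS tracks changes through its own copy-on-write metadata rather than by comparison, so the functions and the tiny block size here are assumptions chosen for readability.

```python
from typing import Dict, List

BLOCK_SIZE = 4  # unrealistically small, for illustration only


def split_blocks(data: bytes, size: int = BLOCK_SIZE) -> List[bytes]:
    """Cut a byte string into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def changed_blocks(old: bytes, new: bytes) -> Dict[int, bytes]:
    """Return only the blocks of `new` that differ from, or extend, `old`."""
    old_blocks = split_blocks(old)
    return {
        i: block
        for i, block in enumerate(split_blocks(new))
        if i >= len(old_blocks) or old_blocks[i] != block
    }


def apply_blocks(target: bytes, delta: Dict[int, bytes], new_length: int) -> bytes:
    """Apply the changed blocks to the target copy and trim it to the new length."""
    blocks = split_blocks(target)
    for index, block in sorted(delta.items()):
        if index < len(blocks):
            blocks[index] = block
        else:
            blocks.append(block)
    return b"".join(blocks)[:new_length]
```

Only the entries in the delta cross the network, so the transfer size depends on how much data changed rather than on the total size of the volume.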

Storage Monitoring

A key component of SNAP HA™ is the HA Monitor, which runs on both nodes participating in SNAP HA™. On the secondary node, the HA Monitor checks network connectivity, as well as the primary controller's health and its ability to continue serving storage. Faults in network connectivity or storage services are detected within 10 seconds, and an automatic failover occurs, enabling the secondary controller to pick up and continue serving NAS storage requests, preventing any downtime.
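One common way to bound detection time is to probe at a fixed interval and trigger failover after a run of consecutive failures. The sketch below assumes a 2-second probe interval so that five missed probes fit the 10-second window; the interval, the class name, and the counting rule are illustrative assumptions, not documented HA Monitor behavior.

```python
class HAMonitor:
    """Hypothetical health tracker that signals failover after repeated probe failures."""

    def __init__(self, probe_interval_s: float = 2.0, detection_window_s: float = 10.0) -> None:
        # Number of consecutive failed probes that fit in the detection window.
        self.threshold = max(1, int(detection_window_s // probe_interval_s))
        self.consecutive_failures = 0

    def record(self, healthy: bool) -> bool:
        """Record one probe result; return True when failover should be triggered."""
        self.consecutive_failures = 0 if healthy else self.consecutive_failures + 1
        return self.consecutive_failures >= self.threshold
```

Resetting the counter on any healthy probe keeps a single dropped packet from causing an unnecessary failover.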

Storage Failover

Once the failover process is triggered, either by the HA Monitor (automatic failover) or by a manual takeover action initiated by the admin user, NAS client requests for NFS, CIFS and iSCSI storage are quickly re-routed over the network to the secondary controller, which takes over as the new primary storage controller. Takeover on VMware typically occurs within 20 seconds. On AWS, it can take up to 30 seconds, due to the time required for network routing configuration changes to take place.

Scales to Hundreds of Millions of Files

SNAP HA™ has been validated in real-world enterprise customer environments and is proven to handle hundreds of millions of files efficiently. Because it uses block replication instead of file replication, it scales to hundreds of millions of files and directories.

All Platforms Information

Operation in AWS Virtual Private Cloud

In AWS, SNAP HA™ is applied to SoftNAS storage controllers running in a Virtual Private Cloud (VPC). It is recommended to place each controller into a separate AWS Availability Zone (AZ), which provides the highest degree of underlying hardware infrastructure redundancy and availability.

High Availability on Amazon Web Services (AWS)

Operation in Azure VNet

On Azure, SNAP HA™ is applied to SoftNAS storage controllers running in an Azure VNet. It is recommended to place each controller in a separate availability set at the least or, better yet, in a separate availability zone, which provides the highest degree of underlying hardware infrastructure redundancy and availability.

High Availability on Microsoft Azure

Operation in VMware Private Clouds

On VMware, it is common to dedicate a non-routable VLAN to storage traffic. The storage VLAN segregates primary storage traffic (e.g., VMDKs attached to VMs over NFS or iSCSI) from other traffic. Data replication traffic can also be placed on its own separate non-routed VLAN. SoftNAS StorageCenter™ is typically placed on a routable VLAN (the default network), where it can be readily accessed by admins from a web browser from anywhere within the organization (or via a VPN).

A Virtual IP (VIP) address is employed to route NAS client traffic to the primary storage controller. In the event of a failover or takeover, the VIP is reassigned to the other controller, which immediately re-routes NAS client traffic to the proper controller.
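The effect of a VIP can be shown with a very small model: clients always address the VIP, and only its owner changes during a failover. The `VirtualIP` class, the controller names, and the address below are hypothetical and stand in for whatever network mechanism actually moves the address.

```python
class VirtualIP:
    """Hypothetical model of a VIP that points at exactly one controller."""

    def __init__(self, address: str, owner: str) -> None:
        self.address = address
        self.owner = owner

    def reassign(self, new_owner: str) -> None:
        """Move the VIP to another controller during failover or takeover."""
        self.owner = new_owner


def route_request(vip: VirtualIP) -> str:
    """Return the controller that serves a client request sent to the VIP."""
    return vip.owner


if __name__ == "__main__":
    vip = VirtualIP("10.0.0.100", owner="controller-a")
    print(route_request(vip))   # served by controller-a
    vip.reassign("controller-b")
    print(route_request(vip))   # same address, now served by controller-b
```

Because clients never reference a controller directly, no client-side reconfiguration is needed when the owner changes.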

High Availability on VMware


