S3 Cloud Disk Best Practices
- Without proper configuration, a SoftNAS instance leveraging S3-compatible cloud disk extenders can perform poorly.
- To get the best performance possible for a SoftNAS deployment with S3-compatible cloud disks, keep the following information in mind.
General Guidelines
- A Storage Pool should have a one-to-one correspondence with an S3 Cloud Disk, configured as a JBOD Storage Pool. Using RAID-0, RAID-1, or RAID-Z with S3 Cloud Disks is not recommended. Capacity expansion can be used to add capacity to a JBOD Storage Pool by adding another S3 Cloud Disk.
- S3 Cloud Disk object storage should be located "near" the instance that is using the object storage. This means in the same region on public clouds and with minimal latency on private clouds.
- Do not use a "block cache file" on SoftNAS versions that have support for that feature.
- Block disks such as EBS or VMDK can be used to provide Read Cache and Write Log for Storage Pools which are backed by S3 Cloud Disks when different performance requirements exist.
- A Storage Pool backed by S3 Cloud Disks should be configured for "Sync Mode" of "Standard"
Sizing
Sizing a solution that uses Cloud Disk Extenders is very much the same as sizing a solution that uses a block-based implementation (VMDK or EBS). There is no change to storage space requirements.
However, additional system resources may be required to handle the virtualization of the S3-compatible storage that presents the S3 Cloud Disk as block storage.
CPU
If using Cloud Disk Extenders, it is important to configure your instances with additional processing power (CPU) above and beyond what is required for traditional block-based storage access. Presenting S3 storage as block-based storage requires several additional functions to be executed, including SSL/TLS key exchange and encryption, MD5 block computations, network stack processing, and any optional encryption features.
To avoid performance issues:
- Do not use the Cloud Disk Extender on single-vCPU instances.
- 4 vCPU instances may be suitable for test scenarios, but may still prove insufficient if your S3-compatible test/POC environment requires meaningful performance.
- For a production environment, a minimum of 4 vCPUs is highly recommended. Many workloads will perform better with additional vCPUs.
- For each 75 MB/s of throughput required to perform the same task with block-based storage, an additional two vCPUs are highly recommended.
- CPU utilization should be monitored during proof-of-concept and initial production stages to verify that sufficient CPU has been provisioned for the given workload.
- Monit email alerts should be monitored, and any indications of high CPU utilization should be reviewed with respect to the Cloud Disk Extender configuration.
- If operating in a trusted environment, and if the S3-compatible object storage being used offers it as an option, CPU usage can be reduced by using HTTP rather than HTTPS.
- CPU usage can be further reduced by disabling optional encryption features.
EXAMPLE
A customer wants to use S3 object storage to save money over EBS.
The current workload operates between 100-150 MB/s of throughput and is running on an m4.xlarge instance.
Evaluating the current workload, we know that it averages a healthy 50% CPU usage. To provide the same 150 MB/s of throughput against S3, the general guideline calls for 4 additional vCPUs over and above the current instance's existing 4 vCPU base.
As a result, the CPU recommendation points to an m4.2xlarge instance, which provides the four additional vCPUs.
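This guideline can be expressed as a simple calculation. The following Python sketch is a rough illustration only: the 2-vCPU-per-75-MB/s factor and the instance sizes come from the guideline and example above, while the helper name is hypothetical.

```python
import math

# Rough illustration of the "two additional vCPUs per 75 MB/s" guideline.
# Instance choices should still be validated by monitoring CPU utilization
# during proof-of-concept and initial production.
def extra_vcpus_for_s3(throughput_mb_per_s: float) -> int:
    """Additional vCPUs recommended for a given S3 Cloud Disk throughput."""
    return 2 * math.ceil(throughput_mb_per_s / 75.0)

base_vcpus = 4                      # m4.xlarge in the example above
extra = extra_vcpus_for_s3(150)     # 150 MB/s -> 4 additional vCPUs
print(base_vcpus + extra)           # 8 vCPUs total, e.g. an m4.2xlarge
```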
RAM
As mentioned previously in this document, each Cloud Disk Extender is a process running inside the SoftNAS instance that virtualizes the object storage as block storage.
- Cloud Disk Extender should not be used in production on systems with less than 16 GB of RAM.
- Memory footprints of less than 8 GB of RAM may be suitable for test or PoC environments only.
- As a general guideline, an additional 512 MB of RAM should be provisioned above the memory normally required for a given workload.
- Remember that half of the RAM is used for file-system caching, and additional resources (~2 GB of RAM) are needed for the network file services and the base operating environment (see the sketch after this list).
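As a rough illustration of the RAM guidance above, the sketch below adds the 512 MB Cloud Disk Extender overhead and the ~2 GB base operating environment to a workload's normal memory requirement and enforces the 16 GB production minimum; the helper name and the example inputs are assumptions.

```python
# Rough illustration of the RAM guidance above; helper name and inputs are hypothetical.
PRODUCTION_MIN_GB = 16.0       # do not go below this in production
EXTENDER_OVERHEAD_GB = 0.5     # ~512 MB for the Cloud Disk Extender process
BASE_SYSTEM_GB = 2.0           # network file services + base operating environment

def recommended_ram_gb(workload_gb: float) -> float:
    """Estimate instance RAM for a workload's normal memory requirement."""
    estimate = workload_gb + EXTENDER_OVERHEAD_GB + BASE_SYSTEM_GB
    return max(estimate, PRODUCTION_MIN_GB)

print(recommended_ram_gb(10))   # -> 16.0 (the production minimum applies)
print(recommended_ram_gb(20))   # -> 22.5
```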
Network
The Cloud Disk Extender uses the instance's network interface to access the object storage. Sufficient network bandwidth must be provisioned to reach maximum performance with the Cloud Disk Extender.
When considering the desired throughput to the object store, also consider the network throughput required for network file services (NFS, CIFS, iSCSI, AFP) and SnapReplicate™/SNAP HA™, which, in most configurations and platforms, all come from the same pool of available network bandwidth.
- A reasonably safe calculation is to determine the available network throughput for the instance and divide it by 3: 1/3 for file services, 1/3 for replication, and 1/3 for object storage I/O (see the sketch after this list).
- When calculating, consider that SnapReplicate™ only replicates the write bandwidth, not the read bandwidth.
- Be sure to convert properly between bits and bytes when comparing network throughput (usually expressed in bits) with disk throughput (usually expressed in bytes).
- There is inherent overhead in the protocols used on the network (request/response, headers, checksums, control data, etc.), so full network saturation does not yield the full bandwidth as useful throughput. Anticipate only about 90% of the link speed as usable throughput.
- Most clouds (and most data centers) do not provide full link-speed bandwidth on a sustained basis as systems are utilizing shared resources. Systems designed to run at full provisioned capacity (of any metric) should be assigned to dedicated hosts rather than shared tenancy.
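The divide-by-3 and 90% guidelines above combine into a quick budgeting check. The sketch below is a minimal illustration under those assumptions; the function name is hypothetical.

```python
# Hypothetical helper: split the usable portion of a link evenly between
# file services, replication, and object storage I/O.
def throughput_budget_mbps(link_speed_mbps: float) -> dict:
    usable = link_speed_mbps * 0.9     # only ~90% of link speed is usable
    share = usable / 3                 # 1/3 each
    return {"file_services": share, "replication": share, "object_storage": share}

print(throughput_budget_mbps(1000))    # 1 Gbps link -> ~300 Mbps per consumer
```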
EXAMPLE
A customer uses NFS, SnapReplicate™, and SNAP HA™, and would like to use object storage. Expected throughput is about 40 MB/s with 90% reads. Based on this, the network throughput for the source node breaks down as follows:
- 4 MB/s writes to NFS (incoming)
- 36 MB/s reads to NFS (outgoing)
- 4 MB/s writes to SnapReplicate (outgoing)
- 4 MB/s writes to Object Storage (outgoing)
- 36 MB/s reads to Object Storage (incoming)
Total: 40 MB/s incoming, 44 MB/s outgoing.
Converting to bits, this is 320 Mbps incoming and 352 Mbps outgoing. Likewise, the network throughput for the target node breaks down as follows:
- 4 MB/s writes from SnapReplicate (incoming)
- 4 MB/s writes to Object Storage (outgoing)
Total: 4 MB/s incoming and 4 MB/s outgoing.
In bits, this works out to 32 Mbps incoming and 32 Mbps outgoing. A 100 Mbps network connection is certainly not sufficient for this configuration; however, a 1 Gbps connection should be enough, even allowing for protocol overhead and avoiding 100% saturation of the network.
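The conversions in this example can be checked with a few lines of Python; the figures are those worked out above, and the 1 Gbps link speed is the assumption from the conclusion.

```python
# Convert the example's disk throughput (MB/s) to network throughput (Mbps)
# and compare against the usable share of an assumed 1 Gbps link.
MBITS_PER_MBYTE = 8

def to_mbps(mb_per_s: float) -> float:
    return mb_per_s * MBITS_PER_MBYTE

incoming, outgoing = to_mbps(40), to_mbps(44)   # source node: 320 Mbps in, 352 Mbps out
usable = 1000 * 0.9                             # ~900 Mbps usable on a 1 Gbps link
print(incoming, outgoing, outgoing <= usable)   # 320 352 True
```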
Amazon AWS S3 Recommendation: VPC Endpoints
- Only use S3 Cloud Disks in the same region as the EC2 Instance.
- Always use VPC Endpoints to access S3 storage directly, without contention on the public Internet, and ensure the relevant route tables are using the VPC Endpoint (see the sketch below).
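For AWS deployments, a gateway VPC Endpoint for S3 can be created and attached to the relevant route tables with boto3, as sketched below; the region, VPC ID, and route table ID are placeholders for your own environment.

```python
import boto3

# Sketch: create an S3 gateway VPC Endpoint and associate it with a route table.
# Replace the region, VPC ID, and route table ID with your own values.
ec2 = boto3.client("ec2", region_name="us-east-1")

response = ec2.create_vpc_endpoint(
    VpcEndpointType="Gateway",
    VpcId="vpc-0123456789abcdef0",
    ServiceName="com.amazonaws.us-east-1.s3",     # S3 in the same region as the instance
    RouteTableIds=["rtb-0123456789abcdef0"],      # route tables that should reach S3 via the endpoint
)
print(response["VpcEndpoint"]["VpcEndpointId"])
```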
Additional Information
SoftNAS 3.4.6 includes improvements to the S3 Cloud Disk Extender implementation. "Block Cache File" is no longer supported as a cache mechanism. After upgrading to 3.4.6 (or later) and rebooting, it is possible and recommended to delete s3cachepool pools that were used only for block cache file storage. The block devices used for these pools can be reassigned as a read cache or write log, or de-provisioned.
Additionally, after upgrading to 3.4.6 (or later), the instance must be rebooted for all of the improvements to take effect. S3 Cloud Disks will continue to function, but not all of the improvements will be applied until the system is rebooted.