Scale up or scale out? What’s up with separate, clustered and federated storage

Computer systems tend to grow after they are first installed. They need more capacity and/or more performance. They expand by scaling up or scaling out.

Scale-up versus scale-out

Scaling up adds more capacity or resource within a single system. In storage that generally means adding more storage drives, either by filling slots in the base enclosure or by adding expansion chassis to the base system.

For example, Western Digital’s IntelliFlash N5100 starts with a base enclosure with 24 drive slots and 46TB capacity. It can scale up by adding up to six expansion trays with a total of 283TB of extra capacity. Beyond that it cannot scale any further, and that may be a limitation.

If a customer knows they have to scale beyond a single system then they must obtain a scale-out system – in other words, multiple systems grouped together.

Scale-out

A group of storage arrays can be made to co-operate or co-ordinate to varying degrees through software and hardware.

They can be coupled tightly or in a looser way. Clustered storage is a tighter form of coupling than federation and often involves specialised hardware to connect the separate arrays, or nodes. The result functions pretty much as a single system.

Generally we can say a cluster is a tightly-coupled set of identical storage arrays that has a dedicated high-speed, low-latency interconnect, shares IO, has a single failure domain, and runs a cluster-aware operating system.

A federation is a more loosely coupled grouping, with no dedicated interconnect, no shared IO and no requirement for identical nodes, while still functioning as a single storage resource. The difference is that nodes in a federation are less co-ordinated than nodes in a cluster.

To add to the confusion, the IT world also refers to tightly-coupled and loosely-coupled clusters.

IBM defines a loosely-coupled cluster as consisting of computers that run with a minimum of communication and cooperation. This results in efficient usage of each individual computer but limits the amount of coordination and sharing of workloads.

A tightly-coupled cluster consists of a group of computers that cooperate to a great degree, coordinating and sharing workload, and communicating status details on a continuous basis.

For the purposes of this article we will say a cluster is a tightly-coupled system while a federation is a loosely-coupled system.

Data is divided between nodes in a cluster, providing resilience if a node fails and increasing the overall IO rate. In a federation, data entities – files, for example – are stored in their entirety on a single node; they can be mirrored or replicated to other nodes by software.
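The contrast can be sketched in a few lines of Python. This is a toy model only: the chunk size, node count and round-robin/replication policies are assumptions made for illustration, not how any particular product lays out data.

```python
# Toy model: cluster striping versus federated whole-file placement.
# Chunk size, node count and placement policies are assumed for illustration.

CHUNK_SIZE = 4  # bytes per chunk, kept tiny for demonstration

def cluster_write(data: bytes, nodes: list[list[bytes]]) -> None:
    """Stripe the data across all cluster nodes, chunk by chunk."""
    chunks = [data[i:i + CHUNK_SIZE] for i in range(0, len(data), CHUNK_SIZE)]
    for i, chunk in enumerate(chunks):
        nodes[i % len(nodes)].append(chunk)  # round-robin placement

def federation_write(data: bytes, nodes: list[list[bytes]],
                     home: int, replica: int | None = None) -> None:
    """Store the whole object on one node; optionally replicate it."""
    nodes[home].append(data)
    if replica is not None:
        nodes[replica].append(data)  # software-driven copy, not striping

cluster_nodes = [[], [], [], []]
cluster_write(b"ABCDEFGHIJKLMNOP", cluster_nodes)
print(cluster_nodes)      # each node holds some chunks of the object

federated_nodes = [[], [], [], []]
federation_write(b"ABCDEFGHIJKLMNOP", federated_nodes, home=0, replica=2)
print(federated_nodes)    # node 0 and node 2 each hold the whole object
```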

Cluster and federation differences

Clustered computers co-operate to a great degree, co-ordinating and sharing the workload, and communicating node and data status details, across the interconnect, on a continuous basis.

The cluster interconnect transmits messages and data between the cluster nodes. This traffic can be customer data or cluster metadata: which node is writing to which drives, which files are locked, and so forth. The interconnect is separate from the data access paths used by application servers accessing the cluster nodes.
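As a hedged illustration of that control-path traffic, the sketch below models the kinds of messages just described; the message names and fields are invented for this example rather than taken from any real cluster implementation.

```python
from dataclasses import dataclass
from enum import Enum, auto

class MsgType(Enum):
    HEARTBEAT = auto()        # "I am alive" status from a node
    WRITE_OWNERSHIP = auto()  # which node is writing to which drives
    LOCK_UPDATE = auto()      # which files are currently locked

@dataclass
class ClusterMessage:
    msg_type: MsgType
    sender_node: str
    payload: dict

# Example control-path traffic flowing over the cluster interconnect,
# kept separate from the host data access paths.
messages = [
    ClusterMessage(MsgType.HEARTBEAT, "node-1", {"status": "ok"}),
    ClusterMessage(MsgType.WRITE_OWNERSHIP, "node-2", {"drives": [4, 5]}),
    ClusterMessage(MsgType.LOCK_UPDATE, "node-3", {"file": "/vol1/db.log", "locked": True}),
]
for m in messages:
    print(m)
```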

An immediate observation is that the limit of a cluster’s expansion is set by the capabilities of the cluster interconnect in terms of speed, bandwidth and port count. Once it is fully utilised there is no point in adding another node because the interconnect cannot cope with the additional communication needs.
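A back-of-envelope sketch makes the point. The fabric bandwidth and per-node traffic figures below are arbitrary assumptions, chosen only to show how interconnect headroom runs out as nodes are added.

```python
# Illustrative only: why a fixed interconnect caps cluster growth.
FABRIC_GBITS = 40.0          # total interconnect bandwidth (assumed)
PER_NODE_NEED_GBITS = 6.0    # control + data traffic each node adds (assumed)

def interconnect_headroom(node_count: int) -> float:
    """Remaining interconnect bandwidth after node_count nodes join."""
    return FABRIC_GBITS - node_count * PER_NODE_NEED_GBITS

for n in range(2, 9):
    print(n, "nodes ->", interconnect_headroom(n), "Gbit/s spare")
# Once the headroom goes negative, adding another node just congests the fabric.
```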

A federation is not limited in its node count in this way.

Tightly-coupled versus loosely-coupled, federated systems

Blocks & Files asked various experts about the differences between clustering and federation.

Aaron Plaza, a director at VAST Data, says: “I think of a cluster as a tightly-coupled system where all resources are shared with each other and receive the benefits of a single name space and data services that span the full capacity, i.e. no silos.”

He said: “Federated systems are more isolated/disparate systems loosely coupled…that have limited shared resource ability and leverage software to ‘stitch’ the silos together.”

John Martin, director of strategy and technology for APAC at NetApp, says: “For me the purpose of a cluster is to create a single perf/capacity container that minimises or eliminates “bin packing”, Federation is more of a single namespace for security and replication with multiple capacity/performance containers.”

The general bin-packing problem in computing refers to packing items of different sizes into bins of fixed capacity in a way that minimises the number of bins used.
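For illustration, the first-fit decreasing heuristic below packs a set of workload capacity demands into as few fixed-size arrays (“bins”) as possible; the sizes and capacities are invented for the example.

```python
def first_fit_decreasing(items: list[float], bin_capacity: float) -> list[list[float]]:
    """Pack items into as few bins as possible using a common heuristic."""
    bins: list[list[float]] = []
    for item in sorted(items, reverse=True):      # largest workloads first
        for b in bins:
            if sum(b) + item <= bin_capacity:     # fits in an existing bin
                b.append(item)
                break
        else:
            bins.append([item])                   # open a new bin (array)
    return bins

# Workload capacity demands (TB) packed into arrays of 10TB usable capacity.
demands = [7, 5, 4, 3, 2, 2, 1]
print(first_fit_decreasing(demands, bin_capacity=10))
# A single cluster-wide performance/capacity pool avoids this packing exercise.
```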

John Nicolson, a senior technical marketing architect at VMware, says: “Cluster is a shared IO or persistent failure domain, federation is a shared management domain. Federation also may offer portability or migrations data services between clusters. Example: a given LUN or volume will be stored in a single vSAN cluster. Multiple clusters can be managed from federated vCenter Servers in a common SSO domain and Volumes can non-disruptively be vMotion’d between clusters.”

He says file storage arrays can be clustered or federated just like block-access arrays (SANs); “For a traditional filer the difference would be a common namespace (DFS etc) vs a given file system and collection of blocks.  The metaphors for federation vs cluster extend up and down the storage stack from block to file to VM.”

For example, Qumulo filers can be clustered with a minimum of four nodes, Panasas high-performance computing filers form a distributed and clustered parallel file system, and Dell EMC’s Isilon filers can also be clustered.

According to Oracle:

  • A federated database is a logical unification of distinct databases running on independent servers, sharing no resources (including disk), and connected by a LAN. Data is horizontally partitioned across each participating server. For both the DBA as well as the Application Developer, there is a clear distinction between “local” and “remote” data.
  • A cluster consists of servers (usually SMPs), a cluster interconnect and a shared disk subsystem. Shared disk database architectures run on hardware clusters that give every participating server equal access to all disks – however, servers do not share memory. A database instance runs on every node of the cluster. Transactions running on any instance can read or update any part of the database – there is no notion of data ownership by a node. 
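A minimal sketch of the horizontally partitioned, shared-nothing model Oracle describes for a federated database follows; the server names and hash-based routing rule are assumptions made for this example.

```python
# Toy model of a federated database: each server owns a horizontal
# partition of the rows and shares no storage with the others.
SERVERS = ["db-server-a", "db-server-b", "db-server-c"]  # assumed names

def owning_server(customer_id: int) -> str:
    """Route a row to the server that owns its partition."""
    return SERVERS[hash(customer_id) % len(SERVERS)]

partitions: dict[str, list[dict]] = {s: [] for s in SERVERS}

def insert_row(row: dict) -> None:
    partitions[owning_server(row["customer_id"])].append(row)

insert_row({"customer_id": 101, "name": "Acme"})
insert_row({"customer_id": 202, "name": "Globex"})

# In a shared-disk cluster, by contrast, any instance could read or update
# any row; here each server only ever sees its own "local" partition.
for server, rows in partitions.items():
    print(server, rows)
```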

Rob Peglar, president at Advanced Computation and Storage, says: “A key concept…is that a cluster is designed to be self-contained, self-controlled. No third party needed (or wanted, in most cases). “Federations” depend entirely on a third party – aka distributed management, not self-aware. This is effectively why “federation” is not a term of art; it could be defined [as] any collection of systems. Clustering implies specific algorithmic behaviour. Tons of academic work in this field (e.g. ACM PODS).”

A ‘term of art’ is a phrase that has a precise and specific meaning in a particular field or profession. ACM is the Association for Computing Machinery which aims to advance computing as a science and a profession. PODS is the symposium on the Principles of Database Systems and technical papers are published at its conferences.

The Blocks & Files takeaway: clustering can have a precise meaning while federation is less defined.

Product examples

A closer look at Dell EMC SC, Unity and XtremIO arrays will illustrate the differences between scale-up, federated and clustered systems.

SC arrays are federated and the SC operating system provides a layer of virtualization in multi-array environments. This acts as a “storage hypervisor” to abstract and dynamically manage LUN mappings across more than one SC system, independent of their physical location.

Dell EMC SC Array federation video screenshot

Live Volume creates synchronous or asynchronous live copies of data on separate arrays, transparently maintaining and swapping the primary host source, either on-demand or in response to an unexpected outage. 

The Live Migrate feature uses this virtualization layer, allowing you to move data freely and transparently across local, campus, metro or geo distances, without interrupting workload or reconfiguring hosts.

The Dell Storage Manager (DSM) 4 “tiebreaker” service simplifies management further by automatically keeping each array aware of the other’s status and ensuring they synchronise fully during recovery. The DSM tiebreaker runs on a VM in a public or private cloud at a third location, providing additional fault tolerance for the overall system.

XtremIO arrays are clustered, with clusters formed from up to eight X-Bricks. An X-Brick consists of two controllers and a Drive Array Enclosure. The X-Brick nodes are interconnected with a redundant 40Gbit/s QDR InfiniBand network for backend connectivity between the bricks.

Unity arrays have a dual-controller architecture and are neither federated nor clustered. They scale up by converting to a higher-performing model in the range through data-in-place conversions, which involve replacing the storage controllers.

Federating and clustering summary

You can use a group of separate arrays, a group of federated arrays or a cluster of arrays when you need more than one array to provide storage for a set of applications.

With separate arrays each application uses one array from the group as its storage device. Each array is separately managed and there are no in-built arrangements for data preservation if an array, or access to it, fails.

With a cluster the group of arrays functions as a single logical array, sharing IO, a namespace, management and the workload across the nodes (arrays) in the cluster. The nodes use the cluster interconnect for control path messages, enabling them to be co-ordinated in this way. There can be protection against cluster node failure, and the cluster is centrally managed.

Clustering provides a higher degree of fault tolerance and more storage compute than federation, but it requires cluster interconnect hardware, generally identical nodes, and a cluster-aware array operating system.

With a federation the group of arrays is a halfway house, with some degree of co-ordination possible, but there is no concept of belonging to a single logical resource and no in-built facility for recovering from a node failure.

Federation can use ordinary, commodity array hardware and dissimilar nodes and it is typically less expensive than clustering. Software can be used to bring the federated nodes closer together, protecting against node failure or helping with workload balancing.

The more co-ordination software you add to a federation the closer it gets to becoming a cluster.

Reasons for clustering and federating

Why cluster? Clusters are usually deployed to improve performance and availability over that of a single computer, and this is typically more cost-effective than a single computer of comparable speed or availability. For example, it is cheaper to cluster several XtremIO arrays than to buy a single PowerMax array of equivalent capacity.

An XtremIO cluster can store mission-critical data and maintain access to it when a node fails. 

Why federate? In general, federated systems are deployed when:

  • You need simpler management than that involved in looking after several separate arrays
  • You don’t need total capacity and processing resource sharing across the nodes
  • You can’t justify the cost of clustered systems with their interconnect and more complex software
  • You possibly don’t need to scale out as much as you might with a clustered system.

It is less expensive to federate several SC arrays than to buy an equivalent capacity XtremIO cluster.
