At its core, a cluster is a distributed finite state machine capable of co-ordinating the startup and recovery of inter-related services across a set of machines.

Even a distributed and/or replicated application that is able to survive the failure of one or more components can benefit from a higher-level cluster:

System HA is possible without a cluster manager, but using one anyway saves you many headaches.

While SysV init replacements like systemd can provide deterministic recovery of a complex stack of services, that recovery is limited to one machine and lacks the context of what is happening on other machines - context that is crucial for telling apart a local failure, a clean startup, and recovery after a total site failure.
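As a rough illustration of why that cross-machine context matters, consider a node that finds its local service stopped. The sketch below is a toy model only, not how Pacemaker or Corosync actually make decisions; the names `Situation`, `classify`, `peers_online` and `last_shutdown_clean` are hypothetical and chosen just for this example.

```python
from enum import Enum


class Situation(Enum):
    LOCAL_FAILURE = "local failure"
    CLEAN_STARTUP = "clean startup"
    SITE_RECOVERY = "recovery after total site failure"


def classify(peers_online: int, last_shutdown_clean: bool) -> Situation:
    """Classify why the local service is stopped (hypothetical toy model).

    peers_online: number of other cluster members currently reachable.
    last_shutdown_clean: whether the cluster recorded an orderly shutdown.
    Both inputs are assumptions of this sketch; a single-machine init
    system has access to neither.
    """
    if peers_online > 0:
        # The rest of the cluster is healthy, so the problem is local.
        return Situation.LOCAL_FAILURE
    if last_shutdown_clean:
        # Nobody else is up and the last stop was orderly: a normal cold start.
        return Situation.CLEAN_STARTUP
    # Nobody else is up and there was no orderly stop: everything went down at once.
    return Situation.SITE_RECOVERY


if __name__ == "__main__":
    for peers, clean in [(2, False), (0, True), (0, False)]:
        print(peers, clean, classify(peers, clean).value)
```

Seen from a single machine, all three cases look identical: the service is simply not running. Only the extra inputs, which come from knowing the state of the other machines, separate them.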

Features

The ClusterLabs stack, incorporating Corosync and Pacemaker, defines an open source, high-availability cluster offering suitable for both small and large deployments.

Background

Pacemaker has been around since 2004 and is primarily a collaborative effort between Red Hat and Novell; however, we also receive considerable help and support from the folks at LinBit and the community in general.

Corosync also began life in 2004, but was then part of the OpenAIS project. It is primarily a Red Hat initiative; however, we also receive considerable help and support from the community.

The core ClusterLabs team is made up of full-time developers from Australia, Austria, China, Czech Republic, England, Germany and the USA. Contributions to the code or documentation are always welcome.

The ClusterLabs stack ships with most modern enterprise distributions and has been deployed in many critical environments, including Deutsche Flugsicherung GmbH (DFS), which uses Pacemaker with Heartbeat to ensure its air traffic control systems are always available.