Helix is a generic cluster management framework used for the automatic management of partitioned, replicated and distributed resources hosted on a cluster of nodes.
To understand Helix, you first need to understand what cluster management is. A distributed system typically runs on multiple nodes for the following reasons:
Each node performs one or more of the primary functions of the cluster, such as storing and serving data or producing and consuming data streams. Once configured for your system, Helix acts as the global brain for the system. It is designed to make decisions that cannot be made in isolation. Examples of decisions that require global knowledge and coordination:
While it is possible to integrate these functions into the distributed system itself, doing so complicates the code. Helix abstracts common cluster management tasks, enabling the system builder to model the desired behavior in a declarative state model and to let Helix manage the coordination. The result is less new code to write and a robust, highly operable system.
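As a rough illustration of what a declarative state model can look like, the sketch below encodes states, legal transitions and a per-state replica bound in plain Java. It does not use the Helix API: the class `StateModelSketch`, its methods, and the OFFLINE/SLAVE/MASTER state names are hypothetical and chosen only for illustration.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration only; this is NOT the Helix API.
class StateModelSketch {
    // Legal transitions: from-state -> set of allowed to-states.
    private final Map<String, Set<String>> transitions = new HashMap<String, Set<String>>();
    // Maximum number of replicas of one partition allowed in a given state.
    private final Map<String, Integer> upperBounds = new HashMap<String, Integer>();

    StateModelSketch addTransition(String from, String to) {
        Set<String> targets = transitions.get(from);
        if (targets == null) {
            targets = new HashSet<String>();
            transitions.put(from, targets);
        }
        targets.add(to);
        return this;
    }

    StateModelSketch upperBound(String state, int max) {
        upperBounds.put(state, max);
        return this;
    }

    // True if the model declares a direct transition from 'from' to 'to'.
    boolean isLegalTransition(String from, String to) {
        Set<String> targets = transitions.get(from);
        return targets != null && targets.contains(to);
    }

    // True if a node -> state assignment for one partition respects every bound.
    boolean satisfiesBounds(Map<String, String> nodeToState) {
        Map<String, Integer> counts = new HashMap<String, Integer>();
        for (String state : nodeToState.values()) {
            Integer c = counts.get(state);
            counts.put(state, c == null ? 1 : c + 1);
        }
        for (Map.Entry<String, Integer> bound : upperBounds.entrySet()) {
            Integer c = counts.get(bound.getKey());
            if (c != null && c > bound.getValue()) {
                return false;
            }
        }
        return true;
    }

    // Assumed example model: at most one MASTER replica per partition.
    static StateModelSketch masterSlave() {
        return new StateModelSketch()
                .addTransition("OFFLINE", "SLAVE")
                .addTransition("SLAVE", "MASTER")
                .addTransition("MASTER", "SLAVE")
                .addTransition("SLAVE", "OFFLINE")
                .upperBound("MASTER", 1);
    }

    public static void main(String[] args) {
        StateModelSketch model = masterSlave();
        Map<String, String> assignment = new HashMap<String, String>();
        assignment.put("node1", "MASTER");
        assignment.put("node2", "SLAVE");
        // OFFLINE cannot jump straight to MASTER in this model.
        System.out.println(model.isLegalTransition("OFFLINE", "MASTER")); // prints false
        // One MASTER and one SLAVE respects the MASTER bound of 1.
        System.out.println(model.satisfiesBounds(assignment)); // prints true
    }
}
```

The point of the sketch is that the system builder only declares which states, transitions and bounds are valid; a coordinator with a global view can then decide which transitions to issue to which node without that logic living in the application code.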
Modeling a distributed system as a state machine with constraints on state and transitions has the following benefits:
Requirements: JDK 1.6+, Maven 2.0.8+
git clone https://git-wip-us.apache.org/repos/asf/incubator-helix.git
cd incubator-helix
mvn install package -DskipTests
Maven dependency
<dependency>
  <groupId>org.apache.helix</groupId>
  <artifactId>helix-core</artifactId>
  <version>0.6.1-incubating</version>
</dependency>
Download Helix artifacts from here.