Saturday, October 3, 2009

Clustering vs. Load Balancing

Before you can talk about differences between clustering and load balancing, and there are more than a few, you've got to get the definitions straight. Clustering is often understood to mean the capability of some software to provide load balancing services, and load balancing is often used as a synonym for a hardware- or third-party-software-based solution.

In practice, clustering is usually used with application servers like IBM WebSphere, BEA WebLogic and Oracle AS (10g). The load balancing features of Application Delivery Controllers (ADCs) such as BIG-IP are used in the same environments. (For simplicity, we will talk about "clustering" versus "ADC" approaches.)

Scalability, horizontally speaking

Hardware load balancers exist too, of course, but in that world we talk about "pools" or "farms," the groups of servers across which application requests are distributed. In the software world, the term "cluster" is applied to that same group.

Clustering typically designates one application server instance as a master controller, which then processes requests or distributes them to the other instances using industry-standard algorithms such as round robin, weighted round robin or least connections. Like load balancing, clustering scales horizontally: it offers a nearly transparent way to add application server instances to increase capacity or improve response times. To confirm that an instance is actually available, clustering approaches typically use an ICMP ping check or, sometimes, HTTP or TCP connection checks.
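To make the three algorithms concrete, here is a minimal Python sketch. It is an illustration only, not how any particular application server or ADC implements them; the function names and the server names (app1, app2, app3) are made up for the example, and the weighted variant uses the simplest possible approach of repeating each server according to its weight.

```python
from itertools import cycle


def round_robin(servers):
    """Hand out servers in a fixed, repeating order."""
    return cycle(servers)


def weighted_round_robin(weights):
    """Hand out servers in proportion to their integer weights.

    Naive version: each server is repeated `weight` times in the rotation.
    """
    rotation = [name for name, weight in weights.items() for _ in range(weight)]
    return cycle(rotation)


def least_connections(active_connections):
    """Pick the server that currently has the fewest open connections."""
    return min(active_connections, key=active_connections.get)


# Hypothetical pool of three application server instances.
rr = round_robin(["app1", "app2", "app3"])
first_four = [next(rr) for _ in range(4)]

# app1 is assumed to have twice the capacity of app2.
wrr = weighted_round_robin({"app1": 2, "app2": 1})
first_six = [next(wrr) for _ in range(6)]

# Connection counts would normally come from the controller's own bookkeeping.
choice = least_connections({"app1": 5, "app2": 2, "app3": 7})
```

Round robin and its weighted variant need no knowledge of what the servers are doing, which is why they are the usual defaults. Least connections needs a live connection count per server, so the controller has to track that state.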

Health and transparency

For load balancing, ADCs support the same industry-standard algorithms but add more complex calculations, taking into account parameters such as per-server CPU and memory utilization and fastest response times. ADCs also support more robust health monitoring than the simple checks in app server clustering solutions. They can verify the content a server returns, and they can monitor passively, which removes even the small load that active health checks place on app server instances.
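The difference between a connection check and content verification can be sketched in a few lines of Python. This is an assumption-laden illustration, not the mechanism of any specific ADC: the function names, the URL and the expected text are all hypothetical. The check passes only when the page comes back with status 200 and contains the text we expect, so a server that answers with an error page is treated as down even though it still accepts connections.

```python
from urllib.error import URLError
from urllib.request import urlopen


def fetch(url, timeout=2.0):
    """Fetch a URL and return (status code, body text)."""
    with urlopen(url, timeout=timeout) as response:
        return response.status, response.read().decode("utf-8", errors="replace")


def content_check(url, expected_text, fetcher=fetch):
    """Return True only if the server answers 200 AND the body looks right.

    `fetcher` is injectable so the check can be exercised without a network.
    """
    try:
        status, body = fetcher(url)
    except (URLError, OSError):
        # Refused connection, timeout, DNS failure, HTTP error status, etc.
        return False
    return status == 200 and expected_text in body


# Example with stand-in fetchers instead of a real server (hypothetical URL).
healthy = content_check(
    "http://app1.example/health", "status: OK",
    fetcher=lambda url: (200, "status: OK"),
)
wrong_content = content_check(
    "http://app1.example/health", "status: OK",
    fetcher=lambda url: (200, "Database unavailable"),
)
```

Here `healthy` is True and `wrong_content` is False, although both stand-in servers answered with a 200. A ping or bare TCP check would have passed both.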

For applications that require the user to interact with the same server during a session, clustering uses server affinity to get the user there. This is most common during the execution of a process like order entry, where the session is used between pages (requests) to store data needed to close a transaction, like a shopping cart.

For the same situation, ADCs use persistence. Clustering solutions are usually somewhat limited in the variables they can key on, while ADCs can use not only traditional application variables but also other information drawn from the application or from network-based data.
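Whether it is called server affinity or persistence, the core idea is a lookup table from some key to a server. The Python sketch below is a simplified illustration under assumed names (the class `PersistenceTable`, the servers app1 and app2, and the session keys are all hypothetical). The first request for a key is assigned by round robin, and every later request with the same key goes back to the same server. The key could be a session cookie, a source IP address or any other value the device is able to read.

```python
from itertools import cycle


class PersistenceTable:
    """Route each key to one server and keep it there."""

    def __init__(self, servers):
        self._next_server = cycle(servers)  # round robin for new keys
        self._assigned = {}                 # key -> server already chosen

    def route(self, key):
        """Return the server for this key, assigning one on first sight."""
        if key not in self._assigned:
            self._assigned[key] = next(self._next_server)
        return self._assigned[key]


table = PersistenceTable(["app1", "app2"])
cart_a_first = table.route("session-A")   # new key, gets the first server
cart_b_first = table.route("session-B")   # new key, gets the second server
cart_a_again = table.route("session-A")   # known key, sticks to its server
```

A real implementation would also expire entries and handle the case where the assigned server fails its health check; both are left out here to keep the sketch short.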

More than a few clustering solutions need node agents deployed on each application server instance that is "clustered" by a controller. Deploying and managing the agent may not be a burden, since it is often already in place, but it still means more processes running on the servers and consuming memory and CPU resources. It also adds another possible point of failure to the data path. Since ADCs need no server-side components, they remain completely transparent.

Making the choice

Some would ask: why do the extra work of building a distributed software system and a clustered server setup when you can have multiple servers fulfilling specific roles (separate database servers, web servers, mail servers and so on) whenever necessary?

So, how do you choose? That depends on the reasons you are considering this kind of solution in the first place, and (perhaps) whether or not you have to make an additional purchase to achieve clustering capabilities for the particular application server you have. There is also the broader question of whether or not you need (or want) to provide support for multiple application server brands. Clustering, of course, is proprietary to the application server, but ADCs can provide services for any and all applications or web servers.

Clustering checklist

Pros:

- Typically available with application server's enterprise package
- Doesn't require the highest level of networking know-how
- Usually less costly than redundant ADC deployments

Cons:

- High availability not assured with clustering solutions
- Best practice is to deploy the cluster controller on separate hardware
- Node agents required on managed app server instances
- Clustering is "proprietary" (you can cluster only homogeneous servers)

ADC checklist

Pros:

- Provides high availability and load balancing in heterogeneous environments
- Added value of application optimization, security and acceleration
- No changes required to applications or servers where they're deployed

Cons:

- An additional piece of infrastructure in the architecture
- Generally more costly than clustering solutions
- Could require new skill set to deploy/manage

Recommendation

Get more insight into performance, configurations and case studies by reading some testing-based articles on ADCs, and testing-based reviews of server clustering. Look for case studies that mirror your own situation, as closely as possible, and talk to people who are doing what you are planning (or thinking about). Unlike government going into the car business or taking over health care, do not do something quickly just to be seen doing something. Take care with this decision.
