  PHONAX.com  
  Clustering
Clustering in the computer business means that you use more than one system for a specific task.
There are two mainstream kinds of cluster in the computer business:
- Redundancy clusters (to provide failover, either cold/standby failover or hard real-time failover)
- Performance clusters for high-performance calculations.

Here is a small summary of Redundancy clusters

Hot Standby Cluster (mainly to handle hardware failures)
So-called redundancy clusters can also provide loadsharing to increase performance, but that is not always the case. In a so-called Hot Standby cluster, one machine does all the work while a second machine monitors whether the first is still working. This monitoring can be done in hardware or in software. When the server fails, the standby/monitor server will try to shut down the other host, mount their shared storage, and start the services that the old server used to run. The standby server takes over the old server's IP address and sometimes even its MAC address. Within minutes of a failure a new server is running, so downtime is kept to a minimum. A disadvantage of this approach is that the services themselves are not always tested: a service could die and the standby server wouldn't notice, because it only listens for RS-232 and UDP heartbeats. Nothing is wrong with the NIC, the IP stack, or the RS-232 interface, so a failing service often goes unnoticed. VINCA is such a standby cluster. To me these aren't true clusters, since only a single node is active at a time.
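The gap described above (heartbeats still arrive while the service itself is dead) can be narrowed by probing the service directly. The following is only a minimal sketch, assuming the service listens on a TCP port; the function names `service_is_alive` and `should_take_over` and the threshold of three failed checks are hypothetical and are not taken from VINCA or any other product.

```python
import socket


def service_is_alive(host, port, timeout=2.0):
    """Return True if a TCP connection to the service port succeeds.

    This probes the service itself, not just the NIC/IP stack, so a
    dead daemon on a healthy host is detected.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def should_take_over(check_history, max_failures=3):
    """Decide on failover from a list of past check results (oldest first).

    Hypothetical policy: take over only after `max_failures` consecutive
    failed checks, so a single lost probe does not trigger a failover.
    """
    recent = check_history[-max_failures:]
    return len(recent) == max_failures and not any(recent)
```

A standby node could call `service_is_alive` on a schedule, append each result to a list, and start the takeover procedure once `should_take_over` returns True.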

Loadsharing/Loadbalancing Redundancy/Failover Clusters
First of all, let me explain the difference between loadsharing and loadbalancing. Loadsharing just divides tasks among servers in sequence. It has no knowledge of the resources available on a node. DNS Round Robin is a loadsharing principle. Loadsharing works great when all systems handle the same kinds of tasks and the machines have a similar performance index.

Loadbalancing, on the other hand, polls system and resource information on a scheduled basis and submits a job to the least loaded server in the cluster. This spreads the load better than loadsharing does, but it has a bit more overhead and is more complex.
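The difference between the two can be sketched in a few lines. This is an illustration only: the function names, the per-task "cost" figure, and the dictionary of server loads are assumptions made for the example, and a real loadbalancer would poll the load from the nodes instead of tracking it itself.

```python
from itertools import cycle


def loadsharing(tasks, servers):
    """Round robin: hand tasks to servers in sequence, ignoring their load."""
    rotation = cycle(servers)
    return [(task, next(rotation)) for task in tasks]


def loadbalancing(tasks, current_load):
    """Send each task to the least loaded server.

    tasks: list of (name, cost) pairs; cost is a hypothetical load figure.
    current_load: dict mapping server name to its load before scheduling.
    """
    load = dict(current_load)  # work on a copy; a real system would poll this
    assignment = []
    for name, cost in tasks:
        target = min(load, key=load.get)  # least loaded server right now
        load[target] += cost
        assignment.append((name, target))
    return assignment
```

With one heavy task and two light ones, `loadsharing` alternates between the servers regardless of cost, while `loadbalancing` puts both light tasks on the server that did not receive the heavy one.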

A loadsharing/loadbalancing cluster is designed to increase performance and also provide you with redundancy. PolyServe's Understudy and my Loadsharing-Failover are loadbalancing and loadsharing clusters. The Tru64 cluster from Compaq/Digital is also a loadbalancing cluster, one that provides very high-level loadbalancing and redundancy. The hardest part here is a good CFS (Clustering File System); the CFS might be a shared disk array or a networked synchronisation of the files. The problem lies in the fact that every server in the cluster will change files. These files should be locked so that their consistency is guaranteed, and in the case of an NCFS (a networked CFS) all changes should be sent to the client nodes. Digital's CFS is very robust; Veritas also provides a clustered filesystem (XCFS) that runs on Solaris.

Parallel Virtual Machines
PVMs, in short, are clusters that need specially written applications to make use of the resources. A PVM cluster is a high-performance calculation cluster. The idea of a PVM is almost the same as that of SMP applications: an application needs to have sufficient parallelism to gain performance. For instance, the calculation of a fractal that uses millions of iterations of a single formula is ideal for PVMs. A master server sends several fragments to its cluster nodes to be calculated, receives the results, and builds a complete table/picture from these values. MPEG encoding is also highly mathematical and can also be done using PVMs.
The disadvantage is that you need to write or rewrite an application to use these features. Beowulf is a very good PVM. PVMs do not provide redundancy. A PVM could be extended to provide redundancy, but this kind of cluster wasn't made for redundancy; it is made for high-level I/O and computation power.
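The master/fragment pattern described above can be sketched as follows. This is not PVM code: worker threads stand in for cluster nodes purely to show how the master splits the picture into fragments and reassembles the results, and all names and the Mandelbrot-style formula are assumptions chosen for the example. In CPython the threads will not actually speed up the calculation.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def escape_count(c, max_iter=50):
    """Iterate z = z*z + c and count steps until |z| exceeds 2."""
    z = 0j
    for i in range(max_iter):
        z = z * z + c
        if abs(z) > 2.0:
            return i
    return max_iter


def compute_fragment(rows, width, height):
    """Work done by one node: calculate the given rows of the picture."""
    fragment = []
    for y in rows:
        im = -1.0 + 2.0 * y / (height - 1)
        fragment.append([
            escape_count(complex(-2.0 + 3.0 * x / (width - 1), im))
            for x in range(width)
        ])
    return fragment


def render(width=30, height=12, nodes=4):
    """Master: split the rows into fragments, farm them out, merge results."""
    chunk = math.ceil(height / nodes)
    fragments = [range(s, min(s + chunk, height)) for s in range(0, height, chunk)]
    with ThreadPoolExecutor(max_workers=nodes) as pool:
        # map() returns results in fragment order, so rows stay in sequence
        results = pool.map(lambda rows: compute_fragment(rows, width, height), fragments)
        return [row for fragment in results for row in fragment]
```

The point of the sketch is the structure: each fragment is independent of the others, which is exactly the parallelism an application must have before a PVM can help.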

Dedicated Software Clusters
I thought of this term myself; I don't know a common term for this yet. I can give an example: think about SETI. When you run SETI, you are part of a gigantic cluster. But the only task it can do is calculate frames of RF data. This is pretty simple; it is similar to both a parallel and a fork and forget cluster. A frame is sent to a node and that node calculates the complete frame. The same goes for 3D Studio MAX: a complete picture is sent to a node, and that node calculates the whole picture without the use of other cluster members. This is unlike POV-Ray with PVM, where parts of a frame/picture are rendered by several nodes. These clusters are very simple because the tasks are very consistent and not dynamic; therefore a client/server-oriented distribution system is very easy to build.
Cracking a password file by brute force can also be done in this manner. Send each of x hosts one x-th of a password dictionary, let them hash those passwords, and let them check the hashes against the provided password file. The more hosts you have, the smaller each list of words is and the faster you have the solution.
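The dictionary split described above might look roughly like this. It is a sketch under simplifying assumptions: the "password file" is reduced to a set of unsalted SHA-256 hex digests (real password files normally use salted, deliberately slow hashes), and the function names are hypothetical.

```python
import hashlib


def split_wordlist(words, hosts):
    """Give each of `hosts` nodes every hosts-th word of the dictionary."""
    return [words[i::hosts] for i in range(hosts)]


def search_part(part, target_hashes):
    """Work done by one node: hash its words and report any matches."""
    found = {}
    for word in part:
        digest = hashlib.sha256(word.encode("utf-8")).hexdigest()
        if digest in target_hashes:
            found[digest] = word
    return found
```

Each node only needs its own part and the set of target hashes, so no communication between nodes is required; the master simply merges the dictionaries that come back.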

Fork and forget Clusters
A fork and forget cluster is both a performance cluster and very often also a redundancy cluster. TruCluster and VMS-cluster are real examples of this. Mosix is slowly going to provide redundancy as well. The fork and forget principle is relatively easy: you start a process and the process is sent to the least loaded server. You don't need to rewrite applications to run on a fork and forget cluster. Logically, the performance gain is limited by the fastest server in your cluster, provided that you have a single-process program. The smart thing is that Mosix and TruCluster will migrate processes to the fastest server automatically. When a process needs more memory than CPU, a process snapshot is made and sent to the server with more memory. Process migration can also be used to provide redundancy: when a server fails, the last snapshot made can be started on another node and continue where it left off. You also need a CFS to get redundancy.
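The snapshot-and-continue idea can be illustrated at application level. Mosix and TruCluster do this inside the operating system for whole processes; the sketch below only mimics the principle with a toy task whose state is a small dictionary, and every name in it is an assumption made for the example.

```python
import pickle


def run_steps(state, steps):
    """Toy workload: keep adding consecutive integers to a running total."""
    for _ in range(steps):
        state["total"] += state["next"]
        state["next"] += 1
    return state


def snapshot(state):
    """Serialize the task state so it can be shipped to another node."""
    return pickle.dumps(state)


def resume(blob):
    """Rebuild the task state on the receiving node."""
    return pickle.loads(blob)
```

If the first node runs five steps, takes a snapshot and then fails, a second node can call `resume` on the snapshot and run the remaining steps; the final total is the same as for an uninterrupted run. Any work done after the last snapshot is lost, which is why the snapshot interval matters.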

As you can see, the different clusters tend to overlap one another here and there. The choice of a cluster depends on whether you have code that either is parallel or can be made parallel, whether you want redundancy, and how fast a process/host needs to switch.

I can guarantee that the most complete cluster nowadays is TruCluster from Compaq/Digital. It is loadbalancing, it can migrate processes between nodes, it can check the health of its services, and it provides a very good Clustering File System. When you also need PVM for your heavy calculations, you can just run a PVM on top of the TruCluster. This way you have a full-featured cluster, providing high computation performance, redundancy and loadsharing.

Latency is your enemy in network based clusters
It is a common fact that applications on an SMP system might run slower than on a single-CPU machine, due to the latency of communicating over the CPU bus. Both CPUs need to be interrupted and "stopped" to exchange shared data. This procedure can slow down a process instead of speeding it up.
When you are talking about network clusters, your latency is even bigger, simply because your CPU bus has a bigger bandwidth than Ethernet and especially a WAN. When you can calculate a fragment in 10 usec and it takes 100 msec to send it to the master server, you start slowing things down. You need a big enough grain to get a performance gain. The rendering of a picture, for instance, takes at least 9 seconds. Sending and receiving that frame takes perhaps 1 second on a LAN, so it is worth it. But when you cluster over the Internet, it may well take 20 seconds to pump the data over, which makes it faster to render all frames locally.
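The rendering example can be turned into a rough estimate. The model below is an assumption made for illustration, not a measurement: all frame transfers pass through the master's single network link one after another, the nodes render in parallel, and whichever of the two takes longer dominates. The function names and the figure of 100 frames on 10 nodes are hypothetical; the 9, 1 and 20 second values come from the text above.

```python
import math


def local_time(frames, render_s):
    """Seconds to render every frame on one machine."""
    return frames * render_s


def cluster_time(frames, nodes, render_s, transfer_s):
    """Rough estimate of the seconds needed when rendering on a cluster.

    Assumption: transfers are serialized on the master's link, nodes
    render in parallel, and the slower of the two sets the total time.
    """
    link_time = frames * transfer_s
    render_time = math.ceil(frames / nodes) * render_s
    return max(link_time, render_time)
```

For 100 frames on 10 nodes, `local_time(100, 9)` gives 900 seconds, `cluster_time(100, 10, 9, 1)` gives 100 seconds on the LAN, and `cluster_time(100, 10, 9, 20)` gives 2000 seconds over the Internet. Under this model the LAN cluster wins clearly, while the Internet cluster is slower than rendering everything locally.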

 
Questions

If any of you readers want to contribute, please do so. Comments (errors in my suggestions, dead links, etc.) or questions? Mail me.
If you want to ask me a Unix-related question, please give sufficient details about the problem, including the OS.
The email address is:

questions@phonax.com
 

  Design: Rienk
|[dreamloop]|
© 2000, Raymond Doetjes