An example of a computer cluster

A computer cluster is a group of coupled computers that work together closely so that in many respects they can be viewed as though they are a single computer. The components of a cluster are commonly, but not always, connected to each other through fast local area networks. Clusters are usually deployed to improve performance and/or availability over that provided by a single computer, while typically being much more cost-effective than single computers of comparable speed or availability. [1]

Cluster categorizations

High-availability (HA) clusters

High-availability clusters (also known as HA clusters or failover clusters) are implemented primarily for the purpose of improving the availability of the services which the cluster provides. They operate by having redundant nodes, which are then used to provide service when system components fail. The most common size for an HA cluster is two nodes, which is the minimum requirement to provide redundancy. HA cluster implementations attempt to manage the redundancy inherent in a cluster to eliminate single points of failure.
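The failover behaviour described above can be illustrated with a minimal sketch. It assumes a two-node cluster in which the first healthy node in a fixed order serves requests; the `Node` class, the node names and the `pick_active` helper are hypothetical and do not correspond to any particular HA product.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """A hypothetical cluster node with a simple health flag."""
    name: str
    healthy: bool = True


def pick_active(nodes):
    """Return the first healthy node, or None if every node has failed."""
    for node in nodes:
        if node.healthy:
            return node
    return None


# Two nodes: the minimum needed for redundancy.
primary = Node("node-a")
standby = Node("node-b")
cluster = [primary, standby]

before = pick_active(cluster).name  # the primary serves while it is healthy
primary.healthy = False             # simulate a component failure
after = pick_active(cluster).name   # the standby takes over the service
```

A real implementation would also have to detect failures (for example with heartbeats) and prevent both nodes from serving at once; the sketch leaves those concerns out.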

There are many commercial implementations of high-availability clusters for many operating systems. The Linux-HA project is one commonly used free software HA package for the Linux operating system.

Load-balancing clusters

Load-balancing clusters operate by distributing a workload evenly over multiple back-end nodes. Typically the cluster will be configured with multiple redundant load-balancing front ends.
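One simple way to spread work evenly is round-robin assignment, sketched below. Round-robin is only one of several possible policies and is used here as an assumption; the back-end names and the `round_robin` function are hypothetical.

```python
from collections import Counter
from itertools import cycle


def round_robin(requests, backends):
    """Assign each request to the next back end in a fixed rotation."""
    rotation = cycle(backends)
    return [(request, next(rotation)) for request in requests]


# Six requests spread over three hypothetical back-end nodes.
assignments = round_robin(range(6), ["backend-1", "backend-2", "backend-3"])

# Count how many requests each back end received.
load = Counter(backend for _, backend in assignments)
```

Round-robin ignores how busy each node actually is, so production load balancers often weight the rotation or track open connections instead.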

Grid computing

Grid computing or grid clusters are a technology closely related to cluster computing. The key differences (by definitions which distinguish the two at all) between grids and traditional clusters are that grids connect collections of computers which do not fully trust each other, or which are geographically dispersed. (The term GRID to denote a distributed computing and storage environment was coined in 1998 by Ian Foster and Carl Kesselman. It refers to the metaphor of the power grid: computing capacity from a wall outlet and no need to install and maintain complex IT infrastructures on every location that needs access to applications.) [2] Grids are thus more like a computing utility than like a single computer. In addition, grids typically support more heterogeneous collections than are commonly supported in clusters.

Grid computing is optimized for workloads which consist of many independent jobs or packets of work, which do not have to share data between the jobs during the computation process. Grids serve to manage the allocation of jobs to computers which will perform the work independently of the rest of the grid cluster. Resources such as storage may be shared by all the nodes, but intermediate results of one job do not affect other jobs in progress on other nodes of the grid.
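The independence of grid jobs can be sketched with a local worker pool standing in for grid nodes. This is an illustration under stated assumptions only: `run_job` is a hypothetical work unit, and a thread pool on one machine is used in place of a real grid scheduler.

```python
from concurrent.futures import ThreadPoolExecutor


def run_job(job_id):
    """A hypothetical work unit that depends only on its own input."""
    return sum(i * i for i in range(job_id * 1000))


def run_grid(job_ids, workers=4):
    """Hand each job to a worker; no job reads another job's results."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in the order the jobs were submitted.
        return list(pool.map(run_job, job_ids))


results = run_grid([1, 2, 3])
```

Because no job needs intermediate data from another, the jobs can finish in any order and on any worker without changing the results.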

An example of a very large grid is the Folding@home project. It is analyzing data that is used by researchers to find cures for diseases such as Alzheimer's and cancer. Another large project is the SETI@home project, which may be the largest distributed grid in existence. It uses approximately three million home computers all over the world to analyze data from the Arecibo Observatory radio telescope, searching for evidence of extraterrestrial intelligence.

Implementations

The TOP500 organization's semiannual list of the 500 fastest computers usually includes many clusters. TOP500 is a collaboration between the University of Mannheim, the University of Tennessee, and the National Energy Research Scientific Computing Center at Lawrence Berkeley National Laboratory. As of November 2007, the top supercomputer is the Department of Energy's IBM BlueGene/L system, with performance of 478.2 TFlops measured with the High-Performance LINPACK benchmark.

Clustering can provide significant performance benefits versus price. The System X supercomputer at Virginia Tech, the 28th most powerful supercomputer on Earth as of June 2006 [1], is a 12.25 TFlops computer cluster of 1100 Apple XServe G5 2.3 GHz dual-processor machines (4 GB RAM, 80 GB SATA HD) running Mac OS X and using InfiniBand interconnect. The cluster initially consisted of Power Mac G5s; the rack-mountable XServes are denser than desktop Macs, reducing the aggregate size of the cluster. The total cost of the previous Power Mac system was $5.2 million, a tenth of the cost of slower mainframe supercomputers. (The Power Mac G5s were sold off.)

The central concept of a Beowulf cluster is the use of commercial off-the-shelf (COTS) computers to produce a cost-effective alternative to a traditional supercomputer. One project that took this to an extreme was the Stone Soupercomputer.

However, it is worth noting that FLOPS (floating-point operations per second) is not always the best metric for supercomputer speed. Clusters can have very high FLOPS, but no single node can access all of the cluster's data at once. Clusters are therefore excellent for parallel computation, but much poorer than traditional supercomputers at non-parallel computation.

JavaSpaces is a specification from Sun Microsystems that enables clustering computers via a distributed shared memory.

History

The history of cluster computing is best captured by a footnote in Greg Pfister's In Search of Clusters: "Virtually every press release from DEC mentioning clusters says 'DEC, who invented clusters...'. IBM did not invent them either. Customers invented clusters, as soon as they could not fit all their work on one computer, or needed a backup. The date of the first is unknown, but it would be surprising if it was not in the 1960s, or even late 1950s."

The formal engineering basis of cluster computing as a means of doing parallel work of any sort was arguably invented by Gene Amdahl of IBM, who in 1967 published what has come to be regarded as the seminal paper on parallel processing: Amdahl's Law. Amdahl's Law describes mathematically the speedup one can expect from parallelizing any given otherwise serially performed task on a parallel architecture. This article defined the engineering basis for both multiprocessor computing and cluster computing, where the primary differentiator is whether or not the interprocessor communications are supported "inside" the computer (on, for example, a customized internal communications bus or network) or "outside" the computer on a commodity network.
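As a brief illustration of Amdahl's Law in its usual form: if a fraction p of a task can be parallelized and n processors are available, the expected speedup is 1 / ((1 - p) + p / n). The sketch below computes this; the function and parameter names are hypothetical and chosen only for this example.

```python
def amdahl_speedup(parallel_fraction, processors):
    """Speedup predicted by Amdahl's Law: 1 / ((1 - p) + p / n)."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if processors < 1:
        raise ValueError("processors must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / processors)


# Speedup for a task that is 95% parallelizable, at several cluster sizes.
speedups = {n: amdahl_speedup(0.95, n) for n in (1, 10, 100, 1000)}
```

The serial fraction bounds the result: with 95% of the work parallelizable, the speedup can never exceed 1 / 0.05 = 20, however many nodes the cluster has.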

Consequently, the history of early computer clusters is more or less directly tied to the history of early networks, as one of the primary motivations for the development of a network was to link computing resources, creating a de facto computer cluster. Packet switching networks were conceptually invented by the RAND Corporation in 1962. Using the concept of a packet-switched network, the ARPANET project succeeded in creating in 1969 what was arguably the world's first commodity-network-based computer cluster by linking four different computer centers (each of which was something of a "cluster" in its own right, but probably not a commodity cluster). The ARPANET project grew into the Internet, which can be thought of as "the mother of all computer clusters" (as the union of nearly all of the compute resources, including clusters, that happen to be connected). It also established the paradigm in use by all computer clusters in the world today: the use of packet-switched networks to perform interprocessor communications between processor (sets) located in otherwise disconnected frames.

The development of customer-built and research clusters proceeded hand in hand with that of both networks and the Unix operating system from the early 1970s, as both TCP/IP and the Xerox PARC project created and formalized protocols for network-based communications. The Hydra operating system was built for a cluster of DEC PDP-11 minicomputers called C.mmp at C-MU in 1971. However, it was not until circa 1983 that the protocols and tools for easily doing remote job distribution and file sharing were defined (largely within the context of BSD Unix, as implemented by Sun Microsystems) and hence became generally available commercially, along with a shared filesystem.

The first commercial clustering product was ARCnet, developed by Datapoint in 1977. ARCnet was not a commercial success and clustering per se did not really take off until DEC released their VAXcluster product in 1984 for the VAX/VMS operating system. The ARCnet and VAXcluster products not only supported parallel computing, but also shared file systems and peripheral devices. The idea was to provide the advantages of parallel processing, while maintaining data reliability and uniqueness. VAXcluster, now VMScluster, is still available on OpenVMS systems from HP running on Alpha and Itanium systems.

Two other noteworthy early commercial clusters were the Tandem Himalaya (a circa 1994 high-availability product) and the IBM S/390 Parallel Sysplex (also circa 1994, primarily for business use).

No history of commodity computer clusters would be complete without noting the pivotal role played by the development of Parallel Virtual Machine (PVM) software in 1989. This open source software based on TCP/IP communications enabled the instant creation of a virtual supercomputer, a high performance compute cluster, made out of any TCP/IP connected systems. Free-form heterogeneous clusters built on top of this model rapidly achieved total throughput in FLOPS that greatly exceeded that available even with the most expensive "big iron" supercomputers. PVM and the advent of inexpensive networked PCs led, in 1993, to a NASA project to build supercomputers out of commodity clusters. In 1995 came the invention of the "Beowulf"-style cluster, a compute cluster built on top of a commodity network for the specific purpose of "being a supercomputer" capable of performing tightly coupled parallel HPC computations. This in turn spurred the independent development of Grid computing as a named entity, although Grid-style clustering had been around at least as long as the Unix operating system and the ARPANET, whether or not it, or the clusters that used it, were named.

Technologies

MPI is a widely available communications library that enables parallel programs to be written in C, Fortran, Python, OCaml, and many other programming languages.

The GNU/Linux world supports various cluster software; for application clustering, there are Beowulf, distcc, and MPICH. Linux Virtual Server and Linux-HA are director-based clusters that allow incoming requests for services to be distributed across multiple cluster nodes. MOSIX, openMosix, Kerrighed, and OpenSSI are full-blown clusters integrated into the kernel that provide for automatic process migration among homogeneous nodes. OpenSSI, openMosix and Kerrighed are single-system image implementations.

Microsoft Windows Compute Cluster Server 2003, based on the Windows Server platform, provides pieces for high-performance computing such as the Job Scheduler, the MSMPI library and management tools. NCSA's recently installed Lincoln is a cluster of 450 Dell PowerEdge 1855 blade servers running Windows Compute Cluster Server 2003. This cluster debuted at #130 on the Top500 list in June 2006.

gridMathematica provides distributed computations over clusters, including data analysis, computer algebra and 3D visualization. It can make use of other technologies such as Altair PBS Professional, Microsoft Windows Compute Cluster Server, Platform LSF and Sun Grid Engine. [3]

gLite is a set of middleware technologies created by the Enabling Grids for E-sciencE (EGEE) project.

References

  1. ^ Bader, David; Robert Pennington (June 1996). Cluster Computing: Applications. Georgia Tech College of Computing. Retrieved on 2007-07-13.
  2. ^ http://www.gridipedia.eu/aboutgrid.html
  3. ^ gridMathematica Cluster Integration.

© 2009 citizendia.org; parts available under the terms of GNU Free Documentation License, from http://en.wikipedia.org