Citizendia

Distributed computing deals with hardware and software systems containing more than one processing element or storage element, concurrent processes, or multiple programs, running under a loosely or tightly controlled regime.

In distributed computing, a program is split up into parts that run simultaneously on multiple computers communicating over a network. Distributed computing is a form of parallel computing, but parallel computing is most commonly used to describe program parts running simultaneously on multiple processors in the same computer. Both types of processing require dividing a program into parts that can run simultaneously, but distributed programs often must deal with heterogeneous environments, network links of varying latencies, and unpredictable failures in the network or the computers.

Organization

Organizing the interaction between the computers that execute distributed computations is of prime importance. In order to be able to use the widest possible variety of computers, the protocol or communication channel should not contain or use any information that may not be understood by certain machines. Special care must also be taken that messages are indeed delivered correctly and that invalid messages, which would otherwise bring down the system and perhaps the rest of the network, are rejected.
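As a rough illustration of rejecting invalid messages, the sketch below assumes a hypothetical wire format (UTF-8 JSON with three required fields). The field names and the `parse_message` helper are made up for this example and are not part of any particular protocol; a text-based encoding is chosen only because it avoids machine-specific details such as byte order.

```python
import json

# Hypothetical wire format (an assumption for this sketch):
# a UTF-8 JSON object with exactly these required fields.
REQUIRED_FIELDS = {"sender": str, "seq": int, "payload": str}


def parse_message(raw: bytes):
    """Return the decoded message dict, or None if the message is invalid."""
    try:
        msg = json.loads(raw.decode("utf-8"))
    except (UnicodeDecodeError, json.JSONDecodeError):
        return None  # not valid UTF-8 or not valid JSON
    if not isinstance(msg, dict):
        return None  # valid JSON, but not an object
    for name, expected_type in REQUIRED_FIELDS.items():
        value = msg.get(name)
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected_type):
            return None  # missing field or wrong type
    return msg
```

A receiver built this way drops anything it cannot interpret instead of passing malformed data further into the system.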

Another important factor is the ability to send software to another computer in a portable way so that it may execute and interact with the existing network. This may not always be practical when using differing hardware and resources, in which case other methods, such as cross-compiling or manually porting this software, must be used.

Goals and advantages

There are many different types of distributed computing systems and many challenges to overcome in successfully designing one. The main goal of a distributed computing system is to connect users and resources in a transparent, open, and scalable way. Ideally this arrangement is drastically more fault tolerant and more powerful than many combinations of stand-alone computer systems.

Openness

Openness is the property of distributed systems such that each subsystem is continually open to interaction with other systems (see references). Web Services protocols are standards which enable distributed systems to be extended and scaled. In general, an open system that scales has an advantage over a perfectly closed and self-contained system.

Consequently, open distributed systems are required to meet the following challenges:

Monotonicity
Once something is published in an open system, it cannot be taken back.
Pluralism
Different subsystems of an open distributed system include heterogeneous, overlapping and possibly conflicting information. There is no central arbiter of truth in open distributed systems.
Unbounded nondeterminism
Asynchronously, different subsystems can come up and go down and communication links can come in and go out between subsystems of an open distributed system. Therefore the time that it will take to complete an operation cannot be bounded in advance (see unbounded nondeterminism).
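Because completion time cannot be bounded in advance, callers commonly impose their own deadline and treat a missed deadline as a possible failure. The following is a minimal sketch of that idea using a local thread as a stand-in for a remote subsystem; `slow_remote_call` and `call_with_deadline` are hypothetical names introduced only for illustration.

```python
import concurrent.futures
import time


def slow_remote_call(delay_s):
    """Stand-in for a remote operation whose duration the caller cannot know."""
    time.sleep(delay_s)
    return "done"


def call_with_deadline(operation, timeout_s):
    """Run operation in a worker thread; stop waiting after timeout_s seconds.

    Returns ("ok", result) or ("timeout", None). A timeout only means the
    caller stopped waiting; the operation itself may still complete later.
    """
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = pool.submit(operation)
    try:
        return ("ok", future.result(timeout=timeout_s))
    except concurrent.futures.TimeoutError:
        return ("timeout", None)
    finally:
        # Do not block here on an operation that is still running.
        pool.shutdown(wait=False)
```

For example, `call_with_deadline(lambda: slow_remote_call(0.5), 0.05)` gives up after 50 milliseconds even though the call would eventually have succeeded.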

Drawbacks and disadvantages

See also: Fallacies of Distributed Computing

Technical issues

If not planned properly, a distributed system can decrease the overall reliability of computations if the unavailability of a node can cause disruption of the other nodes. Leslie Lamport famously quipped that: "A distributed system is one in which the failure of a computer you didn't even know existed can render your own computer unusable."[1]

Troubleshooting and diagnosing problems in a distributed system can also become more difficult, because the analysis may require connecting to remote nodes or inspecting communication between nodes.

Many types of computation are not well suited for distributed environments, typically owing to the amount of network communication or synchronization that would be required between nodes. If bandwidth, latency, or communication requirements are too significant, then the benefits of distributed computing may be negated and the performance may be worse than a non-distributed environment.
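The trade-off can be made concrete with a deliberately simple cost model. The model below is an assumption made for illustration, not a measured result: compute time divides evenly across nodes, while communication overhead grows linearly with the number of nodes. The function names are hypothetical.

```python
def distributed_time(compute_s, nodes, comm_s_per_node):
    """Estimated wall-clock seconds under the toy model described above."""
    return compute_s / nodes + comm_s_per_node * nodes


def speedup(compute_s, nodes, comm_s_per_node):
    """Ratio of single-machine time to estimated distributed time."""
    return compute_s / distributed_time(compute_s, nodes, comm_s_per_node)


# A 100-second job on 10 nodes with 1 s of overhead per node:
# 100/10 + 1*10 = 20 s, a speedup of 5.
print(speedup(100, 10, 1))

# The same job on 4 nodes with 30 s of overhead per node:
# 100/4 + 30*4 = 145 s, slower than not distributing at all.
print(speedup(100, 4, 30))
```

Under this model a speedup below 1 corresponds to the case where communication costs negate the benefit of distributing the work.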

Architecture

Various hardware and software architectures are used for distributed computing. At a lower level, it is necessary to interconnect multiple CPUs with some sort of network, regardless of whether that network is printed onto a circuit board or made up of loosely-coupled devices and cables. At a higher level, it is necessary to interconnect processes running on those CPUs with some sort of communication system.

Distributed programming typically falls into one of several basic architectures or categories: client-server, 3-tier architecture, N-tier architecture, distributed objects, loose coupling, or tight coupling.

Another basic aspect of distributed computing architecture is the method of communicating and coordinating work among concurrent processes. Through various message passing protocols, processes may communicate directly with one another, typically in a master/slave relationship. Alternatively, a "database-centric" architecture can enable distributed computing to be done without any form of direct inter-process communication, by utilizing a shared database.[2]
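The message passing arrangement can be sketched in miniature as follows. This is only an illustration under simplifying assumptions: threads and in-memory queues stand in for processes on separate machines and for the network, and the function names and the squaring task are hypothetical.

```python
import queue
import threading


def worker(tasks, results):
    """Receive task messages until a None sentinel arrives; reply with results."""
    while True:
        message = tasks.get()
        if message is None:
            break  # the master has no more work for this worker
        task_id, number = message
        results.put((task_id, number * number))


def run_master(numbers, worker_count=3):
    """Hand out one task message per number and collect the replies in order."""
    tasks, results = queue.Queue(), queue.Queue()
    threads = [
        threading.Thread(target=worker, args=(tasks, results))
        for _ in range(worker_count)
    ]
    for thread in threads:
        thread.start()
    for task_id, number in enumerate(numbers):
        tasks.put((task_id, number))
    for _ in threads:
        tasks.put(None)  # one stop message per worker
    for thread in threads:
        thread.join()
    # All workers have finished, so every reply is already in the queue.
    collected = {}
    while not results.empty():
        task_id, value = results.get()
        collected[task_id] = value
    return [collected[i] for i in range(len(numbers))]
```

The workers never share state directly; everything they know arrives as a message, which is the property that lets the same structure be spread across machines.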

Concurrency

Distributed computing implements a kind of concurrency. It is so closely related to concurrent programming that the two are sometimes not taught as distinct subjects.[3]

Multiprocessor systems

A multiprocessor system is simply a computer that has more than one CPU on its motherboard. If the operating system is built to take advantage of this, it can run different processes (or different threads belonging to the same process) on different CPUs.

Multicore systems

Intel CPUs from the late Pentium 4 era (Northwood and Prescott cores) employed a technology called Hyper-Threading that allowed more than one thread (usually two) to run on the same CPU. The more recent Sun UltraSPARC T1, AMD Athlon 64 X2, AMD Athlon FX, AMD Opteron, Intel Pentium D, Intel Core, Intel Core 2 and Intel Xeon processors feature multiple processor cores to also increase the number of concurrent threads they can run.

Multicomputer systems

A multicomputer may be considered to be either a loosely coupled NUMA computer or a tightly coupled cluster. Multicomputers are commonly used when strong compute power is required in an environment with restricted physical space or electrical power.

Common suppliers include Mercury Computer Systems, CSPI, and SKY Computers.

Common uses include 3D medical imaging devices and mobile radar.

Computing taxonomies

The types of distributed systems are based on Flynn's taxonomy of systems: single instruction, single data (SISD); single instruction, multiple data (SIMD); multiple instruction, single data (MISD); and multiple instruction, multiple data (MIMD). Other taxonomies and architectures are available at Computer architecture and in Category:Computer architecture.

Computer clusters

Main article: Cluster computing

A cluster consists of multiple stand-alone machines acting in parallel across a local high speed network. Distributed computing differs from cluster computing in that computers in a distributed computing environment are typically not exclusively running "group" tasks, whereas clustered computers are usually much more tightly coupled. Distributed computing also often consists of machines which are widely separated geographically.

Grid computing

Main article: Grid computing

A grid uses the resources of many separate computers, loosely connected by a network (usually the Internet), to solve large-scale computation problems. Public grids may use idle time on many thousands of computers throughout the world. Such arrangements permit handling of data that would otherwise require the power of expensive supercomputers or would have been impossible to analyze.

Languages

Nearly any programming language that has access to the full hardware of the system could handle distributed programming given enough time and code. Remote procedure calls distribute operating system commands over a network connection. Systems like CORBA, Microsoft DCOM, Java RMI and others try to map object-oriented design to the network. Loosely coupled systems communicate through intermediate documents that are typically human readable (e.g. XML, HTML, SGML, X.500, and EDI).
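The loosely coupled style, in which systems exchange human-readable documents instead of calling each other directly, might look like the sketch below. It uses XML via Python's standard `xml.etree.ElementTree` module; the `order` document layout and the two helper functions are hypothetical and chosen only for illustration.

```python
import xml.etree.ElementTree as ET


def encode_order(order_id, item, quantity):
    """Build a small XML document describing an order (hypothetical schema)."""
    root = ET.Element("order", attrib={"id": str(order_id)})
    ET.SubElement(root, "item").text = item
    ET.SubElement(root, "quantity").text = str(quantity)
    return ET.tostring(root, encoding="unicode")


def decode_order(document):
    """Parse a document produced by encode_order back into a dict."""
    root = ET.fromstring(document)
    return {
        "id": int(root.get("id")),
        "item": root.findtext("item"),
        "quantity": int(root.findtext("quantity")),
    }
```

The sender and receiver need to agree only on the document layout, not on each other's programming language, object model, or hardware.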

Examples

Projects

A variety of distributed computing projects have grown up in recent years. Many are run on a volunteer basis, and involve users donating their unused computational power to work on interesting computational problems. Examples of such projects include the Stanford University Chemistry Department Folding@home project, which is focused on simulations of protein folding to find disease cures and to understand biophysical systems; World Community Grid, an effort to create the world's largest public computing grid to tackle scientific research projects that benefit humanity, run and funded by IBM; SETI@home, which is focused on analyzing radio-telescope data to find evidence of intelligent signals from space, hosted by the Space Sciences Laboratory at the University of California, Berkeley; and distributed.net, which is focused on breaking various cryptographic ciphers.[4]

Distributed computing projects also often involve competition with other distributed systems. This competition may be for prestige, or it may be a matter of enticing users to donate processing power to a specific project. For example, stat races are a measure of the work a distributed computing project has been able to compute over the past day or week. This has been found to be so important in practice that virtually all distributed computing projects offer online statistical analyses of their performances, updated at least daily if not in real-time.

References

  1. ^ Leslie Lamport. Subject: distribution (Email message sent to a DEC SRC bulletin board at 12:23:29 PDT on 28 May 87). Retrieved on 2007-04-28.
  2. ^ A database-centric virtual chemistry system, J Chem Inf Model. 2006 May-Jun;46(3):1034-9
  3. ^ CS236370 Concurrent and Distributed Programming 2002
  4. ^ David P. Anderson (2005-05-23). "A Million Years of Computing". Retrieved on 2006-08-11.

Dictionary

distributed computing

-noun

  1. the process of aggregating the power of several computers to collaboratively run a single computational task in a transparent and coherent way
  2. System of using the idle time of large numbers of networked computers to work on projects too large for any single group.
© 2009 citizendia.org; parts available under the terms of GNU Free Documentation License, from http://en.wikipedia.org