Downtime or outage refers to a period of time, or a percentage of a timespan, during which a system is unavailable or offline. This is usually a result of the system failing to function because of an unplanned event, or because of routine maintenance.
The term is commonly applied to networks and servers. The common reasons for unplanned outages are system failures (such as a crash) or communications failures (commonly known as a network outage).
The opposite of downtime is uptime.
Unplanned downtime may be the result of a software bug, human error, equipment failure, malfunction, high bit error rate, power failure, overload due to exceeding the channel capacity, a cascading failure, etc.
See also: Planned downtime
Outages caused by system failures can have a serious impact on the users of computer/network systems, in particular those industries that rely on a nearly 24-hour service, such as medical informatics, nuclear power and other infrastructure, banks and other financial institutions, aeronautics and airlines, news agencies, e-commerce and online transaction processing, and virtual worlds.
Users of an ISP and other customers of a telecommunication network can also be affected.
Corporations can lose business due to network outage or they may default on a contract, resulting in financial losses.
Those people or organizations that are affected by downtime can be more sensitive to particular aspects:
The most demanding users are those that require high availability.
| Business | Average cost per hour of network outage (US$) |
|---|---|
| Brokerage Operations | $5.45 million |
| Credit Card Authorisations | $2.5 million |
| ATM Fees | $14,000 |
| Tele Ticket Sales | $69,000 |
AT&T lost its frame relay network for 26 hours on April 13, 1998[1]. This affected many thousands of customers, and bank transactions were one casualty. AT&T failed to meet the service level agreement on its contracts with customers and had to refund[2] 6,600 customer accounts, costing millions of dollars.
In service level agreements, it is common to state a downtime percentage (per month or per year), calculated by dividing the sum of all downtime timespans by the total length of a reference timespan (e.g. a month). 0% downtime means that the server was available all the time.
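The calculation described above can be sketched in a few lines of Python. This is a minimal illustration, not a standard SLA formula: the function name, the 30-day reference month and the two outage intervals are hypothetical, and the outages are assumed not to overlap each other.

```python
from datetime import datetime, timedelta


def downtime_percentage(outages, period_start, period_end):
    """Return downtime as a percentage of the reference period.

    `outages` is a list of (start, end) datetime pairs that are assumed
    not to overlap. Parts of an outage outside the period are ignored.
    """
    total = (period_end - period_start).total_seconds()
    if total <= 0:
        raise ValueError("reference period must have positive length")
    down = 0.0
    for start, end in outages:
        # Clip each outage to the reference period before summing.
        clipped_start = max(start, period_start)
        clipped_end = min(end, period_end)
        if clipped_end > clipped_start:
            down += (clipped_end - clipped_start).total_seconds()
    return 100.0 * down / total


# Hypothetical 30-day month (720 hours) with 7.2 hours of outages.
month_start = datetime(2024, 4, 1)
month_end = month_start + timedelta(days=30)
outages = [
    (datetime(2024, 4, 3, 2, 0), datetime(2024, 4, 3, 5, 0)),       # 3 h
    (datetime(2024, 4, 20, 10, 0), datetime(2024, 4, 20, 14, 12)),  # 4.2 h
]
pct = downtime_percentage(outages, month_start, month_end)
print(f"{pct:.2f}% downtime")  # 7.2 h / 720 h -> 1.00% downtime
```

An agreement that counts planned maintenance separately would filter those intervals out before calling the function.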
For Internet servers, downtime above 1% per year can be regarded as unacceptable, as this means a downtime of more than 3 days per year. For e-commerce and other industrial use, any value above 0.1% is usually considered unacceptable.
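To see where the "more than 3 days" figure comes from, a downtime percentage can be converted into hours per year. The sketch below assumes a non-leap year of 365 days; the function and constant names are illustrative only.

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours; a non-leap year is assumed


def allowed_downtime_hours(downtime_percent, period_hours=HOURS_PER_YEAR):
    """Convert a downtime percentage into hours over the given period."""
    return period_hours * downtime_percent / 100.0


for pct in (1.0, 0.1):
    hours = allowed_downtime_hours(pct)
    print(f"{pct}% per year is about {hours:.2f} hours")
```

Under this assumption, 1% corresponds to 87.6 hours (3.65 days) per year, and 0.1% to 8.76 hours per year.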
It is the duty of the network designer to make sure that a network outage does not happen. When it does happen, a well-designed system will further reduce the effects of an outage by having localized outages which can be detected and fixed as soon as possible.
A process needs to be in place to detect a malfunction (network monitoring) and to restore the network to a working condition. This generally involves a help desk team of trained engineers that can troubleshoot a problem; a separate help desk team is usually necessary to field user input, which can be particularly demanding during a downtime.
A network management system can be used to detect faulty or degrading components prior to customer complaints, with proactive fault rectification.
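As a rough illustration of the detection step, the following Python sketch flags a component as down after several consecutive failed probes and reports when it recovers. The `Monitor` class, the threshold of three failures and the simulated probe results are all assumptions made for this example; a real monitoring system would probe actual devices and notify an administrator.

```python
class Monitor:
    """Flag a component as down after `threshold` consecutive failed probes."""

    def __init__(self, probe, threshold=3):
        self.probe = probe          # callable returning True when healthy
        self.threshold = threshold
        self.failures = 0
        self.down = False

    def poll(self):
        """Run one probe; return "alert", "recovered" or None."""
        try:
            healthy = bool(self.probe())
        except Exception:
            healthy = False         # a probe that raises counts as a failure
        if healthy:
            self.failures = 0
            if self.down:
                self.down = False
                return "recovered"
            return None
        self.failures += 1
        if self.failures >= self.threshold and not self.down:
            self.down = True
            return "alert"
        return None


# Simulated probe results standing in for real health checks.
results = iter([True, False, False, False, False, True])
monitor = Monitor(lambda: next(results), threshold=3)
events = [monitor.poll() for _ in range(6)]
print(events)  # [None, None, None, 'alert', None, 'recovered']
```

Requiring several consecutive failures is one common way to avoid alerting on a single lost probe, at the cost of slightly slower detection.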
Risk management techniques can be used to determine the impact of network outages on an organisation and what actions may be required to minimise risk. Risk may be minimised by using reliable components, by performing maintenance, such as upgrades, by using redundant systems or by having a contingency plan or business continuity plan. Technical means can reduce errors with error correcting codes, retransmission, checksums, or a diversity scheme.
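Two of the technical means mentioned above, checksums and retransmission, can be combined in a short Python sketch. It appends a CRC-32 checksum (from the standard `zlib` module) to each frame and resends the frame when the receiver detects corruption. The frame layout, the function names and the simulated flaky channel are assumptions for illustration, not a real protocol.

```python
import zlib


def make_frame(payload: bytes) -> bytes:
    """Append a 4-byte CRC-32 checksum to the payload."""
    return payload + zlib.crc32(payload).to_bytes(4, "big")


def check_frame(frame: bytes):
    """Return the payload if the checksum matches, else None."""
    if len(frame) < 4:
        return None
    payload, received = frame[:-4], int.from_bytes(frame[-4:], "big")
    return payload if zlib.crc32(payload) == received else None


def send_with_retransmission(payload, channel, max_attempts=3):
    """Send via `channel` (a callable frame -> frame) until a frame arrives intact."""
    frame = make_frame(payload)
    for attempt in range(1, max_attempts + 1):
        received = check_frame(channel(frame))
        if received is not None:
            return received, attempt
    raise IOError(f"frame still corrupted after {max_attempts} attempts")


def flaky_channel_factory():
    """Hypothetical channel that flips one bit in the first transmission only."""
    calls = {"n": 0}

    def channel(frame):
        calls["n"] += 1
        if calls["n"] == 1:
            return bytes([frame[0] ^ 0x01]) + frame[1:]
        return frame

    return channel


data, attempts = send_with_retransmission(b"hello", flaky_channel_factory())
print(data, attempts)  # b'hello' 2
```

A checksum only detects corruption; an error correcting code would additionally allow the receiver to repair some errors without a retransmission.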
A planned outage is the result of a planned activity by the system owner and/or by a service provider. Such activities can include changes or upgrades, and they are often scheduled as maintenance windows.
Outages can also be planned as a result of a predictable natural event, such as a sun outage.
Maintenance downtimes have to be carefully scheduled in industries that rely on computer systems. In many cases, system-wide downtimes can be averted using what is called a "rolling upgrade" - the process of incrementally taking down parts of the system for upgrade, without affecting the overall functionality.
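The idea of a rolling upgrade can be sketched with a small in-memory model in Python. The node dictionaries, the version strings and the `min_in_service` parameter are hypothetical; a real deployment would drain traffic, run health checks and use its own orchestration tooling.

```python
def rolling_upgrade(nodes, new_version, min_in_service):
    """Upgrade nodes one at a time, keeping at least `min_in_service` nodes up.

    `nodes` is a list of dicts with "name", "version" and "up" keys
    (a hypothetical in-memory model, not a real orchestration API).
    Returns the lowest number of in-service nodes seen during the upgrade.
    """
    lowest = sum(n["up"] for n in nodes)
    for node in nodes:
        in_service = sum(n["up"] for n in nodes)
        if in_service - 1 < min_in_service:
            raise RuntimeError("not enough capacity to take a node down")
        node["up"] = False             # take only this node out of service
        lowest = min(lowest, sum(n["up"] for n in nodes))
        node["version"] = new_version  # upgrade while the others keep serving
        node["up"] = True              # return it to service before moving on
    return lowest


cluster = [{"name": f"node{i}", "version": "1.0", "up": True} for i in range(4)]
lowest = rolling_upgrade(cluster, "1.1", min_in_service=3)
print(lowest, {n["version"] for n in cluster})  # 3 {'1.1'}
```

In this model the system never drops below three serving nodes, so the service as a whole stays available while every node is upgraded.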
Downtime can also refer to time when human capital or other assets go down. For instance, if employees are in meetings or unable to perform their work due to another constraint, they are down. This can be equally expensive, and can be the result of another asset (i.e. computers or systems) being down. This is also commonly known as "dead time".