In databases and transaction processing, a schedule (transaction history) is serializable, has the Serializability property, if its outcome (the resulting database state, the values of the database's data) is equal to the outcome of its transactions executed sequentially without overlapping in time. A Computer Database is a structured collection of records or data that is stored in a computer system For other meanings see the disambiguation page at Transaction. In the field of Databases a schedule is a list of actions (ie Transactions are normally executed concurrently (they overlap), since this is the most efficient way. Serializability is the major correctness criterion for concurrent transactions' executions. It is considered the highest level of isolation between transactions, and plays an essential role in concurrency control. In Database systems isolation is a property that defines how/when the changes made by one operation become visible to other concurrent operations In Computer science, especially in the fields of Computer programming (see also Concurrent programming, Parallel programming) Operating systems
Contents |
Serializability is the major criterion for the correctness of concurrent transactions' executions. As such it is supported in all general purpose database systems. The rationale behind it is the following:
Schedules that are not serializable are likely to generate erroneous outcomes. Well known examples are with transactions that debit and credit accounts with money. If the related schedules are not serializable, then the total sum of money may not be preserved. Money could disappear, or be generated from nowhere. This and violations of possibly needed other invariant preservations are caused by one transaction writing, and "stepping on" and erasing what has been written by another transaction before it has become permanent in the database. It does not happen if serializability is maintained.
Comment: If any specific order between some transactions is requested by an application, then it is enforced independently of the underlying serializability mechanisms. These mechanisms are typically indifferent to any specific order, and generate equivalence to some unpredictable serial order of these transactions. This order results from the scheduling ordres of concurrent transactions' data access operations, which depend on many factors.
In all real systems transactions can abort, and serializability by itself is not sufficient for correctness. Schedules also need to possess the recoverability property. In the field of Databases a schedule is a list of actions (ie Recoverability means that committed transactions have not read data written by aborted transactions (whose effects do not exist in the resulting database states). While serializability is currently compromised in many applications, compromising recoverability would quickly violate the database's integrity, as well as that of transactions' results.
In many applications, unlike with finances, absolute correctness is not needed. For example, when retrieving a list of products according to specification, in most cases it does not matter much if a product, whose data was updated a short time ago, does not appear in the list, even if it meets the specification. It will typically appear in such a list when tried again a short time later. Commercial databases provide concurrency control with a whole range of isolation levels which are in fact (controlled) serializability violations in order to achieve higher performance. In Database systems isolation is a property that defines how/when the changes made by one operation become visible to other concurrent operations Higher performance means better transaction execution rate and shorter transaction response time (transaction duration).
Classes of schedules defined by relaxed serializability properties either contain the serializability class, or are incomparable with it.
Two major types of serializability exist: view serializability, and conflict serializability. Any schedule that is conflict-serializable is also view-serializable. Conflict serializability is widely utilized because it is easier to determine. Determining view-serializability of a schedule is an NP-complete problem. In Computational complexity theory, the Complexity class NP-complete (abbreviated NP-C or NPC) is a class of problems having two properties
View serializability of a schedule is defined by equivalence to a serial schedule (no overlapping transactions) with the same transactions, such that respective transactions in the two schedules read and write the same data values ("view" the same data values).
Conflict serializability is defined by equivalence to a serial schedule (no overlapping transactions) with the same transactions, such that both schedules have the same sets of respective chronologically-ordered pairs of conflicting operations (same precedence relations of respective conflicting operations).
Operations upon data are read or write (insert or modify or delete). Two operations are conflicting, if they are of different transactions, upon the same datum, and at least one of them is write. Debt AIDS Trade in Africa (or DATA) is a Multinational non-government organization founded in January 2002 in London by U2 's The transaction of the second operation in the pair is said to be in conflict with the transaction of the first operation. A more general definition of conflicting operations (also for complex operations, which may consist each of several "simple" read/write operations) requires that they are noncommutative (changing their order also changes their combined result). In Mathematics, commutativity is the ability to change the order of something without changing the end result Each such operation needs to be atomic by itself (by proper system support) in order to be commutative (nonconflicting) with the other. For example, the operations increment and decrement of a counter are both write operations, but do not need to be considered conflicting since they are commutative.
Schedule compliance with conflict serializability can be tested with the precedence graph (serializability graph, conflict graph) of the schedule for committed transactions. A precedence graph, also named conflict graph and Serializability graph, is used in the context of Concurrency control in Databases It is the directed graph representing precedence of transactions in the schedule, as reflected by precedence of conflicting operations in the transactions. In Mathematics and Computer science, a graph is the basic object of study in Graph theory. In the conflict graph transactions are nodes and precedence relations are directed edges. There exists an edge from a first transaction to a second transaction if the second transaction is in conflict with the first (see Conflict serializability above). This graph, when only committed transactions are considered, is acyclic, if and only if the schedule is conflict-serializable. In Computer science and Mathematics, a directed acyclic graph, also called a DAG, is a with no; that is for any vertex v, there ↔ This means that when committed transactions create a cycle, conflict-serializability is violated.
Cycles of committed transactions can be prevented by aborting an undecided (neither committed, nor aborted) transaction on each cycle in the precedence graph of all the transactions, that can otherwise turn into a cycle of committed transactions (a committed transaction cannot be aborted). One transaction aborted per cycle is both required and sufficient number. The probability of cycle generation is typically low, but nevertheless, such a situation is carefully handled, since correctness is involved. Transactions aborted due to serializability violation prevention are executed again immediately.
Serializability enforcing mechanisms typically do not maintain a precedence graph as a data structure, but rather prevent or break cycles implicitly (e. g. , SS2PL below).
Strong strict two phase locking (SS2PL) is a common mechanism utilized in database systems to enforce both conflict serializability and strictness (a special case of recoverability) of a schedule. In Databases and Transaction processing, two phase locking, ( 2PL) is a Concurrency control locking protocol mechanism that guarantees In the field of Databases a schedule is a list of actions (ie SS2PL is the name of the resulting schedule property as well, which is also called rigorousness. In this mechanism each datum is locked by a transaction before accessing it (any read or write operation): The item is marked by, associated with a lock of a certain type, depending on operation (and the specific implementation; various models with different lock types exist; in some models locks may change type during the transaction's life). In Computer science, a lock is a synchronization mechanism for enforcing limits on access to a resource in an environment where there are many threads of As a result access by another transaction may be blocked, typically upon conflict, depending on lock type and the other transaction's access operation type. All locked data on behalf of a transaction are released only after the transaction has ended (either committed or aborted).
Mutual blocking between transactions results in a deadlock, where execution of these transactions is stalled, and no completion can be reached. A deadlock is a situation wherein two or more competing actions are waiting for the other to finish and thus neither ever does A deadlock is a reflection of a potential cycle in the conflict graph, that would occur without the blocking. Deadlocks are resolved by aborting a transaction involved with such potential cycle. It is often detected using a wait-for graph, that indicates which transaction is "waiting for" lock release by which transaction, and a cycle means a deadlock. Aborting one transaction per cycle is sufficient to break the cycle. Transactions aborted due to deadlock resolution are executed again immediately.
Other known mechanisms include:
In a federated database system or any other more loosely defined multidatabase system, which are typically distributed in a communication network, transactions span multiple (and possibly distributed) databases. A precedence graph, also named conflict graph and Serializability graph, is used in the context of Concurrency control in Databases In Databases and Transaction processing, two phase locking, ( 2PL) is a Concurrency control locking protocol mechanism that guarantees In Computer science, in the field of Databases timestamp-based concurrency control is a Non-lock concurrency control method used in Relational In Databases ' and Transaction processing, Commitment ordering (or Commit ordering; CO is a Serializability technique In Databases ' and Transaction processing, global serializability is a property of a global schedule of Transactions. In Databases ' and Transaction processing, Commitment ordering (or Commit ordering; CO is a Serializability technique A federated database system is a type of Meta-[[database management system]] (DBMS which transparently integrates multiple autonomous database systems into a single A distributed database is a Database that is under the control of a central Database management system (DBMS in which storage devices are not all attached Enforcing global serializability in such heterogeneous system, where databases may use different types of concurrency control, is problematic. In Databases ' and Transaction processing, global serializability is a property of a global schedule of Transactions. Even if every local schedule of a single database is serializable, the global schedule of a whole system is not necessarily serializable. The massive communication exchanges of conflict information needed between databases to reach conflict serializability would lead to unacceptable performance, primarily due to computer and communication latency. Latency is a time delay between the moment something is initiated and the moment one of its effects begins or becomes detectable The problem of achieving global serializability effectively in such systems had been characterized as open until the end of 1991 (see Global serializability). An open problem is a known problem with importance or interest in some scientific area that can be accurately stated and has not yet been solved (no solution for it is known In Databases ' and Transaction processing, global serializability is a property of a global schedule of Transactions.
An effective way to enforce conflict serializability globally is to enforce the commitment ordering (CO, or commit-order-serializability) property of each local schedule. In Databases ' and Transaction processing, Commitment ordering (or Commit ordering; CO is a Serializability technique CO is a broad special case of conflict serializability, and enforcing it locally in each schedule also enforces it, and hence serializability, globally. The only needed communication between the databases for this purpose is that of the atomic commitment protocol (such as two-phase commit protocol), that exists in most distributed environments and already must be utilized for each distributed transaction. In Computer networking and Databases the two-phase commit protocol (2PC is a Distributed algorithm that lets all nodes in a Distributed system Thus CO incurs no communication overhead. In each single database, local CO algorithm can run beside any local concurrency control mechanism (serializability enforcing mechanism) without interfering with its resource access scheduling strategy. As such CO provides a general, high performance, fully distributed solution. No central processing component or central data structure is needed. Moreover, CO works also in heterogeneous environments with different database system types and other multiple transactional objects that employ different serializability mechanisms. The CO solution scales up in network size and the number of databases without any negative impact on performance (assuming the statistics of a single distributed transaction, e. In Databases ' and Transaction processing, Commitment ordering (or Commit ordering; CO is a Serializability technique g. , the average number of databases involved with a single transaction, are unchanged). This makes CO instrumental for global concurrency control. Global concurrency control typically pertains to the Concurrency control of a system comprising several components each with its own concurrency control
CO implementation by itself is not sufficient as a concurrency control mechanism, since it lacks the important recoverability property.
SS2PL implies CO, and any SS2PL-compliant database can participate in multidatabase systems that utilize the CO solution for global serializability without any modification or addition of a CO algorithm component. As a matter of fact global serializability has been achieved in all-SS2PL multidatabase environments with the two phase commit (2PC) protocol since the eighties, and SS2PL in conjunction with 2PC is the de facto standard to reach global serializability across (the common, SS2PL-compliant) databases. In Computer networking and Databases the two-phase commit protocol (2PC is a Distributed algorithm that lets all nodes in a Distributed system A de facto standard is a Standard (formal or informal that has achieved a dominant and accepted position
Having the CO property means that the precedence (partial) order of transactions' commitment events is identical to the precedence (partial) order of the respective transactions as determined by their local acyclic conflict graph. In Mathematics, especially Order theory, a partially ordered set (or poset) formalizes the intuitive concept of an ordering sequencing or arrangement Any conflict serializable schedule can be made a CO compliant one, without aborting any transaction in the schedule, by delaying commitment events to comply with the needed partial order.
The commitment event of a distributed transaction is always generated by some atomic commitment protocol, utilized to reach consensus among its processes on whether to commit or abort it. This procedure is always carried out for distributed transactions, independently of CO. The atomic commitment protocol plays a central role in the distributed CO algorithm. In case of incompatible local commitment orders in two or more databases (no global partial order can embed the local partial orders), which implies a global cycle (a cycle that spans two or more database) in the global conflict graph, CO generates a voting-deadlock for the atomic commitment protocol, and the protocol resolves that deadlock by aborting a transaction on the cycle and breaking the cycle. With CO, the global augmented conflict graph (where being blocked by a lock to prevent conflict is also considered conflict) becomes a wait-for graph for voting, and a global cycle means a voting-deadlock. Thus also global deadlocks due to locking are resolved automatically by the same mechanism. No implementation of such a global graph is needed, and it is used only to explain the behavior of CO.