A database is a structured collection of records or data. A computer database relies upon software to organize the storage of data. The software models the database structure in what are known as database models. The model in most common use today is the relational model. Other models such as the hierarchical model and the network model use a more explicit representation of relationships (see below for explanation of the various database models).
Database management systems (DBMS) are the software used to organize and maintain the database. These are categorized according to the database model that they support. The model tends to determine the query languages that are available to access the database. A great deal of the internal engineering of a DBMS, however, is independent of the data model, and is concerned with managing factors such as performance, concurrency, integrity, and recovery from hardware failures. In these areas there are large differences between products.
The earliest known use of the term data base was in November 1963, when the System Development Corporation sponsored a symposium under the title Development and Management of a Computer-centered Data Base[1]. Database as a single word became common in Europe in the early 1970s and by the end of the decade it was being used in major American newspapers. (The abbreviation DB, however, survives.)
The first database management systems were developed in the 1960s. A pioneer in the field was Charles Bachman. Bachman's early papers show that his aim was to make more effective use of the new direct access storage devices becoming available: until then, data processing had been based on punched cards and magnetic tape, so that serial processing was the dominant activity. Two key data models arose at this time: CODASYL developed the network model based on Bachman's ideas, and (apparently independently) the hierarchical model was used in a system developed by North American Rockwell, later adopted by IBM as the cornerstone of their IMS product. While IMS along with the CODASYL IDMS were the big, high-visibility databases developed in the 1960s, several others were also born in that decade, some of which have a significant installed base today. Two worthy of mention are the PICK and MUMPS databases, with the former developed originally as an operating system with an embedded database and the latter as a programming language and database for the development of healthcare systems.
The relational model was proposed by E. F. Codd in 1970. He criticized existing models for confusing the abstract description of information structure with descriptions of physical access mechanisms. For a long while, however, the relational model remained of academic interest only. While CODASYL network model products (IDMS) and hierarchical model products (IMS) were conceived as practical engineering solutions taking account of the technology as it existed at the time, the relational model took a much more theoretical perspective, arguing (correctly) that hardware and software technology would catch up in time. Among the first implementations were Michael Stonebraker's Ingres at Berkeley, and the System R project at IBM. Both of these were research prototypes, announced during 1976. The first commercial products, Oracle and DB2, did not appear until around 1980. The first successful database product for microcomputers was dBASE for the CP/M and PC-DOS/MS-DOS operating systems.
During the 1980s, research activity focused on distributed database systems and database machines. Another important theoretical idea was the Functional Data Model, but apart from some specialized applications in genetics, molecular biology, and fraud investigation, the world took little notice.
In the 1990s, attention shifted to object-oriented databases. These had some success in fields where it was necessary to handle more complex data than relational systems could easily cope with, such as spatial databases, engineering data (including software repositories), and multimedia data. Some of these ideas were adopted by the relational vendors, who integrated new features into their products as a result. The 1990s also saw the spread of open source databases, such as PostgreSQL and MySQL.
In the 2000s, the fashionable area for innovation is the XML database. As with object databases, this has spawned a new collection of start-up companies, but at the same time the key ideas are being integrated into the established relational products. XML databases aim to remove the traditional divide between documents and data, allowing all of an organization's information resources to be held in one place, whether they are highly structured or not.
Various techniques are used to model data structure. Most database systems are built around one particular data model, although it is increasingly common for products to offer support for more than one model. For any one logical model various physical implementations may be possible, and most products will offer the user some level of control in tuning the physical implementation, since the choices that are made have a significant effect on performance. Here are three examples:
In a hierarchical model, data is organized into an inverted tree-like structure, implying a multiple downward link in each node to describe the nesting, and a sort field to keep the records in a particular order in each same-level list. This structure arranges the various data elements in a hierarchy and helps to establish logical relationships among data elements of multiple files. Each unit in the model is a record which is also known as a node. In such a model, each record on one level can be related to multiple records on the next lower level. A record that has subsidiary records is called a parent and the subsidiary records are called children. Data elements in this model are well suited for one-to-many relationships with other data elements in the database.
This model is advantageous when the data elements are inherently hierarchical. The disadvantage is that in order to prepare the database it becomes necessary to identify the requisite groups of files that are to be logically integrated. Hence, a hierarchical data model may not always be flexible enough to accommodate the dynamic needs of an organisation.
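The parent/child structure described above can be sketched in a few lines of Python. This is only an illustration, not how any particular hierarchical DBMS stores data; the `Record` class, its `sort_key` field, and the sample company data are hypothetical names chosen for the example.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional, Tuple


@dataclass
class Record:
    """A node in a hierarchical model: one parent, any number of children."""
    name: str
    sort_key: int = 0  # hypothetical sort field for same-level ordering
    parent: Optional["Record"] = field(default=None, repr=False)
    children: List["Record"] = field(default_factory=list)

    def add_child(self, child: "Record") -> "Record":
        # A child has exactly one parent in this model.
        child.parent = self
        self.children.append(child)
        # Keep siblings ordered by their sort field.
        self.children.sort(key=lambda r: r.sort_key)
        return child

    def walk(self, depth: int = 0) -> Iterator[Tuple[int, str]]:
        """Yield (depth, name) pairs top-down, visiting siblings in sort order."""
        yield depth, self.name
        for child in self.children:
            yield from child.walk(depth + 1)


# Illustrative data: a company with departments and employees.
root = Record("company")
sales = root.add_child(Record("sales", sort_key=2))
eng = root.add_child(Record("engineering", sort_key=1))
eng.add_child(Record("alice", sort_key=1))
sales.add_child(Record("bob", sort_key=1))

outline = list(root.walk())
```

Reaching `alice` here requires starting at `root` and descending through `engineering`, which is the access pattern that makes the model a natural fit for one-to-many data and awkward for anything else.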
The network model tends to store records with links to other records. Each record in the database can have multiple parents, i.e., the relationships among data elements can be many-to-many. Associations are tracked via "pointers". These pointers can be node numbers or disk addresses. Most network databases tend to also include some form of hierarchical model. Databases can be translated from hierarchical model to network and vice versa. The main difference between the network model and hierarchical model is that in a network model, a child can have a number of parents whereas in a hierarchical model, a child can have only one parent.
The network model has an advantage over the hierarchical model in that it promotes greater flexibility and data accessibility, since records at a lower level can be accessed without accessing the records above them. This model is more efficient than the hierarchical model, easier to understand, and can be applied to many real-world problems that require routine transactions. The disadvantages are that it is a complex process to design and develop a network database; it has to be refined frequently; it requires that the relationships among all the records be defined before development starts, and changes often demand major programming efforts; and operation and maintenance of the network model is expensive and time-consuming.
Examples of database engines that have network model capabilities are RDM Embedded and RDM Server.
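As a rough sketch of the network model's pointers, the following Python fragment uses node numbers as links and lets one record have several parents. The supplier and part records and the helper names (`link`, `parents_of`) are assumptions made for illustration; real network DBMSs define such relationships in a schema and may use disk addresses instead.

```python
from collections import defaultdict
from typing import Dict, List, Set

# Records are identified by node numbers; "pointers" are just those numbers.
records: Dict[int, str] = {
    1: "supplier: Acme",
    2: "supplier: Globex",
    3: "part: bolt",
    4: "part: nut",
}

# Parent -> children and child -> parents, so links can be followed both ways.
members: Dict[int, Set[int]] = defaultdict(set)
owners: Dict[int, Set[int]] = defaultdict(set)


def link(owner: int, member: int) -> None:
    """Record that `member` is a child of `owner` (many-to-many is allowed)."""
    members[owner].add(member)
    owners[member].add(owner)


link(1, 3)  # Acme supplies bolts
link(2, 3)  # Globex also supplies bolts: the bolt record has two parents
link(1, 4)  # Acme supplies nuts


def parents_of(node: int) -> List[str]:
    """Follow pointers upward from a record to all of its parents."""
    return sorted(records[n] for n in owners[node])
```

Note that `records[3]` can be read directly by its node number, without first visiting either supplier, which is the accessibility advantage mentioned above.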
The basic data structure of the relational model is a table where information about a particular entity (say, an employee) is represented in columns and rows. The columns enumerate the various attributes of an entity (e.g. employee_name, address, phone_number). Rows (also called records) represent instances of an entity (e.g. specific employees).
The "relation" in "relational database" comes from the mathematical notion of relations from the field of set theory. A relation is a set of tuples, so rows are sometimes called tuples. All tables in a relational database adhere to three basic rules.
If the same value occurs in two different records (from the same table or different tables) it can imply a relationship between those records. Relationships between records are often categorized by their cardinality: one-to-one (1:1), one-to-many (1:M), or many-to-many (M:M).
Tables can have a designated column or set of columns that act as a "key" to select rows from that table with the same or similar key values. A "primary key" is a key that has a unique value for each row in the table. Keys are commonly used to join or combine data from two or more tables. For example, an employee table may contain a column named address which contains a value that matches the key of an address table. Keys are also critical in the creation of indexes, which facilitate fast retrieval of data from large tables. It is not necessary to define all the keys in advance; a column can be used as a key even if it was not originally intended to be one.
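The employee and address example can be tried out with Python's built-in `sqlite3` module. This is a minimal sketch: the column names `address_id`, `employee_id`, and `city` and the sample rows are assumptions added for the example, and other database products may differ in SQL dialect.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE address (
        address_id INTEGER PRIMARY KEY,   -- primary key: unique per row
        city       TEXT NOT NULL
    );
    CREATE TABLE employee (
        employee_id   INTEGER PRIMARY KEY,
        employee_name TEXT NOT NULL,
        address       INTEGER REFERENCES address(address_id)
    );
    INSERT INTO address  VALUES (1, 'London'), (2, 'Santa Monica');
    INSERT INTO employee VALUES (10, 'Ada', 1), (11, 'Grace', 2), (12, 'Edgar', 1);
""")

# The value in employee.address matches the key of the address table,
# so the two tables can be combined row by row.
joined = conn.execute("""
    SELECT e.employee_name, a.city
    FROM employee AS e
    JOIN address  AS a ON e.address = a.address_id
    ORDER BY e.employee_id
""").fetchall()
```

Two employees share address 1, which illustrates a one-to-many relationship implied purely by matching values rather than by stored pointers.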
Users (or programs) request data from a relational database by sending it a query that is written in a special language, usually a dialect of SQL. Although SQL was originally intended for end-users, it is much more common for SQL queries to be embedded into software that provides an easier user interface. Many web applications, such as Wikipedia, perform SQL queries when generating pages.
In response to a query, the database returns a result set, which is the list of rows constituting the answer. The simplest query is just to return all the rows from a table, but more often, the rows are filtered in some way to return just the answer wanted. Often, data from multiple tables are combined into one, by doing a join. There are a number of relational operations in addition to join.
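The difference between returning every row and returning a filtered result set can be seen in a short `sqlite3` session. The table contents and the `city` column are made up for this sketch.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE employee (employee_name TEXT, phone_number TEXT, city TEXT)")
db.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)",
    [
        ("Ada", "555-0100", "London"),
        ("Grace", "555-0101", "New York"),
        ("Edgar", "555-0102", "London"),
    ],
)

# Simplest query: every row in the table.
all_rows = db.execute("SELECT * FROM employee").fetchall()

# Filtered query: only rows matching a condition, and only one column.
# The "?" placeholder passes the value separately from the SQL text.
london_names = db.execute(
    "SELECT employee_name FROM employee WHERE city = ? ORDER BY employee_name",
    ("London",),
).fetchall()
```

Each result set comes back as a list of tuples, one tuple per row, which matches the set-of-tuples view of a relation.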
Relations are classified based upon the types of anomalies to which they're vulnerable. A database that's in the first normal form is vulnerable to all types of anomalies, while a database that's in the domain/key normal form has no modification anomalies. Normal forms are hierarchical in nature. That is, the lowest level is the first normal form, and the database cannot meet the requirements for higher level normal forms without first having met all the requirements of the lesser normal form.
An RDBMS implements the features of the relational model outlined above. In this context, Date's Information Principle states:
The entire information content of the database is represented in one and only one way, namely as explicit values in column positions (attributes) and rows in relations (tuples). Ergo, there are no explicit pointers between related tables.
Several products have been identified as post-relational because the data model incorporates relations but is not constrained by the Information Principle, requiring that all information is represented by data values in relations. Products using a post-relational data model typically employ a model that actually pre-dates the relational model. These might be identified as a directed graph with trees on the nodes.
Examples of models that could be classified as post-relational are PICK (also known as MultiValue) and MUMPS.
In recent years, the object-oriented paradigm has been applied to database technology, creating a new programming model known as object databases. These databases attempt to bring the database world and the application programming world closer together, in particular by ensuring that the database uses the same type system as the application program. This aims to avoid the overhead (sometimes referred to as the impedance mismatch) of converting information between its representation in the database (for example as rows in tables) and its representation in the application program (typically as objects). At the same time, object databases attempt to introduce the key ideas of object programming, such as encapsulation and polymorphism, into the world of databases.
A variety of ways have been tried for storing objects in a database. Some products have approached the problem from the application programming end, by making the objects manipulated by the program persistent. This also typically requires the addition of some kind of query language, since conventional programming languages do not have the ability to find objects based on their information content. Others have attacked the problem from the database end, by defining an object-oriented data model for the database, and defining a database programming language that allows full programming capabilities as well as traditional query facilities.
Database tables and indexes are typically stored in memory or on hard disk in one of many forms: ordered or unordered flat files, ISAM, heaps, hash buckets, or B+ trees. These have various advantages and disadvantages discussed further in the main article on this topic. The most commonly used are B+ trees and ISAM.
Other important design choices relate to the clustering of data by category (such as grouping data by month, or location), creating pre-computed views known as materialized views, and partitioning data by range or hash. Memory management and storage topology can also be important design choices for database designers. Just as normalization is used to reduce storage requirements and improve the extensibility of the database, conversely denormalization is often used to reduce join complexity and reduce execution time for queries. [2]
All of these databases can take advantage of indexing to increase their speed. This technology has advanced tremendously since its early uses in the 1960s and 1970s. The most common kind of index is a sorted list of the contents of some particular table column, with pointers to the row associated with the value. An index allows a set of table rows matching some criterion to be located quickly. Typically, indexes are also stored in the various forms of data structure mentioned above (such as B-trees, hashes, and linked lists). Usually, a specific technique is chosen by the database designer to increase efficiency in the particular case of the type of index required.
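The "sorted list with pointers" form of index described above can be sketched with Python's standard `bisect` module. The table contents, the `name_index` structure, and the `lookup` helper are hypothetical; production systems typically use B-trees or similar structures rather than a flat sorted list.

```python
import bisect
from typing import List, Tuple

# The "table": rows stored in arrival order; the list position acts as the row pointer.
rows = [
    ("Grace", "New York"),
    ("Ada", "London"),
    ("Edgar", "San Jose"),
    ("Ada", "Cambridge"),
]

# The index: (column value, row position) pairs kept in sorted order.
name_index: List[Tuple[str, int]] = sorted(
    (row[0], pos) for pos, row in enumerate(rows)
)


def lookup(name: str) -> List[Tuple[str, str]]:
    """Find all rows with the given name via binary search on the index."""
    # (name, -1) sorts before every real entry for that name.
    start = bisect.bisect_left(name_index, (name, -1))
    matches = []
    for value, pos in name_index[start:]:
        if value != name:
            break
        matches.append(rows[pos])
    return matches
```

The binary search finds the first matching entry in logarithmic time, after which only the matching entries are read, instead of scanning the whole table.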
Relational DBMSs have the advantage that indexes can be created or dropped without changing the existing applications that make use of them. The database chooses between many different strategies based on which one it estimates will run the fastest. In other words, indexes are transparent to the application or end-user querying the database; while they affect performance, any SQL command will run with or without an index to compute the result of an SQL statement. The RDBMS will produce a plan of how to execute the query, which is generated by estimating the run times of the different algorithms and selecting the quickest. Some of the key algorithms that deal with joins are nested loop join, sort-merge join and hash join. Which of these is chosen depends on whether an index exists, what type it is, and its cardinality.
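Two of the join algorithms named above can be compared in a simplified in-memory form. This sketch assumes an equality join on a single column and uses made-up `employees` and `depts` lists; a real query planner works on disk pages and statistics rather than Python lists.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Employee = Tuple[str, int]   # (employee_name, dept_id)
Dept = Tuple[int, str]       # (dept_id, dept_name)

employees: List[Employee] = [("Ada", 1), ("Grace", 2), ("Edgar", 1)]
depts: List[Dept] = [(1, "Research"), (2, "Engineering"), (3, "Sales")]


def nested_loop_join(r: List[Employee], s: List[Dept]) -> List[Tuple[str, str]]:
    """Compare every row of r with every row of s: O(len(r) * len(s))."""
    return [(name, dname) for name, d in r for sid, dname in s if d == sid]


def hash_join(r: List[Employee], s: List[Dept]) -> List[Tuple[str, str]]:
    """Build a hash table on s, then probe it once per row of r."""
    buckets: Dict[int, List[str]] = defaultdict(list)
    for sid, dname in s:
        buckets[sid].append(dname)
    return [(name, dname) for name, d in r for dname in buckets.get(d, [])]
```

Both functions return the same rows; they differ only in cost, which is why a planner can swap one for the other without the application noticing.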
An index speeds up access to data, but it has disadvantages as well. First, every index increases the amount of storage on the hard drive necessary for the database file, and second, the index must be updated each time the data are altered, and this costs time. (Thus an index saves time in the reading of data, but it costs time in entering and altering data. Whether an index is, on the whole, a net plus or minus for efficiency thus depends on the use to which the data are to be put.)
A special case of an index is a primary index, or primary key, which is distinguished in that the primary index must ensure a unique reference to a record. Often, for this purpose one simply uses a running index number (ID number). Primary indexes play a significant role in relational databases, and they can speed up access to data considerably.
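A running ID number used as a primary key, and the uniqueness it enforces, can be demonstrated with `sqlite3`. The `person` table is an assumption for this sketch, and automatic assignment of an integer key is SQLite's behavior for an `INTEGER PRIMARY KEY` column; other systems use different mechanisms.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT)")

# Omitting the id lets SQLite assign a running number.
first_id = conn.execute("INSERT INTO person (name) VALUES ('Ada')").lastrowid
second_id = conn.execute("INSERT INTO person (name) VALUES ('Grace')").lastrowid

# Reusing an existing id violates the uniqueness the primary key guarantees.
try:
    conn.execute("INSERT INTO person (id, name) VALUES (?, 'Edgar')", (first_id,))
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
```

The rejected insert shows that the primary index is more than a performance aid: it is also the mechanism that guarantees a unique reference to each record.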
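In addition to their data model, most practical databases ("transactional databases") attempt to enforce database transactions. Ideally, the database software should enforce the ACID rules, summarized here: atomicity (either all the tasks in a transaction are done, or none of them are), consistency (every transaction must leave the database in a state that satisfies its integrity constraints), isolation (the intermediate effects of one transaction are not visible to other concurrent transactions), and durability (once a transaction has committed, its results survive subsequent failures).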
In practice, many DBMSs allow most of these rules to be selectively relaxed for better performance.
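Atomicity, the "all or nothing" part of the ACID rules, can be illustrated with `sqlite3`, where a connection used as a context manager commits on success and rolls back on an exception. The `account` table, the `transfer` helper, and the CHECK constraint are assumptions made for this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()


def transfer(src: str, dst: str, amount: int) -> bool:
    """Move `amount` between accounts as one transaction; return True on commit."""
    try:
        with conn:  # commits if the block succeeds, rolls back if it raises
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = ?", (amount, dst)
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = ?", (amount, src)
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balances() -> dict:
    """Return the current balances as a {name: balance} dictionary."""
    return dict(conn.execute("SELECT name, balance FROM account"))


ok = transfer("alice", "bob", 30)        # fits within alice's balance
failed = transfer("alice", "bob", 500)   # would overdraw alice: CHECK constraint fails
final = balances()
```

In the failed transfer the credit to `bob` has already been applied when the debit is rejected, yet the rollback undoes it, so no money appears from nowhere.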
Concurrency control is a method used to ensure that transactions are executed in a safe manner and follow the ACID rules. The DBMS must be able to ensure that only serializable, recoverable schedules are allowed, and that no actions of committed transactions are lost while undoing aborted transactions.
Replication of databases is closely related to transactions. If a database can log its individual actions, it is possible to create a duplicate of the data in real time. The duplicate can be used to improve performance or availability of the whole database system. Common replication concepts include:
Parallel synchronous replication of databases enables transactions to be replicated on multiple servers simultaneously, which provides a method for backup and security as well as data availability.
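The idea that a log of individual actions is enough to build a duplicate can be sketched as follows. The `Primary` and `Replica` classes and the `(operation, key, value)` log format are hypothetical, and the replica here catches up on request rather than synchronously; real systems add networking, ordering guarantees, and failure handling.

```python
from typing import Dict, List, Tuple

Action = Tuple[str, str, int]  # (operation, key, value); hypothetical log record format


class Primary:
    """Applies writes locally and appends each one to an action log."""

    def __init__(self) -> None:
        self.data: Dict[str, int] = {}
        self.log: List[Action] = []

    def put(self, key: str, value: int) -> None:
        self.data[key] = value
        self.log.append(("put", key, value))

    def delete(self, key: str) -> None:
        self.data.pop(key, None)
        self.log.append(("delete", key, 0))


class Replica:
    """Rebuilds a copy of the data by replaying log entries in order."""

    def __init__(self) -> None:
        self.data: Dict[str, int] = {}
        self.applied = 0  # number of log entries already replayed

    def catch_up(self, log: List[Action]) -> None:
        for op, key, value in log[self.applied:]:
            if op == "put":
                self.data[key] = value
            else:
                self.data.pop(key, None)
        self.applied = len(log)


primary, replica = Primary(), Replica()
primary.put("x", 1)
primary.put("y", 2)
replica.catch_up(primary.log)
primary.delete("x")
primary.put("y", 3)
replica.catch_up(primary.log)
```

Because the replica remembers how many entries it has applied, each call to `catch_up` replays only the new part of the log.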
Database security denotes the system, processes, and procedures that protect a database from unintended activity.
In the United Kingdom legislation protecting the public from unauthorized disclosure of personal information held on databases falls under the Office of the Information Commissioner. United Kingdom based organizations holding personal data in electronic format (databases for example) are required to register with the Data Commissioner. (reference: [1])
Locking is the act of putting a lock (access restriction) on an aspect of a database which at a particular given instance is being modified. Such locks can be applied on a row level, or on other levels such as an entire table. This helps maintain the integrity of the data by ensuring that only one user at a time can modify the data. Databases can also be locked for other reasons, like access restrictions for given levels of user.
Databases are also locked for routine database maintenance, which prevents changes being made during the maintenance. See IBM for more detail.
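Row-level locking can be imitated in a single process with Python's `threading.Lock`. This is only an analogy under stated assumptions: the `balances` dictionary stands in for a table, and `row_locks` and `table_guard` are hypothetical names; a DBMS lock manager also handles lock modes, deadlocks, and timeouts.

```python
import threading
from collections import defaultdict
from typing import Dict

balances: Dict[str, int] = {"alice": 0}

# One lock per row key: writers to different rows do not block each other.
row_locks: Dict[str, threading.Lock] = defaultdict(threading.Lock)
# Protects the lock table itself while a row lock is being looked up or created.
table_guard = threading.Lock()


def deposit(row: str, amount: int) -> None:
    with table_guard:
        lock = row_locks[row]
    with lock:  # only one thread at a time may modify this row
        current = balances[row]
        balances[row] = current + amount


threads = [threading.Thread(target=deposit, args=("alice", 1)) for _ in range(200)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Holding the row lock across the read and the write is what prevents two threads from both reading the same old value and losing one of the updates.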
Depending on the intended use, there are a number of database architectures in use. Many databases use a combination of strategies. On-line Transaction Processing (OLTP) systems often use a row-oriented datastore architecture, while data-warehouse and other retrieval-focused applications like Google's BigTable, or bibliographic database (library catalogue) systems, may use a column-oriented datastore architecture.
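The contrast between row-oriented and column-oriented layouts can be shown with plain Python containers. The parts data and function names are invented for this sketch, which ignores compression, disk pages, and everything else that makes the difference matter at scale.

```python
from typing import Dict, List, Tuple

# Row-oriented layout: each record's fields are stored together.
row_store: List[Tuple[int, str, float]] = [
    (1, "bolt", 0.10),
    (2, "nut", 0.05),
    (3, "washer", 0.02),
]

# Column-oriented layout: each column's values are stored together.
column_store: Dict[str, list] = {
    "id": [1, 2, 3],
    "name": ["bolt", "nut", "washer"],
    "price": [0.10, 0.05, 0.02],
}


def fetch_record_rows(i: int) -> Tuple[int, str, float]:
    """Typical OLTP access: one whole record, a single lookup in the row store."""
    return row_store[i]


def fetch_record_columns(i: int) -> Tuple[int, str, float]:
    """The same record from the column store touches every column."""
    return (column_store["id"][i], column_store["name"][i], column_store["price"][i])


def max_price_rows() -> float:
    """An analytic scan over the row store reads every field of every record."""
    return max(price for _id, _name, price in row_store)


def max_price_columns() -> float:
    """The column store reads only the one column the query needs."""
    return max(column_store["price"])
```

Both layouts give the same answers; the row store favors fetching or updating whole records, while the column store favors scans over a few columns of many records.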
Document-oriented databases, XML databases, knowledge bases, frame databases, and RDF stores (also known as triple stores) may also use a combination of these architectures in their implementation.
Finally, it should be noted that not all databases have or need a database schema; those without one are called schema-less databases.
Databases are used in many applications, spanning virtually the entire range of computer software. Databases are the preferred method of storage for large multiuser applications, where coordination between many users is needed. Even individual users find them convenient, and many electronic mail programs and personal organizers are based on standard database technology. Software database drivers are available for most database platforms so that application software can use a common Application Programming Interface to retrieve the information stored in a database. Two commonly used database APIs are JDBC and ODBC.
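JDBC and ODBC serve Java and C-style clients respectively. As a loosely analogous sketch in Python, the function below uses only calls from Python's DB-API 2.0 (PEP 249), the common interface that Python database drivers implement. The `supplier` table and `list_suppliers` function are hypothetical, and only the built-in `sqlite3` driver is exercised here.

```python
import sqlite3
from typing import Any, List, Tuple


def list_suppliers(connection: Any) -> List[Tuple]:
    """Run a query using only cursor calls defined by the Python DB-API (PEP 249).

    Any driver that follows that interface should be usable here; only
    sqlite3 is exercised in this sketch.
    """
    cursor = connection.cursor()
    try:
        cursor.execute("SELECT name FROM supplier ORDER BY name")
        return cursor.fetchall()
    finally:
        cursor.close()


# sqlite3 ships with Python and stands in for "some database platform".
# conn.execute and conn.executemany are sqlite3 shortcuts used for setup only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE supplier (name TEXT)")
conn.executemany("INSERT INTO supplier VALUES (?)", [("Globex",), ("Acme",)])
suppliers = list_suppliers(conn)
```

The application code in `list_suppliers` never names the database product, which is the point of a common driver API.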
For example, a suppliers database contains the data relating to suppliers, such as:
It is often used by schools to teach students and grade them.