The GenBank sequence database is an open access, annotated collection of all publicly available nucleotide sequences and their protein translations. This database is produced at the National Center for Biotechnology Information (NCBI) as part of the International Nucleotide Sequence Database Collaboration (INSDC). GenBank and its collaborators receive sequences produced in laboratories throughout the world from more than 100,000 distinct organisms. GenBank continues to grow at an exponential rate, doubling every 18 months. Release 155, produced in August 2006, contained over 65 billion nucleotide bases in more than 61 million sequences. GenBank is built by direct submissions from individual laboratories, as well as from bulk submissions from large-scale sequencing centers.
Direct submissions are made to GenBank using BankIt, which is a Web-based form, or the stand-alone submission program, Sequin. Upon receipt of a sequence submission, the GenBank staff assigns an accession number to the sequence and performs quality assurance checks. An accession number is a unique identifier given to a DNA or protein sequence record, which allows different versions of that sequence to be tracked. The submissions are then released to the public database, where the entries are retrievable by Entrez or downloadable by FTP. Bulk submissions of Expressed Sequence Tag (EST), Sequence Tagged Site (STS), Genome Survey Sequence (GSS), and High-Throughput Genome Sequence (HTGS) data are most often submitted by large-scale sequencing centers. The GenBank direct submissions group also processes complete microbial genome sequences.
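As a rough illustration of retrieving a released entry by its accession number, the sketch below builds a request URL for NCBI's E-utilities `efetch` endpoint, the programmatic interface to Entrez. The function name `build_efetch_url` is hypothetical, the accession `U49845` is used only as an illustrative value, and the choice of the `nuccore` database with GenBank flat-file output is an assumption about what a reader would want. The code constructs the URL but does not contact the server.

```python
from urllib.parse import urlencode

# Base URL of the NCBI E-utilities efetch endpoint.
EUTILS_EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def build_efetch_url(accession: str) -> str:
    """Return an efetch URL for one nucleotide record in GenBank flat-file format.

    Hypothetical helper for illustration; it only builds the URL.
    """
    params = {
        "db": "nuccore",     # nucleotide database (assumed target)
        "id": accession,     # accession number assigned to the record
        "rettype": "gb",     # GenBank flat-file format
        "retmode": "text",   # plain-text response
    }
    return f"{EUTILS_EFETCH}?{urlencode(params)}"


# Illustrative accession value; any valid accession could be substituted.
url = build_efetch_url("U49845")
print(url)
```

The resulting URL can be opened in a browser or passed to any HTTP client. Heavy use of the service is subject to NCBI's usage guidelines.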
Walter Goad of the Theoretical Biology and Biophysics Group at Los Alamos National Laboratory (LANL) and others established the Los Alamos Sequence Database in 1979, which culminated in 1982 with the creation of the public GenBank, funded by the National Institutes of Health, the National Science Foundation, the Department of Energy and the Department of Defense. LANL collaborated on GenBank with the firm Bolt, Beranek, and Newman, and by the end of 1983 more than 2,000 sequences were stored in it.
In the mid-1980s, the Intelligenetics bioinformatics company at Stanford University managed the GenBank project in collaboration with LANL. As one of the earliest bioinformatics community projects on the Internet, the GenBank project started the BIOSCI/Bionet news groups, a set of electronic communication forums used by life scientists around the world, to promote open access communication among bioscientists. From 1989 to 1992, the GenBank project transitioned to the newly created National Center for Biotechnology Information.
The GenBank release notes for release 162.0 (October 2007) state that "from 1982 to the present, the number of bases in GenBank has doubled approximately every 18 months." The following plot shows the exponential growth. (On a semi-log scale such as this, a straight line represents an exponential change.)
![Semi-log plot of the number of bases in GenBank over time]()
The GenBank database includes additional data sets which are constructed mechanically from the main sequence data collection, and therefore are excluded from this count.
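The 18-month doubling rule quoted from the release notes can be turned into a small calculation. The sketch below is a minimal illustration and not a fit to actual GenBank data: it assumes an idealized, constant doubling time, and the function names `projected_bases` and `log10_slope_per_year` are hypothetical. It also shows why the growth appears as a straight line on a semi-log plot: the base-10 logarithm of the count rises by a fixed amount each year.

```python
import math

# Doubling time stated in the release notes (months).
DOUBLING_MONTHS = 18.0


def projected_bases(initial_bases: float, months_elapsed: float,
                    doubling_months: float = DOUBLING_MONTHS) -> float:
    """Idealized exponential model: N(t) = N0 * 2 ** (t / T_double)."""
    return initial_bases * 2.0 ** (months_elapsed / doubling_months)


def log10_slope_per_year(doubling_months: float = DOUBLING_MONTHS) -> float:
    """Constant slope of log10(N) per year, i.e. the straight line on a semi-log plot."""
    return (12.0 / doubling_months) * math.log10(2.0)


# Illustrative starting point: about 65 billion bases (Release 155, August 2006).
start = 65e9
after_three_years = projected_bases(start, 36.0)  # two doublings under the model
slope = log10_slope_per_year()

print(f"{after_three_years:.3g} bases after 36 months under the model")
print(f"log10 slope: {slope:.4f} per year (tenfold roughly every {1 / slope:.1f} years)")
```

Under this model, the count quadruples every three years and grows tenfold about every five years. Real releases deviate from the idealized curve.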