| Tar | |
|---|---|
| Screenshot | GNU Tar 1.16 showing three common types of Tarballs (shown in red). |
| File name extension | .tar |
| Internet media type | application/x-tar |
| Uniform Type Identifier | public.tar-archive |
| Magic number | ustar at byte 257 |
| Type of format | file archive |
| Container for | anything |
| Contained by | compress, gzip, bzip2, lzma |
In computing, Tar (derived from tape archive) is both a file format (in the form of a type of archive bitstream) and the name of the program used to handle such files. The format was standardized by POSIX.1-1988 and later POSIX.1-2001. Initially developed as a raw format, used for tape backup and other sequential access devices for backup purposes, it is now commonly used to collate collections of files into one larger file, for distribution or archiving, while preserving file system information such as user and group permissions, dates, and directory structures.
Tar's linear roots can still be seen in its ability to work on any data stream and its slow partial extraction performance, as it has to read through the whole archive to extract only the final file. A Tar file (somefile.tar), when subsequently compressed using a compression utility such as gzip, bzip2, or (formerly) compress, produces a compressed Tar file with a filename extension indicating the type of compression (e.g., somefile.tar.gz). A .tar file is commonly referred to as a Tarball, which is usually compressed to save disk space.
As is common with Unix utilities, Tar is a single specialist program. It follows the Unix philosophy in that it can "do only one thing" (archive), "but do it well". Tar is most commonly used in tandem with an external compression utility, since it has no built-in data compression facilities. These compression utilities generally only compress a single file, hence the pairing with Tar, which can produce a single file from many files. To ease this common usage, the BSD and GNU versions of Tar support the command line options -z (gzip) and -j (bzip2), which compress or decompress the archive file being processed, although even in this case the (de)compression is still actually performed by an external program. The GNU version will also extract compressed archives without requiring these options.
Some simple examples of using the Tar program follow.
Creates a gzip-compressed Tar file named eglinux.tar.gz containing all files with a .txt suffix.
tar -czf eglinux.tar.gz *.txt
Lists the contents of the compressed Tar file eglinux.tar.gz.
tar -tzf eglinux.tar.gz
Extracts all files from the compressed Tar file eglinux.tar.gz.
tar -xf eglinux.tar.gz
To extract to a specific folder, use:
tar -xf eglinux.tar.gz -C ~/des
Older versions of GNU tar may require the -z option to specify the compression type.
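The same create, list, and extract steps can be sketched with Python's standard tarfile module. This is only an illustration, not how the Tar program itself works: it operates on in-memory byte strings rather than files on disk, and the function names are invented for the example.

```python
import io
import tarfile


def create_gz_archive(files):
    """Rough equivalent of `tar -czf`: pack {name: bytes} into a gzip-compressed tar."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def list_archive(archive):
    """Rough equivalent of `tar -tf`: mode "r:*" detects the compression itself."""
    with tarfile.open(fileobj=io.BytesIO(archive), mode="r:*") as tar:
        return tar.getnames()


def read_member(archive, name):
    """Return the content of one regular file stored in the archive."""
    with tarfile.open(fileobj=io.BytesIO(archive), mode="r:*") as tar:
        member = tar.extractfile(name)
        if member is None:
            raise ValueError(name + " is not a regular file")
        return member.read()
```

Opening with mode "r:*" mirrors the GNU behaviour described above, where the compression type does not have to be named when reading.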
A Tar file is the concatenation of one or more files. Each file is preceded by a header block. The file data is written unaltered except that its length is rounded up to a multiple of 512 bytes and the extra space is zero filled. The end of an archive is marked by at least two consecutive zero-filled blocks.
A limitation of early tape drives was that data could only be written to them in 512-byte blocks. As a result, data in Tar files is arranged in 512-byte blocks.
The Tar command can write data to tape in chunks of several 512-byte blocks, to minimize the wasted gaps in the tape between write operations. Each chunk is called a record. The user can specify a blocking factor, which is the number of blocks per record. The end of an archive is padded with additional blocks of zeros so that its total size is a whole number of records, regardless of whether tape is used as the storage medium.
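The block and record arithmetic described above can be worked through in a few lines of Python. This is a sketch under stated assumptions: every file is taken to need exactly one header block, the archive ends with exactly two zero blocks, and the default blocking factor of 20 is the one GNU Tar uses unless told otherwise. The function names are made up for the example.

```python
BLOCK = 512  # bytes per block


def round_up(n, multiple):
    """Round n up to the next multiple of `multiple`."""
    return -(-n // multiple) * multiple


def archive_size(file_sizes, blocking_factor=20):
    """Estimate the size in bytes of an uncompressed archive."""
    record = blocking_factor * BLOCK
    # One header block per file, then the data padded to whole blocks.
    body = sum(BLOCK + round_up(size, BLOCK) for size in file_sizes)
    # End-of-archive marker: two zero-filled blocks.
    body += 2 * BLOCK
    # The whole archive is padded to a whole number of records.
    return round_up(body, record)
```

Under these assumptions a single 6-byte file occupies one header block and one data block, and the archive is then padded from 2048 bytes up to a full 10240-byte record.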
The file header block contains metadata about a file. To ensure portability across different architectures with different byte orderings, the information in the header block is encoded in ASCII. Thus if all the files in an archive are text files, then the archive is essentially an ASCII file (apart from the NUL padding bytes).
The fields defined by the original Unix Tar format are listed in the table below. When a field is unused it is zero filled. The header is padded with zero bytes to make it up to a 512 byte block.
| Field Offset | Field Size | Field |
|---|---|---|
| 0 | 100 | File name |
| 100 | 8 | File mode |
| 108 | 8 | Owner user ID |
| 116 | 8 | Owner group ID |
| 124 | 12 | File size in bytes |
| 136 | 12 | Last modification time |
| 148 | 8 | Checksum for header block |
| 156 | 1 | Link indicator |
| 157 | 100 | Name of linked file |
The Link indicator field can have the following values:
| Value | Meaning |
|---|---|
| '0' | Normal file |
| (ASCII NUL)[1] | Normal file |
| '1' | Hard link |
| '2' | Symbolic link[2] |
| '3' | Character special |
| '4' | Block special |
| '5' | Directory |
| '6' | FIFO |
| '7' | Contiguous file[3] |
A directory is also indicated by having a trailing slash (/) in the name.
For historical reasons numerical values are encoded as ASCII text octal numbers, with leading zeroes. The final character is either a NUL or a space. Thus although there are 12 bytes reserved for storing the file size, only 11 octal digits can be stored. This gives a maximum file size of 8 gigabytes (8^11 = 2^33 bytes) on archived files. To overcome this limitation some versions of Tar, including the GNU implementation, support an extension in which the file size is encoded in binary. Additionally, versions of GNU Tar from 1999 and before pad the values with space characters instead of zero characters.
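As a rough illustration of the octal encoding, the sketch below decodes the fields of the original header layout from one 512-byte block. It uses only the Python standard library. The names FIELDS, parse_octal, parse_header, and sample_header are invented for this example, the sample archive is generated with Python's tarfile module rather than the Tar program, and the GNU binary size extension is not handled.

```python
import io
import tarfile

# (name, offset, size) triples taken from the original-format field table.
FIELDS = [
    ("name", 0, 100),
    ("mode", 100, 8),
    ("uid", 108, 8),
    ("gid", 116, 8),
    ("size", 124, 12),
    ("mtime", 136, 12),
    ("chksum", 148, 8),
    ("typeflag", 156, 1),
    ("linkname", 157, 100),
]
TEXT_FIELDS = {"name", "typeflag", "linkname"}


def parse_octal(raw):
    """Decode an ASCII octal field, ignoring NUL and space padding."""
    digits = raw.strip(b"\0 ")
    return int(digits.decode("ascii"), 8) if digits else 0


def parse_header(block):
    """Return the original-format fields of one 512-byte header block."""
    if len(block) != 512:
        raise ValueError("a header block is exactly 512 bytes")
    header = {}
    for name, offset, size in FIELDS:
        raw = block[offset:offset + size]
        if name in TEXT_FIELDS:
            # Text fields end at the first NUL byte.
            header[name] = raw.split(b"\0", 1)[0].decode("ascii")
        else:
            header[name] = parse_octal(raw)
    return header


def sample_header():
    """Build a one-file archive in memory and return its first header block."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tar:
        data = b"hello\n"
        info = tarfile.TarInfo(name="hello.txt")
        info.size = len(data)
        tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()[:512]
```

Because only 11 of the 12 size bytes hold digits, the largest value parse_octal can return for the size field is 8^11 - 1.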
The checksum is calculated by taking the sum of the byte values of the header block, with the eight checksum bytes taken to be ASCII spaces (value 32). It is stored as a six-digit octal number with leading zeroes, followed by a NUL and then a space.
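The checksum rule can be checked with a short Python sketch. It assumes the field layout given in the table (checksum at offset 148, eight bytes long) and verifies a header produced by Python's tarfile module. The function names are hypothetical and chosen only for this example.

```python
import io
import tarfile


def header_checksum(block):
    """Sum the header bytes, counting the 8 checksum bytes as ASCII spaces."""
    if len(block) != 512:
        raise ValueError("a header block is exactly 512 bytes")
    return sum(block[:148]) + 8 * 32 + sum(block[156:])


def stored_checksum(block):
    """Read the octal checksum recorded at offset 148 of the header."""
    field = block[148:156].strip(b"\0 ")
    return int(field.decode("ascii"), 8)


def sample_header():
    """Build a one-file archive in memory and return its first header block."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tar:
        data = b"hello\n"
        info = tarfile.TarInfo(name="hello.txt")
        info.size = len(data)
        tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()[:512]
```

A reader can reject a block as a header whenever the computed and stored values differ.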
Most modern Tar programs read and write archives in the new USTAR (Uniform Standard Tape Archive) format, which has an extended header definition as defined by the POSIX (IEEE P1003.1) standards group. Older Tar programs will ignore the extra information, while newer programs will test for the presence of the "ustar" string to determine if the new format is in use. The USTAR format allows for longer file names and stores extra information about each file.
| Field Offset | Field Size | Field |
|---|---|---|
| 0 | 156 | (as in old format) |
| 156 | 1 | Type flag |
| 157 | 100 | (as in old format) |
| 257 | 6 | USTAR indicator "ustar" |
| 263 | 2 | USTAR version "00" |
| 265 | 32 | Owner user name |
| 297 | 32 | Owner group name |
| 329 | 8 | Device major number |
| 337 | 8 | Device minor number |
| 345 | 155 | Filename prefix |
The example below shows the ASCII dump of a header block from a Tar file created using the GNU Tar program. It was dumped with the od program. The "ustar" magic string followed by two spaces can be seen, meaning that the Tar file is in GNU format, partially incompatible with the true USTAR standard (in POSIX.1-1988), which has the signature "ustar" followed by a NUL character.
0000000   e   t   c   /   p   a   s   s   w   d nul nul nul nul nul nul
0000020 nul nul nul nul nul nul nul nul nul nul nul nul nul nul nul nul
*
0000140 nul nul nul nul   0   1   0   0   6   4   4 nul   0   0   0   0
0000160   0   0   0 nul   0   0   0   0   0   0   0 nul   0   0   0   0
0000200   0   0   4   1   3   5   5 nul   1   0   1   5   5   0   6   1
0000220   1   0   5 nul   0   1   1   5   5   6 nul  sp   0 nul nul nul
0000240 nul nul nul nul nul nul nul nul nul nul nul nul nul nul nul nul
*
0000400 nul   u   s   t   a   r  sp  sp nul   r   o   o   t nul nul nul
0000420 nul nul nul nul nul nul nul nul nul nul nul nul nul nul nul nul
0000440 nul nul nul nul nul nul nul nul nul   r   o   o   t nul nul nul
0000460 nul nul nul nul nul nul nul nul nul nul nul nul nul nul nul nul
*
0001000
Note that the OpenBSD 3.7 Tar does not write the two space characters after "ustar"; it writes NUL characters instead.
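The difference between the two signatures can be tested directly on the eight bytes starting at offset 257. The Python sketch below is an illustration only: the labels it returns and the function names are invented for the example, and the sample headers come from Python's tarfile module, which is assumed here to write the same signatures as the formats it is named after.

```python
import io
import tarfile

POSIX_SIGNATURE = b"ustar\0" + b"00"  # "ustar", NUL, then version "00"
GNU_SIGNATURE = b"ustar  " + b"\0"    # "ustar", two spaces, then NUL


def classify_header(block):
    """Label a 512-byte header by the signature found at offset 257."""
    signature = block[257:265]
    if signature == POSIX_SIGNATURE:
        return "ustar"
    if signature == GNU_SIGNATURE:
        return "gnu"
    return "old or unknown"


def first_header(fmt):
    """Return the first header block of a tiny archive written in format `fmt`."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=fmt) as tar:
        tar.addfile(tarfile.TarInfo(name="empty.txt"))  # zero-length file
    return buf.getvalue()[:512]
```

For instance, classify_header(first_header(tarfile.GNU_FORMAT)) corresponds to the case shown in the dump, where "ustar" is followed by two spaces.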
Note also that GNU Tar by default creates archives that are incompatible with the USTAR standard when they contain path names longer than 100 characters, and that it writes an incorrect size field when a sparse file has more than 4 holes.
Like most Unix utilities, Tar doesn't require any particular filename suffix in order to recognize a file as an archive. Conventionally, uncompressed Tar archive files have names ending in ".tar". If an archive is compressed with an external tool, the compression program adds its own suffix as usual, resulting in filename endings like ".tar.Z", ".tar.gz", and ".tar.bz2".
Names like those can't exist on MS-DOS due to its 8.3 filename limitations, so a second set of conventions appeared for storing compressed Tar archives on an MS-DOS file system, such as ".tgz" for ".tar.gz", ".tbz" or ".tb2" for ".tar.bz2", and ".taz" for ".tar.Z".
These shortened filename suffixes are still in common use.
Tarbomb is derogatory hacker slang used to refer to a Tarball containing files that untar to the current directory instead of untarring into a directory of their own. This can be a potential problem if it overwrites files using the same name in the current directory. It can also be a pain for the user who then needs to delete all the files that are scattered over the directory amongst other files. Often this ends up happening in the user's home directory. Such behaviour is often considered bad etiquette on the part of the archive's creator.
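One way to spot a likely tarbomb before extracting is to look at the top-level names in the archive. The Python sketch below is a heuristic under a stated assumption: an archive is treated as suspicious when its members do not all share a single top-level entry. The function names are hypothetical.

```python
import io
import tarfile
from pathlib import PurePosixPath


def top_level_entries(names):
    """Collect the first path component of every member name."""
    tops = set()
    for name in names:
        parts = PurePosixPath(name).parts  # a leading "./" is dropped here
        if parts:
            tops.add(parts[0])
    return tops


def looks_like_tarbomb(archive):
    """Return True when extraction would create several top-level entries."""
    with tarfile.open(fileobj=io.BytesIO(archive), mode="r:*") as tar:
        return len(top_level_entries(tar.getnames())) > 1


def make_archive(names):
    """Build an in-memory archive of empty files, for demonstration only."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name in names:
            tar.addfile(tarfile.TarInfo(name=name))
    return buf.getvalue()
```

An archive holding only proj/a.txt and proj/b.txt passes this check, while one holding a.txt and b.txt at the top level does not.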
A tarpipe directs the creation of a tar archive to standard output and pipes it into the standard input of a second tar process, which extracts the archive in a new directory. This is a useful way to copy directories and subdirectories, especially if the directories contain special files, such as symlinks, and character or block devices.
tar -cf - "$srcdir" | ( cd "$destdir" && tar -xvf - )
A remote tarpipe or ssh tarpipe uses the same method as a tarpipe, but instead of simply changing to a new directory on the local host to extract the archive, the user logs into a remote host and runs the tar extraction there.
tar -cf - "$srcdir" | ( ssh "$user@$remote.host" "cd $destdir && tar -xvf -" )
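The local tarpipe idea can also be sketched in Python with the standard tarfile module in its stream modes ("w|" and "r|"), which never seek and so behave like the two ends of a pipe. This is an illustration rather than a replacement for the shell idiom: an in-memory buffer stands in for the pipe, and the names tarpipe_copy and demo are invented for the example.

```python
import io
import os
import tarfile
import tempfile


def tarpipe_copy(srcdir, destdir):
    """Copy srcdir into destdir by archiving to a buffer and extracting from it."""
    pipe = io.BytesIO()  # stands in for the shell pipe
    # Producer side, comparable to: tar -cf - "$srcdir"
    with tarfile.open(fileobj=pipe, mode="w|") as tar:
        tar.add(srcdir, arcname=os.path.basename(os.path.normpath(srcdir)))
    pipe.seek(0)
    # Consumer side, comparable to: ( cd "$destdir" && tar -xf - )
    with tarfile.open(fileobj=pipe, mode="r|") as tar:
        tar.extractall(path=destdir)


def demo():
    """Copy a small temporary tree and return the copied file's content."""
    with tempfile.TemporaryDirectory() as root:
        src = os.path.join(root, "src")
        dest = os.path.join(root, "dest")
        os.makedirs(os.path.join(src, "sub"))
        os.makedirs(dest)
        with open(os.path.join(src, "sub", "note.txt"), "w") as handle:
            handle.write("hello\n")
        tarpipe_copy(src, dest)
        with open(os.path.join(dest, "src", "sub", "note.txt")) as handle:
            return handle.read()
```

Unlike the shell version, this sketch holds the whole archive in memory, so it only suits small directory trees.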
Tarpit is a term for a method of revision control in which a Tar file is used to capture the state of development of a software module at a particular point in time. The use of a Tarpit typically loosely mirrors the tagging and branching of revision control software through the use of descriptive names.