Citizendia

Code page is the traditional IBM term used for a specific character encoding table: a mapping in which a sequence of bits, usually a single octet representing integer values 0 through 255, is associated with a specific character. IBM and Microsoft often allocate a code page number to a character set even if that charset is better known by another name.
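Because a single-byte code page is just a 256-entry table, the same octet can stand for different characters depending on which table is in force. The minimal sketch below looks up byte 0x82 in four tables using codecs that ship with Python's standard library (`cp437`, `cp850`, `cp866`, `cp1252`); the helper name `decode_byte` is hypothetical.

```python
def decode_byte(value: int, codepage: str) -> str:
    """Look up one octet (0-255) in the named code page's table."""
    if not 0 <= value <= 0xFF:
        raise ValueError("a single-byte code page maps values 0-255 only")
    return bytes([value]).decode(codepage)


for cp in ("cp437", "cp850", "cp866", "cp1252"):
    ch = decode_byte(0x82, cp)
    print(f"{cp:>6}: 0x82 -> {ch} (U+{ord(ch):04X})")
# Prints é, é, В and ‚ (a low quotation mark) respectively.
```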

Although the term code page originated in IBM's EBCDIC-based mainframe systems, it is most commonly associated with the IBM PC code pages. Microsoft, a maker of PC operating systems, refers to these code pages as OEM code pages, and supplements them with its own "ANSI" code pages.

Most well-known code pages, excluding those for the CJK languages and Vietnamese, represent character sets that fit in 8 bits and do not involve anything that cannot be represented by mapping each code to a simple bitmap, such as combining characters, complex scripts, etc.
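The contrast with CJK code pages can be seen by counting bytes per character. In this sketch (the helper `bytes_per_char` is hypothetical; `cp1252` and `cp932` are standard Python codecs), the Western code page spends one byte on every character, while the Japanese code page 932 needs two bytes for each kanji:

```python
from typing import List


def bytes_per_char(text: str, codepage: str) -> List[int]:
    """Return how many bytes the code page spends on each character of text."""
    return [len(ch.encode(codepage)) for ch in text]


print(bytes_per_char("café", "cp1252"))  # [1, 1, 1, 1]: a single-byte table
print(bytes_per_char("A日本", "cp932"))   # [1, 2, 2]: kanji need two bytes
```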

The text mode of standard (VGA-compatible) PC graphics hardware is built around using an 8-bit code page, though it is possible to use two at once with some sacrifice of color depth, and up to 8 may be stored in the display adaptor for easy switching [1]. A selection of code pages could be loaded into such hardware. However, it is now commonplace for operating system vendors to provide their own character encoding and rendering systems that run in a graphics mode and bypass this system entirely. The character encodings used by these graphical systems (particularly Windows) are sometimes called code pages as well.
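The color-depth trade-off can be sketched by splitting a text-mode cell into its character byte and attribute byte. This assumes the common VGA attribute layout (bits 0-3 foreground, bits 4-6 background, bit 7 blink), in which attribute bit 3 is repurposed to choose between two loaded fonts when dual-font mode is on, leaving only 8 foreground colors. `Cell`, `decode_cell` and the choice of CP437 as the loaded table are illustrative assumptions, not a hardware API.

```python
from typing import NamedTuple, Optional


class Cell(NamedTuple):
    char: str                # character the code maps to in the assumed code page
    foreground: int          # 0-15, or 0-7 when bit 3 is used for font selection
    background: int          # 0-7 (bit 7 treated as blink, not bright background)
    blink: bool
    font_bit: Optional[int]  # attribute bit 3 when two fonts are active, else None


def decode_cell(code: int, attr: int, dual_font: bool = False,
                codepage: str = "cp437") -> Cell:
    """Split one hypothetical text-mode cell (character byte + attribute byte)."""
    # Python's cp437 codec maps 0x00-0x1F to control characters, not the
    # smiley/arrow glyphs the hardware font actually draws for those codes.
    char = bytes([code]).decode(codepage)
    background = (attr >> 4) & 0x07
    blink = bool(attr & 0x80)
    if dual_font:
        # Bit 3 now picks one of two loaded fonts, so only 8 foreground colors remain.
        return Cell(char, attr & 0x07, background, blink, (attr >> 3) & 1)
    return Cell(char, attr & 0x0F, background, blink, None)


print(decode_cell(0x41, 0x1F))
# Cell(char='A', foreground=15, background=1, blink=False, font_bit=None)
print(decode_cell(0x41, 0x1F, dual_font=True))
# Cell(char='A', foreground=7, background=1, blink=False, font_bit=1)
```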


Relationship to ASCII

The basis of the IBM PC code pages is ASCII, a 7-bit code representing 128 characters and control codes. In the past, systems that carried ASCII in 8-bit bytes often either set the top bit to zero or used it as a parity bit in network data transmissions. When this bit was instead made available for representing character data, another 128 characters and control codes could be represented. IBM used this extended range to encode characters used by various languages. No formal standard existed for these ‘extended character sets’; IBM merely referred to the variants as code pages, as it had always done for variants of EBCDIC encodings.
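The clash between the two uses of the top bit can be sketched directly. The helper `with_even_parity` is hypothetical and assumes even parity (odd parity was also used). It sets bit 7 so that every transmitted byte has an even number of 1 bits; the resulting byte for "C" is one that CP437 assigns to a box-drawing character.

```python
def with_even_parity(code: int) -> int:
    """Put an even-parity bit in bit 7 of a 7-bit ASCII code."""
    if not 0 <= code <= 0x7F:
        raise ValueError("expected a 7-bit ASCII code")
    ones = bin(code).count("1")
    # Set the top bit only when the 7 data bits contain an odd number of 1s.
    return code | 0x80 if ones % 2 else code


sent = with_even_parity(ord("C"))     # 'C' = 0x43 has three 1 bits
print(hex(sent))                      # 0xc3
print(bytes([sent]).decode("cp437"))  # ├ : the same byte is a box-drawing glyph in CP437
```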

IBM PC (OEM) code pages

These code pages are most often used under MS-DOS-like operating systems. They include many box-drawing characters. Since the original IBM PC code page (number 437) was not really designed for international use, several incompatible variants emerged. Microsoft refers to these as the OEM code pages. Examples include:

437: The original IBM PC code page
737: Greek
775: Estonian, Lithuanian and Latvian
850: Western European languages
852: Central European languages that use Latin script
855: Cyrillic
857: Turkish
858: Western European languages (code page 850 with the euro sign)
860: Portuguese
861: Icelandic (as well as other Nordic languages)
862: Hebrew (letters only, without vowel points or cantillation marks)
863: French (mainly Canadian French)
865: Nordic languages other than Icelandic
866: Cyrillic (based on the "alternative character set" of GOST 19768-87)
869: Greek

In modern applications, operating systems and programming languages, the IBM code pages have largely been superseded by newer international standards such as Unicode.
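Moving legacy DOS text to Unicode amounts to decoding with the right OEM table and re-encoding as UTF-8. This sketch assumes the source text is CP437; the helper `oem_to_utf8` is hypothetical. Reading the same bytes with the Windows ANSI table instead shows the familiar garbling of box-drawing characters.

```python
def oem_to_utf8(data: bytes, oem_codepage: str = "cp437") -> bytes:
    """Re-encode legacy DOS (OEM) text as UTF-8, assuming its code page is known."""
    return data.decode(oem_codepage).encode("utf-8")


dos_box = b"\xda\xc4\xbf"          # the top edge of a box as stored under CP437
print(dos_box.decode("cp437"))     # ┌─┐
print(dos_box.decode("cp1252"))    # ÚÄ¿ (same bytes, wrong table)
print(oem_to_utf8(dos_box))        # b'\xe2\x94\x8c\xe2\x94\x80\xe2\x94\x90'
```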

Other code pages of note

The following code page numbers are specific to Microsoft Windows. IBM uses different numbers for these code pages.

10000: Mac OS Roman
10007: MacCyrillic
10029: Macintosh Central European
932: Japanese (Microsoft's extension of Shift JIS, IANA name Windows-31J)
936: Simplified Chinese (GBK, an extension of GB2312)
949: Korean (Microsoft's implementation, similar to EUC-KR)
950: Traditional Chinese (Microsoft's implementation of the de facto standard Big5)
1200: Unicode (UTF-16, little-endian)
1201: Unicode (UTF-16, big-endian)
65000: UTF-7
65001: UTF-8
ASMO449+: Arabic on ASCII terminals
MIK: Cyrillic for MS-DOS, based on the character set of the Bulgarian Pravetz 16 IBM PC-compatible system

Windows (ANSI) code pages

Microsoft defined a number of code pages known as the ANSI code pages (as the first one, 1252, was based on an apocryphal ANSI draft of what became ISO 8859-1). Code page 1252 is built on ISO 8859-1 but uses the range 0x80-0x9F for extra printable characters rather than the C1 control codes used in ISO 8859-1. Some of the others are based in part on other parts of ISO 8859 but are often rearranged to make them closer to 1252.

1250: Central and Eastern European languages that use Latin script
1251: Cyrillic (Russian, Bulgarian and others)
1252: Western European languages (the default in legacy components of Windows)
1253: Modern Greek (but not polytonic Greek)
1254: Turkish
1255: Hebrew
1256: Arabic (and possibly other languages that use Arabic script)
1257: Baltic (Estonian, Latvian and Lithuanian)
1258: Vietnamese
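The rearrangement toward 1252 is visible in the Central European pair. In this sketch (helper `byte_for` is hypothetical; `iso8859_2`, `cp1250` and `cp1252` are standard Python codecs), Š, Ž, š and ž sit at 0xA9, 0xAE, 0xB9 and 0xBE in ISO 8859-2, but code page 1250 places them exactly where 1252 does:

```python
def byte_for(ch: str, codepage: str) -> int:
    """Return the single byte that encodes ch in the given code page."""
    encoded = ch.encode(codepage)
    if len(encoded) != 1:
        raise ValueError(f"{ch!r} is not a single byte in {codepage}")
    return encoded[0]


for ch in "ŠŽšž":
    row = [f"{cp}=0x{byte_for(ch, cp):02X}" for cp in ("iso8859_2", "cp1250", "cp1252")]
    print(ch, *row)
# Š iso8859_2=0xA9 cp1250=0x8A cp1252=0x8A
# Ž iso8859_2=0xAE cp1250=0x8E cp1252=0x8E
# š iso8859_2=0xB9 cp1250=0x9A cp1252=0x9A
# ž iso8859_2=0xBE cp1250=0x9E cp1252=0x9E
```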

Criticism

Many products by Microsoft and other companies use the Microsoft code pages to encode their text. This means that other software has to choose between

The advent of Unicode and XML has solved most of these problems, because together they provide and, to some extent, enforce clear labels for character encoding.
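An encoding label lets a parser pick the right table without guessing. In this sketch the XML declaration names the code page and Python's standard `xml.etree.ElementTree` parser decodes accordingly; the helper `read_labelled` is hypothetical, and only single-byte code pages known to Python are assumed to work this way.

```python
import xml.etree.ElementTree as ET


def read_labelled(document: bytes) -> str:
    """Return the root element's text, decoded per the XML encoding declaration."""
    return ET.fromstring(document).text or ""


# The same byte 0x80 yields different characters depending on the declared label.
print(read_labelled(b'<?xml version="1.0" encoding="windows-1252"?><p>\x80</p>'))  # €
print(read_labelled(b'<?xml version="1.0" encoding="cp866"?><p>\x80</p>'))         # А
```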

Applications may also mislabel text in Windows-1252 as ISO-8859-1, the default character set for HTML. Fortunately, the only difference between these code pages is that Windows-1252 uses the range that ISO-8859-1 reserves for control characters (0x80-0x9F) for additional printable characters. Since control characters have no function in HTML, web browsers tend to use Windows-1252 rather than ISO-8859-1.
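The browser behaviour can be sketched as "decode anything labelled ISO-8859-1 as Windows-1252". The WHATWG Encoding Standard, which modern browsers follow, treats the label `iso-8859-1` this way and maps the five byte values that Windows-1252 leaves unassigned to the same-numbered code points. The helper `decode_latin1_label` is hypothetical; Python's own `cp1252` codec raises an error on those five bytes, hence the explicit fallback.

```python
# Bytes with no assigned character in Windows-1252; Python's cp1252 codec
# refuses them, so they fall back to the same-numbered (C1 control) code point.
_CP1252_UNDEFINED = {0x81, 0x8D, 0x8F, 0x90, 0x9D}


def decode_latin1_label(data: bytes) -> str:
    """Decode text labelled ISO-8859-1 the way browsers do: as Windows-1252."""
    out = []
    for b in data:
        if b in _CP1252_UNDEFINED:
            out.append(chr(b))
        else:
            out.append(bytes([b]).decode("cp1252"))
    return "".join(out)


raw = b"\x93quoted\x94 \x80 5"
print(repr(raw.decode("latin-1")))  # '\x93quoted\x94 \x80 5' (C1 control codes)
print(decode_latin1_label(raw))     # “quoted” € 5
```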

Private code pages

Early in the history of personal computers, when users found that their character encoding requirements were not met, private or local code pages were created using Terminate and Stay Resident (TSR) utilities or by re-programming BIOS EPROMs. In some cases, unofficial code page numbers were invented (e.g., cp895).

When more diverse character set support became available, most of those code pages fell into disuse, with some exceptions such as the Kamenický or KEYBCS2 encoding for the Czech and Slovak alphabets. Another such character set is the Iran System encoding standard, an 8-bit encoding created by the Iran System corporation for Persian language support. It was used in Iran in DOS-based programs and became obsolete after the introduction of Microsoft code page 1256. However, some Windows and DOS programs using this encoding are still in use, and some Windows fonts with this encoding exist.
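At the table level, a private code page of this kind is an existing 256-entry table with some slots reassigned. The sketch below starts from CP437 and overrides two slots with hypothetical replacements (č and š); the slot choices, the names `PRIVATE_OVERRIDES`, `private_decode` and `private_encode`, and the use of CP437 as the base are all illustrative assumptions, not the layout of Kamenický, cp895 or any real private code page.

```python
# Base table: CP437's characters for byte values 0-255.
BASE_TABLE = bytes(range(256)).decode("cp437")

# Hypothetical private reassignments (slot -> character); illustrative only.
PRIVATE_OVERRIDES = {0x9B: "\u010d", 0x9C: "\u0161"}  # č, š replace ¢, £

DECODE_TABLE = "".join(PRIVATE_OVERRIDES.get(i, ch) for i, ch in enumerate(BASE_TABLE))
ENCODE_TABLE = {ch: i for i, ch in enumerate(DECODE_TABLE)}


def private_decode(data: bytes) -> str:
    """Map each byte through the private 256-entry table."""
    return "".join(DECODE_TABLE[b] for b in data)


def private_encode(text: str) -> bytes:
    """Map each character back to its byte, rejecting characters not in the table."""
    try:
        return bytes(ENCODE_TABLE[ch] for ch in text)
    except KeyError as exc:
        raise ValueError(f"{exc.args[0]!r} is not in this private code page") from None


print(private_decode(b"\x9b\x9c"))  # čš, where standard CP437 would give ¢£
print(private_encode("Aš"))         # b'A\x9c'
```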


© 2009 citizendia.org; parts available under the terms of GNU Free Documentation License, from http://en.wikipedia.org