A markup language is an artificial language using a set of annotations to text that describe how text is to be structured, laid out, or formatted. Markup languages have been in use for centuries, and in recent years have also been used in computer typesetting and word-processing systems.
A well-known example of a markup language in use today in computing is HyperText Markup Language (HTML), one of the most widely used on the World Wide Web. HTML follows some of the markup conventions used in the publishing industry in the communication of printed work between authors, editors, and printers.
The term markup is derived from the traditional publishing practice of "marking up" a manuscript, which involves adding symbolic printer's instructions in the margins of a paper manuscript. For centuries, this task was done primarily by skilled typographers known as "markup men"[1] who marked up text to indicate what typeface, style, and size should be applied to each part, and then passed the manuscript to others for typesetting by hand. Markup was also commonly applied by editors, proofreaders, and graphic designers.
The idea of markup languages was apparently first presented by publishing executive William W. Tunnicliffe at a conference in 1967, although he preferred to call it "generic coding." Tunnicliffe would later lead the development of a standard called GenCode for the publishing industry. Book designer Stanley Rice also published speculation along similar lines in the late 1960s. Brian Reid, in his 1980 dissertation at Carnegie Mellon University, developed the theory and a working implementation of descriptive markup in actual use. However, IBM researcher Charles Goldfarb is more commonly seen today as the "father" of markup languages, because of his work on IBM GML, and then as chair of the International Organization for Standardization committee that developed SGML, the first widely used descriptive markup system. Goldfarb hit upon the basic idea while working on an early project to help a newspaper computerize its work flow, although the published record does not clarify when. He later became familiar with the work of Tunnicliffe and Rice, and heard an early talk by Reid that further sparked his interest.
The details of the early history of descriptive markup languages are hotly debated. However, it is clear that the notion was independently discovered several times throughout the 1970s (and possibly the late 1960s), and became an important practice in the late 1980s.
Some early examples of markup languages available outside the publishing industry can be found in typesetting tools on Unix systems such as troff and nroff. In these systems, formatting commands were inserted into the document text so that typesetting software could format the text according to the editor's specifications. Getting a document printed correctly was an iterative process of trial and error. Availability of WYSIWYG ("what you see is what you get") publishing software supplanted much use of these languages among casual users, though serious publishing work still uses markup to specify the non-visual structure of texts.
Another major publishing standard is TeX, created and continuously refined by Donald Knuth in the 1970s and 80s. TeX concentrated on detailed layout of text and font descriptions in order to typeset mathematical books in professional quality. This required Knuth to spend considerable time investigating the art of typesetting. However, TeX has a steep learning curve, so that it is mainly used in academia, where it is the de facto standard in many scientific disciplines. A TeX macro package known as LaTeX provides a descriptive markup system on top of TeX, and is widely used.
The first language to make a clear and clean distinction between structure and presentation was certainly Scribe, developed by Brian Reid and described in his doctoral thesis in 1980.[2] Scribe was revolutionary in a number of ways, not least that it introduced the idea of styles separated from the marked up document, and of a grammar controlling the usage of descriptive elements. Scribe influenced the development of Generalized Markup Language (later SGML) and is a direct ancestor to HTML and LaTeX.
In the early 1980s, the idea that markup should be focused on the structural aspects of a document and leave the visual presentation of that structure to the interpreter led to the creation of SGML. The language was developed by a committee chaired by Goldfarb. It incorporated ideas from many different sources, including Tunnicliffe's project, GenCode. Sharon Adler, Anders Berglund, and James A. Marke were also key members of the SGML committee.
SGML specified a syntax for including the markup in documents, as well as one for separately describing what tags were allowed, and where (the Document Type Definition (DTD) or schema). This allowed authors to create and use any markup they wished, selecting tags that made the most sense to them and were named in their own natural languages. Thus, SGML is properly a meta-language, and many particular markup languages are derived from it. From the late 80s on, most substantial new markup languages have been based on the SGML system, including for example TEI and DocBook. SGML was promulgated as an International Standard by the International Organization for Standardization, ISO 8879, in 1986.
SGML found wide acceptance and use in fields with very large-scale documentation requirements. However, it was generally found to be cumbersome and difficult to learn, a side effect of attempting to do too much and be too flexible. For example, SGML made end tags (or start-tags, or even both) optional in certain contexts, because it was thought that markup would be done manually by overworked support staff who would appreciate saving keystrokes.
By 1991, it appeared to many that SGML would be limited to commercial and data-based applications while WYSIWYG tools (which stored documents in proprietary binary formats) would suffice for other document processing applications.
The situation changed when Sir Tim Berners-Lee, learning of SGML from co-worker Anders Berglund and others at CERN, used SGML syntax to create HTML. HTML resembles other SGML-based tag languages, although it began as simpler than most and a formal DTD was not developed until later. DeRose[3] argues that HTML's use of descriptive markup (and SGML in particular) was a major factor in the success of the Web, because of the flexibility and extensibility that it enabled (other factors include the notion of URLs and the free distribution of browsers). HTML is quite likely the most used markup language in the world today.
However, HTML's status as a markup language is disputed by some computer scientists. The argument for this is that HTML restricts the placement of tags, requiring them to be either fully nested inside of other tags, or the root tag of the document. Because of this, these scientists would suggest instead that HTML is a container language, following a hierarchical model.
XML (Extensible Markup Language) is a meta markup language that is now widely used. XML was developed by the World Wide Web Consortium, in a committee created and chaired by Jon Bosak. The main purpose of XML was to simplify SGML by focusing on a particular problem: documents on the Internet.[4] XML remains a meta-language like SGML, allowing users to create any tags needed (hence "extensible") and then describing those tags and their permitted uses.
XML adoption was helped because every XML document can also be written in such a way that it is also an SGML document, and existing SGML users and software could switch to XML fairly easily. However, XML eliminated many of the more complex and human-oriented features of SGML to simplify implementation (while increasing markup size and reducing readability and editability). Other improvements rectified some SGML problems in international settings, and made it possible to parse and interpret document hierarchy even if no DTD is available.
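The point about recovering document hierarchy without a DTD can be illustrated with a minimal sketch. It assumes Python's standard `xml.etree.ElementTree` module; the sample document and the `outline` helper are invented for illustration and do not come from any specification. Because every XML element has an explicit end (or is written as an empty element), the parser can rebuild the tree from the text alone.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document: no DTD or schema is declared anywhere.
DOC = """<article>
  <title>Anatidae</title>
  <section>
    <para>Ducks, geese, and swans.</para>
  </section>
</article>"""


def outline(element, depth=0):
    """Return (depth, tag) pairs for an element and its descendants, in document order."""
    rows = [(depth, element.tag)]
    for child in element:
        rows.extend(outline(child, depth + 1))
    return rows


root = ET.fromstring(DOC)
tree_outline = outline(root)

# Print the hierarchy as an indented list of element names.
for depth, tag in tree_outline:
    print("  " * depth + tag)
```

An SGML document with omitted end tags could not, in general, be handled this way, since the parser would need the DTD to know where each element closes.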
XML was designed primarily for semi-structured environments such as documents and publications. However, it appeared to hit a sweet spot between simplicity and flexibility, and was rapidly adopted for many other uses. XML is now widely used for communicating data between applications. Like HTML, it can be described as a 'container' language.
Since January 2000 all W3C Recommendations for HTML have been based on XML rather than SGML, using the abbreviation XHTML (Extensible HyperText Markup Language). The language specification requires that XHTML Web documents must be well-formed XML documents; this allows for more rigorous and robust documents while using tags familiar from HTML.
One of the most noticeable differences between HTML and XHTML is the rule that all tags must be closed: empty HTML tags such as <br> must either be closed with a regular end-tag, or replaced by a special form: <br /> (the space before the '/' on the end tag is optional, but frequently used because it enables some pre-XML Web browsers, and SGML parsers, to accept the tag). Another is that all attribute values in tags must be quoted. Finally, all tag and attribute names must be lowercase in order to be valid; HTML, on the other hand, was case-insensitive.
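The first two rules can be checked mechanically, since they are ordinary XML well-formedness constraints. The following is a minimal sketch, assuming Python's standard `xml.etree.ElementTree` parser; the helper name `is_well_formed` and the sample fragments are illustrative only.

```python
import xml.etree.ElementTree as ET


def is_well_formed(fragment: str) -> bool:
    """Return True if the fragment parses as well-formed XML."""
    try:
        ET.fromstring(fragment)
        return True
    except ET.ParseError:
        return False


# HTML-style markup: the empty br element is never closed.
html_style = "<p>line one<br>line two</p>"
# XHTML-style markup: the empty element uses the special closed form.
xhtml_style = "<p>line one<br />line two</p>"
# Attribute values must be quoted in XML.
unquoted = "<p class=note>text</p>"
quoted = '<p class="note">text</p>'

for sample in (html_style, xhtml_style, unquoted, quoted):
    print(is_well_formed(sample), sample)
```

The lowercase rule is different in kind: a generic XML parser accepts uppercase element names as well-formed, so checking that rule would require validating against the XHTML document type rather than merely parsing.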
Many XML-based applications now exist, including Resource Description Framework (RDF), XForms, DocBook, SOAP and the Web Ontology Language (OWL). For a partial list of these see List of XML markup languages.
A common feature of many markup languages is that they intermix the text of a document with markup instructions in the same data stream or file. This is not necessary; it is possible to isolate markup from text content, using pointers, offsets, IDs, or other methods to co-ordinate the two. Such "standoff markup" is typical for the internal representations programs use to work with marked-up documents. However, embedded or "inline" markup is much more common elsewhere. Here, for example, is a small section of text marked up in HTML:
<h1>Anatidae</h1>
<p>
The family <i>Anatidae</i> includes ducks, geese, and swans,
but <em>not</em> the closely-related screamers.
</p>
The codes enclosed in angle-brackets <like this> are markup instructions (known as tags), while the text between these instructions is the actual text of the document. The codes h1, p, and em are examples of structural markup, in that they describe the intended purpose or meaning of the text they include. Specifically, h1 means "this is a first-level heading", p means "this is a paragraph", and em means "this is an emphasized word or phrase". A program interpreting such structural markup may apply its own rules or styles for presenting the various pieces of text, using different typefaces, boldness, font size, indentation, colour, or other styles, as desired. A tag such as "h1" (header level 1) might be presented in a large bold sans-serif typeface, for example, or in a monospaced (typewriter-style) document it might be underscored, or it might not change the presentation at all.
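As a rough illustration of a program applying its own presentation rules to structural markup, the sketch below renders the same source under two different style tables. It assumes Python's standard `html.parser` module; the style tables, the `TextRenderer` class, and the `render` function are hypothetical names chosen for this example, not part of any standard.

```python
from html.parser import HTMLParser

SOURCE = "<h1>Anatidae</h1><p>Ducks, geese, and swans, but <em>not</em> screamers.</p>"

# Two hypothetical style tables: each maps a tag name to (prefix, suffix) text.
PLAIN_STYLE = {"h1": ("", "\n"), "p": ("", "\n"), "em": ("*", "*")}
BANNER_STYLE = {"h1": ("== ", " ==\n"), "p": ("", "\n"), "em": ("_", "_")}


class TextRenderer(HTMLParser):
    """Turn structural tags into plain-text decoration chosen by a style table."""

    def __init__(self, style):
        super().__init__()
        self.style = style
        self.parts = []

    def handle_starttag(self, tag, attrs):
        # Unknown tags contribute no decoration.
        self.parts.append(self.style.get(tag, ("", ""))[0])

    def handle_endtag(self, tag):
        self.parts.append(self.style.get(tag, ("", ""))[1])

    def handle_data(self, data):
        self.parts.append(data)


def render(source, style):
    renderer = TextRenderer(style)
    renderer.feed(source)
    renderer.close()
    return "".join(renderer.parts)


print(render(SOURCE, PLAIN_STYLE))
print(render(SOURCE, BANNER_STYLE))
```

The marked-up source never changes between the two calls; only the style table does, which is the separation of structure from presentation that descriptive markup aims for.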
In contrast, the i tag in HTML is an example of presentational markup; it is generally used to specify a particular characteristic of the text (in this case, the use of an italic typeface) without specifying the reason for that appearance.
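Returning to the standoff approach mentioned earlier, the same kind of sample can be held as plain text plus a separate list of annotations that point into it by character offset. The sketch below is one possible representation under stated assumptions: the `(start, end, tag)` tuple layout and the `to_inline` helper are invented for illustration, and the conversion only handles annotations that nest properly.

```python
TEXT = "The family Anatidae includes ducks, geese, and swans."

# Standoff annotations as (start, end, tag), with end exclusive.
# Offsets are computed from the text rather than hard-coded.
name_start = TEXT.index("Anatidae")
ANNOTATIONS = [
    (0, len(TEXT), "p"),
    (name_start, name_start + len("Anatidae"), "i"),
]


def to_inline(text, annotations):
    """Merge properly nested standoff annotations back into inline markup."""
    inserts = []
    for start, end, tag in annotations:
        length = end - start
        # At a shared offset, end tags sort first (inner before outer),
        # then start tags (outer before inner).
        inserts.append((start, 1, -length, f"<{tag}>"))
        inserts.append((end, 0, length, f"</{tag}>"))
    inserts.sort()

    out = []
    cursor = 0
    for position, _, _, markup in inserts:
        out.append(text[cursor:position])
        out.append(markup)
        cursor = position
    out.append(text[cursor:])
    return "".join(out)


print(to_inline(TEXT, ANNOTATIONS))
```

Keeping the text untouched in this way means several independent sets of annotations can refer to the same content, at the cost of having to keep offsets in step whenever the text is edited.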
The Text Encoding Initiative (TEI) has published extensive guidelines for how to encode texts of interest in the humanities and social sciences, developed through years of international cooperative work. These guidelines are used by projects encoding historical documents, the works of particular scholars, periods, or genres, and so on.
While the idea of markup language originated with text documents, there is an increasing usage of markup languages in areas like vector graphics, web services, content syndication, and user interfaces. Most of these are XML applications because it is a well-defined and extensible language. The use of XML has also led to the possibility of combining multiple markup languages into a single profile, like XHTML+SMIL and XHTML+MathML+SVG.[5]
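Combined profiles of this kind generally rely on XML namespaces to keep the vocabularies apart, a mechanism the text above does not name, so the following should be read as a sketch under that assumption. It uses Python's standard `xml.etree.ElementTree`; the compound document and the `vocabularies` helper are invented for illustration, and the helper assumes every element carries a namespace.

```python
import xml.etree.ElementTree as ET

# Hypothetical compound document mixing XHTML, SVG, and MathML elements.
COMPOUND = """<html xmlns="http://www.w3.org/1999/xhtml">
  <body>
    <p>A circle and a formula:</p>
    <svg xmlns="http://www.w3.org/2000/svg" width="20" height="20">
      <circle cx="10" cy="10" r="8"/>
    </svg>
    <math xmlns="http://www.w3.org/1998/Math/MathML">
      <mi>x</mi>
    </math>
  </body>
</html>"""


def vocabularies(xml_text):
    """Map each namespace URI to the local element names that use it."""
    found = {}
    for element in ET.fromstring(xml_text).iter():
        # ElementTree spells qualified names as "{namespace-uri}local-name".
        uri, _, local = element.tag[1:].partition("}")
        found.setdefault(uri, []).append(local)
    return found


for uri, names in vocabularies(COMPOUND).items():
    print(uri, names)
```

A processor can use the namespace of each element to decide which vocabulary's rules apply, handing the `svg` subtree to a graphics renderer and the `math` subtree to a formula renderer.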