In writing, a space ( ) is any empty zone between written sections. However, the term is usually used to refer to an empty zone used for interword separation (interword space) or separation between punctuation and words. Conventions about the presence and size of interword and intersentence spaces vary from language to language, and in some cases may be quite complex. Many different space characters are available in computing character sets for representing spaces of different sizes and meaning.
Modern English uses a standard space to separate words, but not all languages follow this practice. Spaces were not used to separate words in Latin until roughly 600–800 AD. Ancient Hebrew and Arabic did use spaces, partly to compensate in clarity for the lack of vowels. Traditionally, the CJK languages were written without spaces: modern Chinese and Japanese (except when written with little or no kanji) still are, but modern Korean uses spaces.
There are three main conventions relating to the number of spaces used to separate sentences within the same paragraph:
Note that the term double spacing can also refer to a style of leading: the insertion of a full additional empty line between lines of text. This is commonly used for text which may incorporate later markup or modifications, such as proof-readers' copies or legal documents.
In programming language syntax, spaces are frequently used to explicitly separate tokens. Aside from this use, spaces and other whitespace characters are usually ignored by modern programming languages. Exceptions are Haskell, ABC, and Python, which use the amount of whitespace in indentation to indicate the bounds of a block, and a whimsical language called Whitespace, where whitespace is the only meaningful syntactical element.
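As a rough illustration of both points, the sketch below runs Python's standard `tokenize` module over a few small snippets. The snippet contents and the helper names `token_names` and `token_strings` are made up for this example. It suggests that spaces between tokens do not change the token stream, while a change in indentation produces explicit `INDENT` and `DEDENT` tokens.

```python
import io
import tokenize

# A tiny program whose block structure is expressed only by indentation.
SOURCE = (
    "if x:\n"
    "    y = 1\n"
    "z = 2\n"
)

def token_names(source):
    """Return the token type names produced by Python's own tokenizer."""
    readline = io.StringIO(source).readline
    return [tokenize.tok_name[tok.type] for tok in tokenize.generate_tokens(readline)]

def token_strings(source):
    """Return the non-blank token texts, dropping newline and end markers."""
    readline = io.StringIO(source).readline
    return [tok.string for tok in tokenize.generate_tokens(readline) if tok.string.strip()]

names = token_names(SOURCE)
indent_count = names.count("INDENT")   # one block is opened
dedent_count = names.count("DEDENT")   # and closed again

# Spaces between tokens only separate them; they are not tokens themselves.
tight = token_strings("y=1\n")
loose = token_strings("y   =   1\n")
```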
Text editors, word processors, and desktop publishing software differ in how they represent whitespace on the screen, and how they represent spaces at the ends of lines longer than the screen or column width. In some cases, spaces are shown simply as blank space; in other cases they may be represented by an interpunct or other symbols. Many different characters (described below) could be used to produce spaces, and non-character functions (such as margins and tab settings) can also affect whitespace.
In computer character encodings, there is a normal general-purpose space (Unicode character U+0020; 32 decimal) whose width will vary according to the design of the typeface. Typical values range from 1/5-em to 1/3-em (in digital typography an em is equal to the nominal size of the font, so for a 10-point font the space will probably be between 2 and 3.3 points). Sophisticated fonts may have differently sized spaces for bold, italic, and small-caps faces, and often compositors will manually adjust the width of the space depending on the size and prominence of the text.
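The arithmetic behind the 10-point figure can be checked with a short sketch. The function name `space_width_points` is invented for this illustration, and the code assumes, as the paragraph does, that one em equals the nominal point size of the font.

```python
from fractions import Fraction

def space_width_points(font_size_pt, fraction_of_em):
    """Width of a space in points, assuming one em equals the font's point size."""
    return font_size_pt * fraction_of_em

# Typical range quoted above: 1/5 em to 1/3 em, here for a 10-point font.
narrow = space_width_points(10, Fraction(1, 5))   # 2 points
wide = space_width_points(10, Fraction(1, 3))     # 10/3 points, roughly 3.3
```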
In addition to this general-purpose space, it is possible to encode a space of a specific width. See the table below for a complete list.
(In monospaced proofreading copy, only em- and en-spaces are represented using this character (which is called an em-quad or an en-quad), while other types of spaces are represented with a number sign.)
When rendered, the generic Unicode space is often considered insignificant when appearing at the end of a line of text, or when part of a sequence of whitespace characters, so it may be omitted or "collapsed" in such circumstances. The non-breaking space, U+00A0 (160 decimal), renders the same as a normal space but is expressly non-collapsible. It is often used to prevent line wrapping or to indent text, though best World Wide Web practice prescribes using CSS for the latter purpose.
Typically, an en dash is surrounded by two normal spaces, while an em dash is set continuous with the text. However, an em dash can optionally be surrounded with a so-called hair space, U+200A (8202 decimal). This space should be much thinner than a normal space, and is seldom used on its own. It can be written in HTML by using the numeric character reference `&#x200A;` or `&#8202;`. Unfortunately, very few user agents are able to render a hair space correctly: in most cases the result is an unwanted symbol or a question mark on the screen, depending on the font and renderer capabilities.
| Normal space | left right | left right |
|---|---|---|
| Normal space with em dash | left — right | left — right |
| Hair space with em dash | left — right | left — right |
| No space with em dash | left—right | left—right |
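To see that the hexadecimal and decimal numeric character references for the hair space name the same character, one can decode them with Python's standard `html.unescape`. This is only a sketch; the variable names and the sample string are illustrative.

```python
import html

# Both references should decode to the single character U+200A (hair space).
hex_form = html.unescape("&#x200A;")
dec_form = html.unescape("&#8202;")

# An em dash (U+2014) set off by hair spaces, as in the "hair space" row above.
sample = "left" + hex_form + "\u2014" + dec_form + "right"
```

Whether `sample` displays with thin gaps around the dash still depends on the font and renderer in use.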
Unicode defines several space characters with specific semantics and rendering characteristics, as shown in the table below. Depending on the browser and fonts used to view this table, not all spaces may display properly:
| Code | No break | HTML entity | Name | In Block | Display | Description |
|---|---|---|---|---|---|---|
| U+0020 | | `&#x20;` | Space | Basic Latin | ] [ | Normal space, same as ASCII character 0x20 |
| U+00A0 | ✓ | `&nbsp;` | No-Break Space | Latin-1 Supplement | ] [ | Identical to U+0020, but not a point at which a line may be broken |
| U+1680 | | `&#x1680;` | Ogham Space Mark | Ogham | ] [ | Used for interword separation in Ogham text. Normally a vertical line in vertical text or a horizontal line in horizontal text, but may also be a blank space in "stemless" fonts. Requires an Ogham font. |
| U+180E | | `&#x180E;` | Mongolian Vowel Separator, or MVS | Mongolian | ][ | A thin space character used in Mongolian to cause the final two characters of a word to take on different shapes. [1] |
| U+2002 | | `&ensp;` | En Space, or Nut | General Punctuation | ] [ | Width of one en (half of one em). U+2000 En Quad is canonically equivalent to this character (En Space is preferred). |
| U+2003 | | `&emsp;` | Em Space, or Mutton | General Punctuation | ] [ | Width of one em. U+2001 Em Quad is canonically equivalent to this character (Em Space is preferred). |
| U+2004 | | `&#x2004;` | Three-Per-Em Space, or Thick Space | General Punctuation | ] [ | One third of an em wide |
| U+2005 | | `&#x2005;` | Four-Per-Em Space, or Mid Space | General Punctuation | ] [ | One fourth of an em wide |
| U+2006 | | `&#x2006;` | Six-Per-Em Space | General Punctuation | ] [ | One sixth of an em wide. In computer typography sometimes equated to U+2009. |
| U+2007 | ✓ | `&#x2007;` | Figure Space | General Punctuation | ] [ | In fonts with monospaced digits, equal to the width of one digit |
| U+2008 | | `&#x2008;` | Punctuation Space | General Punctuation | ] [ | As wide as the narrow punctuation in a font |
| U+2009 | | `&thinsp;` | Thin Space | General Punctuation | ] [ | One fifth (sometimes one sixth) of an em wide |
| U+200A | | `&#x200A;` | Hair Space | General Punctuation | ] [ | Thinner than a thin space |
| U+200B | | `&#x200B;` | Zero Width Space, or ZWSP | General Punctuation | ][ | Used to indicate word boundaries to text processing systems when using scripts that do not use explicit spacing; normally not a visible separation, but it may expand in passages that are fully justified. In HTML pages this space can be used as a potential line-break in long words as a replacement for the non-standard `<wbr>` tag. However, it is not supported in all web browsers, most notably Internet Explorer version 6 and below. |
| U+200C | | `&zwnj;` | Zero Width Non Joiner, or ZWNJ | General Punctuation | ][ | When placed between two characters that would otherwise be connected, a ZWNJ causes them to be printed in their final and initial forms, respectively. |
| U+200D | | `&zwj;` | Zero Width Joiner, or ZWJ | General Punctuation | ][ | When placed between two characters that would otherwise not be connected, a ZWJ causes them to be printed in their connected forms. |
| U+202F | ✓ | `&#x202F;` | Narrow No-Break Space | General Punctuation | ] [ | Similar to U+00A0 No-Break Space |
| U+205F | | `&#x205F;` | Medium Mathematical Space | General Punctuation | ] [ | Used in mathematical formulae |
| U+2060 | ✓ | `&#x2060;` | Word Joiner | General Punctuation | ][ | Identical to U+200B, but not a point at which a line may be broken. Introduced in Unicode 3.2 to replace the deprecated "zero width no-break space" function of the U+FEFF character. |
| U+3000 | | `&#x3000;` | Ideographic Space | CJK Symbols and Punctuation | ] [ | As wide as a CJK character cell (fullwidth) |
| U+FEFF | ✓ | `&#xFEFF;` | Zero Width No-Break Space = Byte Order Mark (BOM) | Arabic Presentation Forms-B | ][ | Used primarily as a Byte Order Mark character. Use as an indication of non-breaking is deprecated as of Unicode 3.2. See U+2060 instead. |
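Several entries in the table can be cross-checked against the Unicode data bundled with Python's standard `unicodedata` module. The sketch below looks up only a subset of the listed code points; the list `CODE_POINTS` and the helper `describe` are names chosen for this example. It also checks the canonical equivalence of the En Quad and Em Quad mentioned in the descriptions.

```python
import unicodedata

# A subset of the code points from the table above.
CODE_POINTS = [0x0020, 0x00A0, 0x1680, 0x2002, 0x2003,
               0x2009, 0x200A, 0x202F, 0x205F, 0x3000]

def describe(code_point):
    """Return (U+XXXX label, official character name, general category)."""
    ch = chr(code_point)
    return (f"U+{code_point:04X}", unicodedata.name(ch), unicodedata.category(ch))

rows = [describe(cp) for cp in CODE_POINTS]

# U+2000 En Quad and U+2001 Em Quad normalize to En Space and Em Space.
en_quad_normalized = unicodedata.normalize("NFC", "\u2000")
em_quad_normalized = unicodedata.normalize("NFC", "\u2001")

# The ideographic space is classified as fullwidth ("F").
ideographic_width = unicodedata.east_asian_width("\u3000")

for label, name, category in rows:
    print(label, name, category)
```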
Unicode also provides some visible characters to stand in for space when necessary in the "Control Pictures" block: the Symbol For Space ␠ (U+2420), the Blank Symbol ␢ (U+2422), and the Open Box ␣ (U+2423). The interpunct · is also often used to represent a space in word processing programs such as Microsoft Word.
Generalised markup languages, such as SGML, do not treat space characters differently from other characters. However, special-purpose markup languages may do so. In particular, web markup languages such as XML and HTML treat whitespace characters specially, including space characters, for programmers' convenience. One or more space characters read by conforming display-time processors of those markup languages are collapsed to 0 or 1 space, depending on their semantic context. For example, double (or more) spaces within text are collapsed to a single space, and spaces which appear on either side of the "=" that separates an attribute name from its value have no effect on the interpretation of the document. Element end tags can contain trailing spaces, and empty-element tags in XML can contain spaces before the "/>".
In XML attribute values, the parser replaces each whitespace character with a space when the document is read, and for attributes declared with a type other than CDATA it also collapses sequences of spaces into a single space. [1] Whitespace in XML element content is not changed in this way by the parser, but an application receiving information from the parser may choose to apply similar rules to element content. An XML document author can use the xml:space="preserve" attribute on an element to signal that downstream applications should not alter whitespace in that element's content.
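The difference between attribute values and element content can be observed with Python's standard `xml.etree.ElementTree` parser. The document string, element name, and attribute name below are invented for this sketch, and the attribute has no declared type.

```python
import xml.etree.ElementTree as ET

# The attribute value contains a literal newline; the element content
# contains leading, trailing, and doubled spaces.
DOC = '<note label="a\nb">  two  spaces  </note>'

element = ET.fromstring(DOC)

label = element.attrib["label"]   # newline normalized to a space by the parser
text = element.text               # element content passed through unchanged
```

Any collapsing of the spaces in `text` would be up to the application that receives it, not the parser.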
In most HTML elements, a sequence of whitespace characters is treated as a single inter-word separator, which may manifest as a single space character when rendering text in a language that normally inserts such space between words. [2] Conforming HTML renderers are required to apply a more literal treatment of whitespace within a few prescribed elements, such as the pre tag and any element for which CSS has been used to apply pre-like whitespace processing. In such elements, space characters will not be "collapsed" into inter-word separators.
In both XML and HTML, the non-breaking space character, along with other non-"standard" spaces, is not treated as collapsible "whitespace", so it is not subject to the rules above.
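A minimal sketch of this collapsing rule follows. It is not a renderer: the function name `collapse_whitespace` is hypothetical, and the set of collapsible characters (space, tab, line feed, carriage return, form feed) is an assumption made for the example. The point is only that runs of ordinary whitespace shrink to one space while non-breaking spaces pass through untouched.

```python
import re

# Assumed set of collapsible characters: space, tab, LF, CR, form feed.
# U+00A0 (no-break space) is deliberately not included.
COLLAPSIBLE = re.compile(r"[ \t\n\r\f]+")

def collapse_whitespace(text):
    """Replace each run of collapsible whitespace with a single space."""
    return COLLAPSIBLE.sub(" ", text)

collapsed = collapse_whitespace("left   \n  right")
kept = collapse_whitespace("left\u00a0\u00a0right")
```

An explicit character class is used here because Python's own `str.split()` and the regex class `\s` treat U+00A0 as whitespace, which would not match the behaviour described above.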