WordNet is a semantic lexicon for the English language. It groups English words into sets of synonyms called synsets, provides short, general definitions, and records the various semantic relations between these synonym sets. The purpose is twofold: to produce a combination of dictionary and thesaurus that is more intuitively usable, and to support automatic text analysis and artificial intelligence applications. The database and software tools have been released under a BSD style license and can be downloaded and used freely. The database can also be browsed online.
WordNet was created and is being maintained at the Cognitive Science Laboratory of Princeton University under the direction of psychology professor George A. Miller. Development began in 1985. Over the years, the project received about $3 million of funding, mainly from government agencies interested in machine translation. In recent years, Dr. Christiane Fellbaum has overseen the development of WordNet.
As of 2006, the database contains about 150,000 words organized in over 115,000 synsets for a total of 207,000 word-sense pairs; in compressed form, it is about 12 megabytes in size. [1]
WordNet distinguishes between nouns, verbs, adjectives and adverbs because they follow different grammatical rules. Every synset contains a group of synonymous words or collocations (a collocation is a sequence of words that go together to form a specific meaning, such as "car pool"); different senses of a word are in different synsets. The meaning of the synsets is further clarified with short defining glosses (definitions and/or example sentences). A typical example synset with gloss is:
Most synsets are connected to other synsets via a number of semantic relations. These relations vary based on the type of word, and include:
While semantic relations apply to all members of a synset, because those members share a meaning and are mutually synonymous, words can also be connected to other words through lexical relations, including antonymy (words that are opposites of each other) and derivational relatedness.
WordNet also provides the polysemy count of a word: the number of synsets that contain the word. If a word participates in several synsets (i.e. has several senses) then typically some senses are much more common than others. WordNet quantifies this with a frequency score: several sample texts have had all their words semantically tagged with the corresponding synset, and a count is then provided indicating how often a word appears in a specific sense.
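The polysemy count and the frequency score can be illustrated with a small sketch. It does not use the WordNet database or its software: the `Synset` class, the identifiers such as `dog.n.1`, the glosses and the tagged tokens are all hypothetical toy data invented for this example.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Synset:
    """One sense: a group of synonymous words plus a short gloss."""
    synset_id: str   # hypothetical identifier, not a real WordNet index
    pos: str         # "n", "v", "a" or "r"
    words: tuple
    gloss: str


# Toy data written for this example; not copied from the WordNet database.
SYNSETS = [
    Synset("dog.n.1", "n", ("dog", "domestic dog", "Canis familiaris"),
           "a domesticated canine"),
    Synset("dog.v.1", "v", ("dog", "chase", "tail"),
           "go after with the intent to catch"),
    Synset("car_pool.n.1", "n", ("car pool",),
           "an arrangement to share a car"),
]


def build_index(synsets):
    """Map each word (or collocation) to the synsets that contain it."""
    index = defaultdict(list)
    for synset in synsets:
        for word in synset.words:
            index[word].append(synset)
    return index


def polysemy_count(index, word):
    """Number of synsets that contain the word."""
    return len(index.get(word, []))


def sense_counts(tagged_tokens):
    """Count how often each (word, synset_id) pair occurs in tagged text."""
    return Counter(tagged_tokens)


INDEX = build_index(SYNSETS)

# A hypothetical sense-tagged sample: each token is paired with a synset id.
TAGGED = [("dog", "dog.n.1"), ("dog", "dog.n.1"), ("dog", "dog.v.1")]
COUNTS = sense_counts(TAGGED)
```

Here `polysemy_count(INDEX, "dog")` counts two senses, and `COUNTS` records that the noun sense occurs more often than the verb sense in the toy sample.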
The morphology functions of the software distributed with the database try to deduce the lemma or root form of a word from the user's input; only the root form is stored in the database unless it has irregular inflected forms.
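One way such a lookup could work is sketched below, assuming a list of suffix rules plus a table of irregular forms. The rule list, the exception table and the tiny lexicon are hypothetical and cover nouns only; they are not the tables shipped with WordNet.

```python
# Hypothetical irregular forms; a real table would be much larger.
NOUN_EXCEPTIONS = {"mice": "mouse", "geese": "goose", "children": "child"}

# Hypothetical (suffix, replacement) rules, tried in order.
NOUN_SUFFIX_RULES = [
    ("ches", "ch"),
    ("shes", "sh"),
    ("ses", "s"),
    ("xes", "x"),
    ("zes", "z"),
    ("ies", "y"),
    ("men", "man"),
    ("s", ""),
]

# Stand-in for the root forms stored in the database.
LEXICON = {"dog", "church", "box", "canary", "mouse", "goose", "child",
           "bus", "glass"}


def lemmatize_noun(word, lexicon=LEXICON):
    """Return the root form of a noun, or None if none can be deduced."""
    word = word.lower()
    if word in lexicon:
        # Already a root form (this keeps "bus" from losing its final "s").
        return word
    if word in NOUN_EXCEPTIONS:
        return NOUN_EXCEPTIONS[word]
    for suffix, replacement in NOUN_SUFFIX_RULES:
        if word.endswith(suffix):
            candidate = word[: len(word) - len(suffix)] + replacement
            if candidate in lexicon:
                return candidate
    return None
```

Checking each candidate against the lexicon is the design choice that keeps the rules from producing non-words: a suffix is only stripped when the result is a stored root form.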
Both nouns and verbs are organized into hierarchies, defined by hypernym or IS A relationships. For instance, the first sense of the word dog would have the following hypernym hierarchy; the words at the same level are synonyms of each other: some sense of dog is synonymous with some other senses of domestic dog and Canis familiaris, and so on. Each set of synonyms (synset) has a unique index and shares its properties, such as a gloss (or dictionary definition).
dog, domestic dog, Canis familiaris
  => canine, canid
    => carnivore
      => placental, placental mammal, eutherian, eutherian mammal
        => mammal
          => vertebrate, craniate
            => chordate
              => animal, animate being, beast, brute, creature, fauna
                => ...
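Walking such a hierarchy amounts to following hypernym links until none remain. The sketch below is a minimal illustration that assumes each synset has exactly one hypernym, which is a simplification; the short labels stand in for whole synsets, and the table stops at "animal" where the listing above is elided.

```python
# Each key is a synset label; each value is its single hypernym (assumed).
HYPERNYM = {
    "dog": "canine",
    "canine": "carnivore",
    "carnivore": "placental",
    "placental": "mammal",
    "mammal": "vertebrate",
    "vertebrate": "chordate",
    "chordate": "animal",
}


def hypernym_chain(synset, hypernym=HYPERNYM):
    """Return the synset followed by its successive hypernyms."""
    chain = [synset]
    seen = {synset}
    while chain[-1] in hypernym:
        parent = hypernym[chain[-1]]
        if parent in seen:
            raise ValueError("cycle in hypernym table at " + parent)
        chain.append(parent)
        seen.add(parent)
    return chain


if __name__ == "__main__":
    print(" => ".join(hypernym_chain("dog")))
```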
At the top level, these hierarchies are organized into base types, 25 primitive groups for nouns, and 15 for verbs. These groups form lexicographic files at a maintenance level. These primitive groups are connected to an abstract root node that has, for some time, been assumed by various applications that use WordNet.
In the case of adjectives, the organization is different. Two opposite 'head' senses work as binary poles, while 'satellite' synonyms connect to each of the heads via synonymy relations. Thus, the hierarchies, and the concept involved with lexicographic files, do not apply here the same way they do for nouns and verbs.
The network of nouns is far deeper than that of the other parts of speech. Verbs have a far bushier structure, and adjectives are organized into many distinct clusters. Adverbs are defined in terms of the adjectives they are derived from, and thus inherit their structure from that of the adjectives.
The goal of WordNet was to develop a system that would be consistent with the knowledge acquired over the years about how human beings process language. Anomic aphasia, for example, creates a condition that seems to selectively encumber individuals' ability to name objects; this makes the decision to partition the parts of speech into distinct hierarchies more of a principled decision than an arbitrary one.
In the case of hyponymy, psychological experiments revealed that individuals can access properties of nouns more quickly depending on when a characteristic becomes a defining property. That is, individuals can quickly verify that canaries can sing because a canary is a songbird (only one level of hyponymy), but require slightly more time to verify that canaries can fly (two levels of hyponymy) and even more time to verify that canaries have skin (multiple levels of hyponymy). This suggests that we too store semantic information in a way that is much like WordNet, because we only retain the most specific information needed to differentiate one particular concept from similar concepts. [2]
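The canary example can be made concrete with a toy model in which each property is stored only at the most general concept it applies to, and verification climbs the hierarchy until the property is found. The hierarchy and property tables below are assumptions chosen to mirror the example; in particular, "multiple levels" is modelled here as three.

```python
# Toy hierarchy: each concept points to its hypernym (assumed for this sketch).
HYPERNYM = {"canary": "songbird", "songbird": "bird", "bird": "animal"}

# Each property is stored once, at the level where it becomes defining.
PROPERTIES = {
    "songbird": {"can sing"},
    "bird": {"can fly"},
    "animal": {"has skin"},
}


def levels_to_property(concept, prop):
    """Count the hypernym links climbed before the property is found.

    Returns None if no concept on the path to the root has the property.
    """
    level = 0
    current = concept
    while current is not None:
        if prop in PROPERTIES.get(current, set()):
            return level
        current = HYPERNYM.get(current)
        level += 1
    return None
```

In this model the number of links climbed plays the role of the verification time: one for singing, two for flying and three for having skin.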
The hypernym/hyponym relationships among the noun synsets can be interpreted as specialization relations between conceptual categories. In other words, WordNet can be interpreted and used as a lexical ontology in the computer science sense. However, such an ontology should normally be corrected before being used since it contains hundreds of basic semantic inconsistencies such as (i) the existence of common specializations for exclusive categories and (ii) redundancies in the specialization hierarchy. Furthermore, transforming WordNet into a lexical ontology usable for knowledge representation should normally also involve (i) distinguishing the specialization relations into subtypeOf and instanceOf relations, and (ii) associating intuitive unique identifiers to each category. Although such corrections and transformations have been performed and documented as part of the integration of WordNet 1.7 into the cooperatively updatable knowledge base of WebKB-2, most projects claiming to re-use WordNet for knowledge-based applications (typically, knowledge-oriented information retrieval) simply re-use it directly.
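As an illustration of one kind of redundancy, the sketch below flags a direct specialization link whose target is already reachable through another direct parent. This is only one possible reading of "redundancy", not the procedure used for WebKB-2, and the small hierarchy is hypothetical.

```python
# Hypothetical hierarchy: each category maps to its set of direct parents.
PARENTS = {
    "dog": {"canine", "animal"},  # "animal" is also reachable via "canine"
    "canine": {"carnivore"},
    "carnivore": {"animal"},
    "animal": set(),
}


def ancestors(node, parents):
    """All categories reachable by following parent links upward."""
    found = set()
    stack = list(parents.get(node, ()))
    while stack:
        current = stack.pop()
        if current not in found:
            found.add(current)
            stack.extend(parents.get(current, ()))
    return found


def redundant_links(parents):
    """Return (child, parent) links implied by another direct parent."""
    redundant = set()
    for child, direct in parents.items():
        for parent in direct:
            others = direct - {parent}
            if any(parent in ancestors(other, parents) for other in others):
                redundant.add((child, parent))
    return redundant
```

On the toy data, `redundant_links(PARENTS)` reports only the link from "dog" to "animal", since that relation already follows from the chain through "canine" and "carnivore".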
Unlike other dictionaries, WordNet does not include information about etymology, pronunciation and the forms of irregular verbs and contains only limited information about usage.
The actual lexicographic and semantic information is maintained in lexicographer files, which are then processed by a tool called grind to produce the distributed database. Both grind and the lexicographer files are freely available in a separate distribution, but modifying and maintaining the database requires expertise.
Though WordNet contains a sufficiently wide range of common words, it does not cover special domain vocabulary. Since it is primarily designed to act as an underlying database for different applications, those applications cannot be used in specific domains that are not covered by WordNet.
WordNet has been used for a number of different purposes in information systems, including word sense disambiguation, information retrieval, automatic text classification, automatic text summarization, and even automatic crossword puzzle generation.
A project at Brown University started by Jeff Stibel, James A. Anderson, Steve Reiss and others called Applied Cognition Lab created a disambiguator using WordNet in 1998. [3] The project later morphed into a company called Simpli, which is now owned by ValueClick. George Miller joined the company as a member of the advisory board. Simpli built an Internet search engine that utilized a knowledgebase principally based on WordNet to disambiguate and expand keywords and synsets to help retrieve information online. WordNet was expanded upon to add increased dimensionality, such as intentionality (used for x), people (Albert Einstein) and colloquial terminology more relevant to Internet search (e.g., blogging, ecommerce). Neural network algorithms searched the expanded WordNet for related terms to disambiguate search keywords (Java, in the sense of coffee) and expand the search synset (Coffee, Drink, Joe) to improve search engine results. [4] Before the company was acquired, it performed searches across search engines such as Google, Yahoo!, Ask.com and others. [5]
Another prominent example of the use of WordNet is to determine the similarity between words. Various algorithms have been proposed, and these include considering the distance between the conceptual categories of words, as well as considering the hierarchical structure of the WordNet ontology. A number of these WordNet-based word similarity algorithms are implemented in a Perl package called WordNet::Similarity.
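A distance-based measure of this kind can be sketched as follows. The example counts the hypernym links on the shortest path between two concepts through their closest shared ancestor and turns that distance into a score of 1 / (1 + distance). It is written in Python rather than Perl, it is not the WordNet::Similarity implementation, and the hierarchy is a hypothetical toy table with one hypernym per concept.

```python
# Toy single-parent hierarchy, assumed for this sketch.
HYPERNYM = {
    "dog": "canine",
    "wolf": "canine",
    "canine": "carnivore",
    "cat": "feline",
    "feline": "carnivore",
    "carnivore": "mammal",
}


def path_to_root(concept):
    """The concept followed by its successive hypernyms."""
    path = [concept]
    while path[-1] in HYPERNYM:
        path.append(HYPERNYM[path[-1]])
    return path


def path_distance(a, b):
    """Links on the path from a to b via their closest shared ancestor."""
    depth_from_a = {node: depth for depth, node in enumerate(path_to_root(a))}
    for depth_from_b, node in enumerate(path_to_root(b)):
        if node in depth_from_a:
            return depth_from_a[node] + depth_from_b
    return None  # the two concepts share no ancestor in the table


def path_similarity(a, b):
    """Score in (0, 1]; higher means closer in the hierarchy."""
    distance = path_distance(a, b)
    return None if distance is None else 1.0 / (1.0 + distance)
```

With this table, "dog" and "wolf" are two links apart while "dog" and "cat" are four links apart, so the first pair receives the higher score.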
Princeton maintains a list of related projects that includes links to some of the widely used application programming interfaces available for accessing WordNet using various programming languages and environments.
Other interfaces include the following:
The EuroWordNet project has produced WordNets for several European languages and linked them together; however, these are not freely available. The Global Wordnet project attempts to coordinate the production and linking of "wordnets" for all languages. Oxford University Press, the publisher of the Oxford English Dictionary, has voiced plans to produce its own online competitor to WordNet.
The eXtended WordNet is a project at the University of Texas at Dallas (funded by the National Science Foundation) which aims to improve WordNet by semantically parsing the glosses, thus making the information contained in these definitions available for automatic knowledge processing systems. It is also freely available under a license similar to WordNet's.
The GCIDE project produces a dictionary by combining a public domain Webster's Dictionary from 1913 with some WordNet definitions and material provided by volunteers. It is released under the copyleft license GPL.
WordNet is also commonly re-used via mappings between the WordNet categories and the categories from other ontologies. Most often, only the top-level categories of WordNet are mapped. However, the authors of the SUMO ontology have produced a mapping between all of the WordNet synsets (including nouns, verbs, adjectives and adverbs) and SUMO classes. The most recent addition of the mappings provides links to all of the more specific terms in the Mid-Level Ontology (MILO), which extends SUMO. OpenCyc has 12,000 terms linked to WordNet synonym sets.
In most works that claim to have integrated WordNet into other ontologies, the content of WordNet has not simply been corrected when semantic problems have been encountered; instead, WordNet has been used as a source of inspiration but heavily re-interpreted and updated whenever suitable. This was the case when, for example, the top-level ontology of WordNet was re-structured according to the OntoClean-based approach or when WordNet was used as a primary source for constructing the lower classes of the SENSUS ontology.
FrameNet is a project similar to WordNet. It consists of a lexicon which is based on annotating over 100,000 sentences with their semantic properties. The unit in focus is the lexical frame, a type of state or event together with the properties associated with it.
An independent project titled wordNet, with an initial lowercase w, is an ongoing project to link words and phrases via a custom Web crawler.
Lexical markup framework (LMF) is a work in progress within ISO/TC37 in order to define a common standardized framework for the construction of lexicons, including WordNet.
The BalkaNet project has produced WordNets for six European languages (Bulgarian, Czech, Greek, Romanian, Turkish and Serbian). For this project, a freely available XML-based WordNet editor was developed. This editor, VisDic, is no longer in active development, but is still used for the creation of various WordNets. Its successor, DEBVisDic, is a client-server application and is currently used for the editing of several WordNets (Dutch in the Cornetto project, Polish, Hungarian, several African languages, Chinese).