Machine translation, sometimes referred to by the abbreviation MT, is a sub-field of computational linguistics that investigates the use of computer software to translate text or speech from one natural language to another. At its basic level, MT performs simple substitution of words in one natural language for words in another. Using corpus techniques, more complex translations may be attempted, allowing for better handling of differences in linguistic typology, phrase recognition, and translation of idioms, as well as the isolation of anomalies.

Current machine translation software often allows for customisation by domain or profession (such as weather reports), improving output by limiting the scope of allowable substitutions. This technique is particularly effective in domains where formal or formulaic language is used. It follows then that machine translation of government and legal documents more readily produces usable output than conversation or less standardised text.

Improved output quality can also be achieved by human intervention: for example, some systems are able to translate more accurately if the user has unambiguously identified which words in the text are names. With the assistance of these techniques, MT has proven useful as a tool to assist human translators, and in some cases can even produce output that can be used "as is". However, current systems are unable to produce output of the same quality as a human translator, particularly where the text to be translated uses casual language.

History

The history of machine translation begins in the 1950s, after World War II. The Georgetown experiment (1954) involved fully-automatic translation of over sixty Russian sentences into English. The experiment was a great success and ushered in an era of substantial funding for machine-translation research. The authors claimed that within three to five years, machine translation would be a solved problem.

Real progress was much slower, however, and after the ALPAC report (1966), which found that the ten-year-long research had failed to fulfill expectations, funding was greatly reduced. Beginning in the late 1980s, as computational power increased and became less expensive, more interest was shown in statistical models for machine translation.

The idea of using digital computers for translation of natural languages was proposed as early as 1946 by A. D. Booth and possibly others. The Georgetown experiment was by no means the first such application, and a demonstration was made in 1954 on the APEXC machine at Birkbeck College (London Univ.) of a rudimentary translation of English into French. Several papers on the topic were published at the time, and even articles in popular journals (see for example Wireless World, Sept. 1955, Cleave and Zacharov). A similar application, also pioneered at Birkbeck College at the time, was reading and composing Braille texts by computer.

Recently, the Internet has emerged as a global information infrastructure, revolutionizing access to information as well as the speed of information transfer and exchange. Using the Internet and e-mail, people need to communicate rapidly over long distances and across continental boundaries. Not all Internet users, however, can use their own language to communicate globally with speakers of other languages. With machine translation software, people may in the near future be able to communicate with one another around the world in their own mother tongue. [1]

Translation process

Main article: Translation process

The translation process may be stated as:

  1. Decoding the meaning of the source text; and
  2. Re-encoding this meaning in the target language.

Behind this ostensibly simple procedure lies a complex cognitive operation. To decode the meaning of the source text in its entirety, the translator must interpret and analyse all the features of the text, a process that requires in-depth knowledge of the grammar, semantics, syntax, idioms, etc., of the source language, as well as the culture of its speakers. The translator needs the same in-depth knowledge to re-encode the meaning in the target language.

Therein lies the challenge in machine translation: how to program a computer that will "understand" a text as a person does, and that will "create" a new text in the target language that "sounds" as if it has been written by a person.

This problem may be approached in a number of ways.

Approaches

Pyramid showing comparative depths of intermediary representation, interlingual machine translation at the peak, followed by transfer-based, then direct translation.

Machine translation can use a method based on linguistic rules, which means that words will be translated in a linguistic way: the most suitable (orally speaking) words of the target language will replace the ones in the source language.

It is often argued that the success of machine translation requires the problem of natural language understanding to be solved first.

Generally, rule-based methods parse a text, usually creating an intermediary, symbolic representation, from which the text in the target language is generated. According to the nature of the intermediary representation, an approach is described as interlingual machine translation or transfer-based machine translation. These methods require extensive lexicons with morphological, syntactic, and semantic information, and large sets of rules.

Given enough data, machine translation programs often work well enough for a native speaker of one language to get the approximate meaning of what is written by the other native speaker. The difficulty is getting enough data of the right kind to support the particular method. For example, the large multilingual corpus of data needed for statistical methods to work is not necessary for the grammar-based methods. But then, the grammar methods need a skilled linguist to carefully design the grammar that they use.

To translate between closely related languages, a technique referred to as shallow-transfer machine translation may be used.

Rule-based

The rule-based machine translation paradigm includes transfer-based machine translation, interlingual machine translation and dictionary-based machine translation paradigms.

Main article: Rule-based machine translation

Transfer-based machine translation

Interlingual

Interlingual machine translation is one instance of rule-based machine-translation approaches. In this approach, the source language, i.e. the text to be translated, is transformed into an interlingual, i.e. source-/target-language-independent representation. The target language is then generated out of the interlingua.

Dictionary-based

Machine translation can use a method based on dictionary entries, which means that the words will be translated as a dictionary would translate them, usually word by word.
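The word-by-word substitution described above can be sketched in a few lines. This is only an illustration under stated assumptions: the tiny English-to-French lexicon and the function name `translate_word_by_word` are hypothetical and do not come from any particular system.

```python
# Toy English-to-French lexicon; the entries are illustrative assumptions.
LEXICON = {
    "the": "le",
    "cat": "chat",
    "eats": "mange",
    "fish": "poisson",
}


def translate_word_by_word(sentence, lexicon):
    """Replace each source word with its dictionary entry, keeping word order."""
    output = []
    for word in sentence.lower().split():
        # Words missing from the lexicon are passed through, marked for review.
        output.append(lexicon.get(word, f"<{word}>"))
    return " ".join(output)


print(translate_word_by_word("The cat eats fish", LEXICON))
```

Because each word is looked up in isolation, a sketch like this cannot reorder words or handle agreement, which is one reason the approach tends to suit restricted vocabularies better than free text.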

Statistical

Statistical machine translation tries to generate translations using statistical methods based on bilingual text corpora, such as the Canadian Hansard corpus, the English-French record of the Canadian parliament, and EUROPARL, the record of the European Parliament. Where such corpora are available, impressive results can be achieved translating texts of a similar kind, but such corpora are still very rare. The first statistical machine translation software was CANDIDE from IBM. Google used SYSTRAN for several years, but switched to a statistical translation method in October 2007. Recently, they improved their translation capabilities by inputting approximately 200 billion words from United Nations materials to train their system. Accuracy of the translation has improved. [2]
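To illustrate how translation probabilities might be derived from a bilingual corpus, the sketch below runs a few rounds of expectation-maximisation in the style of IBM Model 1, a well-known word-alignment model. It is an assumption made for illustration only: the text above does not say which models CANDIDE or Google used, and the three-sentence German-English corpus and the function names are hypothetical.

```python
from collections import defaultdict


def train_model1(pairs, iterations=10):
    """Estimate P(target_word | source_word) from sentence pairs with EM."""
    src_vocab = {s for src, _ in pairs for s in src}
    tgt_vocab = {t for _, tgt in pairs for t in tgt}
    # Start from a uniform distribution over target words.
    prob = {(t, s): 1.0 / len(tgt_vocab) for s in src_vocab for t in tgt_vocab}
    for _ in range(iterations):
        count = defaultdict(float)
        total = defaultdict(float)
        # E-step: share each target word among the source words of its sentence.
        for src, tgt in pairs:
            for t in tgt:
                norm = sum(prob[(t, s)] for s in src)
                for s in src:
                    frac = prob[(t, s)] / norm
                    count[(t, s)] += frac
                    total[s] += frac
        # M-step: renormalise the expected counts per source word.
        for (t, s) in prob:
            prob[(t, s)] = count[(t, s)] / total[s]
    return prob


def best_translation(prob, source_word):
    """Return the target word with the highest estimated probability."""
    candidates = {t: p for (t, s), p in prob.items() if s == source_word}
    return max(candidates, key=candidates.get)


# Hypothetical parallel corpus: (source sentence, target sentence).
PAIRS = [
    (["das", "haus"], ["the", "house"]),
    (["das", "buch"], ["the", "book"]),
    (["ein", "buch"], ["a", "book"]),
]

model = train_model1(PAIRS)
for word in ["das", "haus", "buch", "ein"]:
    print(word, "->", best_translation(model, word))
```

No dictionary is supplied here: the word pairings emerge only from which words co-occur across sentence pairs, which is why such methods depend on large parallel corpora.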

Example-based

The example-based machine translation (EBMT) approach is often characterised by its use of a bilingual corpus as its main knowledge base at run-time. It is essentially a translation by analogy and can be viewed as an implementation of the case-based reasoning approach of machine learning.
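The retrieval step of translation by analogy can be sketched as a nearest-example lookup. The example pairs, the word-overlap similarity measure, and the function names below are assumptions chosen for illustration. A fuller EBMT system would also adapt and recombine fragments of the retrieved examples, which this sketch does not attempt.

```python
# Hypothetical bilingual example base: (English source, French target).
EXAMPLES = [
    ("how much is that red umbrella", "combien coûte ce parapluie rouge"),
    ("where is the station", "où est la gare"),
    ("how much is that small camera", "combien coûte ce petit appareil photo"),
]


def similarity(a, b):
    """Jaccard overlap between the word sets of two sentences."""
    set_a, set_b = set(a.split()), set(b.split())
    union = set_a | set_b
    if not union:
        return 0.0
    return len(set_a & set_b) / len(union)


def closest_example(sentence, examples):
    """Return the stored (source, target) pair most similar to the input."""
    return max(examples, key=lambda pair: similarity(sentence, pair[0]))


source, target = closest_example("where is the museum", EXAMPLES)
print(source, "=>", target)
```

The retrieved pair serves as the analogy: only the part that differs from the stored example ("museum" versus "station") would still need to be translated and substituted.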

Major issues

Disambiguation

Word sense disambiguation concerns finding a suitable translation when a word can have more than one meaning. The problem was first raised in the 1950s by Yehoshua Bar-Hillel [3]. He pointed out that without a "universal encyclopedia", a machine would never be able to distinguish between the two meanings of a word [4]. Today there are numerous approaches designed to overcome this problem. They can be approximately divided into "shallow" approaches and "deep" approaches.

Shallow approaches assume no knowledge of the text. They simply apply statistical methods to the words surrounding the ambiguous word. Deep approaches presume a comprehensive knowledge of the word. So far, shallow approaches have been more successful.
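A shallow approach of the kind described above might look like the following sketch, which picks a French translation for the ambiguous English word "bank" by counting how often the surrounding words appeared with each sense in a handful of labelled sentences. The training sentences, the sense labels, and the function names are all hypothetical, and real systems would use far larger corpora and more careful statistics.

```python
from collections import Counter, defaultdict

# Hypothetical labelled contexts for "bank": (sentence, French translation).
LABELLED = [
    ("deposit money in the bank account", "banque"),
    ("the bank approved the loan", "banque"),
    ("fishing on the river bank", "rive"),
    ("the bank of the river flooded", "rive"),
]


def train(examples):
    """Count which words co-occur with each sense of the ambiguous word."""
    counts = defaultdict(Counter)
    for sentence, sense in examples:
        counts[sense].update(sentence.split())
    return counts


def disambiguate(sentence, counts):
    """Choose the sense whose training contexts share the most words."""
    words = sentence.split()
    scores = {sense: sum(c[w] for w in words) for sense, c in counts.items()}
    return max(scores, key=scores.get)


context_counts = train(LABELLED)
print(disambiguate("he went fishing by the river", context_counts))
print(disambiguate("she needs a loan and money", context_counts))
```

Nothing here models what a bank actually is. The choice rests entirely on surface co-occurrence, which is what distinguishes shallow approaches from deep ones.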

Named entities

This issue is related to named entity recognition in information extraction.

Applications

There are now many software programs for translating natural language, several of them online, such as the SYSTRAN system which powers both Google translate and AltaVista's Babel Fish. Although no system provides the holy grail of "fully automatic high quality machine translation" (FAHQMT), many systems produce reasonable output.

Despite their inherent limitations, MT programs are used around the world. Probably the largest institutional user is the European Commission.

Toggletext uses a transfer-based system (known as Kataku) to translate between English and Indonesian.

Google has claimed that promising results were obtained using a proprietary statistical machine translation engine [5]. The statistical translation engine used in the Google language tools for Arabic <-> English and Chinese <-> English achieved an overall BLEU-4 score of 0.4281, ahead of runner-up IBM's score of 0.3954 (Summer 2006), in tests conducted by the National Institute for Standards and Technology. [6] [7] [8] Uwe Muegge has implemented a demo website [9] that uses a controlled language in combination with the Google tool to produce fully automatic, high-quality machine translations of his English, German, and French web sites.

With the recent focus on terrorism, military sources in the United States have been investing significant amounts of money in natural language engineering. In-Q-Tel [10] (a venture capital fund, largely funded by the US Intelligence Community, to stimulate new technologies through private sector entrepreneurs) brought up companies like Language Weaver. Currently the military community is interested in translation and processing of languages like Arabic, Pashto, and Dari. The Information Processing Technology Office in DARPA hosts programs like TIDES and Babylon Translator. The US Air Force has awarded a $1 million contract to develop a language translation technology. [11]

Evaluation

There are various means for evaluating the performance of machine-translation systems. The oldest is the use of human judges to assess a translation's quality. More recent, automated means of evaluation include BLEU, NIST and METEOR.
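As a rough illustration of how an automated metric can compare machine output against a human reference, the sketch below computes clipped ("modified") n-gram precision, one ingredient of BLEU. It is not the full metric: the brevity penalty, the geometric mean over n-gram orders, and support for multiple references are omitted, and the function names are assumptions for this example.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def modified_precision(candidate, reference, n):
    """Fraction of candidate n-grams found in the reference, with clipping."""
    cand_counts = Counter(ngrams(candidate, n))
    ref_counts = Counter(ngrams(reference, n))
    if not cand_counts:
        return 0.0
    # Each n-gram is credited at most as often as it occurs in the reference.
    clipped = sum(min(count, ref_counts[gram])
                  for gram, count in cand_counts.items())
    return clipped / sum(cand_counts.values())


reference = "the cat is on the mat".split()
candidate = "the cat sat on the mat".split()
print(modified_precision(candidate, reference, 1))
print(modified_precision(candidate, reference, 2))
```

Clipping matters because a degenerate candidate that just repeats a common word would otherwise score perfectly: "the" repeated seven times against this reference gets credit for only two of its seven unigrams.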

Relying exclusively on machine translation ignores that communication in human language is context-embedded, and that it takes a human to adequately comprehend the context of the original text. Even purely human-generated translations are prone to error. Therefore, to ensure that a machine-generated translation will be of publishable quality and useful to a human, it must be reviewed and edited by a human.

It has, however, been asserted that in certain applications, e.g. product descriptions written in a controlled language, a dictionary-based machine translation system has, in a production environment, produced perfect translation results that require no human intervention. [12]

References

  1. ^ Hary Gunarto, Building Dictionary as Basic Tool for Machine Translation in Natural Language Processing Applications, Journal of Ritsumeikan Studies in Language and Culture, VOL 15, No 3, Kyoto, February 2004, pp. 177-185.
  2. ^ Google Translator: The Universal Language
  3. ^ Milestones in machine translation - No.6: Bar-Hillel and the nonfeasibility of FAHQT by John Hutchins
  4. ^ Bar-Hillel (1960), "Automatic Translation of Languages". Available online at http://www.mt-archive.info/Bar-Hillel-1960.pdf
  5. ^ Google Blog: The machines do the translating (by Franz Och)
  6. ^ Geer, David, "Statistical Translation Gains Respect", pp. 18 - 21, IEEE Computer, October 2005
  7. ^ Ratcliff, Evan "Me Translate Pretty One Day", Wired December 2006
  8. ^ "NIST 2006 Machine Translation Evaluation Official Results", November 1, 2006
  9. ^ This demo website uses a controlled language in combination with the Google engine
  10. ^ In-Q-Tel
  11. ^ GCN — Air force wants to build a universal translator
  12. ^ Muegge (2006), "Fully Automatic High Quality Machine Translation of Restricted Text: A Case Study," in Translating and the computer 28. Proceedings of the twenty-eighth international conference on translating and the computer, 16-17 November 2006, London, London: Aslib. ISBN 978-0-85142-483-5.

Dictionary

machine translation

-noun

  1. The act of translating something by means of a machine, especially a computer.
  2. The act of translating a computer language into a form more directly usable by the computer.
© 2009 citizendia.org; parts available under the terms of GNU Free Documentation License, from http://en.wikipedia.org