Citizendia

Unstructured data (or unstructured information) refers to masses of (usually) computerized information that either do not have a data structure or have one that is not easily readable by a machine. The term is imprecise: software that creates machine-processable structure exploits word morphology, sentence syntax, and other small- and large-scale patterns found in source materials to discern the linguistic, auditory, and visual structure that is inherent in all forms of human communication.[1] Examples of "unstructured data" may include audio, video, and unstructured text such as the body of an email or a word-processor document.
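As a minimal sketch of how software can exploit small-scale surface patterns to impose machine-processable structure on free text, the Python below pulls email addresses and ISO-style dates out of an email body. The sample text, the two regular expressions, and the function name `extract_structure` are hypothetical illustrations, not a standard method. Real systems combine many more patterns with morphological and syntactic analysis.

```python
import re

# Hypothetical sample: the unstructured body of an email.
EMAIL_BODY = """Hi team,
Please send the Q3 report to alice@example.com before 2024-10-15.
Bob (bob@example.org) will review it on 2024-10-17.
"""

# Assumed surface patterns, chosen only for illustration.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
DATE_RE = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")


def extract_structure(text: str) -> dict:
    """Return a structured record built from patterns found in free text."""
    return {
        "emails": EMAIL_RE.findall(text),   # every address-like substring
        "dates": DATE_RE.findall(text),     # every YYYY-MM-DD substring
        "line_count": len(text.splitlines()),
    }


if __name__ == "__main__":
    print(extract_structure(EMAIL_BODY))
```

The output is a dictionary with named fields that downstream code can query directly. The original text offered no such fields.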

Merrill Lynch estimates that more than 85% of all potentially usable business information originates in unstructured form.[2]

Data with some form of structure may also be referred to as unstructured data if the structure is not helpful for the desired processing task. For example, an HTML Web page is tagged, but this form of structure is typically oriented towards formatting rather than capturing the meaning or function of the tagged elements in ways that support automated processing of the information content of the page.
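This example shows why formatting-oriented markup helps little with automated processing. The sketch uses Python's standard `html.parser` module to list which tag wraps each piece of text in a made-up product snippet. The snippet and the `TagCollector` class are hypothetical. The product name and the price both come back tagged `b`, so the markup alone cannot tell a program which string is the price.

```python
from html.parser import HTMLParser

# Hypothetical snippet: the tags describe how text looks, not what it means.
PAGE = "<p><b>Acme Widget</b> now only <b>$19.99</b>, shipping <i>free</i>.</p>"


class TagCollector(HTMLParser):
    """Record (innermost open tag, text) pairs for each non-empty text run."""

    def __init__(self):
        super().__init__()
        self.stack = []  # tags currently open
        self.pairs = []  # collected (tag, text) tuples

    def handle_starttag(self, tag, attrs):
        self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop only when the closing tag matches the innermost open tag.
        if self.stack and self.stack[-1] == tag:
            self.stack.pop()

    def handle_data(self, data):
        text = data.strip()
        if text and self.stack:
            self.pairs.append((self.stack[-1], text))


collector = TagCollector()
collector.feed(PAGE)
print(collector.pairs)
```

Extracting a price from such a page therefore needs extra knowledge, such as heuristics or a model, that the HTML structure itself does not supply.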

Much unstructured data is noisy text. Noise in text is defined as any kind of difference between the surface form of a coded representation of the text and the intended, correct, or original text. Spontaneous communication, such as emails, SMS messages, blogs, and web pages, contains noisy text. Noise introduced by processing, for example by automatic speech recognition, also produces noisy text.
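The definition above treats noise as a difference between the surface form and the intended text. The article does not prescribe a metric for measuring that difference. One common choice, assumed here, is character-level Levenshtein edit distance: the minimum number of single-character insertions, deletions, and substitutions needed to turn one string into the other. The function name `edit_distance` is a hypothetical helper in this minimal sketch.

```python
def edit_distance(noisy: str, intended: str) -> int:
    """Levenshtein distance between a noisy surface form and the intended text."""
    # prev[j] holds the distance between the first i-1 chars of `noisy`
    # and the first j chars of `intended` (one DP row kept at a time).
    prev = list(range(len(intended) + 1))
    for i, a in enumerate(noisy, start=1):
        curr = [i]
        for j, b in enumerate(intended, start=1):
            curr.append(min(
                prev[j] + 1,             # delete `a` from the noisy text
                curr[j - 1] + 1,         # insert `b` into the noisy text
                prev[j - 1] + (a != b),  # substitute, free when chars match
            ))
        prev = curr
    return prev[-1]


print(edit_distance("c u tmrw", "see you tomorrow"))
```

A distance of zero means the surface form matches the intended text. Larger values indicate more noise. Dividing by the length of the intended text gives a length-normalized rate.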

Dealing with unstructured data

Data mining, text analytics, and noisy text analytics are different methods used to find patterns in, or otherwise interpret, this information. Common techniques for structuring text usually involve manual tagging with metadata or part-of-speech tagging for further text-mining-based structuring. The Unstructured Information Management Architecture (UIMA) provides a common framework for processing this information to extract meaning and create structured data about the information.
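This deliberately simplified sketch shows how part-of-speech tagging can feed further structuring. It assumes a tiny hand-written lexicon, a few suffix heuristics, and Penn Treebank-style labels (`NN`, `VBD`, and so on). These choices, and the helper names `pos_tag` and `to_record`, are hypothetical. Practical taggers are trained statistically on annotated corpora and resolve ambiguities that this toy ignores.

```python
# Hypothetical toy lexicon; real taggers learn such mappings from corpora.
LEXICON = {
    "the": "DT", "a": "DT", "an": "DT",
    "is": "VBZ", "are": "VBP", "was": "VBD",
    "and": "CC", "of": "IN", "in": "IN", "to": "TO",
}

# Crude suffix heuristics, tried in order for words not in the lexicon.
SUFFIX_RULES = [("ing", "VBG"), ("ed", "VBD"), ("ly", "RB"), ("s", "NNS")]


def pos_tag(sentence: str) -> list[tuple[str, str]]:
    """Label each whitespace-separated token with a part-of-speech tag."""
    tagged = []
    for token in sentence.lower().split():
        word = token.strip(".,;:!?")
        if not word:
            continue
        tag = LEXICON.get(word)
        if tag is None:
            # Fall back to the first matching suffix rule, else a singular noun.
            tag = next((t for suffix, t in SUFFIX_RULES if word.endswith(suffix)), "NN")
        tagged.append((word, tag))
    return tagged


def to_record(doc_id: str, text: str) -> dict:
    """Build a structured record (metadata plus extracted nouns) from free text."""
    tagged = pos_tag(text)
    return {
        "id": doc_id,
        "token_count": len(tagged),
        "nouns": [w for w, t in tagged if t.startswith("NN")],
    }
```

For example, `to_record("msg-1", "The analyst quickly tagged emails.")` yields a record with `token_count` 5 and `nouns` equal to `["analyst", "emails"]`. The resulting fields can then be stored or mined like any other structured data.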

Notes

  1. "Structure, Models and Meaning: Is 'unstructured' data merely unmodeled?", Intelligent Enterprise, March 1, 2005.
  2. "The problem with unstructured data", DMReview, February 2003.

© 2009 citizendia.org; parts available under the terms of GNU Free Documentation License, from http://en.wikipedia.org