Citizendia

Weka

Weka 3.5.5 with Explorer window open with Iris UCI dataset
Developed by: University of Waikato
Latest release: 3.4.12 (book), 3.5.7 (developer) / December 18, 2007
OS: Cross-platform
Genre: Machine learning
License: GPL
Website: www.cs.waikato.ac.nz/~ml/weka/

Weka (Waikato Environment for Knowledge Analysis) is a popular suite of machine learning software written in Java, developed at the University of Waikato. Weka is free software available under the GNU General Public License.

Description

The Weka workbench[1] contains a collection of visualization tools and algorithms for data analysis and predictive modelling, together with graphical user interfaces for easy access to this functionality. The original non-Java version of Weka was a Tcl/Tk front-end to (mostly third-party) modelling algorithms implemented in other programming languages, plus data preprocessing utilities in C, and a Makefile-based system for running machine learning experiments. This original version was primarily designed as a tool for analyzing data from agricultural domains,[2][3] but the more recent fully Java-based version (Weka 3), for which development started in 1997, is now used in many different application areas, in particular for educational purposes and research. The main strengths of Weka are that it is

Weka supports several standard data mining tasks, more specifically, data preprocessing, clustering, classification, regression, visualization, and feature selection. All of Weka's techniques are predicated on the assumption that the data is available as a single flat file or relation, where each data point is described by a fixed number of attributes (normally, numeric or nominal attributes, but some other attribute types are also supported). Weka provides access to SQL databases using Java Database Connectivity (JDBC) and can process the result returned by a database query. It is not capable of multi-relational data mining, but there is separate software for converting a collection of linked database tables into a single table that is suitable for processing using Weka.[4] Another important area that is currently not covered by the algorithms included in the Weka distribution is sequence modeling.
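The single-relation assumption described above can be illustrated with a small sketch. It is not Weka code and does not reproduce the conversion software cited in [4]: Python's `sqlite3` module stands in for a JDBC connection, and the `plant` and `measurement` tables, their columns, and the function name `build_flat_relation` are hypothetical. Two linked tables are collapsed by a single query into one row per plant with a fixed set of attributes, one nominal and two numeric.

```python
import sqlite3


def build_flat_relation():
    """Collapse two linked tables into one flat relation via a single query."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical schema: each plant has any number of measurements.
    conn.executescript("""
        CREATE TABLE plant (id INTEGER PRIMARY KEY, species TEXT);
        CREATE TABLE measurement (plant_id INTEGER, petal_length REAL);
        INSERT INTO plant VALUES (1, 'setosa'), (2, 'virginica');
        INSERT INTO measurement VALUES (1, 1.4), (1, 1.6), (2, 5.0);
    """)
    # Aggregating the linked rows gives every plant the same fixed attributes:
    # species (nominal), measurement count (numeric), mean petal length (numeric).
    rows = conn.execute("""
        SELECT p.species, COUNT(m.petal_length), AVG(m.petal_length)
        FROM plant AS p
        JOIN measurement AS m ON m.plant_id = p.id
        GROUP BY p.id
        ORDER BY p.id
    """).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in build_flat_relation():
        print(row)
```

Each returned tuple is one data point in the flat relation. Aggregation with `COUNT` and `AVG` is only one simple way to summarise linked rows; other conversions are possible.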

Weka's main user interface is the Explorer, but essentially the same functionality can be accessed through the component-based Knowledge Flow interface and from the command line. There is also the Experimenter, which allows the systematic comparison of the predictive performance of Weka's machine learning algorithms on a collection of datasets.

The Explorer interface has several panels that give access to the main components of the workbench. The Preprocess panel has facilities for importing data from a database, a CSV file, etc., and for preprocessing this data using a so-called filtering algorithm. These filters can be used to transform the data (e.g., turning numeric attributes into discrete ones) and make it possible to delete instances and attributes according to specific criteria. The Classify panel enables the user to apply classification and regression algorithms (indiscriminately called classifiers in Weka) to the resulting dataset, to estimate the accuracy of the resulting predictive model, and to visualize erroneous predictions, ROC curves, etc., or the model itself (if the model is amenable to visualization like, e.g., a decision tree). The Associate panel provides access to association rule learners that attempt to identify all important interrelationships between attributes in the data. The Cluster panel gives access to the clustering techniques in Weka, e.g., the simple k-means algorithm. There is also an implementation of the expectation maximization algorithm for learning a mixture of normal distributions. The next panel, Select attributes, provides algorithms for identifying the most predictive attributes in a dataset. The last panel, Visualize, shows a scatter plot matrix, where individual scatter plots can be selected and enlarged, and analyzed further using various selection operators.
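As an illustration of the kind of clustering technique the Cluster panel exposes, the following is a minimal sketch of the basic k-means procedure in plain Python. It is not Weka's implementation. The function names, the choice of the first k points as initial centroids, and the toy data are assumptions made for the example.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(points, k, max_iter=100):
    """Partition points into k clusters; returns (centroids, assignment)."""
    # Assumption: deterministic start from the first k points.
    centroids = [tuple(p) for p in points[:k]]
    assignment = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_assignment = [
            min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # no point changed cluster, so the procedure has converged
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(
                    sum(col) / len(members) for col in zip(*members)
                )
    return centroids, assignment


if __name__ == "__main__":
    data = [(1.0, 1.0), (5.0, 5.0), (1.0, 2.0), (5.0, 6.0)]
    print(k_means(data, 2))
```

The result depends on the initial centroids. Practical implementations usually randomise the starting points, which this sketch avoids so that its output is repeatable.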

References

  1. ^ Ian H. Witten; Eibe Frank (2005). Data Mining: Practical machine learning tools and techniques, 2nd Edition. Morgan Kaufmann, San Francisco. Retrieved on 2007-06-25.
  2. ^ G. Holmes; A. Donkin and I. H. Witten (1994). Weka: A machine learning workbench. Proc Second Australia and New Zealand Conference on Intelligent Information Systems, Brisbane, Australia. Retrieved on 2007-06-25.
  3. ^ S. R. Garner; S. J. Cunningham, G. Holmes, C. G. Nevill-Manning, and I. H. Witten (1995). Applying a machine learning workbench: Experience with agricultural databases. Proc Machine Learning in Practice Workshop, Machine Learning Conference, Tahoe City, CA, USA 14-21. Retrieved on 2007-06-25.
  4. ^ P. Reutemann; B. Pfahringer and E. Frank (2004). Proper: A Toolbox for Learning from Relational Data with Propositional and Multi-Instance Learners. 17th Australian Joint Conference on Artificial Intelligence (AI2004). Springer-Verlag. Retrieved on 2007-06-25.
  5. ^ Ian H. Witten; Eibe Frank, Len Trigg, Mark Hall, Geoffrey Holmes, and Sally Jo Cunningham (1999). Weka: Practical Machine Learning Tools and Techniques with Java Implementations. Proceedings of the ICONIP/ANZIIS/ANNES'99 Workshop on Emerging Knowledge Engineering and Connectionist-Based Information Systems 192-196. Retrieved on 2007-06-26.
  6. ^ Gregory Piatetsky-Shapiro (2005-06-28). KDnuggets news on SIGKDD Service Award 2005. Retrieved on 2007-06-25.
  7. ^ Overview of SIGKDD Service Award winners (2005). Retrieved on 2007-06-25.

© 2009 citizendia.org; parts available under the terms of GNU Free Documentation License, from http://en.wikipedia.org