| Weka | |
|---|---|
| Weka 3.5.5 with Explorer window open with Iris UCI dataset | |
| Developed by | University of Waikato |
| Latest release | 3.4.12 (book), 3.5.7 (developer) / December 18, 2007 |
| OS | Cross-platform |
| Genre | Machine Learning |
| License | GPL |
| Website | www.cs.waikato.ac.nz/~ml/weka/ |
Weka (Waikato Environment for Knowledge Analysis) is a popular suite of machine learning software written in Java, developed at the University of Waikato. Weka is free software available under the GNU General Public License.
The Weka workbench[1] contains a collection of visualization tools and algorithms for data analysis and predictive modelling, together with graphical user interfaces for easy access to this functionality. The original non-Java version of Weka was a Tcl/Tk front-end to (mostly third-party) modelling algorithms implemented in other programming languages, plus data preprocessing utilities in C, and a Makefile-based system for running machine learning experiments. This original version was primarily designed as a tool for analyzing data from agricultural domains,[2][3] but the more recent fully Java-based version (Weka 3), for which development started in 1997, is now used in many different application areas, in particular for educational purposes and research. The main strengths of Weka are that it is
Weka supports several standard data mining tasks, more specifically, data preprocessing, clustering, classification, regression, visualization, and feature selection. All of Weka's techniques are predicated on the assumption that the data is available as a single flat file or relation, where each data point is described by a fixed number of attributes (normally, numeric or nominal attributes, but some other attribute types are also supported). Weka provides access to SQL databases using Java Database Connectivity (JDBC) and can process the result returned by a database query. It is not capable of multi-relational data mining, but there is separate software for converting a collection of linked database tables into a single table that is suitable for processing using Weka.[4] Another important area that is currently not covered by the algorithms included in the Weka distribution is sequence modeling.
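The flat-relation assumption can be made concrete with a small sketch. The class `FlatRelation` and its `parse` method are hypothetical names chosen for illustration and are not part of Weka's API; the rows are illustrative values only. The sketch rejects any row whose length differs from the header and labels a column numeric only if every value in it parses as a number, otherwise nominal.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical illustration of a single flat relation; not Weka's API. */
final class FlatRelation {
    final String[] attributeNames;
    final boolean[] numeric;                 // true = numeric, false = nominal
    final List<String[]> rows = new ArrayList<>();

    private FlatRelation(String[] names) {
        this.attributeNames = names;
        this.numeric = new boolean[names.length];
        Arrays.fill(this.numeric, true);     // assume numeric until a value says otherwise
    }

    /** Parses comma-separated text whose first line holds the attribute names. */
    static FlatRelation parse(String csv) {
        String[] lines = csv.trim().split("\\R");
        String[] names = lines[0].split(",");
        for (int j = 0; j < names.length; j++) {
            names[j] = names[j].trim();
        }
        FlatRelation rel = new FlatRelation(names);
        for (int i = 1; i < lines.length; i++) {
            String[] values = lines[i].split(",");
            // Every data point must be described by the same fixed number of attributes.
            if (values.length != names.length) {
                throw new IllegalArgumentException(
                    "Row " + i + " has " + values.length + " values, expected " + names.length);
            }
            for (int j = 0; j < values.length; j++) {
                values[j] = values[j].trim();
                // A column stays numeric only while every value parses as a number.
                if (rel.numeric[j] && !isNumber(values[j])) {
                    rel.numeric[j] = false;
                }
            }
            rel.rows.add(values);
        }
        return rel;
    }

    private static boolean isNumber(String s) {
        try {
            Double.parseDouble(s);
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Illustrative rows only.
        String csv = "sepallength,sepalwidth,class\n"
                   + "5.1,3.5,Iris-setosa\n"
                   + "7.0,3.2,Iris-versicolor\n";
        FlatRelation rel = parse(csv);
        for (int j = 0; j < rel.attributeNames.length; j++) {
            System.out.println(rel.attributeNames[j] + ": "
                + (rel.numeric[j] ? "numeric" : "nominal"));
        }
    }
}
```

A collection of linked tables would have to be joined into one such table before a tool working under this assumption could process it.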
Weka's main user interface is the Explorer, but essentially the same functionality can be accessed through the component-based Knowledge Flow interface and from the command line. There is also the Experimenter, which allows the systematic comparison of the predictive performance of Weka's machine learning algorithms on a collection of datasets.
The Explorer interface has several panels that give access to the main components of the workbench. The Preprocess panel has facilities for importing data from a database, a CSV file, etc., and for preprocessing this data using a so-called filtering algorithm. These filters can be used to transform the data (e.g., turning numeric attributes into discrete ones) and make it possible to delete instances and attributes according to specific criteria. The Classify panel enables the user to apply classification and regression algorithms (indiscriminately called classifiers in Weka) to the resulting dataset, to estimate the accuracy of the resulting predictive model, and to visualize erroneous predictions, ROC curves, etc., or the model itself (if the model is amenable to visualization like, e.g., a decision tree). The Associate panel provides access to association rule learners that attempt to identify all important interrelationships between attributes in the data.
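As an illustration of what a filter that turns numeric attributes into discrete ones might do, the following is a minimal sketch of equal-width binning, one common discretization strategy. The class `EqualWidthDiscretizer` is a hypothetical name and the code is not taken from Weka; it makes no claim about how Weka's own filters choose bin boundaries.

```java
import java.util.Arrays;

/** Hypothetical equal-width discretizer; an illustration, not Weka's filter code. */
final class EqualWidthDiscretizer {

    /** Maps each numeric value to a bin index in the range [0, bins - 1]. */
    static int[] discretize(double[] values, int bins) {
        if (values.length == 0 || bins < 1) {
            throw new IllegalArgumentException("need at least one value and one bin");
        }
        double min = Arrays.stream(values).min().getAsDouble();
        double max = Arrays.stream(values).max().getAsDouble();
        double width = (max - min) / bins;
        int[] result = new int[values.length];
        for (int i = 0; i < values.length; i++) {
            // A constant attribute (width 0) puts every value into bin 0.
            int bin = (width == 0) ? 0 : (int) ((values[i] - min) / width);
            // The maximum value would otherwise land in bin "bins"; clamp it into the last bin.
            result[i] = Math.min(bin, bins - 1);
        }
        return result;
    }

    public static void main(String[] args) {
        double[] values = {1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0};
        // Three bins of width 2 over the range [1, 7].
        System.out.println(Arrays.toString(discretize(values, 3)));
        // prints [0, 0, 1, 1, 2, 2, 2]
    }
}
```

Equal-width binning is simple but sensitive to outliers, since a single extreme value stretches every bin; equal-frequency or supervised schemes are alternatives when that matters.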
The Cluster panel gives access to the clustering techniques in Weka, e.g., the simple k-means algorithm. There is also an implementation of the expectation maximization algorithm for learning a mixture of normal distributions. The next panel, Select attributes, provides algorithms for identifying the most predictive attributes in a dataset. The last panel, Visualize, shows a scatter plot matrix, where individual scatter plots can be selected and enlarged, and analyzed further using various selection operators.
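To show the idea behind k-means clustering mentioned above, here is a minimal sketch of the standard iterative procedure (Lloyd's algorithm): assign each point to its nearest centroid, then move each centroid to the mean of its points, and repeat until assignments stop changing. The class `TinyKMeans` is a hypothetical name and this is not Weka's implementation. As a simplifying assumption, the first k points are used as initial centroids so that the result is deterministic.

```java
import java.util.Arrays;

/** Hypothetical minimal k-means (Lloyd's algorithm); not Weka's implementation. */
final class TinyKMeans {

    /** Returns the cluster index assigned to each point. */
    static int[] cluster(double[][] points, int k, int maxIterations) {
        if (k < 1 || k > points.length || maxIterations < 1) {
            throw new IllegalArgumentException("require 1 <= k <= n and maxIterations >= 1");
        }
        int dims = points[0].length;
        // Assumption: the first k points serve as initial centroids (deterministic).
        double[][] centroids = new double[k][];
        for (int c = 0; c < k; c++) {
            centroids[c] = points[c].clone();
        }
        int[] assignment = new int[points.length];
        Arrays.fill(assignment, -1);
        for (int iter = 0; iter < maxIterations; iter++) {
            boolean changed = false;
            // Assignment step: attach each point to its nearest centroid.
            for (int i = 0; i < points.length; i++) {
                int best = 0;
                double bestDist = Double.MAX_VALUE;
                for (int c = 0; c < k; c++) {
                    double d = squaredDistance(points[i], centroids[c]);
                    if (d < bestDist) {
                        bestDist = d;
                        best = c;
                    }
                }
                if (assignment[i] != best) {
                    assignment[i] = best;
                    changed = true;
                }
            }
            if (!changed) {
                break; // converged: no point switched cluster
            }
            // Update step: move each centroid to the mean of its assigned points.
            double[][] sums = new double[k][dims];
            int[] counts = new int[k];
            for (int i = 0; i < points.length; i++) {
                counts[assignment[i]]++;
                for (int d = 0; d < dims; d++) {
                    sums[assignment[i]][d] += points[i][d];
                }
            }
            for (int c = 0; c < k; c++) {
                if (counts[c] == 0) {
                    continue; // an empty cluster keeps its previous centroid
                }
                for (int d = 0; d < dims; d++) {
                    centroids[c][d] = sums[c][d] / counts[c];
                }
            }
        }
        return assignment;
    }

    private static double squaredDistance(double[] a, double[] b) {
        double sum = 0;
        for (int d = 0; d < a.length; d++) {
            double diff = a[d] - b[d];
            sum += diff * diff;
        }
        return sum;
    }

    public static void main(String[] args) {
        // Two well-separated groups of illustrative 2-D points.
        double[][] points = {{1, 1}, {9, 9}, {1, 2}, {2, 1}, {8, 9}, {9, 8}};
        System.out.println(Arrays.toString(cluster(points, 2, 100)));
        // prints [0, 1, 0, 0, 1, 1]
    }
}
```

Practical implementations usually pick initial centroids at random and may restart several times, because the final partition depends on the starting positions.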