Citizendia

Biclustering, co-clustering, or two-mode clustering[1] is a data mining technique which allows simultaneous clustering of the rows and columns of a matrix. The term was first introduced by Mirkin[2] (and more recently by Cheng and Church[3] in gene expression analysis), although the technique itself was introduced much earlier[2] (i.e., by J. A. Hartigan[4]).

Given a set of m rows in n columns (i.e., an m×n matrix), a biclustering algorithm generates biclusters: subsets of rows which exhibit similar behavior across a subset of columns, or vice versa.
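The definition above can be made concrete with a small sketch. It assumes a bicluster is represented simply as a pair of index lists (rows, columns); the helper name `submatrix` and the toy matrix values are invented for illustration and do not come from any particular algorithm.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def submatrix(data: Matrix, rows: Sequence[int], cols: Sequence[int]) -> Matrix:
    """Return the entries of `data` restricted to the given rows and columns."""
    return [[data[i][j] for j in cols] for i in rows]


# Toy 4x5 matrix (values invented for illustration).
data = [
    [1.0, 7.0, 1.0, 3.0, 1.0],
    [9.0, 2.0, 4.0, 8.0, 6.0],
    [1.0, 5.0, 1.0, 0.0, 1.0],
    [3.0, 6.0, 2.0, 9.0, 4.0],
]

# A bicluster is a pair (row subset, column subset); neither subset
# needs to be contiguous in the original matrix.
bicluster_rows = [0, 2]
bicluster_cols = [0, 2, 4]

block = submatrix(data, bicluster_rows, bicluster_cols)
print(block)  # [[1.0, 1.0, 1.0], [1.0, 1.0, 1.0]]
```

Rows 0 and 2 look unrelated across all five columns, but they agree exactly on columns 0, 2 and 4, which is the kind of local pattern that clustering rows alone would miss.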

Complexity

The complexity of the biclustering problem depends on the exact problem formulation, and particularly on the merit function used to evaluate the quality of a given bicluster. However, most interesting variants of this problem are NP-complete, requiring either large computational effort or the use of lossy heuristics to short-circuit the calculation.
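As an illustration of what a merit function can look like, the sketch below computes the mean squared residue, the score associated with Cheng and Church[3]. The article does not prescribe any particular merit function, so this choice is an assumption made for the example, and the function name is hypothetical. A score of 0 means every entry is fully explained by its row mean, column mean and the overall mean.

```python
from typing import List

Matrix = List[List[float]]


def mean_squared_residue(block: Matrix) -> float:
    """Mean squared residue of a bicluster given as a dense submatrix.

    residue(i, j) = a_ij - rowmean_i - colmean_j + overallmean
    The score is the average of residue(i, j)**2 over all entries.
    """
    n_rows = len(block)
    n_cols = len(block[0])
    row_means = [sum(row) / n_cols for row in block]
    col_means = [sum(block[i][j] for i in range(n_rows)) / n_rows
                 for j in range(n_cols)]
    overall = sum(row_means) / n_rows
    total = 0.0
    for i in range(n_rows):
        for j in range(n_cols):
            residue = block[i][j] - row_means[i] - col_means[j] + overall
            total += residue ** 2
    return total / (n_rows * n_cols)


# Rows that differ only by a constant offset score 0.
coherent = [[1.0, 2.0, 3.0],
            [4.0, 5.0, 6.0]]
# A checkerboard pattern is not explained by row and column effects.
incoherent = [[1.0, 0.0],
              [0.0, 1.0]]

print(mean_squared_residue(coherent))    # 0.0
print(mean_squared_residue(incoherent))  # 0.25
```

Evaluating one candidate is cheap; the difficulty is the search. An m×n matrix has (2^m − 1)(2^n − 1) pairs of non-empty row and column subsets, so scoring every candidate is infeasible for all but tiny matrices, which is why practical methods rely on heuristics.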

Types of bicluster

Different biclustering algorithms use different definitions of a bicluster. The main types are:

  1. Bicluster with constant values (a),
  2. Bicluster with constant values on rows or columns (b, c),
  3. Bicluster with coherent values (d, e).

[Figure: bicluster.JPG, example biclusters for panels (a) to (e)]
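The three types listed above can be told apart with simple checks on a candidate submatrix. The sketch below is only illustrative: the function names and the tolerance `tol` are hypothetical, and "coherent values" is read here as the additive model, in which every row equals the first row plus a row-specific offset. That reading is an assumption; a multiplicative model is another common interpretation.

```python
from typing import List

Matrix = List[List[float]]


def is_constant(block: Matrix, tol: float = 1e-9) -> bool:
    """Type 1: every entry has the same value."""
    first = block[0][0]
    return all(abs(v - first) <= tol for row in block for v in row)


def has_constant_rows(block: Matrix, tol: float = 1e-9) -> bool:
    """Type 2 (rows): each row is constant, rows may differ from each other."""
    return all(abs(v - row[0]) <= tol for row in block for v in row)


def has_constant_columns(block: Matrix, tol: float = 1e-9) -> bool:
    """Type 2 (columns): the same check applied to the transposed block."""
    transposed = [list(col) for col in zip(*block)]
    return has_constant_rows(transposed, tol)


def is_additive_coherent(block: Matrix, tol: float = 1e-9) -> bool:
    """Type 3 (additive reading): each row is the first row plus one offset."""
    base = block[0]
    for row in block:
        offset = row[0] - base[0]
        if any(abs((v - b) - offset) > tol for v, b in zip(row, base)):
            return False
    return True


constant = [[2.0, 2.0], [2.0, 2.0]]
constant_rows = [[1.0, 1.0, 1.0], [3.0, 3.0, 3.0]]
coherent = [[1.0, 2.0, 4.0], [3.0, 4.0, 6.0]]
unstructured = [[1.0, 2.0], [3.0, 9.0]]

print(is_constant(constant))               # True
print(has_constant_rows(constant_rows))    # True
print(has_constant_columns(constant_rows)) # False
print(is_additive_coherent(coherent))      # True
print(is_additive_coherent(unstructured))  # False
```

Under the additive reading the types are nested: a constant bicluster also has constant rows and columns, and a bicluster with constant rows or columns is also coherent.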

Algorithms

Many biclustering algorithms have been developed for bioinformatics, including: Block clustering, CTWC, ITWC, δ-bicluster, δ-pCluster, δ-pattern, FLOC, OPC, Plaid Model, OPSMs, Gibbs, SAMBA, Robust Biclustering Algorithm (RoBA), Crossing Minimization, cMonkey[5], PRMs and DCC. Biclustering algorithms have also been proposed and used in other application fields under the names co-clustering, bidimensional clustering, and subspace clustering[6].

Some recent algorithms have attempted to add support for biclustering rectangular matrices together with additional data in the form of other data types. One such algorithm, cMonkey, has recently been developed and applied to several systems-biology datasets.

There is an ongoing debate about how to judge the results of these methods, as biclustering allows overlap between clusters and some algorithms allow the exclusion of hard-to-reconcile columns/conditions. Not all of the available algorithms are deterministic, so the analyst must pay attention to the degree to which results represent stable minima. Because this is an unsupervised classification problem, the lack of a gold standard makes it difficult to spot errors in the results. One approach is to use multiple biclustering algorithms, with majority or super-majority voting amongst them deciding the best result. Another is to analyse the quality of shifting and scaling patterns in biclusters[7].
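The voting idea can be sketched as follows. The article does not define a voting scheme, so this is one possible reading rather than an established procedure: each algorithm (or each run of a non-deterministic algorithm) reports a bicluster for what is assumed to be the same underlying pattern, and a row or column is kept if at least `min_votes` of the results include it. The function name and the example results are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Set, Tuple

Bicluster = Tuple[Set[int], Set[int]]  # (row indices, column indices)


def vote(results: Iterable[Bicluster], min_votes: int) -> Tuple[List[int], List[int]]:
    """Keep rows and columns that appear in at least `min_votes` results.

    Rows and columns are voted on separately, so the consensus is still
    a rectangular bicluster (a row subset paired with a column subset).
    """
    results = list(results)
    row_votes = Counter(r for rows, _ in results for r in rows)
    col_votes = Counter(c for _, cols in results for c in cols)
    rows = sorted(r for r, k in row_votes.items() if k >= min_votes)
    cols = sorted(c for c, k in col_votes.items() if k >= min_votes)
    return rows, cols


# Hypothetical output of three different algorithms on the same matrix.
results = [
    ({0, 1, 2}, {0, 1}),
    ({0, 1}, {0, 1, 3}),
    ({0, 1, 4}, {1, 3}),
]

# Simple majority: at least 2 of 3 results must agree.
print(vote(results, min_votes=2))  # ([0, 1], [0, 1, 3])

# Unanimity, the strictest super-majority for three voters.
print(vote(results, min_votes=3))  # ([0, 1], [1])
```

Raising `min_votes` trades coverage for confidence: rows 0 and 1 survive both thresholds, while columns 0 and 3 are dropped once unanimity is required. A real comparison would first have to match biclusters across algorithms, which this sketch takes as given.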

See also

Formal concept analysis, biclique (complete bipartite graph), Galois connection.

References

  1. ^ Van Mechelen I, Bock HH, De Boeck P (2004). "Two-mode clustering methods: a structured overview". Statistical Methods in Medical Research 13 (5): 363–94. doi:10.1191/0962280204sm373ra.
  2. ^ a b Mirkin, Boris (1996). Mathematical Classification and Clustering. Kluwer Academic Publishers. ISBN 0792341597.  
  3. ^ Cheng Y, Church GM (2000). "Biclustering of expression data". Proceedings of the 8th International Conference on Intelligent Systems for Molecular Biology: 93–103.  
  4. ^ Hartigan JA (1972). "Direct clustering of a data matrix". Journal of the American Statistical Association 67 (337): 123–9. doi:10.2307/2284710.
  5. ^ Reiss DJ, Baliga NS, Bonneau R (2006). "Integrated biclustering of heterogeneous genome-wide datasets for the inference of global regulatory networks". BMC Bioinformatics 2 (7): 280–302. doi:10.1186/1471-2105-7-280.
  6. ^ Madeira SC, Oliveira AL (2004). "Biclustering Algorithms for Biological Data Analysis: A Survey". IEEE Transactions on Computational Biology and Bioinformatics 1 (1): 24–45. doi:10.1109/TCBB.2004.2.
  7. ^ Aguilar-Ruiz JS (2005). "Shifting and scaling patterns from gene expression data". Bioinformatics 21 (10): 3840–3845. doi:10.1093/bioinformatics/bti641.

© 2009 citizendia.org; parts available under the terms of GNU Free Documentation License, from http://en.wikipedia.org