Quantitative structure-activity relationship (QSAR) is the process by which chemical structure is quantitatively correlated with a well-defined process, such as biological activity or chemical reactivity.

For example, biological activity can be expressed quantitatively as the concentration of a substance required to give a certain biological response. Additionally, when physicochemical properties or structures are expressed by numbers, one can form a mathematical relationship, or quantitative structure-activity relationship, between the two. The mathematical expression can then be used to predict the biological response of other chemical structures.
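As a small illustration of expressing activity as a number, the sketch below assumes the response is reported as an IC50 (the concentration giving 50% inhibition), which is one common convention and is not prescribed by the text above. It converts the concentration to the logarithmic pIC50 scale that is often used as the modelled quantity. The function name and the example concentrations are made up for illustration.

```python
import math

def pic50(ic50_molar):
    """Return pIC50 = -log10(IC50), with the IC50 given in mol/L."""
    if ic50_molar <= 0:
        raise ValueError("IC50 must be a positive concentration")
    return -math.log10(ic50_molar)

# Hypothetical compounds: a lower IC50 means higher potency, hence a higher pIC50.
micromolar_hit = pic50(1e-6)   # IC50 of 1 micromolar
nanomolar_lead = pic50(1e-9)   # IC50 of 1 nanomolar
```

Working on a logarithmic scale keeps compounds whose potencies differ by several orders of magnitude on a comparable numeric range.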

QSAR's most general mathematical form is:

Activity = f(physicochemical properties and/or structural properties)

SAR and SAR paradox

The basic assumption for all molecule based hypotheses is that similar molecules have similar activities. This principle is also called Structure-Activity Relationship (SAR). The underlying problem is therefore how to define a small difference on a molecular level, since each kind of activity, e.g. reaction ability, biotransformation ability, solubility, target activity, and so on, might depend on another difference. A good example was given in the bioisosterism review of Patani/LaVoie. [1]

In general, one is more interested in finding strong trends. Hypotheses are usually built from a finite amount of chemical data. Thus, the induction principle should be respected to avoid overfitted hypotheses and overfitted, useless interpretations of structural/molecular data.

The SAR paradox refers to the fact that not all similar molecules have similar activities.

Applications

Chemical

One of the first historical QSAR applications was to predict boiling points. [2]

It is well known, for instance, that within a particular family of chemical compounds, especially in organic chemistry, there are strong correlations between structure and observed properties. A simple example is the relationship between the number of carbons in alkanes and their boiling points. There is a clear trend of increasing boiling point with an increasing number of carbons, and this serves as a means for predicting the boiling points of higher alkanes.
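The alkane trend can be sketched numerically. The example below assumes a straight-line relationship, which is a simplification chosen here and not a method stated in the article, and fits it to rounded literature boiling points of the straight-chain alkanes butane to octane. The function names are illustrative.

```python
# Rounded normal boiling points (deg C) of straight-chain alkanes, keyed by carbon count.
boiling_point = {4: -0.5, 5: 36.1, 6: 68.7, 7: 98.4, 8: 125.7}

def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

slope, intercept = fit_line(list(boiling_point), list(boiling_point.values()))

def predict_boiling_point(n_carbons):
    """Extrapolate the fitted line to a chain length outside the training data."""
    return slope * n_carbons + intercept

# Nonane (9 carbons) actually boils at roughly 151 deg C.
nonane_estimate = predict_boiling_point(9)
```

The straight line overshoots the measured value for nonane, because the increase per added carbon shrinks as the chain grows. A curved function of the carbon count would follow the data more closely.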

Other applications that remain of interest are the Hammett equation, the Taft equation and pKa prediction methods.
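A minimal sketch of a Hammett-style pKa estimate follows. It uses the relation log10(K_X / K_H) = rho * sigma, which rearranges to pKa(X) = pKa(H) - rho * sigma. The substituent constants and the benzoic acid pKa are approximate, commonly tabulated values included here as assumptions; they should be checked against a primary source before any real use.

```python
# Approximate para-substituent constants (dimensionless); treat as illustrative.
SIGMA_PARA = {"H": 0.00, "CH3": -0.17, "OCH3": -0.27, "Cl": 0.23, "NO2": 0.78}

PKA_BENZOIC_ACID = 4.20  # approximate value in water at 25 deg C
RHO = 1.0                # reaction constant, defined as 1 for this reference reaction

def predict_pka(substituent, pka_parent=PKA_BENZOIC_ACID, rho=RHO):
    """Hammett estimate for a para-substituted acid: pKa(X) = pKa(H) - rho * sigma(X)."""
    return pka_parent - rho * SIGMA_PARA[substituent]

# Electron-withdrawing groups (positive sigma) lower the pKa, donating groups raise it.
pka_nitro = predict_pka("NO2")
pka_methoxy = predict_pka("OCH3")
```

For a different reaction series, only `rho` and the parent pKa would change, while the same table of substituent constants is reused.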

Biological

The biological activity of molecules is usually measured in assays to establish the level of inhibition of particular signal transduction or metabolic pathways. Chemicals can also be biologically active by being toxic. Drug discovery often involves the use of QSAR to identify chemical structures that could have good inhibitory effects on specific targets and have low toxicity (non-specific activity). Of special interest is the prediction of the partition coefficient log P, which is an important measure used in identifying "druglikeness" according to Lipinski's Rule of Five.
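The Rule of Five check can be sketched as a simple filter. The example assumes the four descriptors have already been computed by some other tool, and it uses the commonly quoted thresholds (molecular weight at most 500, log P at most 5, at most 5 hydrogen-bond donors, at most 10 hydrogen-bond acceptors) with one violation tolerated. The class name, function names and descriptor values are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Descriptors:
    mol_weight: float       # g/mol
    log_p: float            # octanol-water partition coefficient (log10)
    h_bond_donors: int
    h_bond_acceptors: int

def lipinski_violations(d):
    """Count how many of the four Rule of Five criteria are violated."""
    checks = [
        d.mol_weight > 500,
        d.log_p > 5,
        d.h_bond_donors > 5,
        d.h_bond_acceptors > 10,
    ]
    return sum(checks)

def is_druglike(d, max_violations=1):
    """Rule of thumb: accept compounds with at most one violation."""
    return lipinski_violations(d) <= max_violations

# Hypothetical descriptor sets for a small and a large molecule.
small_molecule = Descriptors(mol_weight=180.2, log_p=1.2, h_bond_donors=1, h_bond_acceptors=4)
large_molecule = Descriptors(mol_weight=900.0, log_p=7.5, h_bond_donors=6, h_bond_acceptors=14)
```

Because the rule is only a heuristic, the tolerated number of violations is left as a parameter.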

While many quantitative structure-activity relationship analyses involve the interactions of a family of molecules with an enzyme or receptor binding site, QSAR can also be used to study the interactions between the structural domains of proteins. Protein-protein interactions can be quantitatively analyzed for structural variations resulting from site-directed mutagenesis. [3]

It is part of the machine learning method to reduce the risk for a SAR paradox, especially taking into account that only a finite amount of data is available (see also MVUE). In general, all QSAR problems can be divided into a coding[4] and a learning[5] part.

Data mining

For the coding, a relatively large number of features or molecular descriptors is usually calculated, which can lack structural interpretability. In combination with the learning method applied later, or as a preprocessing step, a feature selection problem arises.
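One simple way to approach the feature selection problem is a correlation filter, sketched below. This is only one of many possible strategies and is an assumption of this example, not the article's prescribed method. Descriptors are ranked by the absolute Pearson correlation with the measured activity and the top ones are kept. All descriptor names and values are invented.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 for a constant column."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)

def select_features(descriptors, activity, keep=2):
    """Return the names of the `keep` descriptors most correlated with activity."""
    ranked = sorted(
        descriptors,
        key=lambda name: abs(pearson(descriptors[name], activity)),
        reverse=True,
    )
    return ranked[:keep]

# Hypothetical descriptor table: one column per descriptor, one entry per compound.
descriptors = {
    "log_p": [1.0, 2.0, 3.0, 4.0, 5.0],
    "mol_weight": [150.0, 310.0, 220.0, 180.0, 400.0],
    "constant_flag": [1.0, 1.0, 1.0, 1.0, 1.0],
}
activity = [2.0, 4.1, 5.9, 8.2, 9.8]

selected = select_features(descriptors, activity, keep=1)
```

A filter like this looks at each descriptor in isolation, so it can miss descriptors that are only informative in combination.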

A typical data mining based prediction uses e.g. support vector machines, decision trees or neural networks for inducing a predictive learning model.
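To make the idea of inducing a model from data concrete, the sketch below trains a decision stump, that is, a decision tree with a single split on one descriptor. It is a deliberately tiny stand-in for the methods named above and not a production learner. The descriptor values and activity labels are invented.

```python
def train_stump(values, labels):
    """Find the threshold on one descriptor that best separates the two classes.

    Returns (threshold, label_above): compounds with value > threshold are
    predicted as label_above, all others as the opposite label.
    """
    ordered = sorted(set(values))
    # Candidate thresholds lie halfway between consecutive distinct values.
    candidates = [(a + b) / 2 for a, b in zip(ordered, ordered[1:])]
    if not candidates:
        raise ValueError("need at least two distinct descriptor values")
    best = None
    for threshold in candidates:
        for label_above in (True, False):
            errors = sum(
                ((v > threshold) == label_above) != y
                for v, y in zip(values, labels)
            )
            if best is None or errors < best[0]:
                best = (errors, threshold, label_above)
    return best[1], best[2]

def predict(value, threshold, label_above):
    """Apply the learned split to a new descriptor value."""
    return (value > threshold) == label_above

# Hypothetical training set: one descriptor and an active/inactive label per compound.
log_p = [0.5, 1.2, 2.8, 3.5, 4.1, 1.9]
active = [False, False, True, True, True, False]

threshold, label_above = train_stump(log_p, active)
```

A full decision tree repeats this split search recursively on each resulting subset and over all descriptors.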

3D-QSAR

3D-QSAR refers to the application of force field calculations requiring three-dimensional structures, e.g. based on protein crystallography or molecule superposition. It uses computed potentials, e.g. the Lennard-Jones potential, rather than experimental constants and is concerned with the overall molecule rather than a single substituent. It examines the steric fields (shape of the molecule) and the electrostatic fields based on the applied energy function. [6]
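The Lennard-Jones potential mentioned above has the standard 12-6 form V(r) = 4 * epsilon * ((sigma / r)^12 - (sigma / r)^6). The sketch below evaluates it in reduced units (epsilon = sigma = 1), which is a choice made for this example; real force fields supply these parameters per atom type.

```python
def lennard_jones(r, epsilon=1.0, sigma=1.0):
    """12-6 Lennard-Jones pair potential at separation r."""
    if r <= 0:
        raise ValueError("separation must be positive")
    sr6 = (sigma / r) ** 6
    # The r^-12 term models short-range repulsion, the r^-6 term the attraction.
    return 4.0 * epsilon * (sr6 * sr6 - sr6)

# The potential crosses zero at r = sigma and reaches its minimum of
# -epsilon at r = 2**(1/6) * sigma.
r_min = 2.0 ** (1.0 / 6.0)
energy_at_sigma = lennard_jones(1.0)
energy_at_minimum = lennard_jones(r_min)
```

In a 3D-QSAR setting, a probe atom would be placed on grid points around the aligned molecules and such pair energies summed to give the steric field value at each point.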

The created data space is then usually reduced by a subsequent feature extraction (see also dimensionality reduction). The learning method that follows can be any of the already mentioned machine learning methods, e.g. support vector machines. [7]

In the literature it can often be found that chemists have a preference for partial least squares (PLS) methods, since they apply the feature extraction and induction in one step.
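How PLS combines feature extraction and regression can be sketched with a single latent component for one response (often called PLS1). The weight vector is the direction in descriptor space with maximal covariance with the activity, the scores are the extracted feature, and the activity is regressed on those scores. This is a minimal sketch with invented data; practical PLS extracts several components and chooses their number by cross-validation.

```python
import math

def pls1_one_component(X, y):
    """Fit a one-component PLS1 model and return a prediction function."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]
    yc = [v - y_mean for v in y]

    # Feature extraction: weights proportional to the covariance of each descriptor with y.
    w = [sum(Xc[i][j] * yc[i] for i in range(n)) for j in range(p)]
    norm = math.sqrt(sum(v * v for v in w))
    if norm == 0:
        raise ValueError("descriptors carry no covariance with the response")
    w = [v / norm for v in w]

    # Scores: projection of every compound onto the weight vector.
    t = [sum(Xc[i][j] * w[j] for j in range(p)) for i in range(n)]

    # Induction: least squares regression of the response on the scores.
    q = sum(ti * yi for ti, yi in zip(t, yc)) / sum(ti * ti for ti in t)

    def predict(x):
        score = sum((x[j] - x_mean[j]) * w[j] for j in range(p))
        return y_mean + q * score

    return predict

# Hypothetical data with two perfectly collinear descriptors (second = 2 * first).
X = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]]
y = [3.0, 5.0, 7.0, 9.0]

model = pls1_one_component(X, y)
prediction = model([5.0, 10.0])
```

The collinear descriptors in this toy data would make ordinary multiple regression ill-posed, while the single latent component handles them without difficulty.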

Molecule mining

Molecule mining approaches, a special case of structured data mining approaches, apply a similarity matrix based prediction or an automatic fragmentation scheme into molecular substructures. Furthermore, there also exist approaches using maximum common subgraph searches or graph kernels. [8] [9]
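A similarity matrix based prediction can be sketched as follows. The example assumes each molecule has already been broken into a set of substructure labels, uses the Tanimoto coefficient as the similarity measure, and predicts the activity of a query from its most similar training molecule. The measure, the nearest-neighbour rule, the fragment labels and the activity values are all choices made for this illustration.

```python
def tanimoto(a, b):
    """Tanimoto similarity of two fragment sets: |A & B| / |A | B|."""
    union = a | b
    if not union:
        return 1.0
    return len(a & b) / len(union)

def similarity_matrix(fingerprints):
    """Pairwise similarities between all molecules in a list."""
    return [[tanimoto(a, b) for b in fingerprints] for a in fingerprints]

def predict_activity(query, training_set):
    """Return the activity of the training molecule most similar to the query."""
    _, activity = max(training_set, key=lambda pair: tanimoto(query, pair[0]))
    return activity

# Hypothetical training molecules as (fragment set, measured activity) pairs.
training_set = [
    ({"aromatic_ring", "carboxylic_acid"}, 6.2),
    ({"aromatic_ring", "carboxylic_acid", "chloro"}, 6.8),
    ({"alkyl_chain", "amine"}, 4.1),
]
query = {"aromatic_ring", "carboxylic_acid", "nitro"}

matrix = similarity_matrix([fragments for fragments, _ in training_set])
predicted = predict_activity(query, training_set)
```

Graph kernels and maximum common subgraph searches replace the set-based similarity used here with measures computed directly on the molecular graphs.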

Fragment based (group contribution)

It has been shown that the logP of a compound can be determined by the sum of its fragments. Fragmentary logP values have been determined statistically. This method gives mixed results and is generally not trusted to have an accuracy better than +/- 0.1 units. [10]
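The additive scheme can be sketched as a lookup and a weighted sum. The fragment contributions below are invented placeholder numbers, not published parameters, and the fragment names are equally hypothetical; a real method would use a statistically fitted table such as the one in [10].

```python
# Placeholder fragment contributions to logP (invented for illustration only).
FRAGMENT_LOGP = {"CH3": 0.5, "CH2": 0.5, "OH": -1.0, "C6H5": 1.9}

def estimate_logp(fragment_counts):
    """Sum the contribution of each fragment times the number of occurrences."""
    return sum(FRAGMENT_LOGP[name] * count for name, count in fragment_counts.items())

# A molecule described as counts of its fragments, e.g. CH3-CH2-CH2-OH.
propanol_like = {"CH3": 1, "CH2": 2, "OH": 1}
estimate = estimate_logp(propanol_like)
```

A fragment missing from the table raises a `KeyError`, which makes gaps in the parameter set visible instead of silently treating them as zero.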

Applicability Domain

As the use of (Q)SAR models for chemical risk management increases steadily and they are also used for regulatory purposes (in the EU: Registration, Evaluation, Authorisation and Restriction of Chemicals, REACH), it is of crucial importance to be able to assess the reliability of predictions. The chemical descriptor space spanned by a particular training set of chemicals is called the Applicability Domain. It offers the opportunity to assess whether a compound can be reliably predicted.
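One of the simplest ways to approximate an applicability domain is a bounding box: a query compound counts as inside the domain if each of its descriptors lies within the range seen in the training set. The sketch below assumes this range-based definition, which is only one of several in use, and the descriptor values are invented.

```python
def descriptor_ranges(training_set):
    """Return (min, max) for every descriptor column of the training set."""
    return [(min(column), max(column)) for column in zip(*training_set)]

def in_domain(compound, ranges):
    """True if every descriptor of the compound lies within the training range."""
    return all(low <= value <= high for value, (low, high) in zip(compound, ranges))

# Hypothetical training compounds described by (log P, molecular weight).
training_set = [
    (1.2, 180.0),
    (2.5, 250.0),
    (3.1, 310.0),
    (0.8, 150.0),
]
ranges = descriptor_ranges(training_set)

inside = in_domain((2.0, 200.0), ranges)
outside = in_domain((6.5, 200.0), ranges)
```

A bounding box ignores correlations between descriptors, so distance-based or density-based definitions are often preferred when the training data occupy only part of the box.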

References

  1. ^ G. A. Patani, E. J. LaVoie, Bioisosterism: A Rational Approach in Drug Design. Chem. Rev., 1996, 96, 3147-3176. doi:10.1021/cr950066q
  2. ^ Danail Bonchev, D. H. Rouvray: Chemical Graph Theory: Introduction and Fundamentals. Gordon and Breach Science Publishers, 1990, ISBN 0-85626-454-7.
  3. ^ E. K. Freyhult, K. Andersson, M. G. Gustafsson, Structural modeling extends QSAR analysis of antibody-lysozyme interactions to 3D-QSAR, Biophys. J., 2003, 84, 2264-2272. PMID 12668435
  4. ^ Roberto Todeschini, Viviana Consonni, Handbook of Molecular Descriptors, Wiley-VCH, 2000. ISBN 3527299130
  5. ^ R. O. Duda, P. E. Hart, D. G. Stork, Pattern Classification, John Wiley & Sons, 2001. ISBN 0-471-05669-3
  6. ^ A. Leach, Molecular Modelling: Principles and Applications, Prentice Hall, 2001. ISBN 0-582-38210-6
  7. ^ Schölkopf, B., K. Tsuda and J. P. Vert: Kernel Methods in Computational Biology, MIT Press, Cambridge, MA, 2004.
  8. ^ Gusfield, D., Algorithms on Strings, Trees, and Sequences: Computer Science and Computational Biology, Cambridge University Press, 1997. ISBN 0-521-58519-8
  9. ^ C. Helma (ed.), Predictive Toxicology, CRC, 2005. ISBN 0-8247-2397-X
  10. ^ S. A. Wildman, G. M. Crippen, Prediction of Physicochemical Parameters by Atomic Contributions, J. Chem. Inf. Comput. Sci., 1999, 39, 868-873. doi:10.1021/ci990307l

© 2009 citizendia.org; parts available under the terms of GNU Free Documentation License, from http://en.wikipedia.org