An example power-law graph, used to demonstrate ranking of popularity. To the right is the long tail; to the left are the few that dominate (also known as the 80-20 rule).

A power law is any polynomial relationship that exhibits the property of scale invariance. The most common power laws relate two variables and have the form

$f(x) = ax^k\! +o(x^k)$

where a and k are constants, and $o(x^k)$ is an asymptotically small function of x. Here, k is typically called the scaling exponent, denoting the fact that a power-law function (or, more generally, a kth order homogeneous polynomial) satisfies the criterion $f(c x) \propto f(x)$ where c is a constant. That is, scaling the function's argument changes the constant of proportionality as a function of the scale change, but preserves the shape of the function itself. This relationship becomes clearer if we take the logarithm of both sides (or, graphically, plot on a log-log graph)

$\log\left(f(x)\right) = k \log x + \log a$.

Notice that this expression has the form of a linear relationship with slope k, and scaling the argument induces a linear shift (up or down) of the function, and leaves both the form and slope k unchanged.
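As a minimal numerical sketch of the two statements above, the following Python snippet evaluates a pure power law (the $o(x^k)$ term is dropped, and the values of a, k and c are arbitrary choices for illustration). It measures the slope on log-log axes in two different ranges and the vertical shift produced by rescaling the argument.

```python
import math

def power_law(x, a=2.0, k=-1.5):
    """Pure power law f(x) = a * x**k (illustrative constants, no correction term)."""
    return a * x ** k

def loglog_slope(f, x1, x2):
    """Slope of the straight line through (log x1, log f(x1)) and (log x2, log f(x2))."""
    return (math.log(f(x2)) - math.log(f(x1))) / (math.log(x2) - math.log(x1))

# The log-log slope equals the exponent k, whichever range it is measured on.
slope_low = loglog_slope(power_law, 1.0, 10.0)
slope_high = loglog_slope(power_law, 100.0, 5000.0)

# Rescaling the argument by c shifts log f by the same amount, k * log(c), at every x.
c = 3.0
shifts = [math.log(power_law(c * x)) - math.log(power_law(x)) for x in (0.5, 7.0, 200.0)]
```

Both slopes agree with k, and every entry of `shifts` equals k log c, which is the "linear shift" described above.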

Power-law relations characterize a staggering number of natural patterns, and it is primarily in this context that the term power law is used rather than polynomial function. For instance, inverse-square laws, such as gravitation and the Coulomb force, are power laws, as are many common mathematical formulae, such as the quadratic law for the area of a circle. Also, many probability distributions have tails that asymptotically follow power-law relations, a topic that connects tightly with extreme value theory and the theory of large deviations, which consider the frequency of extremely rare events such as stock market crashes and large natural disasters.

Scientific interest in power-law relations, whether functions or distributions, comes primarily from the ease with which certain general classes of mechanisms can generate them. That is, the observation of a power-law relation in data often points to specific kinds of mechanisms that underlie the natural phenomenon in question, and can often indicate a deep connection with other, seemingly unrelated systems (for instance, see both the reference by Simon and the subsection on universality below). The ubiquity of power-law relations in physics is partly due to dimensional constraints, while in complex systems, power laws are often thought to be signatures of hierarchy and robustness. A few notable examples of power laws are the Gutenberg-Richter law for earthquake sizes, Pareto's law of income distribution, the structural self-similarity of fractals, and scaling laws in biological systems. Research on the origins of power-law relations, and efforts to observe and validate them in the real world, is extremely active in many fields of modern science, including physics, computer science, linguistics, geophysics, sociology, economics and more.

Properties of power laws

Scale invariance

The main property of power laws that makes them interesting is their scale invariance. Given a relation $f(x) = ax^k$, or indeed any homogeneous polynomial, scaling the argument x by a constant factor c causes only a proportionate scaling of the function itself. That is,

$f(c x) = a(c x)^k = c^{k}f(x) \propto f(x)\!$.

That is, scaling by a constant c simply multiplies the original power-law relation by the constant $c^k$. Thus, it follows that all power laws with a particular scaling exponent are equivalent up to constant factors, since each is simply a scaled version of the others. This behavior is what produces the linear relationship when logarithms are taken of both f(x) and x, and the straight line on the log-log plot is often called the signature of a power law. Notably, however, with real data, such straightness is a necessary, but not a sufficient, condition for the data following a power-law relation. In fact, there are many ways to generate finite amounts of data that mimic this signature behavior but that, in their asymptotic limit, are not true power laws. Thus, accurately fitting and validating power-law models is an active area of research in statistics.

Universality

The equivalence of power laws with a particular scaling exponent can have a deeper origin in the dynamical processes that generate the power-law relation. In physics, for example, phase transitions in thermodynamic systems are associated with the emergence of power-law distributions of certain quantities, whose exponents are referred to as the critical exponents of the system. Diverse systems with the same critical exponents (that is, systems that display identical scaling behaviour as they approach criticality) can be shown, via renormalization group theory, to share the same fundamental dynamics. For instance, the behavior of water and CO2 at their boiling points falls in the same universality class because they have identical critical exponents. In fact, almost all material phase transitions are described by a small set of universality classes. Similar observations have been made, though not as comprehensively, for various self-organized critical systems, where the critical point of the system is an attractor. Formally, this sharing of dynamics is referred to as universality, and systems with precisely the same critical exponents are said to belong to the same universality class.

Power-law functions

The general power-law function follows the polynomial form given above, and is a ubiquitous form throughout mathematics and science. Notably, however, not all polynomial functions are power laws, because not all polynomials exhibit the property of scale invariance. Typically, power-law functions are polynomials in a single variable, and are explicitly used to model the scaling behavior of natural processes. For instance, allometric scaling laws for the relation of biological variables are some of the best known power-law functions in nature. In this context, the $o(x^k)$ term is most typically replaced by a deviation term ε, which can represent uncertainty in the observed values (perhaps measurement or sampling errors) or provide a simple way for observations to deviate from the power-law function (perhaps for stochastic reasons):

$y = ax^k + \epsilon\!$.

Estimating the exponent from empirical data

There are many methods for fitting power-law functions to data, and the best option typically depends strongly on the kind of question being asked. For instance, prediction-type questions should rely on nonlinear regression, while descriptive-type summary questions, such as those found in allometry, should use a method that allows for uncertainty in both the x and y measurements. If the residuals are log-normally distributed, e.g. if the spread in y is multiplicative (increasing in proportion to the predicted value), a simple least-squares linear regression on log-transformed data can be performed, since the residuals are normally distributed after the transformation. Otherwise, the log-transformed residuals are not normally distributed, whereas the least-squares method assumes normally distributed errors. In the descriptive context, where both variables carry uncertainty, the method of standardized major axis (SMA) regression (sometimes called reduced major axis, but this term should be avoided) is preferred.
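For the multiplicative-noise case mentioned above, a least-squares fit on log-transformed data might be sketched as follows. The function name `fit_loglog_ols`, the synthetic data, the noise level and the seed are all assumptions made for illustration; this is not the routine from any of the cited references.

```python
import math
import random

def fit_loglog_ols(xs, ys):
    """Ordinary least squares of log(y) on log(x); returns (k_hat, a_hat)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mx = sum(lx) / n
    my = sum(ly) / n
    sxx = sum((u - mx) ** 2 for u in lx)
    sxy = sum((u - mx) * (v - my) for u, v in zip(lx, ly))
    k_hat = sxy / sxx                      # slope on log-log axes
    a_hat = math.exp(my - k_hat * mx)      # intercept is log(a)
    return k_hat, a_hat

# Synthetic data y = a * x**k * exp(noise): the noise is multiplicative,
# so the residuals of log(y) are normally distributed (illustrative values).
rng = random.Random(0)
a_true, k_true = 3.0, 0.75
xs = [1.0 + 0.5 * i for i in range(400)]
ys = [a_true * x ** k_true * math.exp(rng.gauss(0.0, 0.1)) for x in xs]

k_hat, a_hat = fit_loglog_ols(xs, ys)
```

With additive noise, as in $y = ax^k + \epsilon$, the same fit would not be appropriate, and a nonlinear regression on the untransformed data would be the more natural choice.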

The major axis is the line that minimizes the sum of squared shortest (perpendicular) distances between the data points and the line. This axis is equivalent to the first principal component axis of the covariance matrix. From this observation, the estimator for the slope can be derived

$\hat{k} = \frac{ \sigma_{y} }{ \sigma_{x} } = \sqrt{ \frac{ \sum_{i=1}^{N} (y_i - \mu_{y})^2 }{ \sum_{i=1}^{N} (x_i - \mu_{x})^2 } }$

where $\mu_{x}$ and $\mu_{y}$ are the sample means of the x and y data, respectively.
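The slope estimator above can be sketched in a few lines of Python. Two assumptions are made here that the formula itself does not state: the inputs are taken to be the log-transformed x and y values, and the sign of the slope is taken from the sample covariance, since the ratio of standard deviations alone is always non-negative. The name `sma_slope` is hypothetical.

```python
import math

def sma_slope(xs, ys):
    """Standardized major axis slope: ratio of the sample standard deviations.

    Assumption: the sign is taken from the sample covariance, because the
    ratio sigma_y / sigma_x by itself cannot be negative.
    """
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return math.copysign(math.sqrt(syy / sxx), sxy)

# Noise-free check on log-transformed data from y = 2 * x**0.5.
raw_x = [1.0, 2.0, 5.0, 10.0, 50.0]
log_x = [math.log(x) for x in raw_x]
log_y = [math.log(2.0 * x ** 0.5) for x in raw_x]
k_hat = sma_slope(log_x, log_y)
```

On noise-free data the estimate coincides with the least-squares slope; the two differ once there is scatter in both variables.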

More about this method, and the conditions under which it can be used, can be found in the Warton reference below. Further, Warton's comprehensive review article also provides usable code (C++, R, and Matlab) for estimation and testing routines for power-law functions.

Power-law distributions

A power-law distribution is any that, in the most general sense, has the form

$p(x) \propto L(x) x^{-\alpha}$

where α > 1, and L(x) is a slowly varying function, which is any function that satisfies $\lim_{x\rightarrow\infty} L(t\,x) / L(x) = 1$ with t constant. This property of L(x) follows directly from the requirement that p(x) be asymptotically scale invariant; thus, the form of L(x) only controls the shape and finite extent of the lower tail. For instance, if L(x) is the constant function, then we have a power law that holds for all values of x. In many cases, it is convenient to assume a lower bound $x_{\mathrm{min}}$ from which the law holds. Combining these two cases, and where x is a continuous variable, the power law has the form

$p(x) = \frac{\alpha-1}{x_{\mathrm{min}}} \left(\frac{x}{x_{\mathrm{min}}}\right)^{-\alpha}$,

where the constant is necessary to guarantee that the distribution is properly normalized. Briefly, we can consider several properties of this distribution.

In general, the moments of this distribution are given by

$\langle x^{m} \rangle = \int_{x_{\mathrm{min}}}^{\infty} x^{m} p(x) \mathrm{d}x = \frac{\alpha-1}{\alpha-1-m}x_{\mathrm{min}}^m$

which is only well defined for $m < \alpha - 1$. That is, all moments $m \geq \alpha - 1$ diverge: when $1 < \alpha \leq 2$, the average and all higher-order moments are infinite; when $2 < \alpha \leq 3$, the mean exists, but the variance and higher-order moments are infinite, etc. For finite-size samples drawn from such a distribution, this behavior implies that the central moment estimators (like the mean and the variance) for diverging moments will never converge: as more data are accumulated, they continue to grow.
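The moment formula and its divergence condition can be written down directly. The helper below is a sketch (the name `power_law_moment` is hypothetical) that returns infinity whenever $m \geq \alpha - 1$; setting m = 0 also recovers the normalization of p(x).

```python
import math

def power_law_moment(m, alpha, x_min):
    """m-th moment of p(x) = (alpha-1)/x_min * (x/x_min)**(-alpha) for x >= x_min.

    Returns math.inf when m >= alpha - 1, where the defining integral diverges.
    """
    if alpha <= 1 or x_min <= 0:
        raise ValueError("requires alpha > 1 and x_min > 0")
    if m >= alpha - 1:
        return math.inf
    return (alpha - 1) / (alpha - 1 - m) * x_min ** m

normalization = power_law_moment(0, 2.5, 1.0)    # total probability
mean_alpha_3 = power_law_moment(1, 3.0, 1.0)     # finite mean
mean_alpha_18 = power_law_moment(1, 1.8, 1.0)    # mean diverges
second_alpha_25 = power_law_moment(2, 2.5, 1.0)  # second moment diverges
```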

Another kind of power-law distribution, which does not satisfy the general form above, is the power law with an exponential cutoff

$p(x) \propto L(x) x^{-\alpha} \mathrm{e}^{-\lambda x}$

where we introduce an exponential decay term $\mathrm{e}^{-\lambda x}$ that overwhelms the power-law behavior at large values of x. This distribution does not scale and is thus not asymptotically a power law; however, it does approximately scale over a finite region before the cutoff. (Note that the pure form above is a subset of this family, with λ = 0.) This distribution is a common alternative to the asymptotic power-law distribution because it naturally captures finite-size effects. For instance, although the Gutenberg-Richter law is commonly cited as an example of a power-law distribution, the distribution of earthquake magnitudes cannot scale as a power law in the limit $x\rightarrow\infty$ because there is a finite amount of energy in the Earth's crust. Thus, there must be some maximum size earthquake, and the scaling behavior must taper off as it approaches this size.
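One way to see the loss of scale invariance is to compare $p(cx)/p(x)$ at several values of x. The sketch below uses the unnormalized form with L(x) = 1; the values of α, λ and c are arbitrary and chosen only for illustration.

```python
import math

def cutoff_density(x, alpha, lam):
    """Unnormalized x**(-alpha) * exp(-lam * x), taking L(x) = 1."""
    return x ** (-alpha) * math.exp(-lam * x)

def rescaling_ratio(x, c, alpha, lam):
    """p(c*x) / p(x); independent of x only for a scale-invariant density."""
    return cutoff_density(c * x, alpha, lam) / cutoff_density(x, alpha, lam)

alpha, c = 2.5, 2.0
points = (1.0, 10.0, 100.0)

# lam = 0 is the pure power law: the ratio is c**(-alpha) at every x.
pure = [rescaling_ratio(x, c, alpha, 0.0) for x in points]

# lam > 0: the ratio picks up a factor exp(-lam * (c - 1) * x) and shrinks with x.
cut = [rescaling_ratio(x, c, alpha, 0.05) for x in points]
```

For x much smaller than 1/λ the two lists nearly agree, which is the approximate scaling over a finite region described above.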

Plotting power-law distributions

In general, power-law distributions are plotted on doubly logarithmic axes, which emphasizes the upper tail region. The most convenient way to do this is via the (complementary) cumulative distribution (cdf), P(x) = Pr(X > x),

$P(x) = \mathrm{Pr}(X > x) = \int_{x}^{\infty} p(X)\,\mathrm{d}X = \frac{\alpha-1}{x_{\mathrm{min}}^{-\alpha+1}} \int_{x}^{\infty} X^{-\alpha}\,\mathrm{d}X = \left(\frac{x}{x_{\mathrm{min}}} \right)^{-\alpha+1}.$

Note that the cdf is also a power-law function, but with a smaller scaling exponent. For data, an equivalent form of the cdf is the rank-frequency approach, in which we first sort the n observed values in ascending order, and plot them against the vector $\left[1,\frac{n-1}{n},\frac{n-2}{n},\dots,\frac{1}{n}\right]$.
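The rank-frequency construction amounts to a sort followed by a simple division. A minimal sketch, in which the function name `empirical_ccdf` and the toy data are illustrative:

```python
def empirical_ccdf(values):
    """Pair each value, in ascending order, with the fraction of the sample >= it.

    Tied values each receive their own rank in this simple version.
    """
    xs = sorted(values)
    n = len(xs)
    return xs, [(n - i) / n for i in range(n)]

xs, ccdf = empirical_ccdf([3.0, 1.0, 2.0, 5.0])
# xs   -> [1.0, 2.0, 3.0, 5.0]
# ccdf -> [1.0, 0.75, 0.5, 0.25]
```

Plotting `ccdf` against `xs` on doubly logarithmic axes then gives the empirical counterpart of P(x).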

Although it can be convenient to log-bin the data, or otherwise smooth the probability density (mass) function directly, these methods introduce an implicit bias in the representation of the data, and thus should be avoided. The cdf, on the other hand, introduces no bias in the data and preserves the linear signature on doubly logarithmic axes.

Estimating the exponent from empirical data

There are many ways of estimating the value of the scaling exponent for a power-law tail; however, not all of them yield unbiased and consistent answers. The most reliable techniques are often based on the method of maximum likelihood. Alternative methods are often based on making a linear regression on either the log-log probability, the log-log cumulative distribution function, or on log-binned data, but these approaches should be avoided as they can all lead to highly biased estimates of the scaling exponent (see the Clauset et al. reference below).

For real-valued data, we fit a power-law distribution of the form

$p(x) = \frac{\alpha-1}{x_{\mathrm{min}}} \left(\frac{x}{x_{\mathrm{min}}}\right)^{-\alpha}$

to the data $x\geq x_{\mathrm{min}}$. Given a choice for $x_{\mathrm{min}}$, a simple maximum likelihood derivation yields the estimator equation

$\hat{\alpha} = 1 + n \left[ \sum_{i=1}^{n} \ln \frac{x_{i}}{x_{\mathrm{min}}} \right]^{-1}$

where $\{x_{i}\}$ are the n data points $x_{i}\geq x_{\mathrm{min}}$. (For a more detailed derivation, see Hall or Newman below.) This estimator exhibits a small finite sample-size bias of order $O(n^{-1})$, which is small when n > 100. Further, the uncertainty in the estimation can be derived from the maximum likelihood argument, and has the form $\sigma = \frac{\alpha-1}{\sqrt{n}}$. This estimator is equivalent to the popular Hill estimator from quantitative finance and extreme value theory.
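A short sketch of the continuous estimator and its uncertainty follows. To have data to test it on, samples are drawn by inverting the complementary cdf $P(x) = (x/x_{\mathrm{min}})^{-\alpha+1}$; this sampling step, the function names and the seed are assumptions added for illustration, and $x_{\mathrm{min}}$ is taken as known rather than estimated.

```python
import math
import random

def sample_power_law(n, alpha, x_min, rng):
    """Inverse-transform sampling: solve (x / x_min)**(1 - alpha) = u for x."""
    # 1 - rng.random() lies in (0, 1], so the base of the power is never zero.
    return [x_min * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0)) for _ in range(n)]

def fit_alpha(data, x_min):
    """Maximum likelihood exponent for continuous data with x >= x_min.

    Returns (alpha_hat, sigma, n), where sigma = (alpha_hat - 1) / sqrt(n).
    """
    tail = [x for x in data if x >= x_min]
    n = len(tail)
    log_sum = sum(math.log(x / x_min) for x in tail)
    if n == 0 or log_sum <= 0.0:
        raise ValueError("need at least one observation strictly above x_min")
    alpha_hat = 1.0 + n / log_sum
    sigma = (alpha_hat - 1.0) / math.sqrt(n)
    return alpha_hat, sigma, n

rng = random.Random(42)
data = sample_power_law(20000, alpha=2.5, x_min=1.0, rng=rng)
alpha_hat, sigma, n_tail = fit_alpha(data, x_min=1.0)
```

With 20000 samples the estimate should land close to the true exponent of 2.5, with `sigma` of order 0.01.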

For a set of n integer-valued data points $\{x_{i}\}$, again where each $x_{i}\geq x_{\mathrm{min}}$, the maximum likelihood exponent is the solution to the transcendental equation

$\frac{\zeta'(\hat{\alpha},x_{\mathrm{min}})}{\zeta(\hat{\alpha},x_{\mathrm{min}})} = -\frac{1}{n} \sum_{i=1}^{n} \ln \frac{x_{i}}{x_{\mathrm{min}}}$

where $\zeta(\alpha,x_{\mathrm{min}})$ is the incomplete zeta function. The uncertainty in this estimate follows the same formula as for the continuous case. However, the two equations for $\hat{\alpha}$ are not equivalent, and the continuous version should not be applied to discrete data, nor vice versa.

Further, both of these estimators require the choice of $x_{\mathrm{min}}$. For functions with a non-trivial L(x) function, choosing $x_{\mathrm{min}}$ too small produces a significant bias in $\hat{\alpha}$, while choosing it too large increases the uncertainty in $\hat{\alpha}$, and reduces the statistical power of our model. In general, the optimum choice of $x_{\mathrm{min}}$ depends strongly on the particular form of the lower tail, represented by L(x) above.

More about these methods, and the conditions under which they can be used, can be found in the Clauset et al. reference below. Further, this comprehensive review article provides usable code (Matlab and R) for estimation and testing routines for power-law distributions.

Examples of power-law distributions

A great many power-law distributions have been conjectured in recent years. For instance, power laws are thought to characterize the behavior of the upper tails for the popularity of websites, the number of species per genus, the popularity of given names, the size of financial returns, and many others. However, much debate remains as to which of these tails are actually power-law distributed and which are not. For instance, it is commonly accepted now that the famous Gutenberg-Richter law decays more rapidly than a pure power-law tail because of a finite exponential cutoff in the upper tail.

Validating power laws

Although power-law relations are attractive for many theoretical reasons, demonstrating that data do indeed follow a power-law relation requires more than simply fitting such a model to the data. In general, many alternative functional forms can appear to follow a power-law form to some extent. Thus, the preferred method for validating power-law relations is to test many orthogonal predictions of a particular generative mechanism against data, rather than simply fitting a power-law relation to a particular kind of data. As such, the validation of power-law claims remains a very active field of research in many areas of modern science.