Scale-space theory is a framework for multi-scale signal representation developed by the computer vision, image processing and signal processing communities, with complementary motivations from physics and biological vision. It is a formal theory for handling image structures at different scales, by representing an image as a one-parameter family of smoothed images, the scale-space representation, parametrized by the size of the smoothing kernel used for suppressing fine-scale structures. The parameter t in this family is referred to as the scale parameter, with the interpretation that image structures of spatial size smaller than about $\sqrt{t}$ have largely been smoothed away in the scale-space level at scale t.

The main type of scale-space is the linear (Gaussian) scale-space, which has wide applicability as well as the attractive property of being derivable from a small set of scale-space axioms. The corresponding scale-space framework encompasses a theory for Gaussian derivative operators, which can be used as a basis for expressing a large class of visual operations for computerized systems that process visual information. This framework also allows visual operations to be made scale invariant, which is necessary for dealing with the size variations that may occur in image data, because real-world objects may be of different sizes and, in addition, the distance between the object and the camera may be unknown and may vary with the circumstances.

## Definition

The notion of scale-space applies to signals of arbitrary numbers of variables. The most common case in the literature applies to two-dimensional images, which is what is presented here. For a given image f(x,y), its linear (Gaussian) scale-space representation is a family of derived signals L(x,y;t) defined by the convolution of f(x,y) with the Gaussian kernel

$g(x, y; t) = \frac{1}{2 \pi t} e^{-(x^2+y^2)/(2t)}$

such that

$L(x, y; t)\ = g(x, y; t) * f(x, y),$

where the semicolon in the argument of g implies that the convolution is performed only over the variables x, y, while the scale parameter t after the semicolon just indicates which scale level is being defined. This definition of L works for a continuum of scales $t \geq 0$, but typically only a finite discrete set of levels in the scale-space representation would actually be considered.

Here t is the variance of the Gaussian filter; for t = 0 the filter g becomes an impulse function, such that L(x, y; 0) = f(x, y), that is, the scale-space representation at scale level t = 0 is the image f itself. As t increases, L is the result of smoothing f with a larger and larger filter, thereby removing more and more of the details that the image contains. Since the standard deviation of the filter is $\sqrt{t}$, details that are significantly smaller than this value are to a large extent removed from the image at scale parameter t.
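As a concrete illustration, the definition above can be sketched in a few lines of Python. This is a minimal sketch, not a reference implementation: the helper `scale_space` and the impulse test image are illustrative choices, and SciPy's `gaussian_filter` plays the role of convolution with the Gaussian kernel, with sigma = sqrt(t).

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def scale_space(f, scales):
    """Return the scale-space representation L(x, y; t) of a 2-D image f
    for each t in `scales`, using Gaussian smoothing with sigma = sqrt(t)."""
    return {t: (f if t == 0 else gaussian_filter(f, sigma=np.sqrt(t)))
            for t in scales}

# A small test image: a single bright point (impulse).
f = np.zeros((65, 65))
f[32, 32] = 1.0

L = scale_space(f, [0, 1, 4, 16])

# At t = 0 the representation is the image itself.
assert np.allclose(L[0], f)
# Smoothing preserves the total "mass" (the kernel is normalized)...
assert np.isclose(L[16].sum(), f.sum(), atol=1e-3)
# ...while the peak value decreases as fine structure is suppressed.
assert L[1].max() > L[4].max() > L[16].max()
```

The decreasing peak values illustrate the interpretation of t: structures much smaller than $\sqrt{t}$ (here, the single-pixel impulse) are progressively smoothed away.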

### Why a Gaussian filter?

When faced with the task of generating a multi-scale representation one may ask: Could any filter g of low-pass type and with a parameter t which determines its width be used to generate a scale-space? This is, however, not the case. It is of crucial importance that the smoothing filter does not introduce new spurious structures at coarse scales that do not correspond to simplifications of corresponding structures at finer scales. In the scale-space literature, a number of different ways have been expressed to formulate this criterion in precise mathematical terms.

The conclusion from several different axiomatic derivations that have been presented is that the Gaussian scale-space constitutes the canonical way to generate a linear scale-space, based on the essential requirement that new structures must not be created when going from a fine scale to any coarser scale. [1][2][3][4][5][6] Conditions, referred to as scale-space axioms, that have been used for deriving the uniqueness of the Gaussian kernel include linearity, shift invariance, semi-group structure, non-enhancement of local extrema, scale invariance and rotational invariance.

Equivalently, the scale-space family can be defined as the solution of the diffusion equation (for example in terms of the heat equation),

$\partial_t L = \frac{1}{2} \nabla^2 L$,

with initial condition L(x, y; 0) = f(x, y). This formulation of the scale-space representation L means that it is possible to interpret the intensity values of the image f as a "temperature distribution" in the image plane, and that the process which generates the scale-space representation as a function of t corresponds to heat diffusion in the image plane over time t (assuming the thermal conductivity of the material equal to the arbitrarily chosen constant ½). Although this connection may appear superficial to a reader not familiar with differential equations, the main scale-space formulation in terms of non-enhancement of local extrema is indeed expressed as a sign condition on partial derivatives in the 2+1-D volume generated by the scale-space, thus within the framework of partial differential equations. Furthermore, a detailed analysis of the discrete case shows that the diffusion equation provides a unifying link between continuous and discrete scale-spaces, which also generalizes to non-linear scale-spaces, for example using anisotropic diffusion. Hence, one may say that the primary way to generate a scale-space is by the diffusion equation, and that the Gaussian kernel arises as the Green's function of this specific partial differential equation.
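The equivalence of the two formulations can be checked numerically: evolving the diffusion equation forward from f with explicit Euler steps should reproduce Gaussian smoothing at the corresponding scale, up to discretization error. A minimal sketch under these assumptions (the function `diffuse` and the step size are illustrative, not part of the theory):

```python
import numpy as np
from scipy.ndimage import gaussian_filter, laplace

def diffuse(f, t, dt=0.1):
    """Integrate dL/dt = (1/2) * laplacian(L) with explicit Euler steps,
    starting from L(.; 0) = f. Stable on the unit grid for dt <= 0.5."""
    L = f.astype(float).copy()
    for _ in range(int(round(t / dt))):
        L += dt * 0.5 * laplace(L)
    return L

f = np.zeros((65, 65))
f[32, 32] = 1.0

t = 4.0
L_diffused = diffuse(f, t)
L_gaussian = gaussian_filter(f, sigma=np.sqrt(t))

# The two constructions agree up to spatial/temporal discretization error.
assert np.abs(L_diffused - L_gaussian).max() < 0.01
```

The residual difference reflects exactly the continuous/discrete distinction mentioned above: the discrete diffusion semigroup converges to the discrete analogue of the Gaussian rather than to the sampled continuous kernel.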

## Motivations

The motivation for generating a scale-space representation of a given data set originates from the basic observation that real-world objects are composed of different structures at different scales. This implies that real-world objects, in contrast to idealized mathematical entities such as points or lines, may appear in different ways depending on the scale of observation. For example, the concept of a "tree" is appropriate at the scale of meters, while concepts such as leaves and molecules are more appropriate at finer scales. For a machine vision system analysing an unknown scene, there is no way to know a priori what scales are appropriate for describing the interesting structures in the image data. Hence, the only reasonable approach is to consider descriptions at multiple scales in order to capture the unknown scale variations that may occur. Taken to the limit, a scale-space representation considers representations at all scales.

Another motivation for the scale-space concept originates from the process of performing a physical measurement on real-world data. In order to extract any information from a measurement process, one has to apply operators of non-infinitesimal size to the data. In many branches of computer science and applied mathematics, the size of the measurement operator is disregarded in the theoretical modelling of a problem. Scale-space theory, on the other hand, explicitly incorporates the need for a non-infinitesimal size of the image operators as an integral part of any measurement, as well as of any other operation that depends on a real-world measurement.

There is a close link between scale-space theory and biological vision. Many scale-space operations show a high degree of similarity with receptive field profiles recorded from the mammalian retina and the first stages in the visual cortex. In these respects, the scale-space framework can be seen as a theoretically well-founded paradigm for early vision, which in addition has been thoroughly tested by algorithms and experiments.

## Gaussian derivatives and the notion of a visual front-end

At any scale in scale-space, we can apply local derivative operators to the scale-space representation:

$L_{x^m y^n}(x, y; t) = \partial_{x^m y^n} \left( L(x, y; t) \right).$

Due to the commutative property between the derivative operator and the Gaussian smoothing operator, such scale-space derivatives can equivalently be computed by convolving the original image with Gaussian derivative operators. For this reason they are often also referred to as Gaussian derivatives:

$L_{x^m y^n}(x, y; t) = \left( \partial_{x^m y^n} g(x, y; t) \right)* f(x, y).$

Interestingly, the uniqueness of the Gaussian derivative operators as local operations derived from a scale-space representation can be obtained by similar axiomatic derivations as are used for deriving the uniqueness of the Gaussian kernel for scale-space smoothing. [7][3]

These Gaussian derivative operators can in turn be combined by linear or non-linear operators into a larger variety of different types of feature detectors, which in many cases can be well modelled by differential geometry. Specifically, invariance (or more appropriately covariance) to local geometric transformations, such as rotations or local affine transformations, can be obtained by considering differential invariants under the appropriate class of transformations, or alternatively by normalizing the Gaussian derivative operators to a locally determined coordinate frame, determined e.g. from a preferred orientation in the image domain or by applying a preferred local affine transformation to a local image patch (see the article on affine shape adaptation for further details).

When Gaussian derivative operators and differential invariants are used in this way as basic feature detectors at multiple scales, the uncommitted first stages of visual processing are often referred to as a visual front-end. This overall framework has been applied to a large variety of problems in computer vision, including feature detection, feature classification, image segmentation, image matching, motion estimation, computation of shape cues and object recognition. The set of Gaussian derivative operators up to a certain order is often referred to as the N-jet and constitutes a basic type of feature within the scale-space framework.

## Examples of multi-scale feature detectors expressed within the scale-space framework

Following the idea of expressing visual operations in terms of differential invariants computed at multiple scales using Gaussian derivative operators, we can express an edge detector as the set of points that satisfy the requirement that the gradient magnitude

$L_v = \sqrt{L_x^2 + L_y^2}$

should assume a local maximum in the gradient direction

$\nabla L = (L_x, L_y)^T$.

By working out the differential geometry, it can be shown [3] that this differential edge detector can equivalently be expressed from the zero-crossings of the second-order differential invariant

${\tilde L}_v^2 = L_x^2 \, L_{xx} + 2 \, L_x \, L_y \, L_{xy} + L_y^2 \, L_{yy} = 0$

that satisfy the following sign condition on a third-order differential invariant:

${\tilde L}_v^3 = L_x^3 \, L_{xxx} + 3 \, L_x^2 \, L_y \, L_{xxy} + 3 \, L_x \, L_y^2 \, L_{xyy} + L_y^3 \, L_{yyy} < 0$.
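As a sketch (not an optimized edge detector), these two invariants can be assembled directly from Gaussian derivatives and checked on a step edge, where the zero-crossing of ${\tilde L}_v^2$ with ${\tilde L}_v^3 < 0$ should fall on the step. The helpers `D` and `edge_invariants` are illustrative names; SciPy's `gaussian_filter` with per-axis derivative orders supplies the Gaussian derivative operators.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def D(f, sigma, dy, dx):
    """Gaussian derivative of f: order dy in y (axis 0), order dx in x (axis 1)."""
    return gaussian_filter(f, sigma, order=(dy, dx))

def edge_invariants(f, t):
    """The second- and third-order differential invariants of the
    differential edge detector, at scale t."""
    s = np.sqrt(t)
    Lx, Ly = D(f, s, 0, 1), D(f, s, 1, 0)
    Lxx, Lxy, Lyy = D(f, s, 0, 2), D(f, s, 1, 1), D(f, s, 2, 0)
    Lxxx, Lxxy = D(f, s, 0, 3), D(f, s, 1, 2)
    Lxyy, Lyyy = D(f, s, 2, 1), D(f, s, 3, 0)
    Lv2 = Lx**2 * Lxx + 2 * Lx * Ly * Lxy + Ly**2 * Lyy
    Lv3 = (Lx**3 * Lxxx + 3 * Lx**2 * Ly * Lxxy
           + 3 * Lx * Ly**2 * Lxyy + Ly**3 * Lyyy)
    return Lv2, Lv3

# A vertical step edge between columns 15 and 16.
f = np.zeros((32, 32))
f[:, 16:] = 1.0
Lv2, Lv3 = edge_invariants(f, t=2.0)

# Edge pixels: zero-crossings of Lv2 (sign change along x) where Lv3 < 0.
sign_change = np.signbit(Lv2[:, :-1]) != np.signbit(Lv2[:, 1:])
edges = sign_change & (Lv3[:, :-1] < 0)
assert edges[16, 14:17].any()   # an edge is detected at the step
```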

Similarly, multi-scale blob detectors at any given fixed scale can be obtained from local maxima and local minima of either the Laplacian operator (also referred to as the Laplacian of Gaussian)

$\nabla^2 L = L_{xx} + L_{yy}$

or the determinant of the Hessian matrix

$\operatorname{det} H L(x, y; t) = L_{xx} L_{yy} - L_{xy}^2$.
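Both blob responses can be computed from three Gaussian derivatives. A minimal sketch on a synthetic blob (the helper `blob_responses` and the test image are illustrative): a bright blob gives a strong negative Laplacian response and a positive determinant-of-Hessian response at its centre.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def blob_responses(f, t):
    """Laplacian-of-Gaussian and determinant-of-Hessian responses at scale t."""
    s = np.sqrt(t)
    Lxx = gaussian_filter(f, s, order=(0, 2))
    Lyy = gaussian_filter(f, s, order=(2, 0))
    Lxy = gaussian_filter(f, s, order=(1, 1))
    return Lxx + Lyy, Lxx * Lyy - Lxy**2

# A bright Gaussian blob on a dark background, centred at (24, 24).
y, x = np.mgrid[0:48, 0:48]
f = np.exp(-((x - 24.0)**2 + (y - 24.0)**2) / (2 * 3.0**2))

lap, dh = blob_responses(f, t=9.0)

assert lap.argmin() == np.ravel_multi_index((24, 24), lap.shape)
assert dh.argmax() == np.ravel_multi_index((24, 24), dh.shape)
```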

In an analogous fashion, corner detectors and ridge and valley detectors can be expressed as local maxima, minima or zero-crossings of multi-scale differential invariants defined from Gaussian derivatives. The algebraic expressions for the corner and ridge detection operators are, however, somewhat more complex, and the reader is referred to the articles on corner detection and ridge detection for further details.

Scale-space operations have also been frequently used for expressing coarse-to-fine methods, in particular for tasks such as image matching and multi-scale image segmentation.

## Automatic scale selection and scale invariant feature detection

The theory presented so far describes a well-founded framework for representing image structures at multiple scales. In many cases it is, however, also necessary to select locally appropriate scales for further analysis. This need for scale selection originates from two major reasons: (i) real-world objects may have different sizes, and these sizes may be unknown to the vision system, and (ii) the distance between the object and the camera can vary, and this distance information may also be unknown a priori. A highly useful property of the scale-space representation is that image representations can be made scale invariant, by performing automatic local scale selection[8][9] based on local maxima (or minima) over scales of normalized derivatives

$L_{\xi^m \eta^n}(x, y; t) = t^{(m+n) \gamma/2} L_{x^m y^n}(x, y; t)$

where $\gamma \in [0,1]$ is a parameter that is related to the dimensionality of the image feature. This algebraic expression for scale-normalized Gaussian derivative operators originates from the introduction of γ-normalized derivatives according to

$\partial_{\xi} = t^{\gamma/2} \partial_x\quad$ and $\quad\partial_{\eta} = t^{\gamma/2} \partial_y$.

It can be theoretically shown that a scale selection module working according to this principle will satisfy the following scale invariance property: if, for a certain type of image feature, a local maximum over scales is assumed in a certain image at a certain scale $t_0$, then under a rescaling of the image by a scale factor s the local maximum over scales in the rescaled image will be transformed to the scale level $s^2 t_0$.
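The scale selection principle can be illustrated with the scale-normalized Laplacian $t \, \nabla^2 L$ (the case $\gamma = 1$), whose maximum magnitude over scales, for a Gaussian blob of variance $t_0$, is attained near $t = t_0$. The sketch below is illustrative (the helpers `selected_blob_scale` and `blob`, the grid size, and the quarter-octave scale sampling are all assumed choices) and checks the $s^2$ transformation property for a blob rescaled by a factor of 2:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def selected_blob_scale(f, scales):
    """Return the scale t at which the magnitude of the scale-normalized
    Laplacian t * (Lxx + Lyy) (gamma = 1) is strongest over space and scale."""
    best_t, best_resp = None, -np.inf
    for t in scales:
        s = np.sqrt(t)
        lap = (gaussian_filter(f, s, order=(0, 2))
               + gaussian_filter(f, s, order=(2, 0)))
        resp = np.abs(t * lap).max()
        if resp > best_resp:
            best_t, best_resp = t, resp
    return best_t

def blob(sigma, n=129):
    """A Gaussian blob of standard deviation sigma centred on an n x n grid."""
    y, x = np.mgrid[0:n, 0:n] - (n - 1) / 2
    return np.exp(-(x**2 + y**2) / (2 * sigma**2))

scales = [2.0 ** (k / 4) for k in range(27)]   # quarter-octave grid, t in [1, ~90]

t1 = selected_blob_scale(blob(3.0), scales)    # blob variance t0 = 9
t2 = selected_blob_scale(blob(6.0), scales)    # the same blob rescaled by s = 2

# The selected scale tracks the blob variance, and rescaling by s = 2
# moves it by approximately s^2 = 4 (up to the scale-grid quantization).
assert 6.0 < t1 < 14.0
assert 3.0 < t2 / t1 < 5.5
```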

## Related multi-scale representations

Pyramid representation is a predecessor to scale-space representation, constructed by simultaneously smoothing and subsampling a given signal. [14][15] In this way, computationally highly efficient algorithms can be obtained. In a pyramid, however, it is usually algorithmically harder to relate structures at different scales, due to the discrete nature of the scale levels. In a scale-space representation, the existence of a continuous scale parameter makes it conceptually much easier to express this so-called deep structure. For features defined as zero-crossings of differential invariants, the implicit function theorem directly defines trajectories across scales [3], and at those scales where bifurcations occur, the local behaviour can be modelled by singularity theory.

Extensions of linear scale-space theory concern the formulation of non-linear scale-space concepts more committed to specific purposes. [16][17] These non-linear scale-spaces often start from the equivalent diffusion formulation of the scale-space concept, which is subsequently extended in a non-linear fashion. A large number of evolution equations have been formulated in this way, motivated by different specific requirements (see the above-mentioned book references for further information). Not all of these non-linear scale-spaces, however, satisfy similarly "nice" theoretical requirements as the linear Gaussian scale-space concept. Hence, unexpected artefacts may sometimes occur, and one should be careful not to use the term "scale-space" for just any type of one-parameter family of images.

A first-order extension of the isotropic Gaussian scale-space is provided by the affine (Gaussian) scale-space [3]. One motivation for this extension originates from the common need for computing image descriptors of real-world objects that are viewed under a perspective camera model. To handle such non-linear deformations locally, partial invariance (or more correctly covariance) to local affine deformations can be achieved by considering affine Gaussian kernels with their shapes determined by the local image structure; see the article on affine shape adaptation for theory and algorithms. Indeed, this affine scale-space can also be expressed from a non-isotropic extension of the linear (isotropic) diffusion equation, while still being within the class of linear partial differential equations.

There are strong relations between scale-space theory and wavelet theory, although these two notions of multi-scale representation have been developed from somewhat different premises. There has also been work on other multi-scale approaches, such as pyramids and a variety of other kernels, that do not exploit or require the same requirements as true scale-space descriptions do.

## Relations to biological vision

There are interesting relations between scale-space representation and biological vision. Neurophysiological studies have shown that there are receptive field profiles in the mammalian retina and visual cortex which can be well modelled by linear Gaussian derivative operators, in some cases also complemented by a non-isotropic affine scale-space model and/or non-linear combinations of such linear operators.[18][19]

## Implementation issues

When implementing scale-space smoothing in practice, a number of different approaches can be taken: continuous or discrete Gaussian smoothing, implementation in the Fourier domain, pyramids based on binomial filters that approximate the Gaussian, or recursive filters. More details about this are given in a separate article on scale-space implementation.
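As an illustration of the binomial-filter approach, repeated passes of the separable filter [1, 2, 1]/4 approximate Gaussian smoothing (each pass adds variance 1/2 per axis), and interleaving smoothing with factor-of-two subsampling yields a pyramid. The helpers `binomial_smooth` and `pyramid` below are illustrative sketches; circular boundary handling via `np.roll` is chosen for brevity, not fidelity.

```python
import numpy as np

def binomial_smooth(f, passes=1):
    """Approximate Gaussian smoothing with the separable binomial
    filter [1, 2, 1]/4 (circular boundaries for brevity)."""
    for _ in range(passes):
        for axis in (0, 1):
            f = (np.roll(f, 1, axis) + 2 * f + np.roll(f, -1, axis)) / 4.0
    return f

def pyramid(f, levels):
    """Pyramid sketch: smooth, then subsample by a factor of two."""
    out = [f]
    for _ in range(levels - 1):
        f = binomial_smooth(f, passes=2)[::2, ::2]
        out.append(f)
    return out

f = np.random.default_rng(1).standard_normal((64, 64))
levels = pyramid(f, 4)
assert [l.shape for l in levels] == [(64, 64), (32, 32), (16, 16), (8, 8)]
```

This illustrates the trade-off noted above: each level is cheap to compute, but the scale levels are tied to the discrete subsampling grid, unlike the continuous scale parameter of a scale-space.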

## References

1. ^ Witkin, A. P. "Scale-space filtering", Proc. 8th Int. Joint Conf. Art. Intell., Karlsruhe, Germany, 1019–1022, 1983.
2. ^ Koenderink, Jan "The structure of images", Biological Cybernetics, 50:363–370, 1984
3. ^ a b c d e Lindeberg, Tony, Scale-Space Theory in Computer Vision, Kluwer Academic Publishers, 1994, ISBN 0-7923-9418-6
4. ^ Florack, Luc, Image Structure, Kluwer Academic Publishers, 1997.
5. ^ Sporring, Jon et al (Eds), Gaussian Scale-Space Theory, Kluwer Academic Publishers, 1997.
6. ^ Romeny, Bart ter Haar, Front-End Vision and Multi-Scale Image Analysis, Kluwer Academic Publishers, 2003.
7. ^ Koenderink, Jan and van Doorn, Ans: "Generic neighbourhood operators", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol 14, pp 597-605, 1992
8. ^ a b Lindeberg, Tony "Feature detection with automatic scale selection", International Journal of Computer Vision, 30, 2, pp 77–116, 1998.
9. ^ a b Lindeberg, Tony "Edge detection and ridge detection with automatic scale selection", International Journal of Computer Vision, 30, 2, pp 117–154, 1998.
10. ^ Lindeberg, T. and Garding, J.: "Shape-adapted smoothing in estimation of 3-D depth cues from affine distortions of local 2-D structure", Image and Vision Computing, 15, 415–434, 1997.
11. ^ Baumberg, A.: Reliable feature matching across widely separated views, Proc. Computer Vision Pattern Recognition, I:1774–1781, 2000.
12. ^ Mikolajczyk, K. and Schmid, C.: "Scale and affine invariant interest point detectors", Int. Journal of Computer Vision, 60:1, 63–86, 2004.
13. ^ Lowe, D. G., “Distinctive image features from scale-invariant keypoints”, International Journal of Computer Vision, 60, 2, pp. 91-110, 2004.
14. ^ Burt, Peter and Adelson, Ted, "The Laplacian Pyramid as a Compact Image Code", IEEE Trans. Communications, 9:4, 532–540, 1983.
15. ^ Crowley, J. L. and Sanderson, A. C. "Multiple resolution representation and probabilistic matching of 2-D gray-scale shape", IEEE Transactions on Pattern Analysis and Machine Intelligence, 9(1), pp 113-121, 1987.
16. ^ Romeny, Bart (Ed), Geometry-Driven Diffusion in Computer Vision, Kluwer Academic Publishers, 1994.
17. ^ Weickert, J Anisotropic diffusion in image processing, Teuber Verlag, Stuttgart, 1998.
18. ^ Young, R. A. "The Gaussian derivative model for spatial vision: Retinal mechanisms", Spatial Vision, 2:273–293, 1987.
19. ^ DeAngelis, G. C., Ohzawa, I., and Freeman, R. D., "Receptive-field dynamics in the central visual pathways", Trends Neurosci. 18: 451–458, 1995.