In mathematics, matrix calculus is a specialized notation for doing multivariable calculus, especially over spaces of matrices, where it defines the matrix derivative. This notation is well suited to describing systems of differential equations, and to taking derivatives of matrix-valued functions with respect to matrix variables. It is commonly used in statistics and engineering, while the tensor index notation is preferred in physics.

## Notice

This article uses a definition for vector and matrix calculus different from the form often encountered within the fields of estimation theory and pattern recognition. The resulting equations will therefore appear transposed when compared with the equations used in textbooks in those fields.

## Notation

Let M(n,m) denote the space of real n×m matrices with n rows and m columns; such matrices will be denoted F, X, Y, etc. An element of M(n,1), that is, a column vector, is denoted with a boldface lowercase letter x, while x^T denotes its transpose, a row vector. An element of M(1,1) is a scalar, denoted a, b, c, f, t, etc. All functions are assumed to be of differentiability class C1 unless otherwise noted.

## Vector calculus

Main article: Vector calculus

Because the space M(n,1) is identified with the Euclidean space Rn and M(1,1) is identified with R, the notations developed here can accommodate the usual operations of vector calculus.

• The tangent vector to a curve x : RRn is
$\frac{\partial \mathbf{x}} {\partial t} = \begin{bmatrix}\frac{\partial x_1}{\partial t} \\\vdots \\\frac{\partial x_n}{\partial t} \\\end{bmatrix}.$
• The gradient of a scalar function f : RnR
$\frac{\partial f}{\partial \mathbf{x}} = \begin{bmatrix}\frac{\partial f}{\partial x_1} & \cdots & \frac{\partial f}{\partial x_n} \\\end{bmatrix}.$
The directional derivative of f in the direction of v is then
$\nabla_\mathbf{v} f = \frac{\partial f}{\partial \mathbf{x}}\mathbf{v}.$
• The pushforward or differential of a function f : RmRn is described by the Jacobian matrix
$\frac{\partial \mathbf{f}}{\partial \mathbf{x}} = \begin{bmatrix}\frac{\partial f_1}{\partial x_1} & \cdots & \frac{\partial f_1}{\partial x_m}\\\vdots & \ddots & \vdots\\\frac{\partial f_n}{\partial x_1} & \cdots & \frac{\partial f_n}{\partial x_m}\\\end{bmatrix}.$
The pushforward along f of a vector v in Rm is
$d\,\mathbf{f}(\mathbf{v}) = \frac{\partial \mathbf{f}}{\partial \mathbf{x}} \mathbf{v}.$
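These conventions can be sanity-checked numerically. The sketch below (assuming NumPy; the functions f and F and all values are made up for illustration) verifies the directional-derivative and pushforward formulas above by central finite differences.

```python
import numpy as np

# Hypothetical example functions, chosen only to illustrate the conventions above.
def f(x):                         # scalar field f : R^3 -> R
    return x[0]**2 + 3.0 * x[1] * x[2]

grad_f = lambda x: np.array([2.0 * x[0], 3.0 * x[2], 3.0 * x[1]])  # gradient, a row vector

def F(x):                         # vector field F : R^3 -> R^2
    return np.array([x[0] * x[1], np.sin(x[2])])

def J(x):                         # Jacobian matrix dF/dx, here 2x3
    return np.array([[x[1], x[0], 0.0],
                     [0.0, 0.0, np.cos(x[2])]])

x = np.array([1.0, 2.0, 0.5])
v = np.array([0.2, -1.0, 0.7])
h = 1e-6

# Directional derivative: grad_f(x) @ v, checked against a central difference.
analytic = grad_f(x) @ v
numeric = (f(x + h * v) - f(x - h * v)) / (2.0 * h)
assert abs(analytic - numeric) < 1e-6

# Pushforward dF(v) = J(x) @ v, checked the same way.
push = J(x) @ v
numeric_push = (F(x + h * v) - F(x - h * v)) / (2.0 * h)
assert np.allclose(push, numeric_push, atol=1e-6)
```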

## Matrix calculus

For the purposes of defining derivatives of simple functions, not much changes with matrix spaces; the space of n×m matrices is, after all, isomorphic as a vector space to Rnm. The three derivatives familiar from vector calculus have close analogues here, though beware the complications that arise in the identities below.

• The tangent vector of a curve F : RM(n,m)
$\frac{\partial \mathbf{F}}{\partial t} =\begin{bmatrix}\frac{\partial F_{1,1}}{\partial t} & \cdots & \frac{\partial F_{1,m}}{\partial t}\\\vdots & \ddots & \vdots\\\frac{\partial F_{n,1}}{\partial t} & \cdots & \frac{\partial F_{n,m}}{\partial t}\\\end{bmatrix}.$
• The gradient of a scalar function f : M(n,m) → R
$\frac{\partial f}{\partial \mathbf{X}} =\begin{bmatrix}\frac{\partial f}{\partial X_{1,1}} & \cdots & \frac{\partial f}{\partial X_{n,1}}\\\vdots & \ddots & \vdots\\\frac{\partial f}{\partial X_{1,m}} & \cdots & \frac{\partial f}{\partial X_{n,m}}\\\end{bmatrix}.$
Notice that the indexing of the gradient with respect to X is transposed as compared with the indexing of X. The directional derivative of f in the direction of matrix Y is given by
$\nabla_\mathbf{Y} f = \operatorname{tr} \left(\frac{\partial f}{\partial \mathbf{X}} \mathbf{Y}\right),$
where tr denotes the trace.
• The differential or the matrix derivative of a function F : M(n,m) → M(p,q) is an element of M(p,q) ⊗ M(m,n), a fourth-rank tensor (the reversal of m and n here indicates the dual space of M(n,m)). In short, it is an m×n matrix each of whose entries is a p×q matrix.
$\frac{\partial\mathbf{F}} {\partial\mathbf{X}}=\begin{bmatrix}\frac{\partial\mathbf{F}}{\partial X_{1,1}} & \cdots & \frac{\partial \mathbf{F}}{\partial X_{n,1}}\\\vdots & \ddots & \vdots\\\frac{\partial\mathbf{F}}{\partial X_{1,m}} & \cdots & \frac{\partial \mathbf{F}}{\partial X_{n,m}}\\\end{bmatrix},$
and note that each ∂F/∂Xi,j is a p×q matrix defined as above. Note also that this matrix has its indexing transposed: m rows and n columns. The pushforward along F of an n×m matrix Y in M(n,m) is then
$d\mathbf{F}(\mathbf{Y}) = \operatorname{tr}\left(\frac{\partial\mathbf{F}} {\partial\mathbf{X}}\mathbf{Y}\right).$
Note that this definition encompasses all of the preceding definitions as special cases.
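As a numerical sanity check of the transposed-gradient convention and the trace formula for the directional derivative, consider the hypothetical scalar function f(X) = tr(XᵀX), the sum of squared entries, whose gradient under this convention is 2Xᵀ. A sketch assuming NumPy:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 3, 2
X = rng.standard_normal((n, m))
Y = rng.standard_normal((n, m))

# Hypothetical scalar function f(X) = tr(X^T X) = sum of squared entries.
def f(X):
    return np.trace(X.T @ X)

# Per the convention above, the gradient is indexed TRANSPOSED relative to X:
# (df/dX)_{j,i} = df/dX_{i,j}.  Here df/dX_{i,j} = 2 X_{i,j}, so df/dX = 2 X^T (m x n).
G = 2.0 * X.T

# Directional derivative via the trace formula: grad_Y f = tr((df/dX) Y).
analytic = np.trace(G @ Y)

# Central-difference check of the same directional derivative.
h = 1e-6
numeric = (f(X + h * Y) - f(X - h * Y)) / (2.0 * h)
assert abs(analytic - numeric) < 1e-5
```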

## Identities

Note that matrix multiplication is not commutative, so in these identities the order must not be changed.

• Chain rule: If Z is a function of Y, which in turn is a function of X, then
$\frac{\partial \mathbf{Z}} {\partial \mathbf{X}} = \frac{\partial \mathbf{Z}} {\partial \mathbf{Y}} \frac{\partial \mathbf{Y}} {\partial \mathbf{X}}$
• Product rule:
$\frac{\partial (\mathbf{Y}^T\mathbf{Z})}{\partial \mathbf{X}} = (\mathbf{Z}^T)\frac{\partial\mathbf{Y}}{\partial \mathbf{X}} + (\mathbf{Y}^T)\frac{\partial\mathbf{Z}}{\partial \mathbf{X}}$
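The product rule can be checked numerically in the vector case. In the sketch below (NumPy assumed, random data), Y(x) = Ax and Z(x) = Bx are hypothetical linear maps, so their Jacobians are A and B, and the row-vector derivative of YᵀZ is compared against central finite differences.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 4, 3
A = rng.standard_normal((m, n))
B = rng.standard_normal((m, n))
x = rng.standard_normal(n)

# Hypothetical vector-valued functions with known Jacobians:
# Y(x) = A x  =>  dY/dx = A ;  Z(x) = B x  =>  dZ/dx = B.
def scalar(x):
    return (A @ x) @ (B @ x)           # f = Y^T Z

# Product rule: d(Y^T Z)/dx = Z^T dY/dx + Y^T dZ/dx  (a row vector).
Y, Z = A @ x, B @ x
row = Z @ A + Y @ B                    # length-n array standing for the 1 x n row

# Central-difference check, one coordinate at a time.
h = 1e-6
for i in range(n):
    e = np.zeros(n); e[i] = 1.0
    numeric = (scalar(x + h * e) - scalar(x - h * e)) / (2.0 * h)
    assert abs(row[i] - numeric) < 1e-4
```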

## Examples

### Derivative of linear functions

This section lists some commonly used vector derivative formulas for linear equations evaluating to a vector.

$\frac{\partial \; \textbf{a}^T\textbf{x}}{\partial \; \textbf{x}} = \frac{\partial \; \textbf{x}^T\textbf{a}}{\partial \; \textbf{x}} = \textbf{a}^T$
$\frac{\partial \; \textbf{A}\textbf{x}}{\partial \; \textbf{x}} = \textbf{A}$
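Both formulas can be verified by finite differences. The sketch below (NumPy assumed, random data) reconstructs the gradient of aᵀx and the Jacobian of Ax column by column and compares them with aᵀ and A.

```python
import numpy as np

rng = np.random.default_rng(2)
n, m = 3, 4
a = rng.standard_normal(n)
A = rng.standard_normal((m, n))
x = rng.standard_normal(n)
h = 1e-6

# Central-difference gradient of a^T x: should reproduce a^T (stored as a length-n array).
grad = np.array([(a @ (x + h * np.eye(n)[i]) - a @ (x - h * np.eye(n)[i])) / (2 * h)
                 for i in range(n)])
assert np.allclose(grad, a, atol=1e-6)

# Central-difference Jacobian of A x: column i is d(Ax)/dx_i; should reproduce A.
J = np.stack([(A @ (x + h * np.eye(n)[i]) - A @ (x - h * np.eye(n)[i])) / (2 * h)
              for i in range(n)], axis=1)
assert np.allclose(J, A, atol=1e-6)
```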

### Derivative of quadratic functions

This section lists some commonly used vector derivative formulas for quadratic matrix equations evaluating to a scalar.

$\frac{\partial \; \textbf{x}^T \textbf{A}\textbf{x}}{\partial \; \textbf{x}} = \textbf{x}^T(\textbf{A}^T + \textbf{A})$
$\frac{\partial \; (\textbf{A}\textbf{x} + \textbf{b})^T \textbf{C} (\textbf{D}\textbf{x} + \textbf{e}) }{\partial \; \textbf{x}} = (\textbf{D}\textbf{x} + \textbf{e})^T \textbf{C}^T \textbf{A} + (\textbf{A}\textbf{x} + \textbf{b})^T \textbf{C} \textbf{D}$

Related to this is the derivative of the Euclidean norm:

$\frac{\partial \; \|\mathbf{x}-\mathbf{a}\|}{\partial \; \textbf{x}} =\frac{(\mathbf{x}-\mathbf{a})^T}{\|\mathbf{x}-\mathbf{a}\|}$
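A quick numerical check of the quadratic-form and norm derivatives above (NumPy assumed; random data, and num_grad is a throwaway finite-difference helper):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 4
A = rng.standard_normal((n, n))
x = rng.standard_normal(n)
a = rng.standard_normal(n)
h = 1e-6

def num_grad(f, x):
    """Central-difference gradient of scalar f, returned as a length-n array (row vector)."""
    g = np.empty(n)
    for i in range(n):
        e = np.zeros(n); e[i] = 1.0
        g[i] = (f(x + h * e) - f(x - h * e)) / (2 * h)
    return g

# d(x^T A x)/dx = x^T (A^T + A)
assert np.allclose(num_grad(lambda t: t @ A @ t, x), x @ (A.T + A), atol=1e-4)

# d||x - a||/dx = (x - a)^T / ||x - a||
assert np.allclose(num_grad(lambda t: np.linalg.norm(t - a), x),
                   (x - a) / np.linalg.norm(x - a), atol=1e-4)
```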

### Derivative of matrix traces

This section shows examples of matrix differentiation of common trace equations.

$\frac{\partial \; \operatorname{tr}( \textbf{A} \textbf{X} \textbf{B})}{\partial \; \textbf{X}} = \frac{\partial \; \operatorname{tr}( \textbf{B}^T \textbf{X}^T \textbf{A}^T)}{\partial \; \textbf{X}} = \textbf{B} \textbf{A}$
$\frac{\partial \; \operatorname{tr}( \textbf{A} \textbf{X} \textbf{B} \textbf{X}^T \textbf{C}) }{\partial \; \textbf{X}} = \textbf{B} \textbf{X}^T \textbf{C} \textbf{A} + \textbf{B}^T \textbf{X}^T \textbf{A}^T \textbf{C}^T$
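Under the convention ∇_Y f = tr((∂f/∂X) Y) used in this article, the gradient of tr(AXB) works out to BA, an m×n matrix matching the transposed indexing, since tr(AYB) = tr((BA)Y). The sketch below (NumPy assumed, shapes chosen arbitrarily for illustration) checks this against a central finite difference.

```python
import numpy as np

rng = np.random.default_rng(4)
n, m, p = 3, 4, 2
A = rng.standard_normal((p, n))   # A : p x n
B = rng.standard_normal((m, p))   # B : m x p, so A X B is square (p x p)
X = rng.standard_normal((n, m))
Y = rng.standard_normal((n, m))

def f(X):
    return np.trace(A @ X @ B)

# Gradient of tr(AXB) under this article's transposed-indexing convention: B A (m x n).
G = B @ A

# tr(G Y) must equal the directional derivative of f at X in the direction Y.
h = 1e-6
numeric = (f(X + h * Y) - f(X - h * Y)) / (2 * h)
assert abs(np.trace(G @ Y) - numeric) < 1e-5
```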

## Relation to other derivatives

There are other commonly used definitions for derivatives in multivariable spaces. For topological vector spaces, the most familiar is the Fréchet derivative, which makes use of a norm. In the case of matrix spaces there are several matrix norms available, all of which are equivalent since the space is finite-dimensional. However, the matrix derivative defined in this article makes no use of any topology on M(n,m). It is defined solely in terms of partial derivatives, which are sensitive only to variations in a single dimension at a time and thus are not bound by the full differentiable structure of the space. For example, it is possible for a map to have all partial derivatives exist at a point and yet not be continuous in the topology of the space; see, for example, Hartogs' theorem. The matrix derivative is therefore not a special case of the Fréchet derivative for matrix spaces, but rather a convenient notation for keeping track of many partial derivatives when doing calculations; in the case that a function is Fréchet differentiable, the two derivatives agree.

## Usages

Matrix calculus is used for deriving optimal stochastic estimators, often involving the use of Lagrange multipliers. This includes the derivation of:

• the Kalman filter
• the Wiener filter

## Alternatives

The tensor index notation with its Einstein summation convention is very similar to the matrix calculus, except one writes only a single component at a time. It has the advantage that one can easily manipulate arbitrarily high-rank tensors, whereas tensors of rank higher than two are quite unwieldy with matrix notation. Note that a matrix can be considered simply a tensor of rank two.