Belief propagation, also known as the sum-product algorithm, is an iterative algorithm for computing marginals of functions on a graphical model most commonly used in artificial intelligence and information theory. In Mathematics, Computing, Linguistics and related subjects an algorithm is a sequence of finite instructions often used for Calculation In Probability theory, given two jointly distributed Random variables X and Y, the marginal distribution of X is simply the Probability In Probability theory, Statistics, and Machine learning, a graphical model (GM is a graph that represents independencies among Random variables Information theory is a branch of Applied mathematics and Electrical engineering involving the quantification of Information. Judea Pearl in 1982[1] formulated this algorithm on trees, and Kim and Pearl (in 1983)[2] on polytrees. Judea Pearl is a Computer scientist and Philosopher, best known for developing the probabilistic approach to Artificial intelligence, in Pearl (1988)[3] has then suggested this algorithm as an approximation for general (loopy) network. It is an efficient inference algorithm on trees and has demonstrated empirical success in numerous applications including low-density parity-check codes, turbo codes, free energy approximation, and satisfiability. In Graph theory, a tree is a graph in which any two vertices are connected by exactly one path. In Information theory, a low-density parity-check code (LDPC code is an Error correcting code, a method of transmitting a message over a noisy transmission In Electrical engineering and Digital communications, turbo codes (originally in French Turbocodes) are a class of high-performance error correction In Thermodynamics, the term thermodynamic free energy refers to the amount of work that can be extracted from a System, and is helpful in Engineering It is commonly used in pairwise Markov random fields (which have a maximum clique size of 2), Bayesian networks, and factor graphs. A Markov network, or Markov random field, is a model of the (full Joint probability distribution of a set \mathcal{X} of Random variables In Graph theory, a clique in an Undirected graph G is a set of vertices V such that for every two vertices in V there exists an edge connecting A Bayesian network (or a belief network) is a Probabilistic graphical model that represents a set of Variables and their probabilistic independencies In Mathematics, a factor graph is an XF- Bipartite graph where X=\{X_1X_2\dotsX_n\} is a set of variables and F=\{f_1f_2\dotsf_m\}
Recall that the marginal distribution of a single random variable Xi is simply the summation of a joint distribution over all variables except Xi, and let
be an assignment of all variables in the joint distribution:

For the purposes of explaining this algorithm, consider the marginal function, which is simply an unnormalized marginal distribution with a generic global function
:

Contents |
This algorithm functions by passing positive real vector valued messages across edges in a graphical model. In Probability theory, given two jointly distributed Random variables X and Y, the marginal distribution of X is simply the Probability A random variable is a rigorously defined mathematical entity used mainly to describe Chance and Probability in a mathematical way In the study of Probability, given two Random variables X and Y, the joint distribution of X and Y is the distribution The concept of a normalizing constant arises in Probability theory and a variety of other areas of Mathematics. In Mathematics, the real numbers may be described informally in several different ways More precisely, in trees: a vertex sends a message to an adjacent vertex if (a) it has received messages from all of its other adjacent vertices and (b) hasn't already sent one. In Graph theory, a tree is a graph in which any two vertices are connected by exactly one path. For other uses see Vertex. In Graph theory, a vertex (plural vertices) or node is the fundamental unit out Adjacent is an adjective meaning contiguous, adjoining or abutting. Adjacent is an adjective meaning contiguous, adjoining or abutting. So in the first iteration, the algorithm sends messages from all leaf nodes to each of the lone vertices adjacent to those respective leaves and continues sending messages in this manner until all messages have been sent exactly once, hence explaining the term propagation. In Computer science, a leaf node or external node is a node of a Tree data structure that has zero Child nodes Often leaf nodes are It is easily proven that all messages will be sent (there are twice the number of edges of them). Upon termination, the marginal of a variable is simply the product of the incoming messages of all its adjacent vertices. A simple proof of this fact, though somewhat messy, can be done by mathematical induction. Mathematical induction is a method of Mathematical proof typically used to establish that a given statement is true of all Natural numbers It is done by proving that
The message definitions will be described in the factor graph setting, as the algorithms for other graphical models are nearly identical. In Mathematics, a factor graph is an XF- Bipartite graph where X=\{X_1X_2\dotsX_n\} is a set of variables and F=\{f_1f_2\dotsf_m\} In Probability theory, Statistics, and Machine learning, a graphical model (GM is a graph that represents independencies among Random variables Since factor graphs have variable and factor nodes, there are two types of messages to define:
A variable message is a real-valued function that is a message sent from a variable to a factor, and defined as

A factor message is a real-valued function that is a message sent from a factor to a variable, and defined as

where N(u) is defined as the set of neighbours (adjacent vertices in a graph) of a vertex u. In Mathematics, a factor graph is an XF- Bipartite graph where X=\{X_1X_2\dotsX_n\} is a set of variables and F=\{f_1f_2\dotsf_m\} The Mathematical concept of a function expresses dependence between two quantities one of which is given (the independent variable, argument of the function The Mathematical concept of a function expresses dependence between two quantities one of which is given (the independent variable, argument of the function In Mathematics and Computer science, a graph is the basic object of study in Graph theory.
is an assignment to the vertices affecting fm (i. e. vertices in N(fm)).
As mentioned in the description of the algorithm, the marginal of Xi can be computed in the following manner:

One can also compute the marginal of a factor fj, equivalently, the marginal of the subset of variables Xj in the following manner:

Curiously, nearly the same algorithm is used in general graphs. In Mathematics and Computer science, a graph is the basic object of study in Graph theory. The algorithm is then sometimes called "loopy" belief propagation, because graphs typically contain cycles, or loops. Cycle in Graph theory and Computer science has several meanings A closed walk with repeated vertices allowed The procedure must be adjusted slightly because graphs might not contain any leaves. Instead, one initializes all variable messages to 1 and uses the same message definitions above, updating all messages at every iteration (although messages coming from known leaves or tree-structured subgraphs may no longer need updating after sufficient iterations). It is easy to show that in a tree, the message definitions of this modified procedure will converge to the set of message definitions given above within a number of iterations equal to the diameter of the tree. Geometry, a diameter of a Circle is any straight Line segment that passes through the center of the circle and whose Endpoints are on the
The precise conditions under which loopy belief propagation will converge are still not well understood; it is known that graphs containing a single loop will converge to a correct solution. [4] Several sufficient (but not necessary) conditions for convergence of loopy belief propagation to a unique fixed point exist. [5] There exist graphs which will fail to converge, or which will oscillate between multiple states over repeated iterations. Techniques like EXIT charts can provide an approximate visualisation of the progress of belief propagation and an approximate test for convergence. An Extrinsic information transfer chart, commonly called an EXIT chart, is a technique to aid the construction of good iteratively-decoded Error-correcting codes
There are other approximate methods for marginalization including variational methods and Monte Carlo methods. The variational method is in Quantum mechanics, one way of finding approximations to the lowest energy eigenstate or Ground state, and some excited states Monte Carlo methods are a class of Computational Algorithms that rely on repeated Random sampling to compute their results
One method of exact marginalization in general graphs is called the junction tree algorithm, which is simply belief propagation on a modified graph guaranteed to be a tree. The junction tree algorithm is a method used in Machine learning for Exact marginalization in general graphs In essence it entails performing Belief The basic premise is to eliminate cycles by clustering them into single nodes.
A similar algorithm is commonly referred to as the Viterbi algorithm, but also known as the max-product or min-sum algorithm, which solves the related problem of maximization, or most probable explanation. The Viterbi algorithm is a Dynamic programming Algorithm for finding the most likely sequence of hidden states &ndash called the Viterbi path Instead of attempting to solve the marginal, the goal here is to find the values
that maximises the global function (i. e. most probable values in a probabilistic setting), and it can be defined using the arg max:

An algorithm that solves this problem is nearly identical to belief propagation, with the sums replaced by maxima in the definitions. In Mathematics, arg max (or argmax) stands for the argument of the maximum, that is to say the value of the given argument for which the value
It is worth noting that inference problems like marginalization and maximization are NP-hard to solve exactly and approximately (at least for relative error) in a graphical model. Inference is the act or process of deriving a Conclusion based solely on what one already knows NP-hard (nondeterministic Polynomial-time hard in Computational complexity theory, is a class of problems informally "at least as hard as the hardest problems In the mathematical field of Numerical analysis, the approximation error in some data is the discrepancy between an exact value and some approximation to it More precisely, the marginalization problem defined above is #P-complete and maximization is NP-complete. #P, pronounced "sharp P" or "number P" is a complexity class in Computational complexity theory. In Computational complexity theory, the Complexity class NP-complete (abbreviated NP-C or NPC) is a class of problems having two properties
The sum-product algorithm is related to the calculation of free energy in thermodynamics. In Thermodynamics, the term thermodynamic free energy refers to the amount of work that can be extracted from a System, and is helpful in Engineering In Physics, thermodynamics (from the Greek θερμη therme meaning " Heat " and δυναμις dynamis meaning " A probability distribution

(as per the factor graph representation) can be viewed as a measure of the internal energy present in a system, computed as

The free energy of the system is then

It can then be shown that the points of convergence of the sum-product algorithm represent the points where the free energy in such a system is minimized. In Thermodynamics, the internal energy of a Thermodynamic system, or a body with well-defined boundaries, denoted by  U, or sometimes  Similarly, it can be shown that a fixed point of the iterative belief propagation algorithm in graphs with cycles is a stationary point of a free energy approximation.
Belief propagation algorithms are normally presented as messages update equations on a factor graph, involving messages between variable nodes and their neighboring factor nodes and vice versa. Considering messages between regions in a graph is one way of generalizing the belief propagation algorithm. There are several ways of defining the set of regions in a graph that can exchange messages. One method uses ideas introduced by Kikuchi in the physics literature, and is known as Kikuchi's cluster variation method.
Improvements in the performance of belief propagation algorithms are also achievable by breaking the replicas symmetry in the distributions of the fields (messages). This generalization leads to a new kind of algorithm called Survey Propagation (SP), which have proved to be very efficient in NP-complete problems like satisfiability and graph coloring. In Computational complexity theory, the Complexity class NP-complete (abbreviated NP-C or NPC) is a class of problems having two properties In Graph theory, graph coloring is a special case of Graph labeling; it is an assignment of labels traditionally called "colors" to elements of a
The cluster variational method and the survey propagation algorithms are two different improvements to belief propagation. The name generalized survey propagation (GSP) is waiting to be assigned to the algorithm that merges both generalizations.