In probability theory, Bayes' theorem (often called Bayes' law) relates the conditional and marginal probabilities of two random events. It is often used to compute posterior probabilities given observations. For example, a patient may be observed to have certain symptoms. Bayes' theorem can be used to compute the probability that a proposed diagnosis is correct, given that observation. (See example 2)
As a formal theorem, Bayes' theorem is valid in all common interpretations of probability. However, it plays a central role in the debate around the foundations of statistics: frequentist and Bayesian interpretations disagree about the ways in which probabilities should be assigned in applications. Frequentists assign probabilities to random events according to their frequencies of occurrence or to subsets of populations as proportions of the whole, while Bayesians describe probabilities in terms of beliefs and degrees of uncertainty. The articles on Bayesian probability and frequentist probability discuss these debates at greater length.
Bayes' theorem relates the conditional and marginal probabilities of events A and B, where B has a non-vanishing probability:
$$P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)}$$
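Each term in Bayes' theorem has a conventional name: P(A) is the prior probability or marginal probability of A ("prior" in the sense that it does not take into account any information about B); P(A|B) is the conditional probability of A given B, also called the posterior probability because it depends on the specified value of B; P(B|A) is the conditional probability of B given A; and P(B) is the prior or marginal probability of B, which acts as a normalizing constant.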
Intuitively, Bayes' theorem in this form describes the way in which one's beliefs about observing 'A' are updated by having observed 'B'.
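As a minimal sketch of this update, the theorem can be written as a single function. The function name `bayes_posterior` and the numbers passed to it are illustrative assumptions, not values from the article.

```python
def bayes_posterior(prior_a: float, likelihood_b_given_a: float, marginal_b: float) -> float:
    """Return P(A|B) = P(B|A) * P(A) / P(B)."""
    if marginal_b <= 0:
        # The theorem requires B to have non-vanishing probability.
        raise ValueError("P(B) must be greater than zero")
    return likelihood_b_given_a * prior_a / marginal_b


# Assumed illustrative inputs: P(A) = 0.3, P(B|A) = 0.8, P(B) = 0.5.
posterior = bayes_posterior(prior_a=0.3, likelihood_b_given_a=0.8, marginal_b=0.5)
print(posterior)
```

With these assumed inputs, observing B raises the probability of A from 0.3 to roughly 0.48, because B is more likely when A holds than it is overall.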
Bayes' theorem can also be interpreted in terms of likelihood:
$$P(A \mid B) \propto L(A \mid B)\, P(A)$$
Here L(A|B) is the likelihood of A given fixed B. The rule is then an immediate consequence of the relationship
$$P(B \mid A) = L(A \mid B).$$
With this terminology, the theorem may be paraphrased as
$$\text{posterior} = \frac{\text{likelihood} \times \text{prior}}{\alpha}$$
(where α is a normalising constant equal to P(B)).
In words: the posterior probability is proportional to the product of the prior probability and the likelihood.
To derive the theorem, we start from the definition of conditional probability. The probability of event A given event B is
$$P(A \mid B) = \frac{P(A \cap B)}{P(B)}.$$
Equivalently, the probability of event B given event A is
$$P(B \mid A) = \frac{P(A \cap B)}{P(A)}.$$
Rearranging and combining these two equations, we find
$$P(A \mid B)\, P(B) = P(A \cap B) = P(B \mid A)\, P(A).$$
This lemma is sometimes called the product rule for probabilities. Dividing both sides by P(B), provided that it is non-zero, we obtain Bayes' theorem:
$$P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)}.$$
Bayes' theorem is often embellished by noting that
$$P(B) = P(A \cap B) + P(A^C \cap B) = P(B \mid A)\, P(A) + P(B \mid A^C)\, P(A^C)$$
where A^C is the complementary event of A (often called "not A"). So the theorem can be restated as
$$P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B \mid A)\, P(A) + P(B \mid A^C)\, P(A^C)}.$$
More generally, where {A_i} forms a partition of the event space,
$$P(A_i \mid B) = \frac{P(B \mid A_i)\, P(A_i)}{\sum_j P(B \mid A_j)\, P(A_j)}$$
for any A_i in the partition.
See also the law of total probability.
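The partition form can be sketched in a few lines: the denominator is the law of total probability, and dividing each joint term by it gives the posterior over the whole partition. The function name `posterior_over_partition` and the three-hypothesis numbers are assumptions chosen for illustration.

```python
def posterior_over_partition(priors, likelihoods):
    """Return [P(A_i|B)] given [P(A_i)] and [P(B|A_i)] for a partition {A_i}."""
    if len(priors) != len(likelihoods):
        raise ValueError("priors and likelihoods must have the same length")
    # Joint terms P(B|A_i) * P(A_i), one per element of the partition.
    joint = [p * l for p, l in zip(priors, likelihoods)]
    # Law of total probability: P(B) is the sum of the joint terms.
    evidence = sum(joint)
    if evidence == 0:
        raise ValueError("P(B) is zero; the posterior is undefined")
    return [j / evidence for j in joint]


# Assumed illustrative partition with three hypotheses.
priors = [0.5, 0.3, 0.2]
likelihoods = [0.1, 0.4, 0.7]
posteriors = posterior_over_partition(priors, likelihoods)
print(posteriors)
```

The returned values always sum to 1, which is a convenient sanity check on the inputs.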
Bayes' theorem can also be written neatly in terms of a likelihood ratio Λ and odds O as
$$O(A \mid B) = O(A) \cdot \Lambda(A \mid B)$$
where
$$O(A \mid B) = \frac{P(A \mid B)}{P(A^C \mid B)}$$
are the odds of A given B,
and
$$O(A) = \frac{P(A)}{P(A^C)}$$
are the odds of A by itself,
while
$$\Lambda(A \mid B) = \frac{L(A \mid B)}{L(A^C \mid B)} = \frac{P(B \mid A)}{P(B \mid A^C)}$$
is the likelihood ratio.
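The odds form can be checked against the probability form with exact rational arithmetic. The sketch below uses assumed values P(A) = 1/4, P(B|A) = 3/4 and P(B|A^C) = 1/4; the helper names are hypothetical.

```python
from fractions import Fraction


def odds(p: Fraction) -> Fraction:
    """Convert a probability p into odds p / (1 - p)."""
    return p / (1 - p)


def probability(o: Fraction) -> Fraction:
    """Convert odds o back into a probability o / (1 + o)."""
    return o / (1 + o)


# Assumed illustrative inputs.
p_a = Fraction(1, 4)              # P(A)
p_b_given_a = Fraction(3, 4)      # P(B|A)
p_b_given_not_a = Fraction(1, 4)  # P(B|A^C)

# Odds form: O(A|B) = O(A) * Lambda(A|B).
likelihood_ratio = p_b_given_a / p_b_given_not_a
posterior_odds = odds(p_a) * likelihood_ratio

# Probability form, for comparison.
p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)
posterior_direct = p_b_given_a * p_a / p_b

print(probability(posterior_odds), posterior_direct)
```

Both routes give the same posterior. The odds form is convenient because the normalizing constant P(B) never has to be computed.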
There is also a version of Bayes' theorem for continuous distributions. It is somewhat harder to derive, since probability densities, strictly speaking, are not probabilities, so Bayes' theorem has to be established by a limit process; see Papoulis (citation below), Section 7.3 for an elementary derivation. Bayes' theorem for probability densities is formally similar to the theorem for probabilities:
$$f_X(x \mid Y=y) = \frac{f_{X,Y}(x,y)}{f_Y(y)} = \frac{f_Y(y \mid X=x)\, f_X(x)}{f_Y(y)}$$
There is an analogous statement of the law of total probability, which is used in the denominator:
$$f_Y(y) = \int_{-\infty}^{\infty} f_Y(y \mid X=\xi)\, f_X(\xi)\, d\xi$$
As in the discrete case, the terms have standard names.
$f_{X,Y}(x,y)$ is the joint distribution of X and Y,
$f_X(x \mid Y=y)$ is the posterior distribution of X given Y=y,
$f_Y(y \mid X=x) = L(x \mid y)$ is (as a function of x) the likelihood function of X given Y=y, and
$f_X(x)$ and $f_Y(y)$ are the marginal distributions of X and Y respectively, with $f_X(x)$ being the prior distribution of X.
Given two absolutely continuous probability measures $P \sim Q$ on the probability space $(\Omega, \mathcal{F})$ and a sigma-algebra $\mathcal{G} \subset \mathcal{F}$, the abstract Bayes theorem for an $\mathcal{F}$-measurable random variable X becomes
$$E_P[X \mid \mathcal{G}] = \frac{E_Q\left[\frac{dP}{dQ} X \,\middle|\, \mathcal{G}\right]}{E_Q\left[\frac{dP}{dQ} \,\middle|\, \mathcal{G}\right]}.$$
This formulation is used in Kalman filtering to find Zakai equations. It is also used in financial mathematics for change of numeraire techniques.
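Theorems analogous to Bayes' theorem hold in problems with more than two variables. For example:
$$P(A \mid B, C) = \frac{P(A)\, P(B \mid A)\, P(C \mid A, B)}{P(B)\, P(C \mid B)}$$
This can be derived in a few steps from Bayes' theorem and the definition of conditional probability:
$$P(A \mid B, C) = \frac{P(A, B, C)}{P(B, C)} = \frac{P(A, B, C)}{P(B)\, P(C \mid B)} = \frac{P(C \mid A, B)\, P(A, B)}{P(B)\, P(C \mid B)} = \frac{P(A)\, P(B \mid A)\, P(C \mid A, B)}{P(B)\, P(C \mid B)}$$
Similarly,
$$P(A \mid B, C) = \frac{P(B \mid A, C)\, P(A \mid C)}{P(B \mid C)}$$
which can be regarded as a conditional Bayes' theorem, and can be derived as follows:
$$P(A \mid B, C) = \frac{P(A, B, C)}{P(B, C)} = \frac{P(B \mid A, C)\, P(A, C)}{P(C)\, P(B \mid C)} = \frac{P(B \mid A, C)\, P(A \mid C)\, P(C)}{P(C)\, P(B \mid C)} = \frac{P(B \mid A, C)\, P(A \mid C)}{P(B \mid C)}$$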
A general strategy is to work with a decomposition of the joint probability, and to marginalize (integrate) over the variables that are not of interest. Depending on the form of the decomposition, it may be possible to prove that some integrals must be 1, and thus they fall out of the decomposition; exploiting this property can reduce the computations very substantially. A Bayesian network, for example, specifies a factorization of a joint distribution of several variables in which the conditional probability of any one variable given the remaining ones takes a particularly simple form (see Markov blanket).
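The marginalization strategy can be illustrated on a small discrete case, where integration becomes summation. The sketch below builds a hypothetical joint distribution over three binary variables A, B and C (the weights are made up for illustration) and computes P(A|B,C) two ways: directly from marginals of the joint, and through the three-variable chain decomposition.

```python
from fractions import Fraction
from itertools import product

NAMES = ("A", "B", "C")

# Hypothetical joint distribution: weights 1..8 over the eight outcomes,
# normalized by their sum (36) so the probabilities add up to 1.
outcomes = list(product((0, 1), repeat=3))
joint = {outcome: Fraction(w, 36) for w, outcome in enumerate(outcomes, start=1)}


def prob(**fixed):
    """Marginal probability of the named variables taking the given values,
    obtained by summing the joint over every variable not mentioned."""
    return sum(
        p for outcome, p in joint.items()
        if all(outcome[NAMES.index(name)] == value for name, value in fixed.items())
    )


def cond(target, given):
    """Conditional probability P(target | given), both passed as dicts."""
    return prob(**target, **given) / prob(**given)


# Direct route: P(A|B,C) = P(A,B,C) / P(B,C).
direct = prob(A=1, B=1, C=1) / prob(B=1, C=1)

# Chain route: P(A) P(B|A) P(C|A,B) / (P(B) P(C|B)).
chain = (
    prob(A=1) * cond({"B": 1}, {"A": 1}) * cond({"C": 1}, {"A": 1, "B": 1})
) / (prob(B=1) * cond({"C": 1}, {"B": 1}))

print(direct, chain)
```

Both routes agree exactly because `Fraction` avoids rounding error. With many variables, the full joint table grows exponentially, which is why factorizations such as Bayesian networks matter in practice.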
Suppose there are two bowls full of cookies. Bowl #1 has 10 chocolate chip cookies and 30 plain cookies, while bowl #2 has 20 of each. Fred picks a bowl at random, and then picks a cookie at random. We may assume there is no reason to believe Fred treats one bowl differently from another, likewise for the cookies. The cookie turns out to be a plain one. How probable is it that Fred picked it out of bowl #1?
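Intuitively, this should be greater than half since bowl #1 contains the same number of cookies as bowl #2, yet it has more plain cookies.
We can clarify the situation by rephrasing the question to "what's the probability that Fred picked bowl #1, given that he has a plain cookie?" The event A is that Fred picked bowl #1, and the event B is that Fred picked a plain cookie. To compute P(A|B), we first need to know:
P(A), the probability that Fred picked bowl #1 regardless of any other information. Since Fred treats both bowls equally, it is 0.5.
P(B), the probability of getting a plain cookie regardless of any information on the bowls. By the law of total probability, it is 0.75 × 0.5 + 0.5 × 0.5 = 0.625.
P(B|A), the probability of getting a plain cookie given that Fred picked bowl #1. Bowl #1 holds 30 plain cookies out of 40, so it is 0.75.
Given all this information, we can compute the probability of Fred having selected bowl #1 given that he got a plain cookie by substitution:
$$P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)} = \frac{0.75 \times 0.5}{0.625} = 0.6$$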
As we expected, it is more than half.
It is often helpful when calculating conditional probabilities to create a simple table containing the number of occurrences of each outcome, or the relative frequencies of each outcome, for each of the independent variables. The tables below illustrate the use of this method for the cookies.
Number of cookies in each bowl by type of cookie

| | Bowl #1 | Bowl #2 | Totals |
|---|---|---|---|
| Chocolate chip | 10 | 20 | 30 |
| Plain | 30 | 20 | 50 |
| Total | 40 | 40 | 80 |

Relative frequency of cookies in each bowl by type of cookie

| | Bowl #1 | Bowl #2 | Totals |
|---|---|---|---|
| Chocolate chip | 0.125 | 0.250 | 0.375 |
| Plain | 0.375 | 0.250 | 0.625 |
| Total | 0.500 | 0.500 | 1.000 |

The second table is derived from the first by dividing each entry by the total number of cookies under consideration, i.e. dividing each number by 80.
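The table method can be reproduced in a few lines. The sketch below starts from the cookie counts given in the example, converts them to relative frequencies, and reads off the conditional probability. Treating all 80 cookies as equally likely is valid here only because the two bowls hold the same number of cookies and are chosen with equal probability; the variable names are illustrative.

```python
from fractions import Fraction

# Counts from the example: (bowl, cookie type) -> number of cookies.
counts = {
    ("bowl1", "chocolate chip"): 10,
    ("bowl1", "plain"): 30,
    ("bowl2", "chocolate chip"): 20,
    ("bowl2", "plain"): 20,
}

total = sum(counts.values())  # 80 cookies in all

# Relative frequency of each (bowl, type) cell: divide every count by 80.
rel_freq = {cell: Fraction(n, total) for cell, n in counts.items()}

# Marginal relative frequency of plain cookies (the "Plain" row total).
p_plain = sum(p for (bowl, kind), p in rel_freq.items() if kind == "plain")

# Conditional probability of bowl #1 given a plain cookie.
p_bowl1_given_plain = rel_freq[("bowl1", "plain")] / p_plain

print(p_bowl1_given_plain)  # 3/5
```

The result, 3/5, matches the value of 0.6 obtained by substituting into Bayes' theorem.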
Bayes' theorem is useful in evaluating the result of drug tests. Suppose a certain drug test is 99% sensitive and 99% specific, that is, the test will correctly identify a drug user as testing positive 99% of the time, and will correctly identify a non-user as testing negative 99% of the time. This would seem to be a relatively accurate test, but Bayes' theorem will reveal a potential flaw. Let's assume a corporation decides to test its employees for opium use, and 0.5% of the employees use the drug. We want to know the probability that, given a positive drug test, an employee is actually a drug user. Let "D" be the event of being a drug user and "N" indicate being a non-user. Let "+" be the event of a positive drug test. We need to know the following:
P(D), the probability that an employee is a drug user, regardless of any other information. This is 0.005, since 0.5% of the employees are drug users.
P(N), the probability that an employee is not a drug user. This is 1 − P(D), or 0.995.
P(+|D), the probability that the test is positive, given that the employee is a drug user. This is 0.99, since the test is 99% sensitive.
P(+|N), the probability that the test is positive, given that the employee is not a drug user. This is 0.01, since the test is 99% specific.
P(+), the probability of a positive test, regardless of other information. By the law of total probability, this is P(+|D) P(D) + P(+|N) P(N) = 0.99 × 0.005 + 0.01 × 0.995 = 0.0149.
Given this information, we can compute the posterior probability P(D|+) of an employee who tested positive actually being a drug user:
$$P(D \mid +) = \frac{P(+ \mid D)\, P(D)}{P(+ \mid D)\, P(D) + P(+ \mid N)\, P(N)} = \frac{0.99 \times 0.005}{0.99 \times 0.005 + 0.01 \times 0.995} = \frac{0.00495}{0.0149} \approx 0.3322$$
Despite the high accuracy of the test, the probability that an employee who tested positive actually did use drugs is only about 33%, so it is actually more likely that the employee is not a drug user. The rarer the condition for which we are testing, the greater the percentage of positive tests that will be false positives.
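The effect of rarity on false positives can be explored with a short sketch. The function name `positive_predictive_value` is an assumption for illustration; the first call uses the figures from the example (0.5% prevalence, 99% sensitivity, 99% specificity), while the other prevalence values are made up to show the trend.

```python
def positive_predictive_value(prevalence: float, sensitivity: float, specificity: float) -> float:
    """Return P(D|+), the probability of being a user given a positive test."""
    true_positives = sensitivity * prevalence                    # P(+|D) P(D)
    false_positives = (1.0 - specificity) * (1.0 - prevalence)   # P(+|N) P(N)
    return true_positives / (true_positives + false_positives)


# Figures from the example: about one third of positives are real users.
ppv = positive_predictive_value(prevalence=0.005, sensitivity=0.99, specificity=0.99)
print(f"prevalence 0.5%: P(D|+) = {ppv:.4f}")

# Assumed additional prevalence values, to show how rarity changes the answer.
for prevalence in (0.001, 0.05, 0.5):
    value = positive_predictive_value(prevalence, 0.99, 0.99)
    print(f"prevalence {prevalence:.1%}: P(D|+) = {value:.4f}")
```

Holding the test's accuracy fixed, the posterior falls as the condition becomes rarer, which is the point made in the text.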
Applications of Bayes' theorem often assume the philosophy underlying Bayesian probability that uncertainty and degrees of belief can be measured as probabilities. One such example follows. For additional worked out examples, including simpler examples, please see the article on the examples of Bayesian inference.
We describe the marginal probability distribution of a variable A as the prior probability distribution or simply the 'prior'. The conditional distribution of A given the "data" B is the posterior probability distribution or just the 'posterior'.
Suppose we wish to know about the proportion r of voters in a large population who will vote "yes" in a referendum. Let n be the number of voters in a random sample (chosen with replacement, so that we have statistical independence) and let m be the number of voters in that random sample who will vote "yes". Suppose that we observe n = 10 voters and m = 7 say they will vote yes. From Bayes' theorem we can calculate the probability distribution function for r using
$$f(r \mid n=10, m=7) = \frac{f(m=7 \mid r, n=10)\, f(r)}{\int_0^1 f(m=7 \mid r, n=10)\, f(r)\, dr}$$
From this we see that from the prior probability density function f(r) and the likelihood function L(r) = f(m = 7|r, n = 10), we can compute the posterior probability density function f(r|n = 10, m = 7).
The prior probability density function f(r) summarizes what we know about the distribution of r in the absence of any observation. We provisionally assume in this case that the prior distribution of r is uniform over the interval [0, 1]. That is, f(r) = 1. If some additional background information is found, we should modify the prior accordingly. However before we have any observations, all outcomes are equally likely.
Under the assumption of random sampling, choosing voters is just like choosing balls from an urn. The likelihood function L(r) = P(m = 7|r, n = 10) for such a problem is just the probability of 7 successes in 10 trials for a binomial distribution.
$$P(m=7 \mid r, n=10) = \binom{10}{7} r^7 (1-r)^3$$
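As with the prior, the likelihood is open to revision: more complex assumptions will yield more complex likelihood functions. Maintaining the current assumptions, we compute the normalizing factor,
$$\int_0^1 P(m=7 \mid r, n=10)\, f(r)\, dr = \int_0^1 \binom{10}{7} r^7 (1-r)^3 \cdot 1\, dr = \binom{10}{7} \frac{1}{1320}$$
and the posterior distribution for r is then
$$f(r \mid n=10, m=7) = \frac{\binom{10}{7} r^7 (1-r)^3 \cdot 1}{\binom{10}{7} \frac{1}{1320}} = 1320\, r^7 (1-r)^3$$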
for r between 0 and 1, inclusive.
One may be interested in the probability that more than half the voters will vote "yes". The prior probability that more than half the voters will vote "yes" is 1/2, by the symmetry of the uniform distribution. In comparison, the posterior probability that more than half the voters will vote "yes", i.e., the conditional probability given the outcome of the opinion poll (that seven of the 10 voters questioned will vote "yes"), is
$$1320 \int_{1/2}^{1} r^7 (1-r)^3\, dr \approx 0.887$$
which is about an "89% chance".
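The integrals in this example can be checked exactly, since r^7 (1 − r)^3 is a polynomial. The sketch below expands (1 − r)^k with the binomial theorem and integrates term by term using rational arithmetic; the helper name `kernel_integral` is an assumption for illustration.

```python
from fractions import Fraction
from math import comb


def kernel_integral(m: int, k: int, lo, hi) -> Fraction:
    """Exact integral of r**m * (1 - r)**k over [lo, hi].

    (1 - r)**k is expanded as sum_j C(k, j) * (-1)**j * r**j, and each
    monomial r**(m + j) is integrated in closed form.
    """
    lo, hi = Fraction(lo), Fraction(hi)
    total = Fraction(0)
    for j in range(k + 1):
        power = m + j + 1
        coefficient = Fraction(comb(k, j) * (-1) ** j, power)
        total += coefficient * (hi ** power - lo ** power)
    return total


n, m = 10, 7  # sample size and number of "yes" answers from the example

# Normalizing factor: C(10, 7) times the integral over [0, 1] (uniform prior).
normalizer = comb(n, m) * kernel_integral(m, n - m, 0, 1)

# Posterior probability that r > 1/2; the binomial coefficient cancels.
tail = kernel_integral(m, n - m, Fraction(1, 2), 1) / kernel_integral(m, n - m, 0, 1)

print(normalizer, tail, float(tail))
```

The kernel integrates to 1/1320 over [0, 1], confirming the constant in the posterior density, and the tail probability comes out to 227/256, about 0.887.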
We are presented with three doors - red, green, and blue - one of which has a prize. We choose the red door, which is not opened until the presenter performs an action. The presenter, who knows which door the prize is behind and who must open a door, but is not permitted to open the door we have picked or the door with the prize, opens the blue door and reveals that there is no prize behind it. The presenter then asks if we wish to change our mind about our initial selection of red. What is the probability that the prize is behind each of the green and red doors?
Let us call the situation that the prize is behind a given door $A_r$, $A_g$, and $A_b$.
To start with, $P(A_r) = P(A_g) = P(A_b) = \frac{1}{3}$, and to make things simpler we shall assume that we have already picked the red door.
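Let us call B "the presenter opens the blue door". Without any prior knowledge, we would assign this a probability of 50%.
If the prize is behind the red door, the presenter is free to pick between the green and the blue door at random, so $P(B \mid A_r) = \frac{1}{2}$.
If the prize is behind the green door, the presenter must pick the blue door, so $P(B \mid A_g) = 1$.
If the prize is behind the blue door, the presenter must pick the green door, so $P(B \mid A_b) = 0$.
Thus,
$$P(A_r \mid B) = \frac{P(B \mid A_r)\, P(A_r)}{P(B)} = \frac{\frac{1}{2} \cdot \frac{1}{3}}{\frac{1}{2}} = \frac{1}{3}$$
$$P(A_g \mid B) = \frac{P(B \mid A_g)\, P(A_g)}{P(B)} = \frac{1 \cdot \frac{1}{3}}{\frac{1}{2}} = \frac{2}{3}$$
$$P(A_b \mid B) = \frac{P(B \mid A_b)\, P(A_b)}{P(B)} = \frac{0 \cdot \frac{1}{3}}{\frac{1}{2}} = 0$$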
Note how this depends on the value of P(B).
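The dependence on P(B) can be made explicit by computing it from the presenter's rules rather than assuming it. In the sketch below, the likelihoods follow from the rules stated in the problem (the presenter never opens the chosen red door or the prize door, and picks at random when two doors are allowed); the dictionary layout and names are illustrative.

```python
from fractions import Fraction

# Prior: the prize is equally likely to be behind each door.
prior = {"red": Fraction(1, 3), "green": Fraction(1, 3), "blue": Fraction(1, 3)}

# Likelihood of B = "the presenter opens the blue door" for each prize
# location, given that we picked red.
likelihood = {
    "red": Fraction(1, 2),   # presenter chooses green or blue at random
    "green": Fraction(1),    # blue is the only door the presenter may open
    "blue": Fraction(0),     # presenter may not reveal the prize
}

# Law of total probability gives P(B).
p_b = sum(likelihood[door] * prior[door] for door in prior)

# Bayes' theorem for each prize location.
posterior = {door: likelihood[door] * prior[door] / p_b for door in prior}

print(p_b)
print(posterior)
```

The computed P(B) equals the 50% assigned in the text, and the posterior puts 2/3 on the green door, so switching doubles the chance of winning under these rules.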
An investigation by a statistics professor (Stigler 1983) suggests that Bayes' theorem was discovered by Nicholas Saunderson some time before Bayes.
Bayes' theorem is named after the Reverend Thomas Bayes (1702–1761), who studied how to compute a distribution for the parameter of a binomial distribution (to use modern terminology). His friend, Richard Price, edited and presented the work in 1763, after Bayes' death, as An Essay towards solving a Problem in the Doctrine of Chances. Pierre-Simon Laplace replicated and extended these results in an essay of 1774, apparently unaware of Bayes' work.
One of Bayes' results (Proposition 5) gives a simple description of conditional probability, and shows that it can be expressed independently of the order in which things occur:
Note that the expression says nothing about the order in which the events occurred; it measures correlation, not causation. His preliminary results, in particular Propositions 3, 4, and 5, imply the result now called Bayes' Theorem (as described above), but it does not appear that Bayes himself emphasized or focused on that result.
Bayes' main result (Proposition 9 in the essay) is the following: assuming a uniform distribution for the prior distribution of the binomial parameter p, the probability that p is between two values a and b is
$$\frac{\int_a^b \binom{n+m}{m} p^m (1-p)^n\, dp}{\int_0^1 \binom{n+m}{m} p^m (1-p)^n\, dp}$$
where m is the number of observed successes and n the number of observed failures.
What is "Bayesian" about Proposition 9 is that Bayes presented it as a probability for the parameter p. So, one can compute probability for an experimental outcome, but also for the parameter which governs it, and the same algebra is used to make inferences of either kind.
Bayes states his question in a way that might make the idea of assigning a probability distribution to a parameter palatable to a frequentist. He supposes that a billiard ball is thrown at random onto a billiard table, and that the probabilities p and q are the probabilities that subsequent billiard balls will fall above or below the first ball.
Stephen Fienberg [1] describes the evolution of the field from "inverse probability" at the time of Bayes and Laplace, and even of Harold Jeffreys (1939), to "Bayesian" in the 1950s. The irony is that this label was introduced by R.A. Fisher in a derogatory sense. So, historically, Bayes was not a "Bayesian". It is actually unclear whether or not he was a Bayesian in the modern sense of the term, i.e. whether or not he was interested in inference or merely in probability: the 1763 essay is more of a probability paper.