Bayesian probability

Bayesian probability is an interpretation of the probability calculus which holds that the concept of probability can be defined as the degree to which a person (or community) believes that a proposition is true. Bayesian theory also suggests that Bayes' theorem can be used as a rule to infer or update the degree of belief in light of new information.

History

Thomas Bayes. (The correct identification of this portrait has been questioned.)

Bayesian theory and Bayesian probability are named after Thomas Bayes (1702–1761), who proved a special case of what is now called Bayes' theorem. The term Bayesian, however, came into use only around 1950, and it is not clear that Bayes would have endorsed the narrow, specifically subjectivist interpretation of probability that is now associated with his name. Laplace proved a more general version of Bayes' theorem and used it to solve problems in celestial mechanics, medical statistics and, by some accounts, even jurisprudence. Laplace, however, did not consider this general theorem to be important for the conceptual definition of probability. He instead adhered to the classical definition of probability.

The subjective theory of probability, which interprets 'probability' as 'subjective degree of belief in a proposition', was proposed independently and at about the same time by Bruno de Finetti in Italy in Fondamenti Logici del Ragionamento Probabilistico (1930) and Frank Ramsey in Cambridge in The Foundations of Mathematics (1931).^[1] It was devised to solve the problems of the classical definition of probability and to replace it. L. J. Savage expanded the idea in The Foundations of Statistics (1954).

Formal attempts have been made to define and apply the intuitive notion of a "degree of belief". One interpretation is based on betting: a degree of belief is reflected in the odds and stakes that the subject is willing to bet on the proposition at hand. However, there may be problems with trying to use betting to measure the strength of someone's belief in a universal scientific law such as Newton's law of inertia or his law of universal gravitation.^[2]

On the Bayesian interpretation, the theorems of probability relate to the rationality of partial belief in the way that the theorems of logic are traditionally seen to relate to the rationality of full belief.

The Bayesian approach has been explored by Harold Jeffreys, Richard T. Cox, Edwin Jaynes and I. J. Good. Other well-known proponents of Bayesian probability have included John Maynard Keynes and B.O. Koopman, and many philosophers of the 20th century.

Recently, it has been shown that Bayes' Rule and the Principle of Maximum Entropy (MaxEnt) are completely compatible and can be seen as special cases of the Method of Maximum (relative) Entropy (ME). This method reproduces every aspect of orthodox Bayesian inference methods. In addition, it opens the door to tackling problems that could not be addressed by either the MaxEnt or orthodox Bayesian methods individually.^[3]

Varieties

The terms subjective probability, personal probability, epistemic probability and logical probability describe some of the schools of thought which are customarily called "Bayesian". These overlap but there are differences of emphasis. Some of the people mentioned here would not call themselves Bayesians.

Subjective Bayesian probability interprets 'probability' as 'the degree of belief (or strength of belief) an individual has in the truth of a proposition', and is in that respect subjective. Some people who call themselves Bayesians do not accept this subjectivity. The chief exponents of this objectivist school were Edwin Thompson Jaynes and Harold Jeffreys. Perhaps the main objectivist Bayesian now living is James Berger of Duke University. Jose Bernardo and others accept some degree of subjectivity but believe a need exists for "reference priors" in many practical situations.

Advocates of logical (or objective epistemic) probability, such as Harold Jeffreys, Rudolf Carnap, Richard Threlkeld Cox and E.T. Jaynes, hope to codify techniques whereby any two persons having the same information relevant to the truth of an uncertain proposition would calculate the same probability. Such probabilities are not relative to the person but to the epistemic situation, and thus lie somewhere between subjective and objective. The methods proposed are not without controversy. Critics challenge the claim that there are grounds for preferring one degree of belief over another in the absence of information about the facts to which those beliefs refer. However, these criticisms are usually reconciled once the question being asked is made clear. It has now been shown that the Principle of Maximum Entropy and Bayes' Rule are completely compatible and can be seen as special cases of the Method of Maximum (relative) Entropy (ME).^[3]

The Controversy between Bayesian and Frequentist Probability

Bayesian probability, sometimes called credence (i.e. degree of belief), contrasts with frequency probability, in which probability is derived from observed frequencies in defined distributions or proportions in populations.

The theory of statistics and probability using frequency probability was developed by R.A. Fisher, Egon Pearson and Jerzy Neyman during the first half of the 20th century. A. N. Kolmogorov also used frequency probability to lay the mathematical foundation of probability in measure theory via the Lebesgue integral in Foundations of the Theory of Probability (1933). Savage, Koopman, Abraham Wald and others have developed Bayesian probability since 1950.

The difference between Bayesian and Frequentist interpretations of probability has important consequences in statistical practice. For example, when comparing two hypotheses using the same data, the theory of hypothesis tests, which is based on the frequency interpretation of probability, allows the rejection or non-rejection of one model/hypothesis (the 'null' hypothesis) based on the probability of mistakenly inferring that the data support the other model/hypothesis more. The probability of making such a mistake, called a Type I error, requires the consideration of hypothetical data sets derived from the same data source that are more extreme than the data actually observed. This approach allows the inference that 'either the two hypotheses are different or the observed data are a misleading set'. In contrast, Bayesian methods condition on the data actually observed, and are therefore able to assign posterior probabilities to any number of hypotheses directly. The requirement to assign probabilities to the parameters of models representing each hypothesis is the cost of this more direct approach.
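The contrast described above can be sketched numerically. The following is a minimal illustration, not a recommended analysis: the coin-flip data (14 heads in 20 flips), the alternative value 0.7 and the equal prior weights are all assumptions made up for the example. The frequentist quantity sums over outcomes at least as extreme as the one observed, while the Bayesian quantity uses only the likelihood of the observed outcome.

```python
from math import comb


def binom_pmf(k, n, theta):
    """Probability of exactly k successes in n trials with success chance theta."""
    return comb(n, k) * theta**k * (1 - theta) ** (n - k)


def one_sided_p_value(k, n, theta0=0.5):
    """P(X >= k | theta0): includes hypothetical outcomes more extreme than k."""
    return sum(binom_pmf(j, n, theta0) for j in range(k, n + 1))


def posterior_probs(k, n, thetas, priors):
    """Posterior probability of each candidate theta, conditioning on k alone."""
    joint = [prior * binom_pmf(k, n, theta) for theta, prior in zip(thetas, priors)]
    total = sum(joint)
    return [value / total for value in joint]


# Hypothetical data: 14 heads in 20 flips.
k, n = 14, 20
p_value = one_sided_p_value(k, n)  # null hypothesis: theta = 0.5
# Two illustrative hypotheses with assumed equal prior weight.
post = posterior_probs(k, n, thetas=[0.5, 0.7], priors=[0.5, 0.5])

print(f"one-sided p-value under theta=0.5: {p_value:.4f}")
print(f"posterior P(theta=0.5 | data): {post[0]:.4f}")
print(f"posterior P(theta=0.7 | data): {post[1]:.4f}")
```

The posterior probabilities depend on the prior weights supplied, which is the cost of the more direct approach noted above.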

Although there is no reason why different interpretations (senses) of a word cannot be used in different contexts, there is a history of antagonism between Bayesians and frequentists, with the latter often rejecting the Bayesian interpretation as ill-grounded. The groups have also disagreed about which of the two senses reflects what is commonly meant by the term 'probable'. More importantly, the groups have agreed that Bayesian and Frequentist analyses answer genuinely different questions, but disagreed about which class of question it is more important to answer in scientific and engineering contexts.

Applications

Since the 1950s, Bayesian theory and Bayesian probability have been widely applied through Cox's theorem, Jaynes' principle of maximum entropy and the Dutch book argument. In many applications, Bayesian methods are more general and appear to give better results than frequency probability. Bayes factors have also been applied with Occam's Razor. See Bayesian inference and Bayes' theorem for mathematical applications.

Some regard the scientific method as an application of Bayesian probabilistic inference, claiming that Bayes's Theorem is explicitly or implicitly used to update the strength of prior scientific beliefs in the truth of hypotheses in the light of new information from observation or experiment. On this view, Bayes's Theorem is used to calculate a posterior probability from that evidence, and the update is justified by the Principle of Conditionalisation, P'(h) = P(h|e), where P'(h) is the posterior probability of the hypothesis 'h' in the light of the evidence 'e'. Some authors deny this principle.^[4] Adjusting original beliefs could mean (coming closer to) accepting or rejecting the original hypotheses.
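As a rough illustration of conditionalisation, the sketch below computes P(h|e) from a prior P(h) and the two likelihoods P(e|h) and P(e|not h), then reuses the result as the prior for a second, independent piece of evidence. The function name and all numbers are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def conditionalise(prior_h, p_e_given_h, p_e_given_not_h):
    """Return P(h | e) by Bayes' theorem for a hypothesis h and its negation."""
    evidence = prior_h * p_e_given_h + (1 - prior_h) * p_e_given_not_h
    return prior_h * p_e_given_h / evidence


# Assumed values: prior belief 0.2; the evidence is three times as likely
# if h is true (0.9) as if h is false (0.3).
prior = 0.2
after_first = conditionalise(prior, 0.9, 0.3)         # equals 3/7
after_second = conditionalise(after_first, 0.9, 0.3)  # equals 9/13

print(f"P(h)          = {prior:.4f}")
print(f"P'(h)  after e1 = {after_first:.4f}")
print(f"P''(h) after e2 = {after_second:.4f}")
```

Each posterior becomes the prior for the next update, which is how the strength of belief in a hypothesis is said to accumulate over successive observations.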

Bayesian techniques have recently been applied to filter spam e-mail. A Bayesian spam filter uses a reference set of e-mails to define what is originally believed to be spam. After the reference has been defined, the filter then uses the characteristics in the reference to define new messages as either spam or legitimate e-mail. New e-mail messages act as new information, and if mistakes in the definitions of spam and legitimate e-mail are identified by the user, this new information updates the information in the original reference set of e-mails with the hope that future definitions are more accurate. See Bayesian inference and Bayesian filtering.
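One common way to realise such a filter is a naive Bayes classifier over word counts. The sketch below assumes that approach; the text above does not specify one. The class name, the whitespace tokenisation, the add-one smoothing and the toy messages are all illustrative assumptions rather than a description of any particular product.

```python
from collections import Counter
from math import exp, log


class NaiveBayesSpamFilter:
    """Toy word-count filter; treats words as independent given the label."""

    def __init__(self):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.message_counts = {"spam": 0, "ham": 0}

    def train(self, text, label):
        """Add one labelled message ('spam' or 'ham') to the reference set."""
        self.word_counts[label].update(text.lower().split())
        self.message_counts[label] += 1

    def spam_probability(self, text):
        """Posterior probability that the message is spam."""
        vocab = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        total_messages = sum(self.message_counts.values())
        log_score = {}
        for label in ("spam", "ham"):
            # Smoothed prior for the label, so an empty reference set gives 0.5.
            score = log((self.message_counts[label] + 1) / (total_messages + 2))
            n_words = sum(self.word_counts[label].values())
            for word in text.lower().split():
                count = self.word_counts[label][word]
                # Add-one smoothing; the extra 1 reserves mass for unseen words.
                score += log((count + 1) / (n_words + len(vocab) + 1))
            log_score[label] = score
        # Normalise in a numerically stable way.
        top = max(log_score.values())
        weight = {label: exp(score - top) for label, score in log_score.items()}
        return weight["spam"] / (weight["spam"] + weight["ham"])


spam_filter = NaiveBayesSpamFilter()
spam_filter.train("win money now", "spam")
spam_filter.train("cheap money offer", "spam")
spam_filter.train("meeting agenda for monday", "ham")
spam_filter.train("lunch on monday", "ham")

before = spam_filter.spam_probability("money for monday")
# The user marks this message as legitimate; the reference set is updated.
spam_filter.train("money for monday", "ham")
after = spam_filter.spam_probability("money for monday")
print(f"before correction: {before:.3f}, after correction: {after:.3f}")
```

The user's correction is simply another training call, so later messages resembling the corrected one receive a lower spam probability.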

Probabilities of probabilities

One criticism levelled at the Bayesian probability interpretation has been that a single probability assignment cannot convey how well grounded the belief is—i.e., how much evidence one has. Consider the following situations:

You have a box with white and black balls, but no knowledge as to the quantities
You have a box from which you have drawn N balls, half black and the rest white
You have a box and you know that there are the same number of white and black balls

The Bayesian probability of the next ball drawn being black is 0.5 in all three cases. Keynes called this the problem of the "weight of evidence". One approach is to reflect difference in evidential support by assigning probabilities to these probabilities (so-called metaprobabilities) in the following manner:

1. You have a box with white and black balls, but no knowledge as to the quantities

Letting <math>\theta = p</math> represent the statement that the probability of the next ball being black is <math>p</math>, a Bayesian might assign a uniform Beta prior distribution:

<math>\forall \theta \in [0,1]</math>

<math>P(\theta) = \Beta(\alpha_B=1,\alpha_W=1) = \frac{\Gamma(\alpha_B + \alpha_W)}{\Gamma(\alpha_B)\Gamma(\alpha_W)}\theta^{\alpha_B-1}(1-\theta)^{\alpha_W-1} = \frac{\Gamma(2)}{\Gamma(1)\Gamma(1)}\theta^0(1-\theta)^0=1.</math>

Assuming that the ball drawing is modelled as a binomial sampling distribution, the posterior distribution, <math>P(\theta|m,n)</math>, after drawing m additional black balls and n white balls is still a Beta distribution, with parameters <math>\alpha_B=1+m</math>, <math>\alpha_W=1+n</math>. An intuitive interpretation of the parameters of a Beta distribution is that of imagined counts for the two events. For more information, see Beta distribution.
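The update just described amounts to adding the observed counts to the Beta parameters. A minimal sketch, with hypothetical function names and an assumed sequence of draws, is:

```python
def update_beta(alpha_b, alpha_w, black, white):
    """Posterior Beta parameters after observing additional black and white balls."""
    return alpha_b + black, alpha_w + white


def beta_mean(alpha_b, alpha_w):
    """Mean of a Beta distribution: expected probability that the next ball is black."""
    return alpha_b / (alpha_b + alpha_w)


# Start from the uniform prior Beta(1, 1), then observe 3 black and 1 white.
posterior = update_beta(1, 1, black=3, white=1)

# Updating one draw at a time gives the same parameters as one batch update.
sequential = (1, 1)
for ball in ["black", "white", "black", "black"]:
    sequential = update_beta(*sequential,
                             black=int(ball == "black"),
                             white=int(ball == "white"))

print("posterior parameters:", posterior)
print("posterior mean:", beta_mean(*posterior))
```

Because the parameters behave like running counts, the order in which the balls are drawn does not affect the posterior.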

2. You have a box from which you have drawn N balls, half black and the rest white

Letting <math>\theta = p</math> represent the statement that the probability of the next ball being black is <math>p</math>, a Bayesian might assign a Beta distribution, <math>\Beta(N/2+1,N/2+1)</math>, which is the posterior obtained from a uniform prior after the N draws. The posterior mean of <math>\theta</math> is <math>\operatorname{E}[\theta]=\frac{N/2+1}{N+2}=\frac{1}{2}</math>, precisely Laplace's rule of succession. (The maximum a posteriori (MAP) estimate, the mode <math>\theta_{MAP}=\frac{N/2}{N}</math>, is also 1/2 in this case.)

3. You have a box and you know that there are the same number of white and black balls

In this case a Bayesian would define the prior probability <math>P\left(\theta\right)=\delta\left(\theta - \frac{1}{2}\right)</math>.
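The three assignments give the same probability for the next ball but differ in spread, which is one way to express the weight of evidence numerically. The sketch below compares their means and variances using the standard formulas for a Beta distribution; the value N = 10 and the labels are assumptions chosen for illustration.

```python
def beta_mean_and_variance(a, b):
    """Mean and variance of a Beta(a, b) distribution."""
    total = a + b
    return a / total, a * b / (total**2 * (total + 1))


N = 10  # assumed number of balls drawn in situation 2

cases = [
    ("1. no knowledge of quantities", beta_mean_and_variance(1, 1)),
    ("2. N draws, half black", beta_mean_and_variance(N / 2 + 1, N / 2 + 1)),
    # A point mass at 1/2 has mean 1/2 and zero variance.
    ("3. known equal numbers", (0.5, 0.0)),
]

for label, (mean, variance) in cases:
    print(f"{label}: mean={mean:.3f}, variance={variance:.4f}")
```

The mean is 0.5 in every case, while the variance shrinks as the belief becomes better grounded.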

Other Bayesians have argued that probabilities need not be precise numbers.

Because there is no room for metaprobabilities on the frequency interpretation, frequentists have had to find different ways of representing difference of evidential support. Cedric Smith and Arthur Dempster each developed a theory of upper and lower probabilities. Glenn Shafer developed Dempster's theory further, and it is now known as Dempster-Shafer theory.

Footnotes

[1] See p50-1, Gillies 2000 "The subjective theory of probability was discovered independently and at about the same time by Frank Ramsey in Cambridge and Bruno de Finetti in Italy." See Gillies' discussion for its explanation of how the wrong impression came about that Ramsey proposed it first.
[2] e.g. see Gillies 2000, p55: "My own view is that betting does give a reasonable measure of the strength of a belief in many cases, but not in all. In particular, betting cannot be used to measure the strength of someone's belief in a universal scientific law or theory."
[3] See Giffin and Caticha 2007 "Updating Probabilities with Data and Moments" (http://arxiv.org/abs/0708.1593)
[4] See Updating Belief, Chapter 6 of Howson & Urbach 1993, p99-114 and its references to the discussions of Bayesian Conditionalisation of Hacking 1967, Kyburg, Skyrms 1987 and Jeffrey 1965 etc.

External links and references

tutorial on Bayesian probabilities
On-line textbook: Information Theory, Inference, and Learning Algorithms, by David MacKay, has many chapters on Bayesian methods, including introductory examples; arguments in favour of Bayesian methods (in the style of Edwin Jaynes); state-of-the-art Monte Carlo methods, message-passing methods, and variational methods; and examples illustrating the intimate connections between Bayesian inference and data compression.
A nice on-line introductory tutorial to Bayesian probability from Queen Mary University of London
An Intuitive Explanation of Bayesian Reasoning A very gentle introduction by Eliezer Yudkowsky
Giffin, A. and Caticha, A. 2007 Updating Probabilities with Data and Moments
Gillies, D. 2000 Philosophical Theories of Probability Routledge
Hacking, I. 1965 The Logic of Statistical Inference CUP
Hacking, I. 1967 'Slightly More Realistic Personal Probability' Philosophy of Science vol34
Hacking, I. 2006 The Emergence of Probability: A Philosophical Study of Early Ideas about Probability, Induction and Statistical Inference Cambridge University Press
Jaynes, E.T. (2003) Probability Theory : The Logic of Science Cambridge University Press.
Jaynes, E.T. (1998) Probability Theory : The Logic of Science.
Jeffrey, R.C. 1983 The Logic of Decision University of Chicago Press
Jeffrey, R.C. 2004 Subjective Probability: The Real Thing, Cambridge University Press
Kyburg, H.E. 1974 The Logical Foundations of Statistical Inference Reidel
Kyburg, H.E. 1983 Epistemology and Inference University of Minnesota Press
Kyburg, H.E. 1987 'Bayesian versus non-Bayesian Evidential Updating' Artificial Intelligence 31
Kyburg & Smokler (eds) 1980 Studies in Subjective Probability Robert E. Krieger
Lakatos, I. 1968 'Changes in the Problem of Inductive Logic' published as Chapter 8 of Philosophical Papers Volume 2 Cambridge University Press 1978
Bretthorst, G. Larry, 1988, Bayesian Spectrum Analysis and Parameter Estimation in Lecture Notes in Statistics, 48, Springer-Verlag, New York
http://www-groups.dcs.st-andrews.ac.uk/history/Mathematicians/Ramsey.html
David Howie: Interpreting Probability, Controversies and Developments in the Early Twentieth Century, Cambridge University Press, 2002, ISBN 0-521-81251-8
Colin Howson and Peter Urbach: Scientific Reasoning: The Bayesian Approach, Open Court Publishing, 2nd edition, 1993, ISBN 0-8126-9235-7, focuses on the philosophical underpinnings of Bayesian and frequentist statistics. Argues for the subjective interpretation of probability.
Luc Bovens and Stephan Hartmann: Bayesian Epistemology. Oxford: Oxford University Press 2003. Extends the Bayesian program to more complex decision scenarios (e.g. dependent and partially reliable witnesses and measurement instruments) using Bayesian Network models. The book also proves an impossibility theorem for coherence orderings over information sets and offers a measure that induces a partial coherence ordering.
Jeff Miller "Earliest Known Uses of Some of the Words of Mathematics (B)"
James Franklin The Science of Conjecture: Evidence and Probability Before Pascal, history from a Bayesian point of view.
Paul Graham "Bayesian spam filtering"
Howard Raiffa Decision Analysis: Introductory Lectures on Choices under Uncertainty. McGraw Hill, College Custom Series. (1997) ISBN 0-07-052579-X
Devender Sivia, Data Analysis: A Bayesian Tutorial. Oxford: Clarendon Press (1996), pp. 7-8. ISBN 0-19-851889-7
Skyrms, B. 1987 'Dynamic Coherence and Probability Kinematics' Philosophy of Science vol 54
Henk Tijms: Understanding Probability, Cambridge University Press, 2004
Is the portrait of Thomas Bayes authentic? Who Is this gentleman? When and where was he born? The IMS Bulletin, Vol. 17 (1988), No. 3, pp. 276-278
Ask the experts on Bayes's Theorem, from Scientific American

