This article is part of the series on distributed Bayesian reasoning. It assumes you have read the previous article on the Basic Math.
The Problem: Small Samples
In Basic Math, we used the informed probability $ P_i $ as the prior beliefs of the meta-reasoner. We calculated this as the ratio:
$$ P_i(A=1|B=1) = \frac{P(A=1,B=1|B \geq 0)}{P(B=1|B \geq 0)} = \frac{c(A=1,B=1)}{c(B=1)} $$
But this is actually a little naive. Suppose only one person actually voted on both 𝐴 and 𝐵, and that person accepts both. Then $c(A=1, B=1) = c(B=1) = 1$, and this ratio is 100%. Is this really a good estimate of the probability that the meta-reasoner would accept 𝐴 given they accepted 𝐵?
Certainly not. A single vote from a single user is not a great deal of information. We need a more sophisticated way of estimating the priors of the meta-reasoner based on the evidence we have in the form of arguments and votes.
The Bayesian approach to answering this question requires us to have priors: we actually need to start with an estimate of this probability – or rather, a distribution of possible probabilities – even before we have any data! Then we can use the rules of Bayesian belief updating to combine our priors with our data to come up with a posterior belief.
The Beta-Bernoulli Model
It turns out, we are actually dealing with a textbook example of a problem that can be solved with a simple Bayesian hierarchical model. The solution, using a beta-Bernoulli distribution, is amply described elsewhere (I learned about them from this book). Here is the solution:
Let:
- ω = our prior estimate of the probability that the average juror accepts 𝐴 before getting any vote data
- κ = our prior estimate of the concentration of likely values around ω (high κ means low variance)
- 𝑁 = $c(A \geq 0)$ = the number of users who have voted on 𝐴
- z = $c(A=1)$ = the number of those users who also agree with 𝐴
Then our posterior estimate of the probability that the average user accepts 𝐴, given they have voted on it, is:
$$ \label{0} \begin{aligned} \frac{\omega(\kappa - 2) + 1 + z}{\kappa + N} \end{aligned} \tag{0} $$
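For readers who want to see where this expression comes from, here is a sketch of one derivation. It assumes (the article does not state this explicitly) that the prior is a beta distribution described by its mode ω and concentration κ > 2. The shape parameters $a$ and $b$ and the variable $\theta$ are names introduced here for illustration only.

$$ \begin{aligned} a &= \omega(\kappa - 2) + 1, \qquad b = (1 - \omega)(\kappa - 2) + 1, \qquad a + b = \kappa \cr \text{prior: } \theta &\sim \text{Beta}(a, b) \cr \text{posterior: } \theta \mid z, N &\sim \text{Beta}(a + z,\; b + N - z) \cr E[\theta \mid z, N] &= \frac{a + z}{a + b + N} = \frac{\omega(\kappa - 2) + 1 + z}{\kappa + N} \end{aligned} $$

Under that assumption, the formula is the mean of the posterior beta distribution after observing $z$ acceptances among $N$ votes.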
What should we use as our prior ω? That depends on the context. If this method is being implemented in a social platform, then this can be based on historical data. For example, if in the past the average acceptance rate for arguments submitted to the platform was 80%, then having nothing else to go on, 80% is a good estimate of ω. Our estimate of κ can also be made using historical data.
What we have done here is sometimes called Bayesian Averaging. The above formula essentially gives us a weighted average of our prior ω and the observed ratio z/𝑁, with our data z/𝑁 getting higher weight the larger the value of 𝑁 relative to κ.
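To make the weighted-average reading concrete, here is a minimal Python sketch of formula (0). The function names `bayesian_average` and `as_weighted_average` are hypothetical, chosen for this illustration. Note that the "prior" term in the weighted average is the value formula (0) takes when 𝑁 = 0, which is close to, but not exactly, ω.

```python
def bayesian_average(omega: float, kappa: float, z: int, n: int) -> float:
    """Formula (0): (omega * (kappa - 2) + 1 + z) / (kappa + n)."""
    return (omega * (kappa - 2) + 1 + z) / (kappa + n)


def as_weighted_average(omega: float, kappa: float, z: int, n: int) -> float:
    """The same quantity, written as a weighted average of prior and data."""
    prior_value = (omega * (kappa - 2) + 1) / kappa  # formula (0) with n = 0
    weight_on_data = n / (kappa + n)                 # approaches 1 as n grows
    observed = z / n if n > 0 else 0.0               # unused when n == 0 (weight is 0)
    return (1 - weight_on_data) * prior_value + weight_on_data * observed


# The single-voter case from the start of the article: one vote, one acceptance,
# with the illustrative priors omega = 80% and kappa = 10.
single_vote = bayesian_average(omega=0.8, kappa=10, z=1, n=1)
print(round(single_vote, 4))  # 0.7636, rather than the naive 100%
```

With a single vote, the weight on the data is only 1/11, so the estimate stays near the prior instead of jumping to 100%.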
The Bayesian-Average Probability Function
When calculating values of 𝑃 up to this point, we have just taken ratios of counts from our votes table (the 𝑐 function). For example, the formula for 𝑃(𝐴=a) is just:
$$ P(A=a) = \frac{c(A=a)}{c()} $$
Where c() is the total number of voters. To use a Bayesian approach to estimating probabilities, instead of taking a ratio, we plug these same two counts into $\eqref{0}$.
Let’s define a new function 𝑃ᵥ that does this for us.
So where, by definition
$$ P(\alpha) = \frac{c(\alpha)}{c()} $$
We have instead:
$$ P_v(\alpha) = \frac{\omega(\kappa - 2) + 1 + c(\alpha)}{\kappa + c()} \tag{1}\label{1} $$
And where by definition of conditional probability:
$$ P(\alpha|\beta) = \frac{c(\alpha,\beta)}{c(\beta)} $$
We have instead
$$ P_v(\alpha \vert \beta) = \frac{\omega(\kappa - 2) + 1 + c(\alpha,\beta)}{\kappa + c(\beta)} \tag{2}\label{2} $$
Now let’s compute an actual value of 𝑃ᵥ(𝐴=1). First, we need to choose priors. Let’s suppose that historically, on average, 80% of voters accept root claims initially. So ω=80%. And let’s suppose the variation in this distribution can be represented by κ=10. So, with $c(A=1) = 500$ and $c() = 1000$:
$$ \begin{aligned} P_v(A=1) &= \frac{\omega(\kappa - 2) + 1 + c(A=1)}{\kappa + c()}\cr &= \frac{(80\%)(10-2) + 1 + 500}{10 + 1000} \approx 50.24\% \end{aligned} $$
In this case, the large number of votes overwhelms our relatively weak prior, and so our result is very close to the raw ratio $P_i(A=1) = 50\%$.
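The effect of sample size can be seen by holding the observed ratio at 50% and varying the number of voters. The following is a small sketch of equation (1), assuming the same illustrative priors ω = 80% and κ = 10. The function name `p_v` and the chosen voter totals are hypothetical.

```python
def p_v(count_alpha: int, count_total: int,
        omega: float = 0.8, kappa: float = 10) -> float:
    """Equation (1): Bayesian-average estimate computed from two counts."""
    return (omega * (kappa - 2) + 1 + count_alpha) / (kappa + count_total)


# Keep the observed acceptance ratio at exactly 50% and grow the sample.
estimates = {}
for total in (2, 20, 200, 1000):
    accepted = total // 2
    estimates[total] = round(p_v(accepted, total), 4)
    print(total, estimates[total])
```

This prints estimates of 0.7, 0.58, 0.5114 and 0.5024. With 2 voters the estimate is dominated by the prior, and with 1000 voters it is almost entirely determined by the votes.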
Two-Level Bayesian Averaging
Reviewing where we are going with this, recall from the Basic Math article that the justified opinion formula in the case of an argument tree with a single premise argument is:
$$ \label{3} P_h(A=1) = \sum_{b=0}^{1} P_i(A=1|B=b)P_h(B=b) \tag{3} $$
Now we are saying that $P_i(A=1 \vert B=b)$ may not be a good estimate of the probability that the average person would accept/reject 𝐴 given they accepted/rejected 𝐵. So instead, we want to use Bayesian averaging and use the formula for $P_v(A=1 \vert B=b)$ in place of $P_i(A=1 \vert B=b)$. So, substituting $\eqref{2}$ into $\eqref{3}$:
$$ \label{4} P_h(A=1) = \sum_{b=0}^{1} \frac{\omega(\kappa - 2) + 1 + c(A=1,B=b)}{\kappa + c(B=b)}P_h(B=b) \tag{4} $$
But what are our priors ω and κ?
Recall that we have just used Bayesian averaging to estimate the probability that the average person accepts 𝐴: $P_v(A=1)=50.24\%$. This seems like a reasonable prior for our estimate of $P_v(A=1 \vert B=b)$. Before considering the 150 users who voted on 𝐵, we have a large amount of data telling us the average user has a roughly even chance of accepting 𝐴, and we have no prior reason to believe that accepting/rejecting 𝐵 either increases or decreases this probability. Unless we have strong evidence showing accepting/rejecting 𝐵 changes the probability that people accept/reject 𝐴, we should assume it doesn’t.
However, if we use $P_v(A=1)$ as a prior for $P_v(A=1 \vert B=b)$, there is a subtle problem: we will be “double counting”. We are counting votes of users for whom 𝐴=1 and 𝐵=b as evidence for estimating $P_v(A=1)$, and then counting the same votes as evidence for estimating $P_v(A=1 \vert B=b)$. So to avoid double counting, our prior should actually be $P_v(A=1 \vert B \neq b)$.
The priors for $P_v(A=1 \vert B \neq b)$, on the other hand, can be the same priors we used to calculate $P_v(A=1)$, because we don’t have anything to go on besides historical data. So let ω=80% and κ=10. Then let’s start with 𝐵=1, and calculate:
$$ \begin{aligned} P_v(A=1|B \neq 1) &= \frac{\omega(\kappa - 2) + 1 + c(A=1,B \neq 1)}{\kappa + c(B \neq 1)}\cr &= \frac{80\%(10 - 2) + 1 + 420}{10 + 900} \approx 46.97\% \end{aligned} $$
Now we can set $\omega=P_v(A=1 \vert B \neq 1)$ as the prior for calculating $P_v(A=1 \vert B=1)$.
What is our prior estimate of κ? We might think that it should be proportional to the number of people who voted on 𝐴, but this is mistaken. A large number of votes on 𝐴 provides strong evidence for estimating ω = 𝑃ᵥ(𝐴=a). But our estimate for κ is based on our prior expectations about the degree to which people are influenced by arguments. This information can come from observation of actual variance in the case of past arguments. If this variance has historically been very high, then κ should be low, and vice versa.
For simplicity, let’s use the same prior κ=10 that we used before.
We can now finally calculate:
$$ \begin{aligned} P_v(A=1|B=1) &= \frac{P_v(A=1|B \neq 1)(\kappa - 2) + 1 + c(A=1,B=1)}{\kappa + c(B=1)}\cr &\approx \frac{(46.97\%)(10 - 2) + 1 + 80}{10 + 100} \approx 77.05\% \end{aligned} $$
This is slightly lower than $P_i(A=1 \vert B=1) = 80\%$. The difference is small because we still have a reasonably large number of votes on 𝐵, and these votes provide strong evidence for a posterior value of 80% that overpowers the prior estimate.
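The two-level calculation can be reproduced in a few lines of Python. This is a sketch under the same assumptions as the worked example (historical ω = 80%, κ = 10 at both levels, and the vote counts used in the equations). The names `p_v`, `prior_for_b1` and `posterior_b1` are hypothetical.

```python
def p_v(omega: float, kappa: float, count_joint: int, count_condition: int) -> float:
    """Equation (2) with an explicit prior (omega, kappa)."""
    return (omega * (kappa - 2) + 1 + count_joint) / (kappa + count_condition)


KAPPA = 10              # same concentration at both levels, as in the text
HISTORICAL_OMEGA = 0.8  # historical acceptance rate used as the first-level prior

# Level 1: users who did NOT vote B=1, so their votes are not double counted.
# c(A=1, B≠1) = 420 and c(B≠1) = 900.
prior_for_b1 = p_v(HISTORICAL_OMEGA, KAPPA, count_joint=420, count_condition=900)

# Level 2: users who voted B=1, using the level-1 result as the prior.
# c(A=1, B=1) = 80 and c(B=1) = 100.
posterior_b1 = p_v(prior_for_b1, KAPPA, count_joint=80, count_condition=100)

print(round(prior_for_b1, 4), round(posterior_b1, 4))  # 0.4697 0.7705
```

The same two steps could be repeated for 𝐵=0 given the corresponding counts, which are not listed in this article.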
Clearly, we can extend this reasoning to long argument threads, though we will not do this here.
Further Development
This document is a work in progress – these models have not been fully developed. In fact, we are looking for collaborators. If you are an expert in Bayesian hierarchical models and causal inference, please contact collaborations@deliberati.io.