
# Prior width

Hello!
Right now I am a bit confused about the interpretation of the prior width. On the one hand, I have read posts stating that the size of the prior (e.g., .2 or .7) represents my belief in the strength of the effect under H1 (weak belief .2, strong belief .7): higher values place more mass away from 0, while lower values (e.g., .2) concentrate mass near 0, so the probabilities are harder to differentiate. On the other hand, I have read that the breadth of the distribution represents my uncertainty about my estimate: a broader distribution represents more uncertainty and a narrower one less (e.g., .2 less uncertainty, .7 more uncertainty; i.e., the other way around compared to my first statement). Can someone help me understand these things a bit better?
Thank you very much!
Best,
Markus

• edited December 2016

Hi Markus, I think there are three different things at play here:
1. the width of the prior distribution
2. the location of the prior distribution (= the effect size that you expect before you get the data)
3. H1 and H0

1. Regarding the width of the prior distribution, let me quote EJ: "...the width equals the interquartile range. So when r = .707 you are 50% confident that the true effect size will lie between -.707 and .707."

If I were very sure that the average effect size is in fact zero, I would choose a very narrow prior, say 0.1. This means I am 50% confident that the true average effect size is between -0.1 and +0.1. If I think the effect size is zero but am not very confident about it, I may choose a wide prior, say 1. This means I expect (with 50% confidence) the true effect size to be somewhere between -1 and +1.
tl;dr: a narrower prior equals more confidence; a wider prior equals more uncertainty.
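To make EJ's interquartile-range statement concrete, here is a small check (my own illustration, not JASP code) using SciPy's Cauchy distribution: for any scale r, exactly 50% of the prior mass falls between -r and +r.

```python
# For a zero-centered Cauchy prior with scale r, the interquartile
# range runs from -r to +r: 50% of the prior mass lies in that interval.
from scipy.stats import cauchy

for r in (0.1, 0.707, 1.0):
    mass = cauchy.cdf(r, loc=0, scale=r) - cauchy.cdf(-r, loc=0, scale=r)
    print(f"scale r = {r}: P(-r < delta < r) = {mass:.3f}")  # 0.500 each time
```

So the "width" parameter in JASP is precisely the half-width of the 50% central interval of the prior, whatever value r takes.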

2. The location of the prior. In JASP it is centered around zero. It would be nice to have priors that are non-zero, and I understand this request is on the to-do list of the team that works on JASP :-) To drive this home: the location of the prior has nothing to do with the width of the prior. I can be very confident or unsure that there is a zero effect; or I can be uncertain or very confident that there is some strong effect. Take the effect of a painkiller on headache: I do expect a strong effect, but I am not sure exactly how big it will be. This means I would want a wide prior, centered over some positive effect size.

3. H1 and H0. H0 is not plotted in the JASP graphs, but it is a spike at zero. In other words: H0 states that the effect size is 0, and is 100% confident that it is zero. Now I see a disadvantage of H1 also being centered on zero: if we have a very narrow H1 prior (centered on zero, as is the default in JASP), there is not much difference from H0 (which is an infinitely narrow distribution centered on zero). This would mean that H0 always wins, because H1 offers no better explanation of the data than H0 does.

I think one of the sources of confusion (at least to me) is the mix of frequentist concepts (H0 and H1) with Bayesian concepts (prior, evidence, posterior). But I think I get it now - if I am wrong in this post, please feel free to correct me, I'm learning every day.

• To chime in:
1. Yes, a narrower prior indicates more confidence. The prior distribution under H1 reflects your certainty about the value of the parameter assuming the effect exists. So the width does not speak to your certainty/belief that H1 is true; this is already assumed. For instance, for a specific genetic effect I might simultaneously believe that the effect is likely to be present (based on prior knowledge) and that the effect (given that it exists) is very small.
2. Yesterday we went through a "prior elicitation" exercise with an expert, in order to obtain a prior distribution whose location is away from zero. This will yield a more informative test. We are working on a paper and a JASP module to allow such "informative hypotheses" with prior distributions away from zero. I am very enthusiastic about this development.
3. Yes, with a narrow prior centered on zero, H1 becomes very similar to H0. This does not mean, however, that H0 always wins; instead, because the models make similar predictions, the data become uninformative and the Bayes factor is driven to 1 (in the limit, H1 -> H0 and the BF -> 1 regardless of the data).
Cheers,
E.J.
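EJ's point 3 can be checked with a toy calculation (my own sketch, not JASP internals). Here H0 fixes the mean at 0 and H1 puts a zero-centered normal prior of width w on the mean; the normal prior is a stand-in for the Cauchy, chosen only because it keeps the marginal likelihood closed-form. As w shrinks toward 0, H1's predictions converge to H0's and the Bayes factor is driven to 1, not to "H0 wins".

```python
# Toy Bayes factor: H1 (mu ~ Normal(0, w^2)) vs H0 (mu = 0), for an
# observed sample mean xbar of n observations with known sigma.
import math

def bf10(xbar, n, w, sigma=1.0):
    """BF10 from the two marginal likelihoods of the sample mean."""
    s2 = sigma**2 / n                 # sampling variance of the mean
    def normal_pdf(x, var):
        return math.exp(-x**2 / (2 * var)) / math.sqrt(2 * math.pi * var)
    # Under H1 the mean is predicted as Normal(0, s2 + w^2); under H0 as
    # Normal(0, s2). The BF is the ratio of these predictive densities.
    return normal_pdf(xbar, s2 + w**2) / normal_pdf(xbar, s2)

for w in (1.0, 0.1, 0.001):
    print(f"prior width {w}: BF10 = {bf10(xbar=0.3, n=20, w=w):.4f}")
```

With width 0.001 the Bayes factor is essentially 1: the two models predict the data almost identically, so the data cannot distinguish them.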

• PieterHog and EJ: Thank you very much for your great answers. They really helped me a lot!
Best,
Markus

• E.J. wrote: "...a prior distribution whose location is away from zero... will yield a more informative test. We are working on a paper and a JASP module to allow such 'informative hypotheses' with prior distributions away from zero. I am very enthusiastic about this development." Great news!

Would the ideal not be, when comparing two hypotheses, that one could choose the location and the dispersion of both priors (and ideally even the functional form)? Like this:

Prior A
- Function: Cauchy
- Location: 1.5
- Dispersion: 0.9 (interquartile range)

Prior B
- Function: Cauchy
- Location: 0.5
- Dispersion: 0.3 (interquartile range)

Then we can see whether the data are more likely under hypothesis A than under hypothesis B.
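The proposed A-versus-B comparison could be sketched as follows (a hypothetical illustration, not an existing JASP feature): each hypothesis is a Cauchy prior on the effect size delta with its own location and scale, and the Bayes factor is the ratio of the two marginal likelihoods, computed here by numerical integration against a normal likelihood for the observed standardized mean.

```python
# Compare two informed Cauchy priors on the effect size delta.
import math
from scipy import integrate, stats

def marginal(xbar, n, loc, scale):
    """Marginal likelihood of the observed mean xbar (n observations,
    unit-variance data assumed) under delta ~ Cauchy(loc, scale)."""
    se = 1.0 / math.sqrt(n)  # standard error of the mean
    f = lambda d: stats.norm.pdf(xbar, d, se) * stats.cauchy.pdf(d, loc, scale)
    val, _ = integrate.quad(f, -30, 30)
    return val

xbar, n = 0.6, 25
bf_ab = marginal(xbar, n, 1.5, 0.9) / marginal(xbar, n, 0.5, 0.3)
print(f"BF_AB = {bf_ab:.3f}")  # > 1 favors prior A, < 1 favors prior B
```

With these made-up data (mean 0.6), prior B, which sits close to the observation, predicts the data better, so BF_AB comes out below 1.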

In practice, with standardisation of the data, it is not a problem that H0 is centered on zero. But the fact that H0 is a spike is a problem. In fact, I'd be glad to get rid of this weakness inherited from NHST. To quote Andrew Gelman: "...in social science and public health, I've never come across a null hypothesis that could actually be true, or a parameter that could actually be zero." http://andrewgelman.com/2004/12/29/type_1_type_2_t/

As always, please correct me if I'm wrong.

• Hi Pieter,

With respect to assigning probability to a spike: I don't have an issue with it, for the following reasons:
1. What the BF assesses is not whether H0 is true. The BF compares the predictive performance of two models (H0 and H1 here). Both models may be wrong. There is nothing in the BF that needs the model to be "true" in some sense.
2. Induction is only possible when we assign mass to a spike. For instance, take H0: "all crows are black". If you do not assign a spike of probability to the general law (theta = 0), you will always remain completely convinced that somewhere in the infinite population there is a non-black crow, regardless of how many black crows you have seen (Wrinch & Jeffreys, 1921; see also the overview paper by Alex Etz and myself: https://arxiv.org/abs/1511.08180).
3. If the effect is truly minuscule (in Gelman's analogy: a feather being weighed on a bathroom scale, while the feather rests loosely in the pouch of a kangaroo that is vigorously jumping up and down), nobody cares if you treat the statistical problem as if the feather were absent -- the results will be the same.
4. I do believe in true nulls, in science in general and in the social sciences in particular.
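The crows argument in point 2 can be made concrete with a toy spike-and-slab calculation (my own illustration of the Wrinch & Jeffreys idea, with made-up prior odds). Let theta be the proportion of non-black crows. A 50/50 prior puts a spike at theta = 0 (the general law) against theta ~ Uniform(0, 1); after n black crows, the posterior probability of the law is (n + 1)/(n + 2). With no spike, P(theta = 0) starts at exactly 0 and stays there forever, so no amount of black crows can ever support the law.

```python
# Posterior probability of the general law "all crows are black"
# under a spike-and-slab prior on theta (proportion of non-black crows).
def p_law(n_black_crows, prior_law=0.5):
    """P(theta = 0 | n black crows observed in a row)."""
    m0 = 1.0                           # P(data | theta = 0): law predicts this
    m1 = 1.0 / (n_black_crows + 1)     # integral of (1 - theta)^n over [0, 1]
    return prior_law * m0 / (prior_law * m0 + (1 - prior_law) * m1)

for n in (1, 10, 1000):
    print(f"after {n} black crows: P(law) = {p_law(n):.4f}")
```

The posterior climbs toward 1 as black crows accumulate, which is exactly the inductive behavior the spike makes possible.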

Cheers,
E.J.