Potentially daft question about data accumulation
Hi,
I just wanted to canvass the opinion of people here who know much more than me on 2 related issues.
Issue 1
Say we have 2 datasets, both containing the same treatment and control conditions. We could analyse them in 2 ways:
- Perform our test on dataset 1 with a default 0.707 prior, then use the resulting posterior as the prior for the test on dataset 2.
Or
- Put all the data together and perform the test with the 0.707 prior.
Are these two equivalent, are they very close, or could they feasibly be very different?
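To make the comparison concrete, here is a minimal numerical sketch. It is not the test used on the real data: it assumes the 0.707 is the scale of a Cauchy prior on the effect size delta, and it swaps in a simplified one-sample model with known variance (x ~ N(delta, 1)) so that everything fits on a grid. The data are simulated, and all function names are made up for illustration.

```python
import math
import random

# Grid over effect size delta. The likelihood is negligible outside
# [-6, 6] for the sample sizes used here, so truncating is harmless.
GRID = [-6.0 + 12.0 * i / 4000 for i in range(4001)]
STEP = GRID[1] - GRID[0]


def cauchy_pdf(delta: float, scale: float = 0.707) -> float:
    """Density of a zero-centred Cauchy prior with the given scale."""
    return 1.0 / (math.pi * scale * (1.0 + (delta / scale) ** 2))


def lik_ratio(delta: float, data: list[float]) -> float:
    """L(delta) / L(0) for data assumed to be N(delta, 1)."""
    s, n = sum(data), len(data)
    return math.exp(delta * s - 0.5 * n * delta * delta)


def bf10(data: list[float], prior: list[float]) -> float:
    """Bayes factor H1 vs H0 (delta = 0), prior given as densities on GRID."""
    return sum(lik_ratio(d, data) * p for d, p in zip(GRID, prior)) * STEP


def posterior(data: list[float], prior: list[float]) -> list[float]:
    """Posterior density on GRID after observing data."""
    unnorm = [lik_ratio(d, data) * p for d, p in zip(GRID, prior)]
    z = sum(unnorm) * STEP
    return [u / z for u in unnorm]


# Simulated data; the true effect of 0.3 is an arbitrary choice.
random.seed(1)
data1 = [random.gauss(0.3, 1.0) for _ in range(51)]
data2 = [random.gauss(0.3, 1.0) for _ in range(79)]

default_prior = [cauchy_pdf(d) for d in GRID]

# Route 1: test dataset 1, carry the exact posterior forward as the prior.
bf_first = bf10(data1, default_prior)
post1 = posterior(data1, default_prior)
bf_second_given_first = bf10(data2, post1)
bf_sequential = bf_first * bf_second_given_first

# Route 2: pool everything and test once with the default prior.
bf_pooled = bf10(data1 + data2, default_prior)

# For contrast: re-using the default prior on dataset 2 and multiplying.
bf_naive = bf_first * bf10(data2, default_prior)

print("sequential:", bf_sequential)
print("pooled:    ", bf_pooled)
print("naive:     ", bf_naive)
```

In this sketch the sequential and pooled Bayes factors agree up to floating-point error, because the product BF(data 1) x BF(data 2 | data 1) is the pooled Bayes factor by construction. The catch is that the posterior carried forward is no longer a Cauchy, so simply re-running the second test with the default 0.707 prior and multiplying (the "naive" line) gives a different number in general.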
Issue 2
What are the board's thoughts on using Bayesian methods for exploratory analysis and then reusing the data from which the hypotheses were generated in a more confirmatory analysis, after collecting more data? My opinion is that this should be fine as long as you are open about how many exploratory outcomes you tested (the more you test, the more likely the findings are spurious) and as long as you collect a decent chunk of data in the more confirmatory study (maybe 2-4 times the amount in the exploratory study). In other words, it is fine as long as you are open about what you did, so that the evidence can be judged accordingly. Obviously, if you look at 10,000 outcomes and only add 2x the data in the hypothesis-generating group, your results are not going to be very credible.
Specifically, in some data collected for an MA we explored 8 possible models. We found anecdotal evidence for an effect in n=51 participants. We collected another n=79 to probe this outcome further, and we now find extreme evidence for this effect. I tend to believe that the effect is not spurious based on that evidence.
Best,
Gareth.
Comments
Hi Gareth,
Cheers,
E.J.
Hi,
Oh, that is brilliant! It is nice to get a second opinion. (I was being a bit lazy with the first question.)
Regarding Issue 2: I am also writing a paper about this at the moment. I'll add my perspective, as I think it might be interesting.
I am a medical statistician by trade, but I used to be a quantitative linguist. The paper I am writing is for a linguistics journal. I do not think that you should explore and confirm (unplanned) on the same data in med stats. I do think that it is "fine", not ideal but "fine", in linguistics. I will explain why.
Med stats is really well funded and it is possible to get as much data as you need; also, the consequences of getting things wrong can be really serious. Linguistics is really poorly funded outside a few key areas, and the consequences of getting something wrong are not quite as serious. Most of the data currently collected in linguistics comes from small datasets gathered by individual researchers and, although many linguists don't know it, has very little right to be called confirmatory. There is a movement towards the pre-registration of studies in linguistics at the moment, which I think is great, but I do think that it might stifle small-scale exploratory research somewhat. That is why I'm suggesting a Bayesian framework for upgrading exploratory outcomes to more confirmatory ones in linguistics research. Ideally, you would do a new study from scratch, but that is often not possible because of the lack of funding in linguistics. I think that a "Bayesian upgrade" is a good halfway house that lets smaller-scale researchers have some claim to having produced confirmatory results. Not allowing them to add the data on which the hypotheses were generated limits the amount of this kind of work that will ever see the light of day.
In summary, I think you need to make a judgement call on the consequences of making a wrong inference and weigh that against the narrowing of research focus that requiring a completely new dataset would bring.
Anyway, that is my rather pragmatic perspective on it. I see your point too and would 100% agree in an ideal world.
Gareth.