A few years ago I set out to write a blog on applying statistical methods to NT scholarship. It was something I was experimenting with. One of the diagrams I've previously linked to on this blog came from that effort.
The great quarry for this was a statistical analysis of the synoptic problem.
The synoptic problem comes from the observation that the gospels of Matthew, Mark and Luke are very similar, even down to using identical wording in many places. It is this exact verbal agreement that rules out the idea that they are connected only by events: they aren't similar simply because they all describe the same story. The agreement runs deeper than that. There must be a literary dependency. There must be copying going on.
So the question scholars want to answer is: in what order were they written, and who was copying from whom?
Grizel linked me to some work online that applies simple statistical tests to this question. There was some similar work done on the authorship of Isaiah, out of BYU; that study wasn't well controlled, statistically. The study in the link above is a little better designed. As part of my NT Math project, I also ran a statistical analysis of the synoptic problem, using a slightly different method that looked at larger phrases (a wider n-gram), but it was basically the same. My results were very much the same as Dave Gentile's.
The results show that, statistically, there's almost nothing one can say about the synoptic problem beyond some minimal statistical evidence for Markan priority: i.e. it supports the view that Mark came first and was used by Luke and Matthew.
His study shows that the bits shared between Mark and the others are a fraction more similar, linguistically, to the bits unique to Mark than to the bits unique to either of the others. So the material that Mark shares with the others is more likely to have been written by Mark. A good result.
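For anyone curious what this kind of comparison looks like mechanically, here is a minimal sketch in Python. The sample texts, the n-gram width, and the use of Jaccard similarity are all my own illustrative stand-ins, not Gentile's actual method or data; the real studies work over the full partitioned gospel texts with proper statistical controls.

```python
def ngrams(text, n=3):
    # Word-level n-grams (phrases of n consecutive words) from a text.
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def jaccard(a, b):
    # Jaccard similarity: shared n-grams as a fraction of all n-grams seen.
    return len(a & b) / len(a | b) if a | b else 0.0

# Toy stand-ins for the real text partitions (hypothetical data):
shared = "and he said to them follow me and i will make you fishers of men"
mark_unique = "and he said to them let us go into the next towns"
matt_unique = "blessed are the poor in spirit for theirs is the kingdom"

sim_mark = jaccard(ngrams(shared), ngrams(mark_unique))
sim_matt = jaccard(ngrams(shared), ngrams(matt_unique))
print(f"shared vs Mark-unique:    {sim_mark:.3f}")
print(f"shared vs Matthew-unique: {sim_matt:.3f}")
```

On this toy data the shared material scores closer to the Mark-unique material than to the Matthew-unique material, which is the shape of the argument: if the common passages look linguistically like the rest of Mark, Mark probably wrote them.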
But Markan priority has long since been settled in the academy anyway, so this rather weak statistical result is unlikely to set the world on fire.
The interesting question is whether Luke used Matthew, or whether both used a lost source. (The 'Farrer' hypothesis says the former, the 'Q' hypothesis the latter.) And Dave Gentile, and I, both found that the error in our statistical analysis was far too great to draw any conclusion on that. The experiment neither confirmed nor denied either hypothesis. And, as Dave points out in his analysis, there are many, many other possible scenarios involving intermediate forms of the gospels with which the statistics are also consistent.
So it's one of those (very common) statistical experiments where the results tell you nothing of interest. Which is a shame.
I came to the conclusion that the decisive arguments were likely to arise out of close analysis of textual patterns, like Mark Goodacre’s beautiful fatigue argument for Markan priority, rather than from coarse aggregate statistics.