The Answer to How Many Strains in a Farmhouse Yeast
How many strains are there in here? We're finally closer to knowing. (Photo from this brewing session.)
I don't intend the blog to turn into a pure microbiology blog, but I just had to share this discovery, because I suddenly stumbled on the answer to a question that's been bothering me for years. Let's start by repeating the question.
For the recent farmhouse yeast paper the researchers at VIB in Belgium took 40 samples from each of 44 farmhouse yeast cultures, 1760 samples in total. They then used PCR to figure out which strains were the same, and which ones were different. We got from 2 to 35 different strains from each culture. But that only counts the strains we found. When you take 40 samples from one culture and 35 of them turn out to be different, you really have to wonder how many more unique strains were not found simply because they weren't in any of the 40 samples.
This figure, which shows how many strains were found once (the height of the bar on 1 in the graph), twice (the bar on 2), and so on, makes it very clear that there must be more strains in there, because two thirds of the strains were only found once.
Figure 10 from the supplemental note. The horizontal axis shows how many times each strain was found, and the height of the bars is how many strains were found that often.
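The counts behind a figure like this can be tallied in a few lines. This is a minimal sketch in Python, assuming each sample has already been given a strain label by the PCR fingerprinting. The labels and the function name are invented for illustration and are not data from the paper.

```python
from collections import Counter

def frequency_of_frequencies(strain_labels):
    """Map 'times a strain was found' -> 'number of strains found that often'."""
    times_found = Counter(strain_labels)            # strain -> times it was found
    profile = Counter(times_found.values())         # times found -> number of strains
    return dict(sorted(profile.items()))

# Hypothetical labels for 10 samples from one culture (not real data).
samples = ["A", "B", "A", "C", "D", "A", "E", "B", "F", "G"]
profile = frequency_of_frequencies(samples)
print(profile)
```

Here five made-up strains turn up once, one turns up twice and one three times, which is the same kind of table the bars in the figure represent.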
I made that diagram a couple of years ago, and ever since I've had a nagging feeling that from the shape of that graph it ought to be possible to estimate how many strains there are in total. I started on a couple of attempts at estimating it via simulation, but never took it very far, partly for lack of time.
But, of course, I'm not the first person to have this problem. By chance I stumbled over a Bluesky post that led straight to someone else who'd struggled with this.
In the 1920s and 1930s Alexander Steven Corbet worked in Malaysia for the Rubber Research Institute of Malaysia, collecting butterflies in his spare time. On returning to the UK he organized his collection, eventually publishing in 1934 a massive tome on Malaysian butterflies that appears to have been a minor classic, as it has been reprinted several times over the decades.
Corbet had collected 9000 butterflies, which turned out to belong to 316 separate species. But how many species of butterflies were there actually in Malaysia? Corbet tried to work out a formula, and in 1940 wrote a letter to legendary statistician R. A. Fisher, who was able to solve the problem.
They had a table showing that Corbet had found 118 species once, 74 species twice, 44 species three times, and so on. Based on this, Fisher calculated that if Corbet were to go back to Malaysia and collect another 9000 butterflies, he could expect to find 75 new species in addition to the 316 he already had. Together with C. B. Williams, a third biologist who contributed another two datasets, they published a paper with a general formula for this problem.
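The estimate for one more collection of the same size is usually written as an alternating sum over such a table: species seen once, minus species seen twice, plus species seen three times, and so on. Here is a minimal sketch of that sum, with a hypothetical function name and toy counts. Only the first three entries of Corbet's table are quoted here, so the example does not try to reproduce the 75.

```python
def expected_new_species(profile):
    """Alternating sum f1 - f2 + f3 - ... over a {times_seen: n_species} table."""
    return sum(count if times % 2 == 1 else -count
               for times, count in profile.items())

# Toy table, invented for illustration: 10 species seen once, 4 twice, etc.
toy_profile = {1: 10, 2: 4, 3: 2, 4: 1}
print(expected_new_species(toy_profile))  # 10 - 4 + 2 - 1 = 7
```

The rare species dominate the result: the singletons are the only large positive term, which is why a table with many once-seen species points to many still unseen.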
The first illustration in Corbet's The Butterflies of the Malay Peninsula.
This is exactly my problem! I was able to take the data in that diagram at the top, which has exactly the same form as Corbet's table, feed it into Fisher's formula, and find out how many new strains the Belgians could expect to find if they took another 1760 samples from the same cultures.
284 new strains is the answer.
That's quite a lot, which immediately suggests the question: what if we did it more times? How many more new strains would we get then? Unfortunately, Fisher's formula cannot answer that question. But the Wikipedia page pointed to another paper, published in 2016, promising an optimal formula for taking this prediction as far as it's mathematically possible for it to go.
It turns out the new formula is only accurate when the number of new samples is at most log(n) times the number you already have, where n is the number of samples you have. So for our case we can only estimate what happens if we take 7 times 1760 more samples. After some struggle I was able to compute this more sophisticated estimate, and found that if we sampled seven more times we could expect 952 new strains. Which is rather a lot, as it would give a total of 1529 strains.
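The general idea of the 2016 approach is to damp the alternating sum: each term is multiplied by the probability that a random cut-off point lies at or beyond it, so the high-order terms no longer explode when predicting several sample sizes ahead. Below is a minimal sketch assuming a binomial cut-off. The parameters k and q are left to the caller because the paper's recommended values are not reproduced here, the function names are made up, and this is not the code behind the numbers in this post.

```python
from math import comb

def binomial_tail(k, q, i):
    """P(L >= i) for L ~ Binomial(k, q)."""
    if i > k:
        return 0.0
    return sum(comb(k, j) * q**j * (1 - q)**(k - j) for j in range(i, k + 1))

def smoothed_new_strains(profile, t, k, q):
    """Estimate new strains found in t times as many additional samples.

    profile: {times_found: number_of_strains}
    k, q:    parameters of the binomial smoothing distribution (assumed inputs)
    """
    return -sum((-t)**i * binomial_tail(k, q, i) * count
                for i, count in profile.items())

# Toy table, invented for illustration (not the farmhouse yeast data).
toy_profile = {1: 10, 2: 4, 3: 2, 4: 1}
estimate = smoothed_new_strains(toy_profile, t=2, k=3, q=1/3)
print(round(estimate, 2))
```

With t = 1 and a cut-off that never truncates (q = 1 and k at least as large as the highest count in the table) this reduces to the plain alternating sum, which is a handy sanity check.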
Let's stop and consider that for a second. 1529 strains has to be more than all the modern ale yeast in existence. Even the biggest-ever yeast genetics study has only 310 strains in the modern beer superclade, and even that number includes a lot of baking yeasts and other yeasts that aren't necessarily from a pure brewing background. And 1529 strains is from only 44 cultures, but so far we know of 65 farmhouse brewing cultures. So if we looked at all the cultures we'd most likely get more than 2000 strains. That's more than 2/3 the total number of strains in that biggest-ever study, which covers wine yeast, sake yeast, wild yeast, etc etc. There is a lot of farmhouse yeast.
However, even that is not the maximum number. Since this new, more sophisticated formula lets me do an estimate for collecting samples 2 times, 3 times, etc, we can do a diagram. And I can cheat by fitting a curve to that and just projecting it onwards, hoping that the shape stays roughly the same. If it does, the total number of new strains seems likely to be less than 1100, for a total of about 1750. (This last number is obviously just a very rough ballpark estimate, but at least it gives some sense of what the number might be.)
Chart showing how many strains we could expect to find if we take 1760 more samples X times.
Having done this for all strains across the 44 cultures we can also do it for each culture, since I have the same data for each culture. However, since we only have 40 samples per culture I can only predict what happens if we take 3 times 40 more samples. Still, at least that gives some sense of the scale. See below.
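Both limits follow from the log(n) rule, assuming the logarithm is the natural one and the result is rounded down. That reading is an assumption, but it reproduces both the 7 and the 3 quoted above. A small sketch, with a hypothetical function name:

```python
from math import floor, log

def max_multiples(n_samples):
    """How many times the original sample size we can extrapolate, per the log(n) rule."""
    return floor(log(n_samples))  # natural logarithm, rounded down (assumed)

print(max_multiples(1760))  # all 44 cultures pooled
print(max_multiples(40))    # a single culture
```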
Chart showing the number of strains found in each culture, and the estimated number we'd find if we sampled three more times.
Apparently the most complex cultures can have well over fifty strains each, but even if we sample many more times they're probably still below a hundred. You'll note that the more strains a culture has, the more new strains we can expect to find, although the estimate is a bit more sophisticated than that. We found fewer strains in #21 than in #20, but #21 should have more strains overall. The same goes for #53 and #54.
Having a sense of these numbers is important, because it gives us a sense of the odds that different researchers working on the same cultures pick different strains. For some cultures, it turns out, that's quite likely, which explains some of the differences in the results that Richard Preiss et al. got from the same cultures that VIB also studied. It also shows how many samples are necessary if we want to look at the evolution of a culture over time.
But mostly getting these results makes me happy because finally I have something like an answer to how many strains there actually are overall. I imagined there was a possibility that there might be a lot more strains than this, but it seems there isn't, and in a way that's a relief. There are a lot of strains, but not a number that's completely impossible to handle. That makes future research somewhat more manageable.