Tuesday, June 19, 2012

Some thoughts about beer reviewing

Go read this article about wine, then come back here.

All done? Good. I want to make a few points about beer reviewing here.

First, I have to say I don't think this actually applies neatly to beer. There really is a difference between Pliny the Elder and the average DIPA. One can pick Pliny out of a blind sample; I've done it. There are meaningful differences in taste there. Enjoyment, however, is much more complicated. In terms of enjoyment, there's very little separating Pliny from, say, Heady Topper, or Duet, or most of the "world-class" DIPAs. At that level (and, hell, even between levels) personal preferences are going to completely determine what you prefer. This point is so banal that it's actually tautological, but I think it gets forgotten entirely too often.

The reason I wanted you to read that post is that it touches on a lot of the reasons that I loathe beer rankings, and especially beer judging at events (such as the World Beer Cup). Fair warning: this post is going to be loooong. I've been wanting to write this for a long time, and now that I have a stupid blog again I have a place for it.

My problem with beer judging takes three forms, which I call Platonic Forms, Begging the Question, and Significance. I'll go over each in detail, then talk a bit about ranking in Objectification.

Platonic Forms

The idea of Platonic forms is that there exists a perfect "form" of certain qualities, such as Justice or Beauty. They don't materially exist, and can never be achieved per se, but they can be contemplated, and they are what one strives for in their realm (so when building a state you want to be as close to the form of Justice as possible). I'm butchering this a bit, but it doesn't really matter here. My point is that beer judging does the same thing, except without the admission that the form doesn't exist. Judging is done to a style, usually BJCP, meaning that beer is judged to some definition that a bunch of dudes in some room somewhere agree on.

The fact that this makes the whole endeavor pointless should be pretty obvious. Even if you can get enough people to agree on styles (which you actually can't, as anyone who has ever tried to talk about this with other opinionated people will know) you're left with a bunch of boundaries drawn on continua. It's the same problem that taxonomy has, except stupider because it doesn't matter for anything. You end up with a bunch of dudes arguing over where to place a thing while the thing itself just keeps on doing what it does.

So from the beginning the very idea is inane. Rating a beer as best to style isn't useful; it's arbitrary and pointless. (Of course, rating at all is arbitrary and pointless, but judging acts like it's not.)

Begging the Question

As a scientist, I find this one bothers me more. At its core, doing an experiment is simply asking a question and figuring out the best way to get the answer. Ideally you want to design it such that minor changes in experimental design don't affect things much. So if I'm trying to figure out how much radioactivity is in a sample, I should get the same answer (within uncertainty) whether I count it for N hours or N+1 hours (or even 2N hours). If the answer you get is highly dependent on the input parameters, then it's clear that you have a bad experimental design or are asking a bad question.
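To make the counting example concrete, here is a minimal sketch. The counts and durations are made up for illustration, and it assumes plain Poisson counting statistics (one-sigma uncertainty on N counts is sqrt(N)). The point is only that a well-posed measurement gives the same rate, within uncertainty, no matter how long you count.

```python
import math

def count_rate(counts, hours):
    """Return (rate, one-sigma uncertainty) in counts per hour,
    assuming Poisson statistics: sigma on the counts is sqrt(counts)."""
    rate = counts / hours
    sigma = math.sqrt(counts) / hours
    return rate, sigma

def consistent(a, b, n_sigma=2.0):
    """True if two (rate, sigma) results agree within n_sigma
    of their combined uncertainty."""
    diff = abs(a[0] - b[0])
    combined = math.sqrt(a[1] ** 2 + b[1] ** 2)
    return diff <= n_sigma * combined

# Hypothetical numbers: the same sample counted for N = 4 and 2N = 8 hours.
short_run = count_rate(counts=3600, hours=4)
long_run = count_rate(counts=7310, hours=8)

print(short_run, long_run, consistent(short_run, long_run))
```

The longer run has a smaller error bar, but both runs point at the same answer. That is what a robust design looks like.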

As you likely guessed, beer judging has bad experimental design. The number of judges is small enough that changing out individuals can affect results, it's unclear how careful a given judging session is with sampling order (which can clearly affect things; palate fatigue is real), and the scoring system is completely arbitrary. It's the last thing that I'm most clearly referring to with "begging the question", because the premises assume the conclusion. That is, they're not asking "What is the best beer of these options?" but rather "Which of the beers in this list is ranked the highest according to these judges and this formula?" And the answer to that is always "The one that's ranked highest according to these judges and this formula." If you change the formula, change the judges, change the order, change the snacks they ate beforehand, change who-knows-what-else, you can change the results.
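A toy illustration of the formula problem, with everything in it invented: the beers, the category scores, and both sets of weights are hypothetical and are not taken from any real score sheet. It just shows that the same judges' scores can crown a different winner once you reweight the categories.

```python
# Hypothetical category scores (out of 10) for three made-up beers.
scores = {
    "Beer A": {"aroma": 9, "flavor": 7, "mouthfeel": 6},
    "Beer B": {"aroma": 6, "flavor": 9, "mouthfeel": 7},
    "Beer C": {"aroma": 7, "flavor": 7, "mouthfeel": 9},
}

def rank(scores, weights):
    """Return beer names sorted from highest to lowest weighted total."""
    totals = {
        name: sum(weights[cat] * value for cat, value in cats.items())
        for name, cats in scores.items()
    }
    return sorted(totals, key=totals.get, reverse=True)

# Two equally arbitrary scoring formulas (integer weights, also invented).
aroma_heavy = {"aroma": 6, "flavor": 3, "mouthfeel": 1}
flavor_heavy = {"aroma": 2, "flavor": 6, "mouthfeel": 2}

print(rank(scores, aroma_heavy))   # Beer A comes out on top
print(rank(scores, flavor_heavy))  # Beer B comes out on top
```

Nothing about the beers changed between the two lines of output. Only the formula did.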

Once again, this makes the whole thing arbitrary and pointless.


Significance

This also irks me as a scientist: no one in the beer judging/ranking world seems to understand statistical uncertainty. I don't think that BJCP competitions release their raw numbers, but based on the much larger samples at BA or RB it's pretty likely that the actual results are "we can't tell any of these apart". For instance, right now the BA #1 beer in the world, Pliny the Younger, is within one "pDev" of the #100 beer, Alpine's Great. It's not completely clear to me that pDev is the proper tool for this, but I'm lazy and it purports to be a measure of the variance so I'm using it. This is likely only a one-sigma measurement, too, meaning that if you want even a 95% CI (roughly two sigma) the ranges get wider still.
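Here is a rough sketch of that overlap check. The scores and spreads below are placeholders I made up, not actual BA numbers, and like the paragraph above it treats the quoted spread as a one-sigma uncertainty on the score, which is itself an assumption.

```python
def interval(mean, sigma, z=1.96):
    """Return the (low, high) range mean +/- z * sigma.
    z = 1.0 is one sigma; z = 1.96 is roughly a 95% interval."""
    return mean - z * sigma, mean + z * sigma

def distinguishable(a, b, z=1.96):
    """True only if the two (mean, sigma) ranges do not overlap at all."""
    lo_a, hi_a = interval(a[0], a[1], z)
    lo_b, hi_b = interval(b[0], b[1], z)
    return hi_a < lo_b or hi_b < lo_a

# Hypothetical (mean score, spread) pairs for a #1 and a #100 beer.
top = (4.70, 0.40)
hundredth = (4.45, 0.38)

print(distinguishable(top, hundredth, z=1.0))   # one sigma
print(distinguishable(top, hundredth, z=1.96))  # ~95%
```

With numbers like these the ranges already overlap at one sigma, and widening to 95% only makes the overlap bigger.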

At least with BA/RB they provide this number so any intelligent person can see it and say "Oh, well knowing nothing but this score I can say that I would like these about the same." Which is true! BA also has very broad score ranges, which are really what you should look at. (That is, all "A" beers are pretty much the same as each other, all "B" beers are pretty much the same as each other, but there should be a real drop between A and B.) But with judging they award medals to the winners! Winners that likely aren't even close to being actually different from each other!

This alone makes the whole thing look like it's designed and run by morons, but combined with the rest you really have to wonder if the people at the BJCP have even thought about the pointlessness of this endeavor.


Objectification

This is really my main problem with the whole thing, which is that it's trying to make the subjective into something objective. For whatever reason people love debating the undebatable, and beer ratings/competitions are simply a manifestation of that in the beer community. This is all well and good if you take it only as a frivolity, something to be laughed at, or something to debate while drunk. But some people seem to really care, to think that there's meaning behind the medals, behind the lists. There's not. It's all masturbation. Again, that's fine, but you can't do it too often and it's unseemly in public.

This is entirely too many words for the obviously banal statement "Beer judging/rankings are retarded." But there you go.


