How the AUC of a ROC curve is like the Journal Impact Factor

[Image: histogram of the citation data for Nature, from a blog post by Steve Royle]

The Journal Impact Factor (or JIF) is the mean number of citations to articles published in a journal in the previous 2 years. Now, the mean is often a good measure of the average, but not always. To decide whether it’s a good measure, it is often sufficient to look at a histogram of the data. The image above, from a blog post by Steve Royle, shows the citation data for Nature. It is exactly as you would expect: a large number of papers have a small number of citations, while a small number of papers have a large number of citations. In other words, it is exactly the sort of distribution for which the mean does not provide any meaningful (an ironic pun) result.

Why? Well, it’s the long tail that really kills it (although we could talk about how skewed it is too). Take 101 papers, 100 of which have 1 citation each while the remaining one has 100. What’s the mean? About 2.0 (200/101). If that one paper had 1000 citations instead, the mean would be about 10.9 (1100/101). The mean is heavily influenced by outliers, and here the long tail provides lots of these. For this reason, the mean does not give any useful measure of the average number of citations, as it is just pulled up and down by whichever papers got cited the most.
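To make the arithmetic concrete, here is a minimal Python sketch of the 101-paper example. The function name citation_summary and its parameters are invented for illustration, and the median is included only as a point of comparison, not as a proposed replacement for the JIF.

```python
from statistics import mean, median


def citation_summary(outlier_citations, n_low=100, low_citations=1):
    """Return (mean, median) for n_low papers with low_citations each
    plus a single paper with outlier_citations citations."""
    counts = [low_citations] * n_low + [outlier_citations]
    return mean(counts), median(counts)


# The two cases from the text: the one highly cited paper has 100 or 1000 citations.
for outlier in (100, 1000):
    m, med = citation_summary(outlier)
    print(f"outlier={outlier}: mean={m:.2f}, median={med}")
```

The mean moves from roughly 2 to roughly 11 when a single paper changes, while the median stays at 1 in both cases.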

So what’s the link to the AUC of a ROC curve in a typical virtual screening experiment? The AUC has a linear dependence on the mean rank of the actives (see the BEDROC paper), and guess what, the distribution of those ranks looks very similar to that of citations. For any virtual screening method that is better than random, most of the actives are clustered at the top of the ranked list, while any active that is not recognised by the method floats at random among the inactives. So the AUC is at best a measure of the rank of the actives not recognised by the method, and at worst a random value.
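The linear relationship can be checked with a small Python sketch. It assumes a ranked list with no ties, rank 1 at the top, n actives and N inactives, in which case AUC = 1 - (mean rank of actives - (n + 1)/2) / N. The function names and the toy ranking (three actives at the top of a list of 100, plus one missed active) are invented for illustration and are not taken from the BEDROC paper.

```python
def make_ranking(active_ranks, total):
    """Build a ranked list of booleans (True = active), best-scored first."""
    actives = set(active_ranks)
    return [rank in actives for rank in range(1, total + 1)]


def auc_pairwise(labels):
    """AUC as the fraction of (active, inactive) pairs ranked correctly."""
    n_act = sum(labels)
    n_inact = len(labels) - n_act
    inactives_seen = 0
    wins = 0
    for is_active in labels:
        if is_active:
            # This active outranks every inactive not yet seen.
            wins += n_inact - inactives_seen
        else:
            inactives_seen += 1
    return wins / (n_act * n_inact)


def auc_from_mean_rank(labels):
    """AUC computed from the mean rank of the actives (assumes no ties)."""
    ranks = [i for i, is_active in enumerate(labels, start=1) if is_active]
    n_act = len(ranks)
    n_inact = len(labels) - n_act
    mean_rank = sum(ranks) / n_act
    return 1 - (mean_rank - (n_act + 1) / 2) / n_inact


# Three actives recognised (ranks 1-3); the fourth lands wherever chance puts it.
for missed_rank in (20, 90):
    labels = make_ranking([1, 2, 3, missed_rank], total=100)
    print(missed_rank, round(auc_pairwise(labels), 3),
          round(auc_from_mean_rank(labels), 3))
```

In this toy case the two functions agree, and the AUC moves from about 0.96 to about 0.78 purely because of where the one unrecognised active happens to fall.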

Naturally, the AUC is the most widely used measure of ranking performance in the field.