Sunday, May 13, 2007

Impact Factors and Other Metrics for Faculty Evaluations

A few months back, as a department we had to submit to the school a list of the "A" journals in our field. The ultimate reason for this request was to have a list of prime venues for each field, and thus to facilitate the task of promotion & tenure committees that include researchers with little (or no) knowledge of the candidate's field.

Generating such a list can be a difficult task, especially if we try to keep the list small, and it is directly connected with the problem of ranking journals. There are many metrics that can be used for such a ranking, and the most widely used one is the "impact factor", proposed by Eugene Garfield. The impact factor measures the "impact" of the research published in each journal by counting the average number of citations, over the last year, to the articles that the journal published in the two preceding years. The basic idea is that a journal whose recent articles attract a large number of recent incoming citations is publishing work on topics that the community currently considers important. The choice of the time window makes comparisons across fields difficult, but within a single field the impact factor is generally a good metric for ranking journals.
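To make the computation concrete, here is a minimal sketch in Python under the standard definition above; the journal data and citation counts are invented purely for illustration.

```python
# A minimal sketch of the impact-factor computation: citations received
# last year by the articles a journal published in the two preceding
# years, divided by the number of citable items in those two years.
# All numbers below are invented.

def impact_factor(citations_last_year, num_citable_items):
    """Average citations per citable item from the two preceding years."""
    return sum(citations_last_year) / num_citable_items

# Toy journal: 6 citable items in the two-year window, with a very
# uneven citation distribution (one paper attracts most citations).
citations = [40, 12, 3, 1, 0, 0]
print(impact_factor(citations, num_citable_items=len(citations)))  # 9.33...
```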

Garfield, most probably expecting this outcome, explicitly warned that the impact factor should not be used to judge the quality of an individual scientist's research. The simplest reason is that the citation counts of papers published within the same journal follow a power law: a few papers receive a large number of citations, while most others get only a few. Quoting from a related editorial in Nature: "we have analysed the citations of individual papers in Nature and found that 89% of last year’s figure was generated by just 25% of our papers. [...snip...] Only 50 out of the roughly 1,800 citable items published in those two years received more than 100 citations in 2004. The great majority of our papers received fewer than 20 citations." So, the impact factor is a pretty bad metric for assessing the quality of an individual article, even if the article was published in an "A" journal.
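To see why such a skewed distribution makes the journal-level average uninformative about a single article, here is a toy computation in the same spirit as the Nature figures; the citation counts below are invented and are not the actual Nature data.

```python
# Toy illustration of how skewed citation counts make the journal-level
# average say little about a typical paper. The counts are invented.
citations = sorted([300, 150, 90, 60, 20, 8, 5, 3, 2, 1, 0, 0], reverse=True)

top_quarter = citations[: len(citations) // 4]   # the top 25% of papers
share = sum(top_quarter) / sum(citations)
mean = sum(citations) / len(citations)
median = sorted(citations)[len(citations) // 2]

print(f"Top 25% of papers produce {share:.0%} of all citations")        # 85%
print(f"Mean citations (what the impact factor reflects): {mean:.1f}")  # 53.2
print(f"Median citations (a typical paper): {median}")                  # 8
```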

The impact factor (and other journal-ranking metrics) was devised to be used as a guideline for librarians allocating subscription resources, and as a rough metric for guiding scientists who are trying to decide which journals to follow. Unfortunately, such metrics have been mistakenly adopted as convenient measures for summarily evaluating the quality of someone's research ("if it appeared in a journal with a high impact factor, it is a good paper; if not, it is a bad paper"). While the impact factor could arguably serve as a prior, using it alone amounts to a Naive Bayes classifier that never examines any feature of the classified object before making the classification decision.
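To make the analogy concrete, here is a toy sketch of such a venue-only "classifier"; the threshold and impact-factor values are made up for illustration.

```python
# A "classifier" that uses only the prior (the venue's impact factor)
# and never looks at any feature of the paper itself. Every paper from
# the same journal receives the same verdict. The threshold is arbitrary.
def judge_by_venue_only(journal_impact_factor, threshold=5.0):
    return "good paper" if journal_impact_factor > threshold else "bad paper"

print(judge_by_venue_only(30.0))  # every paper in the high-IF journal -> "good paper"
print(judge_by_venue_only(1.2))   # every paper in the low-IF journal  -> "bad paper"
```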

For the task of evaluating the work of an individual scientist, other metrics appear to be better suited. For example, the much-discussed h-index and its variants seem to be gaining traction. (I will write a separate post on this subject.) Are such metrics useful? Perhaps. However, no metric, no matter how carefully crafted, can substitute for a careful evaluation of someone's research. These metrics are only useful as auxiliary statistics, and I do hope that they are used that way.
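For concreteness, here is a minimal sketch of the h-index computation: the largest h such that the author has at least h papers with at least h citations each. The citation counts in the example are invented.

```python
def h_index(citation_counts):
    """Largest h such that at least h papers have at least h citations each."""
    counts = sorted(citation_counts, reverse=True)
    h = 0
    for rank, c in enumerate(counts, start=1):
        if c >= rank:
            h = rank
        else:
            break
    return h

print(h_index([10, 8, 5, 4, 3, 1]))  # 4: four papers each cited at least 4 times
```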