Monday, September 14, 2009

Citation Tracker: Monitoring Citations to your Publications

One of the common pastimes of academics is checking services such as Google Scholar to see the number of papers that cite our work. Quite often the statistics from Google Scholar, or from other services such as Web of Science, are used to create a citation report that is used for promotion and tenure purposes.


While Google Scholar is extremely valuable for finding papers that cite a particular piece of work, it has some shortcomings, especially when creating a citation report for promotion. First, Google Scholar does not differentiate between peer-reviewed publications (journal, conference, or workshop papers) and other publications (such as tech reports or term papers); so, when preparing a citation report, I have to go over the list of papers, keeping the "legitimate" citations and removing the ones that are not admissible. Second, Google Scholar is sometimes noisy: it lists the same paper twice, or splits the citations for one paper across two different entries; at other times it misses papers that can be found through a web search.
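
To illustrate the duplicate-entry problem, here is a minimal Python sketch of one way to merge entries that differ only in capitalization or punctuation. It is not how Google Scholar or Citation Tracker handles duplicates; the function names and the "title" and "url" keys are assumptions made up for this example.

```python
import re
import unicodedata


def normalize_title(title):
    """Lowercase a title, strip accents and punctuation, collapse whitespace."""
    ascii_title = (
        unicodedata.normalize("NFKD", title)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    cleaned = re.sub(r"[^a-z0-9]+", " ", ascii_title.lower())
    return cleaned.strip()


def merge_duplicates(entries):
    """Merge citation entries whose normalized titles match.

    `entries` is a list of dicts with the hypothetical keys "title" and "url".
    Returns one record per distinct title, keeping every URL seen for it.
    """
    merged = {}
    for entry in entries:
        key = normalize_title(entry["title"])
        record = merged.setdefault(key, {"title": entry["title"], "urls": []})
        if entry["url"] not in record["urls"]:
            record["urls"].append(entry["url"])
    return list(merged.values())
```

Exact matching on a normalized title is deliberately conservative: it will not catch truncated or misspelled titles, which would need some form of approximate matching.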

Another feature that I would really like to see is the ability to find the "new" citations for a given paper, creating the appropriate alerts. A simple RSS feed would work wonders, but it is not there.
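
Finding the "new" citations boils down to comparing two snapshots of the citing papers for the same publication. A minimal Python sketch, assuming each citing paper has some stable identifier (a URL, for instance); the function name and the identifiers are hypothetical:

```python
def find_new_citations(previous_ids, current_ids):
    """Return the identifiers in the current snapshot that were not in the
    previous one, in the order they appear and without repeats."""
    seen = set(previous_ids)
    new = []
    for citation_id in current_ids:
        if citation_id not in seen:
            seen.add(citation_id)
            new.append(citation_id)
    return new
```

Anything this returns could then be pushed into an alert or a feed; identifiers that disappear between snapshots are simply ignored here.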

Of course, Google Scholar also does not monitor the web to find other types of documents that may mention a particular paper. PhD seminars, or even blog posts, are things that I would like to keep track of when monitoring who cites my own work. Especially for such volatile pages, I typically want to keep a copy so that I can retrieve them a few years later, when compiling my promotion packet.

For this reason, over the summer, I created a tool that augments Google Scholar: it monitors Google Scholar and other citation services (such as Libra, CiteSeerX, and SSRN), and also monitors the Web (Google, Bing, Ask) for mentions of a paper.

You can access a pre-alpha version at http://www.citation-tracker.com

Some of the features:
  • Import publications from Google Scholar, DBLP, or BibTeX, or enter them manually.
  • Review the citations for each paper, and decide which ones to keep, which to discard, and which ones to examine later.
  • Monitor citation services (Google Scholar, Libra, CiteSeerX, SSRN) and see notifications when new citations to your papers appear.
  • Automatically generate a citation report listing the papers that cite your work.

I have been using the service over the last few weeks and it seems reasonably stable. I import my papers using Google Scholar, "accept" the existing citations, and then watch for the new citations that pop up every now and then. I find it pretty useful for finding new papers that cite my work.
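
The keep / discard / examine-later review and the citation report can be pictured with a short Python sketch. This is only an illustration of the workflow, not the tool's actual code: the `Status` values, the `build_report` function, and the input layout are all assumptions.

```python
from enum import Enum


class Status(Enum):
    """Hypothetical review states for a citation."""
    PENDING = "pending"   # newly found, not reviewed yet
    KEEP = "keep"         # accepted as a legitimate citation
    DISCARD = "discard"   # not admissible for the report
    LATER = "later"       # set aside to examine later


def build_report(citations_by_paper):
    """Build a plain-text report that lists only the accepted citations.

    `citations_by_paper` maps a paper title to a list of
    (citing_title, Status) pairs.
    """
    lines = []
    for paper, citations in citations_by_paper.items():
        kept = [title for title, status in citations if status is Status.KEEP]
        lines.append(f"{paper}: cited by {len(kept)}")
        for title in sorted(kept):
            lines.append(f"  - {title}")
    return "\n".join(lines)
```

Only citations marked `KEEP` reach the report, so discarded or postponed entries never inflate the counts.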

Over the last few days I even started importing papers by other researchers that I consider relevant to my work, so that I can see which new papers cite them.

Feel free to log in and play with the system. Needless to say, it is an early release, so I expect to see bugs here and there. If you find a bug, or if you would like to see a new feature, please add a note using the "feedback" tab that is visible on the side of the screen.

Enjoy!

9 comments:

Arun Sundararajan said...

very cool, Panos. very cool.

Michael Kuhn said...

Hi Panos,

I developed a similar tool for following citations in Web of Science. The source is on github, so if you want you could incorporate some of my ideas.

http://blog.mckuhn.de/2008/08/citeweb-following-citations-made-easy.html
http://github.com/mkuhn/cite-web

best wishes, Michael

Panos Ipeirotis said...

@Michael: We have thought of incorporating WoS as well. Unfortunately, it is a subscription-based service and opening it up indirectly through citation-tracker could cause trouble for my host institution (NYU) and for me personally.

Or do you have an alternative for accessing its feeds without a subscription?

Gene Golovchinsky said...

Cool tool! Lots of opportunities to build it up as a way of helping researchers communicate. I wrote up some details in my blog post, "In pursuit of impact".

Gene Golovchinsky said...

What environment/tools are you using to build Citation Tracker?

Panos Ipeirotis said...

Amazon EC2 and SimpleDB. Also SQS for scheduling and pacing tasks.

Gene Golovchinsky said...

Although this is probably not directly relevant to your blog, it would be interesting (to me) to read about how you managed relational data in SimpleDB, and what sorts of tradeoffs you encountered versus trying to do it with a relational database.

Panos Ipeirotis said...

Actually the tool was built on SimpleDB just to see the tradeoffs. Given that we do not try to do any extensive data cleaning, and we treat each publication/citation in isolation, SimpleDB works pretty well. A single row for each publication, and all the necessary data for the publication are contained there.

SimpleDB also gives me the peace of mind that it can grow in size rather easily. With relational databases, it is a pain to scale when a single machine cannot handle the load. I guess more extensive lessons will come as we get more experience with the incorporation of new features, maintenance, etc.

Now I see that Google App Engine would also be a good match. But when we started building the tool, they did not have task queues with background tasks, so crawling was pretty much out of the question.
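
As a rough picture of the "single row for each publication" layout mentioned above, here is a minimal Python sketch that flattens a publication into one self-contained item. It makes no SimpleDB API calls; the field names are hypothetical and not the tool's real schema. The only property it mimics is that attribute values are strings and an attribute may hold several values.

```python
def to_item(publication):
    """Flatten a publication dict into (item_name, attributes).

    Every attribute maps to a list of strings, so multi-valued fields
    such as authors or citing papers live in the same row.
    """
    attributes = {
        "title": [publication["title"]],
        "year": [str(publication["year"])],
        "author": list(publication["authors"]),
        "cited_by": list(publication["cited_by"]),
    }
    return publication["id"], attributes
```

Because everything about a publication sits in one item, reading or updating it never needs a join, which fits the "each publication in isolation" approach described above.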

Michael Kuhn said...

Hi Panos,

Sorry, I didn't check back on the WoS thing. My tool requires the user to paste in their list of WoS citation alerts and extracts the RSS feeds from this list. The feeds themselves are freely accessible from anywhere, e.g. you could monitor them with Google Reader. citeweb just does some re-hashing of the feeds and also doesn't keep info beyond the current content of the RSS feed, to stay clear of legal concerns.
