NextMove recently participated in the BioCreative CHEMDNER (chemical compound and drug name recognition) task, which involved annotating chemical mentions in PubMed abstracts. BioCreative annotated 10,000 abstracts, of which 7,000 were provided to participants for training. In mid-September, participants were asked to identify mentions in the unseen test corpus of 3,000 abstracts (which, to avoid cheating, was combined with 17,000 decoy abstracts).
In total, 27 teams (23 academic and 4 commercial) submitted results. We achieved 85.0% recall at a precision of 88.7%, giving an F-score of 86.9%. Our solution ranked amongst the best submitted, being only 0.53% behind the best-performing solution in the chemical entity mentions task and significantly ahead of the other commercial solutions. Inter-annotator agreement was 91%, indicating that, with recent advances in machine annotation, automated systems are rapidly approaching the quality of human abstractors.
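For readers unfamiliar with the metric, the F-score quoted here is presumably the balanced F1 measure, i.e. the harmonic mean of precision and recall. The sketch below is only an illustration of that arithmetic and is not the task's official evaluation script; the function name `f_score` is ours.

```python
def f_score(precision: float, recall: float) -> float:
    """Balanced F-score (F1): the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as quoted in this post (already rounded to one decimal).
precision = 0.887
recall = 0.850

print(f"F-score: {f_score(precision, recall):.1%}")
```

With these rounded inputs the result comes out at roughly 86.8%, marginally below the reported 86.9%. That gap is consistent with the official figure having been computed from unrounded precision and recall.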
Participation in this competition has driven recent developments in LeadMine including improved coverage of non-systematic chemical entities and detection of abbreviations.
If you want to know the full details, our proceedings paper is available here, and you can find out how we compared in the full proceedings here (results on p14, list of teams on p31). The presentation below, which I gave at the workshop, summarises our system: