Enabling Machines to Read the Chemical Literature (ACS Session)

I am organizing a session at the August ACS meeting in Boston entitled:

Enabling Machines to Read the Chemical Literature: Techniques, Case Studies & Opportunities

Abstracts are still being accepted, so if you’re interested I encourage you to submit. Topics covered by talks are likely to be quite varied, e.g. extraction of chemistry from images, classification of extracted compounds, and association of chemicals with metadata.

The session is in the CINF division and the deadline for submissions is 29th March 2015. This is a hard deadline, so if you’re interested in submitting please don’t miss it!

On the topic of ACS meetings, at the upcoming ACS meeting in Denver, Tony Williams will be presenting the RSC’s work to collect NMR spectra. We are co-authors of the presentation, and our contribution was to text mine over a million NMR spectra and their associated compounds from patent filings.

Roger Sayle will be attending the Denver ACS if you want to catch up or discuss anything.

Session: CHED: NMR Spectroscopy in the Undergraduate Curriculum
Day/time: Sunday, March 22, 2015, from 4:15 PM to 4:35 PM
Location: Gold – Sheraton Denver Downtown Hotel
Title: Providing access to a million NMR spectra via the web
Abstract: Access to large scale NMR collections of spectral data can be used for a number of purposes in terms of teaching spectroscopy to students. The data can be used for teaching purposes in lectures, as training data sets for spectral interpretation and structure elucidation, and to underpin educational resources such as the Royal Society of Chemistry’s SpectralGame (www.spectralgame.com). These resources have been available for a number of years but have been limited to rather small collections of spectral data and specifically only about 3000 spectra. In order to expand the data collection and provide richer resources for the community we have been gathering data from various laboratories and, as part of a research project, we have used text-mining approaches to extract spectral data from articles and patents in the form of textual strings and utilized algorithms to convert the data into spectral representations. While these spectra are reconstructions of text representations of the original spectral data we are investigating their value in terms of utilizing for the purpose of structure identification. This presentation will report on the processes of extracting structure-spectral pairs from text, approaches to performing automated spectral verification and our intention to assemble a spectral collection of a million NMR spectra and make them available online.
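
To give a feel for what "converting textual strings into spectral representations" might involve, here is a minimal Python sketch. It is not the algorithm used in the work described above: the regular expression, the `Peak` type, the function names and the sample string are all hypothetical and chosen purely for illustration. It assumes peaks are reported in the common "shift (multiplicity, J = ..., nH)" style, ignores coupling patterns entirely, and draws each peak as a single Lorentzian line whose height scales with the hydrogen count.

```python
import re
from typing import List, NamedTuple, Tuple


class Peak(NamedTuple):
    shift_ppm: float    # chemical shift in ppm
    multiplicity: str   # e.g. "s", "d", "dd", "m"
    n_hydrogens: int    # integration, e.g. 3 for "3H"


# Hypothetical pattern for entries such as "7.26 (d, J = 8.0 Hz, 2H)".
# Real-world text is far messier (shift ranges, "br s", missing integrals),
# and none of that is handled here.
PEAK_RE = re.compile(
    r"(?P<shift>\d+\.\d+)\s*\(\s*(?P<mult>[a-z]+)\s*,"
    r"(?:[^()]*?,)?\s*(?P<nh>\d+)\s*H\s*\)"
)


def parse_peaks(text: str) -> List[Peak]:
    """Extract (shift, multiplicity, hydrogen count) tuples from NMR text."""
    return [
        Peak(float(m.group("shift")), m.group("mult"), int(m.group("nh")))
        for m in PEAK_RE.finditer(text)
    ]


def lorentzian(x: float, x0: float, hwhm: float) -> float:
    """Lorentzian line of unit height centred at x0."""
    return hwhm ** 2 / ((x - x0) ** 2 + hwhm ** 2)


def reconstruct(
    peaks: List[Peak],
    start: float = 0.0,
    stop: float = 10.0,
    n_points: int = 1001,
    hwhm: float = 0.02,
) -> Tuple[List[float], List[float]]:
    """Render peaks on a ppm grid as a sum of Lorentzians.

    Simplification: multiplet splitting is ignored, so a doublet is
    drawn as one line at its reported shift.
    """
    step = (stop - start) / (n_points - 1)
    axis = [start + i * step for i in range(n_points)]
    intensity = [
        sum(p.n_hydrogens * lorentzian(x, p.shift_ppm, hwhm) for p in peaks)
        for x in axis
    ]
    return axis, intensity


# Made-up example string, not taken from any patent or article.
sample = "1H NMR (400 MHz, CDCl3) δ 7.26 (d, J = 8.0 Hz, 2H), 3.80 (s, 3H)"
peaks = parse_peaks(sample)
axis, intensity = reconstruct(peaks)
```

A production pipeline would also need to link each spectrum to the right compound and simulate the multiplet shapes from the coupling constants, both of which are well beyond this sketch.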