Are more bioactivities available from patents than from the academic literature?

Patents, such as those freely available from the US patent office, are a rich source of bioactivity data. One argument for favoring these data over data extracted from the academic literature is timeliness: a recent publication by Stefan Senger suggests an average delay of 4 years between the publication of a compound-target interaction pair in the patent literature and its appearance in the academic literature.

Another argument is simply the quantity of data. Daniel has been working on the general problem of extracting data from tables in patents, a certain proportion of which are bioactivity data. The following graph shows the amount of bioactivity data per (publication) year in ChEMBL versus the amount extracted by LeadMine from US patents. Note that, for the purposes of this comparison, the ChEMBL data exclude the data extracted from patents by BindingDB.

The rise in the amount of patent data is due to an increase in both the size of patents and their number. If the trend continues, patents will become increasingly important as a source of bioactivity data.

Daniel presented the details of the text-mining procedure at the recent ACS meeting in San Francisco. The talk below also includes a comparison between the data extracted by LeadMine and the data extracted manually by BindingDB. If you’re interested in seeing a poster on the topic, Daniel will be presenting one at UK-QSAR this Wednesday.

Workshop at Bio-IT World on extraction of information from medicinal chemistry patents

Next month, at Bio-IT World, I will be co-hosting a workshop with Chris Southan (Guide to Pharmacology) and Paul Thiessen (PubChem) entitled “Digging Bioactive Chemistry Out of Patents Using Open Resources”. Chris Southan recently wrote about some of the untapped potential of the patent literature in drug discovery here.

The workshop will:

  • Outline the statistics of patent chemistry in various open sources
  • Introduce a spectrum of open resources and tools
  • Enable a deeper understanding of target identification, bioactivity and SAR extraction from patents and also papers
  • Show ways to engage with medicinal chemistry patent mining
  • Include hands-on exercises

The workshop is scheduled for Tuesday 23rd May. The deadline for signups is the 14th of April; registration is still open. For more information on the agenda of the workshop and to sign up, head here.