{"id":835,"date":"2014-02-27T15:00:37","date_gmt":"2014-02-27T15:00:37","guid":{"rendered":"http:\/\/nextmovesoftware.com\/blog\/?p=835"},"modified":"2017-08-24T21:45:29","modified_gmt":"2017-08-24T20:45:29","slug":"unleashing-over-a-million-reactions-into-the-wild","status":"publish","type":"post","link":"https:\/\/nextmovesoftware.com\/blog\/2014\/02\/27\/unleashing-over-a-million-reactions-into-the-wild\/","title":{"rendered":"Unleashing over a million reactions into the wild"},"content":{"rendered":"<p><a href=\"http:\/\/nextmovesoftware.com\/blog\/wp-content\/uploads\/2014\/02\/reactionExtraction.png\"><img loading=\"lazy\" decoding=\"async\" class=\" wp-image-837 alignright\" alt=\"Reaction Extraction Workflow\" src=\"\/blog\/wp-content\/uploads\/2014\/02\/reactionExtraction.png\" width=\"235\" height=\"409\" srcset=\"https:\/\/nextmovesoftware.com\/blog\/wp-content\/uploads\/2014\/02\/reactionExtraction.png 336w, https:\/\/nextmovesoftware.com\/blog\/wp-content\/uploads\/2014\/02\/reactionExtraction-172x300.png 172w\" sizes=\"(max-width: 235px) 100vw, 235px\" \/><\/a>Unlike with small molecules, there are currently no large sets of publicly available reaction data.<\/p>\n<p>To remedy this situation, we have extracted over a million reactions from United States patent applications (2001-2013) and the same again from patent grants (1976-2013). This contrasts with the original data release of &#8220;only&#8221; 420 thousand (from 2008-2011 applications) whilst I was in the <a href=\"http:\/\/blogs.ch.cam.ac.uk\/pmr\/\">PMR<\/a> group.<\/p>\n<p>The reactions are available as reaction SMILES or CML from <a href=\"https:\/\/bitbucket.org\/dan2097\/patent-reaction-extraction\/downloads\">here<\/a>, as <a href=\"http:\/\/www.7-zip.org\/\">7zip<\/a> archives. The CML representation includes quantities and yields where these were found. A documentation zip provides further information on the format of the data. 
This data is made available under <a href=\"http:\/\/creativecommons.org\/publicdomain\/zero\/1.0\/\">CC-Zero<\/a>, i.e. without copyright. [<b>Update 24\/08\/2017:<\/b> A newer version of the dataset described here is available on <a href=\"https:\/\/figshare.com\/articles\/Chemical_reactions_from_US_patents_1976-Sep2016_\/5104873\">FigShare<\/a>]<\/p>\n<p>It is hoped that making this data resource available will facilitate analyses that require a large number of reactions.<\/p>\n<p>NextMove Software is currently looking into what insights can be obtained from such data sets. For example, using our reaction classification software we can show a broad correlation between the type of reaction and its yield, and that this trend could be reproduced from ELN data (presentation <a href=\"http:\/\/www.slideshare.net\/NextMoveSoftware\/filbert2\/32\">here<\/a>). This is just the beginning of the sorts of analyses that can be performed with access to so many reactions. Expect to hear more at the upcoming <a href=\"http:\/\/www.int-conf-chem-structures.org\/\">ICCS<\/a> and <a href=\"http:\/\/www.ukqsar.org\/2014\/02\/07\/spring-2014\/\">UK-QSAR<\/a> meetings.<\/p>\n<p>More information about how the reactions were extracted can be found in my PhD <a href=\"https:\/\/www.repository.cam.ac.uk\/handle\/1810\/244727\">thesis<\/a> and a <a href=\"http:\/\/www.slideshare.net\/dan2097\/automated-extraction-of-reactions-from-the-patent-literature\">presentation<\/a> I gave at the ACS.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Unlike with small molecules, there are currently no large sets of publicly available reaction data. To remedy this situation, we have extracted over a million reactions from United States patent applications (2001-2013) and the same again from patent grants (1976-2013). 
This contrasts with the original data release of &#8220;only&#8221; 420 thousand (from 2008-2011 applications) whilst &hellip; <a href=\"https:\/\/nextmovesoftware.com\/blog\/2014\/02\/27\/unleashing-over-a-million-reactions-into-the-wild\/\" class=\"more-link\">Continue reading <span class=\"screen-reader-text\">Unleashing over a million reactions into the wild<\/span><\/a><\/p>\n","protected":false},"author":3,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[],"_links":{"self":[{"href":"https:\/\/nextmovesoftware.com\/blog\/wp-json\/wp\/v2\/posts\/835"}],"collection":[{"href":"https:\/\/nextmovesoftware.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/nextmovesoftware.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/nextmovesoftware.com\/blog\/wp-json\/wp\/v2\/users\/3"}],"replies":[{"embeddable":true,"href":"https:\/\/nextmovesoftware.com\/blog\/wp-json\/wp\/v2\/comments?post=835"}],"version-history":[{"count":21,"href":"https:\/\/nextmovesoftware.com\/blog\/wp-json\/wp\/v2\/posts\/835\/revisions"}],"predecessor-version":[{"id":2605,"href":"https:\/\/nextmovesoftware.com\/blog\/wp-json\/wp\/v2\/posts\/835\/revisions\/2605"}],"wp:attachment":[{"href":"https:\/\/nextmovesoftware.com\/blog\/wp-json\/wp\/v2\/media?parent=835"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/nextmovesoftware.com\/blog\/wp-json\/wp\/v2\/categories?post=835"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/nextmovesoftware.com\/blog\/wp-json\/wp\/v2\/tags?post=835"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}