{"id":1684,"date":"2016-01-27T11:45:22","date_gmt":"2016-01-27T11:45:22","guid":{"rendered":"https:\/\/nextmovesoftware.com\/blog\/?p=1684"},"modified":"2016-01-27T11:45:22","modified_gmt":"2016-01-27T11:45:22","slug":"assembling-a-large-data-set-for-melting-point-prediction-text-mining-to-the-rescue","status":"publish","type":"post","link":"https:\/\/nextmovesoftware.com\/blog\/2016\/01\/27\/assembling-a-large-data-set-for-melting-point-prediction-text-mining-to-the-rescue\/","title":{"rendered":"Assembling a large data set for melting point prediction: Text-mining to the rescue!"},"content":{"rendered":"<p><a href=\"https:\/\/nextmovesoftware.com\/blog\/wp-content\/uploads\/2016\/01\/Gallenkamp_Melting_Point_Apparatus.jpg\" rel=\"attachment wp-att-1685\"><img loading=\"lazy\" decoding=\"async\" class=\"alignright wp-image-1685 size-full\" src=\"https:\/\/nextmovesoftware.com\/blog\/wp-content\/uploads\/2016\/01\/Gallenkamp_Melting_Point_Apparatus.jpg\" alt=\"Gallenkamp_Melting_Point_Apparatus\" width=\"300\" height=\"452\" srcset=\"https:\/\/nextmovesoftware.com\/blog\/wp-content\/uploads\/2016\/01\/Gallenkamp_Melting_Point_Apparatus.jpg 680w, https:\/\/nextmovesoftware.com\/blog\/wp-content\/uploads\/2016\/01\/Gallenkamp_Melting_Point_Apparatus-199x300.jpg 199w\" sizes=\"(max-width: 300px) 100vw, 300px\" \/><\/a>As part of a project initiated by Tony Williams and the Royal Society of Chemistry, I have been working with Igor Tetko to text-mine melting and decomposition point data from the US patent literature so that he could then produce a melting point prediction model. 
This model showed an improvement over previous models, which is likely due to the overwhelmingly large size of the dataset compared to the smaller curated data sets used by these previous models.<\/p>\n<p>The results of this work have now been published in the Journal of Cheminformatics here: <a href=\"http:\/\/www.jcheminf.com\/content\/8\/1\/2\">The development of models to predict melting and pyrolysis point data associated with several hundred thousand compounds mined from Patents<\/a><\/p>\n<p>From the text-mining side, this involved identifying compounds and their melting and decomposition points, performing the association between them, and then normalizing the representation of the melting points (e.g. &#8220;182-4\u00b0C&#8221; means the same as &#8220;182 to 184\u00b0C&#8221;). Values that were likely to be typos in the original text were also flagged.<\/p>\n<p>As mentioned in the paper, the resultant set of hundreds of thousands of melting points is available as SDF from <a href=\"https:\/\/figshare.com\/articles\/Melting_Point_and_Pyrolysis_Point_Data_for_Tens_of_Thousands_of_Chemicals\/2007426\">Figshare<\/a>, while the model Igor developed is available from <a href=\"http:\/\/ochem.eu\/article\/99826\">OCHEM<\/a>.<\/p>\n<p><b>Image credit:<\/b> <a class=\"owner-name truncate\" title=\"Go to Iain George's photostream\" href=\"https:\/\/www.flickr.com\/photos\/51035797337@N01\/\" data-rapid_p=\"29\" data-track=\"attributionNameClick\">Iain George<\/a> on Flickr (CC-BY-SA)<\/p>\n","protected":false},"excerpt":{"rendered":"<p>As part of a project initiated by Tony Williams and the Royal Society of Chemistry, I have been working with Igor Tetko to text-mine melting and decomposition point data from the US patent literature so that he could then produce a melting point prediction model. 
This model showed an improvement over previous models, which is &hellip; <a href=\"https:\/\/nextmovesoftware.com\/blog\/2016\/01\/27\/assembling-a-large-data-set-for-melting-point-prediction-text-mining-to-the-rescue\/\" class=\"more-link\">Continue reading <span class=\"screen-reader-text\">Assembling a large data set for melting point prediction: Text-mining to the rescue!<\/span><\/a><\/p>\n","protected":false},"author":3,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[],"_links":{"self":[{"href":"https:\/\/nextmovesoftware.com\/blog\/wp-json\/wp\/v2\/posts\/1684"}],"collection":[{"href":"https:\/\/nextmovesoftware.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/nextmovesoftware.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/nextmovesoftware.com\/blog\/wp-json\/wp\/v2\/users\/3"}],"replies":[{"embeddable":true,"href":"https:\/\/nextmovesoftware.com\/blog\/wp-json\/wp\/v2\/comments?post=1684"}],"version-history":[{"count":11,"href":"https:\/\/nextmovesoftware.com\/blog\/wp-json\/wp\/v2\/posts\/1684\/revisions"}],"predecessor-version":[{"id":1698,"href":"https:\/\/nextmovesoftware.com\/blog\/wp-json\/wp\/v2\/posts\/1684\/revisions\/1698"}],"wp:attachment":[{"href":"https:\/\/nextmovesoftware.com\/blog\/wp-json\/wp\/v2\/media?parent=1684"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/nextmovesoftware.com\/blog\/wp-json\/wp\/v2\/categories?post=1684"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/nextmovesoftware.com\/blog\/wp-json\/wp\/v2\/tags?post=1684"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}