{"id":1520,"date":"2015-08-24T16:12:02","date_gmt":"2015-08-24T15:12:02","guid":{"rendered":"https:\/\/nextmovesoftware.com\/blog\/?p=1520"},"modified":"2015-08-24T16:12:02","modified_gmt":"2015-08-24T15:12:02","slug":"biopolymer-canonicalisation-scaling-between-toolkits","status":"publish","type":"post","link":"https:\/\/nextmovesoftware.com\/blog\/2015\/08\/24\/biopolymer-canonicalisation-scaling-between-toolkits\/","title":{"rendered":"Biopolymer Canonicalisation Scaling Between Toolkits"},"content":{"rendered":"<p>We&#8217;ve previously shown that using all-atom structure representations is a tractable approach to handling biologics (see <a href=\"https:\/\/nextmovesoftware.com\/blog\/2014\/11\/\">https:\/\/nextmovesoftware.com\/blog\/2014\/11\/<\/a>). Handling biologics in this way allows you to reuse existing registration infrastructure (e.g. Canonical SMILES\/InChI\/CACTVS keys).<\/p>\n<p>At the Fall ACS &#8217;15, Roger presented an update to this ongoing work, showing that many popular open-source cheminformatics toolkits can already handle peptides &lt; 500 AA (the size of immunoglobulin heavy chains) in less than a second. We timed the generation of a canonical SMILES string (from the internal representation) over SwissProt. 
With the exception of Indigo\/CDK (which hit hard error limits), the lines stop due to time constraints.\n\n<center><br \/>\n<a href=\"https:\/\/nextmovesoftware.com\/blog\/wp-content\/uploads\/2015\/08\/sp-all_1000.png\"><img decoding=\"async\" src=\"https:\/\/nextmovesoftware.com\/blog\/wp-content\/uploads\/2015\/08\/sp-all_1000.png\" alt=\"sp-all_1000\" width=\"512\" class=\"alignnone size-full wp-image-1524\" srcset=\"https:\/\/nextmovesoftware.com\/blog\/wp-content\/uploads\/2015\/08\/sp-all_1000.png 1800w, https:\/\/nextmovesoftware.com\/blog\/wp-content\/uploads\/2015\/08\/sp-all_1000-300x200.png 300w, https:\/\/nextmovesoftware.com\/blog\/wp-content\/uploads\/2015\/08\/sp-all_1000-1024x683.png 1024w\" sizes=\"(max-width: 1800px) 100vw, 1800px\" \/><\/a><br \/>\n<\/center><\/p>\n<p>One thing the timings highlighted was the recent improvements in RDKit, which give faster canonicalisation and reduced scatter (similar-sized structures take about the same amount of time). CDK was originally limited by the number of primes listed (it uses a product of primes for refinement); <a href=\"https:\/\/github.com\/cdk\/cdk\/pull\/141\">patching<\/a> the CDK to use more primes allows it to encode biopolymers of over 1000 AA.<\/p>\n<p><center><br \/>\n<a href=\"https:\/\/nextmovesoftware.com\/blog\/wp-content\/uploads\/2015\/08\/sp-scatter1000.png\"><img decoding=\"async\" src=\"https:\/\/nextmovesoftware.com\/blog\/wp-content\/uploads\/2015\/08\/sp-scatter1000.png\" alt=\"sp-scatter1000\" width=\"512\" class=\"alignnone size-full wp-image-1526\" srcset=\"https:\/\/nextmovesoftware.com\/blog\/wp-content\/uploads\/2015\/08\/sp-scatter1000.png 1242w, https:\/\/nextmovesoftware.com\/blog\/wp-content\/uploads\/2015\/08\/sp-scatter1000-300x186.png 300w, https:\/\/nextmovesoftware.com\/blog\/wp-content\/uploads\/2015\/08\/sp-scatter1000-1024x633.png 1024w\" sizes=\"(max-width: 1242px) 100vw, 1242px\" \/><\/a><br \/>\n<\/center><\/p>\n<p>Roger&#8217;s full talk is available 
here:<\/p>\n<p><center><br \/>\n<iframe loading=\"lazy\" src=\"\/\/www.slideshare.net\/slideshow\/embed_code\/key\/Ke6Txhcy5PMcMr\" width=\"512\" height=\"420\" frameborder=\"0\" marginwidth=\"0\" marginheight=\"0\" scrolling=\"no\" style=\"border:1px solid #CCC; border-width:1px; margin-bottom:5px; max-width: 100%;\" allowfullscreen> <\/iframe> <\/p>\n<div style=\"margin-bottom:5px\"> <strong> <a href=\"\/\/www.slideshare.net\/NextMoveSoftware\/cinf-1-generating-canonical-identifiers-for-glycoproteins-and-other-chemically-modified-biopolymers\" title=\"CINF 1: Generating Canonical Identifiers For (Glycoproteins And Other Chemically Modified) Biopolymers\" target=\"_blank\">CINF 1: Generating Canonical Identifiers For (Glycoproteins And Other Chemically Modified) Biopolymers<\/a> <\/strong> from <strong><a href=\"\/\/www.slideshare.net\/NextMoveSoftware\" target=\"_blank\">NextMove Software<\/a><\/strong> <\/div>\n<p><\/center><\/p>\n","protected":false},"excerpt":{"rendered":"<p>We&#8217;ve previously shown using all-atom structure representations is a tractable approach to handling biologics (see https:\/\/nextmovesoftware.com\/blog\/2014\/11\/). Handling biologics in this way allows you to reuse existing registration infrastructure (e.g. Canonical SMILES\/InChI\/CACTVS keys). 
At the Fall ACS &#8217;15, Roger presented an update to this on-going work showing that many popular open-source cheminformatics toolkits can already handle &hellip; <a href=\"https:\/\/nextmovesoftware.com\/blog\/2015\/08\/24\/biopolymer-canonicalisation-scaling-between-toolkits\/\" class=\"more-link\">Continue reading <span class=\"screen-reader-text\">Biopolymer Canonicalisation Scaling Between Toolkits<\/span><\/a><\/p>\n","protected":false},"author":5,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[],"_links":{"self":[{"href":"https:\/\/nextmovesoftware.com\/blog\/wp-json\/wp\/v2\/posts\/1520"}],"collection":[{"href":"https:\/\/nextmovesoftware.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/nextmovesoftware.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/nextmovesoftware.com\/blog\/wp-json\/wp\/v2\/users\/5"}],"replies":[{"embeddable":true,"href":"https:\/\/nextmovesoftware.com\/blog\/wp-json\/wp\/v2\/comments?post=1520"}],"version-history":[{"count":26,"href":"https:\/\/nextmovesoftware.com\/blog\/wp-json\/wp\/v2\/posts\/1520\/revisions"}],"predecessor-version":[{"id":1550,"href":"https:\/\/nextmovesoftware.com\/blog\/wp-json\/wp\/v2\/posts\/1520\/revisions\/1550"}],"wp:attachment":[{"href":"https:\/\/nextmovesoftware.com\/blog\/wp-json\/wp\/v2\/media?parent=1520"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/nextmovesoftware.com\/blog\/wp-json\/wp\/v2\/categories?post=1520"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/nextmovesoftware.com\/blog\/wp-json\/wp\/v2\/tags?post=1520"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}