{"id":180,"date":"2012-11-27T16:18:21","date_gmt":"2012-11-27T16:18:21","guid":{"rendered":"http:\/\/nextmovesoftware.com\/blog\/?p=180"},"modified":"2015-06-22T17:04:22","modified_gmt":"2015-06-22T16:04:22","slug":"building-and-climbing-a-chemical-ladder","status":"publish","type":"post","link":"https:\/\/nextmovesoftware.com\/blog\/2012\/11\/27\/building-and-climbing-a-chemical-ladder\/","title":{"rendered":"Building and climbing a Chemical Ladder"},"content":{"rendered":"<p>A <a href=\"http:\/\/jakevdp.github.com\/blog\/2012\/10\/14\/scipy-sparse-graph-module-word-ladders\/\">recent post by Jake Vanderplas<\/a> described Word Ladders, and gave me an idea. A <a href=\"http:\/\/en.wikipedia.org\/wiki\/Word_ladder\">Word Ladder<\/a> is a game developed by Lewis Carroll that involves converting one word into another one letter at a time, while passing through only valid words. For example, to convert MAD into HAT, a valid word ladder would be MAD-&gt;MAT-&gt;HAT or MAD-&gt;HAD-&gt;HAT.<\/p>\n<p>Here I propose the term Chemical Ladder for a Word Ladder that is restricted to chemical names. For example, try converting from Barbamate to Carbazide in 4 steps only using valid chemical names. Or from Anginine to Arsonite.<\/p>\n<style type=\"text\/css\">\n<!--\npre { font-family: monospace; color: #000000; background-color: #ffffff; }\n.Special { color: #6a5acd; }\n.Constant { color: #ff00ff; }\n.Identifier { color: #008080; }\n.Statement { color: #804040; font-weight: bold; }\n.PreProc { color: #a020f0; }\n-->\n<\/style>\n<p>So, how did I come up with these examples? Well, NextMove Software&#8217;s <a href=\"http:\/\/www.nextmovesoftware.com\/products\/CaffeineFix.html\">CaffeineFix<\/a> can do chemical spelling correction based on a dictionary (or grammar). The spelling suggestions provided by CaffeineFix are substitutions, deletions and insertions, but if we consider just the 1-letter substitutions, this is exactly the transformation needed to build a word ladder, e.g.<\/p>\n<pre>&gt;&gt;&gt; <span class=\"PreProc\">from<\/span> caffeinefix <span class=\"PreProc\">import<\/span> CaffeineFix\r\n&gt;&gt;&gt; cf = CaffeineFix(<span class=\"Constant\">&quot;mydictionary.cfx&quot;<\/span>)\r\n&gt;&gt;&gt; <span class=\"Identifier\">list<\/span>(cf.suggest(<span class=\"Constant\">\"azite\"<\/span>))\r\n[<span class=\"Constant\">\"azote\"<\/span>, <span class=\"Constant\">\"azine\"<\/span>]<\/pre>\n<p>To begin with I downloaded the list of synonyms from PubChem, and filtered to remove database identifiers and various other cruft. I compiled these into a CaffeineFix dictionary, and edited the <code>suggest<\/code> method to just return substitutions (<code>suggest_substitutions<\/code> in the code below). The code shown below then uses the suggested substitutions for each chemical name to create a graph that I visualised to identify Chemical Ladders (see example below). A longer version of the code could be written to identify the Chemical Ladders more automatically.<br \/>\n<a href=\"http:\/\/nextmovesoftware.com\/blog\/wp-content\/uploads\/2012\/11\/mycins.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-181\" title=\"mycins\" src=\"\/\/nextmovesoftware.com\/blog\/wp-content\/uploads\/2012\/11\/mycins.png\" alt=\"\" width=\"1067\" height=\"856\" srcset=\"https:\/\/nextmovesoftware.com\/blog\/wp-content\/uploads\/2012\/11\/mycins.png 1067w, https:\/\/nextmovesoftware.com\/blog\/wp-content\/uploads\/2012\/11\/mycins-300x240.png 300w, https:\/\/nextmovesoftware.com\/blog\/wp-content\/uploads\/2012\/11\/mycins-1024x821.png 1024w\" sizes=\"(max-width: 1067px) 100vw, 1067px\" \/><\/a><br \/>\nJust in case any highly-respected and discerning chemistry society wants to include Chemical Ladders in its weekly or monthly magazine, I&#8217;ve decided not to include the full output of the program in this blogpost, apart from the image above. What do you think? Could this be the next <a href=\"http:\/\/www.amazon.co.uk\/Chemistry-Su-Doku-v-1\/dp\/0854046933\">ChemDoku<\/a>?<\/p>\n<pre>\r\n<span class=\"PreProc\">from<\/span> collections <span class=\"PreProc\">import<\/span> defaultdict\r\n\r\n<span class=\"PreProc\">from<\/span> caffeinefix <span class=\"PreProc\">import<\/span> CaffeineFix\r\n\r\n<span class=\"Statement\">def<\/span> <span class=\"Identifier\">difference<\/span>(a, b):\r\n    <span class=\"Statement\">for<\/span> i, (d, e) <span class=\"Statement\">in<\/span> <span class=\"Identifier\">enumerate<\/span>(<span class=\"Identifier\">zip<\/span>(a,b)):\r\n        <span class=\"Statement\">if<\/span> d != e:\r\n            <span class=\"Statement\">return<\/span> i\r\n\r\n<span class=\"Statement\">def<\/span> <span class=\"Identifier\">nearest<\/span>(name):\r\n    suggestions = <span class=\"Identifier\">list<\/span>(cf.suggest_substitutions(name))\r\n    suggestions.remove(name)\r\n    <span class=\"Statement\">return<\/span> (name, suggestions)\r\n\r\nreplacements = [(<span class=\"Constant\">&quot;0&quot;<\/span>, <span class=\"Constant\">&quot;zero&quot;<\/span>), (<span class=\"Constant\">&quot;%&quot;<\/span>, <span class=\"Constant\">&quot;PERCENT&quot;<\/span>), (<span class=\"Constant\">&quot;+&quot;<\/span>, <span class=\"Constant\">&quot;PLUS&quot;<\/span>), (<span class=\"Constant\">&quot;4&quot;<\/span>, <span class=\"Constant\">&quot;FOUR&quot;<\/span>),\r\n                (<span class=\"Constant\">&quot;7&quot;<\/span>, <span class=\"Constant\">&quot;SEVEN&quot;<\/span>), (<span class=\"Constant\">&quot;9&quot;<\/span>, <span class=\"Constant\">&quot;NINE&quot;<\/span>), (<span class=\"Constant\">&quot;6&quot;<\/span>, <span class=\"Constant\">&quot;SIX&quot;<\/span>), (<span class=\"Constant\">&quot;8&quot;<\/span>, <span class=\"Constant\">&quot;EIGHT&quot;<\/span>),\r\n                (<span class=\"Constant\">&quot;)&quot;<\/span>, <span class=\"Constant\">&quot;RBRACKET&quot;<\/span>), (<span class=\"Constant\">&quot;'&quot;<\/span>, <span class=\"Constant\">&quot;APOSTROPHE&quot;<\/span>), (<span class=\"Constant\">&quot;@&quot;<\/span>, <span class=\"Constant\">&quot;AT&quot;<\/span>),\r\n                (<span class=\"Constant\">&quot;}&quot;<\/span>, <span class=\"Constant\">&quot;RBRACKETB&quot;<\/span>), (<span class=\"Constant\">&quot;{&quot;<\/span>, <span class=\"Constant\">&quot;LBRACKETB&quot;<\/span>), (<span class=\"Constant\">&quot;:&quot;<\/span>, <span class=\"Constant\">&quot;COLON&quot;<\/span>),\r\n                (<span class=\"Constant\">&quot;\/&quot;<\/span>, <span class=\"Constant\">&quot;FSLASH&quot;<\/span>), (<span class=\"Constant\">&quot;.&quot;<\/span>, <span class=\"Constant\">&quot;PERIOD&quot;<\/span>), (<span class=\"Constant\">&quot;&amp;&quot;<\/span>, <span class=\"Constant\">&quot;AMPERSAND&quot;<\/span>),\r\n                (<span class=\"Constant\">&quot;^&quot;<\/span>, <span class=\"Constant\">&quot;CIRCUMFLEX&quot;<\/span>), (<span class=\"Constant\">&quot;[&quot;<\/span>, <span class=\"Constant\">&quot;LBRACKETC&quot;<\/span>), (<span class=\"Constant\">&quot;]&quot;<\/span>, <span class=\"Constant\">&quot;RBRACKETC&quot;<\/span>),\r\n                (<span class=\"Constant\">&quot;|&quot;<\/span>, <span class=\"Constant\">&quot;PIPE&quot;<\/span>), (<span class=\"Constant\">&quot;;&quot;<\/span>, <span class=\"Constant\">&quot;SEMICOLON&quot;<\/span>)]\r\n\r\n<span class=\"Statement\">def<\/span> <span class=\"Identifier\">fix<\/span>(name):\r\n    name = name.replace(<span class=\"Constant\">&quot;-&quot;<\/span>, <span class=\"Constant\">&quot;_&quot;<\/span>).replace(<span class=\"Constant\">&quot;1&quot;<\/span>, <span class=\"Constant\">&quot;one&quot;<\/span>).replace(<span class=\"Constant\">&quot;2&quot;<\/span>, <span class=\"Constant\">&quot;two&quot;<\/span>).replace(<span class=\"Constant\">&quot;3&quot;<\/span>, <span class=\"Constant\">&quot;three&quot;<\/span>).replace(<span class=\"Constant\">&quot;5&quot;<\/span>, <span class=\"Constant\">&quot;five&quot;<\/span>).replace(<span class=\"Constant\">&quot;,&quot;<\/span>, <span class=\"Constant\">&quot;_&quot;<\/span>).replace(<span class=\"Constant\">&quot;?&quot;<\/span>, <span class=\"Constant\">&quot;Q&quot;<\/span>).replace(<span class=\"Constant\">&quot;(&quot;<\/span>, <span class=\"Constant\">&quot;LB&quot;<\/span>).replace(<span class=\"Constant\">&quot; &quot;<\/span>, <span class=\"Constant\">&quot;SPACE&quot;<\/span>)\r\n    <span class=\"Statement\">for<\/span> x, y <span class=\"Statement\">in<\/span> replacements:\r\n        name = name.replace(x, y)\r\n    <span class=\"Statement\">return<\/span> name\r\n\r\n<span class=\"Statement\">if<\/span> __name__ == <span class=\"Constant\">&quot;__main__&quot;<\/span>:\r\n    cf = CaffeineFix(<span class=\"Constant\">r&quot;C:\\Tools\\LargeData\\PubChem_Synonyms\\pubchem.cfx&quot;<\/span>)\r\n    names = [x.strip() <span class=\"Statement\">for<\/span> x <span class=\"Statement\">in<\/span> <span class=\"Identifier\">open<\/span>(<span class=\"Constant\">r&quot;C:\\Tools\\LargeData\\PubChem_Synonyms\\lowercase_sorted_uniq.txt&quot;<\/span>, <span class=\"Constant\">&quot;r&quot;<\/span>) <span class=\"Statement\">if<\/span> x.rstrip()\r\n             <span class=\"Statement\">and<\/span> <span class=\"Identifier\">len<\/span>(x) == <span class=\"Constant\">10<\/span>]\r\n    results = <span class=\"Identifier\">map<\/span>(nearest, names)\r\n\r\n    output = <span class=\"Identifier\">open<\/span>(<span class=\"Constant\">&quot;wordladder_10.txt&quot;<\/span>, <span class=\"Constant\">&quot;w&quot;<\/span>)\r\n    output.write(<span class=\"Constant\">&quot;graph graphname {<\/span><span class=\"Special\">\\n<\/span><span class=\"Constant\">&quot;<\/span>)\r\n\r\n    <span class=\"Statement\">for<\/span> x, y <span class=\"Statement\">in<\/span> results:\r\n        <span class=\"Statement\">if<\/span> <span class=\"Identifier\">len<\/span>(y) &gt; <span class=\"Constant\">1<\/span>:\r\n            collate = defaultdict(<span class=\"Identifier\">list<\/span>)\r\n            <span class=\"Statement\">for<\/span> w <span class=\"Statement\">in<\/span> y:\r\n                collate[difference(x, w)].append(w)\r\n            <span class=\"Statement\">if<\/span> <span class=\"Identifier\">len<\/span>(collate) &gt; <span class=\"Constant\">1<\/span>:\r\n                <span class=\"Statement\">for<\/span> v <span class=\"Statement\">in<\/span> collate.values():\r\n                    output.write(<span class=\"Constant\">'%s [label=&quot;%s&quot;];<\/span><span class=\"Special\">\\n<\/span><span class=\"Constant\">'<\/span> % (fix(x), x))\r\n                    <span class=\"Statement\">for<\/span> z <span class=\"Statement\">in<\/span> v:\r\n                        output.write(<span class=\"Constant\">&quot;%s -- %s<\/span><span class=\"Special\">\\n<\/span><span class=\"Constant\">&quot;<\/span> % (fix(x), fix(z)))\r\n                        output.write(<span class=\"Constant\">'%s [label=&quot;%s&quot;];<\/span><span class=\"Special\">\\n<\/span><span class=\"Constant\">'<\/span> % (fix(z), z))\r\n\r\n    output.write(<span class=\"Constant\">&quot;}<\/span><span class=\"Special\">\\n<\/span><span class=\"Constant\">&quot;<\/span>)\r\n    output.close()\r\n<\/pre>\n<p><b>Note:<\/b> A couple of people have asked why are there two edges for only some of the connections in the graph. This would be the case if I retained all of the original edges, as if A is a spelling correction of B, then B is a spelling correction of A. However, since a word ladder can only exist if a node in the graph has at least two connections, I filter out all those cases where a node has only a single connection (otherwise you end up with a lot of &#8216;word ladders&#8217; composed of just two words). So, if I have A->B, B->(A, C), C->(B, D), D->C, then A->B and D->C will be removed, and the graph will be A-B=C-D.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>A recent post by Jake Vanderplas described Word Ladders, and gave me an idea. A Word Ladder is a game developed by Lewis Carroll that involves converting one word into another one letter at a time, while passing through only valid words. For example, to convert MAD into HAT, a valid word ladder would be &hellip; <a href=\"https:\/\/nextmovesoftware.com\/blog\/2012\/11\/27\/building-and-climbing-a-chemical-ladder\/\" class=\"more-link\">Continue reading <span class=\"screen-reader-text\">Building and climbing a Chemical Ladder<\/span><\/a><\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[],"_links":{"self":[{"href":"https:\/\/nextmovesoftware.com\/blog\/wp-json\/wp\/v2\/posts\/180"}],"collection":[{"href":"https:\/\/nextmovesoftware.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/nextmovesoftware.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/nextmovesoftware.com\/blog\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/nextmovesoftware.com\/blog\/wp-json\/wp\/v2\/comments?post=180"}],"version-history":[{"count":22,"href":"https:\/\/nextmovesoftware.com\/blog\/wp-json\/wp\/v2\/posts\/180\/revisions"}],"predecessor-version":[{"id":1469,"href":"https:\/\/nextmovesoftware.com\/blog\/wp-json\/wp\/v2\/posts\/180\/revisions\/1469"}],"wp:attachment":[{"href":"https:\/\/nextmovesoftware.com\/blog\/wp-json\/wp\/v2\/media?parent=180"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/nextmovesoftware.com\/blog\/wp-json\/wp\/v2\/categories?post=180"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/nextmovesoftware.com\/blog\/wp-json\/wp\/v2\/tags?post=180"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}