{"id":3130,"date":"2025-09-22T14:42:16","date_gmt":"2025-09-22T13:42:16","guid":{"rendered":"https:\/\/nextmovesoftware.com\/blog\/?p=3130"},"modified":"2025-09-22T14:56:10","modified_gmt":"2025-09-22T13:56:10","slug":"cxsmiles-part-3-repeat-groups","status":"publish","type":"post","link":"https:\/\/nextmovesoftware.com\/blog\/2025\/09\/22\/cxsmiles-part-3-repeat-groups\/","title":{"rendered":"CXSMILES Part 3: Repeat Groups"},"content":{"rendered":"\n<p>This post continues a series (see <a href=\"https:\/\/nextmovesoftware.com\/blog\/2022\/04\/08\/cxsmiles-gotchas-part-1-bond-indexes\/\">Part 1<\/a> and <a href=\"https:\/\/nextmovesoftware.com\/blog\/2022\/04\/20\/cxsmiles-part-2-component-grouping\/\">Part 2<\/a>) examining some features and insights when using ChemAxon Extended SMILES\/SMARTS (CXSMILES) to represent repeat groups.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>Repeat Groups<\/strong><\/h2>\n\n\n\n<p>Inherited from CTfiles (Molfile), there are two ways to specify repeat groups in CXSMILES: you can use either a <strong>Structural Repeat Unit (SRU) Sgroup<\/strong> or a <strong>Link Node<\/strong>. Historically, link nodes have been used for queries and SRUs for polymers, but the boundary is a little blurry. From a user\u2019s perspective you probably want to handle them interchangeably &#8211; particularly if this is a query structure input. Recent versions of ChemAxon\u2019s <a href=\"https:\/\/marvinjs-demo.chemaxon.com\/latest\/demo.html\"><strong>MarvinSketch<\/strong><\/a> will encode a repeat as a link node if possible. In third-party tools a similar interconversion is useful, with a preferred (canonical) representation generated on output.<\/p>\n\n\n\n<p>Link nodes are more limited than SRU groups but allow for a terser encoding. Firstly, any crossing bonds (bonds that cross the brackets) must connect to the same repeated atom. You must also define a lower and upper bound for the number of times to repeat (e.g. 1 to 3). 
In CTfiles the lower bound must be 1 &#8211; in practice any lower bound (including 0) is reasonable. A simple example is shown below: atom idx <strong>1<\/strong> repeats <strong>1<\/strong> to <strong>3<\/strong> times.<\/p>\n\n\n\n<div class=\"wp-block-columns are-vertically-aligned-center is-layout-flex wp-container-core-columns-layout-1 wp-block-columns-is-layout-flex\">\n<div class=\"wp-block-column is-vertically-aligned-center is-layout-flow wp-block-column-is-layout-flow\" style=\"flex-basis:100%\"><div class=\"wp-block-image\">\n<figure class=\"aligncenter size-full\"><a href=\"https:\/\/nextmovesoftware.com\/blog\/wp-content\/uploads\/2025\/09\/image.png\"><img loading=\"lazy\" decoding=\"async\" width=\"68\" height=\"45\" src=\"https:\/\/nextmovesoftware.com\/blog\/wp-content\/uploads\/2025\/09\/image.png\" alt=\"\" class=\"wp-image-3131\"\/><\/a><\/figure><\/div><\/div>\n<\/div>\n\n\n\n<pre class=\"wp-block-code\"><code>NCC=O |LN:1:1.3| (Ia)<\/code><\/pre>\n\n\n\n<p>As an SRU Sgroup we can write it like this:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>NCC=O |Sg:n:1:1-3:| (Ib)<\/code><\/pre>\n\n\n\n<p>I have specified the subscript as \u201c1-3\u201d, which isn\u2019t semantically encoded but is sufficient. 
SRU Sgroups can have whatever you like as the subscript; you will frequently see <strong>n<\/strong> or <strong>m<\/strong> for any number of repeats.<\/p>\n\n\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-full\"><a href=\"https:\/\/nextmovesoftware.com\/blog\/wp-content\/uploads\/2025\/09\/image-1.png\"><img loading=\"lazy\" decoding=\"async\" width=\"192\" height=\"83\" src=\"https:\/\/nextmovesoftware.com\/blog\/wp-content\/uploads\/2025\/09\/image-1.png\" alt=\"\" class=\"wp-image-3132\"\/><\/a><\/figure><\/div>\n\n\n<pre class=\"wp-block-code\"><code>c1ccccc1C(O)CC |Sg:n:6,7:n:ht| (II)\nc1ccccc1C(O)CC |Sg:n:6,7:&gt;1:ht| (III)\n<\/code><\/pre>\n\n\n\n<p>The connectivity superscript of the bracket can be head-to-tail (<strong>ht<\/strong>), head-to-head (<strong>hh<\/strong>), or either-unspecified (<strong>eu<\/strong>). If the repeated part is symmetric (like in II) then it is redundant and can be omitted.<\/p>\n\n\n\n<p>Link nodes allow you to specify the outer atoms of crossing bonds, whereas Sgroups only support this for ladder-type polymers (as per the documentation). You need the outer atoms for link nodes if there are more than two bonds &#8211; this means multiple atoms repeat. With SRU Sgroups you just specify all the atoms that repeat. 
I\u2019ve strained the brackets to demonstrate below:<\/p>\n\n\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-full\"><a href=\"https:\/\/nextmovesoftware.com\/blog\/wp-content\/uploads\/2025\/09\/image-2.png\"><img loading=\"lazy\" decoding=\"async\" width=\"192\" height=\"70\" src=\"https:\/\/nextmovesoftware.com\/blog\/wp-content\/uploads\/2025\/09\/image-2.png\" alt=\"\" class=\"wp-image-3133\"\/><\/a><\/figure><\/div>\n\n\n<pre class=\"wp-block-code\"><code>c1ccccc1C(O)CC |LN:6:1.2.6.8| (IVa)\nc1ccccc1C(O)CC |Sg:n:6,7:1-2:ht| (IVb)\n<\/code><\/pre>\n\n\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-full\"><a href=\"https:\/\/nextmovesoftware.com\/blog\/wp-content\/uploads\/2025\/09\/image-3.png\"><img loading=\"lazy\" decoding=\"async\" width=\"184\" height=\"73\" src=\"https:\/\/nextmovesoftware.com\/blog\/wp-content\/uploads\/2025\/09\/image-3.png\" alt=\"\" class=\"wp-image-3134\"\/><\/a><\/figure><\/div>\n\n\n<pre class=\"wp-block-code\"><code>c1ccccc1C(O)CC |LN:6:1.2.6.7| (Va)\nc1ccccc1C(O)CC |Sg:n:6,8,9:1-2:ht| (Vb)<\/code><\/pre>\n\n\n\n<p>If the link node is in a ring, then the crossing bonds are implicitly the ones in the ring. 
However, for portability the outer atoms should be specified whenever there are more than two bonds connected.<\/p>\n\n\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-full is-resized\"><a href=\"https:\/\/nextmovesoftware.com\/blog\/wp-content\/uploads\/2025\/09\/image-4.png\"><img loading=\"lazy\" decoding=\"async\" width=\"34\" height=\"76\" src=\"https:\/\/nextmovesoftware.com\/blog\/wp-content\/uploads\/2025\/09\/image-4.png\" alt=\"\" class=\"wp-image-3135\" style=\"width:34px;height:auto\"\/><\/a><\/figure><\/div>\n\n\n<pre class=\"wp-block-code\"><code>C1(O)CCCCC1 |LN:0:1.3| (VIa) acceptable\nC1(O)CCCCC1 |LN:0:1.3.1.6| (VIb) preferred\nC1(O)CCCCC1 |Sg:n:0,1:1-3:ht| (VIc)<\/code><\/pre>\n\n\n\n<p>Something that may not be obvious when you first encounter repeat groups is that although in polymer chemistry they are typically linear, for structure queries we can also have what I will call <em>radial<\/em> repeats. With a radial repeat the repeat unit \u201crotates\u201d around a single external atom &#8211; such repeats are implicitly bounded by the number of bonds that atom can have. They can have an odd or even number of crossing bonds, and the grouping of the crossing bonds determines what type of repeat you have. 
The table below shows this; I have put some hypothetical examples in light grey to show how the pattern repeats.<\/p>\n\n\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-full\"><a href=\"https:\/\/nextmovesoftware.com\/blog\/wp-content\/uploads\/2025\/09\/image-5.png\"><img loading=\"lazy\" decoding=\"async\" width=\"263\" height=\"350\" src=\"https:\/\/nextmovesoftware.com\/blog\/wp-content\/uploads\/2025\/09\/image-5.png\" alt=\"\" class=\"wp-image-3136\" srcset=\"https:\/\/nextmovesoftware.com\/blog\/wp-content\/uploads\/2025\/09\/image-5.png 263w, https:\/\/nextmovesoftware.com\/blog\/wp-content\/uploads\/2025\/09\/image-5-225x300.png 225w\" sizes=\"(max-width: 263px) 100vw, 263px\" \/><\/a><\/figure><\/div>\n\n\n<p>As with linear repeats, radial repeats can be represented as link nodes if the range is known and the repeat unit is a single atom:<\/p>\n\n\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-full\"><a href=\"https:\/\/nextmovesoftware.com\/blog\/wp-content\/uploads\/2025\/09\/image-6.png\"><img loading=\"lazy\" decoding=\"async\" width=\"337\" height=\"72\" src=\"https:\/\/nextmovesoftware.com\/blog\/wp-content\/uploads\/2025\/09\/image-6.png\" alt=\"\" class=\"wp-image-3137\" srcset=\"https:\/\/nextmovesoftware.com\/blog\/wp-content\/uploads\/2025\/09\/image-6.png 337w, https:\/\/nextmovesoftware.com\/blog\/wp-content\/uploads\/2025\/09\/image-6-300x64.png 300w\" sizes=\"(max-width: 337px) 100vw, 337px\" \/><\/a><\/figure><\/div>\n\n\n<pre class=\"wp-block-code\"><code>CS(=O)CC |Sg:n:2:1-2:| (VIIa)\nCS(=O)CC |LN:2:1.2| (VIIb)\n\nN1CCCC(Cl)C1 |Sg:n:5:1-2:| (VIIIa)\nN1CCCC(Cl)C1 |LN:5:1.2| (VIIIb)\n\n*Cl.n1ccccc1 |Sg:n:1:1-2:,m:0:2.3.4.5| (IXa)\n*Cl.n1ccccc1 |m:0:2.3.4.5,LN:1:1.2| (IXb)\n\n**.n1ccccc1 |$;_R1$,m:0:2.3.4.5,Sg:n:1:1-2:| (Xa)\n**.n1ccccc1 |$;_R1$,m:0:2.3.4.5,LN:1:1.2| (Xb)<\/code><\/pre>\n\n\n\n<p>For non ladder-type polymers CXSMILES Sgroups only capture the atoms that repeat and not the grouping of the crossing 
bonds. Therefore we have an ambiguity with radial spiro-repeats. A reasonable rule of thumb is that if the outer atom of each crossing bond is the same then it is likely a spiro-repeat rather than a linear one.<\/p>\n\n\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-full\"><a href=\"https:\/\/nextmovesoftware.com\/blog\/wp-content\/uploads\/2025\/09\/image-7.png\"><img loading=\"lazy\" decoding=\"async\" width=\"222\" height=\"151\" src=\"https:\/\/nextmovesoftware.com\/blog\/wp-content\/uploads\/2025\/09\/image-7.png\" alt=\"\" class=\"wp-image-3138\"\/><\/a><\/figure><\/div>\n\n\n<pre class=\"wp-block-code\"><code>C=1C2=C3&#91;N](&#91;Ir]&#91;N]4=CC=CC(C=C2)=C34)=CC1 |Sg:n:0,1,2,3,5,6,7,8,9,10,11,12,13,14:n:ht| (XI)<\/code><\/pre>\n\n\n\n<p>CTfiles actually have the same issue if you don\u2019t store the coordinates. Fortunately these repeat types are relatively rare, but here are some in the wild:<\/p>\n\n\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-full\"><a href=\"https:\/\/nextmovesoftware.com\/blog\/wp-content\/uploads\/2025\/09\/image-8.png\"><img loading=\"lazy\" decoding=\"async\" width=\"310\" height=\"209\" src=\"https:\/\/nextmovesoftware.com\/blog\/wp-content\/uploads\/2025\/09\/image-8.png\" alt=\"\" class=\"wp-image-3139\" srcset=\"https:\/\/nextmovesoftware.com\/blog\/wp-content\/uploads\/2025\/09\/image-8.png 310w, https:\/\/nextmovesoftware.com\/blog\/wp-content\/uploads\/2025\/09\/image-8-300x202.png 300w\" sizes=\"(max-width: 310px) 100vw, 310px\" \/><\/a><figcaption class=\"wp-element-caption\"><strong>US 2016\/0049599 A1<\/strong><br>(US20160049599A1-20160218-C00566)<\/figcaption><\/figure><\/div>\n\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-full\"><a href=\"https:\/\/nextmovesoftware.com\/blog\/wp-content\/uploads\/2025\/09\/image-9.png\"><img loading=\"lazy\" decoding=\"async\" width=\"151\" height=\"120\" 
src=\"https:\/\/nextmovesoftware.com\/blog\/wp-content\/uploads\/2025\/09\/image-9.png\" alt=\"\" class=\"wp-image-3140\"\/><\/a><figcaption class=\"wp-element-caption\"><strong>US 2015\/0380666 A1<\/strong><br>(US20150380666A1-20151231-C00746)<\/figcaption><\/figure><\/div>\n\n\n<p>We extract these from patents as CXSMILES with our text-mining tool <a href=\"https:\/\/www.nextmovesoftware.com\/leadmine.html\">LeadMine<\/a> (<a href=\"https:\/\/www.nextmovesoftware.com\/posters\/Mayfield_SketchySketchesII_Sheffield_202306.pdf\">Poster<\/a>).<\/p>\n\n\n\n<p>Explicit hydrogens may be required to define the Sgroup SRU properly. If an explicit hydrogen is used, care needs to be taken when suppressing explicit hydrogens. If you suppress\/remove the hydrogen in (XII) the meaning changes from a PEG linear repeat to a radial repeat.<\/p>\n\n\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-full\"><a href=\"https:\/\/nextmovesoftware.com\/blog\/wp-content\/uploads\/2025\/09\/image-10.png\"><img loading=\"lazy\" decoding=\"async\" width=\"104\" height=\"49\" src=\"https:\/\/nextmovesoftware.com\/blog\/wp-content\/uploads\/2025\/09\/image-10.png\" alt=\"\" class=\"wp-image-3141\"\/><\/a><\/figure><\/div>\n\n\n<pre class=\"wp-block-code\"><code>c1ccccc1CCO&#91;H] |Sg:n:6,7,8:m:ht| (XII)<\/code><\/pre>\n\n\n\n<p>Atom indices can be written in any order; ideally, when writing CXSMILES you should sort the list of indices.<\/p>\n\n\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-full\"><a href=\"https:\/\/nextmovesoftware.com\/blog\/wp-content\/uploads\/2025\/09\/image-11.png\"><img loading=\"lazy\" decoding=\"async\" width=\"143\" height=\"49\" src=\"https:\/\/nextmovesoftware.com\/blog\/wp-content\/uploads\/2025\/09\/image-11.png\" alt=\"\" class=\"wp-image-3142\"\/><\/a><\/figure><\/div>\n\n\n<pre class=\"wp-block-code\"><code>c1ccccc1CCOCCO |Sg:n:6,7,8:n:ht| (XIIa)\nc1ccccc1CCOCCO |Sg:n:6,8,7:n:ht| (XIIb)\nc1ccccc1CCOCCO |Sg:n:8,7,6:n:ht| 
(XIIc)<\/code><\/pre>\n\n\n\n<p>As touched upon previously, if the repeated unit is symmetric then the connectivity superscript is irrelevant; for registration these should all be considered the same:<\/p>\n\n\n\n<p class=\"has-text-align-center\"><img decoding=\"async\" src=\"data:image\/png;base64,iVBORw0KGgoAAAANSUhEUgAAAJAAAABFCAYAAACltW8kAAAKrmlDQ1BJQ0MgUHJvZmlsZQAASImVlwdUU9kWhs+96Y0WQEBKqKEIUgQCSAk99N5EJSQBAiHGFJpdGRzBsaAigoqgoyAKjgWQsWLBwqDYsE+QQUAZBws2VN4FFsGZt9576+21Tva3dvbdZ5+Tc+76AwBFiy0SCWAVALKFUnFUgDctITGJhhsEeKADiMABuLE5EhEzIiIEIDbl\/27v7wFo3N+2Hq\/179\/\/V1Pl8iQcAKAIhFO5Ek42wseR8Y4jEksBQNUiceNcqWicOxBWFyMNIiwf5\/RJfjfOqROMxk\/kxET5IKwLAJ7MZovTASCbI3FaDicdqUMORNhWyOULEc5D2CM7exEX4RaEzZEcEcLj9Rmp39VJ\/1vNVEVNNjtdwZNrmTC8L18iErDz\/8\/t+N+WLZBNzUFHBjlDHBiFeOQXhP7IWhSsYGFqWPgU87kT+ROcIQuMnWKOxCdpiiWCaNYUc9m+wYo6grCQKU7j+yty+FJWzBTzJH7RUyxeFKWYN03sw5xitni6B1lWrCKewWMp6hdkxMRPcQ4\/LkzRW1Z08HSOjyIulkUp1sITBnhPz+uv2IdsyXdr57MUz0ozYgIV+8Ce7p8nZE7XlCQoeuPyfP2mc2IV+SKpt2IukSBCkc8TBCjikpxoxbNS5HBOPxuh2MNMdlDEFAM\/4I3cPXtgBwJBJIiV8vKk44vwWSTKF\/PTM6Q0JnLTeDSWkGMzi2Zva+8IwPi9nTwWb+9P3EdIEz8dEyB3Z+7vyPlNmY5xGgE4\/gIAypnpmNk8AJScATivwpGJcyZj6PEPDPI2UAbqQBvoA2NgDqyR7pyAG\/BCeg0C4SAGJIIFgAMyQDYQg1ywFKwCRaAEbALbQAWoAntBLTgMjoJmcAqcB5fBdXAT3AWPgBz0gZdgGLwHoxAE4SAKRIW0IQPIFLKC7CEG5AH5QSFQFJQIpUDpkBCSQUuhNVAJVApVQNVQHfQLdBI6D12FuqAHUA80CL2BPsMomAyrw3qwGTwbZsBMOBiOgefD6fBiuAAuhDfA5XANfAhugs\/D1+G7sBx+CY+gAIqE0kQZoqxRDJQPKhyVhEpDiVHLUcWoMlQNqgHVimpH3UbJUUOoT2gsmoqmoa3RbuhAdCyag16MXo5ej65A16Kb0BfRt9E96GH0NwwFo4uxwrhiWJgETDomF1OEKcPsx5zAXMLcxfRh3mOxWE0sHeuMDcQmYjOxS7Drsbuwjdhz2C5sL3YEh8Np46xw7rhwHBsnxRXhduAO4c7ibuH6cB\/xJLwB3h7vj0\/CC\/Gr8WX4g\/gz+Fv4fvwoQYVgSnAlhBO4hHzCRsI+QivhBqGPMEpUJdKJ7sQYYiZxFbGc2EC8RHxMfEsikYxILqRIEp+0klROOkK6QuohfSKrkS3JPuRksoy8gXyAfI78gPyWQqGYUbwoSRQpZQOljnKB8pTyUYmqZKPEUuIqrVCqVGpSuqX0SpmgbKrMVF6gXKBcpnxM+YbykApBxUzFR4WtslylUuWkSrfKiCpV1U41XDVbdb3qQdWrqgNqODUzNT81rlqh2l61C2q9VBTVmOpD5VDXUPdRL1H71LHqdHWWeqZ6ifph9U71YQ01jTkacRp5GpUapzXkmihNM02WpkBzo+ZRzXuan2fozWDO4M1
YN6Nhxq0ZH7Rmanlp8bSKtRq17mp91qZp+2lnaW\/WbtZ+ooPWsdSJ1MnV2a1zSWdopvpMt5mcmcUzj858qAvrWupG6S7R3avboTuip68XoCfS26F3QW9IX1PfSz9Tf6v+Gf1BA6qBhwHfYKvBWYMXNA0akyagldMu0oYNdQ0DDWWG1YadhqNGdKNYo9VGjUZPjInGDOM0463GbcbDJgYmoSZLTepNHpoSTBmmGabbTdtNP5jRzeLN1po1mw3QtegsegG9nv7YnGLuab7YvMb8jgXWgmGRZbHL4qYlbOlomWFZaXnDCrZysuJb7bLqmoWZ5TJLOKtmVrc12ZppnWNdb91jo2kTYrPaptnm1WyT2UmzN89un\/3N1tFWYLvP9pGdml2Q3Wq7Vrs39pb2HPtK+zsOFAd\/hxUOLQ6v51jN4c3ZPee+I9Ux1HGtY5vjVydnJ7FTg9Ogs4lzivNO526GOiOCsZ5xxQXj4u2ywuWUyydXJ1ep61HXv9ys3bLcDroNzKXP5c3dN7fX3cid7V7tLvegeaR47PGQexp6sj1rPJ95GXtxvfZ79TMtmJnMQ8xX3rbeYu8T3h98XH2W+ZzzRfkG+Bb7dvqp+cX6Vfg99TfyT\/ev9x8OcAxYEnAuEBMYHLg5sJulx+Kw6ljDQc5By4IuBpODo4Mrgp+FWIaIQ1pD4dCg0C2hj8NMw4RhzeEgnBW+JfxJBD1iccSvkdjIiMjKyOdRdlFLo9qjqdELow9Gv4\/xjtkY8yjWPFYW2xanHJccVxf3Id43vjRenjA7YVnC9USdRH5iSxIuKS5pf9LIPL952+b1JTsmFyXfm0+fnzf\/6gKdBYIFpxcqL2QvPJaCSYlPOZjyhR3OrmGPpLJSd6YOc3w42zkvuV7crdxBnjuvlNef5p5WmjaQ7p6+JX0wwzOjLGOI78Ov4L\/ODMysyvyQFZ51IGtMEC9ozMZnp2SfFKoJs4QXF+kvylvUJbISFYnki10Xb1s8LA4W75dAkvmSFqk6IpA6ZOayH2Q9OR45lTkfc+Nyj+Wp5gnzOvIt89fl9xf4F\/y8BL2Es6RtqeHSVUt7ljGXVS+Hlqcub1thvKJwRd\/KgJW1q4irslb9ttp2denqd2vi17QW6hWuLOz9IeCH+iKlInFR91q3tVU\/on\/k\/9i5zmHdjnXfirnF10psS8pKvqznrL\/2k91P5T+NbUjb0LnRaePuTdhNwk33Nnturi1VLS0o7d0SuqVpK21r8dZ32xZuu1o2p6xqO3G7bLu8PKS8ZYfJjk07vlRkVNyt9K5s3Km7c93OD7u4u27t9trdUKVXVVL1eQ9\/z\/3qgOqmGrOasr3YvTl7n++L29f+M+Pnuv06+0v2fz0gPCCvjaq9WOdcV3dQ9+DGerheVj94KPnQzcO+h1sarBuqGzUbS46AI7IjL35J+eXe0eCjbccYxxqOmx7feYJ6orgJaspvGm7OaJa3JLZ0nQw62dbq1nriV5tfD5wyPFV5WuP0xjPEM4Vnxs4WnB05Jzo3dD79fG\/bwrZHFxIu3LkYebHzUvClK5f9L19oZ7afveJ+5dRV16snrzGuNV93ut7U4dhx4jfH3050OnU23XC+0XLT5WZr19yuM7c8b52\/7Xv78h3Wnet3w+523Yu9d787uVt+n3t\/4IHgweuHOQ9HH618jHlc\/ETlSdlT3ac1v1v83ih3kp\/u8e3peBb97FEvp\/flH5I\/vvQVPqc8L+s36K8bsB84Neg\/ePPFvBd9L0UvR4eK\/lT9c+cr81fH\/\/L6q2M4Ybjvtfj12Jv1b7XfHng3513bSMTI0\/fZ70c\/FH\/U\/lj7ifGp\/XP85\/7R3C+4L+VfLb62fgv+9ngse2xMxBazJ6QAChlwWhoAbw4guiERAOpNAIjzJnX1hEGT\/wUmCPwnntTeE+YEwK6VACQiY1wW7RnXIAhTzgEQ4QVAjBeAHRwUY0oDT+j1CW2CqE
oMDvGl3W8urAT\/sEkt\/13f\/\/RAUfVv\/l+7IAgu5LeUnwAAADhlWElmTU0AKgAAAAgAAYdpAAQAAAABAAAAGgAAAAAAAqACAAQAAAABAAAAkKADAAQAAAABAAAARQAAAABd4tibAAAKHklEQVR4Ae2dDdBVRRmAxR+gPzF\/oPE3UAqjNEoErZQKZ9RwMouiyDQp1GZCGJ2wgCFsMLQ0xwJNBpMaCtRGygw1zc9KhSTFRsFmiEBzmlKMqawZLOt5LndjOXPvx73fPed8373sO\/Pcs7tnz7t73n3Pu3suez\/22itJskALFujXwrW9celQGn0INlcbv47jrdV0XzysolODQDtvhHMhSS9aYAptL4NhVfbvxb400vRR1X4O5\/iXRi5otzr7tluH6e8W2NQm\/bavymvhb5VUh33s3WH3k26nZAt0kgO5PnKqcL1RthxHgzfD\/XAT2I89QtpxCssOzAgKlsMfYTu8Gc6DtVCGjKKRH8NUWAPj4OfV4+85JulDFnARfWXUHyPoBpgUlY0j\/Ry8KiorMvlDlM\/INPBl8kaiIK6BfAvrOGn3KWw0IzIQjEBBuki4eD0tFBR8tA8PZtowf2KmrOjs+2jArw2Mhu8ourGgP08HOhul\/YPiko6H084farTl1HFEjfIiilxzvZJRbD5P22bU75LVBt+B+TAPloHrsW\/DkVCo5HGTx9DD22E8PALnQFnyIg0NrtGYZVtrlBdRtA6lJ2UUn0z+sUxZ3tn9UDgdfgZPwCmwGlbAWHga7oG50Ce\/L3s1HZsDrkFcmyiG7bvhXsgalaKWJbsGcp3zPJwQaR5KehscAhq5iLWQa5qw7tJZngHv3Wh0KrigHwnKNPBcnmugcejTWYw8RqB6cjAnrob1cCFojz4hZ9ELn7zrwU5m5YMUPApL4ejsyRbyOtA3YCE4KMqZsAl05pnwO5gMiusjn85PgIObh3wYJeo8L1L2btKuPYw6LqrjNYgO9iTo1IdCK3IYFzs96TzvbULRcOp+D9aAY9dr4nR1GzwALh6DjCFxRchUj3r7RWCEWgC1HI3ihsXpVod8Ab4EsUMMJv9x+CRkn8jjKVsJXTAOeip+PaAep4xRTSoxGtpvnXs2GMGaEW15CTgtzYCerjXfxbX3wV1wAmTlLRRY54DohOlBUd5k0+tLpytv3FD4GXAwFQ2zEB6H+Kkj+3+x8XmgI2kEo0KzYrTpgl\/DIuiJnM5FD8NyOLYJBa+hrv333o06seOSbUh0mI0wGK6Fp+AC2Ad2J6dSwX7fAodDHvIRlDhmPviK\/bKNO8Hx1NEvBuUrcHkltfPjRZIN22EClW3smxCiiA40FXwijAaNrDOOot5iWAcToZEO2N63qtfYD6ewK6Gn4oB9GhzA62AIdCfncNLpykE\/sLuKuzkXHChUG0HiB+CgnREKM0enuyWwBt6fOZdHdgBK3lRVdDvHeZFS7\/VPYFDosQMdzcW3QhecCEH02gdhBQwNhU0c7dRPwFD6njrX6aCfBaPWLAgOOoV0Kw7E5RUxquj4PgBfACNsLBr2DrCP9rdVyTpQ0Gd0eQB+BG+vFjpdTQP7din0hyJF274MOk0sXyOzAHSgq8AAENhGuh\/UFBXOBgfPQYynK6OB0egD0KqoYzUsAwcsiM6qUXVenTiWvBwo6PQpN7L+Fs4FB28urIfzoa6RONeM1HMgddjGJDDSOVgPw1Joep3BNT0R2\/lHjQs\/T5n9sE86810R20nXtM0ETtSarnSkbDSgqGXZFw3qdsC+DteD7dd7U8jbgWiqIm\/l07en74JT20GQp3TnQKEdH1wXyONDQUlHo\/G\/Yf9Me18lbxTSgS7PnKu5BnJx3AVOUUFMd4HRYBgUJa9DsWuCFZCdTuI2i3Kg0MbokMj52IgD5dxkU+pWUfuy6Aqd6lk4Gbp1ICNAkC+ScPrYCgPBiOAr3RxwzVKk\/B3lhu5B8M8iG9qN7kd3cz6
P08ejRIdykO7NQ2EOOi5Eh1PUSHgGzoYbwTE5A+pK7EAvUWufak0VuArXA\/9VLSvi4PQ1GOaDYTS0T7KuGK10dGUL\/LmS6psfLsC1sffVv9rFCzgabX04N8Cz0Nui04wCx\/v1sAQsUxbBK5XUzo9xJP9rNnag\/0T55Z4sSZz7FR1ov0qq\/sdPOTUVfBtQFsOdlVTf\/LiUbunwe8P3q13cztGXh6FwMPQFB6IbFfv\/wkRGDCRZ8aWjIrEDOYBxvlql0EPcZpyu16g347qsXWRyjY7qQIr3m32yKyfa6SN2mJfpeJwv4z5ipzECNjKFldGvItuYVVV+W5GNlKXb0BoknsJCWdHH2IF04N1NYUX3J+lv0gKxAzmYZUeAOOr1RvtNmitVz1og60BlR4A46ulAZbeftUfKN2mBrAOVvQaKp614OmvyNlL13rJA7EDxdFJWf+KFc5rCyrJ6ju3EDhRPJzk20a2qOOqkKaxbU\/XNk7EDxYNZVm\/jqJciUFlWz7Gd3nagOOrF6RxvMakq0gK97UDxIjpOF3nPSXeOFsg6UNnfA8XTVpzO8RaTqiItkHWgsr+H0WlCm3G6yHtOunO0QNaBwvdA\/mv35yBsQcixyV1UbSQ3u1qiA4X2d6mUMn3XArEDxW9Ez9Fld+c9Ah8qsPvuxQ2buA4h7SarJG1qgcn02701x0T9d+PW3XAPjI3K80y6+9E9t0+De2SStLEFLqLv62EBuNkpiDsU18JSGBYKczi6XfIxuAEG56AvqegDFnBf8jwwIkwHI4TiYlcHcxumDnYQ9FTeyIVu6\/wlnARJOtACR3JPN8E6+Cj0A+UAuAJ0sEtgADQqOuNM0Al1xrK\/NqDJJGVbwI3hro3uh1Oixn1TWww62MSovF7S6eo3cCMMqVcplXeuBc7k1nwrc+oZEd3mO0n7s5\/7oNZblI7mhnKnK3f9J9mDLeD3NFPgSbgGfPUOMoHEsSHDMUxXTnUXQ5quIuPs6Ul\/qjIH\/DMgl0H216SnU5amK4yQpHsLHMbphfAE+D2Sr+JOV7+CNF1hhCSNWeA4qt0BKyFNV43ZLNWqYQFf8ZMkCyQLJAskCyQLJAskCyQLJAskCyQLJAskCyQLJAskCyQLJAskCyQLJAskCyQLJAsUZoGw87KVBvLQ0Ur7pV2b9ujsNLWD\/inw92lu2Z0B7n96vJqezvF5mAb+Zm4zvA3cjeAvWV4C\/7a2W1m2gtuA\/clS2\/8hTe4hSQMWcKtt2CDnPwq7aW5u9TqdJOzCPJ\/0qmq5h80wxgRyC8w0gYyFkK4UdOKHT1iSHf8Pl78yMcIo2+Bj4BZey1eBOysVI1T4U73ZfHxuNScXwc0Q9Fq\/oyT+ZWpH3ViTN3Ma9TdkrnmK\/A3gVl1\/WNkT2cJFo3tyYbtckxxox0g53fg\/0MQyhMyB8ALMik80kXYtNLyJ+m1XNU1hO4bMSOH\/ERFkAImrwYXzobAG\/GmT+7x96OK3rDgfp6lW+c9j1proVEkRaMfIPsTBNyllJIQp66+kjUCbYCVMBP\/YhG9fTk1ngfvCJ8EY8Fe24+EIUIxiOl+SPcACS7jHOAq1ess62fxWlaTr28cCvm353c\/AHLr8BnQYnTpe\/geW\/aJAnIMR5gAAAABJRU5ErkJggg==\" alt=\"\"><\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>c1ccccc1OCOCCO |Sg:n:6,7,8:n:hh| (XIIIa)\nc1ccccc1OCOCCO |Sg:n:6,7,8:n:ht| (XIIIc)\nc1ccccc1OCOCCO |Sg:n:6,7,8:n:eu| (XIIIb)\nc1ccccc1OCOCCO |Sg:n:6,7,8:n:| (XIIId)<\/code><\/pre>\n\n\n\n<p>My interests are mainly on repeat variation for structure queries. 
There is a lot more that can be said on polymer registration; for example, you need to canonicalise the repeat unit (frame shift). Four of these are the same structure and one isn\u2019t:<\/p>\n\n\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-full\"><a href=\"https:\/\/nextmovesoftware.com\/blog\/wp-content\/uploads\/2025\/09\/image-12.png\"><img loading=\"lazy\" decoding=\"async\" width=\"144\" height=\"264\" src=\"https:\/\/nextmovesoftware.com\/blog\/wp-content\/uploads\/2025\/09\/image-12.png\" alt=\"\" class=\"wp-image-3143\"\/><\/a><\/figure><\/div>\n\n\n<pre class=\"wp-block-code\"><code>&#91;H]NCC(=O)NCC(=O)O&#91;H] |Sg:n:1,2,3,4:n:ht| (XIVa)\n&#91;H]NCC(=O)NCC(=O)O&#91;H] |Sg:n:2,3,4,5:n:ht| (XIVb)\n&#91;H]NCC(=O)NCC(=O)O&#91;H] |Sg:n:3,4,5,6:n:ht| (XIVc)\n&#91;H]NCC(=O)NCC(=O)O&#91;H] |Sg:n:5,6,7,8:n:ht| (XIVd)\n&#91;H]NCC(=O)NCC(=O)O&#91;H] |Sg:n:6,7,8,9:n:ht| (XIVe)<\/code><\/pre>\n\n\n\n<p>A final comment is something I noticed when preparing this post. It is common in polymer chemistry to draw external attachments as a plain bond. Here is the Wikipedia depiction of <a href=\"https:\/\/en.wikipedia.org\/wiki\/Polyvinyl_chloride\">polyvinyl chloride<\/a>:<\/p>\n\n\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-full\"><a href=\"https:\/\/nextmovesoftware.com\/blog\/wp-content\/uploads\/2025\/09\/image-13.png\"><img loading=\"lazy\" decoding=\"async\" width=\"150\" height=\"142\" src=\"https:\/\/nextmovesoftware.com\/blog\/wp-content\/uploads\/2025\/09\/image-13.png\" alt=\"\" class=\"wp-image-3144\"\/><\/a><\/figure><\/div>\n\n\n<pre class=\"wp-block-code\"><code>*CC(Cl)* |Sg:n:1,2,3:n:ht| (XVa)<\/code><\/pre>\n\n\n\n<p>EPAM\u2019s <a href=\"https:\/\/lifescience.opensource.epam.com\/KetcherDemoSA\/index.html\">Ketcher<\/a> looks like it automatically converts any methyl capping groups to <strong>\u2018*\u2019<\/strong> on load\u2026 but this might just be a display setting. 
It\u2019s possibly reasonable, and it doesn\u2019t seem to be the default behaviour in the Indigo API, but it is an interesting auto-conversion to be aware of.<\/p>\n\n\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-full\"><a href=\"https:\/\/nextmovesoftware.com\/blog\/wp-content\/uploads\/2025\/09\/image-14.png\"><img loading=\"lazy\" decoding=\"async\" width=\"293\" height=\"242\" src=\"https:\/\/nextmovesoftware.com\/blog\/wp-content\/uploads\/2025\/09\/image-14.png\" alt=\"\" class=\"wp-image-3145\"\/><\/a><\/figure><\/div>\n\n\n<pre class=\"wp-block-code\"><code>CCC(Cl)C |Sg:n:1,2,3:n:ht| (XVb)<\/code><\/pre>\n","protected":false},"excerpt":{"rendered":"<p>This post continues a series (see Part 1 and Part 2) examining some features and insights when using ChemAxon Extended SMILES\/SMARTS (CXSMILES) to represent repeat groups. Repeat Groups Inherited from CTfiles (Molfile), there are two ways to specify repeat groups in CXSMILES: you can use either a Structural Repeat Unit (SRU) Sgroup or a &hellip; <a href=\"https:\/\/nextmovesoftware.com\/blog\/2025\/09\/22\/cxsmiles-part-3-repeat-groups\/\" class=\"more-link\">Continue reading <span class=\"screen-reader-text\">CXSMILES Part 3: Repeat 
Groups<\/span><\/a><\/p>\n","protected":false},"author":5,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[24,26,23,22,25,1],"tags":[],"_links":{"self":[{"href":"https:\/\/nextmovesoftware.com\/blog\/wp-json\/wp\/v2\/posts\/3130"}],"collection":[{"href":"https:\/\/nextmovesoftware.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/nextmovesoftware.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/nextmovesoftware.com\/blog\/wp-json\/wp\/v2\/users\/5"}],"replies":[{"embeddable":true,"href":"https:\/\/nextmovesoftware.com\/blog\/wp-json\/wp\/v2\/comments?post=3130"}],"version-history":[{"count":9,"href":"https:\/\/nextmovesoftware.com\/blog\/wp-json\/wp\/v2\/posts\/3130\/revisions"}],"predecessor-version":[{"id":3156,"href":"https:\/\/nextmovesoftware.com\/blog\/wp-json\/wp\/v2\/posts\/3130\/revisions\/3156"}],"wp:attachment":[{"href":"https:\/\/nextmovesoftware.com\/blog\/wp-json\/wp\/v2\/media?parent=3130"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/nextmovesoftware.com\/blog\/wp-json\/wp\/v2\/categories?post=3130"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/nextmovesoftware.com\/blog\/wp-json\/wp\/v2\/tags?post=3130"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}