I attended the ever-excellent Sheffield Cheminformatics – sorry – Chemoinformatics Conference last week where I presented a poster on Sugar & Splice, Macromolecules or Big Small-Molecules? Handling Biopolymers in a Chemical Registry System (click on the image below to access the PDF):
If you’re familiar with the HELM format, a new format for describing macromolecules, you may be interested to note the HELM string in the bottom-left of the poster which represents a cyclic peptide connected to a cysteine through a disulfide bridge:
PEPTIDE1{[ac].C}|PEPTIDE2{N.V.P.C}|CHEM1{*N[C@@H](Cc1ccc(cc1)OP(=O)(O)O)C(=O)* |$_R1;;;;;;;;;;;;;;;;;_R2$|}|PEPTIDE3{V}$PEPTIDE1,PEPTIDE2,2:R3-4:R3|PEPTIDE2,CHEM1,4:R2-1:R1|CHEM1,PEPTIDE3,1:R2-1:R1|PEPTIDE3,PEPTIDE2,1:R2-1:R1$$$
In this case, the HELM string is much longer than the corresponding IUPAC Condensed string, and indeed also longer than the all-atom SMILES string. Unfortunately, while both tyrosine and phosphate are supported as monomers by the current HELM release, phosphotyrosine is not, nor can it be constructed by connecting the phosphate to the tyrosine, because the tyrosine monomer has no R3 attachment point (locant). As a result, the phosphotyrosine has to be represented as a CHEM object whose structure is spelled out inline as a SMILES string.
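To make the structure of the string easier to see, here is a minimal Python sketch that splits it into its simple polymers and its connections. This is not the official HELM toolkit: the function names `split_top_level` and `parse_helm` are made up for illustration, and the code assumes only the layout visible in the string above (polymers, then connections, separated by `$`, with `|` between entries). The HELM string is written as one unbroken string, keeping only the space that separates the SMILES from its `|$...$|` extension. The one subtlety is that the inline SMILES itself contains `|` and `$`, so the split has to ignore anything inside curly braces.

```python
HELM = (
    "PEPTIDE1{[ac].C}|PEPTIDE2{N.V.P.C}|"
    "CHEM1{*N[C@@H](Cc1ccc(cc1)OP(=O)(O)O)C(=O)* |$_R1;;;;;;;;;;;;;;;;;_R2$|}|"
    "PEPTIDE3{V}"
    "$PEPTIDE1,PEPTIDE2,2:R3-4:R3|PEPTIDE2,CHEM1,4:R2-1:R1|"
    "CHEM1,PEPTIDE3,1:R2-1:R1|PEPTIDE3,PEPTIDE2,1:R2-1:R1$$$"
)


def split_top_level(text, sep):
    """Split text on sep, ignoring any sep that appears inside {...}."""
    parts, current, depth = [], [], 0
    for ch in text:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
        if ch == sep and depth == 0:
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
    parts.append("".join(current))
    return parts


def parse_helm(helm):
    """Return (polymers, connections) for a HELM string laid out as above."""
    sections = split_top_level(helm, "$")

    # Section 0: simple polymers such as PEPTIDE2{N.V.P.C}
    polymers = {}
    for item in split_top_level(sections[0], "|"):
        name, _, body = item.partition("{")
        polymers[name] = body[:-1]  # drop the closing brace

    # Section 1: connections such as PEPTIDE1,PEPTIDE2,2:R3-4:R3
    connections = []
    for item in split_top_level(sections[1], "|"):
        if item:
            source, target, bond = item.split(",")
            connections.append((source, target, bond))

    return polymers, connections


polymers, connections = parse_helm(HELM)
for name, body in polymers.items():
    print(name, "->", body)
for source, target, bond in connections:
    print(source, target, bond)
```

Running this lists four polymers (three PEPTIDE entries plus the CHEM1 phosphotyrosine) and four connections: the R3-R3 disulfide between the two cysteines, and three R2-R1 links that close the ring through CHEM1 and PEPTIDE3.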