From Sun to Thurs next week, I’ll be attending the 245th ACS National Meeting in New Orleans. You’ll find me hanging around the CINF sessions, not least because I’ll be presenting some recent work there.
In particular, I’ll be giving a talk titled “Roundtripping between small-molecule and biopolymer representations” on the Tuesday (3:10pm 9th April, Room 349), which looks at the challenges I’ve encountered in the development of NextMove Software’s Sugar & Splice software. This software can be used to perceive, depict and interconvert various biopolymer representations, and currently supports peptides, nucleotides and sugars (and mixtures thereof, e.g. glycoproteins).
If you’re interested in meeting up to discuss this or anything else, drop me a line at noel@nextmovesoftware.com.
Slides will follow after the event. Here’s the abstract:
Roundtripping between small-molecule and biopolymer representations
Noel M. O’Boyle,1 Evan Bolton,2 Roger A. Sayle1
1 NextMove Software Ltd, Innovation Centre, Unit 23, Science Park, Milton Road, Cambridge, CB4 0EY, UK
2 National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Department of Health and Human Services, 8600 Rockville Pike, Bethesda MD 20894, USA
Existing cheminformatics toolkits provide a mature set of tools to handle small-molecule data, from generating depictions to creating and reading linear representations (such as SMILES and InChI). However, such tools do not translate well to the domain of biopolymers, where the key information is the identity of the repeating units and the nature of the connections between them. For example, a typical all-atom 2D depiction of all but the smallest protein or oligosaccharide obscures this key structural information.
We describe a suite of tools which allow seamless interconversion between appropriate structure representations for small molecules and biopolymers (with a focus on polypeptides and oligosaccharides). For example:
SMILES: OC[C@H]1O[C@@H](O[C@@H]2[C@@H](CO)OC([C@@H]([C@H]2O)NC(=O)C)O)[C@@H]([C@H]([C@H]1O)O[C@@]1(C[C@H](O)[C@H]([C@@H](O1)[C@@H]([C@@H](CO)O)O)NC(=O)C)C(=O)O)O
Shortened IUPAC format: NeuAc(a2-3)Gal(b1-4)GlcNAc
I will discuss the challenge of supporting a variety of biopolymer representations, handling chemically-modified structures, and handling biopolymers with unknown attachment points (e.g. from mass spectrometry).
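To make the shortened IUPAC example in the abstract a little more concrete, here is a minimal Python sketch that splits such a string into its residues and the linkages between them. This is purely illustrative and is not how Sugar & Splice works: the function name `parse_condensed` and the regular expression are my own assumptions, and the sketch only handles linear chains with fully specified linkages (no branches, no unknown attachment points).

```python
import re

# One residue name, optionally followed by a linkage such as (a2-3) or (b1-4).
# Assumption: linear chains only; branch notation is not handled.
TOKEN = re.compile(r"([A-Za-z0-9]+)(?:\(([ab])(\d+)-(\d+)\))?")

def parse_condensed(text):
    """Split a linear shortened-IUPAC glycan string into residues and linkages.

    Returns (residues, linkages), where each linkage is a tuple of
    (anomer, position on the left residue, position on the right residue).
    """
    residues, linkages = [], []
    pos = 0
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise ValueError(f"cannot parse at position {pos}: {text[pos:]!r}")
        residues.append(m.group(1))
        if m.group(2) is not None:
            linkages.append((m.group(2), int(m.group(3)), int(m.group(4))))
        pos = m.end()
    # A linear chain of n residues has exactly n - 1 linkages.
    if len(linkages) != len(residues) - 1:
        raise ValueError("expected one linkage between each pair of residues")
    return residues, linkages

residues, linkages = parse_condensed("NeuAc(a2-3)Gal(b1-4)GlcNAc")
print(residues)  # ['NeuAc', 'Gal', 'GlcNAc']
print(linkages)  # [('a', 2, 3), ('b', 1, 4)]
```

Three residue names and two linkages are all that is needed to describe the trisaccharide at this level, which is the information that the all-atom SMILES string obscures.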