In an earlier post, I described the importance of knowing the biopolymer structure when handling biologics. I also discussed various file formats that have been proposed to address this.
But rather than regarding this as a file format problem, why not consider it a perception problem instead? If we can perceive the biopolymer structure from an all-atom representation, then interconverting between file formats (whether one of the proposed biopolymer formats or existing all-atom representations such as SMILES) is straightforward. Can it be done? Well, that’s exactly what PDB file writers do; they perceive the amino acid sequence from the all-atom structure and fill in the relevant columns in the PDB file.
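To make the idea of perception concrete, here is a minimal sketch in Python. It is a toy and not how Sugar & Splice or any PDB writer actually works: the function name `perceive_sequence`, the input layout (a list of element symbols plus a list of bonds) and the three-entry side-chain table are all assumptions made for illustration. It finds N-CA-C(=O) backbone units in a heavy-atom graph, walks them from the free N-terminus, and names each residue from the element composition of its side chain. Composition alone cannot tell isomers apart (Leu and Ile, for example), so a real implementation would need proper graph matching.

```python
from collections import Counter

# Toy lookup (assumption): side-chain heavy-atom composition -> one-letter code.
SIDE_CHAIN_CODES = {"": "G", "C1": "A", "C1O1": "S"}


def perceive_sequence(elements, bonds):
    """Perceive a linear peptide sequence from a heavy-atom graph.

    elements: list of element symbols, indexed by atom
    bonds:    list of (atom_i, atom_j, bond_order) tuples
    """
    neighbours = {i: {} for i in range(len(elements))}
    for i, j, order in bonds:
        neighbours[i][j] = order
        neighbours[j][i] = order

    def is_carbonyl(atom):
        # A carbon with a double bond to oxygen.
        return elements[atom] == "C" and any(
            elements[n] == "O" and order == 2
            for n, order in neighbours[atom].items()
        )

    # Map each backbone nitrogen to its (alpha carbon, carbonyl carbon).
    backbone = {}
    for ca, element in enumerate(elements):
        if element != "C" or is_carbonyl(ca):
            continue
        nitrogens = [n for n in neighbours[ca] if elements[n] == "N"]
        carbonyls = [c for c in neighbours[ca] if is_carbonyl(c)]
        if len(nitrogens) == 1 and len(carbonyls) == 1:
            backbone[nitrogens[0]] = (ca, carbonyls[0])

    # The N-terminus is the backbone nitrogen not acylated by a carbonyl.
    starts = [n for n in backbone
              if not any(is_carbonyl(x) for x in neighbours[n])]
    if len(starts) != 1:
        raise ValueError("expected exactly one free N-terminus")

    sequence = []
    n = starts[0]
    while n is not None:
        ca, c = backbone[n]
        # Collect side-chain atoms: everything reachable from CA
        # without passing through the backbone N or C.
        seen, stack, side_chain = {n, ca, c}, [ca], []
        while stack:
            for nxt in neighbours[stack.pop()]:
                if nxt not in seen:
                    seen.add(nxt)
                    side_chain.append(elements[nxt])
                    stack.append(nxt)
        key = "".join(f"{el}{count}"
                      for el, count in sorted(Counter(side_chain).items()))
        sequence.append(SIDE_CHAIN_CODES.get(key, "X"))  # X = not recognised
        # Step across the peptide bond to the next residue's nitrogen.
        n = next((x for x in neighbours[c] if x in backbone), None)
    return "".join(sequence)


# Gly-Ala, i.e. the heavy atoms of NCC(=O)NC(C)C(=O)O, entered by hand.
gly_ala_elements = ["N", "C", "C", "O", "N", "C", "C", "C", "O", "O"]
gly_ala_bonds = [(0, 1, 1), (1, 2, 1), (2, 3, 2), (2, 4, 1), (4, 5, 1),
                 (5, 6, 1), (5, 7, 1), (7, 8, 2), (7, 9, 1)]
print(perceive_sequence(gly_ala_elements, gly_ala_bonds))  # prints GA
```

Once the sequence has been perceived like this, writing it out in a sequence-based format is just a formatting step, while the all-atom graph stays the record of truth and an unrecognised monomer simply shows up as X rather than being lost.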
There are several benefits to this approach. To begin with, it avoids the cost associated with a new registry system based on a macromolecular file format. There are no problems with new and unusual monomers; these will be faithfully stored in the all-atom representation. The de facto standards for chemical information interchange, SMILES and MOL files, can be used as always for exchange of data. Tools for small-molecule analysis (e.g. SMARTS searching) can be combined with analyses based on biopolymer structure (e.g. HELM depiction, Smith-Waterman searching). And finally, it’s worth considering that if a registry system is based on a particular file format, it may be difficult to migrate away from it at a later date.
What I’ve described here and in the previous post is the introduction from my ACS presentation on Roundtripping between small-molecule and biopolymer representations. This describes the development of the Sugar & Splice software for handling oligopeptides, oligonucleotides and oligosaccharides (including modified residues and mixtures of different biopolymers). Note that the presentation is somewhat sugar-centric; for more info on the peptide and nucleotide side of things see Roger’s Spring 2012 ACS presentation.
(For more presentations from NextMove Software, see our SlideShare page.)