The increasing importance of biological therapeutics, or biologics, to the pharmaceutical industry is well known. For example, data from Drugs.com show that of the top 15 best-selling therapies in the US in Q4 2012, six were biologics. Monoclonal antibodies are a typical example; these are glycoproteins, consisting of short oligosaccharides attached to a multi-chain polypeptide.
It is clear that handling such molecules requires a different approach from that taken for small molecules. For example, here is an all-atom depiction of the peptide crambin:
No – it’s not a cyclic peptide. It just happens to have three disulfide bridges. A more useful depiction can be generated if we follow the IUPAC or FDA guidelines for peptide depiction; here the primary structure is much clearer as is the presence of the disulfide bonds:
However, to create these sorts of depictions, and otherwise handle biopolymers more appropriately, we need to know the polymer structure.
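To make "knowing the polymer structure" concrete, here is a minimal Python sketch that turns a one-letter sequence plus a list of disulfide pairs into a three-letter, IUPAC-style primary structure string. Everything in it is illustrative rather than an existing tool: the function name `describe_peptide` is hypothetical, and the crambin sequence and cysteine pairings are my assumption of what is deposited in PDB entry 1CRN, so check them against the entry before relying on them.

```python
# One-letter to three-letter codes for the 20 standard amino acids.
THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
}

# Assumption: crambin sequence and disulfide pairs as in PDB entry 1CRN.
CRAMBIN = "TTCCPSIVARSNFNVCRLPGTPEALCATYTGCIIIPGATCPGDYAN"
CRAMBIN_DISULFIDES = [(3, 40), (4, 32), (16, 26)]  # 1-based residue numbers


def describe_peptide(sequence, disulfides):
    """Return (primary structure, disulfide summary) as two strings.

    Hypothetical helper: residues are written as hyphen-separated
    three-letter codes between H- and -OH termini, and each bridge
    is reported as a pair of numbered cysteines.
    """
    for i, j in disulfides:
        for pos in (i, j):
            # A bridge must link two cysteines that exist in the sequence.
            if not 1 <= pos <= len(sequence) or sequence[pos - 1] != "C":
                raise ValueError(f"residue {pos} is not a cysteine")
    residues = "-".join(THREE_LETTER[aa] for aa in sequence)
    bridges = ", ".join(f"Cys{i}-Cys{j}" for i, j in disulfides)
    return f"H-{residues}-OH", bridges


if __name__ == "__main__":
    primary, bridges = describe_peptide(CRAMBIN, CRAMBIN_DISULFIDES)
    print(primary)
    print("Disulfide bridges:", bridges)
```

The point of the sketch is that the sequence and the bridge list are all the depiction needs; an all-atom connection table contains the same information, but only implicitly.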
Some consider this a file format problem. File formats that have been developed to store or represent biopolymer structures include the CHUCKLES and CHORTLES languages from Chiron and Daylight, HELM (Hierarchical Editing Language for Macromolecules) from Pfizer, Protein Line Notation from Biochemfusion and SCSR (Self-Contained Sequence Representation, an MDL V3000 extension) from Accelrys. Naturally, Wiswesser Line Notation has also been extended to handle this problem.
In particular, the HELM format has recently received support from the Pistoia Alliance. See for example this post on the Pistoia blog which describes how HELM “gives us a single consistent way to describe macromolecules which can be used across industry and academia” so that “researchers do not have to spend time creating their own notations”.
But is a new file format the best way to achieve this goal? (I can’t resist inserting the xkcd comic on standards at this point 🙂 )