Bringing sugars to OPSIN

cycleofsugarsupportI’m pleased to announce the release of OPSIN 1.4.0. This new release brings significant improvements to OPSIN’s coverage of carbohydrate nomenclature. It also complements NextMove Software’s Sugar & Splice project that aims to make the conversion between carbohydrate and small molecule representations effortless.

Below is the effect this improvement to OPSIN has had on the conversion of IUPAC names in ChEBI. (This is one of the data sets used in the OPSIN publication [free access])

Number of names convertible to InChI on IUPAC names from ChEBI (Sept 2010)
Number of names convertible to InChI on IUPAC names from ChEBI (Sept 2010)

Examples of new nomenclature supported (pictures generated by the OPSIN web service)

3-Deoxy-alpha-D-manno-oct-2-ulopyranosonic acid
3-Deoxy-alpha-D-manno-oct-2-ulopyranosonic acid

beta-D-Fructofuranosyl alpha-D-glucopyranoside
beta-D-Fructofuranosyl alpha-D-glucopyranoside

Methyl 2,3,4-tri-O-acetyl-alpha-D-glucopyranosyluronate bromide
Methyl 2,3,4-tri-O-acetyl-alpha-D-glucopyranosyluronate bromide

3-O-beta-D-galactosyl-sn-glycerol
3-O-beta-D-galactosyl-sn-glycerol

OPSIN 1.4.0 is available from Bitbucket and Maven Central. The full release notes are below:

  • Added support for dialdoses,diketoses,ketoaldoses,alditols,aldonic acids,uronic acids,aldaric acids,glycosides,oligosacchardides, named systematically or from trivial stems, in cyclic or acyclic form
  • Added support for ketoses named using dehydro
  • Added support for anhydro
  • Added more trivial carbohydrate names
  • Added support for sn-glcyerol
  • Improved heuristics for phospho substitution
  • Added hydrazido and anilate suffixes
  • Allowed more functional class nomenclature to apply to amino acids
  • Added support for inverting CAS names with substituted functional terms e.g. Acetaldehyde, O-methyloxime
  • Double substitution of a deoxy chiral centre now uses the CIP rules to decide which substituent replaced the hydroxy group
  • Unicode right arrows, superscripts and the soft hyphen are now recognised

On the other hand

Mirror imageThe vast majority of amino acid residues that appear in peptides and proteins appear as their natural L-form enantiomer. This is the form of all amino acids as translated by the ribosome. However post-translational modification, such as by racemases, or peptide synthetic methods can be used to introduce the mirror image D-forms of amino acids into peptidic compounds.

Though rare, it is often important to keep track of whether individual amino acids are L-form, D-form, unknown or a racemic mixture of the two (DL-form). Rather unhelpfully, IUPAC rule 3AA-3.3 from IUPAC’s “Nomenclature and Symbolism for amino acids and peptides” states that the configuration prefix may be omitted for amino acids from a natural protein source (where the configuration may be assumed to be L) and for amino acids from synthetic sources (where the configuration is assumed to be an equimolecular mixture of enantiomers).

In the RCSB’s PDB file format, three letter residue codes have been assigned for both the L- form and the D- form of the traditional 19 amino acids other than glycine. Glycine (PDB residue code GLY) has an achiral α-carbon and therefore does not have an L-form and a D-form. For convenience, the table below lists the correspondence between PDB’s three letter codes for these amino acids.

Name L-form D-form
Alanine ALA DAL
Arginine ARG DAR
Asparagine ASN DSG
Aspartic Acid ASP DAS
Cysteine CYS DCY
Glutamine GLN DGN
Glutamic Acid GLU DGL
Histidine HIS DHI
Isoleucine ILE DIL
Leucine LEU DLE
Lysine LYS DLY
Methionine MET MED
Phenylalanine PHE DPN
Proline PRO DPR
Serine SER DSN
Threonine THR DTH
Tryptophan TRP DTR
Tyrosine TYR DTY
Valine VAL DVA

The above table is believed to be the only internet resource conveniently linking the two PDB residue codes for enantiomeric forms of amino acids.

Unfortunately for the two recent natural amino acids, selenocysteine (PDB residue code CSE) and pyrrolysine (PDB residue code PYH), no PDB residue codes have yet been assigned for their enantiomeric D-forms.

Image credit: The Joneses (where are the joneses on Flickr)