The CIP (Cahn-Ingold-Prelog) priority rules are used to assign R and S labels to stereocentres. However, they are known to be very prone to mis-implementation:
The CIP System Again: Respecting Hierarchies Is Always a Must
Through our work on OEChem, OPSIN and Centres we have independently written three different CIP implementations, so the corner cases of CIP inevitably become a heated coffee-time discussion.
This deceptively simple case on the right turns out to give different results in many implementations.
Which “ligand” do you think has highest priority?
If you said [CH2][OH] you’d be right, but the majority of implementations disagree:
Toolkit/application | Assignment |
---|---|
Marvin 2014.11.3.0 | S |
ChemBioDraw 12 | S |
RDKit (HEAD) | S |
Centres (HEAD) | R |
CACTVS (Web Sketcher) | R [updated 23/02/2015] |
DataWarrior (latest) | S |
AccelrysDraw 4.2 | S (now R in BIOVIA Draw 2017) |
OEChem 2014.Oct.2 | S |
ChemDoodle 7.0.2 | S |
OPSIN 1.6 | R |
CDK 1.5.10 | S |
We can speculate about the cause of the disagreement: the left and right sides of the molecule are symmetrical by atomic number (rule 1), so rule 2 (atomic mass) is then erroneously applied to ALL ligands. A correct implementation applies rule 2 only to split the tie between the two ligands that rule 1 could not distinguish (*). Hence this case should be assigned R.
* “precedence (priority) of an atom in a group established by a rule does not change on application of a subsequent rule.” (IUPAC recommendations)
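To make the hierarchy point concrete, here is a minimal Python sketch. It is a toy model, not any toolkit's actual code: the ligand set, the flat atom lists and the function names are all hypothetical, and the "interleaved" ranking is only one plausible way the mistake speculated about above could arise. Real CIP ranking works on a hierarchical digraph with duplicate atoms, which this sketch ignores.

```python
# Toy model (assumption): each ligand is a flat list of
# (atomic_number, mass_number) pairs, already ordered sphere by sphere.
# The ligand set is hypothetical and is NOT the molecule from the post.
LIGANDS = {
    "CH2OH":      [(6, 12), (8, 16), (1, 1), (1, 1)],
    "[13CH2]CH3": [(6, 13), (6, 12), (1, 1), (1, 1)],
    "CH2CH3":     [(6, 12), (6, 12), (1, 1), (1, 1)],
    "H":          [(1, 1)],
}


def rule1_key(atoms):
    """Rule 1: atomic numbers only."""
    return [z for z, _ in atoms]


def rule2_key(atoms):
    """Rule 2: mass numbers only."""
    return [m for _, m in atoms]


def rank_hierarchical(ligands):
    """Exhaust rule 1 over the whole ligand first; rule 2 only breaks ties."""
    return sorted(
        ligands,
        key=lambda name: (rule1_key(ligands[name]), rule2_key(ligands[name])),
        reverse=True,
    )


def rank_interleaved(ligands):
    """Mistaken variant: compares (Z, mass) atom by atom, so an isotope
    near the centre can override an atomic-number difference further out."""
    return sorted(ligands, key=lambda name: ligands[name], reverse=True)


print(rank_hierarchical(LIGANDS))
print(rank_interleaved(LIGANDS))
```

In this toy set, rule 1 already places CH2OH first and leaves only the two ethyl-like ligands tied, so rule 2 should do nothing except order those two. The interleaved variant instead promotes the 13C-labelled ligand above CH2OH, swapping the top two priorities, which would flip the R/S label.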
Thanks for documenting this interesting problem. This is now fixed in current Cactvs sources – isotope labels are only used when comparing two ligand subtrees if they appear identical otherwise.
In the absence of an exact explanation of how the Xemistry Web sketcher (at http://www.xemistry.com/edit?) was used to get the Cactvs-based CIP descriptor (there is no direct function for this), that finding may actually be an unrelated artefact stemming from some Internet computational or lookup service.
Nevertheless, I confirm that the basic Cactvs toolkit did indeed have the problem as well.
That’s great. I’ve updated the table above. I think John will reply separately on the Web sketcher question.
Hi Wolf,
The CIP codes are listed in the KEGG structure export when imported from SMILES.
Example:
[C@@H]1([C@H](CCCC1)C)O
Produces:
ENTRY Compound
ATOM 10
1 ??? C 288.8576 198.0000 #R
2 ??? H 306.0384 207.9200
3 ??? C 261.1424 182.0000 #S
4 ??? H 261.1424 201.8400
5 ??? C 261.1424 150.0000
6 ??? C 288.8576 134.0000
7 ??? C 316.5696 150.0000
8 ??? C 316.5696 182.0000
9 ??? C 233.4304 198.0000
10 ??? O 288.8576 230.0000
BOND 10
1 1 2 1
2 1 3 1
3 3 4 1
4 3 5 1
5 5 6 1
6 6 7 1
7 7 8 1
8 1 8 1
9 3 9 1 #Down
10 1 10 1 #Down
///
Hi John,
that is ingenious, and yes, that does indeed tap into the native Cactvs implementation. But as I wrote, fixed in current sources, and that includes KEGG export.