John Overington over at the ChEMBLog has recently discussed the task of finding redox pairs in a database. As John points out, these are neither isomers nor tautomers but are of interest in any case.
It turns out that, after a minor modification, NextMove’s Equalizer software (insert link from future here) can be used to find these. Equalizer generates canonical “hashes” for molecules, such that different forms of the same molecule hash to the same representation [Footnote]. It’s not a new idea by any means (think of the layered structure of the InChI), but the way we do this is pretty neat as it uses canonical SMILES to do the hashing. By altering the information encoded in the SMILES, different forms of the same structure can be identified: for example mesomers, structural isomers, tautomers [1], protomers… and now redox pairs.
For redox pairs, a SMILES is generated after setting all bond orders to 1 and all atom charges to 0. If two structures give the identical canonical SMILES but differ in their overall (original) charge, they can be considered a redox pair. Here are some examples found in ChEMBLdb 15:
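As an aside, the comparison rule described above can be sketched in a few lines of Python. This is only a rough illustration and not Equalizer's actual code. It assumes that some cheminformatics toolkit has already produced the "skeleton" canonical SMILES (all bond orders 1, all charges 0) along with the original overall charge. The function name `find_redox_pairs` and the toy records are made up for this sketch and are not taken from ChEMBL.

```python
from collections import defaultdict
from itertools import combinations


def find_redox_pairs(records):
    """Return pairs of identifiers that share a skeleton hash but differ in charge.

    `records` is an iterable of (identifier, skeleton_smiles, total_charge),
    where skeleton_smiles is ASSUMED to be a canonical SMILES generated
    elsewhere after setting all bond orders to 1 and all charges to 0.
    """
    # Bucket the structures by their skeleton hash.
    groups = defaultdict(list)
    for identifier, skeleton, charge in records:
        groups[skeleton].append((identifier, charge))

    # Within each bucket, report pairs whose overall charges differ.
    pairs = []
    for members in groups.values():
        for (id_a, q_a), (id_b, q_b) in combinations(members, 2):
            if q_a != q_b:
                pairs.append((id_a, id_b))
    return pairs


# Toy records invented for illustration (hypothetical, not from ChEMBLdb).
toy_records = [
    ("iron(II)", "[Fe]", 2),
    ("iron(III)", "[Fe]", 3),
    ("copper(I)", "[Cu]", 1),
    ("copper(II)", "[Cu]", 2),
    ("zinc(II)", "[Zn]", 2),
]

redox_pairs = find_redox_pairs(toy_records)
print(redox_pairs)
```

Structures with the same skeleton and the same charge (for instance two tautomers) are deliberately not reported, since only a difference in overall charge marks a redox pair under this rule.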
[Footnote] This is similar to the classic way of finding anagrams given a dictionary of words (I read about this in Jon Bentley’s Programming Pearls). Take each word, sort its letters (this sorted word is the “hash” in this case; words with the same hash will be anagrams) and write it to an output file followed by the original word on the same line. Sort the output file, and look for adjacent duplicates in the first column. The corresponding anagrams will be in the second column.
References:
[1] RA Sayle, So you think you understand tautomerism? JCAMD, 2010, 24, 485.