Popular med chem replacements

When people talk about bioisosteres (e.g. tetrazole and carboxylic acid) they are usually referring to R group replacements that have similar biological properties. Identifying new bioisosteres can expand a med chemist’s toolbox, and so a number of studies have analysed activity databases to search for previously unknown bioisosteric replacements (e.g. [1]).

Here instead we will analyse what med chemists already consider to be bioisosteres. That is, we will look at the set of med chem replacements observed in the medicinal chemistry literature without any regard to the corresponding activity.

What I’ve done is take all (non-duplicate) IC50, EC50 and Ki data from ChEMBL and generate matched series on a per-assay basis (e.g. an assay containing halide analogues yields the series [*Br, *Cl, *F]). The matched pairs from each series (e.g. [*Br, *F], [*Br, *Cl], [*F, *Cl]) are then associated with the paper from which the assay was taken, and duplicate pairs within the same paper are removed, so that each pair is counted at most once per paper.
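A minimal Python sketch of this pair-generation step is given below. It assumes the R group decomposition has already been done, so that each input record is a hypothetical (assay_id, doc_id, core, rgroup) tuple; the function name pairs_per_paper and the record layout are illustrative only, not the code used for this analysis. Grouping by core as well as by assay is also an assumption, since a single assay can contain more than one series.

```python
from collections import defaultdict
from itertools import combinations


def pairs_per_paper(records):
    """Turn per-assay matched series into matched pairs, deduplicated per paper.

    records: iterable of (assay_id, doc_id, core, rgroup) tuples (hypothetical
    layout, assuming R group decomposition has already been done).
    Returns a dict mapping doc_id to a set of frozenset({r1, r2}) pairs.
    """
    series = defaultdict(set)  # (assay_id, core) -> R groups seen on that core
    doc_of_assay = {}          # assay_id -> paper the assay was taken from
    for assay_id, doc_id, core, rgroup in records:
        series[(assay_id, core)].add(rgroup)
        doc_of_assay[assay_id] = doc_id

    paper_pairs = defaultdict(set)
    for (assay_id, _core), rgroups in series.items():
        # Every unordered pair of R groups in a matched series is a matched pair;
        # storing them in a per-paper set removes duplicates within that paper
        for r1, r2 in combinations(sorted(rgroups), 2):
            paper_pairs[doc_of_assay[assay_id]].add(frozenset((r1, r2)))
    return dict(paper_pairs)


if __name__ == "__main__":
    records = [
        ("A1", "doc1", "core1", "*Br"),
        ("A1", "doc1", "core1", "*Cl"),
        ("A1", "doc1", "core1", "*F"),
        ("A2", "doc1", "core1", "*Br"),  # same paper, second assay
        ("A2", "doc1", "core1", "*F"),
    ]
    pairs = pairs_per_paper(records)
    print(len(pairs["doc1"]))  # prints 3: [*Br, *F] from A2 is a duplicate
```

Using a set of frozensets per paper makes [*Br, *F] and [*F, *Br] the same pair, so a replacement explored in several assays of one paper still only counts once for that paper.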

Having done this, we can then ask: what is a popular replacement for *Br? Apart from *I, the top answer turns out to be ethynyl (*C#C). *Br occurs in 5497 of the 32,158 papers, and ethynyl in 322, so if they occurred independently we would expect them to co-occur in 5497 × 322 / 32,158 ≈ 55 papers. Given that they actually co-occur in 103, this is an enrichment (or “lift” as recommender systems [2] call it) of 1.9 times what you would expect to see by chance. Here are the R groups that co-occur with *Br more often than expected by chance (enrichment greater than 1):

R group                 Papers   Co-occur   Expected   Enrichment
*I                        1553        901      265.5          3.4
*C#C                       322        103       55.0          1.9
*Cl                      10769       3263     1840.8          1.8
*[N+](=O)[O-]             3910       1179      668.4          1.8
*C=C                       334         91       57.1          1.6
*C#N                      3373        883      576.6          1.5
*SC(F)(F)F                  63         16       10.8          1.5
*F                        9048       2261     1546.6          1.5
*OC(F)(F)F                1149        279      196.4          1.4
*C(F)(F)F                 4984       1130      852.0          1.3
*S(=O)(=O)C(F)(F)F          51         10        8.7          1.1
*SC                       1337        252      228.5          1.1
*C#CC                       76         14       13.0          1.1
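The enrichment calculation can be sketched in Python as follows, assuming the per-paper pairs are held in a dict mapping each paper to a set of unordered R group pairs (a hypothetical layout). Here "occurrence" is taken to mean the number of papers in which an R group appears in at least one matched pair, and the total is the number of papers that have any pairs; both are assumptions about how the counts above were defined. The function name enrichment_table is illustrative only.

```python
from collections import Counter


def enrichment_table(paper_pairs, query):
    """Rank R groups by their enrichment ("lift") as replacements for `query`.

    paper_pairs: dict mapping a paper ID to a set of frozenset({r1, r2}) pairs,
    already deduplicated within each paper (hypothetical input layout).
    Returns (rgroup, occurrence, co_occur, expected, enrichment) tuples for
    R groups with enrichment > 1, highest enrichment first.
    """
    n_papers = len(paper_pairs)
    occurrence = Counter()  # number of papers in which each R group appears
    co_occur = Counter()    # number of papers in which `query` is paired with it
    for pairs in paper_pairs.values():
        occurrence.update(set().union(*pairs))
        for pair in pairs:
            if query in pair:
                (other,) = pair - {query}
                co_occur[other] += 1

    rows = []
    for other, n_both in co_occur.items():
        # Expected co-occurrence if the two R groups appeared independently
        expected = occurrence[query] * occurrence[other] / n_papers
        enrichment = n_both / expected
        if enrichment > 1:
            rows.append((other, occurrence[other], n_both, expected, enrichment))
    return sorted(rows, key=lambda row: row[4], reverse=True)


if __name__ == "__main__":
    # Reproduce the *Br / ethynyl figures quoted above from the raw counts
    expected = 5497 * 322 / 32158
    print(round(expected, 1), round(103 / expected, 1))  # prints 55.0 1.9
```

Counting papers rather than individual pairs means that a replacement explored heavily within a single paper contributes only once, which matches the per-paper deduplication described earlier.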

I’ve put together an animation that summarises these data. It cycles through the most popular R group replacements that have an enrichment greater than 1 and that have not already been shown earlier in the animation. The suggestions seem to make a lot of sense, especially when you remember that no fingerprint or MCS calculation is used: the co-occurrences come entirely from the data.

[1] Wassermann AM, Bajorath J. Large-scale exploration of bioisosteric replacements on the basis of matched molecular pairs. Future Med Chem. 2011, 3, 425-436.
[2] Boström J, Falk N, Tyrchan C. Exploiting personalized information for reagent selection in drug design. Drug Discov Today. 2011, 16, 181-187.