In the course of preparing a talk for the recent ACS meeting (more on this later), I thought it would be interesting to give an overview of the ChEMBL data on substituted phenyls. I took all matched series* with associated IC50 data that contain 4 or more phenyl substituents, and then counted how often each particular substituted phenyl appears.
In other words, when a medicinal chemist was trying to optimize the substituents around a phenyl ring, which were the most frequent groups tested?
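To make the counting step concrete, here is a minimal Python sketch. It is not the code used for the analysis: the input format (each matched series reduced to a list of substituent labels such as "4-OMe"), the function name `count_phenyl_substituents`, and the choice to count each substituent once per series are all assumptions made for illustration.

```python
from collections import Counter


def count_phenyl_substituents(series_list, min_size=4):
    """Count how many qualifying series contain each phenyl substituent.

    series_list: hypothetical input, one list of substituent labels per
    matched series (e.g. ["4-OMe", "4-Cl", "4-F", "3-Cl"]).
    min_size: minimum number of distinct phenyl substituents a series
    must contain to be included (4 in the text above).
    """
    counts = Counter()
    for substituents in series_list:
        unique = set(substituents)  # assumption: count once per series
        if len(unique) >= min_size:
            counts.update(unique)
    return counts


# Made-up example data, not taken from ChEMBL.
example_series = [
    ["4-OMe", "4-Cl", "4-F", "3-Cl"],
    ["4-OMe", "2-Cl", "H"],  # fewer than 4 substituents, so skipped
    ["4-OMe", "4-Cl", "4-Me", "3-OMe", "2-F"],
]

counts = count_phenyl_substituents(example_series)
for label, n in counts.most_common():
    print(label, n)
```

Sorting the resulting counts by position (2, 3 or 4) would then give rankings like the ones discussed next.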
The order of popularity at the 4 position is OMe > Cl > F > Me, while at the 2 and 3 positions it’s Cl > OMe > F > Me. For these groups, in general the corresponding frequencies are in the order 4 >> 3 > 2. It would be interesting to know whether this corresponds to the ease of synthesis of these groups (in the general case) or whether other factors are at play.
In response to a query about whether the preferences have changed over time, I’ve generated the following image that provides this information for the period 1990-2013 (the x-axis). The y-axis shows frequencies divided by the total number of substituted phenyls that year. It’s a bit hard to draw any conclusions, but possibly 4-nitrile is becoming more popular, along with 3-F, while 2-NO2 and 2,3,4-OMe are going down.
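The per-year normalization on the y-axis can be sketched as follows. Again this is only an illustration under assumptions: the input is taken to be a flat list of hypothetical (year, substituent label) pairs, and `yearly_fractions` is a name invented here.

```python
from collections import Counter, defaultdict


def yearly_fractions(records):
    """Return {year: {label: fraction}} for (year, label) records.

    Each fraction is the count of a substituent in a given year divided
    by the total number of substituted phenyls recorded for that year.
    """
    per_year = defaultdict(Counter)
    for year, label in records:
        per_year[year][label] += 1

    fractions = {}
    for year, counter in per_year.items():
        total = sum(counter.values())
        fractions[year] = {label: n / total for label, n in counter.items()}
    return fractions


# Made-up example data, not taken from ChEMBL.
records = [
    (1990, "4-OMe"), (1990, "4-Cl"), (1990, "4-OMe"), (1990, "2-NO2"),
    (2013, "4-CN"), (2013, "3-F"),
]

fractions = yearly_fractions(records)
for year in sorted(fractions):
    print(year, fractions[year])
```

Dividing by the yearly total means a rising line reflects a larger share of that year's substituted phenyls, not simply more published data.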
*A matched (molecular) series is a series of analogs with the same scaffold but different R groups (all at the same position). In this context, each matched series contains only molecules from the same assay and paper.