Our latest work has just been published in the Journal of Medicinal Chemistry. It’s a collaboration with AstraZeneca Mölndal on an algorithm (which we call Matsy) that predicts which R groups are likely to improve biological activity, given some existing SAR information at the same R group position:
N.M. O’Boyle, J. Boström, R.A. Sayle, A. Gill. Using Matched Molecular Series as a Predictive Tool To Optimize Biological Activity. J. Med. Chem. 2014, in press.
Here’s a deliberately basic example from the paper. Imagine that you have synthesised and tested three alkane R groups on some scaffold and found that their pIC50s follow this order (bigger is better): propyl > ethyl > methyl. What should you make next? Kind of obvious, you might say: a longer alkane. Using ChEMBL data and the Matsy algorithm described in the paper, we find that the R group most likely to increase activity is n-hexyl, which did so in 75% of the 53 times this situation was observed.
But what about ethyl > propyl > methyl? Now activity no longer tracks molecular weight. Perhaps ethyl is just the right size, with methyl too small and propyl too big. Either way, the question remains: what should you try next? According to Matsy, tert-butyl is the most likely to increase activity, having done so in 39% of 23 cases.
To a large extent, the algorithm is just doing what a medicinal chemist would do, except that where a medicinal chemist would draw on previous experience or intuition, Matsy works out the answer from a database of previous work. To make up your mind about a particular prediction, you can always look at where the data came from and decide whether it’s applicable in your case.
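To make the counting idea behind the two examples a bit more concrete, here is a minimal Python sketch. It is not the implementation from the paper: it assumes the database is simply a list of matched molecular series, each stored as a dict from R group name to pIC50, and the data in `TOY_SERIES` and the names `follows_order` and `predict_next` are all hypothetical, made up for illustration. For a query activity order, it keeps only those series that reproduce that order, then counts how often each additional R group beat the most active query R group.

```python
from collections import defaultdict

# Hypothetical toy database (made-up numbers, not ChEMBL data). Each dict is
# one matched molecular series: compounds sharing a scaffold, differing only
# at one R group position, measured in the same assay. Values are pIC50s.
TOY_SERIES = [
    {"methyl": 5.1, "ethyl": 5.6, "propyl": 6.0, "hexyl": 6.8},
    {"methyl": 6.2, "ethyl": 6.5, "propyl": 6.9, "hexyl": 6.4, "isopropyl": 6.7},
    {"methyl": 4.8, "ethyl": 5.0, "propyl": 5.3, "hexyl": 5.9},
    {"methyl": 5.5, "ethyl": 6.1, "propyl": 5.8, "tBu": 6.3},
]


def follows_order(series, query_order):
    """True if every query R group is present in the series and their
    activities strictly decrease in the given order (most active first)."""
    if not all(r in series for r in query_order):
        return False
    values = [series[r] for r in query_order]
    return all(a > b for a, b in zip(values, values[1:]))


def predict_next(query_order, database):
    """Rank R groups outside the query by how often they were more active
    than the best query R group, across series matching the query order.

    Returns a list of (r_group, fraction_improved, n_observations)."""
    wins = defaultdict(int)
    totals = defaultdict(int)
    for series in database:
        if not follows_order(series, query_order):
            continue  # this series does not show the same SAR pattern
        best = series[query_order[0]]
        for r_group, value in series.items():
            if r_group in query_order:
                continue
            totals[r_group] += 1
            if value > best:
                wins[r_group] += 1
    ranked = [(r, wins[r] / totals[r], totals[r]) for r in totals]
    # Highest fraction first; break ties by number of observations.
    ranked.sort(key=lambda t: (t[1], t[2]), reverse=True)
    return ranked


if __name__ == "__main__":
    for query in (["propyl", "ethyl", "methyl"], ["ethyl", "propyl", "methyl"]):
        print(" > ".join(query))
        for r, frac, n in predict_next(query, TOY_SERIES):
            print(f"  {r}: improved {frac:.0%} of {n} times")
```

On this toy data, hexyl comes out on top for propyl > ethyl > methyl (2 of 3 matching series), while only one series matches ethyl > propyl > methyl. Ranking purely by fraction like this rewards R groups seen only once or twice, which is why the number of observations (53 and 23 in the real examples above) matters when judging a prediction. Keeping a reference to the supporting series alongside each count would also let you check where the data came from.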
For more information, check out the paper (*). The talk below summarises the key points; I gave it a few weeks ago at the 1st Joint CICAG and Cambridge Cheminformatics Network Meeting (hosted by the CCDC):
* The paper will be made freely available soon. Until then, we are allowed to give away 50 copies, so if you don’t have access to the journal and would like one, email email@example.com.