Looking for leads in a 2D activity matrix

There is a nice dataset in the supporting information for Pickett et al. that illustrates how the sort order of the rows and columns in an activity heatmap can help or hinder the identification of gaps that should be filled.

First of all, here is the heatmap in question, a 50×50 array. It shows activity values for a series of analogs with the same scaffold but 50 different R groups at each of R1 and R2. Black squares are gaps where activity information isn’t available, white squares indicate inactive molecules, and the remaining colours indicate levels of activity from highly active (green) to weakly active (red).
The heatmap is depicted as shown in Figure 3 (top) in the paper where the rows (and separately the columns) are sorted by the most active molecule in the row. This has the effect of clustering the green squares in the bottom left of the array, and would suggest that the molecules in grid positions (2, 1) and (4, 1) should be tested next.

However, I would suggest trying (11, 2) and (20, 2) instead, and would say that the molecule corresponding to (4, 1) in particular is very unlikely to be worth pursuing.

How so? Well, each row (and separately each column) can be considered a matched series; that is, the entire molecule is unchanged apart from a single R group. With this in mind, the sort order of the 2D array should be based on properties of these series rather than of any individual molecules in them, and especially not the extreme value which is unlikely to be representative of the row/column as a whole, and may indeed be a fluctuation due to experimental error.

A simple way to do this is to choose the row (and separately the column) with the largest number of filled boxes, and measure the average relative shift (that is, the average deviation) of each other row from it. If the sort order is based on this shift, the resulting heatmap has a much clearer band structure.

The two gaps at (2, 2) and (5, 2) in the re-sorted heatmap are much better candidates than those inferred from the original heatmap. In particular, the gap at (4, 1) in the original is now at (44, 3) and can be seen, in context, to be a poor choice despite the green box in the same column.
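The re-sorting described above can be sketched in a few lines of Python. This is only a minimal illustration, not the code used to produce the heatmaps. It assumes the activities are held in a NumPy array with NaN for gaps, that "average deviation" means the mean signed difference over the positions measured in both rows, and that rows are sorted in ascending order of shift. The function name `series_shift_order` and the toy data are made up for the example.

```python
import numpy as np

def series_shift_order(activity):
    """Order rows by their mean shift relative to the most complete row.

    `activity` is a 2D float array with NaN marking unmeasured compounds.
    (Hypothetical helper: the name and conventions are assumptions.)
    """
    filled = ~np.isnan(activity)
    # Reference series: the row with the largest number of measured values.
    ref = int(np.argmax(filled.sum(axis=1)))
    shifts = np.full(activity.shape[0], np.nan)
    for i in range(activity.shape[0]):
        both = filled[i] & filled[ref]  # positions measured in both rows
        if both.any():
            shifts[i] = np.mean(activity[i, both] - activity[ref, both])
    # argsort places NaN last, so rows with no overlap end up at the end.
    return np.argsort(shifts, kind="stable")

# Toy 3x3 matrix (invented values), NaN = gap.
A = np.array([[5.0, 6.0, np.nan],
              [7.0, 8.0, 9.0],
              [np.nan, 6.5, 7.5]])

row_order = series_shift_order(A)    # rows sorted as matched series
col_order = series_shift_order(A.T)  # columns sorted separately
A_sorted = A[np.ix_(row_order, col_order)]
```

Because the shift is averaged over every shared measurement, a single unusually high or low value has much less influence on where a row lands than it does when sorting by the row maximum.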

If you are interested in more information on tools for SAR transfer, get in touch…

R or S? Let’s vote

[Figure: requiresRuleTwoForTwoLigands]
SMILES: CC[C@](CO)([H])[14CH2]C
The CIP (Cahn-Ingold-Prelog) priority rules are used to assign R and S labels to stereocentres. However, they are known to be very prone to mis-implementation:
The CIP System Again: Respecting Hierarchies Is Always a Must

Through our work on OEChem, OPSIN and Centres we have independently written three different CIP implementations, and hence the corner cases of CIP inevitably become a heated coffee-time discussion.

The deceptively simple case shown above turns out to give different results in many implementations.

Which “ligand” do you think has highest priority?

If you said [CH2][OH] you’d be right, but the majority of implementations disagree:

Toolkit/application       Assignment
Marvin 2014.11.3.0        S
ChemBioDraw 12            S
RDKit (HEAD)              S
Centres (HEAD)            R
CACTVS (Web Sketcher)     R [updated 23/02/2015]
DataWarrior (latest)      S
AccelrysDraw 4.2          S
OEChem 2014.Oct.2         S
ChemDoodle 7.0.2          S
OPSIN 1.6                 R
CDK 1.5.10                S

We can speculate that the disagreement arises because the left and right sides of the molecule are symmetrical by atomic number (rule 1), and rule 2 (atomic mass) is then erroneously applied to ALL ligands. A correct implementation applies rule 2 only to split the tie between the two ligands that rule 1 could not distinguish (*). Hence this case should be assigned R.

* “precedence (priority) of an atom in a group established by a rule does not change on application of a subsequent rule.” (IUPAC recommendations)
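To make the difference concrete, here is a toy Python sketch of the two behaviours. It is not a CIP implementation (a real one explores a hierarchical digraph of the molecule) and it is not taken from any of the toolkits listed. The ligand keys are written out by hand for this one molecule, the "buggy" function is only our guess at the failure mode, and all names are invented for the example.

```python
# Toy model of the four ligands around the stereocentre in
# CC[C@](CO)([H])[14CH2]C. The keys are hand-written for this molecule
# (an assumption of this sketch), listing values sphere by sphere.
# Unlabelled atoms are given their most common mass number for simplicity.
LIGANDS = {
    # name:     (rule 1 key: atomic numbers, rule 2 key: mass numbers)
    "CH2OH":    ((6, 8, 1, 1), (12, 16, 1, 1)),
    "CH2CH3":   ((6, 6, 1, 1), (12, 12, 1, 1)),
    "14CH2CH3": ((6, 6, 1, 1), (14, 12, 1, 1)),
    "H":        ((1,), (1,)),
}

def rank_hierarchical(ligands):
    """Rule 2 is consulted only between ligands that tie under rule 1."""
    return sorted(ligands,
                  key=lambda name: (ligands[name][0], ligands[name][1]),
                  reverse=True)

def rank_rule2_on_all(ligands):
    """Suspected bug: once rule 1 leaves a tie, re-rank EVERY ligand by rule 2."""
    rule1_keys = [key1 for key1, _ in ligands.values()]
    if len(set(rule1_keys)) == len(rule1_keys):
        # Rule 1 alone separates all ligands; rule 2 is never needed.
        return sorted(ligands, key=lambda name: ligands[name][0], reverse=True)
    return sorted(ligands, key=lambda name: ligands[name][1], reverse=True)

correct = rank_hierarchical(LIGANDS)  # highest priority first
buggy = rank_rule2_on_all(LIGANDS)
```

In this toy model the hierarchical ranking keeps CH2OH first, whereas applying rule 2 to everything promotes the 14C-labelled ethyl above it. Exchanging the priorities of two ligands inverts the descriptor, which would be consistent with the R versus S split in the table.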

Coming soon: Matsy in StarDrop

For the last few months, we have been working together with Optibrium to integrate Matsy into their StarDrop platform for lead optimisation, the result of which will be released later this year:

Optibrium™ and NextMove Software, developers of software and chemoinformatics solutions for drug discovery, today announced an agreement to collaborate on the integration of NextMove Software’s Matsy technology with Optibrium’s StarDrop software suite. This combination will help to guide scientists’ optimisation strategies to quickly identify compounds with a high chance of success for their drug discovery projects.

The Matsy algorithm has been developed by NextMove Software to generate and search databases of matched molecular series to identify chemical substitutions that are most likely to improve target activity (J. Med. Chem., 2014, 57(6), pp 2704–2713). This goes beyond conventional ‘matched molecular pair analysis’ by using data from longer series of matched compounds (and not just pairs) to make more relevant predictions for a particular chemical series of interest. As part of the collaboration with Optibrium, Matsy will be applied in StarDrop’s Nova™ module, which automatically generates new compound structures to stimulate the search for optimisation strategies related to initial hit or lead compounds. StarDrop’s unique capabilities for multi-parameter optimisation and predictive modelling will enable efficient prioritisation of the resulting ideas to identify high quality compounds with the best chance of success.

For further information, see the full press release.