Matched Molecular Reactants

What happens when you cross Matched Molecular Pair Analysis (MMPA) with reactions? Why, of course, you get a new paradigm in drug discovery, Matched Molecular Reactants!

Well, let’s think about it for a second. If you take the reactants and the products and look for matched molecular pairs combining both, what you will find are reactions that involve single R group transformations. We can call these Matched Molecular Reactants, but they are probably more commonly known as functional group transformations, e.g. -OH to -Cl.

So, what are the most common functional group transformations in a typical ELN? Well, I can’t show you that, but I can show you the results when this analysis is applied to reactions in the US patent literature (this data courtesy of Daniel). The following table shows SMILES for the R groups together with the observed frequency for the 15 most common transformations:

*[N+](=O)[O-] --> *N                                    21456
*C --> *[H]                                             21165
*[H] --> *C                                             15583
*CC --> *[H]                                            12914
*C(=O)OC --> *C(=O)O                                    11729
*C(=O)OC(C)(C)C --> *[H]                                 9149
*NC(=O)OC(C)(C)C --> *N                                  8054
*C(=O)OCC --> *C(=O)O                                    7673
*Cc1ccccc1 --> *[H]                                      6695
*OC --> *O                                               6339
*[H] --> *Br                                             6141
*O --> *Cl                                               4852
*OCc1ccccc1 --> *O                                       4662
*[H] --> *CC                                             3980
*C(=O)O --> *C(=O)OC                                     3888

It seems that the majority of reactions tend to make molecules smaller. If this keeps up, we’ll soon be left with nothing!
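
For anyone who wants to check the "smaller" claim against the table, here is a minimal Python sketch. It is not the code used for the analysis; the heavy-atom counter is a rough regex written only for the SMILES that appear in the table above (a real toolkit would be the right tool for anything more general), and the names `TABLE`, `heavy_atoms` and `tally_size_change` are made up for this illustration.

```python
import re
from collections import Counter

# The 15 rows from the table above: (R group before, R group after, frequency).
TABLE = [
    ("*[N+](=O)[O-]", "*N", 21456),
    ("*C", "*[H]", 21165),
    ("*[H]", "*C", 15583),
    ("*CC", "*[H]", 12914),
    ("*C(=O)OC", "*C(=O)O", 11729),
    ("*C(=O)OC(C)(C)C", "*[H]", 9149),
    ("*NC(=O)OC(C)(C)C", "*N", 8054),
    ("*C(=O)OCC", "*C(=O)O", 7673),
    ("*Cc1ccccc1", "*[H]", 6695),
    ("*OC", "*O", 6339),
    ("*[H]", "*Br", 6141),
    ("*O", "*Cl", 4852),
    ("*OCc1ccccc1", "*O", 4662),
    ("*[H]", "*CC", 3980),
    ("*C(=O)O", "*C(=O)OC", 3888),
]

# Assumption: a crude atom tokenizer that only needs to cope with the SMILES
# above (bracket atoms, Br/Cl, one-letter organic atoms, aromatic atoms).
ATOM_TOKEN = re.compile(r"\[[^\]]*\]|Br|Cl|[BCNOPSFI]|[bcnops]")


def heavy_atoms(smiles):
    """Count non-hydrogen atoms; the '*' attachment point is never matched."""
    return sum(1 for tok in ATOM_TOKEN.findall(smiles) if tok != "[H]")


def tally_size_change(table):
    """Sum the frequencies of rows that shrink, grow or keep the R group size."""
    totals = Counter()
    for before, after, count in table:
        delta = heavy_atoms(after) - heavy_atoms(before)
        key = "smaller" if delta < 0 else "larger" if delta > 0 else "same"
        totals[key] += count
    return totals


totals = tally_size_change(TABLE)
for key in ("smaller", "larger", "same"):
    print(key, totals[key])
```

Run on the 15 rows above, this counts 109836 occurrences that lose heavy atoms, 29592 that gain them and 4852 (the -OH to -Cl swap) that stay the same size, so roughly three quarters of the top-15 occurrences do shrink the R group.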

2 thoughts on “Matched Molecular Reactants”

  1. Nice! Producing smaller compounds would agree with the general strategy of reducing lipophilicity and molecular obesity… (provided your patent dataset is relevant to pharmaceutical research)
    On the other hand, organic synthesis is generally tuned towards producing larger things (that’s why it’s called synthesis and not apo-synthesis 🙂 ).
    So I wonder what happens if you canonicalise the order (e.g. *H>>*C equals *C>>*H). Would the new transformation ranking agree with the ones coming from the MMPA of large corporate datasets? And if yes, why?

  2. Hi George. My final comment was somewhat tongue-in-cheek of course; several of the reactions are deprotections, and it must be that the protected molecule was purchased. Even to produce smaller compounds, pharma has to do synthesis like everyone else by building up from small building blocks.

    BTW, it’s worth noting that I’ve done nothing clever to the data to tidy it up, that is, some of the categories are supersets of others.

    Regarding the difference between the data in the patent literature vs ELNs, I might have a perspective on this by the end of the week. Apart from incorrect reactions extracted from the patents, I don’t expect it to be a whole lot different, but let’s see. Corporate ELNs versus academic ELNs might be a more interesting comparison…
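
George's suggestion of canonicalising the order (so that *H>>*C and *C>>*H count as the same transformation) can be sketched in a few lines of Python. This is only an illustration applied to the 15 rows in the post, not to the full dataset; it assumes the R group SMILES are already canonical so that plain string comparison is enough, and the names `RAW`, `parse` and `merge_directions` are hypothetical.

```python
from collections import Counter

# The 15 rows from the post, as "before --> after  frequency".
RAW = """\
*[N+](=O)[O-] --> *N 21456
*C --> *[H] 21165
*[H] --> *C 15583
*CC --> *[H] 12914
*C(=O)OC --> *C(=O)O 11729
*C(=O)OC(C)(C)C --> *[H] 9149
*NC(=O)OC(C)(C)C --> *N 8054
*C(=O)OCC --> *C(=O)O 7673
*Cc1ccccc1 --> *[H] 6695
*OC --> *O 6339
*[H] --> *Br 6141
*O --> *Cl 4852
*OCc1ccccc1 --> *O 4662
*[H] --> *CC 3980
*C(=O)O --> *C(=O)OC 3888
"""


def parse(raw):
    """Turn each text row into a (before, after, count) tuple."""
    rows = []
    for line in raw.strip().splitlines():
        transform, count = line.rsplit(None, 1)
        before, after = (s.strip() for s in transform.split("-->"))
        rows.append((before, after, int(count)))
    return rows


def merge_directions(rows):
    """Add up counts for A-->B and B-->A under one order-independent key."""
    merged = Counter()
    for before, after, count in rows:
        # Sorting the pair gives the same key whichever way the reaction ran.
        merged[tuple(sorted((before, after)))] += count
    return merged


merged = merge_directions(parse(RAW))
for (a, b), n in merged.most_common():
    print(f"{a} <--> {b}  {n}")
```

On these 15 rows, three pairs collapse (methyl/H, ethyl/H and methyl ester/acid), leaving 12 undirected transformations, with methyl/H on top at 36748.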
