Chemically-related patent families

The term “patent family” is generally used to describe a set of patents that cover the same invention but are filed with different patent authorities. Here we instead look at finding groups of patents within a single authority (the USPTO) where the patents are linked by chemical structures.

It turns out that it is not unusual for essentially the same chemical information to appear in multiple patent applications within the USPTO, often with the same or a similar title. I’m not sure of the reason for this; perhaps corrections, rewrites, or separate applications for different targets. In any case, it is useful to identify such cases for the purposes of linking or collation, or indeed to discard them if looking for truly novel chemistry.

Here’s an approach that appears to work reasonably well: we regard as “chemically-related” two patents that share at least N key (but rare) molecules in common. All that remains is to define “N”, “key”, and “rare”:

  • A key compound is one associated with a compound number (which may be in the text or a ChemDraw file) or associated with an experimental property (taken from a table, and possibly described in terms of R groups that need to be attached to a scaffold).
  • A rare molecule is one that appears in 30 or fewer patents.
  • N was defined as 8.
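The three definitions above can be sketched in a few lines of Python. This is only an illustration, not the code used for the analysis: the function name `related_patent_pairs`, its parameters, and the input format (a dict mapping each patent ID to the set of canonical SMILES of its key compounds) are all assumptions made for the example.

```python
from collections import defaultdict
from itertools import combinations

MAX_PATENTS_PER_MOLECULE = 30  # "rare": appears in 30 or fewer patents
MIN_SHARED_MOLECULES = 8       # N: minimum number of shared key (but rare) molecules


def related_patent_pairs(key_molecules,
                         max_patents=MAX_PATENTS_PER_MOLECULE,
                         min_shared=MIN_SHARED_MOLECULES):
    """Return {(patent_a, patent_b): shared_count} for chemically-related pairs.

    key_molecules is a hypothetical input: patent ID -> set of canonical
    SMILES strings for the key compounds extracted from that patent.
    """
    # Invert the mapping: molecule -> set of patents it appears in.
    patents_by_molecule = defaultdict(set)
    for patent, molecules in key_molecules.items():
        for smiles in molecules:
            patents_by_molecule[smiles].add(patent)

    # Count shared molecules per patent pair, skipping non-rare molecules.
    shared = defaultdict(int)
    for patents in patents_by_molecule.values():
        if len(patents) > max_patents:
            continue  # too common to be informative
        for pair in combinations(sorted(patents), 2):
            shared[pair] += 1

    # Keep only pairs that share at least min_shared rare key molecules.
    return {pair: n for pair, n in shared.items() if n >= min_shared}
```

Working from the inverted molecule-to-patents index means only patents that actually share a molecule are ever compared, and the rarity cutoff also bounds the number of pairs generated per molecule.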

Naturally, these cutoffs could benefit from some tweaking with a test set (e.g. patents with the same title and assignee), but for the purposes of this blog post they seem to work well. Here is a typical example of a highly connected chemical patent family, where the labels are the number of key (but rare) molecules in common:

US Patent Application    Title
US20030032623A1          Tnf-alpha production inhibitors
US20050014800A1          Angiogenesis inhibitor
US20060229342A1          TNF-a production inhibitors
US20060241155A1          TNF-alpha production inhibitors
US20080161270A1          Angiogenesis inhibitors
US20080182881A1          TNF-alpha production inhibitors
US20100016380A1          TNF-alpha production inhibitors

These patents all appear to be from Santen Pharmaceutical Co., though the company name is not listed as assignee on some of the patents. Equally interesting are those related families where the members are less highly connected. Here’s an example from GSK along with representative examples of the patent titles:

US Patent Application    Title
US20130012491A1          PYRIMIDINE DERIVATIVES FOR USE AS SPHINGOSINE 1-PHOSPHATE 1 (S1P1) RECEPTOR AGONISTS
US20120094979A1          THIAZOLE OR THIADIZALOE DERIVATIVES FOR USE AS SPHINGOSINE 1-PHOSPHATE 1 (S1P1) RECEPTOR AGONISTS
US20100273771A1          OXADIAZOLE DERIVATIVES ACTIVE ON SPHINGOSINE-1-PHOSPHATE (SIP)
US20100174065A1          COMPOUNDS
US20120101083A1          S1P1 AGONISTS COMPRISING A BICYCLIC N-CONTAINING RING
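One way to make “highly connected” versus “less highly connected” concrete is to treat patents as nodes and chemically-related pairs as edges, then take each connected component as a family and measure what fraction of the possible pairs are actually linked. The sketch below assumes exactly that; the post does not say how families were assembled, and the function names and the input (a dict mapping patent-ID pairs to shared-molecule counts) are hypothetical.

```python
from collections import defaultdict


def patent_families(shared_counts):
    """Group patents into families (connected components).

    shared_counts is a hypothetical input: {(patent_a, patent_b): n_shared}
    containing only the pairs already judged chemically related.
    """
    neighbours = defaultdict(set)
    for a, b in shared_counts:
        neighbours[a].add(b)
        neighbours[b].add(a)

    seen = set()
    families = []
    for start in sorted(neighbours):
        if start in seen:
            continue
        # Depth-first traversal collecting everything reachable from start.
        seen.add(start)
        stack = [start]
        members = set()
        while stack:
            node = stack.pop()
            members.add(node)
            for other in neighbours[node] - seen:
                seen.add(other)
                stack.append(other)
        families.append(members)
    return families


def edge_density(family, shared_counts):
    """Fraction of possible patent pairs in the family that are linked."""
    n = len(family)
    possible = n * (n - 1) // 2
    if possible == 0:
        return 0.0
    linked = sum(1 for a, b in shared_counts if a in family and b in family)
    return linked / possible
```

Under this definition, a density of 1.0 means every member is directly linked to every other, while a chain-like family, in which members are related only through intermediates, scores much lower.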

As ever, if this sparks some ideas and you’re interested in collaborating, drop us a line.

6 thoughts on “Chemically-related patent families”

  1. Exactly. The analysis above is an exact identity search (based on canonical SMILES as extracted), but a similarity search would clearly be a next step. A good question to ask is whether it would be based on the whole molecule or a Murcko scaffold. Of course, we also text-mine the targets described, and so an even more powerful search would be to combine both sets of information.

  2. Hello, Interesting project – one reason you may see the same compounds (other than different uses, etc.) is that all the listed records retrieved in the example above are published patent *applications*, rather than granted patents. This is also why the assignee may not be listed, as the USPTO only recently added assignees/probable assignees to published applications.

    1. Certainly. And thanks for the point about assignees.

      For the benefit of a future reader wondering why we are looking at applications rather than grants, applications are of particular interest due to their timeliness, and so anyone interested in keeping up with the Joneses (“current awareness”) needs to keep an eye on applications.
