Chemically-related patent families

The term patent family is generally used to describe a set of patents that cover the same invention but which are filed with different patent authorities. Here instead we look at finding groups of patents within a single authority (the USPTO) where the patents are linked by chemical structures.

It turns out that it is not unusual for essentially the same chemical information to appear in multiple patent applications within the USPTO, often with the same or similar title. I’m not sure of the reason for this – perhaps corrections, rewrites, or separate applications for different targets. In any case, it is useful to identify such cases for the purposes of linking or collation, or indeed to discard if looking for truly novel chemistry.

Here’s an approach that appears to work reasonably well: we regard two patents as “chemically-related” if they have at least N key (but rare) molecules in common. All that remains is to define “N”, “key”, and “rare”:

  • A key compound is one associated with a compound number (which may be in the text or a ChemDraw file) or associated with an experimental property (taken from a table, and possibly described in terms of R groups that need to be attached to a scaffold).
  • A rare molecule is one that appears in 30 or fewer patents.
  • N was defined as 8.
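
The definition above can be sketched in a few lines of Python. This is a minimal illustration, not the code used for this post: the input mapping `key_molecules` (patent ID to a set of key molecule identifiers such as canonical SMILES or InChIKeys), the function name, and the toy data are all assumptions. It also assumes that rarity is counted over key-compound occurrences only, which the definition above does not pin down.

```python
from collections import defaultdict
from itertools import combinations


def related_pairs(key_molecules, min_shared=8, max_patents=30):
    """Return {(patent_a, patent_b): n_shared} for chemically-related pairs.

    key_molecules: dict mapping patent ID -> set of key molecule identifiers
    (hypothetical input format, e.g. canonical SMILES or InChIKeys).
    """
    # Invert the mapping: molecule -> patents in which it is a key compound.
    # Assumption: rarity is counted over key-compound occurrences only.
    patents_by_molecule = defaultdict(set)
    for patent, molecules in key_molecules.items():
        for molecule in molecules:
            patents_by_molecule[molecule].add(patent)

    # Count shared rare molecules for every pair of patents.
    shared = defaultdict(int)
    for patents in patents_by_molecule.values():
        if len(patents) > max_patents:
            continue  # too common to be informative
        for pair in combinations(sorted(patents), 2):
            shared[pair] += 1

    # Keep only the pairs that meet the threshold N.
    return {pair: n for pair, n in shared.items() if n >= min_shared}


# Toy data (hypothetical), using a lower threshold so a pair qualifies.
toy = {
    "P1": {"m1", "m2", "m3"},
    "P2": {"m1", "m2", "m4"},
    "P3": {"m3", "m5"},
}
print(related_pairs(toy, min_shared=2))  # {('P1', 'P2'): 2}
```

Inverting the mapping first means only patents that actually share a molecule are ever compared, which avoids testing every possible pair of patents.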

Naturally, these cutoffs could benefit from some tweaking with a test set (e.g. patents with the same title and assignee), but for the purposes of this blog post they seem to work well. Here is a typical example of a highly-connected chemical patent family, where the labels are the number of key (but rare) molecules in common:

US Patent Application    Title
US20030032623A1          Tnf-alpha production inhibitors
US20050014800A1          Angiogenesis inhibitor
US20060229342A1          TNF-a production inhibitors
US20060241155A1          TNF-alpha production inhibitors
US20080161270A1          Angiogenesis inhibitors
US20080182881A1          TNF-alpha production inhibitors
US20100016380A1          TNF-alpha production inhibitors

These patents all appear to be from Santen Pharmaceutical Co., though the company name is not listed as assignee on some of them. Equally interesting are those related families where the members are less highly connected. Here’s an example from GSK along with representative examples of the patent titles:

US Patent Application    Title
US20130012491A1          PYRIMIDINE DERIVATIVES FOR USE AS SPHINGOSINE 1-PHOSPHATE 1 (S1P1) RECEPTOR AGONISTS
US20120094979A1          THIAZOLE OR THIADIAZOLE DERIVATIVES FOR USE AS SPHINGOSINE 1-PHOSPHATE 1 (S1P1) RECEPTOR AGONISTS
US20100273771A1          OXADIAZOLE DERIVATIVES ACTIVE ON SPHINGOSINE-1-PHOSPHATE (SIP)
US20100174065A1          COMPOUNDS
US20120101083A1          S1P1 AGONISTS COMPRISING A BICYCLIC N-CONTAINING RING
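
One way to tell a highly-connected family from a loosely connected one is to treat chemically-related pairs as edges of a graph, take the connected components as families, and compute the fraction of possible pairs within a family that are actually linked. The sketch below is an assumption about how this could be done, not a description of how the examples above were produced. The function names and the toy edge list are hypothetical, and each pair is assumed to appear only once in the input.

```python
from collections import defaultdict


def families(pairs):
    """Group patents into connected components of the related-pairs graph."""
    neighbours = defaultdict(set)
    for a, b in pairs:
        neighbours[a].add(b)
        neighbours[b].add(a)

    seen, result = set(), []
    for start in sorted(neighbours):
        if start in seen:
            continue
        # Depth-first traversal collects everything reachable from `start`.
        component, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(neighbours[node] - component)
        seen |= component
        result.append(component)
    return result


def density(family, pairs):
    """Fraction of possible patent pairs in `family` that are linked."""
    n = len(family)
    possible = n * (n - 1) // 2
    linked = sum(1 for a, b in pairs if a in family and b in family)
    return linked / possible if possible else 0.0


# Toy edge list (hypothetical): A, B, C form a triangle; D, E, F form a chain.
toy_pairs = [("A", "B"), ("A", "C"), ("B", "C"), ("D", "E"), ("E", "F")]
for family in families(toy_pairs):
    print(sorted(family), round(density(family, toy_pairs), 2))
```

A density of 1.0 means every patent in the family is linked to every other one, while lower values indicate a family held together by only a few links.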

As ever, if this sparks some ideas and you’re interested in collaborating, drop us a line.