The term patent family is generally used to describe a set of patents that cover the same invention but are filed with different patent authorities. Here we instead look at finding groups of patents within a single authority (the USPTO) where the patents are linked by chemical structures.
It turns out that it is not unusual for essentially the same chemical information to appear in multiple patent applications within the USPTO, often with the same or a similar title. I’m not sure of the reason for this; perhaps corrections, rewrites, or separate applications for different targets. In any case, it is useful to identify such cases so that they can be linked or collated, or indeed discarded when looking for truly novel chemistry.
Here’s an approach that appears to work reasonably well: we regard two patents as “chemically-related” if they share at least N key (but rare) molecules. All that remains is to define “N”, “key”, and “rare”:
- A key compound is one associated with a compound number (which may be in the text or a ChemDraw file) or associated with an experimental property (taken from a table, and possibly described in terms of R groups that need to be attached to a scaffold).
- A rare molecule is one that appears in 30 or fewer patents.
- N was defined as 8.
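
The definitions above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code behind the results in this post: the input format (a dict mapping each patent ID to the set of its key molecule identifiers, e.g. canonical SMILES or InChIKeys) and the function name `related_patent_pairs` are hypothetical, and rarity is counted here over the patents in which a molecule is a key compound, which may differ from how it was counted originally.

```python
from collections import defaultdict
from itertools import combinations

MAX_PATENTS_PER_MOLECULE = 30  # "rare": appears in 30 or fewer patents
MIN_SHARED_MOLECULES = 8       # N


def related_patent_pairs(key_molecules,
                         max_patents=MAX_PATENTS_PER_MOLECULE,
                         min_shared=MIN_SHARED_MOLECULES):
    """Return {(patent_a, patent_b): shared_count} for chemically-related pairs.

    key_molecules is a hypothetical input: a dict mapping a patent ID to the
    set of identifiers of its key molecules.
    """
    # Invert the mapping: molecule -> patents in which it is a key compound.
    patents_by_molecule = defaultdict(set)
    for patent, molecules in key_molecules.items():
        for molecule in molecules:
            patents_by_molecule[molecule].add(patent)

    # Count shared rare molecules for every pair of patents that co-occur.
    shared = defaultdict(int)
    for patents in patents_by_molecule.values():
        if len(patents) > max_patents:
            continue  # too common to count as "rare"
        for pair in combinations(sorted(patents), 2):
            shared[pair] += 1

    # Keep only the pairs that reach the threshold N.
    return {pair: n for pair, n in shared.items() if n >= min_shared}
```

Working from the inverted molecule-to-patents index means that only patents that actually share a molecule are ever compared, and the rarity cutoff caps the number of pairs generated per molecule.
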
Naturally, these cutoffs could benefit from some tweaking with a test set (e.g. patents with the same title and assignee), but for the purposes of this blog post they seem to work well. Here is a typical example of a highly-connected chemical patent family, where the labels are the number of key (but rare) molecules in common:
US Patent Application | Title |
---|---|
US20030032623A1 | Tnf-alpha production inhibitors |
US20050014800A1 | Angiogenesis inhibitor |
US20060229342A1 | TNF-a production inhibitors |
US20060241155A1 | TNF-alpha production inhibitors |
US20080161270A1 | Angiogenesis inhibitors |
US20080182881A1 | TNF-alpha production inhibitors |
US20100016380A1 | TNF-alpha production inhibitors |
These patents all appear to be from Santen Pharmaceutical Co., although the company is not listed as the assignee on some of them. Equally interesting are those related families where the members are less highly connected. Here’s an example from GSK along with representative examples of the patent titles:
US Patent Application | Title |
---|---|
US20130012491A1 | PYRIMIDINE DERIVATIVES FOR USE AS SPHINGOSINE 1-PHOSPHATE 1 (S1P1) RECEPTOR AGONISTS |
US20120094979A1 | THIAZOLE OR THIADIAZOLE DERIVATIVES FOR USE AS SPHINGOSINE 1-PHOSPHATE 1 (S1P1) RECEPTOR AGONISTS |
US20100273771A1 | OXADIAZOLE DERIVATIVES ACTIVE ON SPHINGOSINE-1-PHOSPHATE (SIP) |
US20100174065A1 | COMPOUNDS |
US20120101083A1 | S1P1 AGONISTS COMPRISING A BICYCLIC N-CONTAINING RING |
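
One way to make the difference between highly connected and less connected families concrete is to treat related patents as a graph. The sketch below assumes, purely for illustration, that a family is a connected component of the graph whose edges are the chemically-related pairs, and that connectedness is measured as the fraction of possible pairs that are actually linked. The function names and the input format (a dict keyed by unordered patent pairs, each pair listed once) are hypothetical, not something taken from the analysis above.

```python
from collections import defaultdict


def patent_families(shared_counts):
    """Group patents into families: connected components of the link graph.

    shared_counts is a hypothetical input: {(patent_a, patent_b): n_shared},
    with each unordered pair appearing once.
    """
    neighbours = defaultdict(set)
    for a, b in shared_counts:
        neighbours[a].add(b)
        neighbours[b].add(a)

    families, seen = [], set()
    for start in sorted(neighbours):
        if start in seen:
            continue
        # Depth-first traversal collects everything reachable from `start`.
        family, stack = set(), [start]
        while stack:
            patent = stack.pop()
            if patent in family:
                continue
            family.add(patent)
            stack.extend(neighbours[patent] - family)
        seen |= family
        families.append(family)
    return families


def link_density(family, shared_counts):
    """Fraction of possible patent pairs within the family that are linked."""
    n = len(family)
    possible = n * (n - 1) // 2
    links = sum(1 for a, b in shared_counts if a in family and b in family)
    return links / possible if possible else 0.0
```

Under these assumptions, a family in which every patent is linked to every other has a density of 1.0, while a chain-like family, where patents are related only through intermediate members, scores lower.
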
As ever, if this sparks some ideas and you’re interested in collaborating, drop us a line.