Text Mining for a Worthy Cause

I recently received an e-mail from the charity “Jeans for Genes” introducing me to “black bone disease”, a rare genetic disease without a cure. It is more formally known as “alkaptonuria” (OMIM entry) and is caused by a defect in the homogentisate 1,2-dioxygenase (HGD) gene, which leads to a toxic build-up of homogentisic acid in the blood and so to the symptoms of the disease.

Interestingly, a re-purposed herbicide, nitisinone, is currently being investigated as a possible treatment for the disease, based on its earlier re-purposing as a therapy for a related genetic disorder, type 1 tyrosinemia.

The story starts in 1977, when a researcher in California observed that relatively few weeds were growing under the bottlebrush (Callistemon) plants in his backyard. Analytical chemistry of the soil fractions revealed the active compound to be the natural product leptospermone. Traditional ligand-based optimization of this compound led to the effective herbicides mesotrione (Syngenta’s Callisto) and nitisinone being synthesized and tested in 1984, with the first patents on this class of herbicides appearing in 1986 (e.g. US 4780127). At the point these patents were filed and granted, the mechanism of action and protein target weren’t yet known, although the compounds had been shown experimentally to be toxic to plants but harmless to mammals. Much later it was discovered that these compounds work by inhibiting the enzyme 4-hydroxyphenylpyruvate dioxygenase (HPPD); this inhibition blocks the synthesis of chlorophyll and leads to “bleaching” and eventual plant death.

It is the role that HPPD plays in human metabolism that makes these herbicides so interesting as therapeutic agents. The pathway diagram below describes the five enzymatic steps (arrows) in the metabolic degradation of tyrosine.

Defects in the enzymes responsible for these steps lead to a number of related diseases. Defects in the first step, tyrosine transaminase, cause type 2 tyrosinemia. The second step, p-hydroxyphenylpyruvate dioxygenase (HPPD), is our herbicide target, and defects in it cause type 3 tyrosinemia. Defects in step three, homogentisate dioxygenase (HGD), cause alkaptonuria (also known as black bone disease), and defects in step five, 4-fumarylacetoacetate hydrolase, cause type 1 tyrosinemia.

In the case of type 1 tyrosinemia, it was noticed that patients with active HPPD had a more severe form of the disease, so it was hypothesized that an HPPD inhibitor might be beneficial. At the time, Zeneca worked on both pharmaceuticals and crop protection, and so was able to evaluate its proven-safe herbicide nitisinone directly in the clinic. In what seems incredible by the standards of today’s pharmaceutical pipelines, the company’s US 5550165 patent filing describes the administration to, and recovery of, sick infants and children; today it is more usual for a drug candidate to spend years in phase I, II and III clinical trials after a patent is granted before it is approved by the FDA.

HPPD inhibitors can be anticipated to treat alkaptonuria by much the same mechanism: by blocking the formation of the toxic metabolite homogentisate and causing tyrosine to be metabolised via alternative routes.
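
The reasoning shared by the two therapies can be sketched in a few lines of Python. This is a deliberately simplified, strictly linear model of the pathway, not a biochemical simulation: it assumes that inhibiting one step starves every later step of its substrate, and it ignores the alternative routes and the accumulation of tyrosine itself. The table and function names are purely illustrative, and the enzyme for step four (maleylacetoacetate isomerase) is not named in this post, so no disease is attached to it here.

```python
# Tyrosine degradation, steps numbered as in the text above.
# Each entry: (step, enzyme, disease caused by a defect in that enzyme).
PATHWAY = [
    (1, "tyrosine transaminase", "type 2 tyrosinemia"),
    (2, "HPPD", "type 3 tyrosinemia"),
    (3, "HGD", "alkaptonuria"),
    # Step 4 is not discussed in the post; the enzyme name is added here
    # only for completeness and no disease is recorded for it.
    (4, "maleylacetoacetate isomerase", None),
    (5, "4-fumarylacetoacetate hydrolase", "type 1 tyrosinemia"),
]


def diseases_downstream_of(enzyme):
    """Return the diseases caused by defects in steps after `enzyme`.

    Under the simplifying assumption above, these are the diseases in
    which an inhibitor of `enzyme` stops the toxic intermediate forming.
    """
    names = [name for _, name, _ in PATHWAY]
    idx = names.index(enzyme)  # raises ValueError for an unknown enzyme
    return [disease for _, _, disease in PATHWAY[idx + 1:] if disease is not None]


print(diseases_downstream_of("HPPD"))
# ['alkaptonuria', 'type 1 tyrosinemia']
```

In this toy model the two diseases downstream of HPPD are exactly the two for which nitisinone has been used or proposed.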

One of the goals of modern text mining is to automatically discover links such as the one between the two patents above, US 4780127 and US 5550165. Unfortunately, a range of technical issues complicates the process. First, as in many pharmaceutical patent filings, the drug target is either not known or not mentioned, so it is necessary to identify and annotate compound classes or modes of action such as “kinase inhibitor”, “beta-blocker”, “herbicide” or “antibiotic”. Second, the large number of synonyms and typographical variants of enzyme and disease names requires synonym dictionaries or ontologies to recognize that “tyrosine transaminase”, “tyrosine aminotransferase” and “EC 2.6.1.5” all denote the same entity. Finally, as the misspelling “tyosinemia” in the title of US 5550165 shows, real-life documents frequently contain spelling errors, so that without automatic spelling correction a keyword search for “tyrosinemia” can miss the most relevant documents.
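
To make the last two issues concrete, here is a minimal sketch of dictionary-based synonym resolution with a fuzzy fallback for misspellings. It is only an illustration built on Python’s standard difflib module and is not how LeadMine works internally; the tiny dictionary, the function names and the 0.85 similarity cutoff are all assumptions chosen for this example.

```python
import difflib
import re

# Hypothetical synonym dictionary: every known variant maps to one
# canonical label. A real dictionary or ontology would be far larger.
SYNONYMS = {
    "tyrosine transaminase": "EC 2.6.1.5",
    "tyrosine aminotransferase": "EC 2.6.1.5",
    "ec 2.6.1.5": "EC 2.6.1.5",
    "tyrosinemia": "tyrosinemia",
    "tyrosinaemia": "tyrosinemia",
    "alkaptonuria": "alkaptonuria",
    "black bone disease": "alkaptonuria",
}


def normalize(term):
    """Lower-case the term and collapse hyphens and whitespace to one space."""
    return re.sub(r"[\s\-]+", " ", term.strip().lower())


def resolve(term, cutoff=0.85):
    """Map a term to its canonical label, tolerating small spelling errors.

    Exact dictionary lookup is tried first; if that fails, the closest
    dictionary key above `cutoff` similarity is used. Returns None when
    nothing is close enough.
    """
    key = normalize(term)
    if key in SYNONYMS:
        return SYNONYMS[key]
    close = difflib.get_close_matches(key, list(SYNONYMS), n=1, cutoff=cutoff)
    return SYNONYMS[close[0]] if close else None


print(resolve("Tyrosine-Transaminase"))  # exact match after normalization
print(resolve("tyosinemia"))             # misspelling from the patent title
print(resolve("beta-blocker"))           # not in this toy dictionary
```

With this toy dictionary, the misspelled “tyosinemia” resolves to the same label as “tyrosinemia”, while a term that is absent from the dictionary is left unresolved. The cutoff trades recall against false matches: set too low, it would start merging names that are merely similar, which is a real risk among closely related enzyme names.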

These are exactly the types of challenges our LeadMine software attempts to tackle.