Identifying novel chemical-disease relationships

As part of the BioCreative V competition, Daniel developed software to find chemical-disease relationships in PubMed abstracts. I’m going to describe a proof-of-concept that uses that code to identify new relationships extracted from the literature. This could be useful both for finding new adverse drug effects and for finding new therapeutic applications.

Most common relationships

Daniel ran the software over all PubMed abstracts in high-precision mode and found 1392503 putative relationships (of which 282604 were unique). To begin with, I looked at the most common relationships found. However, learning that “alcohol is associated with alcoholism” and “cyanide is associated with poisoning” is not super-useful. Unfortunately, the information about which you can be most confident (i.e. it is found multiple times) is also the least useful since, by definition, it’s already well known. That said, I didn’t actually know the top relation found, that “streptozotocin is associated with diabetes”; it turns out that streptozotocin is used to produce an animal model for diabetes.
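As a rough illustration of the counting step (not Daniel’s actual code), here is a minimal Python sketch. The function name `most_common_relationships` and the toy mention list are my own assumptions; the chemical and disease names come from the examples above, but the counts are made up purely for illustration.

```python
from collections import Counter

def most_common_relationships(mentions, n=3):
    """Return (total mentions, number of unique pairs, top-n pairs by count)."""
    counts = Counter(mentions)
    return len(mentions), len(counts), counts.most_common(n)

# Toy data: one (chemical, disease) tuple per extracted mention.
# The counts here are illustrative, not the real figures.
mentions = [
    ("streptozotocin", "diabetes"),
    ("alcohol", "alcoholism"),
    ("streptozotocin", "diabetes"),
    ("cyanide", "poisoning"),
    ("streptozotocin", "diabetes"),
    ("alcohol", "alcoholism"),
]

total, unique, top = most_common_relationships(mentions)
print(total, unique)
for pair, count in top:
    print(pair, count)
```

The same total/unique distinction is what separates the 1392503 putative relationships from the 282604 unique ones.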

Searching for novel relationships

What’s really most interesting are novel relationships, ones that haven’t previously been described. To find these I looked at any relationships attributed to this month (i.e. Sep 2015 at the time of writing) or later that were not in earlier abstracts. This gave 847 relationships. When I looked at the sentences associated with these relationships, I found that 6 of them explicitly stated that this was the first report of a particular interaction, and that in each case we had identified the correct relationship.* (Just for interest, I searched the “known” relationships from September for similar phrases stating that they were the first report, but did not find any.)

26228174	D010269	D013262	paraquat	TEN	To our knowledge, this is the first case report of TEN related to paraquat 	Dermatology (Basel, Switzerland)	Sep 2015
25619447	C079703	D000380	rufinamide	agranulocytosis	To the best of our knowledge, this is the first reported case of agranulocytosis induced by rufinamide.	Brain & development	Sep 2015
26356743	D011345	D016553	Fenofibrate	Immune Thrombocytopenia	A Case of Fenofibrate-Induced Immune Thrombocytopenia: First Report.	Puerto Rico health sciences journal	Sep 2015
26370487	D017706	D010996	lisinopril	pleural effusion	We report the first case of eosinophilic pleural effusion occurring due to lisinopril treatment.	Revue des maladies respiratoires	Sep 2015
25588686	C118667	D013262	Dronedarone	Toxic Epidermal Necrolysis	Toxic Epidermal Necrolysis During Dronedarone Treatment: First Report of a Severe Serious Adverse Event Of A New Antiarrhythmic Drug.	Cardiovascular toxicology	Oct 2015
26308264	C033249	D010024	HAR	bone loss	The current study describes for the first time that HAR inhibits receptor activator of nuclear factor κB ligand (RANKL)-induced osteoclastogenesis in vitro and suppresses inflammation-induced bone loss in a mouse model.	Journal of natural products	Sep 2015
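The novelty filter and the “first report” check can be sketched roughly as follows. This is a minimal Python sketch under my own assumptions, not the code actually used: the function names, the (year, month) date representation and the phrase list in `FIRST_REPORT` are all hypothetical (the phrases actually searched for are not recorded here), and `CHEM_X`/`DIS_Y` are placeholder IDs standing in for a previously known pair.

```python
import re

# Assumed phrase list; the exact phrases searched for are not stated in the post.
FIRST_REPORT = re.compile(
    r"\bfirst\s+(?:case\s+report|reported\s+case|report|case|time)\b",
    re.IGNORECASE,
)

def novel_relationships(records, cutoff):
    """Return pairs whose earliest attributed date is at or after the cutoff.

    records: iterable of (chemical_id, disease_id, (year, month)).
    cutoff:  (year, month), e.g. (2015, 9) for Sep 2015.
    """
    earliest = {}
    for chem, disease, date in records:
        pair = (chem, disease)
        if pair not in earliest or date < earliest[pair]:
            earliest[pair] = date
    return {pair for pair, date in earliest.items() if date >= cutoff}

def claims_first_report(sentence):
    """True if the sentence contains one of the assumed 'first report' phrases."""
    return FIRST_REPORT.search(sentence) is not None

records = [
    ("D010269", "D013262", (2015, 9)),   # paraquat / TEN, from the list above
    ("C118667", "D013262", (2015, 10)),  # dronedarone / TEN, from the list above
    ("CHEM_X", "DIS_Y", (2009, 3)),      # placeholder pair seen in an earlier abstract...
    ("CHEM_X", "DIS_Y", (2015, 9)),      # ...and mentioned again in Sep 2015
]

novel = novel_relationships(records, cutoff=(2015, 9))
print(sorted(novel))
print(claims_first_report(
    "We report the first case of eosinophilic pleural effusion "
    "occurring due to lisinopril treatment."
))
```

Taking the earliest date per pair means a relationship mentioned in Sep 2015 but also seen in an earlier abstract is excluded, which is the behaviour described above.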

Then there are mentions of novel compounds, but I guess there are different degrees of novelty:

26386102	C530299	D050197	Vorapaxar	atherosclerotic	Vorapaxar is a novel antiplatelet agent that has demonstrated efficacy in reducing atherosclerotic events in patients with a history of 	American journal of health-system pharmacy : AJHP : official journal of the American Society of Health-System Pharmacists	Oct 2015
25969859	C509120	D007249	2-Chloroacetamidine	inflammation	2-Chloroacetamidine, a novel immunomodulator, suppresses antigen-induced mouse airway inflammation.	Allergy	Sep 2015

A number of other relationships mention potential as a therapeutic agent:

26201693	C005274	D013274	Naringin	gastric carcinoma	Thus, the present finding suggests that Naringin induced autophagy- mediated growth inhibition shows potential as an alternative therapeutic agent for human gastric carcinoma.	International journal of oncology	Sep 2015
26079694	D002762	D007889	vitamin D3	uterine fibroids (leiomyomas)	To provide a detailed summary of current scientific knowledge on uterine fibroids (leiomyomas) in-vitro and in in-vivo animal models, as well as to postulate the potential role of vitamin D3 as an effective, inexpensive, safe, long-term treatment option for 	Fertility and sterility	Sep 2015
26192096	C054989	D000544	sulfuretin	Alzheimer's disease	Our results also indicate that sulfuretin-induced induction of Nrf2-dependent HO-1 expression via the PI3K/Akt signaling pathway has preventive and/or therapeutic potential for the management of Alzheimer's disease.	Neuroscience	Sep 2015
26239378	D011374	D003093	progesterone	UC	Collagenase, progesterone, heparin, urokinase, nadh and adenosine drugs demonstrated potential for use in treatment of CRC and UC.	Molecular medicine reports	Oct 2015
26234785	C101789	D010523	SA4503	neuropathy	, and the Sig-1R agonist SA4503 could serve as a potential candidate for the treatment of chemotherapeutic-induced neuropathy.	Synapse (New York, N.Y.)	Nov 2015
26301726	C550822	D007249	Fijiolide A	inflammation	Fijiolide A is a secondary metabolite isolated from a marine-derived actinomycete and displays inhibitory activity against TNF-α-induced activation of NFκB, an important transcription factor and a potential target for the treatment of different cancers and inflammation related diseases.	Journal of the American Chemical Society	Sep 2015
26245494	C469689	D009369	Tricetin	cancers	Tricetin, a natural flavonoid, was demonstrated to inhibit the growth of various cancers, but the effect of 	Expert opinion on therapeutic targets	Oct 2015
26203774	C581182	D001943	DMDD	breast cancer	 cells in vitro and further examined the molecular mechanisms of DMDD-induced apoptosis in human breast cancer cells.	Oncotarget	Sep 2015

Filtering using the CTD database

The Comparative Toxicogenomics Database (CTD) contains curated and inferred chemical-disease relationships (among other data) and is freely available to download. The latest update is from Aug 2015 and appears to contain 89039 unique curated relationships and 4.0 million inferred ones (I note that these figures do not agree with the ones reported by CTD so I could be mistaken).

If the novel relationships from Sep 2015 are filtered using the curated CTD set, 813 remain and none of the results above change (note that I didn’t take advantage of the MeSH hierarchy for this proof of concept). Of these, 254 are present in the much larger CTD inferred relationship set. Interestingly, the link between toxic epidermal necrolysis (TEN) and paraquat, first reported in Sep 2015, is one of these.
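In terms of set operations, the CTD filtering step looks something like the sketch below. It assumes the relationships have already been loaded as sets of (chemical MeSH ID, disease MeSH ID) pairs; the function name `split_by_ctd` and the `CHEM_*`/`DIS_*` placeholder pairs are hypothetical, and only the paraquat/TEN pair reflects a result reported above. Matching is on exact IDs only, with no use of the MeSH hierarchy.

```python
def split_by_ctd(novel, curated, inferred):
    """Drop pairs already curated in CTD, then flag those CTD has inferred.

    All arguments are sets of (chemical_id, disease_id) tuples.
    Returns (remaining, also_inferred).
    """
    remaining = novel - curated            # not yet curated in CTD
    also_inferred = remaining & inferred   # uncurated, but inferred by CTD
    return remaining, also_inferred

novel = {
    ("D010269", "D013262"),  # paraquat / TEN
    ("CHEM_A", "DIS_A"),     # placeholder: already curated in CTD
    ("CHEM_B", "DIS_B"),     # placeholder: in neither CTD set
}
curated = {("CHEM_A", "DIS_A")}
inferred = {("D010269", "D013262")}

remaining, also_inferred = split_by_ctd(novel, curated, inferred)
print(len(remaining), sorted(also_inferred))
```

With the real data, `remaining` would correspond to the 813 relationships and `also_inferred` to the 254.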


Hopefully the above discussion and results show the potential of this approach. To do this properly would probably require more work on the text-mining to target therapies (this was outside the scope of the BioCreative V competition) and a manual assessment of the quality of the results. If you’d like to collaborate on this, get in touch.

* Note: The format used is PubMed ID, Chemical MeSH ID, Disease MeSH ID, Chemical text, Disease text, Relationship text, Journal, Publication Date (the article may have appeared online prior to this).
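For anyone wanting to work with rows in this format, here is a minimal parsing sketch in Python. It assumes the eight fields are tab-separated, as they appear to be in the rows above; the `Relationship` type, its field names and `parse_line` are my own naming, not part of the original software.

```python
from typing import NamedTuple

class Relationship(NamedTuple):
    pmid: str
    chemical_id: str
    disease_id: str
    chemical_text: str
    disease_text: str
    sentence: str
    journal: str
    pub_date: str

def parse_line(line):
    """Split one tab-separated row into a Relationship (assumes 8 fields)."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != len(Relationship._fields):
        raise ValueError(f"expected 8 tab-separated fields, got {len(fields)}")
    return Relationship(*fields)

# One of the rows listed above, written with explicit tab escapes.
line = (
    "25619447\tC079703\tD000380\trufinamide\tagranulocytosis\t"
    "To the best of our knowledge, this is the first reported case of "
    "agranulocytosis induced by rufinamide.\tBrain & development\tSep 2015"
)

rel = parse_line(line)
print(rel.chemical_text, "->", rel.disease_text, "(" + rel.pub_date + ")")
```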