As part of the BioCreative V competition, Daniel developed software to find chemical-disease relationships in PubMed abstracts. I’m going to describe a proof-of-concept that uses that code to identify new relationships extracted from the literature. This could be useful both for finding new adverse drug effects and for finding new therapeutic applications.
Most common relationships
Daniel ran the software over all PubMed abstracts in high-precision mode and found 1,392,503 putative relationships (of which 282,604 were unique). To begin with, I looked at the most common relationships found. However, learning that “alcohol is associated with alcoholism” and “cyanide is associated with poisoning” is not super-useful. Unfortunately, the information you can be most confident about (i.e. it is found multiple times) is also the least useful, as by definition it’s already well known. That said, I didn’t know the top relation found, that “streptozotocin is associated with diabetes”; it turns out that streptozotocin is used to produce an animal model for diabetes.
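As a rough illustration of the counting step above, here is a minimal sketch that tallies total mentions, unique chemical-disease pairs and the most frequent pair. The tuple layout and the identifiers are placeholders I have made up for the example; this is not the actual code or output format of Daniel's software.

```python
from collections import Counter

# Toy input: one (pmid, chemical_id, disease_id) tuple per extracted mention.
# The identifiers are placeholders, not real PubMed or MeSH IDs.
mentions = [
    ("pmid1", "CHEM_A", "DIS_X"),
    ("pmid2", "CHEM_A", "DIS_X"),
    ("pmid3", "CHEM_B", "DIS_Y"),
    ("pmid4", "CHEM_A", "DIS_X"),
    ("pmid5", "CHEM_B", "DIS_Z"),
]


def count_relationships(mentions):
    """Count how often each (chemical, disease) pair is mentioned."""
    return Counter((chem, dis) for _pmid, chem, dis in mentions)


counts = count_relationships(mentions)
total = sum(counts.values())    # every putative relationship, repeats included
unique = len(counts)            # distinct chemical-disease pairs
top_pair, top_count = counts.most_common(1)[0]

print(f"{total} mentions, {unique} unique pairs")
print(f"most common: {top_pair} seen {top_count} times")
```

Sorting by frequency like this surfaces the well-known relationships first, which is why the rest of the post looks at the other end of the distribution.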
Searching for novel relationships
Really what’s most interesting are novel relationships, ones that haven’t previously been described. To find these I looked at any relationships attributed to this month (i.e. Sep 2015 at the time of writing) or later that were not in earlier abstracts. This gave 847 relationships. When I looked at the sentences associated with these relationships I found that 6 of them explicitly stated that this was the first report of a particular interaction and that in each case we identified the correct relationship.* (Just for interest, I searched the “known” relationships from September for similar phrases stating that they were the first report, but did not find any.)
26228174 | D010269 | D013262 | paraquat | TEN | To our knowledge, this is the first case report of TEN related to paraquat | Dermatology (Basel, Switzerland) | Sep 2015
25619447 | C079703 | D000380 | rufinamide | agranulocytosis | To the best of our knowledge, this is the first reported case of agranulocytosis induced by rufinamide. | Brain & development | Sep 2015
26356743 | D011345 | D016553 | Fenofibrate | Immune Thrombocytopenia | A Case of Fenofibrate-Induced Immune Thrombocytopenia: First Report. | Puerto Rico health sciences journal | Sep 2015
26370487 | D017706 | D010996 | lisinopril | pleural effusion | We report the first case of eosinophilic pleural effusion occurring due to lisinopril treatment. | Revue des maladies respiratoires | Sep 2015
25588686 | C118667 | D013262 | Dronedarone | Toxic Epidermal Necrolysis | Toxic Epidermal Necrolysis During Dronedarone Treatment: First Report of a Severe Serious Adverse Event Of A New Antiarrhythmic Drug. | Cardiovascular toxicology | Oct 2015
26308264 | C033249 | D010024 | HAR | bone loss | The current study describes for the first time that HAR inhibits receptor activator of nuclear factor κB ligand (RANKL)-induced osteoclastogenesis in vitro and suppresses inflammation-induced bone loss in a mouse model. | Journal of natural products | Sep 2015
Then there are mentions of novel compounds, but I guess there are different degrees of novelty:
26386102 | C530299 | D050197 | Vorapaxar | atherosclerotic | Vorapaxar is a novel antiplatelet agent that has demonstrated efficacy in reducing atherosclerotic events in patients with a history of | American journal of health-system pharmacy : AJHP : official journal of the American Society of Health-System Pharmacists | Oct 2015
25969859 | C509120 | D007249 | 2-Chloroacetamidine | inflammation | 2-Chloroacetamidine, a novel immunomodulator, suppresses antigen-induced mouse airway inflammation. | Allergy | Sep 2015
A number of other relationships mention potential as a therapeutic agent:
26201693 | C005274 | D013274 | Naringin | gastric carcinoma | Thus, the present finding suggests that Naringin induced autophagy-mediated growth inhibition shows potential as an alternative therapeutic agent for human gastric carcinoma. | International journal of oncology | Sep 2015
26079694 | D002762 | D007889 | vitamin D3 | uterine fibroids (leiomyomas) | To provide a detailed summary of current scientific knowledge on uterine fibroids (leiomyomas) in-vitro and in in-vivo animal models, as well as to postulate the potential role of vitamin D3 as an effective, inexpensive, safe, long-term treatment option for | Fertility and sterility | Sep 2015
26192096 | C054989 | D000544 | sulfuretin | Alzheimer's disease | Our results also indicate that sulfuretin-induced induction of Nrf2-dependent HO-1 expression via the PI3K/Akt signaling pathway has preventive and/or therapeutic potential for the management of Alzheimer's disease. | Neuroscience | Sep 2015
26239378 | D011374 | D003093 | progesterone | UC | Collagenase, progesterone, heparin, urokinase, nadh and adenosine drugs demonstrated potential for use in treatment of CRC and UC. | Molecular medicine reports | Oct 2015
26234785 | C101789 | D010523 | SA4503 | neuropathy | , and the Sig-1R agonist SA4503 could serve as a potential candidate for the treatment of chemotherapeutic-induced neuropathy. | Synapse (New York, N.Y.) | Nov 2015
26301726 | C550822 | D007249 | Fijiolide A | inflammation | Fijiolide A is a secondary metabolite isolated from a marine-derived actinomycete and displays inhibitory activity against TNF-α-induced activation of NFκB, an important transcription factor and a potential target for the treatment of different cancers and inflammation related diseases. | Journal of the American Chemical Society | Sep 2015
26245494 | C469689 | D009369 | Tricetin | cancers | Tricetin, a natural flavonoid, was demonstrated to inhibit the growth of various cancers, but the effect of | Expert opinion on therapeutic targets | Oct 2015
26203774 | C581182 | D001943 | DMDD | breast cancer | cells in vitro and further examined the molecular mechanisms of DMDD-induced apoptosis in human breast cancer cells. | Oncotarget | Sep 2015
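The novelty filter used in this section (keep pairs dated Sep 2015 or later that never appear in an earlier abstract, then look for "first report" wording) could be sketched roughly as follows. The record layout, the placeholder identifiers and the regular expression are my own assumptions for illustration; the phrase list is simply one that happens to cover the six sentences quoted above, not the search actually used.

```python
import re
from datetime import date

# Toy records: (chemical_id, disease_id, publication_date, sentence).
# Identifiers and sentences are placeholders, not real extracted data.
records = [
    ("CHEM_A", "DIS_X", date(2014, 3, 1), "CHEM_A causes DIS_X."),
    ("CHEM_A", "DIS_X", date(2015, 9, 1), "We confirm that CHEM_A causes DIS_X."),
    ("CHEM_B", "DIS_Y", date(2015, 9, 1),
     "This is the first report of DIS_Y induced by CHEM_B."),
    ("CHEM_C", "DIS_Z", date(2015, 10, 1),
     "CHEM_C shows potential as a treatment for DIS_Z."),
]

CUTOFF = date(2015, 9, 1)  # "this month" at the time of writing

# Assumed phrase patterns signalling a claimed first report.
FIRST_REPORT = re.compile(
    r"\bfirst (?:case )?report(?:ed)?\b|\bfor the first time\b|\bfirst case\b",
    re.IGNORECASE,
)


def novel_relationships(records, cutoff):
    """Return records dated on/after cutoff whose pair is unseen before cutoff.

    A pair mentioned in several new abstracts appears once per mention.
    """
    known = {(chem, dis) for chem, dis, when, _ in records if when < cutoff}
    return [r for r in records if r[2] >= cutoff and (r[0], r[1]) not in known]


novel = novel_relationships(records, CUTOFF)
first_reports = [r for r in novel if FIRST_REPORT.search(r[3])]

print([(chem, dis) for chem, dis, _, _ in novel])
print([(chem, dis) for chem, dis, _, _ in first_reports])
```

Matching on exact identifier pairs keeps the sketch simple; anything smarter (for example treating a child MeSH term as already known when its parent is) would need the MeSH hierarchy.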
Filtering using the CTD database
The Comparative Toxicogenomics Database (CTD) contains curated and inferred chemical-disease relationships (among other data) and is freely available to download. The latest update is from Aug 2015 and appears to contain 89039 unique curated relationships and 4.0 million inferred ones (I note that these figures do not agree with the ones reported by CTD so I could be mistaken).
If the novel relationships from Sep 2015 are filtered to remove those already in the curated CTD set, 813 of the 847 remain and none of the results above change (note that I didn’t take any advantage of the MeSH hierarchy for this proof of concept). Of these, 254 are present in the much larger CTD inferred relationship set. Interestingly, the link between toxic epidermal necrolysis (TEN) and paraquat, first reported in Sep 2015, is one of these.
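The CTD filtering step amounts to two set operations on (chemical MeSH Id, disease MeSH Id) pairs, sketched below. Loading the CTD download is left out, and the sets are toy data: only the paraquat/TEN pair is taken from the results above, the other identifiers are placeholders.

```python
def split_by_ctd(novel, curated, inferred):
    """Drop pairs already curated in CTD, then flag those CTD has inferred.

    Matching is on exact identifier pairs; the MeSH hierarchy is ignored,
    as in the proof of concept described above.
    """
    not_curated = novel - curated
    also_inferred = not_curated & inferred
    return not_curated, also_inferred


# Toy data. ("D010269", "D013262") is the paraquat/TEN pair from the post;
# every other identifier is a made-up placeholder.
novel = {("D010269", "D013262"), ("CHEM_B", "DIS_Y"), ("CHEM_C", "DIS_Z")}
curated = {("CHEM_C", "DIS_Z")}
inferred = {("D010269", "D013262"), ("CHEM_Q", "DIS_Q")}

not_curated, also_inferred = split_by_ctd(novel, curated, inferred)
print(len(not_curated), "pairs not in the curated set")
print(sorted(also_inferred), "also present in the inferred set")
```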
Conclusions
Hopefully the above discussion and results show the potential of this approach. To do this properly would probably require more work on the text-mining to target therapies (this was outside the scope of the BioCreative V competition) and a manual assessment of the quality of the results. If you’d like to collaborate on this, get in touch.
* Note: The format used is PubMed Id, Chemical MeSH Id, Disease MeSH Id, Chemical text, Disease text, Relationship text, Journal, Publication Date (the article may have appeared online before this date).