In an earlier post I looked at the chemicals found in Shakespeare’s plays. Following on from the improved text-mining of diseases described in the previous post, let’s look at diseases this time.
First of all, I should point out that running LeadMine on arbitrary texts is genuinely useful to us. It not only helps us find errors in the dictionaries we use, but also makes us aware that some terms may be fine when mining PubMed abstracts or patents yet produce false positives on general text.
Here are the most common disease terms found in Shakespeare’s plays. Each line gives the total count and the MeSH ID, followed by each form of the text as it appeared in the plays, with its count:
176 D010146 : ('pains', 91) ('pain', 66) ('painful', 8) ('sorely', 6) ('aches', 5)
124 D000435 : ('drunk', 67) ('drunken', 19) ('drunkard', 13) ('drooping', 9) ('drunkards', 6) ('drunkenness', 4) ('besotted', 2) ('intemperance', 2) ('being drunk', 1) ('buzzed', 1)
109 D004332 : ('drown', 74) ('drowned', 22) ('drowning', 9) ('drowns', 4)
107 D010930 : ('plague', 85) ('plagues', 13) ('the plague', 9)
68 D020521 : ('stroke', 41) ('strokes', 23) ('apoplexy', 4)
48 D001733 : ('sting', 26) ('bites', 11) ('stings', 8) ('stinging', 3)
44 D018746 : ('sirs', 44)
39 D006470 : ('bleeding', 31) ('bleeds', 7) ('loss of blood', 1)
34 D005076 : ('rash', 33) ('a rash', 1)
33 D003221 : ('confusion', 33)
32 D013217 : ('starve', 19) ('famine', 11) ('starving', 1) ('starves', 1)
29 D002921 : ('scars', 15) ('scar', 10) ('cicatrice', 3) ('cicatrices', 1)
28 D003141 : ('infect', 21) ('infectious', 6) ('infecting', 1)
27 D018908 : ('weakness', 25) ('decrepit', 2)
27 D012614 : ('scurvy', 27)
27 D003288 : ('bruised', 9) ('bruise', 8) ('black and blue', 4) ('bruising', 4) ('contusions', 1) ('bruises', 1)
27 D002056 : ('burns', 27)
25 D034381 : ('deaf', 24) ('hard of hearing', 1)
23 D005334 : ('fever', 22) ('fevers', 1)
20 D004487 : ('swelling', 19) ('dropsy', 1)
19 D005221 : ('wearied', 9) ('weariness', 3) ('wearies', 2) ('weariest', 1) ('wearying', 1) ('wearily', 1) ('languor', 1) ('unwearied', 1)
19 D001237 : ('smother', 15) ('suffocating', 1) ('smothered', 1) ('suffocation', 1) ('smothering', 1)
18 D014202 : ('trembling', 17) ('tremor', 1)
18 D007239 : ('infection', 17) ('infections', 1)
18 D004216 : ('distemper', 18)
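For readers who want to produce a summary in the same shape from their own text-mining results, here is a minimal sketch. It assumes the matches are available as a hypothetical list of (matched text, MeSH ID) pairs; it does not reflect LeadMine's actual output format. Hits are grouped by MeSH ID, the IDs are ranked by total count, and each surface form is listed with its own count.

```python
from collections import Counter, defaultdict


def summarise(matches):
    """Turn (matched_text, mesh_id) pairs into one summary line per MeSH ID.

    `matches` is a hypothetical input format chosen for illustration.
    """
    by_id = defaultdict(Counter)
    for text, mesh_id in matches:
        by_id[mesh_id][text] += 1

    # Rank MeSH IDs by their total number of hits, most frequent first
    ranked = sorted(by_id.items(), key=lambda kv: sum(kv[1].values()), reverse=True)

    lines = []
    for mesh_id, forms in ranked:
        total = sum(forms.values())
        # most_common() lists the surface forms in descending order of count;
        # repr() of a (str, int) tuple gives e.g. "('pains', 91)"
        detail = " ".join(repr((form, n)) for form, n in forms.most_common())
        lines.append(f"{total} {mesh_id} : {detail}")
    return lines


if __name__ == "__main__":
    hits = [("pain", "D010146"), ("pains", "D010146"),
            ("pains", "D010146"), ("drunk", "D000435")]
    for line in summarise(hits):
        print(line)
```

Run on the four toy hits above, this prints `3 D010146 : ('pains', 2) ('pain', 1)` followed by `1 D000435 : ('drunk', 1)`.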
This has already highlighted some changes that we needed to make (and have since made). For example, SIRS should only be matched in uppercase; “unwearied” may redirect to “wearied” on Wikipedia, but it means the opposite; “besotted” no longer means drunk (except with love); and “buzzed” is probably not a useful synonym. 🙂
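One way to picture the first of these fixes is a per-entry case-sensitivity flag in the dictionary, so that an acronym like SIRS only matches in uppercase while ordinary words match in any case. The sketch below is a deliberately simplified regex matcher, not how LeadMine works internally; the mini-dictionary and its `case_sensitive` flag are hypothetical. Under this scheme, dropping an unhelpful synonym such as “buzzed” would simply mean deleting its entry.

```python
import re

# Hypothetical mini-dictionary: term -> (MeSH ID, case_sensitive)
DICTIONARY = {
    "SIRS": ("D018746", True),            # acronym: uppercase matches only
    "plague": ("D010930", False),
    "black and blue": ("D003288", False),
}


def build_patterns(dictionary):
    """Compile one regex per term, ignoring case unless the entry says otherwise."""
    patterns = []
    for term, (mesh_id, case_sensitive) in dictionary.items():
        flags = 0 if case_sensitive else re.IGNORECASE
        # Word boundaries stop a term from matching inside a longer word
        pattern = re.compile(r"\b" + re.escape(term) + r"\b", flags)
        patterns.append((pattern, mesh_id))
    return patterns


def find_terms(text, patterns):
    """Return (matched_text, mesh_id) pairs for every dictionary hit in text."""
    hits = []
    for pattern, mesh_id in patterns:
        for m in pattern.finditer(text):
            hits.append((m.group(0), mesh_id))
    return hits


if __name__ == "__main__":
    patterns = build_patterns(DICTIONARY)
    print(find_terms("Sirs, a plague o' both your houses!", patterns))
    print(find_terms("The patient met the SIRS criteria.", patterns))
```

With the flag set, the greeting “Sirs” is no longer reported as a disease, while “plague” is still found regardless of case.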
But overall, the software seems to be in good health, although Shakespeare’s protagonists may not be. Don’t they all die at the end? [SPOILER ALERT]
3 D058734 : ('bleed to death', 3)
3 D003645 : ('sudden death', 3)
Image credit: Ryan Ruppe on Flickr