Textmining blazons

When Roger described his latest project, a grammar for the heraldic language known as blazonry, I immediately said “what a great idea!”. Well, not exactly. But it turns out that it’s a nice example of how our text-mining software LeadMine isn’t just restricted to chemical and biological entities but can be used for a wide variety of tasks, limited solely by the user’s imagination.

Argent a chevron azure between three roundels gules each charged with a mullet or

So what is this blazonry I speak of? It’s the language used in blazons, a formal specification of the composition of a coat of arms, written in a sort of English that Shakespeare would have found old-fashioned. “Three lions rampant” is the classic example, which is somewhat intelligible, but how about “argent a chevron azure between three roundels gules each charged with a mullet or”?
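To give a feel for the vocabulary, here is a minimal Python sketch that picks out the tinctures (heraldic colours and metals) from a blazon. It is only an illustration and has nothing to do with how the LeadMine grammar works internally. The function name `tinctures_in` and the short tincture list are assumptions made for this example. Note that "or" (gold) is treated as a tincture wherever it appears, which is only sensible for text already known to be a blazon.

```python
import re

# The seven most common tinctures; real blazonry has more (e.g. furs such as ermine).
TINCTURES = {"argent", "or", "gules", "azure", "sable", "vert", "purpure"}

def tinctures_in(blazon: str) -> list[str]:
    """Return the tincture words found in a blazon, in order of appearance."""
    words = re.findall(r"[a-z]+", blazon.lower())
    return [word for word in words if word in TINCTURES]

if __name__ == "__main__":
    example = ("argent a chevron azure between three roundels gules "
               "each charged with a mullet or")
    print(tinctures_in(example))
```

A word list like this would flag every "or" in ordinary English, which is one reason a grammar that understands the structure of a blazon is needed to find them in running text.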

While software exists for interpreting and displaying such blazons (check out the excellent pyBlazon, which was used to generate the images on this page), what if you wanted to mine a text corpus to find examples? Clearly you need to use LeadMine along with our newly developed blazon.cfx grammar. In fact, by combining LeadMine with pyBlazon, you can identify blazons in text and automatically pop up the corresponding coat of arms when you mouse over them.

Per fess gules and azure, three crescents or

To test the grammar, I ran it over the contents of Project Gutenberg, a collection of out-of-copyright books. The mother lode is hit where people have written books on the topic itself, e.g. “gules, within a bordure azure” from The Manual of Heraldry (“Being a Concise Description of the Several Terms Used, and Containing a Dictionary of Every Designation in the Science”), “per fesse sable and gules” from The Handbook to English Heraldry (1914, by the author of “The Monumental Brasses of England”), “quarterly, or and gules, a plate” from The Curiosities of Heraldry, and “Per chevron sable and barry wavy of six, argent and azure” from A Complete Guide to Heraldry (1909, images by the herald painter to the Lyon Court). The majority of hits, however, are single phrases from novels or a handful of phrases from historical books (e.g. “Per fess gules and azure, three crescents or” from The Strife of the Roses and Days of the Tudors in the West).

So, remember, for all your heraldic text-mining needs choose LeadMine. (Also does chemistry. And biology.)

If you have LeadMine, you can reproduce this work by creating a configuration file such as the following, blazon.cfg:

  location  blazon.cfx
  entityType  Blazonry
  htmlColor  #ff4500
  caseSensitive  false
  useSpellingCorrection  true
  allowSpellingCorrectionEvenAfterExactMatch  false
  maxCorrectionDistance  3
  minimumCorrectedEntityLength  18
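The two numeric settings above can be read as an edit-distance budget plus a length guard. The Python sketch below shows one plausible reading: a phrase is accepted as a misspelling only if it is at least 18 characters long and within 3 edits of a known term. This is an assumption about what the settings mean, not LeadMine's actual implementation, and the names `edit_distance` and `would_correct` are hypothetical.

```python
# Values copied from blazon.cfg; how LeadMine applies them is assumed here.
MAX_CORRECTION_DISTANCE = 3
MINIMUM_CORRECTED_ENTITY_LENGTH = 18

def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]

def would_correct(candidate: str, known_term: str) -> bool:
    """True if candidate is a near miss of known_term under the assumed rules."""
    if len(candidate) < MINIMUM_CORRECTED_ENTITY_LENGTH:
        return False  # too short: a few edits could turn it into almost anything
    distance = edit_distance(candidate.lower(), known_term.lower())
    return 0 < distance <= MAX_CORRECTION_DISTANCE

if __name__ == "__main__":
    print(would_correct("per fess gules and azur", "per fess gules and azure"))
    print(would_correct("gulez", "gules"))
```

The length guard matters for a corpus of old books: short fragments are easy to reach by a few edits from ordinary English, so restricting correction to longer phrases keeps false positives down.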

Next, run LeadMine over a folder containing the downloaded contents of Project Gutenberg:

java -jar leadmine.jar -c blazon.cfg -t 8 -R /home/noel/LargeData/ProjectGutenberg/aleph.gutenberg.org > blazonry.out

With 8 threads, this took about 2.5 hours. Finally, if you want to see coats of arms pop up when you mouse over blazons in the example LeadMine applications, you will need to set up a pyBlazon server and point LeadMine to it by adding a line such as the following to patfetch.cfg: