{"id":2714,"date":"2018-02-21T13:01:49","date_gmt":"2018-02-21T13:01:49","guid":{"rendered":"https:\/\/nextmovesoftware.com\/blog\/?p=2714"},"modified":"2018-02-27T13:23:28","modified_gmt":"2018-02-27T13:23:28","slug":"textmining-blazons","status":"publish","type":"post","link":"https:\/\/nextmovesoftware.com\/blog\/2018\/02\/21\/textmining-blazons\/","title":{"rendered":"Textmining blazons"},"content":{"rendered":"<p>When Roger described his latest project, a grammar for the heraldic language known as blazonry, I immediately said &#8220;what a great idea!&#8221;. Well, not exactly. But it turns out that it&#8217;s a nice example of how our text-mining software <a href=\"https:\/\/www.nextmovesoftware.com\/leadmine.html\">LeadMine<\/a> isn&#8217;t just restricted to chemical and biological entities but can be used for a wide variety of tasks, limited solely by the user&#8217;s imagination.<\/p>\n<figure id=\"attachment_2718\" aria-describedby=\"caption-attachment-2718\" style=\"width: 272px\" class=\"wp-caption alignright\"><a href=\"https:\/\/nextmovesoftware.com\/blog\/wp-content\/uploads\/2018\/02\/emblazon.php_.png\"><img loading=\"lazy\" decoding=\"async\" class=\"wp-image-2718 size-medium\" src=\"https:\/\/nextmovesoftware.com\/blog\/wp-content\/uploads\/2018\/02\/emblazon.php_-272x300.png\" alt=\"\" width=\"272\" height=\"300\" srcset=\"https:\/\/nextmovesoftware.com\/blog\/wp-content\/uploads\/2018\/02\/emblazon.php_-272x300.png 272w, https:\/\/nextmovesoftware.com\/blog\/wp-content\/uploads\/2018\/02\/emblazon.php_.png 354w\" sizes=\"(max-width: 272px) 100vw, 272px\" \/><\/a><figcaption id=\"caption-attachment-2718\" class=\"wp-caption-text\">Argent a chevron azure between three roundels gules each charged with a mullet or<\/figcaption><\/figure>\n<p>So what is this blazonry I speak of? 
It&#8217;s the language used in blazons, a formal specification of the composition of a coat of arms, written in a sort of English that Shakespeare would have found old-fashioned. &#8220;Three lions rampant&#8221; is the classic example, which is somewhat intelligible, but how about &#8220;argent a chevron azure between three roundels gules each charged with a mullet or&#8221;?<\/p>\n<p>While software exists for interpreting and displaying such blazons (check out the excellent <a href=\"http:\/\/web.meson.org\/pyBlazon\/\">pyBlazon<\/a>, which was used to generate the images on this page), what if you wanted to mine a text corpus for examples? Clearly, you need LeadMine along with our newly developed blazon.cfx grammar. In fact, by combining LeadMine with pyBlazon, you can identify blazons in text and automatically pop up the corresponding coat of arms when you mouse over them.<\/p>\n<figure id=\"attachment_2717\" aria-describedby=\"caption-attachment-2717\" style=\"width: 272px\" class=\"wp-caption alignright\"><a href=\"https:\/\/nextmovesoftware.com\/blog\/wp-content\/uploads\/2018\/02\/emblazon.php2_.png\"><img loading=\"lazy\" decoding=\"async\" class=\"wp-image-2717 size-medium\" src=\"https:\/\/nextmovesoftware.com\/blog\/wp-content\/uploads\/2018\/02\/emblazon.php2_-272x300.png\" alt=\"\" width=\"272\" height=\"300\" srcset=\"https:\/\/nextmovesoftware.com\/blog\/wp-content\/uploads\/2018\/02\/emblazon.php2_-272x300.png 272w, https:\/\/nextmovesoftware.com\/blog\/wp-content\/uploads\/2018\/02\/emblazon.php2_.png 354w\" sizes=\"(max-width: 272px) 100vw, 272px\" \/><\/a><figcaption id=\"caption-attachment-2717\" class=\"wp-caption-text\">Per fess gules and azure, three crescents or<\/figcaption><\/figure>\n<p>To test the grammar, I ran it over the contents of <a href=\"https:\/\/www.gutenberg.org\/\">Project Gutenberg<\/a>, a collection of out-of-copyright books. The mother lode is found in books written on the topic itself, e.g.
&#8220;<a href=\"http:\/\/web.meson.org\/blazonserver\/index.php?blazon=gules%2C+within+a+bordure+azure&#038;format=png\">gules, within a bordure azure<\/a>&#8221; from <a href=\"http:\/\/www.gutenberg.org\/files\/16273\/16273-h\/16273-h.htm\">The Manual of Heraldry<\/a> (&#8220;Being a Concise Description of the Several Terms Used, and Containing a Dictionary of Every Designation in the Science&#8221;), &#8220;<a href=\"http:\/\/web.meson.org\/blazonserver\/index.php?blazon=per+fesse+sable+and+gules&#038;format=png\">per fesse sable and gules<\/a>&#8221; from <a href=\"http:\/\/www.gutenberg.org\/files\/23186\/23186-h\/23186-h.htm\">The Handbook to English Heraldry<\/a> (1914, by the author of &#8220;The Monumental Brasses of England&#8221;), &#8220;<a href=\"http:\/\/web.meson.org\/blazonserver\/index.php?blazon=quarterly%2C+or+and+gules%2C+a+plate&#038;format=png\">quarterly, or and gules, a plate<\/a>&#8221; from <a href=\"http:\/\/www.gutenberg.org\/files\/38951\/38951-h\/38951-h.htm\">The Curiosities of Heraldry<\/a>, or &#8220;<a href=\"http:\/\/web.meson.org\/blazonserver\/index.php?blazon=Per+chevron+sable+and+barry+wavy+of+six%2C+argent+and+azure&#038;format=png\">Per chevron sable and barry wavy of six, argent and azure<\/a>&#8221; from <a href=\"http:\/\/www.gutenberg.org\/files\/41617\/41617-h\/41617-h.htm\">A Complete Guide to Heraldry<\/a> (1909, with illustrations by the Herald Painter to the Lyon Court). But most hits are a single phrase in a novel, or a few phrases in a history book (e.g. &#8220;Per fess gules and azure, three crescents or&#8221; from <a href=\"http:\/\/www.gutenberg.org\/ebooks\/32675\">The Strife of the Roses and Days of the Tudors in the West<\/a>).<\/p>\n<p>So remember: for all your heraldic text-mining needs, choose LeadMine. (Also does chemistry.
And biology.)<\/p>\n<p><b>Notes:<\/b><br \/>\nIf you have LeadMine, you can reproduce this work by creating a configuration file, blazon.cfg, such as the following:<\/p>\n<pre>\r\n[dictionary]\r\n  location  blazon.cfx\r\n  entityType  Blazonry\r\n  htmlColor  #ff4500\r\n  caseSensitive  false\r\n  useSpellingCorrection  true\r\n  allowSpellingCorrectionEvenAfterExactMatch  false\r\n  maxCorrectionDistance  3\r\n  minimumCorrectedEntityLength  18\r\n<\/pre>\n<p>Next, run LeadMine over a folder containing the downloaded contents of Project Gutenberg:<\/p>\n<pre>java -jar leadmine.jar -c blazon.cfg -t 8 -R \/home\/noel\/LargeData\/ProjectGutenberg\/aleph.gutenberg.org > blazonry.out<\/pre>\n<p>With 8 threads, this took about 2.5 hours. Finally, if you want coats of arms to pop up when you mouse over blazons in the example LeadMine applications, you will need to set up a pyBlazon server and point LeadMine at it by adding a line such as the following to patfetch.cfg:<\/p>\n<pre>depictionService=Blazonry=\/blazonry?blazon=%t<\/pre>\n","protected":false},"excerpt":{"rendered":"<p>When Roger described his latest project, a grammar for the heraldic language known as blazonry, I immediately said &#8220;what a great idea!&#8221;. Well, not exactly.
But it turns out that it&#8217;s a nice example of how our text-mining software LeadMine isn&#8217;t just restricted to chemical and biological entities but can be used for a wide &hellip; <a href=\"https:\/\/nextmovesoftware.com\/blog\/2018\/02\/21\/textmining-blazons\/\" class=\"more-link\">Continue reading <span class=\"screen-reader-text\">Textmining blazons<\/span><\/a><\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[],"_links":{"self":[{"href":"https:\/\/nextmovesoftware.com\/blog\/wp-json\/wp\/v2\/posts\/2714"}],"collection":[{"href":"https:\/\/nextmovesoftware.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/nextmovesoftware.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/nextmovesoftware.com\/blog\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/nextmovesoftware.com\/blog\/wp-json\/wp\/v2\/comments?post=2714"}],"version-history":[{"count":17,"href":"https:\/\/nextmovesoftware.com\/blog\/wp-json\/wp\/v2\/posts\/2714\/revisions"}],"predecessor-version":[{"id":2734,"href":"https:\/\/nextmovesoftware.com\/blog\/wp-json\/wp\/v2\/posts\/2714\/revisions\/2734"}],"wp:attachment":[{"href":"https:\/\/nextmovesoftware.com\/blog\/wp-json\/wp\/v2\/media?parent=2714"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/nextmovesoftware.com\/blog\/wp-json\/wp\/v2\/categories?post=2714"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/nextmovesoftware.com\/blog\/wp-json\/wp\/v2\/tags?post=2714"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}