{"id":239,"date":"2012-12-12T23:43:02","date_gmt":"2012-12-12T23:43:02","guid":{"rendered":"http:\/\/nextmovesoftware.com\/blog\/?p=239"},"modified":"2015-06-22T17:03:45","modified_gmt":"2015-06-22T16:03:45","slug":"making-sense-of-patent-tables","status":"publish","type":"post","link":"https:\/\/nextmovesoftware.com\/blog\/2012\/12\/12\/making-sense-of-patent-tables\/","title":{"rendered":"Making Sense of Patent Tables"},"content":{"rendered":"<p>Tabular data in patents is a useful source of experimental data and chemical structures. USPTO patents are available <a title=\"Google patents bulk download\" href=\"http:\/\/www.google.com\/googlebooks\/uspto-patents-grants-text.html\">back to 1976<\/a> in formats where tables are explicitly annotated. For more recent patents these are XML tables similar in structure to what would be expected in HTML. Unfortunately the format used from 1976-2000 is not quite so straightforward to interpret, and naive interpretations produce output that does not at all resemble the actual table, often with chemical name fragments scattered across cells:<\/p>\n<p><a href=\"http:\/\/patft.uspto.gov\/netacgi\/nph-Parser?Sect1=PTO2&amp;Sect2=HITOFF&amp;p=1&amp;u=%2Fnetahtml%2FPTO%2Fsearch-bool.html&amp;r=1&amp;f=G&amp;l=50&amp;co1=AND&amp;d=PTXT&amp;s1=4633008.PN.&amp;OS=PN\/4633008&amp;RS=PN\/4633008\">USPTO Patent Full-Text and Image Database:<\/a><\/p>\n<p><a href=\"http:\/\/nextmovesoftware.com\/blog\/wp-content\/uploads\/2012\/12\/usptopatft.png\"><img loading=\"lazy\" decoding=\"async\" alt=\"usptopatft\" src=\"\/\/nextmovesoftware.com\/blog\/wp-content\/uploads\/2012\/12\/usptopatft.png\" width=\"723\" height=\"306\" \/><\/a><\/p>\n<p><a href=\"http:\/\/www.freepatentsonline.com\/4633008.html\">FreePatentsOnline:<\/a><a href=\"http:\/\/nextmovesoftware.com\/blog\/wp-content\/uploads\/2012\/12\/freepatentsonline.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-258\" alt=\"freepatentsonline\" 
src=\"\/\/nextmovesoftware.com\/blog\/wp-content\/uploads\/2012\/12\/freepatentsonline.png\" width=\"790\" height=\"753\" srcset=\"https:\/\/nextmovesoftware.com\/blog\/wp-content\/uploads\/2012\/12\/freepatentsonline.png 790w, https:\/\/nextmovesoftware.com\/blog\/wp-content\/uploads\/2012\/12\/freepatentsonline-300x285.png 300w\" sizes=\"(max-width: 790px) 100vw, 790px\" \/><\/a><\/p>\n<p>&nbsp;<\/p>\n<p>The format for these tables is briefly documented by the <a title=\"Green Book Documentation\" href=\"http:\/\/www.uspto.gov\/products\/PatentFullTextAPSGreenBook-Documentation.pdf\">USPTO<\/a> but the description raises as many questions as answers:<\/p>\n<ul>\n<li>Columns are delimited by one or more spaces&#8230; but a cell may contain spaces!<\/li>\n<li>An overly long cell may be split over multiple lines due to the format being limited to 80 characters per line<\/li>\n<li>Where a cell spanned multiple rows in the printed patent, it spans multiple lines in the format.<\/li>\n<\/ul>\n<p>As the format is based on how the tables were printed, perfect reproduction of the semantics of these tables appears impossible, but a good approximation can be achieved.<\/p>\n<p>After processing, PatFetch produces:<a href=\"http:\/\/nextmovesoftware.com\/blog\/wp-content\/uploads\/2012\/12\/patfetch.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-270\" alt=\"patfetch\" src=\"\/\/nextmovesoftware.com\/blog\/wp-content\/uploads\/2012\/12\/patfetch.png\" width=\"1047\" height=\"676\" srcset=\"https:\/\/nextmovesoftware.com\/blog\/wp-content\/uploads\/2012\/12\/patfetch.png 1047w, https:\/\/nextmovesoftware.com\/blog\/wp-content\/uploads\/2012\/12\/patfetch-300x193.png 300w, https:\/\/nextmovesoftware.com\/blog\/wp-content\/uploads\/2012\/12\/patfetch-1024x661.png 1024w\" sizes=\"(max-width: 1047px) 100vw, 1047px\" \/><\/a><\/p>\n<p>Much better \ud83d\ude42<\/p>\n<p>(the colouring of Example 22 is due to &#8220;tertbutyl&#8221; being recognised 
as a misspelling of &#8220;tert-butyl&#8221;)<\/p>\n<p>The method broadly works by:<\/p>\n<ul>\n<li>Identifying the header, body and footer<\/li>\n<li>Producing a putative table layout<\/li>\n<li>Splitting cells where a single space is determined to be a split point between two columns<\/li>\n<li>Merging cells that are determined to be a continuation of a previous cell<\/li>\n<\/ul>\n<p>&nbsp;<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Tabular data in patents is a useful source of experimental data and chemical structures. USPTO patents are available back to 1976 in formats where tables are explicitly annotated. For more recent patents these are XML tables similar in structure to what would be expected in HTML. Unfortunately the format used from 1976-2000 is not quite &hellip; <a href=\"https:\/\/nextmovesoftware.com\/blog\/2012\/12\/12\/making-sense-of-patent-tables\/\" class=\"more-link\">Continue reading <span class=\"screen-reader-text\">Making Sense of Patent Tables<\/span><\/a><\/p>\n","protected":false},"author":3,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[],"_links":{"self":[{"href":"https:\/\/nextmovesoftware.com\/blog\/wp-json\/wp\/v2\/posts\/239"}],"collection":[{"href":"https:\/\/nextmovesoftware.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/nextmovesoftware.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/nextmovesoftware.com\/blog\/wp-json\/wp\/v2\/users\/3"}],"replies":[{"embeddable":true,"href":"https:\/\/nextmovesoftware.com\/blog\/wp-json\/wp\/v2\/comments?post=239"}],"version-history":[{"count":40,"href":"https:\/\/nextmovesoftware.com\/blog\/wp-json\/wp\/v2\/posts\/239\/revisions"}],"predecessor-version":[{"id":1465,"href":"https:\/\/nextmovesoftware.com\/blog\/wp-json\/wp\/v2\/posts\/239\/revisions\/1465"}],"wp:attachment":[{"href":"https:\/\/nextmovesoftware
.com\/blog\/wp-json\/wp\/v2\/media?parent=239"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/nextmovesoftware.com\/blog\/wp-json\/wp\/v2\/categories?post=239"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/nextmovesoftware.com\/blog\/wp-json\/wp\/v2\/tags?post=239"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}