Patently wrong – Tracing the origin of an unusual molecule in PubChem

Greg Landrum of Novartis posted a link to PubChem structure CID60140829 on Google Plus, with the comment:

This one is from a patent with the title “Apparatus and method for encoding and decoding block low density parity check codes with a variable coding rate”. I bet it’s the result of an overly zealous (and insufficiently error checked) image->structure conversion.

A good guess, but not quite right…

First of all, let’s look at the PubChem image for the deposited structure, SID143481705 (see right). Hmmm…

These structures are from US Patent 20050283708 via SCRIPDB. For each patent the USPTO makes available a PDF and, very usefully for chemists, the ChemDraw files associated with the patent (along with MOL and TIFF files). NextMove Software’s PatFetch (bundled with LeadMine) makes it easy to extract the corresponding PNG, CDX and MOL files for a particular structure (it runs in a web browser, and you just click on an image to obtain the CDX file). In this case, the image corresponding to the structure is as follows:
Don’t ask me what it is, but I can confirm that it’s not a crossword.

But here’s the thing. If you download the CDX file and open it in ChemDraw…you get exactly this image. 🙂 In other words, the good people at the USPTO appear to use ChemDraw as a generic drawing tool, and in particular, seem to favour carbon-carbon bonds over the actual box or line tools. Actually, now that I see how useful a grid of carbon-carbon bonds can be to create a nice table, I think I might dump Excel for good too.
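
For anyone who wants to screen out this kind of "molecule" before it reaches a database, here is a minimal sketch of one possible heuristic, not a description of how PubChem, SCRIPDB or PatFetch actually work. It assumes the structure is available as a V2000 MOL block with 2D coordinates, and it flags anything built purely from carbon atoms joined by single bonds that all run horizontally or vertically. The function names (`parse_molblock`, `looks_like_line_art`, `make_molblock`) and the thresholds are hypothetical, chosen only for illustration.

```python
from typing import List, Tuple

Atom = Tuple[str, float, float]  # (element symbol, x, y)
Bond = Tuple[int, int, int]      # (begin atom, end atom, bond order), 1-based


def parse_molblock(molblock: str) -> Tuple[List[Atom], List[Bond]]:
    """Read atoms and bonds from a V2000 MOL block (fixed-width columns)."""
    lines = molblock.splitlines()
    n_atoms, n_bonds = int(lines[3][0:3]), int(lines[3][3:6])
    atoms = [(ln[31:34].strip(), float(ln[0:10]), float(ln[10:20]))
             for ln in lines[4:4 + n_atoms]]
    bonds = [(int(ln[0:3]), int(ln[3:6]), int(ln[6:9]))
             for ln in lines[4 + n_atoms:4 + n_atoms + n_bonds]]
    return atoms, bonds


def looks_like_line_art(molblock: str, min_bonds: int = 4,
                        tol: float = 1e-3) -> bool:
    """Heuristic (an assumption, not a standard): all-carbon, all single
    bonds, and every bond axis-aligned in the 2D drawing."""
    atoms, bonds = parse_molblock(molblock)
    if len(bonds) < min_bonds:
        return False
    if any(symbol != "C" for symbol, _, _ in atoms):
        return False
    for begin, end, order in bonds:
        if order != 1:
            return False
        _, x1, y1 = atoms[begin - 1]
        _, x2, y2 = atoms[end - 1]
        # A bond that is neither horizontal nor vertical breaks the pattern.
        if abs(x1 - x2) > tol and abs(y1 - y2) > tol:
            return False
    return True


def make_molblock(atoms: List[Atom], bonds: List[Bond]) -> str:
    """Demo helper: write a bare-bones V2000 MOL block for testing."""
    lines = ["", "  demo", "",
             f"{len(atoms):3d}{len(bonds):3d}  0  0  0  0  0  0  0  0999 V2000"]
    lines += [f"{x:10.4f}{y:10.4f}{0.0:10.4f} {symbol:<3} 0  0"
              for symbol, x, y in atoms]
    lines += [f"{a:3d}{b:3d}{order:3d}  0" for a, b, order in bonds]
    lines.append("M  END")
    return "\n".join(lines)


# One cell of a "table": four carbons joined into an axis-aligned square.
grid = make_molblock(
    [("C", 0.0, 0.0), ("C", 1.0, 0.0), ("C", 1.0, 1.0), ("C", 0.0, 1.0)],
    [(1, 2, 1), (2, 3, 1), (3, 4, 1), (4, 1, 1)])

# A conventional zig-zag carbon chain, drawn with diagonal bonds.
zigzag = make_molblock(
    [("C", 0.0, 0.0), ("C", 0.866, 0.5), ("C", 1.732, 0.0),
     ("C", 2.598, 0.5), ("C", 3.464, 0.0)],
    [(1, 2, 1), (2, 3, 1), (3, 4, 1), (4, 5, 1)])

print(looks_like_line_art(grid))    # True
print(looks_like_line_art(zigzag))  # False
```

Note that a genuine cyclobutane drawn as an upright square would be flagged too, so a check like this is better suited to queueing structures for review than to rejecting them outright.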

8 thoughts on “Patently wrong – Tracing the origin of an unusual molecule in PubChem”

  1. Reminds me of a project 30 years ago when we built a database of structures from dissertations published by the chemistry department of Erlangen University.

    By far the most common structure extracted from the associated CDX data was cyclobutane…

  2. Wild. I witnessed numerous instances of tables and even entire slide decks being created with nothing but ChemDraw.

    The image looks like Figure 23 of the patent:

    But I’m not so sure this is the USPTO’s doing. If you give them a CDX file, I believe they publish it. But I don’t think they generate CDX files.

    1. Well, at least in the past, the USPTO contracted out to a company to digitise the chemical diagrams. Maybe the company charges per CDX file?

      Regarding being a witness to such an atrocity, that makes you an accomplice. 🙂

  3. I have seen many examples of such issues… ChemDraw is used a lot to draw tables in patents. Markush structures where the Y results in yttrium compounds. Markush structures with alkyl chains marked as Al and resulting in aluminum compounds. CDX files from patents are very dangerous and need a lot of filtering. We learned this from some of the mistakes WE made with them!!

