A novel procedure towards accurate estimation of room temperature utilising the patent literature

When chemists report that a reaction took place at room temperature, what exactly do they mean? Clearly the best way to approach this problem is to text-mine reaction conditions from all US patent applications since 2001 and thus infer room temperature.

As previously discussed, Daniel has extracted reactions from US patents. The text-mining software that Daniel has been working on, LeadMine, now has the ability to extract reaction conditions. Considering just those reactions where the temperature is explicitly given (as opposed to specified as “room temperature” or some such), the following graph is obtained (this is interactive – use the toolbar to zoom/pan; data included for the interval -273 to 800 °C):

You will immediately notice a preference among chemists for temperatures that are multiples of 5, and in particular, multiples of 10. In our determination of likely room temperature, such values are probably not useful. Once we remove them, the remaining data is as follows:

If you zoom in around the 20-25 degree area, you can infer that room temperature is 23°C or thereabouts – QED. Other peaks in the plot indicate particular reaction conditions that are common in organic chemistry: for example, both 78°C and -78°C are favourites (remember why?).
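The filter-then-look-for-a-peak step can be sketched in a few lines of Python. This is only an illustration, not LeadMine code or the analysis actually run: the function names are hypothetical, the temperatures are assumed to be whole degrees Celsius, and the sample values are made up rather than taken from the patent data.

```python
from collections import Counter


def drop_round_values(temps_c, step=5):
    """Discard temperatures that are exact multiples of `step` (assumes whole degrees C)."""
    return [t for t in temps_c if t % step != 0]


def most_common_in_window(temps_c, low, high):
    """Return the most frequently reported temperature in [low, high], or None if there is none."""
    counts = Counter(t for t in temps_c if low <= t <= high)
    if not counts:
        return None
    return counts.most_common(1)[0][0]


# Made-up illustrative values, NOT the extracted patent data.
sample = [20, 25, 23, 23, 22, 23, 78, -78, 21, 24, 100, 0, 80, 22]

filtered = drop_round_values(sample)             # removes 20, 25, 100, 0 and 80
peak = most_common_in_window(filtered, 20, 25)   # most common value left in the 20-25 window
print(peak)
```

Counting exact values rather than binning keeps the sketch close to the plot described above, where each reported temperature has its own bar.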

This temperature analysis was based on data presented by Daniel at the Fall ACS in San Francisco. His talk, “Chemistry and reactions from non-US patents”, covered:

  • Coverage of European vs United States patents
  • For novel compounds, which patent authority published first and how long the lag was
  • Trends in gene/protein mentions over time
  • Melting/boiling point extraction
  • Analysis of text-mined reactions (yields vs scale, grouping them into synthetic routes, trends in solvent usage)