A recent article by Matt Luchette over at the Bio-IT World website gives an overview of the project and explains why existing enterprise search tools need to be made chemically aware:
What they realized, though, was that the search requirements for their scientists were different than those of a standard text search engine…Most importantly, the engineers wanted the program to search the company’s entire library of electronic lab notebooks and recognize chemicals through their various generic and scientific names, as well as drawings and substructures.
…
Socrates Search, as the project came to be known, was made by combining a number of commercial search programs… Autonomy’s text search and ChemAxon’s JChem Oracle cartridge, which allows users to search for chemicals with their various names or structure, were already a part of GSKSearch, but now had added capabilities, including improved text analytics and data extraction with software from NextMove, and web integration with Microsoft’s C# ASP.NET libraries. The result was a new program that could search through the company’s archived electronic lab notebooks and recognize a vast library of scientific terms, bringing once inaccessible data to scientists’ fingertips.
“Searching for Gold: GSK’s New Search Program that Saved Them Millions”, Matt Luchette, June 2013, Bio-IT World website