Given a set of chemical structures as SMILES, how can you visualise the substructure/superstructure relationship between them?
For example, the following picture shows the relationship between members of a set of structures containing several benzene derivatives and monosaccharides:
This was created using the following Python script, which iteratively looks for the structure that matches the largest number of molecules in the set, building up a tree as it does so. The output is the tree in a form suitable for depiction using Graphviz’s dot program. The Python script uses Open Babel, but could easily be adapted for other toolkits.
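A minimal sketch of that greedy approach, not the original script, might look like the following. The function names (build_forest, to_dot, openbabel_match) are made up for illustration. The matching test is passed in as a function so that the tree-building logic does not depend on any one toolkit. The Open Babel matcher treats each SMILES string as a SMARTS pattern, which is only an approximation of a true substructure test.

```python
def build_forest(smiles_list, is_substructure):
    """Greedily arrange structures into a substructure -> superstructure forest.

    Returns a list of (structure, children) pairs; children has the same shape.
    """
    remaining = list(dict.fromkeys(smiles_list))  # drop exact duplicates, keep order
    forest = []
    while remaining:
        def hits(query):
            # Every other structure in the current set that contains `query`
            return [t for t in remaining if t != query and is_substructure(query, t)]

        # Pick the structure that matches the largest number of the others
        best = max(remaining, key=lambda q: len(hits(q)))
        matched = hits(best)
        # Whatever `best` matches goes beneath it; recurse to order those in turn
        forest.append((best, build_forest(matched, is_substructure)))
        remaining = [t for t in remaining if t != best and t not in matched]
    return forest


def to_dot(forest):
    """Write the forest as a Graphviz digraph, one edge per parent/child pair."""
    def quote(s):
        # Backslashes (SMILES bond stereo) and quotes must be escaped in dot strings
        return '"' + s.replace("\\", "\\\\").replace('"', '\\"') + '"'

    lines = ["digraph substructures {"]

    def walk(parent, nodes):
        for structure, children in nodes:
            lines.append("  " + quote(structure) + ";")
            if parent is not None:
                lines.append("  " + quote(parent) + " -> " + quote(structure) + ";")
            walk(structure, children)

    walk(None, forest)
    lines.append("}")
    return "\n".join(lines)


def openbabel_match(query, target):
    """Assumption: use the query SMILES directly as a SMARTS pattern."""
    from openbabel import pybel  # Open Babel 3.x; older releases use `import pybel`
    return bool(pybel.Smarts(query).findall(pybel.readstring("smi", target)))
```

Calling to_dot(build_forest(smiles, openbabel_match)) on a list of SMILES strings gives text that can be saved to a file and rendered with, for example, dot -Tpng tree.dot -o tree.png.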
Several tools exist for batch conversion of chemical file formats. However, you may not have access to them, or they may not support a particular file format. Sometimes the only option is to open each file in the original software and export it in a more useful format. With a large number of files, this is tedious at best and impractical at worst.
Recently, faced with the challenge of converting hundreds of ChemSketch files to Mol files, Daniel and I looked up the literature (i.e. googled the web) and found that Rajarshi Guha has described how to automate ChemDraw using the Python module WATSUP. Unfortunately, WATSUP no longer seems to be available. Rich Apodaca has described an alternative approach using AppleScript, but this does not work so well on Windows.
Instead we used a Python library, pywinauto, to automate the process of using the ChemSketch GUI to File/Open the ChemSketch file and File/Export a Mol file. The script is shown below. The trickiest part is choosing the delays so that the process runs as fast as possible without failing (due to a ChemSketch operation taking longer than expected):
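One way to take some of the guesswork out of the delays is to poll for the exported file instead of sleeping for a fixed time after each export. The sketch below shows that idea only and is not the script from this post. The names wait_for_file and convert_all are made up, export_one is a hypothetical callback that would wrap the pywinauto calls driving File/Open and File/Export, and the .sk2 input extension is an assumption.

```python
import os
import time


def wait_for_file(path, timeout=10.0, poll=0.1):
    """Poll until `path` exists or `timeout` seconds have passed."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if os.path.exists(path):
            return True
        time.sleep(poll)
    return os.path.exists(path)


def convert_all(in_dir, out_dir, export_one, in_ext=".sk2", timeout=10.0):
    """Run export_one(src, dst) on each input file; return the inputs that failed.

    export_one is hypothetical: it stands in for whatever GUI automation opens
    `src` in the application and exports it to `dst`.
    """
    failed = []
    for name in sorted(os.listdir(in_dir)):
        if not name.lower().endswith(in_ext):
            continue
        src = os.path.join(in_dir, name)
        dst = os.path.join(out_dir, os.path.splitext(name)[0] + ".mol")
        export_one(src, dst)
        # Wait only as long as this particular export actually takes
        if not wait_for_file(dst, timeout=timeout):
            failed.append(src)
    return failed
```

Fast exports then finish as soon as the file appears, and slow ones are reported in the returned list so that they can be retried with a longer timeout. Delays inside the GUI steps themselves, such as waiting for a dialog to open, would still need handling in export_one.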