A previous post (see the slide deck from slide 40) described some of the work we have done on the development of fast substructure search, a project code-named Arthor. At the time, it ran about two orders of magnitude faster than any of the other programs benchmarked. Such speed makes interactive searches of large databases possible. That’s pretty obvious, so rather than discuss it here, here’s something a bit more novel: interactive substructure search of moderately sized datasets, entirely client-side in the browser.
This is not the first time that substructure search has been implemented entirely in the browser: Peter Ertl and co. developed the Wikipedia Structure Explorer, which searches almost 15K structures from Wikipedia using the Actelion Java library compiled to JavaScript. However, with Arthor (also compiled to JavaScript), it is possible to search the whole of ChEMBL22_1, 1.68 million molecules, in the browser. It even works on my mid-range phone (Moto G 3rd gen, 2GB RAM), although there memory constraints limit it to 1.0 million molecules.
Time for the timings. To be like-for-like with the JavaScript, where it is not possible to use fingerprints for the whole of ChEMBL due to RAM constraints, the times quoted for the native code do not include the use of a fingerprint screen. The native and JavaScript times were measured on the same machine (Core i7 6900K CPU, 3.20GHz), and all are times to find the total number of hits (rather than the first 10 or 100 or whatever) using a single thread. Phone times are for 1.0 million molecules. All times are in ms unless otherwise stated.
Query | Hits | Native | JavaScript | Phone (1.00M mols) |
---|---|---|---|---|
c1ccccc1 | 1420663 | 419 | 663 | 3.24s |
Br | 75132 | 113 | 197 | 819 |
CCO | 754842 | 230 | 368 | 1.32s |
OOO | 1 | 99 | 300 | 1.12s |
[X5] | 160 | 102 | 186 | 817 |
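
To put the native and JavaScript columns side by side, here is a small sketch that works out how many times slower the JavaScript build is for each query. The timings are copied from the table above; the function names are made up for illustration and are not part of Arthor.

```python
# Timings in ms copied from the table above: (query, native, javascript).
TIMINGS_MS = [
    ("c1ccccc1", 419, 663),
    ("Br", 113, 197),
    ("CCO", 230, 368),
    ("OOO", 99, 300),
    ("[X5]", 102, 186),
]


def slowdown(native_ms, javascript_ms):
    """Return how many times slower the JavaScript time is than the native time."""
    return javascript_ms / native_ms


def slowdown_table(timings=TIMINGS_MS):
    """Map each query to its JavaScript/native ratio, rounded to 2 decimal places."""
    return {query: round(slowdown(native, js), 2) for query, native, js in timings}


if __name__ == "__main__":
    for query, ratio in slowdown_table().items():
        print(f"{query:10s} {ratio:.2f}x")
```

On these numbers the JavaScript build is roughly 1.6 to 1.8 times slower than native for four of the five queries, and about 3 times slower for OOO. The phone column is left out because it covers a smaller set of molecules and so is not directly comparable.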
Imagine a future where the computationally expensive step of substructure searching no longer requires a server, but is done client-side. Impossible, or only a matter of time?