June 2015 – NextMove Software

At the recent Cambridge Cheminformatics Network Meeting (CCNM) we presented a performance benchmark of substructure searching tools using the same queries, target dataset, and hardware. Whilst many tools publish figures for isolated benchmarks, differences in query sets and target database sizes make it impossible to determine how the tools compare to each other.

The talk compared the performance of various tools and offered insight into their performance characteristics.

One question asked at the talk was whether the slowest queries are always the same across tools. As expected, there is some correlation (benzene is always bad), but there are some rather dramatic differences within and between tools. For example, the time taken to query Anthracene or Zinc varies, with some tools finding Anthracene hits faster (marked as <) and others finding Zinc faster (marked as >).

The rank (per tool, counted from the slowest query) is provided as a guide to how many queries took longer than the one listed: a rank of 3, for example, means only two queries were slower for that tool.

              Anthracene                       Zinc
Tool          Query Time (s)  Rank (slow)      Query Time (s)  Rank (slow)
arthor        2.254           3            >   0.357           2602
arthor+fp     0.022           285          >   0.001           1667
rdcart        0.698           794          <   202             4
rdlucene      27.126          566          >   23.87           600
pgchem        28.231          138          >   18.181          197
mychem        48.289          108          >   34.145          159
fastsearch    396             99           >   285             126
bingo-nosql   0.448           451          <   1.311           260
bingo-pgsql   0.392           638          >   0.060           1228
tripod-ss     21.797          350          <   1441            18
orchem        27.075          906          >   0.721           2390
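
For readers who want to reproduce the comparison column, here is a minimal Python sketch of how the < and > markers and a "slow rank" could be derived. The timings are copied from the table above. The function names (faster_marker, slow_rank) and the handling of ties are assumptions made for illustration; they are not taken from the benchmark code.

```python
# Query times in seconds, copied from the table: tool -> (Anthracene, Zinc).
TIMES = {
    "arthor": (2.254, 0.357),
    "arthor+fp": (0.022, 0.001),
    "rdcart": (0.698, 202.0),
    "rdlucene": (27.126, 23.87),
    "pgchem": (28.231, 18.181),
    "mychem": (48.289, 34.145),
    "fastsearch": (396.0, 285.0),
    "bingo-nosql": (0.448, 1.311),
    "bingo-pgsql": (0.392, 0.060),
    "tripod-ss": (21.797, 1441.0),
    "orchem": (27.075, 0.721),
}


def faster_marker(anthracene_s, zinc_s):
    """Return '<' if Anthracene was faster, '>' if Zinc was faster."""
    if anthracene_s < zinc_s:
        return "<"
    if anthracene_s > zinc_s:
        return ">"
    return "="  # assumption: equal times get their own marker


def slow_rank(query_time_s, all_times_s):
    """1-based rank counted from the slowest query (1 = slowest).

    Assumption: tied queries share the same rank.
    """
    return 1 + sum(1 for t in all_times_s if t > query_time_s)


# Marker per tool, plus the tools for which Anthracene was the faster query.
markers = {tool: faster_marker(a, z) for tool, (a, z) in TIMES.items()}
anthracene_faster = sorted(tool for tool, m in markers.items() if m == "<")

if __name__ == "__main__":
    for tool, (a, z) in TIMES.items():
        print(f"{tool:12s} {a:>8} {markers[tool]} {z:>8}")
    print("Anthracene faster:", ", ".join(anthracene_faster))
```

Recomputing the ranks themselves needs the full per-tool timing list for every query, which is not reproduced in this post; slow_rank only shows the counting rule.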

As promised, the query and target IDs are available: here.

If this is an area of interest to you, feel free to get in touch.


Substructure Search Face-off: Are the slowest queries the same between tools?