NextMove Software

Arthor

Version 3.6.1 [Mar 2023]

High-Performance Chemical Database Searching

NextMove Software's Arthor technology (named after Merlin's apprentice) pushes the performance limits of chemical database search on current computer hardware. Building upon NextMove Software's Patsy chemical pattern matching engine, Arthor easily outperforms current chemical cartridges, scaling to handle the hundreds of millions of compounds found in next-generation chemical databases.

Substructure Searching

Traditional chemical database search engines rely on effective fingerprint screening to achieve high-performance substructure search. As a result, relatively broad queries that screen poorly perform significantly worse, degrading both average and worst-case search times. By tackling the computationally intensive SMARTS matching phase of a search, Arthor dramatically improves worst-case (and therefore average) search times, achieving the real-time performance bounds required by interactive users.
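
The screen-then-match pipeline described above can be illustrated with a toy sketch. This is not Arthor's implementation: plain strings stand in for molecules, a substring test stands in for SMARTS matching, and the names `fingerprint`, `passes_screen`, `build_database`, `search` and the 64-bit width are all assumptions made for illustration. The point it shows is that a specific query is screened down to a few candidates, while a broad query sets few or no fingerprint bits, so every record reaches the expensive matching phase.

```python
import zlib

NUM_BITS = 64  # toy fingerprint width (an assumption for this sketch)


def fingerprint(text: str) -> int:
    """Hash every 2-character fragment of `text` into a NUM_BITS-wide bit set."""
    bits = 0
    for i in range(len(text) - 1):
        fragment = text[i:i + 2]
        bits |= 1 << (zlib.crc32(fragment.encode()) % NUM_BITS)
    return bits


def passes_screen(query_fp: int, target_fp: int) -> bool:
    """A target can only contain the query if it has every bit the query has."""
    return (query_fp & target_fp) == query_fp


def build_database(records):
    """Precompute one fingerprint per record."""
    return [(record, fingerprint(record)) for record in records]


def search(query: str, database):
    """Return (candidates that passed the screen, verified hits)."""
    query_fp = fingerprint(query)
    candidates = [rec for rec, fp in database if passes_screen(query_fp, fp)]
    # Stand-in for the expensive matching phase (SMARTS matching in a real
    # system). A substring test is NOT substructure matching; it only plays
    # the role of the exact verification step in this toy.
    hits = [rec for rec in candidates if query in rec]
    return candidates, hits


database = build_database(["CCO", "c1ccccc1O", "CC(=O)O", "CCN"])

# A specific query sets several bits, so the screen can reject records.
specific_candidates, specific_hits = search("c1ccccc1", database)

# A broad query (one character, no 2-character fragments) sets no bits,
# so nothing is screened out and every record must be verified.
broad_candidates, broad_hits = search("C", database)
```

In this toy, the cost of the broad query is dominated entirely by the verification step, which is the phase the paragraph above says Arthor targets.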

Similarity Searching

Similarity searches using fingerprint-based Tanimoto scores typically rely on a popcount-sorted index to bound and improve search times. Unfortunately, the popular search bounds described by Swamidass and Baldi (2007) are only effective for denser path-based fingerprints. Sparser circular fingerprints (e.g. ECFP) see little or no benefit from these bounds, and other techniques are required to improve search speed. Arthor uses on-the-fly code generation to create query-specific machine instructions, and a linear-time sort algorithm to rank and page results. Databases containing hundreds of millions of hits can be interactively queried in real time.
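
For readers unfamiliar with popcount bounding, here is a minimal sketch of the general idea, not of Arthor's code. For a query with a bits set and a target with b bits set, the Tanimoto score cannot exceed min(a, b) / max(a, b), so for a threshold t only targets with popcount between ceil(a·t) and floor(a/t) need to be scored. Fingerprints are modelled as Python integers, and the names `popcount`, `tanimoto`, `build_index` and `bounded_search` are hypothetical. Exact fractions are used to avoid floating-point rounding at the window edges.

```python
import math
from bisect import bisect_left, bisect_right
from fractions import Fraction


def popcount(fp: int) -> int:
    """Number of set bits in a fingerprint stored as an integer."""
    return bin(fp).count("1")


def tanimoto(a: int, b: int) -> Fraction:
    """Tanimoto score |a AND b| / |a OR b| as an exact fraction."""
    union = popcount(a | b)
    return Fraction(popcount(a & b), union) if union else Fraction(0)


def build_index(fingerprints):
    """Sort fingerprints by popcount; keep the popcounts for binary search."""
    ordered = sorted(fingerprints, key=popcount)
    counts = [popcount(fp) for fp in ordered]
    return ordered, counts


def bounded_search(query: int, index, threshold: Fraction):
    """Score only targets whose popcount could reach `threshold` (> 0).

    Returns (hits sorted by descending score, number of targets scored).
    """
    ordered, counts = index
    q = popcount(query)
    lowest = math.ceil(q * threshold)    # fewer bits cannot reach threshold
    highest = math.floor(q / threshold)  # more bits cannot reach threshold
    start = bisect_left(counts, lowest)
    stop = bisect_right(counts, highest)
    window = ordered[start:stop]
    scored = [(tanimoto(query, fp), fp) for fp in window]
    hits = sorted((pair for pair in scored if pair[0] >= threshold), reverse=True)
    return hits, len(window)


targets = [0b1111, 0b0111, 0b0011, 0b11111, 0b111111111, 0b11110000]
index = build_index(targets)
hits, scored_count = bounded_search(0b1111, index, Fraction(3, 4))
```

Here the query has four bits set, so only targets with three to five bits set are scored; the two-bit and nine-bit targets are skipped without computing a score. When most fingerprints in a database have similar popcounts, the window covers most of the index and little is pruned.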


More info:
  • John Mayfield (né May) and Roger Sayle. Substructure Search Face-off. CCNM, May 2015.
  • Roger Sayle. Recent Advances in Chemical and Biological Search Systems: Evolution vs. Revolution. ICCS 2018, May 2018.
  • John Mayfield. PAINS in the butt. CCNM, Feb 2019.
  • John Mayfield and Roger Sayle. The Secrets of Fast SMARTS Matching. 8th Joint Sheffield Conference on Chemoinformatics, June 2019.
Arthor provides fast state-of-the-art substructure and chemical similarity search capabilities for ultra-large databases of hundreds of millions of compounds, using SMARTS optimization, Just-In-Time compilation and/or GPUs.
CaffeineFix is used to rapidly match chemical names or terms against a dictionary or grammar (e.g. a grammar for IUPAC names). As well as its use in text mining, it can be used to provide autocomplete functionality and spelling correction.
Casandra is a server for delivering real-time safety warnings of experimental hazards directly to pharmaceutical electronic laboratory notebooks (ELNs).
HazELNut is a suite of tools used to extract, normalize and analyse information in Electronic Lab Notebooks (ELNs). This can be used to implement a search interface, find/eliminate duplicates, find similar reactions and so on.
LeadMine extracts chemical names and terms from text. It incorporates NextMove's CaffeineFix technology to find terms that match appropriate dictionaries or grammars. It has enhanced functionality to handle the patent literature.
Matsy is a set of tools for creating and analysing Matched Molecular Series (the general form of Matched Molecular Pairs). In particular, it can be used to suggest what compound to make next in a Medicinal Chemistry program.
MPSearch rapidly searches a database to find Matched Pairs related to a query molecule. This type of search is used to explore previous medicinal chemistry strategies.
NameRXN is used to classify and name reactions. It is particularly useful in the context of ELN analysis, but also as a plugin to chemical drawing software. NameRXN builds on NextMove Software's Patsy technology.
Patsy is used to speed up SMARTS pattern matching by creating optimized SMARTS patterns or source code. Speed gains are particularly large when multiple SMARTS patterns are matched against a single structure.
Pistachio is a reaction dataset browser providing loading, querying, and analytics of chemical reactions. With over 9 million chemical reactions extracted from US & EPO patents, it demonstrates an AI interface to faceted (structure) search.
SmallWorld is an index of chemical space based on more than 230 billion molecular substructures. It can be used to measure similarity based on graph-edit distance, find the MCS of two or more molecules, analyse HTS results and much more.
Sugar & Splice can be used to perceive and depict biopolymer structure. It makes it easy to interconvert between small-molecule representations (e.g. SMILES, MOL) and biopolymer representations (HELM, IUPAC line notation).
©2023 NextMove Software. All rights reserved.