Pistachio

[Version 2026-01-06 (2025Q4)]

Reaction Data, Querying and Analytics

Pistachio is a reaction dataset and interface providing loading, querying, and analytics of chemical reactions. Pistachio builds on and extends existing solutions from NextMove Software to enrich reaction data and provide powerful query capabilities.

Figure 1. Pistachio Architecture

Reaction Data

Reaction data can be obtained from an ELN export (HazELNut), an external dataset (Reaxys), or mined from journals or patents. Patents provide a large, accessible collection of documents for mining and hence are used for demonstration purposes. Data is mined from documents in three ways (Fig. 1). Patent Reaction Extraction uses LeadMine and ChemicalTagger to extract reactions and physical quantities from experimental paragraphs[1]. Indigo atom-mapping is then used to filter out suspect reactions; this step is a major bottleneck. Praline reads the ChemDraw CDX files supplied with U.S. patents, converting and interpreting exemplified reactions and schemes. LeadMine is used to create tables of bibliographic data (author, document codes) and diseases (MeSH terms) from the title and the claims section. These datasets are merged into a JSON file with full reaction details and a denormalised table for indexing in PostgreSQL.
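To illustrate the last step, here is a minimal Python sketch of how a nested reaction record might be flattened into denormalised rows suitable for a relational index. The record layout, the field names (`document`, `reaction`, `components`, `role`) and the `denormalise` function are all illustrative assumptions; they are not Pistachio's actual JSON schema or table design.

```python
# Hypothetical reaction record. Every field name here is an assumption
# made for illustration, not Pistachio's real schema.
record = {
    "document": {"id": "US-0000000-A1", "assignee": "Example Corp", "year": 2015},
    "reaction": {"smiles": "CC(=O)O.OCC>>CC(=O)OCC", "name": "Esterification", "yield": 85.0},
    "components": [
        {"role": "reactant", "smiles": "CC(=O)O"},
        {"role": "reactant", "smiles": "OCC"},
        {"role": "product", "smiles": "CC(=O)OCC"},
    ],
}


def denormalise(rec):
    """Flatten one nested record into one flat row per component.

    Document-level and reaction-level fields are repeated on every row,
    which is what makes the table denormalised: a query can filter on
    assignee, reaction name and component role without any joins.
    """
    doc = rec["document"]
    rxn = rec["reaction"]
    rows = []
    for comp in rec["components"]:
        rows.append({
            "doc_id": doc["id"],
            "assignee": doc["assignee"],
            "year": doc["year"],
            "reaction_smiles": rxn["smiles"],
            "reaction_name": rxn["name"],
            "yield_percent": rxn["yield"],
            "role": comp["role"],
            "component_smiles": comp["smiles"],
        })
    return rows


rows = denormalise(record)
for row in rows:
    print(row["doc_id"], row["role"], row["component_smiles"])
```

The trade-off in a layout like this is storage for speed: repeated values cost space, but each search facet maps to a plain column filter.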

Figure 2. Query Tagging

Query Handling

Pistachio queries are entered in an omnibox. The text is parsed using LeadMine to build an expression tree, which is then turned into a SQL query. The following basic data types are supported:

SMILES
SMARTS
Trivial Name
Line Formula
Systematic Name

SMARTS
Reaction Type (NameRxn)
Yield

Affiliation (Assignee)
Author (Inventor)
Publication Date
Document Name (Patent No.)
Document Codes (IPC)

Disease Terms

The compound types can be further constrained by component role (e.g. product) and search type (e.g. substructure, synthesis). Logical operators (AND, OR, NOT) can be used between terms, and terms can be grouped with parentheses. When no operator is given between two terms, AND is implied (Fig. 2).
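The path from tagged terms to SQL can be sketched in Python as follows. This is an assumption-laden illustration rather than Pistachio's implementation: the tagging that LeadMine performs is replaced by a pre-tagged token list, the `Term`, `Not` and `BinOp` classes and the column names are hypothetical, and every term is rendered as a simple equality test where a real system would dispatch to substructure or text search.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Term:
    column: str  # hypothetical column name assigned by the tagger
    value: str


@dataclass
class Not:
    child: "Node"


@dataclass
class BinOp:
    op: str  # "AND" or "OR"
    left: "Node"
    right: "Node"


Node = Union[Term, Not, BinOp]


def parse(tokens):
    """Recursive-descent parser; precedence is OR < AND < NOT."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of query")
        tok = tokens[pos]
        pos += 1
        return tok

    def parse_or():
        node = parse_and()
        while peek() == "OR":
            advance()
            node = BinOp("OR", node, parse_and())
        return node

    def parse_and():
        node = parse_not()
        while True:
            tok = peek()
            if tok == "AND":
                advance()
            elif tok is None or tok in ("OR", ")"):
                break
            # No explicit operator before the next operand: implicit AND.
            node = BinOp("AND", node, parse_not())
        return node

    def parse_not():
        if peek() == "NOT":
            advance()
            return Not(parse_not())
        return parse_atom()

    def parse_atom():
        tok = advance()
        if tok == "(":
            node = parse_or()
            if advance() != ")":
                raise ValueError("expected ')'")
            return node
        if isinstance(tok, Term):
            return tok
        raise ValueError(f"unexpected token {tok!r}")

    tree = parse_or()
    if pos != len(tokens):
        raise ValueError("unbalanced ')' or trailing tokens")
    return tree


def to_sql(node):
    """Render a tree as a parameterised WHERE clause: (sql, params).

    Values are passed as parameters; column names are interpolated and
    are assumed to come from the tagger, never from raw user text.
    """
    if isinstance(node, Term):
        return f"{node.column} = %s", [node.value]
    if isinstance(node, Not):
        sql, params = to_sql(node.child)
        return f"NOT ({sql})", params
    left_sql, left_params = to_sql(node.left)
    right_sql, right_params = to_sql(node.right)
    return f"({left_sql} {node.op} {right_sql})", left_params + right_params


tokens = [
    Term("assignee", "Example Corp"),
    Term("reaction_type", "Suzuki coupling"),
    "NOT",
    Term("disease", "asthma"),
]
where, params = to_sql(parse(tokens))
print(where)
print(params)
```

Because no operator separates the first two terms or precedes NOT, the parser joins all three with AND, giving `((assignee = %s AND reaction_type = %s) AND NOT (disease = %s))`.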

The following video demonstrates queries and results in real time.

CaffeineFix is used to rapidly match chemical names or terms against a dictionary or grammar (e.g. a grammar for IUPAC names). As well as use in text-mining, it can be used to provide autocomplete functionality and spell-correction.

Casandra is a server for delivering real time safety warnings of experimental hazards straight to the pharmaceutical electronic laboratory notebooks (ELNs).

HazELNut is a suite of tools used to extract, normalize and analyse information in Electronic Lab Notebooks (ELNs). This can be used to implement a search interface, find/eliminate duplicates, find similar reactions and so on.

LeadMine extracts chemical names and terms from text. It incorporates NextMove's CaffeineFix technology to find terms that match appropriate dictionaries or grammars. It has enhanced functionality to handle the patent literature.

Matsy is a set of tools for creating and analysing Matched Molecular Series (the general form of Matched Molecular Pairs). In particular, it can be used to suggest what compound to make next in a Medicinal Chemistry program.

MPSearch rapidly searches a database to find Matched Pairs related to a query molecule. This type of search is used to explore previous medicinal chemistry strategies.

NameRXN is used to classify and name reactions. It is particularly useful in the context of ELN analysis, but also as a plugin to chemical drawing software. NameRXN builds on NextMove Software's Patsy technology.

Patsy is used to speed up SMARTS pattern matching by creating optimized SMARTS patterns or source code. Speed gains are particularly large when multiple SMARTS patterns are matched against a single structure.

Pistachio is a reaction dataset browser providing loading, querying, and analytics of chemical reactions. With over 21 million chemical reactions extracted from US and EPO patents, it demonstrates an AI interface to faceted (structure) search.

SmallWorld is an index of chemical space based on more than 230 billion molecular substructures. It can be used to measure similarity based on graph-edit distance, find the MCS of two or more molecules, analyse HTS results and much more.

Sugar & Splice can be used to perceive and depict biopolymer structure. It makes it easy to interconvert between small-molecule representations (e.g. SMILES, MOL) and biopolymer representations (HELM, IUPAC line notation).

General Inquiries: info@nextmovesoftware.com Support: support@nextmovesoftware.com
