ChemSpider and CaffeineFix

In collaboration with ChemSpider, CaffeineFix technology is now being used to make suggestions whenever a ChemSpider user’s query doesn’t match a synonym in ChemSpider.

CaffeineFix enables the correction of text to match entries in a dictionary or those expressed by a grammar/regular expression.

The system considers four correction operations: insertions, deletions, substitutions and transpositions. To better distinguish likely errors from unlikely ones, the cost of each operation depends on the context in which it occurs.

For example, pyrole is one edit away from both pyrrole and pyrone, but the correction to pyrrole is preferred because it is far more likely.
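To make the idea concrete, here is a minimal sketch of a context-weighted edit distance covering the same four operations. It is not CaffeineFix's actual implementation: the function names, the cost values, and the rule that an insertion is cheaper when it doubles a neighbouring letter are all assumptions chosen purely for illustration.

```python
# Illustrative only: hypothetical costs, not those used by CaffeineFix.
DELETE_COST = 1.0
SUBSTITUTE_COST = 1.0
TRANSPOSE_COST = 1.0


def insertion_cost(candidate, j):
    """Cost of inserting candidate[j] into the query.

    Assumed context rule: dropping one letter of a doubled pair is a
    common typo, so restoring it is cheaper than other insertions.
    """
    ch = candidate[j]
    prev_same = j > 0 and candidate[j - 1] == ch
    next_same = j + 1 < len(candidate) and candidate[j + 1] == ch
    return 0.5 if (prev_same or next_same) else 1.0


def weighted_distance(query, candidate):
    """Cost of editing `query` into `candidate` (restricted Damerau-Levenshtein)."""
    n, m = len(query), len(candidate)
    d = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = d[i - 1][0] + DELETE_COST
    for j in range(1, m + 1):
        d[0][j] = d[0][j - 1] + insertion_cost(candidate, j - 1)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            same = query[i - 1] == candidate[j - 1]
            best = d[i - 1][j - 1] + (0.0 if same else SUBSTITUTE_COST)
            best = min(best, d[i - 1][j] + DELETE_COST)
            best = min(best, d[i][j - 1] + insertion_cost(candidate, j - 1))
            # Transposition of two adjacent characters.
            if (i > 1 and j > 1
                    and query[i - 1] == candidate[j - 2]
                    and query[i - 2] == candidate[j - 1]):
                best = min(best, d[i - 2][j - 2] + TRANSPOSE_COST)
            d[i][j] = best
    return d[n][m]


def suggest(query, dictionary):
    """Return the dictionary entry with the lowest weighted edit cost."""
    return min(dictionary, key=lambda word: weighted_distance(query, word))


print(suggest("pyrole", ["pyrone", "pyrrole"]))
```

With these assumed costs, both candidates are a single edit away, but restoring the doubled "r" is the cheaper edit, so pyrrole is ranked first. A real system would presumably derive such costs from observed error frequencies rather than hand-picked constants.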

Peptide informatics at Bio-IT World

Congratulations to the Pistoia Alliance for winning this year’s Bio-IT World Best Practices Award for Informatics for the development of HELM, the Hierarchical Editing Language for Macromolecules. As NextMove’s Sugar & Splice was cited in the submission as supporting HELM, we can claim a small part in helping them achieve this recognition.

Handling and interconverting peptide representations was also the topic of Lisa Sach-Peltason’s talk, “Peptide Informatics – Bridging the gap between small-molecule and large-molecule systems”. She described the peptide registration system that Roche have developed, in which Sugar & Splice has played a major role. One of the many challenges is that although non-standard amino acids make up only 7% by frequency of the amino acids used by Roche, over 88% of their peptides contain at least one non-standard amino acid.
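The gap between those two figures is less surprising than it first looks. As a rough back-of-the-envelope sketch, assume (purely for illustration, this was not part of the talk) that each residue is independently non-standard with probability 0.07. The chance that a peptide contains at least one non-standard residue then rises quickly with its length. The function name and the lengths tried below are hypothetical, and real peptides will not follow this simple independence model.

```python
def fraction_with_nonstandard(per_residue_rate, length):
    """Probability that a peptide of `length` residues contains at least
    one non-standard residue, assuming residues are independent."""
    return 1.0 - (1.0 - per_residue_rate) ** length


# Illustrative lengths only; they say nothing about Roche's actual peptides.
for length in (5, 10, 20, 30):
    print(length, round(fraction_with_nonstandard(0.07, length), 3))
```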


Finally, Roger presented a poster on the same topic, covering the challenges of handling peptide line notations for biologics registration and patent filings.

Qualitative structure-activity at UK-QSAR

I’ve been wondering whether my Matched Series work falls under QSAR, as it neither uses a numerical model nor makes absolute activity predictions. Everything it does is based on relative activity/property orders. So perhaps this is not a Quantitative Structure-Activity Relationship, but rather a Qualitative Structure-Activity Relationship (let’s call it QualSAR for short). Is this a useful distinction? Are there other areas of cheminformatics that would fall under this (MMPA springs to mind)?

But until the whole QualSAR field takes off as a separate discipline, I guess I’m going to continue to file my work under QSAR. Earlier this year, I was invited to contribute a brief description of the algorithm to the current UK-QSAR newsletter (direct link here).

And last week I presented the Matsy algorithm at UK-QSAR, hosted by Eli Lilly. Here’s the poster I presented, which summarises the recent paper and also has some additional examples of the sort of predictions generated (better quality PDF available here).