Half a million molecules on PubChem have just had a new section added entitled “Biologic Description”. This includes a depiction of the oligomer structure and several line notations including IUPAC condensed and HELM, all of which were generated using Sugar&Splice through perception from the all-atom representation. Since the original development of Sugar&Splice was as part of a collaboration with PubChem, it is great to see these annotations finally appearing as part of this important resource.
Previous blog posts have shown examples of the sorts of peptide depictions that Sugar&Splice can generate. Here is how one appears on PubChem (CID118753634).
Sugar&Splice also supports CFG-style depiction of oligosaccharides (CID71297593):
As ever, there is more work to be done on improving depictions and perception, and we look forward to further increasing the coverage of biologics in PubChem over the coming months.
NextMove Software is a partner in the Horizon 2020 MSC ITN EID BigChem project. Ten PhD positions are available in the area of “Big Data Analysis in Chemistry”, all of which offer a mix of time spent in academia and with industrial partners. The following position involves a placement with us for 3 months:
ESR2: Computational compound profiling by large-scale mining of pharmaceutical data
This position is announced within the BIGCHEM project. Read about the career development perspectives.
Check eligibility rules as well as recruitment details and apply for this position before 20 March 2016.
Objectives: In the life-sciences, data is being generated and published at unprecedented rates. This wealth of data provides unique opportunities to get insights into the mechanisms of disease and to identify starting points for treatments. At the same time, the size, complexity and heterogeneity of available data sets pose substantial challenges for computational analysis and design.
The aim of this project is to address the challenges posed by large, heterogeneous, incomplete, and noisy datasets. Specifically, we aim to:
- apply machine learning technologies to derive predictive QSAR models from real-world life science data sets;
- analyze trade-offs between training data accuracy and quantity, in particular, in the context of high-throughput screening data;
- develop and apply methods to systematically account for noise and experimental errors in the search for active compounds.
Planned secondments: A three-month stay at NextMove to work with data automatically extracted from patents using the company's unique technology. Three months at HMGU to collect data from public databases such as ChEMBL, OCHEM and PubChem.
Employment: 36 months in total, split between Boehringer Ingelheim, Biberach, Germany (months 1-18) and the University of Bonn, Germany (months 19-36).
Enrollment in PhD program: The ESR will be supervised by Prof. J. Bajorath from the University of Bonn and by supervisors from Boehringer Ingelheim.
Salary details are described here.
Boehringer Ingelheim GmbH & Co KG & University of Bonn
Years of experience:
4 years or less (see eligibility rules)
Required general skills:
Experience in data mining and statistics. Good knowledge of medicinal chemistry is a plus.
Required IT skills:
Good knowledge of programming in mainstream computer languages and of the UNIX/Linux operating system.
Required degree level:
Master’s degree in Chemistry, Bioinformatics, Medicinal Chemistry, Informatics/Data Science or closely related fields.