{"id":2892,"date":"2018-12-18T16:39:07","date_gmt":"2018-12-18T16:39:07","guid":{"rendered":"https:\/\/nextmovesoftware.com\/blog\/?p=2892"},"modified":"2018-12-18T16:39:07","modified_gmt":"2018-12-18T16:39:07","slug":"putting-pubmed-peptides-in-pubchem","status":"publish","type":"post","link":"https:\/\/nextmovesoftware.com\/blog\/2018\/12\/18\/putting-pubmed-peptides-in-pubchem\/","title":{"rendered":"Putting PubMed peptides in PubChem"},"content":{"rendered":"<p>Following on from the work described in the <a href=\"https:\/\/nextmovesoftware.com\/blog\/2018\/09\/27\/how-to-avoid-inventing-peptide-monomer-names\/\">last post<\/a>, I have put together a dataset of text-mined peptides which I&#8217;ve uploaded to PubChem. This involved extending our biopolymer grammar (which I use to textmine with <a href=\"https:\/\/www.nextmovesoftware.com\/leadmine\">LeadMine<\/a>) and improving the ability of <a href=\"https:\/\/www.nextmovesoftware.com\/sugarnsplice\">Sugar&#038;Splice<\/a> to interpret the text-mined entities.<\/p>\n<p>The dataset consists of 5350 unique peptides extracted from PubMed abstracts (up to 2017), giving a total of 8699 peptide line notation\/PubMed ID (PMID) pairs. 
The grammar identifies many more peptides, but I&#8217;ve excluded those that are very short, appear to be fragments of a larger peptide, or are not currently interpreted by Sugar&#038;Splice.<\/p>\n<p><a href=\"https:\/\/nextmovesoftware.com\/blog\/wp-content\/uploads\/2018\/12\/cbz.png\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/nextmovesoftware.com\/blog\/wp-content\/uploads\/2018\/12\/cbz.png\" alt=\"\" width=\"450\" height=\"89\" class=\"aligncenter size-full wp-image-2897\" srcset=\"https:\/\/nextmovesoftware.com\/blog\/wp-content\/uploads\/2018\/12\/cbz.png 450w, https:\/\/nextmovesoftware.com\/blog\/wp-content\/uploads\/2018\/12\/cbz-300x59.png 300w\" sizes=\"(max-width: 450px) 100vw, 450px\" \/><\/a>The most popular peptide is &#8220;Cbz-Val-Ala-Asp-CH2F&#8221;, which I found in 296 abstracts in a variety of forms, e.g. &#8220;N-benzyloxycarbonyl-Val-Ala-Asp-fluoromethylketone&#8221; (<a href=\"https:\/\/www.ncbi.nlm.nih.gov\/pubmed\/?term=8628997\">PMID 8628997<\/a>) and &#8220;Z-Val-Ala-Asp-fluoromethyl-ketone&#8221; (<a href=\"https:\/\/www.ncbi.nlm.nih.gov\/pubmed\/?term=10554885\">PMID 10554885<\/a>). This is an irreversible pan-caspase inhibitor that is widely used in studies of apoptosis. Not surprisingly, this was already in PubChem (<a href=\"https:\/\/pubchem.ncbi.nlm.nih.gov\/compound\/5497171\">CID 5497171<\/a>). 
Its Asp(OMe) form also ranks highly as the fourth most popular peptide line notation (88 abstracts).<\/p>\n<p><a href=\"https:\/\/nextmovesoftware.com\/blog\/wp-content\/uploads\/2018\/12\/PMID5120.png\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/nextmovesoftware.com\/blog\/wp-content\/uploads\/2018\/12\/PMID5120.png\" alt=\"\" width=\"501\" height=\"57\" class=\"aligncenter size-full wp-image-2908\" srcset=\"https:\/\/nextmovesoftware.com\/blog\/wp-content\/uploads\/2018\/12\/PMID5120.png 501w, https:\/\/nextmovesoftware.com\/blog\/wp-content\/uploads\/2018\/12\/PMID5120-300x34.png 300w\" sizes=\"(max-width: 501px) 100vw, 501px\" \/><\/a>At the other extreme is the peptide that appears as &#8220;Pro-Gly-Phe-Ser-Pro-Phe-Arg&#8221; in <a href=\"https:\/\/www.ncbi.nlm.nih.gov\/pubmed\/5120\">PMID 5120<\/a>. This peptide was not found in any other abstract, and was new to PubChem despite the 42 years since the paper was published in <i>Biochemistry<\/i>. It can now be found in PubChem at <a href=\"https:\/\/pubchem.ncbi.nlm.nih.gov\/compound\/134824750\">CID 134824750<\/a>. If you go there and scroll down to &#8220;Depositor Provided PubMed Citations&#8221;, you will find the link to the <a href=\"https:\/\/www.ncbi.nlm.nih.gov\/pubmed\/5120\">PubMed abstract<\/a>. The abstract describes the action of a peptidase on various peptides. On the right-hand side of the abstract page under &#8220;Related Information&#8221;, if you click on &#8220;PubChem Compound&#8221; you will <a href=\"https:\/\/www.ncbi.nlm.nih.gov\/pccompound?LinkName=pubmed_pccompound&#038;from_uid=5120\">see listed<\/a> two peptides mentioned in the text that were extracted by LeadMine, as well as an entry for bradykinin.<\/p>\n<p>To see the complete NextMove Software biologics dataset, click on &#8220;8699 Live Substances&#8221; on the <a href=\"https:\/\/pubchem.ncbi.nlm.nih.gov\/source\/NextMove%20Biologics\">source page<\/a>. 
For more information on using LeadMine to mine PubMed abstracts, see <a href=\"https:\/\/nextmovesoftware.com\/blog\/2018\/04\/09\/textmining-pubmed-abstracts-with-leadmine\/\">this post<\/a>. Obviously, there is nothing here specific to PubMed abstracts &#8211; other interesting datasets would be the patent literature or internal company documents.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Following on from the work described in the last post, I have put together a dataset of text-mined peptides which I&#8217;ve uploaded to PubChem. This involved extending our biopolymer grammar (which I use to textmine with LeadMine) and improving the ability of Sugar&#038;Splice to interpret the text-mined entities. The dataset consists of 5350 unique peptides &hellip; <a href=\"https:\/\/nextmovesoftware.com\/blog\/2018\/12\/18\/putting-pubmed-peptides-in-pubchem\/\" class=\"more-link\">Continue reading <span class=\"screen-reader-text\">Putting PubMed peptides in PubChem<\/span><\/a><\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[],"_links":{"self":[{"href":"https:\/\/nextmovesoftware.com\/blog\/wp-json\/wp\/v2\/posts\/2892"}],"collection":[{"href":"https:\/\/nextmovesoftware.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/nextmovesoftware.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/nextmovesoftware.com\/blog\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/nextmovesoftware.com\/blog\/wp-json\/wp\/v2\/comments?post=2892"}],"version-history":[{"count":17,"href":"https:\/\/nextmovesoftware.com\/blog\/wp-json\/wp\/v2\/posts\/2892\/revisions"}],"predecessor-version":[{"id":2912,"href":"https:\/\/nextmovesoftware.com\/blog\/wp-json\/wp\/v2\/posts\/2892\/revisions\/2912"}],"wp:attachment":[{"href":"https:\/\/nextmovesoftware.com\/blog\/wp-json\/wp\/v2\/media?parent=2892"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/nextmovesoftware.com\/blog\/wp-json\/wp\/v2\/categories?post=2892"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/nextmovesoftware.com\/blog\/wp-json\/wp\/v2\/tags?post=2892"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}