Natural Language Processing Approach to Extract Compound Information from PubChem
- DOI
- 10.2991/978-94-6463-294-1_6
- Keywords
- Natural language processing; SMILES; PubChem; Compounds; Properties; Python
- Abstract
PubChem is one of the largest and most comprehensive databases of its kind, containing information on millions of chemical compounds, including their structures, properties, and biological activities. A Natural Language Processing (NLP) approach to extracting compound information from PubChem has several advantages over manual methods, including improved accuracy and efficiency and the ability to handle large amounts of data. NLP makes it possible to process unstructured and semi-structured text, to identify chemical compound names, and to extract the relevant information associated with them. Simplified Molecular Input Line Entry Specification (SMILES) representations are also used in computational chemistry and drug discovery, where they can be used to predict properties of compounds such as their stability, reactivity, and toxicity. Researchers then use this information to design and optimize new drugs and chemical compounds. In this work, we extract compound information from PubChem using natural language processing in several steps: defining the target information, data acquisition, text pre-processing, named entity recognition, relation extraction, entity linking, and output generation. The results of this approach have the potential to greatly aid research efforts in chemistry, pharmacology, and other related fields.
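The abstract does not give implementation details, so the following is only a minimal Python sketch of how a few of the listed steps (text pre-processing, named entity recognition, entity linking, and output generation) might fit together. The lexicon `KNOWN_COMPOUNDS`, all function names, and the sample payload are hypothetical and introduced here for illustration; the dictionary lookup stands in for whatever recognition method the authors actually used. The URL follows the PubChem PUG REST pattern for property requests by compound name, and no network call is made.

```python
import json
import re
from urllib.parse import quote

PUG_REST = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"

# Hypothetical mini-lexicon standing in for a trained NER model.
KNOWN_COMPOUNDS = {"aspirin", "caffeine", "ibuprofen"}


def preprocess(text):
    """Text pre-processing: lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9\-]+", text.lower())


def recognize_compounds(tokens):
    """Named entity recognition by dictionary lookup (order kept, duplicates dropped)."""
    found = []
    for token in tokens:
        if token in KNOWN_COMPOUNDS and token not in found:
            found.append(token)
    return found


def property_url(name, properties):
    """Entity linking: build a PUG REST URL that resolves a name to PubChem properties."""
    return (
        f"{PUG_REST}/compound/name/{quote(name, safe='')}"
        f"/property/{','.join(properties)}/JSON"
    )


def parse_properties(payload):
    """Output generation: turn a PropertyTable JSON response into a list of dicts."""
    return json.loads(payload)["PropertyTable"]["Properties"]


# Hand-written payload in the shape PUG REST returns for a property request;
# it is illustrative data, not a recorded response.
SAMPLE_PAYLOAD = (
    '{"PropertyTable": {"Properties": [{"CID": 2244, '
    '"MolecularFormula": "C9H8O4", "MolecularWeight": "180.16"}]}}'
)

names = recognize_compounds(preprocess("Aspirin and caffeine are often combined."))
urls = [property_url(n, ["MolecularFormula", "MolecularWeight"]) for n in names]
records = parse_properties(SAMPLE_PAYLOAD)
```

In a real run, each URL in `urls` would be fetched and its response passed to `parse_properties`; relation extraction is omitted from this sketch because the abstract gives no detail on which relations are targeted.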
In conclusion, SMILES representations are a powerful tool for identifying chemical compounds. By representing the structure of a chemical compound as a string of characters, SMILES representations make it possible to process and analyze chemical compounds using computers, enabling scientists and researchers to make new discoveries and advancements in the field of chemistry.
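To make the "string of characters" point concrete, here is a small Python sketch that splits a SMILES string into tokens and counts the atoms written in it. The regular expression is a simplified pattern assumed for illustration (bracket atoms, the halogens Cl and Br, the common organic-subset and aromatic atoms, ring-closure digits, bonds, and branches); it is not a full SMILES parser and is not taken from the paper. The names `tokenize` and `element_counts` are hypothetical.

```python
import re
from collections import Counter

# Simplified token pattern; bracket atoms and two-letter halogens are tried first.
SMILES_TOKEN = re.compile(
    r"\[[^\]]+\]|Br|Cl|[BCNOPSFI]|[bcnops]|%\d{2}|\d|[=#\-+\\/().:~@*$]"
)

# Element symbol inside a bracket atom, after an optional isotope number.
BRACKET_ELEMENT = re.compile(r"\[\d*([A-Z][a-z]?|[a-z]{1,2})")


def tokenize(smiles):
    """Split a SMILES string into tokens, rejecting characters the pattern misses."""
    tokens = SMILES_TOKEN.findall(smiles)
    if "".join(tokens) != smiles:
        raise ValueError(f"unrecognized characters in SMILES: {smiles!r}")
    return tokens


def element_counts(smiles):
    """Count atoms written as atom symbols.

    Implicit hydrogens and hydrogens attached inside brackets are not counted.
    """
    counts = Counter()
    for token in tokenize(smiles):
        if token.startswith("["):
            match = BRACKET_ELEMENT.match(token)
            if match:
                counts[match.group(1).capitalize()] += 1
        elif token.isalpha():
            # Aromatic atoms are lowercase in SMILES; fold 'c' into 'C'.
            counts[token.capitalize()] += 1
    return counts


aspirin = "CC(=O)Oc1ccccc1C(=O)O"
aspirin_counts = element_counts(aspirin)  # 9 carbons and 4 oxygens, as in C9H8O4
```

For production work, a cheminformatics toolkit that validates valence and aromaticity would be a safer choice than a regular expression; the sketch only shows that ordinary string processing already recovers useful structure from a SMILES string.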
- Copyright
- © 2023 The Author(s)
- Open Access
- Open Access This chapter is licensed under the terms of the Creative Commons Attribution-NonCommercial 4.0 International License (http://creativecommons.org/licenses/by-nc/4.0/), which permits any noncommercial use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.
Cite this article
TY - CONF
AU - Rehan Khan
AU - Preenon Bagchi
AU - Krutanjali Patil
PY - 2023
DA - 2023/11/17
TI - Natural Language Processing Approach to Extract Compound Information from PubChem
BT - Proceedings of the International Conference on Advances in Nano-Neuro-Bio-Quantum (ICAN 2023)
PB - Atlantis Press
SP - 64
EP - 71
SN - 2468-5739
UR - https://doi.org/10.2991/978-94-6463-294-1_6
DO - 10.2991/978-94-6463-294-1_6
ID - Khan2023
ER -