Molecular Name Lookup
HQS Molecules provides API access to the PubChem database, permitting name-to-structure and structure-to-name searches directly from Python scripts.
IMPORTANT: If you use the PubChem interface, your data will be sent over the internet to the PubChem servers.
Name-to-Structure Search
Requests to PubChem are made via the PubChem data class, which stores the data retrieved after completion of each request. In the following example, a request is made using the molecule name:
>>> from hqs_molecules import PubChem
>>> pc = PubChem.from_name("2-methylprop-1-ene")
>>> print(pc.name)
Isobutylene
>>> print(pc.cid)
8255
>>> print(pc.formula)
C4H8
>>> print(pc.smiles)
CC(=C)C
>>>
As shown above, the data retrieved is stored in the attributes name for the compound name, smiles for the SMILES string, cid for the PubChem compound ID, and formula for the molecular formula (as a MolecularFormula object).
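The attributes can be gathered into a plain dictionary, for instance before writing results to a file. The following is a minimal sketch: summarize is a hypothetical helper and not part of HQS Molecules. It assumes only that the object exposes the four attributes listed above, and that str() on the formula object yields the same text that print() shows in the example. A stand-in object is used so that the sketch runs without a network request.

```python
from types import SimpleNamespace


def summarize(pc):
    """Collect the documented attributes of a PubChem-like object in a dict.

    Hypothetical helper, not part of HQS Molecules. It relies only on the
    attributes name, cid, formula and smiles.
    """
    return {
        "name": pc.name,
        "cid": pc.cid,
        # formula is described as a MolecularFormula object; str() is assumed
        # to give the text that print() displays.
        "formula": str(pc.formula),
        "smiles": pc.smiles,
    }


# Stand-in carrying the values from the example above. With the library
# installed, the result of PubChem.from_name("2-methylprop-1-ene") would be
# passed to summarize() instead.
example = SimpleNamespace(
    name="Isobutylene", cid=8255, formula="C4H8", smiles="CC(=C)C"
)
print(summarize(example))
```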
Note: the name stored in the name attribute may differ from the argument supplied for the request. Most compound entries on PubChem list multiple synonymous names, but the name attribute contains the representative title name selected on PubChem.
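To see which title name each query resolves to, the query and the returned name can be recorded side by side. The sketch below is illustrative only: title_names and fake_lookup are hypothetical names, and the lookup function is passed in as an argument so that the example runs offline. Since the exceptions raised by the library on a failed request are not documented here, the sketch catches Exception broadly, which is an assumption rather than a recommendation.

```python
from types import SimpleNamespace


def title_names(queries, lookup):
    """Map each query string to the title name reported by `lookup`.

    `lookup` is any callable that takes a name and returns an object with a
    `name` attribute, such as PubChem.from_name. Failed lookups map to None.
    """
    result = {}
    for query in queries:
        try:
            result[query] = lookup(query).name
        except Exception:  # the library's exception types are not assumed
            result[query] = None
    return result


def fake_lookup(query):
    """Offline stand-in for PubChem.from_name (hypothetical, for illustration)."""
    titles = {"2-methylprop-1-ene": "Isobutylene", "isobutylene": "Isobutylene"}
    return SimpleNamespace(name=titles[query.lower()])  # KeyError if unknown


resolved = title_names(
    ["2-methylprop-1-ene", "Isobutylene", "not-a-molecule"], fake_lookup
)
for query, title in resolved.items():
    print(f"{query!r} -> {title!r}")
```

With the library installed, PubChem.from_name would be passed in place of fake_lookup; each call then sends a request to the PubChem servers.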
Structure-to-Name Search
Likewise, a reverse search can be performed using a SMILES string:
>>> pc = PubChem.from_smiles("COC1=C(C=CC(=C1)C=O)O")
>>> print(pc.name)
Vanillin
>>>
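The two search directions can be combined into a simple consistency check: look up a name, feed the resulting SMILES string back into the reverse search, and compare the two title names. The following is a rough sketch under stated assumptions: round_trip_names is a hypothetical helper, and the two stand-in functions replace PubChem.from_name and PubChem.from_smiles so that the code runs without network access. Whether both names agree for a real compound depends on the PubChem data and is not guaranteed.

```python
from types import SimpleNamespace

VANILLIN_SMILES = "COC1=C(C=CC(=C1)C=O)O"


def round_trip_names(query, from_name, from_smiles):
    """Return (forward name, reverse name) for name -> SMILES -> name.

    `from_name` and `from_smiles` are callables returning objects with
    `name` and `smiles` attributes, such as the PubChem class methods.
    """
    forward = from_name(query)
    reverse = from_smiles(forward.smiles)
    return forward.name, reverse.name


# Offline stand-ins (hypothetical): both return the same fixed record.
def fake_from_name(name):
    return SimpleNamespace(name="Vanillin", smiles=VANILLIN_SMILES)


def fake_from_smiles(smiles):
    return SimpleNamespace(name="Vanillin", smiles=smiles)


forward_name, reverse_name = round_trip_names(
    "vanillin", fake_from_name, fake_from_smiles
)
print(forward_name == reverse_name)
```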
Search by PubChem Compound Identifier
Finally, requests can be made using the PubChem compound identifier:
>>> pc = PubChem.from_cid(145742)
>>> print(pc.name)
Proline
>>> print(pc.smiles)
C1C[C@H](NC1)C(=O)O
>>>
An important feature of the PubChem class is that it verifies the depositor of individual data items retrieved. Data is only made available if PubChem is listed as the data source.
As shown in the section on 2D to 3D structure conversion, a SMILES string can be used to generate a three-dimensional representation.