Structure Conversion
An important feature of HQS Molecules is the conversion of two-dimensional molecular structure representations into three-dimensional geometries, as illustrated here for the glucose molecule.
Simple 2D to 3D Conversion
Input: The functions smiles_to_molecule and molfile_to_molecule accept SMILES strings and Molfiles as input, respectively.
Output: Both functions return a Molecule object containing three-dimensional atomic coordinates. The returned object also contains the overall molecular charge, which can always be determined from a Molfile or a SMILES string. The multiplicity field is set to None, since it cannot be derived unambiguously from the input.
Structure conversion uses RDKit as the first choice and falls back to Open Babel if the conversion with RDKit fails. After a three-dimensional structure has been generated, a bonding graph is determined from interatomic distance criteria and used to verify the generated structure against the input. If either the composition or the bonding graph does not match the input, the structure is rejected.
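The distance-based verification can be illustrated with a minimal sketch that derives a bonding graph from 3D coordinates and compares it with the bonds of the input. It assumes a common heuristic (two atoms are bonded if their distance does not exceed the sum of their covalent radii plus a tolerance), approximate radii for H, C, N, and O only, and an atom ordering that is preserved between input and output, so no graph isomorphism search is needed. The functions bonding_graph and structure_matches are hypothetical and not part of HQS Molecules. Only the bonding-graph part of the check is shown.

```python
import itertools
import math

# Approximate single-bond covalent radii in angstrom (assumed values for this sketch).
COVALENT_RADII = {"H": 0.31, "C": 0.76, "N": 0.71, "O": 0.66}


def bonding_graph(symbols, coords, tolerance=0.4):
    """Return the set of bonded index pairs (i, j), i < j, derived from 3D coordinates.

    Assumed criterion: two atoms are bonded if their distance is at most the sum
    of their covalent radii plus a tolerance (all values in angstrom).
    """
    bonds = set()
    for i, j in itertools.combinations(range(len(symbols)), 2):
        cutoff = COVALENT_RADII[symbols[i]] + COVALENT_RADII[symbols[j]] + tolerance
        if math.dist(coords[i], coords[j]) <= cutoff:
            bonds.add((i, j))
    return bonds


def structure_matches(symbols, coords, expected_bonds):
    """Compare the distance-based graph with the input bonds (same atom order assumed)."""
    expected = {tuple(sorted(bond)) for bond in expected_bonds}
    return bonding_graph(symbols, coords) == expected


# Water: one O and two H atoms with O-H distances of about 0.96 angstrom.
symbols = ["O", "H", "H"]
coords = [(0.0, 0.0, 0.0), (0.96, 0.0, 0.0), (-0.24, 0.93, 0.0)]
print(structure_matches(symbols, coords, [(0, 1), (0, 2)]))  # True
print(structure_matches(symbols, coords, [(0, 1), (1, 2)]))  # False
```

A fixed tolerance keeps the sketch simple. Real implementations may use element-specific or relative tolerances.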
If the input structure is stored in a Molfile named "my_molecule.mol", the conversion is performed by calling:
>>> from hqs_molecules import molfile_to_molecule
>>> mol = molfile_to_molecule("my_molecule.mol")
Likewise, a molecular structure can be converted from a SMILES string, for example one obtained from PubChem:
>>> from hqs_molecules import PubChem, smiles_to_molecule
>>> pc = PubChem.from_name("propane")
>>> mol = smiles_to_molecule(pc.smiles)
Either conversion function can optionally check the result against a molecular formula supplied as an argument. The conversion fails if the input does not match the provided formula. For this check, the formula must be given as a MolecularFormula object.
>>> from hqs_molecules import MolecularFormula, PubChem, smiles_to_molecule
>>> pc = PubChem.from_name("propane")
>>> # succeeds
>>> mol = smiles_to_molecule(pc.smiles, formula=pc.formula)
>>> # raises an exception
>>> mol = smiles_to_molecule(pc.smiles, formula=MolecularFormula.from_str("C3H7-"))
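Conceptually, the formula check parses the formula into element counts plus a net charge and compares both with the converted structure. The following sketch uses the hypothetical helpers parse_formula and check_formula and its own formula notation (an optional trailing sign with an optional magnitude, as in "C3H7-"). It does not reproduce MolecularFormula, whose parsing rules may differ.

```python
import re
from collections import Counter

ELEMENT_RE = re.compile(r"([A-Z][a-z]?)(\d*)")
CHARGE_RE = re.compile(r"([+-])(\d*)$")


def parse_formula(text):
    """Parse a formula such as "C3H8" or "C3H7-" into (element counts, net charge).

    The notation (optional trailing sign, optionally followed by a magnitude)
    is an assumption of this sketch.
    """
    charge = 0
    match = CHARGE_RE.search(text)
    if match:
        magnitude = int(match.group(2)) if match.group(2) else 1
        charge = magnitude if match.group(1) == "+" else -magnitude
        text = text[: match.start()]
    counts = Counter()
    position = 0
    for element in ELEMENT_RE.finditer(text):
        # Reject any characters skipped between element tokens.
        if element.start() != position:
            raise ValueError(f"unexpected character at position {position}")
        counts[element.group(1)] += int(element.group(2) or 1)
        position = element.end()
    if position != len(text):
        raise ValueError(f"unexpected character at position {position}")
    return counts, charge


def check_formula(composition, charge, formula):
    """Raise ValueError if a structure's composition or charge differs from the formula."""
    expected_counts, expected_charge = parse_formula(formula)
    if Counter(composition) != expected_counts or charge != expected_charge:
        raise ValueError(f"structure does not match formula {formula!r}")


propane = {"C": 3, "H": 8}
check_formula(propane, 0, "C3H8")  # passes silently
try:
    check_formula(propane, 0, "C3H7-")
except ValueError as exc:
    print(exc)  # structure does not match formula 'C3H7-'
```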
Utilities for RDKit
The HQS Molecules module includes convenience functions to create RDKit Mol objects from SMILES strings or Molfiles. Mol objects are RDKit's representation of molecular information.
Both the smiles_to_rdkit and molfile_to_rdkit functions accept an argument addHs. It defaults to True, which causes explicit hydrogen atoms to be added to the generated object. Setting addHs=False suppresses the addition of explicit hydrogens; only hydrogen atoms already represented explicitly within a Molfile are retained.
>>> from hqs_molecules import smiles_to_rdkit
>>> # The object generated contains 11 atoms.
>>> rdkit_mol = smiles_to_rdkit("CCC")
>>> # The object generated contains 3 atoms.
>>> rdkit_mol = smiles_to_rdkit("CCC", addHs=False)
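The difference between 11 and 3 atoms comes from implicit hydrogens. In SMILES, hydrogens on organic-subset atoms written outside brackets are implied by default valences, and adding them explicitly fills each atom up to its valence. The toy counter below shows this arithmetic for unbranched, single-bonded chains only (no rings, branches, charges, or multiple bonds). The function count_atoms_linear_chain is hypothetical, its add_hs flag only mirrors the role of addHs, and it is not how RDKit perceives hydrogens in general.

```python
# Default valences for a few organic-subset atoms (assumed for this toy example).
DEFAULT_VALENCE = {"C": 4, "N": 3, "O": 2}


def count_atoms_linear_chain(smiles, add_hs=True):
    """Count atoms for an unbranched, single-bonded chain such as "CCO".

    With add_hs=True, each heavy atom contributes default_valence - neighbors
    hydrogens, mimicking the conversion of implicit into explicit hydrogens.
    """
    atoms = list(smiles)
    for symbol in atoms:
        if symbol not in DEFAULT_VALENCE:
            raise ValueError(f"unsupported atom {symbol!r} in toy parser")
    n = len(atoms)
    if not add_hs:
        return n
    hydrogens = 0
    for i, symbol in enumerate(atoms):
        neighbors = (i > 0) + (i < n - 1)  # chain neighbors to the left and right
        hydrogens += DEFAULT_VALENCE[symbol] - neighbors
    return n + hydrogens


print(count_atoms_linear_chain("CCC"))                # 11 (C3H8)
print(count_atoms_linear_chain("CCC", add_hs=False))  # 3
```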
Expert Usage
The functions described in the remainder of this section are intended for expert use only.
An RDKit Mol object can be converted to a three-dimensional structure by passing it to the function rdkit_to_molecule. This low-level function calls RDKit without falling back to Open Babel. Nonetheless, it performs a consistency check on the generated structure. If the RDKit Mol object was created without adding explicit hydrogens (addHs=False), this conversion may fail due to a composition mismatch.
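One way to picture the composition mismatch: suppose the expected composition includes the implicit hydrogens of the molecule (C3H8 for propane), while the generated structure contains only the atoms explicitly present in the Mol object. Then the hydrogen counts cannot agree. That the check compares against a composition including implicit hydrogens is an assumption of this sketch, and the helper composition_mismatch is hypothetical.

```python
from collections import Counter


def composition_mismatch(generated_symbols, expected):
    """Return {element: (generated count, expected count)} for every differing element."""
    generated = Counter(generated_symbols)
    expected = Counter(expected)
    return {
        element: (generated[element], expected[element])
        for element in sorted(generated.keys() | expected.keys())
        if generated[element] != expected[element]
    }


propane = {"C": 3, "H": 8}
# Structure generated from a Mol without explicit hydrogens (assumed: heavy atoms only).
print(composition_mismatch(["C", "C", "C"], propane))               # {'H': (0, 8)}
# Structure generated after adding explicit hydrogens.
print(composition_mismatch(["C"] * 3 + ["H"] * 8, propane))         # {}
```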
Low-level functions that perform structure conversion using only Open Babel are available as smiles_to_molecule_obabel and molfile_to_molecule_obabel. They take a SMILES string or a Molfile as input, respectively. These functions also perform a consistency check on the generated structure.
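The high-level behavior described at the start of this section (RDKit first, Open Babel as a backup) can be pictured as a simple fallback chain over backends that either return a structure or raise. In this sketch, convert_with_fallback, ConversionError, fake_rdkit, and fake_obabel are hypothetical stand-ins. This is not the library's implementation.

```python
from typing import Callable, Sequence


class ConversionError(Exception):
    """Raised when no backend produces an accepted structure (hypothetical)."""


def convert_with_fallback(smiles: str, backends: Sequence[Callable[[str], object]]):
    """Try each backend in order and return the first successful result."""
    errors = []
    for backend in backends:
        try:
            return backend(smiles)
        except Exception as exc:  # each backend signals failure by raising
            errors.append(f"{getattr(backend, '__name__', backend)}: {exc}")
    raise ConversionError("; ".join(errors))


def fake_rdkit(smiles):
    # Stand-in for an RDKit-based converter whose embedding fails.
    raise ValueError("embedding failed")


def fake_obabel(smiles):
    # Stand-in for an Open Babel-based converter that succeeds.
    return f"3D structure for {smiles}"


print(convert_with_fallback("CCC", [fake_rdkit, fake_obabel]))  # 3D structure for CCC
```

Collecting each backend's error message lets the final exception explain why every attempt failed, not just the last one.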