Important Data Structures
This section provides a short description of important data structures used for input and output in functions provided by HQS Molecules. Many of these classes are implemented as Pydantic models. Pydantic provides data validation and parsing using Python type annotations. By leveraging Pydantic, the package ensures that input and output data are correctly formatted and validated, reducing the likelihood of errors and improving the robustness of the software.
The objects described in this section are:
- MolecularGeometry and Molecule representing molecules in 3D,
- MolecularFormula representing molecular formulas,
- the PubChem dataclass for the data from PubChem,
- Trajectory and MolecularFrequencies representing output of quantum-mechanical calculations that cannot be stored in elementary data types,
- ConformerEnsemble storing results of a conformer search: it is obtained by combining an initial search performed by CREST with a subsequent refinement of the conformer ensemble using techniques developed at HQS.
Representing Molecules in 3D
Representing Atomic Positions
The MolecularGeometry class contains atomic positions (in Å) and chemical element symbols. Objects of this type are commonly generated in HQS Molecules by reading an XYZ file. However, the class lacks information on charge and spin multiplicity, which are typically needed for quantum-chemical calculations.
Important attributes of MolecularGeometry objects are
- natoms (representing the number of atoms),
- symbols (returning a list of chemical element symbols),
- and positions (returning an N × 3 array of atomic positions).
Inspection of the class reveals further methods to update atomic positions and create copies of molecules, possibly with updated positions.
>>> from hqs_molecules import MolecularGeometry
>>> help(MolecularGeometry)
Internally, atoms are represented by a list of Atom objects. These are defined as named tuples containing the element symbol and the position, one tuple per atom. Note that this feature permits the atoms attribute to be used directly as input for PySCF calculations, as shown in the example below.
>>> from hqs_molecules import smiles_to_molecule
>>> from pyscf.gto import Mole
>>> hqs_mol = smiles_to_molecule("C=C")
>>> pyscf_mol = Mole(atom=hqs_mol.atoms)
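To make the named-tuple layout concrete, the following is a minimal sketch using only the standard library. The Atom class defined here is a stand-in written for illustration; its field names symbol and position are assumptions and may differ from the definitions in HQS Molecules.

```python
from typing import NamedTuple, Tuple


class Atom(NamedTuple):
    """Illustrative stand-in: an element symbol and a Cartesian position in Å."""

    symbol: str
    position: Tuple[float, float, float]


# Approximate water geometry, for illustration only.
atoms = [
    Atom("O", (0.000, 0.000, 0.117)),
    Atom("H", (0.000, 0.757, -0.469)),
    Atom("H", (0.000, -0.757, -0.469)),
]

natoms = len(atoms)                            # number of atoms
symbols = [atom.symbol for atom in atoms]      # list of element symbols
positions = [atom.position for atom in atoms]  # N x 3 nested sequence

# Each item also unpacks like a plain (symbol, position) tuple.
symbol, position = atoms[0]
```

Because every item behaves like an ordinary tuple of a symbol and three coordinates, a list of this shape matches one of the atom specifications that PySCF accepts.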
Molecules with Charge and Spin
Molecule is one of the most important classes in the HQS Molecules package. It is implemented as a subclass of MolecularGeometry, with the addition of charge and multiplicity fields. Objects of Molecule type are commonly returned by functions performing 2D to 3D structure conversion. An additional attribute is nelectrons, containing the number of electrons corresponding to the molecular composition and charge.
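The electron count follows from the composition and the charge: it is the sum of the atomic numbers minus the total charge. The sketch below illustrates this arithmetic with a hypothetical helper, count_electrons, and a small hand-written table of atomic numbers; it is not the implementation used in HQS Molecules.

```python
from typing import Sequence

# Atomic numbers for the few elements used below (illustrative subset).
ATOMIC_NUMBERS = {"H": 1, "C": 6, "N": 7, "O": 8}


def count_electrons(symbols: Sequence[str], charge: int) -> int:
    """Sum the nuclear charges and subtract the total molecular charge."""
    return sum(ATOMIC_NUMBERS[s] for s in symbols) - charge


water = count_electrons(["O", "H", "H"], charge=0)       # 8 + 1 + 1 = 10
hydroxide = count_electrons(["O", "H"], charge=-1)       # 8 + 1 + 1 = 10
ammonium = count_electrons(["N", "H", "H", "H", "H"], charge=1)  # 11 - 1 = 10
```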
Molecular formulas (such as H2O or OH−) and molecular structure representations (such as SMILES strings or Molfiles) always contain the total molecular charge, explicitly or implicitly. Therefore, it is vital to preserve the total charge together with three-dimensional representations of molecular structures.
In addition to the charge, quantum-chemical calculations usually also require a specification of the spin multiplicity. Unlike the charge, it is not necessarily straightforward to infer from a molecular structure. Therefore, None is permitted as a value for the field. Indeed, functions such as smiles_to_molecule or molfile_to_molecule never set the field to an integer value themselves.
Once the spin multiplicity is known, it can be set and validated for a Molecule object by using the set_multiplicity method.
>>> from hqs_molecules import smiles_to_molecule
>>> mol = smiles_to_molecule("CCO")
>>> print(mol.multiplicity)
None
>>> mol.set_multiplicity(1)
Molecule(atoms=[...], charge=0, multiplicity=1)
>>> print(mol.multiplicity)
1
Since the set_multiplicity method returns the object itself in addition to modifying it, calls such as mol = smiles_to_molecule("CCO").set_multiplicity(1) are possible.
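Which checks set_multiplicity performs is not detailed here. A common consistency rule is that the number of unpaired electrons (multiplicity minus one) must not exceed the electron count and must leave an even number of paired electrons. The sketch below applies that rule in a hypothetical stand-in class, SketchMolecule, and also shows the return-self pattern that enables chaining; it is an assumption-based illustration, not the library's code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SketchMolecule:
    """Hypothetical stand-in holding only what the check needs."""

    nelectrons: int
    multiplicity: Optional[int] = None

    def set_multiplicity(self, multiplicity: int) -> "SketchMolecule":
        unpaired = multiplicity - 1
        if multiplicity < 1 or unpaired > self.nelectrons:
            raise ValueError(f"multiplicity {multiplicity} is out of range")
        if (self.nelectrons - unpaired) % 2 != 0:
            raise ValueError(
                f"multiplicity {multiplicity} is inconsistent with "
                f"{self.nelectrons} electrons"
            )
        self.multiplicity = multiplicity
        return self  # returning the object itself allows chained calls


# Ethanol (C2H6O) has 26 electrons, so a singlet is accepted.
ethanol = SketchMolecule(nelectrons=26).set_multiplicity(1)
```

With an even electron count, an even multiplicity such as 2 would be rejected by this rule.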
Objects of type MolecularGeometry can be converted to Molecule instances using the to_molecule method, with the charge being mandatory and the multiplicity optional.
Molecular Formulas
Within HQS Molecules, molecular formulas are represented by MolecularFormula objects containing the elemental composition and the total charge. For example, formulas from PubChem are converted into this format:
>>> from hqs_molecules import PubChem
>>> pc = PubChem.from_name("Bicarbonate")
>>> pc.formula
MolecularFormula(natoms={'C': 1, 'H': 1, 'O': 3}, charge=-1)
The class implements __str__ as a conversion of the formula to a string in Hill notation:
>>> f"{pc.formula}"
'CHO3-'
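In Hill notation, carbon comes first and hydrogen second when carbon is present, followed by the remaining elements in alphabetical order; without carbon, all elements are listed alphabetically. The function below, hill_string, is a hypothetical sketch of that ordering. The way it writes charges of magnitude greater than one (digit followed by sign) is an assumption and not necessarily the format produced by MolecularFormula.

```python
from typing import Dict


def hill_string(natoms: Dict[str, int], charge: int = 0) -> str:
    """Format a composition in Hill order with a trailing charge suffix."""
    if "C" in natoms:
        # Carbon first, hydrogen second, remaining elements alphabetically.
        order = ["C"] + (["H"] if "H" in natoms else [])
        order += sorted(s for s in natoms if s not in ("C", "H"))
    else:
        # Without carbon, all elements (including hydrogen) are alphabetical.
        order = sorted(natoms)
    body = "".join(s + (str(natoms[s]) if natoms[s] > 1 else "") for s in order)
    if charge == 0:
        return body
    sign = "+" if charge > 0 else "-"
    # Assumption: larger charges are written as magnitude followed by sign.
    magnitude = str(abs(charge)) if abs(charge) > 1 else ""
    return body + magnitude + sign


bicarbonate = hill_string({"C": 1, "H": 1, "O": 3}, charge=-1)
water = hill_string({"O": 1, "H": 2})
```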
Users can easily create molecular formulas from a string input.
>>> from hqs_molecules import MolecularFormula
>>> formula = MolecularFormula.from_str("MnO4-")
>>> formula
MolecularFormula(natoms={'Mn': 1, 'O': 4}, charge=-1)
The from_str constructor can handle some degree of complexity (for example, "CH3COOH" is interpreted equivalently to "C2H4O2"), but it cannot process arbitrarily complicated semi-structural formulas. Note that isomers cannot be distinguished, as they have identical elemental compositions.
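As a rough illustration of why "CH3COOH" and "C2H4O2" are equivalent, the sketch below accumulates element counts from a flat formula string. The function parse_formula is hypothetical and far simpler than from_str: it assumes no parentheses, does not check that symbols are real elements, and reads the charge only from a trailing run of plus or minus signs.

```python
import re
from collections import Counter
from typing import Dict, Tuple


def parse_formula(text: str) -> Tuple[Dict[str, int], int]:
    """Parse a flat formula such as 'CH3COOH' or 'MnO4-' (no parentheses)."""
    # Element/count pairs, followed by an optional run of '+' or '-' signs.
    match = re.fullmatch(r"((?:[A-Z][a-z]?\d*)+)(\+*|-*)", text)
    if match is None:
        raise ValueError(f"unsupported formula: {text!r}")
    body, signs = match.groups()
    charge = signs.count("+") - signs.count("-")
    counts: Counter = Counter()
    for symbol, number in re.findall(r"([A-Z][a-z]?)(\d*)", body):
        # Repeated symbols are summed; a missing number means one atom.
        counts[symbol] += int(number) if number else 1
    return dict(counts), charge


acetic_acid = parse_formula("CH3COOH")
condensed = parse_formula("C2H4O2")
methyl_formate = parse_formula("HCOOCH3")  # an isomer of acetic acid
```

All three results are identical, which is the reason a formula alone cannot distinguish isomers.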
Data from PubChem
Results from PubChem queries are stored within instances of the PubChem class. Unlike most other classes described in this section, PubChem is implemented as a dataclass and not as a Pydantic model.
In practical use, instances of this class would normally be created using methods such as from_name or from_smiles. The retrieved data is stored in the fields of the class. A description can be found by executing:
>>> from hqs_molecules import PubChem
>>> help(PubChem)
Output of Quantum-Mechanical Calculations
Molecular Trajectories
Instances of the Trajectory class, as returned by geometry optimizations with xTB, contain two fields:
- a list of Molecule objects that is labeled structures,
- and the energies of each structure in a list labeled energies.
Convenience attributes are implemented for the following properties:
- Obtaining the number of structures through the length attribute.
- Obtaining the last structure and its energy via the attributes last and last_energy, respectively.
- Identifying the structure with the lowest energy and accessing the structure, its energy and its position in the trajectory with the attributes lowest, lowest_energy and lowest_step, respectively.
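The relationship between the two fields and the convenience attributes can be sketched with a small stand-in class. SketchTrajectory below is hypothetical: it stores plain labels instead of Molecule objects, and its zero-based lowest_step and its tie-breaking toward the earliest step are assumptions rather than documented behavior.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SketchTrajectory:
    """Hypothetical stand-in; structures are labels, not Molecule objects."""

    structures: List[str]
    energies: List[float]

    @property
    def length(self) -> int:
        return len(self.structures)

    @property
    def last(self) -> str:
        return self.structures[-1]

    @property
    def last_energy(self) -> float:
        return self.energies[-1]

    @property
    def lowest_step(self) -> int:
        # Index of the minimum energy; ties resolve to the earliest step.
        return min(range(len(self.energies)), key=self.energies.__getitem__)

    @property
    def lowest(self) -> str:
        return self.structures[self.lowest_step]

    @property
    def lowest_energy(self) -> float:
        return self.energies[self.lowest_step]


traj = SketchTrajectory(
    structures=["step0", "step1", "step2", "step3"],
    energies=[-5.02, -5.10, -5.13, -5.12],
)
```

In this example the last structure is not the one with the lowest energy, which is the situation the lowest, lowest_energy and lowest_step attributes are meant to cover.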
Vibrational Frequencies
A generic representation of computed vibrational frequencies and basic thermochemical properties is contained within the class MolecularFrequencies. Instances contain
- the total electronic energy (in the total_energy field)
- and a list of vibrational frequencies (in the field frequencies).
Note that the latter are represented as (real) floating-point numbers; by convention, imaginary frequencies are represented as negative numbers.
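The sign convention can be illustrated as follows. A harmonic frequency is proportional to the square root of a Hessian eigenvalue, so a negative eigenvalue gives an imaginary root; storing the root of the absolute value with the sign of the eigenvalue keeps the list real-valued. The helpers signed_root and count_imaginary are hypothetical, use arbitrary units, and are not taken from HQS Molecules or xTB.

```python
import math
from typing import List


def signed_root(eigenvalue: float) -> float:
    """Return sqrt(|x|) carrying the sign of x, so imaginary roots are negative."""
    return math.copysign(math.sqrt(abs(eigenvalue)), eigenvalue)


def count_imaginary(frequencies: List[float]) -> int:
    """Count entries that encode imaginary frequencies (negative values)."""
    return sum(1 for value in frequencies if value < 0)


eigenvalues = [-4.0, 9.0, 16.0]                      # arbitrary units
frequencies = [signed_root(x) for x in eigenvalues]  # [-2.0, 3.0, 4.0]
n_imaginary = count_imaginary(frequencies)           # 1
```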
Additionally, the MolecularFrequencies class defines
- a field thermochem with a list of BasicThermochemistry objects.
In addition to vibrational frequencies, programs such as xTB can calculate thermodynamic contributions via a rigid rotor and harmonic oscillator approximation. These contributions are temperature-dependent (while harmonic frequencies are not). Therefore, thermochemical corrections are stored in a list with one item per temperature value. Each BasicThermochemistry object contains fields enthalpy, entropy, gibbs_energy, and temperature (representing the temperature used to evaluate the aforementioned properties). Since these quantities are interdependent, only enthalpy, entropy and temperature are stored explicitly, while the Gibbs energy is recomputed upon being accessed.
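The interdependence is the standard relation G = H − T·S. A minimal sketch of storing three values and deriving the fourth on access is shown below; SketchThermochemistry is a hypothetical stand-in, and the numbers are arbitrary values in mutually consistent units rather than results of a calculation.

```python
from dataclasses import dataclass


@dataclass
class SketchThermochemistry:
    """Hypothetical stand-in: stores H, S and T; G is derived."""

    enthalpy: float     # energy units
    entropy: float      # energy units per kelvin
    temperature: float  # kelvin

    @property
    def gibbs_energy(self) -> float:
        # G = H - T * S, evaluated on every access.
        return self.enthalpy - self.temperature * self.entropy


thermo = SketchThermochemistry(enthalpy=-10.0, entropy=0.001, temperature=300.0)
g_300 = thermo.gibbs_energy

# Changing a stored value is reflected the next time the property is read.
thermo.temperature = 400.0
g_400 = thermo.gibbs_energy
```

Deriving the Gibbs energy instead of storing it means the three stored values can never disagree with it.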
Note that the Hessian matrix itself is not represented in the MolecularFrequencies class.
Conformer Search Results
Structures and energies of conformers determined via CREST are stored by HQS Molecules in a class ConformerEnsemble, which contains a list of Conformer objects.
Note that the grouping of conformer and rotamer structures as determined in the CREST calculation is ignored, and all the structures are regrouped by our own procedure, as described in the section on conformer search.
Further information on the attributes of the respective classes can be accessed from within Python:
>>> from hqs_molecules import Conformer, ConformerEnsemble
>>> help(Conformer)
>>> help(ConformerEnsemble)