Introduction
The HQS NMR Tool is the first package in the HQStage module HQS Spectrum Tools.
With the HQS NMR Tool you can calculate and analyze NMR spectra for molecules or other spin systems.
The tool is a Python-based library with a python API
and uses RUST and C++ components for performance critical calculations.
Our NMR spectrum solvers are designed for both speed and precision. They are based on a very efficient implementation in the frequency domain. With a variety of solver strategies at your disposal, you can leverage symmetries and clustering techniques either fully automated or customized according to your specifications. This flexibility allows you to tailor your approach based on the specific characteristics of your system, optimizing performance without sacrificing accuracy.
The HQS NMR Tool is also an integral part of HQSpectrum, our end-to-end solution for NMR Spectra analysis. If you are interested in using this cloud service please request access by sending an email to hqspectrum@quantumsimulations.de.
If you are interested in the theory behind NMR or the math of how to calculate NMR spectra, take a look at the background chapter.
Getting started
To use the python API
follow the installation steps outlined here.
For some of the functionalities of the HQS NMR Tool (especially the provided example NMR parameters), the RDKit cheminformatics package is required. It is strongly recommended to install it via the conda-forge channel to ensure the access to its latest version. Additionally it
is highly recommended to install the Intel oneAPI Math-Kernel-Library (MKL) as otherwise performance will be critically impaired.
However, these packages have to be installed manually. The easiest way to install these libraries is running the following command in an active conda
environment:
conda install rdkit mkl
If you do not have conda available you can also install the packages via pip. This is not recommended as currently only an older version of RDkit
is supported via pip and could cause issues in the future. Also it requires a numpy version smaller numpy 2.0:
pip install "numpy<2.0"
pip install rdkit
You can also install mkl
via pip, however this requires also to set a symbolic link. Both steps can be conveniently performed using the python script ensure_mkl.py
available with this repo. Just run:
python ensure_mkl.py
Usage
After you have installed the package the best way to learn about it is to go through our example notebooks. You can find information on how to download and run the examples in the basic usage section of our HQStage documentation.
Alternatively, you can get an overview of the HQS NMR Tool Python library in the components chapter.
Background
This chapter provides an overview of the fundamentals behind the HQS NMR tool.
In section NMR, we describe the basics of nuclear magnetic resonance (NMR) from an experimental and theoretical perspective. We discuss, for example, the different relevant parameters entering an NMR calculation, such as the shifts or couplings between nuclear spins, and introduce the relevant Hamiltonian of a molecule for NMR.
In Math, we describe in greater detail the math behind calculation of spectra in the HQS NMR Tool.
Finally, in the section Solver, we discuss approaches for evaluating NMR spectra of large molecules.
NMR
Nuclear magnetic resonance (NMR) spectroscopy is one of the most important analytical techniques in chemistry and related fields. It is widely used to identify molecules, but also to obtain information about their structure, dynamics, and chemical environment. Some fundamental aspects of NMR are summarized below.
NMR spectrometers place the sample into a strong, but constant magnetic field and use a weak, oscillating magnetic field to perturb the nuclei. At or near resonance, when the oscillation frequency matches the intrinsic frequency of a nucleus, the system responds by producing an electromagnetic signal with a frequency characteristic of the magnetic field at the respective nucleus.
Zeeman interaction
For the simulation of NMR spectra for molecules, the most important part is the Zeeman effect describing the interaction of the nuclear spin with the external magnetic field . The corresponding Hamiltonian can be written as
where is the gyromagnetic ratio, the ratio of a system's magnetic moment to its angular momentum, and is the total spin operator. Assuming that the strong (and constant) magnetic field is in -direction only, meaning , the Hamiltonian simplifies to
where is the so-called Larmor frequency, the angular frequency corresponding to the precession of the spin magnetization around the magnetic field at the position of the nucleus.
As an example, consider the simplest nucleus, H, consisting of only one proton, for which the gyromagnetic ratio is MHz T, meaning that a 500 MHz NMR spectrometer has a static magnetic field of about 11.7 Tesla. The energy of radiation of the Larmor frequency MHz ( J) is several orders of magnitude smaller than the average thermal energy of a molecule at a temperature of K ( J). Therefore, the occupations of the spin states are almost equal at room temperature, only a small surplus is responsible for the sample magnetization.
Chemical shift
Perhaps the most important aspect for NMR spectroscopy in chemistry is that the nuclei in molecules are shielded against the external magnetic field by the electrons surrounding them. This can be expressed by adding a correction term to the Hamiltonian as
where is referred to as the shielding tensor quantifying the change in the local magnetic field experienced by the nucleus in the molecule relative to a bare nucleus in vacuum. However, if the molecules of interest are in solution, or in liquid phase in general, they can rotate freely and only the isotropic chemical shift is of interest,
In practice, chemical shifts are normally used instead of chemical shieldings: instead of invoking the Larmor frequency of a nucleus in vacuum, shifts are defined with respect to the resonance frequency of a reference compound:
The standard reference for H-NMR is the Larmor frequency of the protons in TMS [tetramethylsilane, Si(CH)]. Chemical shifts are normally reported on a scale of ppm (parts per million): most H chemical shifts are observed in the range between 0 and 10 ppm, and most C chemical shifts between 0 and 200 ppm. Since the scale of chemical shieldings is so small in absolute terms , for practical intents and purposes the chemical shift can be substituted directly into the Hamiltonian:
Spin-spin coupling
Up to this point, the nuclear spins have been regarded to be isolated from each other. However, their magnetic moments have an effect on neighboring spins. The interaction of the nuclear spins can happen through two different mechanisms. The first one is the direct (or through space) spin-spin coupling, where the interaction strength depends on the distance of the two nuclei and the angle of their distance vector relative to the external field. As it comes from the direct interaction of two magnetic dipoles, it is also referred to as dipolar coupling. However, the effect is generally not observable in liquid phase since the free rotation of the molecules averages over all orientations and thus results in a vanishing average coupling.
An effect observable in the NMR spectrum is indirect spin-spin coupling, which is mediated by the electrons of a chemical bond. Due to the Pauli principle, the electrons of a covalent bond always have an anti-parallel spin orientation, and one electron will be closer to one nucleus than to the other, preferring an anti-parallel orientation with the nearby nucleus. Depending on the number of electrons involved in the transmission of the interaction, either a parallel or an anti-parallel orientation of two nuclei may result in a lower energy. Importantly, this interaction does not average out in solution since it mainly depends on the electron density at the position of the nucleus and not on the orientation of the distance vector relative to the field, which is why it is also referred to as scalar coupling. Since only s-orbitals have a finite electron density at the nucleus, the coupling depends on the electron density in those orbitals alone.
The interaction Hamiltonian in the case of homonuclear coupling is given by
where . In heteronuclear coupling, where the difference in Larmor frequencies is much larger in magnitude than the corresponding coupling constant, , the Hamiltonian can be written in terms of the -components only,
It should be noted that the -coupling tensor is a real matrix that depends on the molecular orientation, but in liquid phase only its isotropic part is observed due to motional averaging. Typical -coupling strengths between protons in H-NMR amount to a few Hz.
NMR spin Hamiltonian for molecules in liquid phase
The spin Hamiltonian in a static magnetic field in frequency units (rad s) is given by
where the sum runs over all nuclear spins of interest.
There are several interactions that have not been taken into account here. As already mentioned, the direct dipolar spin-spin interaction vanishes in liquids due to motional averaging. Beyond dipolar coupling, such as quadrupolar interactions, for instance, are relevant only for nuclei with spin quantum number . Furthermore, interactions with unpaired electrons need a special treatment as well. While most organic compounds are diamagnetic (closed-shell), paramagnetic NMR also exists.
Math
The hamiltonian we are using for the NMR systems is of the form
where are the gyromagnetic factors and the chemical shifts of nuclear spin , and denotes the coupling between spins and , and , with being the usual spin operators.
Within NMR we have a strong magnetic field in the -direction, and electromagnetic pulses / oscillating fields are applied to flip the spins into the plane. Since is typically of the order of 500Mhz, the pulses of 10kHz bandwidth, and the required resolution is sub 1Hz, we refrain from modelling the explicit time dependence of the pulses. Instead we model the pulses directly by calculating the spectral function, i.e., time-dependent correlations between the corresponding operators.
The spectrum measured in an NMR experiment corresponds to the spectral function calculated in the HQS NMR Tool.
Calculation of the spectral function
We calculate the spectral function as the imaginary part of the spin-spin correlation function
the operators, , contain the gyromagnetic factors for convenience, and is the real time dependence.
The contribution of an individual nuclear spin to the full NMR spectrum is obtained via
while the full NMR signal is the sum of individual contributions.
Temperature
For a typical NMR setup, the temperature is much larger compared to the energy scales in the NMR Hamiltonian. Thus, we have to take finite or even infinite temperature into account. We define the partition sum
with the inverse temperature and arrive at
Here, the last equation is a Lehmann-type spectral representation, which is suitable if the complete spectrum is known. are the eigenenergies of the spin NMR-Hamiltonian.
Resolvent formulation
Instead of dealing with the delta function directly we can also rewrite the Green's function by introducing a convergence ensuring factor (broadening) and obtain
The broadening is formally necessary to ensure the convergence of the Fourier integral. In practice, it corresponds to the unknown or neglected noise, whether intrinsic or due to the resolution limitations of the spectrometer.
Energy rediscretization
One problem that arises in the resolvent formulation is that the NMR peaks are usually very sharp. Discretizing the frequency axis with a very fine grid is computationally ineffective. Therefore, we perform several rounds of computing the spectral function. We start with a linear grid in frequency space and a rather large artificial broadening . Then we iteratively rediscretize the frequency space in equal weight partitions of the total spectral function while reducing in each step until we reach the desired broadening. In this way, we obtain a non-linear adaptive grid with accumulation points at the spectral function peaks.
Calculating NMR Spectra
While using the resolvent formulation for the Green's function
is useful to resolve sharp features in an NMR spectrum, it does not address the problem of the exponential scaling of the Hamiltonian dimension. Evaluating this expression in a brute force approach would imply diagonalizing a Hamiltonian of dimension where is the number of spins in the molecule. This would restrict one to around 10 spins on a modern laptop. Therefore, different strategies need to be employed to evaluate NMR spectra for larger molecules.
Symmetry
First one can use the fact that the NMR Hamiltonian always conserves the quantum number, which means that we can identify a block diagonal structure in the Hamiltonian, where each block can be diagonalized individually. While this is always possible for the standard NMR Hamiltonian, it only restricts the dimension of the largest block to which still grows rather quickly.
For some molecules one can also identify magnetically equivalent groups. Such a symmetry group is defined as a group of spins, where each individual spin has the same chemical shift and couples in the exact same way to the rest of the system. Identifying these groups is advantageous, as one can combine them into higher order spin representations. This allows to exploit the local SU(2) symmetry of these groups, splitting the Hamiltonian into even smaller blocks. As an example consider a propane molecule, which has eight hydrogen atoms. By identifying all symmetrically coupled groups in this molecule, the number of spins can be reduced to two. One being the CH2 group which has a combined spin representation of one and the second being the two methyl groups each representing a spin 3/2 and adding up to a total spin representation of spin 3.
While these symmetry considerations are exact and can lead to a reasonable reduction in computational efford, they eventually break down when going to larger and larger molecules. Therefore, also approximate methods have to be used.
Clustering methods
The main approximation method used in the HQS NMR Tool
is based on the observation that we do not just evaluate the full spectral function at once, but rather determine the contribution of each individual spin separately. We should therefore write the Green's function from which we obtain the spectral function as
where and are the indices of the individual spin contributions. This allows us to identify for each spin an effective Hamiltonian and evaluate the spectral function for it independently from the other spin contributions. It turns out that a good approximation for the Hamiltonian for a specific spin contribution is typically given by simply identifying the cluster of spins most strongly coupled to the spin of interest. A good measure for this coupling is the following weight matrix, which is motivated from perturbation theory and can be analyzed using methods from graph theory:
Here, are the entries in the J coupling matrix connecting sites and , the gyromagnetic ratios, the chemical shift values, and the magnetic field strength.
Especially at high field, this method is extremely accurate allowing to choose cluster sizes around 8-12 spins for basically all molecules.
Program components
The following components will enable you to quickly and conveniently run NMR spectra calculations using the HQS NMR Tool Python library:
- The molecule input to define parameters for NMR calculations.
- Different datatypes to store the input and output of NMR calculations.
- A convenience function for quick and easy spectra calculations.
While these modules are typically all you need, you may also browse through the API-documentation to check out all available features. However, we recommend to first go through the example notebooks to get an idea of how the package is typically used.
Input structure for molecular data
In this section we will introduce the molecular data structure used as input in the HQS NMR Tool. It involves:
- Chemical name and molecular formula
- Chemical structures
- Conditions of the experiment/simulation
- NMR parameters
Some groups of molecules are already included in our internal data sets. In addition external data can be employed for setting up an NMR calculation.
In the following, the Pydantic classes related to molecular data as well as how to provide external data will be explained.
Representation in Python of molecular data structures
Inside the hqs_nmr_parameters
package, the Pydantic MolecularData
class has been implemented to
describe a molecule. It contains the following attributes:
name
: The name of the molecule. Whenname
is not provided in the YAML input, the name of the YAML file is used.isotopes
: List containing pairs of an atom index and the associated isotope. Atom indices are associated with the order in which atoms appear in the chemical structure representations, starting by index 0.shifts
: List containing pairs of an atom index and the associated chemical shift.j_couplings
: List containing pairs of atomic indices and the associated J-coupling values. Note that atom index pairs are unique: if a value is provided for an atom pair (k, l), then no value is provided for pair (l, k).structures
: Contains a dictionary with chemical structure representations. It accepts entries for a SMILES string ("SMILES"), a Molfile ("Molfile"), and an XYZ file ("XYZ"). Each value will correspond to aChemicalStructure
object. See below a full explanation of the chemical structure representations and the atom numeration.formula
: The molecular formula of the molecule.temperature
: An optional temperature definition.solvent
: Name of the solvent. An empty string represents an unknown or undefined solvent, or the absence of a solvent.description
: Optional further information.method_json
: Stores a JSON serialization of computational method settings. An empty string indicates that the field is not applicable. Creating and interpreting the content is the responsibility of the user of the model.
This is best illustrated by an example from one of our available data sets:
from hqs_nmr_parameters.examples import molecules
parameters = molecules["C2H5OH"]
print(type(parameters))
print("Chemical name:", parameters.name)
print("Molecular Formula:", parameters.formula)
print("Type(s) of representations:", parameters.structures.keys())
print("Information about solvent:", parameters.solvent)
print("Shifts:", parameters.shifts)
print("###")
print(parameters.description)
<class 'hqs_nmr_parameters.code.data_classes.MolecularData'>
Chemical name: Ethanol
Molecular Formula: C2H6O
Type(s) of representations: dict_keys(['SMILES'])
Information about solvent: chloroform
Shifts: [(3, 1.25), (4, 1.25), (5, 1.25), (6, 3.72), (7, 3.72), (8, 1.32)]
###
1H parameters for ethanol in CDCl3.
Shifts from: https://doi.org/10.1021/om100106e
J-couplings estimated.
In this case, we are getting information for the ethanol molecule, for which we have stored a SMILES representation (check below for details) and according to the description experimental 1H-NMR parameters in chloroform.
NMR parameters subset
The best way to check the NMR data is getting only the information necessary to perform an NMR calculation, i.e., a subset of parameters with the attributes isotopes
, shifts
, and j_couplings
. Those can be obtained using the spin_system
function of MolecularData
:
from pprint import pprint
from hqs_nmr_parameters.examples import molecules
parameters = molecules["C2H5OH"]
nmr_parameters = parameters.spin_system()
print(type(nmr_parameters))
pprint(nmr_parameters.model_dump())
<class 'hqs_nmr_parameters.code.data_classes.NMRParameters'>
{'isotopes': [(1, 'H'), (1, 'H'), (1, 'H'), (1, 'H'), (1, 'H'), (1, 'H')],
'j_couplings': [((0, 3), 7.0),
((0, 4), 7.0),
((1, 3), 7.0),
((1, 4), 7.0),
((2, 3), 7.0),
((2, 4), 7.0)],
'shifts': [1.25, 1.25, 1.25, 3.72, 3.72, 1.32]}
As we can see, nmr_parameters
is an instance of the Pydantic NMRParameters
class, where the active nuclei have been renumbered starting from 0. It is used as input for several functions to calculate an NMR spectrum. To perform an NMR simulation, we can do the following:
from hqs_nmr_parameters.examples import molecules
from hqs_nmr import calculate_spectrum
parameters = molecules["C2H5OH"]
nmr_parameters = parameters.spin_system()
print(type(nmr_parameters)) # NMRParameters
spectrum = calculate_spectrum(molecule_parms=nmr_parameters, frequency=400.0)
print(type(spectrum)) # Spectrum
<class 'hqs_nmr_parameters.code.data_classes.NMRParameters'>
<class 'hqs_nmr.datatypes.Spectrum'>
For getting a better insight into NMRParameters
and the output object Spectrum
have a look here.
Isotopes/atoms selection
MolecularData
objects can contain data for several isotopes. However, we are often interested in creating a spin system for a given nuclei only. For example, if we want to simulate a 1H-NMR spectrum but we also have 13C-NMR parameters in our data, it is possible either to select only 1H using keep_nuclei(isotopes=["1H"])
or to drop 13C using drop_nuclei(isotopes=["13C"])
. Let us take a look at this in more detail:
from pprint import pprint
from hqs_nmr_parameters.examples import molecules
parameters = molecules["CH3Cl_13C"]
pprint(parameters.isotopes)
drop_C = parameters.drop_nuclei(isotopes=["13C"])
keep_H = parameters.keep_nuclei(isotopes=["1H"])
print(drop_C == keep_H)
nmr_parameters_1H = drop_C.spin_system()
pprint(nmr_parameters_1H.model_dump())
[(0, Isotope(mass_number=13, symbol='C')),
(2, Isotope(mass_number=1, symbol='H')),
(3, Isotope(mass_number=1, symbol='H')),
(4, Isotope(mass_number=1, symbol='H'))]
True
{'isotopes': [(1, 'H'), (1, 'H'), (1, 'H')],
'j_couplings': [((0, 1), -10.8), ((0, 2), -10.8), ((1, 2), -10.8)],
'shifts': [3.05, 3.05, 3.05]}
We should take into account that the dropping/keeping of isotopes needs to be done carefully. The 13C nucleus has a nuclear spin of 1/2 (as 1H) and is NMR-activate but has a very low natural abundance, and the 13C-1H coupling pattern is only barely visible in 1H-NMR spectra. Therefore, it is common to simulate only 13 C-decoupled 1H-NMR spectra. However, the coupling of 1H with other isotopes such as 19F or 31P should not be ignored in order to get the correct peak pattern.
keep_nuclei
can also take an atoms
argument, which accepts a list of atom indices for which the NMR parameters are to be kept.
In the case of drop_nuclei
, specified nuclei can be dropped via atoms
, but a keep_atoms
argument prevents certain atoms from being dropped (useful in combination with isotopes
).
Structure representations
As shown above a MolecularData
object has the attribute structures
, which is a dictionary that can accept three entries
(keys): "SMILES", "Molfile" and "XYZ". The value of each of these entries is a ChemicalStructure
object that stores chemical structure representations of the type defined in the key:
- SMILES (Simplified Molecular-Input Line-Entry System): Strings representing the connectivity of all non-hydrogen atoms in a molecule. They become laborious to understand for complex molecules.
- Molfiles: Text files with a two-dimensional (2D) structure of a molecule (skeletal representation). It could omit hydrogens. The number of hydrogen atoms is inferred from the atomic valencies, of the heavy atoms.
- XYZ files: Text files containing as first line an integer with the total number of atoms, followed by a comment line, and then atomic positions and chemical element symbols for all the atoms explicitly.
Connectivity and charge information can be extracted from SMILES strings or Molfiles, but not from XYZ files. More information about Molfiles and Strings can be found here.
The attributes of the ChemicalStructure
class are:
representation
: Type of chemical structure representation present in thecontent
attribute. It could be "SMILES", "Molfile" or "XYZ".content
: It corresponds to the raw structure representation (as an string), i.e., a SMILES string or the content of an XYZ file or a Molfile.charge
: Charge of the molecule.symbols
: List of chemical element symbols for the full set of atoms.atom_map
: As we have seen, 2D representations can omit hydrogens, this attribute contains information about how the full representation would map into this reduced representation. It is a list of the length of the molecule (full set of atoms), where the atom counting starts from zero and takes into account:- Each non-hydrogen atom maps onto the respective index in the reduced representation.
- Each explicit hydrogen maps onto the index of its explicitly represented counterpart.
- Each implicit hydrogen maps onto the reduced index of the respective backbone atom.
In the case of ethanol, a full structures
representation (dictionary saved here in the variable ethanol_representation
) may look like:
from hqs_nmr_parameters import ChemicalStructure
ethanol_representation = {
"XYZ": ChemicalStructure(representation="XYZ", content="9\n\nC -0.88708900 0.17506400 -0.01253500\nC 0.46048900 -0.51551600 -0.04653500\nO 1.44296500 0.30726700 0.56557200\nH -0.84747800 1.12776800 -0.55081700\nH -1.65878200 -0.45332700 -0.46584200\nH -1.17694400 0.40367600 1.01830600\nH 0.76871200 -0.72432800 -1.07546000\nH 0.41948600 -1.46207300 0.50017700\nH 1.47864000 1.14146800 0.06713500\n", charge=0, symbols=["C", "C", "O", "H", "H", "H", "H", "H", "H"], atom_map=[0, 1, 2, 3, 4, 5, 6, 7, 8]),
"Molfile": ChemicalStructure(representation="Molfile", content="\n RDKit 2D\n\n 0 0 0 0 0 0 0 0 0 0999 V3000\nM V30 BEGIN CTAB\nM V30 COUNTS 3 2 0 0 0\nM V30 BEGIN ATOM\nM V30 1 C -1.299038 -0.250000 0.000000 0\nM V30 2 C 0.000000 0.500000 0.000000 0\nM V30 3 O 1.299038 -0.250000 0.000000 0\nM V30 END ATOM\nM V30 BEGIN BOND\nM V30 1 1 1 2 CFG=3\nM V30 2 1 2 3 CFG=3\nM V30 END BOND\nM V30 END CTAB\nM END\n", charge=0, symbols=["C", "C", "O", "H", "H", "H", "H", "H", "H"], atom_map=[0, 1, 2, 0, 0, 0, 1, 1, 2]),
"SMILES": ChemicalStructure(representation="SMILES", content="CCO", charge=0, symbols=["C", "C", "O", "H", "H", "H", "H", "H", "H"], atom_map=[0, 1, 2, 0, 0, 0, 1, 1, 2])
}
Each of the entries corresponds to a ChemicalStructure
object:
from pprint import pprint
smiles_representation = ethanol_representation["SMILES"]
print(type(smiles_representation)) # ChemicalStructure
pprint(smiles_representation.model_dump())
<class 'hqs_nmr_parameters.code.data_classes.ChemicalStructure'>
{'atom_map': [0, 1, 2, 0, 0, 0, 1, 1, 2],
'charge': 0,
'content': 'CCO',
'representation': 'SMILES',
'symbols': ['C', 'C', 'O', 'H', 'H', 'H', 'H', 'H', 'H']}
ethanol_representation["Molfile"].atom_map
and ethanol_representation["SMILES"].atom_map
will return:
[0, 1, 2, 0, 0, 0, 1, 1, 2]
In both cases, only the carbon and the oxygen atoms have been indicated explicitly. Therefore, the first three atoms are listed expressly (carbon atoms 0
and 1
, and oxygen atom 2
, i.e. the atom counting starts from zero) and the six following hydrogens are assigned to one of those backbone atoms. The full atomic indices associated with each reduced index can be obtained using the inverted_map
method:
from pprint import pprint
smiles_representation = ethanol_representation["SMILES"]
print(smiles_representation.atom_map)
print(smiles_representation.inverted_map)
[0, 1, 2, 0, 0, 0, 1, 1, 2]
[{0, 3, 4, 5}, {1, 6, 7}, {8, 2}]
Hence, C (index = 0
) is connected to H atoms 3
, 4
, 5
, while C (index = 1
) is connected to H atoms 6
, 7
, and the O (index = 2
), to H 8
.
However, ethanol_representation["XYZ"].atom_map
will return:
[0, 1, 2, 3, 4, 5, 6, 7, 8]
Since all the atoms are explicitly provided in an XYZ format.
The ChemicalStructure
class can also yield the chemical formula in Hill notation using the formula
method. smiles_representation.formula
will return:
'C2H6O'
Serialization (saving in a JSON file) of a MolecularData
object and deserialization
We can save a MolecularData
instance in a JSON file for future uses. To do so, employ the write_file
method of the
MolecularData
class:
from hqs_nmr_parameters.examples import molecules
parameters = molecules["C2H5OH"]
parameters.write_file("etoh_molecular_data.json")
This JSON file can be easily loaded and validated against the MolecularData
model using the read_file
method:
from hqs_nmr_parameters import MolecularData
loaded_parameters = MolecularData.read_file("etoh_molecular_data.json")
print(type(loaded_parameters))
<class 'hqs_nmr_parameters.code.data_classes.MolecularData'>
Data sets
Molecular data for a group of molecules can be collected using the MolecularDataSet
Pydantic class, which contains instances of type MolecularData
. It is composed of two attributes:
description
: Summary of the content of the data set.dataset
: Dictionary where the keys are identifiers for the molecules (e.g., molecule names) and the values correspond to aMolecularData
object per molecule.
A list with the keys of the dataset
can be obtained directly from the MolecularDataSet.keys
property. This allows us to conveniently access the molecular data of each molecule using its key as string.
In oder to have a brief summary of the molecules belonging to a data set, we can use the get_names
method. It retrieves a dictionary where the keys correspond to the keys of the dataset
and the values are the chemical names. An equivalent dictionary providing the chemical formulas can be accessed via the get_formulas
method.
Similar to keep_nuclei
/drop_nuclei
in MolecularData
, keep_isotopes
, drop_isotopes
keeps/drops selected isotopes for all molecules in a data set. An extra description
can be added with updated information about the set (whitespace for separation from the original description content must be included).
As in MolecularData
, it is possible to save or load data sets thanks to the read_file
and write_file
methods that
(de)serialize JSON files.
Data sets implemented in the hqs_nmr_parameters
package will be explained in the follow.
Examples module
The first data set that is worth to mention is the examples
module which has a set of molecule definitions
encapsulated in the molecules
MolecularDataSet
object. This set can be accessed via:
from pprint import pprint
from hqs_nmr_parameters.examples import molecules
print(type(molecules)) # MolecularDataSet
# Keys of the data set:
print(molecules.keys)
<class 'hqs_nmr_parameters.code.data_classes.MolecularDataSet'>
['CH3Cl', 'limonene_DFT', '1,2,4-trichlorobenzene', 'Anethole', 'Artemisinin_exp', 'endo-dicyclopentadiene_DFT', 'CH3Cl_13C', 'C2H3CN', 'Artemisinin', 'camphor_DFT', 'C6H6', 'C10H8', 'Triphenylphosphine_oxide', 'H2CCF2', 'C2H5Cl', 'Androstenedione', 'Cinnamaldehyde', 'CHCl3_13C', 'C6H5NO2', 'C2H5OH', 'C2H6', 'C10H7Br', 'CHCl3', 'camphor_exp', 'exo-dicyclopentadiene_DFT', '1,2-di-tert-butyl-diphosphane', 'C2H3NC', 'cyclopentadiene_DFT', 'cis-3-chloroacrylic_acid_exp', 'C3H8']
Note that the content of this list of keys is just an example and might appear in a different order or with different entries depending on the installed version of hqs_nmr_parameters
. The same holds for the dictionary of molecule names which can be obtained as follows:
pprint(molecules.get_names())
{'1,2,4-trichlorobenzene': '1,2,4-Trichlorobenzene',
'1,2-di-tert-butyl-diphosphane': 'tert-Butyl(tert-butylphosphanyl)phosphane',
'Androstenedione': 'Androstenedione',
'Anethole': 'Anethole',
'Artemisinin': 'Artemisinin',
'Artemisinin_exp': 'Artemisinin',
'C10H7Br': '2-Bromonaphthalene',
'C10H8': 'Naphthalene',
'C2H3CN': 'Acrylonitrile',
'C2H3NC': 'Vinyl isocyanide',
'C2H5Cl': 'Chloroethane',
'C2H5OH': 'Ethanol',
'C2H6': 'Ethane',
'C3H8': 'Propane',
'C6H5NO2': 'Nitrobenzene',
'C6H6': 'Benzene',
'CH3Cl': 'Chloromethane',
'CH3Cl_13C': 'Chloromethane',
'CHCl3': 'Chloroform',
'CHCl3_13C': 'Chloroform',
'Cinnamaldehyde': 'Cinnamaldehyde',
'H2CCF2': '1,1-Difluoroethene',
'Triphenylphosphine_oxide': 'Triphenylphosphine oxide',
'camphor_DFT': 'Camphor',
'camphor_exp': 'Camphor',
'cis-3-chloroacrylic_acid_exp': 'cis-3-Chloroacrylic acid',
'cyclopentadiene_DFT': 'Cyclopentadiene',
'endo-dicyclopentadiene_DFT': 'endo-Dicyclopentadiene',
'exo-dicyclopentadiene_DFT': 'exo-Dicyclopentadiene',
'limonene_DFT': 'Limonene'}
In addition, if we want to have a feeling of the size of the molecules in the set, we could print their formula using the
get_formulas
method.
The full molecular definition for a given molecule can be loaded using its string key. Each entry of this data set includes a 2D representation (Molfile or SMILES string) of the molecule. Let us consider an example:
from pprint import pprint
from hqs_nmr_parameters.examples import molecules
# Obtain the MolecularData object for acrylonitrile
parameters = molecules["C2H3CN"]
# Print parameters
pprint(parameters.model_dump())
{'description': '1H parameters for acrylonitrile.\n'
"Values were obtained from Hans Reich's Collection, NMR "
'Spectroscopy.\n'
'https://organicchemistrydata.org\n',
'formula': 'C3H3N',
'isotopes': [(3, (1, 'H')), (4, (1, 'H')), (5, (1, 'H'))],
'j_couplings': [((3, 4), 0.9), ((3, 5), 11.8), ((4, 5), 17.9)],
'method_json': '',
'name': 'Acrylonitrile',
'shifts': [(3, 5.79), (4, 5.97), (5, 5.48)],
'solvent': '',
'structures': {'Molfile': {'atom_map': [0, 1, 2, 3, 4, 5, 6],
'charge': 0,
'content': '\n'
'JME 2022-02-26 Wed Sep 07 15:54:28 '
'GMT+200 2022\n'
'\n'
' 0 0 0 0 0 0 0 0 0 0999 '
'V3000\n'
'M V30 BEGIN CTAB\n'
'M V30 COUNTS 7 6 0 0 0\n'
'M V30 BEGIN ATOM\n'
'M V30 1 C 2.4249 2.1000 0.0000 0\n'
'M V30 2 C 3.6373 1.4000 0.0000 0\n'
'M V30 3 C 1.2124 1.4000 0.0000 0\n'
'M V30 4 H 0.0000 2.1000 0.0000 0\n'
'M V30 5 H 1.2124 0.0000 0.0000 0\n'
'M V30 6 H 2.4249 3.5000 0.0000 0\n'
'M V30 7 N 4.8497 0.7000 0.0000 0\n'
'M V30 END ATOM\n'
'M V30 BEGIN BOND\n'
'M V30 1 1 1 2\n'
'M V30 2 2 1 3\n'
'M V30 3 1 3 4\n'
'M V30 4 1 3 5\n'
'M V30 5 1 1 6\n'
'M V30 6 3 2 7\n'
'M V30 END BOND\n'
'M V30 END CTAB\n'
'M END\n',
'representation': 'Molfile',
'symbols': ['C', 'C', 'C', 'H', 'H', 'H', 'N']}},
'temperature': None}
As we can see, data for setting up 1H-NMR spectrum of the acrylonitrile molecule has been stored together with its Molfile.
To set up an NMR calculation we are only interested in some of the previous data. To retrieve it, use the spin_system
method:
nmr_parameters = parameters.spin_system()
pprint(nmr_parameters.model_dump())
{'isotopes': [(1, 'H'), (1, 'H'), (1, 'H')],
'j_couplings': [((0, 1), 0.9), ((0, 2), 11.8), ((1, 2), 17.9)],
'shifts': [5.79, 5.97, 5.48]}
CHESHIRE module
In the cheshire
module, one can find molecular data for molecules belonging to the CHESHIRE database. Five data sets (MolecularDataSet
objects) have been created from this database depending on the NMR data:
experimental_shifts_only
: It includes the experimental shifts (for 13C and 1H) of 105 molecules.calculated_full
: It has theoretical NMR data for the 65 rigid molecules (molecules with only one conformer) of the previous set. Details of the calculations can be found under thedescription
attribute of each item (see below).combined_full
: It contains experimental shifts and theoretical J-couplings for the rigid molecules (in this case only 60 molecules due to incomplete number of shifts inexperimental_shifts_only
).- The
calculated
andcombined
data sets are the reduced versions of the aforementioned sets and contain only the NMR data required for simulating 1H-NMR spectra.
In addition, for non-expert users, we have included the alias molecules
, which returns the combined
set,
i.e., 1H-NMR data for the rigid molecules, with experimental shifts and calculated J-couplings.
These data sets can be imported from hqs_nmr_parameters.cheshire
as: experimental_shifts_only
, calculated_full
,
combined_full
, calculated
, and combined
. Here, we will focus on the molecules
set that it will be imported as
cheshire_molecules
to avoid confusion with the examples
module.
from hqs_nmr_parameters.cheshire import molecules as cheshire_molecules
In the follow, we will retrieve some interesting information from the set, as a brief explanation of the set:
from pprint import pprint
pprint(cheshire_molecules.description)
('Experimental shifts and theoretical J-couplings for the rigid molecules of '
"the Cheshire set, except for ['Cyclopropanone', 'Bicyclobutane', "
"'Cyclopentanone', 'Fluorobenzene', 'Indole'] due to incompatible data.\n"
"Shifts and couplings only for nuclei ['1H', '19F', '31P', '29Si'].")
Or the keys of the molecules that give access to the molecular data. For simplicity, they correspond to a string representation of integers that go from 1 to 105. For the molecule
set, where only rigid molecules are included, some numbers are missing:
print(cheshire_molecules.keys)
['1', '2', '4', '5', '6', '9', '10', '11', '12', '13', '15', '16', '17', '18', '20', '23', '29', '30', '32', '33', '34', '36', '39', '41', '42', '44', '46', '47', '48', '49', '50', '51', '52', '54', '55', '59', '60', '61', '66', '68', '71', '73', '74', '75', '76', '77', '81', '84', '85', '86', '87', '88', '91', '92', '93', '95', '96', '99', '100', '105']
As before, the molecule names could be obtained using the get_names
function. But here, we will focus on a single entry:
print(cheshire_molecules["1"].name)
'Dichloromethane'
In the description of each entry we find important information about how the NMR parameters were obtained.
print(cheshire_molecules["1"].description)
Geometries in chloroform at B97-3c.
Experimental shifts from CHESHIRE: http://cheshirenmr.info/.
J-couplings (gas-phase) at PBE/pcJ-3.
Parameters averaged over rotamers using permutations.
Each entry in the data set includes both a 2D (Molfile) and a 3D (XYZ) representation of the molecule.
pprint(cheshire_molecules["1"].structures)
{'XYZ': ChemicalStructure(representation='XYZ', content='5\n\nC -0.00000000 0.00000000 0.77788868\nCl 0.00000000 1.49340832 -0.21847658\nCl 0.00000000 -1.49340833 -0.21847658\nH -0.89835401 0.00000000 1.37677523\nH 0.89835401 0.00000000 1.37677523\n', charge=0, symbols=['C', 'Cl', 'Cl', 'H', 'H'], atom_map=[0, 1, 2, 3, 4]),
'Molfile': ChemicalStructure(representation='Molfile', content='\n RDKit 2D\n\n 0 0 0 0 0 0 0 0 0 0999 V3000\nM V30 BEGIN CTAB\nM V30 COUNTS 5 4 0 0 0\nM V30 BEGIN ATOM\nM V30 1 C 0.000000 -0.000000 0.000000 0\nM V30 2 Cl 0.000000 1.500000 0.000000 0\nM V30 3 Cl -0.000000 -1.500000 0.000000 0\nM V30 4 H 1.500000 -0.000000 0.000000 0\nM V30 5 H -1.500000 0.000000 0.000000 0\nM V30 END ATOM\nM V30 BEGIN BOND\nM V30 1 1 2 1\nM V30 2 1 3 1\nM V30 3 1 1 4 CFG=3\nM V30 4 1 5 1\nM V30 END BOND\nM V30 END CTAB\nM END\n', charge=0, symbols=['C', 'Cl', 'Cl', 'H', 'H'], atom_map=[0, 1, 2, 3, 4])}
To access the NMR data:
pprint(cheshire_molecules["1"].spin_system().model_dump())
{'isotopes': [(1, 'H'), (1, 'H')],
'j_couplings': [((0, 1), -5.171)],
'shifts': [5.28, 5.28]}
Phytolab module
The phytolab
module contains some selected molecules from a catalogue by Phytolab. It includes the following three sets:
calculated_full
: all NMR parameters in this set are computed. Details can be obtained from the description of each item in the set. Where 13C data has also been calculated, shifts and couplings are included for each nucleus.calculated
: this set of computed parameters is intended for the calculation of one-dimensional 1H-NMR spectra, as the parameters for 13C are omitted.combined
: where possible, the chemical shifts have been adjusted manually to achieve a better match with experimental 1H-NMR spectra. Therefore, this set contains a combination of adjusted or computed shifts, and computed J-couplings. This set is recommended to simulate 1H-NMR spectra to obtain the closest agreement with experiment.
In addition, the module includes the set molecules
, which is an alias for combined
as described above.
To access these data sets:
from hqs_nmr_parameters.phytolab import molecules as phytolab_molecules
print("Dataset content:")
print(phytolab_molecules.description + "\n")
print(f"Entries of the set: {phytolab_molecules.keys}\n")
print("NMR Parameters details for Psoralen:")
print(phytolab_molecules["psoralen"].description)
Dataset content:
NMR parameters for selected natural products from a catalogue by Phytolab.
Where possible, chemical shifts have been adjusted to match experimental spectra. The remaining parameters are computed.
For further details, please refer to descriptions of the individual items in the set.
Entries of the set: ['angelicin', 'bakuchicin', 'psoralen', 'friedelin']
NMR Parameters details for Psoralen:
Geometry in chloroform at B97-3c.
Shifts manually adjusted to match 1H-NMR spectrum at 80 MHz in CDCl3 provided by Phytolab.
J-couplings (gas-phase) at PBE/pcJ-3.
As for the examples
module, the content of the sets will depend on the installed version of hqs_nmr_parameters
.
This module is included as of version 0.9.1
.
Assignments module
The assignments
module contains example data of other complex molecules.
Patchoulol
The patchoulol
data set contains two molecules, the originally proposed structure for patchouli alcohol
and the correct structure. We can access it as:
from hqs_nmr_parameters.assignments import patchoulol
To get an overview of the set, we could print its description
:
print(patchoulol.description)
Theoretical <sup>1</sup>H-NMR data (shifts and J-couplings) for patchouli alcohol (patchoulol).
This set contains molecular data (with NMR parameters) related to the experimentally confirmed (correct) structure of patchoulol as well as for the structure initially (erroneous) attributed to patchoulol (see Scheme 1 of https://doi.org/10.1002/anie.200460864 for details).
The experimental <sup>1</sup>H-NMR spectrum of correct patchoulol is available at https://doi.org/10.13018/BMSE001312.
Only the two mentioned molecules are present in the set, we can access them via their keys
:
print(patchoulol.keys)
print(patchoulol["correct"].name,"\n",patchoulol["erroneous"].name)
['correct', 'erroneous']
Patchouli alcohol
4,10,11,11-Tetramethyltricyclo[5.3.1.01,5]undecan-10-ol
With this data, we can now use the HQS NMR Tool to simulate both spectra and see the differences between these two similar molecules as well as compare with the experimental spectrum.
Menthol isomers
The menthol_isomers
data set is a collection of the four possible diastereomers of menthol
(5-methyl-2-(propan-2-yl)cyclohexan-1-ol). With three chiral centers at positions 1, 2, and 5 (in IUPAC convention), there are the following eight possible structures:
-
Menthol:
- (+)-enantiomer, with stereocenters 1S, 2R, 5S.
- (-)-enantiomer, with stereocenters 1R, 2S, 5R.
-
Neomenthol:
- (+)-enantiomer, with stereocenters 1S, 2S, 5R.
- (-)-enantiomer, with stereocenters 1R, 2R, 5S.
-
Isomenthol:
- (+)-enantiomer, with stereocenters 1S, 2R, 5R.
- (-)-enantiomer, with stereocenters 1R, 2S, 5S.
-
Neoisomenthol:
- (+)-enantiomer, with stereocenters 1R, 2R, 5R.
- (-)-enantiomer, with stereocenters 1S, 2S, 5S.
Since enantiomers are not distinguishable by conventional NMR spectroscopy, there are four different possible NMR spectra. The given data set contains NMR parameters calculated with density functional theory (DFT) for one enantiomer of each pair and can be imported from the assignments
module as menthol_isomers_full
for 1H- and 13C-NMR parameters or as menthol_isomers
for only 1H-NMR data.
The description gives an overview of the data set:
from hqs_nmr_parameters.assignments import menthol_isomers
print(menthol_isomers.description)
Calculated NMR parameters (only 1H) for all four stereoisomers of menthol.
Structures are (absolute stereochemistry indicated by chiral centers at positions 1, 2, and 5 in IUPAC convention):
(1S,2S,5R)-(+)-Neomenthol (SSR): data averaged over 3 conformers.
(1R,2S,5R)-(-)-Menthol (RSR): data averaged over 3 conformers.
(1S,2R,5R)-(+)-Isomenthol (SRR): data averaged over 6 conformers.
(1S,2S,5S)-(-)-Neoisomenthol (SSS): data averaged over 8 conformers.
Various experimental NMR spectra of (1S,2S,5R)-(+)-neomenthol are available at https://doi.org/10.13018/BMSE000498.
The keys of the structures in the data set can be listed as:
for key in menthol_isomers.keys:
print(f"{key}: {menthol_isomers[key].name}")
SSR: (+)-Neomenthol (SSR)
RSR: (-)-Menthol (RSR)
SRR: (+)-Isomenthol (SRR)
SSS: (-)-Neoisomenthol (SSS)
For more information on the applied computational level of theory, please inspect the individual element descriptions with the description
attribute.
This data can be used to simulate the NMR spectra of all diastereomers as explained earlier and compare them to experimental ones, e.g., to that of neomenthol available here. Due to the limited accuracy of DFT calculations, it is not always straightforward to identify the correct isomer if the exact structure of the experimental measurement is unknown, but the comparison with all four possibilities will provide valuable insights for structure elucidation.
Input of molecular NMR parameters via a YAML file
NMR parameters for molecules can be provided in a YAML file. A brief summary of relevant YAML features is provided before proceeding to more detailed explanations and examples.
- Dictionaries in YAML are defined as
key: value
pairs. Most commonly, a dictionary contains one key/value pair per line:key 1: value 1 key 2: value 2
value 1
is interpreted as a string,1
is interpreted as an integer and1.0
is interpreted as a floating-point number. To avoid problems with special characters (e.g., square brackets), strings may be enclosed in single or double quotes (they have different meanings, and single quotes should be preferred for a literal interpretation of the string).- Lists can be defined over multiple lines as:
Lists can also be enclosed in square brackets:- item 1 - item 2
[item 1, item 2, item 3]
. The nested list[[1, 2, 3], [4, 5, 6]]
is equivalent to:- [1, 2, 3] - [4, 5, 6]
- Indentation is part of the syntax: key/value pairs or list entries over multiple lines need to have the same number of leading spaces (no tabs).
- Comments start with a hash,
#
.
Definition of the molecular structure
Definition using SMILES
A molecular structure needs to be provided along with its NMR parameters in order to get a complete molecular data input. The YAML input accepts 2D structural representations, i.e., SMILES strings or Molfiles.
The simplest way to define a structure in the input file is through its SMILES representation. This is done using the key smiles
, followed by a representation of the molecule. For acetic acid:
# Acetic acid defined using SMILES.
smiles: CC(=O)O
SMILES strings often contain square brackets [...]
. In such cases, the string should be enclosed within quotes, '...'
, to avoid problems with the YAML parser.
Definition using a Molfile
Manual definition of increasingly large molecules using SMILES can be cumbersome. For example, the string representation of penicillin V would be:
smiles: 'CC1(C)S[C@@H]2[C@H](NC(=O)COc3ccccc3)C(=O)N2[C@H]1C(=O)O'
Instead, it is easier to draw a graphical representation such as the one below using one of many available proprietary or open source packages.
data:image/s3,"s3://crabby-images/7fcfb/7fcfb2b514357ecde8035adee66a26b36aafdded" alt=""
Such structural 2D representations are commonly stored in Molfiles. A molecule can be read from a Molfile by specifying the file name after the key molfile
:
molfile: penicillin_v.mol
Note that the YAML input needs to contain either a Molfile or SMILES, but it is not possible to specify both at the same time. Both the V2000 and V3000 variants of the Molfile specification are supported in the input.
Hydrogens in the molecular structure
2D representations of molecular structures, whether as skeletal formulas or as SMILES, tend to omit hydrogens. Instead, the number of hydrogen atoms is inferred from the atomic valencies, especially those of the carbons. Any of the following three structures can be provided as a Molfile for the acrylamide molecule:
Where hydrogens are suppressed (not drawn out as separate atoms with a bond), their NMR parameters are specified through assignment to the respective skeletal atom. In the leftmost of the three structures shown above, it would be not possible to assign different parameters to the two protons in the CH2 group. Instead, one of the two other structures shown above could be used to specify different parameters for those protons.
The only restriction with regards to hydrogens is that any skeletal atom can be connected either to suppressed or to stand-alone hydrogens, but not to a mixture of both. Thus, the following two structures would be rejected during input parsing:
The structure to the left mixes a non-suppressed hydrogen with a suppressed "implicit" hydrogen (CH) on the carbon atom; the structure to the right mixes a non-suppressed hydrogen with a suppressed "explicit" hydrogen (NH) on the nitrogen atom.
Numbering of atoms
Atoms in the structural representation of the molecule are labelled with integers for the assignment of parameters. Indices can be counted starting from zero or from one. To avoid errors or misunderstandings, it is mandatory to specify a count from
key in the input file, followed by either 0 or 1. The choice between those two options is entirely arbitrary.
An example for acetamide with atom counting starting from zero:
# Atom indices are 0, 1, 2, 3 in their order of appearance in the SMILES string.
smiles: CC(=O)N
count from: 0
An example for atom counting starting from one:
# Atom indices are 1, 2, 3, 4 in their order of appearance in the SMILES string.
smiles: CC(=O)N
count from: 1
The atoms in a molecule are indexed by their order of appearance in the Molfile. Acetamide may be represented by a Molfile with the following content:
RDKit 2D
4 3 0 0 0 0 0 0 0 0999 V2000
0.0000 0.0000 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
1.2990 0.7500 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
1.2990 2.2500 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0
2.5981 -0.0000 0.0000 N 0 0 0 0 0 0 0 0 0 0 0 0
1 2 1 0
2 3 2 0
2 4 1 0
M END
Counting the atoms starting from 1 (count from: 1
), the appropriate indices are represented in the image below.
data:image/s3,"s3://crabby-images/d7e72/d7e72cf3e7e378b078e9859979754c3f1f2bc36f" alt=""
Chemical shifts
Providing chemical shifts will be illustrated using the nitrobenzene molecule as an example. Its molecular structure is contained in the file PhNO2.mol and shown in the picture below, including a numbering of its atoms.
data:image/s3,"s3://crabby-images/44817/44817bd11f0f834f92c0fc7f2a83de285fd19c01" alt=""
All chemical shifts are specified under the key shifts
. Additionally, the values need to be nested under keys representing isotopes, which contain the atomic mass number and the element symbol (e.g., 1H
or 13C
). For each isotope, the chemical shifts are provided in pairs of an atomic index and the associated value in ppm (parts per million):
# Structure with suppressed protons.
molfile: PhNO2.mol
count from: 0
shifts:
# chemical shifts for protons in ppm
1H:
1: 8.23
2: 7.56
3: 7.71
4: 7.56
5: 8.23
# chemical shifts for carbon-13 nuclei in ppm
13C:
0: 148.5
1: 123.7
2: 129.4
3: 134.3
4: 129.4
5: 123.7
To assign chemical shifts for suppressed protons (that are not provided explicitly in the skeletal structure), the indices of the respective non-hydrogen atoms are used instead. All suppressed protons connected to the same atom are assigned an identical shift value.
If a Molfile contains hydrogens as standalone atoms, the chemical shifts are assigned to those protons using their respective atom indices. This is illustrated using the file PhNO2_allH.mol. Its structure is shown below.
data:image/s3,"s3://crabby-images/a894c/a894cbcc00173f1d0adfaffa5a722b328e9b6775" alt=""
In this example, the 1H shifts need to be assigned to atoms 9-13. Assigning them to atoms 1-5, as in the previous example, would produce an error.
# Structure with suppressed protons.
molfile: PhNO2_allH.mol
count from: 0
shifts:
# chemical shifts for protons in ppm
1H:
9: 8.23
10: 7.56
11: 7.71
12: 7.56
13: 8.23
# chemical shifts for carbon-13 nuclei in ppm
13C:
0: 148.5
1: 123.7
2: 129.4
3: 134.3
4: 129.4
5: 123.7
Indirect spin-spin coupling constants
Indirect spin-spin coupling constants are provided under the key J-couplings
in the YAML file. Additionally, the coupling constant values need to be grouped together by isotopes. Keys for each combination of isotopes are combined as isotope1-isotope2
: e.g., 1H-1H
for coupling constants between two protons or 1H-13C
for the associated heteronuclear coupling.
J-coupling constants in units of Hz for each combination of nuclei are provided as a list of lists with the following structure:
J-couplings:
isotope1-isotope2:
- [atom index 1, atom index 2, coupling constant in Hz]
- [atom index 1, atom index 2, coupling constant in Hz]
- [...]
isotope1-isotope2:
- [atom index 1, atom index 2, coupling constant in Hz]
- [...]
The first atom index refers to the first isotope and the second atom index refers to the second isotope. As with shifts, values for suppressed hydrogens are assigned via the associated skeletal carbon or heteroatom. If multiple protons are connected to the same skeletal atom, they are assigned the same coupling constant. Inequivalent protons attached to the same skeletal atom, need to be specified explicitly as standalone atoms in the molecule definition, so that they can be referred to via their respective atom indices.
Examples
1H parameters for propane with SMILES input
smiles: CCC
count from: 0
shifts:
1H:
0: 0.9
1: 1.3
2: 0.9
J-couplings:
1H-1H:
- [0, 1, 7.26]
- [1, 2, 7.26]
The protons at the terminal CH3 groups are assigned chemical shifts of 0.9 ppm each, and the protons of the central CH2 group are given a value of 1.3 ppm. J-couplings between all protons of the neighboring CH3 and CH2 groups are assigned as 7.26 Hz. While an indirect spin-spin coupling interaction exists between equivalent protons within the CH3 and CH2 groups, it is not observed in the spectrum and the associated values are left out.
1H parameters for acrylonitrile
The structure of acrylonitrile is provided as a Molfile in acrylonitrile.mol. Its depiction is shown below.
data:image/s3,"s3://crabby-images/d51e2/d51e23e53fb1985273ab05de44bd7eeec49d41da" alt=""
Since protons 4 and 5 are inequivalent, they are specified as standalone atoms with different parameters. In addition, hydrogen 6 is represented as a standalone atom, though suppressing it would also be an equally valid choice.
molfile: acrylonitrile.mol
count from: 1
shifts:
1H:
4: 5.79 # H(trans)
5: 5.97 # H(cis)
6: 5.48 # H(gem)
J-couplings:
1H-1H:
- [4, 5, 0.9]
- [4, 6, 11.8]
- [5, 6, 17.9]
Combined 1H and 13C parameters for chloromethane
To illustrate the definition of heteronuclear coupling constants, the following example shows parameters for 13C-enriched chloromethane. The parameters include the shifts of the three protons and the 13C nucleus, as well as the coupling constants between these nuclei.
# CH3Cl with 13C, the hydrogens inside square brackets are implicit.
smiles: '[13CH3]Cl'
# C will have index 1 and Cl will have index 2
count from: 1
shifts:
# Shifts of the three protons.
1H:
1: 3.05
# Shift of carbon-13 in the molecule.
13C:
1: 25.6
J-couplings:
# Coupling between the three protons (would normally not be observed).
1H-1H:
- [1, 1, -10.8]
# Coupling between the three protons and the 13C atom.
1H-13C:
- [1, 1, 150.0]
Stored YAML files
For the molecules included in the examples
module of the hqs_nmr_parameters
package, YAML input files (and Molfiles when indicated as molecular structure) are available in the file system. To access them, use the following command, taking into account that each YAML file takes the name of its key in the data set. For acrylonitrile (with key "C2H3CN"), we will have:
from pathlib import Path
from hqs_nmr_parameters import examples
identifier = "C2H3CN"
acrylonitrile_yaml = Path(examples.__file__).parent.joinpath("parameters", identifier + ".yaml")
print(acrylonitrile_yaml.read_text())
name: Acrylonitrile
molfile: C2H3CN.mol
count from: 1
shifts:
1H:
4: 5.79 # H(trans)
5: 5.97 # H(cis)
6: 5.48 # H(gem)
J-couplings:
1H-1H:
- [4, 5, 0.9]
- [4, 6, 11.8]
- [5, 6, 17.9]
description: |
1H parameters for acrylonitrile.
Values were obtained from Hans Reich's Collection, NMR Spectroscopy.
https://organicchemistrydata.org
As we have seen, to get this data as MolecularData
instance:
from hqs_nmr_parameters.examples import molecules
identifier = "C2H3CN"
parameters = molecules[identifier]
print(parameters.shifts)
[(3, 5.79), (4, 5.97), (5, 5.48)]
Therefore, even if the atom counting in the YAML file starts at 1, the numbering in MolecularData
always starts at 0.
Reading and processing YAML input files
In the HQS NMR Tool, a YAML file containing NMR parameters is parsed using the read_parameters_yaml
function
from the hqs_nmr_parameters
package:
from hqs_nmr_parameters import read_parameters_yaml
parameters = read_parameters_yaml("input_file.yaml")
The read_parameters_yaml
function can read the following keywords from a YAML file:
name
: An optional name of the molecule.shifts
: Chemical shifts in format {isotope: {index: value}}.j_couplings
/J-couplings
: J-coupling values in format {isotope1-isotope2: [[index1, index2, value], ...]}.count from
/count_from
: Specifies whether to count atomic indices starting from zero or from one.smiles
: SMILES string of the molecule (better enclosed in quotation marks).molfile
: Path to a Molfile with the molecular structure.molblock
: Compressed Molfile content (not in clear text).temperature
: Temperature in K.solvent
: Name of the solvent.description
: Additional further description.
parameters
is an instance of the Pydantic MolecularData
class.
Datatypes
Molecular NMR parameters, spectra and calculation results are stored in Python objects in the HQS NMR Tool. These data structures are all Pydantic
data classes and the data can be accessed as attributes of the class. See NMRResultSpectrum1D
for an example.
Note that for the full signature of each datatype you can check out the API-documentation. Here we will just point out the most important attributes of each class.
NMRResultSpectrum1D
The NMRResultSpectrum1D
is used to store all of the input and output of a NMR spectrum calculation when using the calculate_spectrum routine. It comprises:
-
The molecular NMR parameters of datatype
NMRParameters
used for the calculation (molecule_parms
). -
All the parameters handed to
calculate_spectrum
in form of aNMRCalculationParameters
object (calculation_parms
). -
The calculated spectrum of datatype
NMRSpectrum1D
(spectrum
).
Assuming we have obtained an object called result_spectrum
of datatype NMRResultSpectrum1D
the attributes can be accessed as follows:
molecule_parms = result.molecule_parms
calculation_parms = result.calculation_parms
spectrum = result.spectrum
In the following we will elaborate on the datatype of each of these objects in more detail.
NMRResultGreensFunction1D
The NMRResultGreensFunction1D
is the analog datatype to NMRResultSpectrum1D
when using calculate_greens_function instead of calculate_spectrum. The only difference is that instead of the attribute spectrum
it has the attribute greens_function
, which stores a NMRGreensFunction1D
datatype.
Assuming we have an object called result_greens_function
of datatype NMRResultGreensFunction1D
the greens_function
attribute can be accessed as follows:
greens_function = result_greens_function.greens_function
NMRParameters
The NMRParameters
datatype is defined in the hqs_nmr_parameters
repository and holds the reduced set of molecular parameters required for an NMR calculation,
The class holds:
- A list of chemical
shifts
in ppm for every nucleus in the system that has NMR parameters. - A List of isotopes for every nucleus, in the same ordering as
shifts
. TheNamedTuple
Isotope
has the two attributesmass_number
andsymbol
. - A list containing pairs of atomic indices and the associated J-coupling values. The atomic indices refer to the ordering in the
shifts
andisotopes
lists.
For an example you may run the following code:
from hqs_nmr_parameters.examples import molecules
# Obtain example molecule.
nmr_parameters = molecules["C10H7Br"].spin_system()
print(type(nmr_parameters)) # NMRParameters
print(nmr_parameters.shifts)
print(nmr_parameters.isotopes)
print(nmr_parameters.j_couplings)
NMRParameters
objects are used as an input when calculating a spectrum using the calculate_spectrum or calculate_greens_function method.
NMRCalculationParameters
The NMRCalculationParameters
datatype is defined in the hqs_nmr
repository and stores all possible customization options when performing spectra calculations with calculate_spectrum
or calculate_greens_function
.
The only parameter the user always has to specify is the spectrometer frequency in MHz, so the most simple and often already sufficient instantiation of this datatype may look as follows:
from hqs_nmr.datatypes import NMRCalculationParameters
calculation_parms = NMRCalculationParameters(frequency_MHz=500)
Another important option the user can set is the homoisotope
, which is the Isotope specified as Isotope(mass, symbol)
to define the frequency of the rotating frame. Furthermore, the object contains multiple attributes allowing to improve the resolution. For a detailed explanation of the available customization options check out the tutorial notebooks or take a look at the API-documentation.
NMRSolverSettings
The NMRSolverSettings
datatype is also defined in the hqs_nmr
repository and stores customization options specific for the solver backend. It is stored as an attribute of a NMRCalculationParmameters
object and typically does not need to be altered.
It is however important to know that by default a spin-dependent clustering of the molecule into overlapping clusters is performed (for details check here). This is an extremely accurate approximation especially at high field, however in some cases this might not be wanted. In these cases you can set the attribute solve_exactly=True
, such that the spectrum of the entire molecule is calculated at once and symmetries will be exploited where applicable, but no approximations are made. Note that this can lead to a significant increase in runtime, so typically it is more advisable to increase the size of the clusters instead. To do so the attribute max_cluster_size
has to be specified:
from hqs_nmr.datatypes import NMRSolverSettings
solver_settings = NMRSolverSettings(max_cluster_size=16)
To see all customization options check the API-documentation.
NMRSpectrum1D
The NMRSpectrum1D
datatype holds an NMR spectrum comprising the frequencies in ppm (omegas_ppm
), an array with the individual spin contributions to the spectrum (values
), the full-width-half-maximum in ppm (fwhm_ppm
) and the reference energy (reference_energy
) to allow easy conversion of attributes from ppm into other units.
Assuming we have performed a NMR spectrum calculation using calculate_spectrum and obtained as output an NMRResultSpectrum1D
object we called result_spectrum
, we can obtain an object of datatype NMRSpectrum1D
as an attribute of this object:
spectrum = result_spectrum.spectrum
The values
are a numpy array where each row holds the contribution of the corresponding spin, i.e.:
spectrum.values[0,:]
holds the contribution of the first spin (index 0). The total spectrum that would be measured experimentally can be calculated using
import numpy as np
np.sum(spectrum.values, axis=0)
NMRGreensFunction1D
The NMRGreensFunction1D
datatype basically has the same attributes as NMRSpectrum1D
, however values
is a complex array, as it stores the full NMR Green's function and not just the spectral function. This datatype is typically only used, if one wants to perform post-processing steps that require also the real part of the Green's function.
Assuming we have performed a NMR Green's function calculation using calculate_greens_function and obtained as output an NMRResultGreensFunction1D
object called result_greens_function
, we can obtain an object of datatype NMRGreensFunction1D
as an attribute of this object:
greens_function = result_greens_function.greens_function
The total spectrum can then be obtained as follows:
import numpy as np
np.sum(- np.imag(greens_function.values), axis=0)
Serialization (Saving and Loading)
The datatypes described above have a common interface for data serialization. There exists a function called to_json
common to all classes that can take a class object and a file name (with or without the full path) to store the data as a JSON
file. For example, if we have an object called spectrum
of the data class NMRSpectrum1D
and want to save it as spectrum_data.json
:
from hqs_nmr.datatypes import to_json
to_json(spectrum, "spectrum_data.json")
There is also the inverse method from_json
to load the data again. It needs as additional information the type of class the JSON
file has stored, e.g.:
from hqs_nmr.datatypes import from_json, NMRSpectrum1D
spectrum = from_json(NMRSpectrum1D, "spectrum_data.json")
NMR spectra calculations
calculate_spectrum
For most use cases the calculate_spectrum
function should be sufficient to perform NMR spectra calculations. By default, it uses the clustering approach discussed in the solver chapter. This approach is exact for molecules smaller than the specified maximum cluster size (set by default to 12) and for larger systems still extremely accurate, while also being very fast.
For a quick introduction to this method you can just read on, however, for a proper walkthrough of all the customization options check out the example notebooks.
calculate_spectrum
takes as input an object of datatype NMRParameters
, which stores the molecular isotopes, shifts and J-coupling values, and a second of type NMRCalculationParameters
, which stores all parameters necessary to specify a NMR spectrum calculation.
You can calculate the spectrum for one of the example molecules simply as follows:
from hqs_nmr.calculate import calculate_spectrum
from hqs_nmr.datatypes import NMRCalculationParameters
from hqs_nmr_parameters.examples import molecules
# Obtain example molecule of datatype NMRParameters.
molecule_parms = molecules["C10H7Br"].spin_system()
# Define the calculation parameters.
calculation_parms = NMRCalculationParameters(frequency_MHz=500)
# Calculate the spectrum.
nmr_result = calculate_spectrum(
molecule_parms,
calculation_parms
)
nmr_result
is now an object of datatype NMRResultSpectrum1D
and stores the calculated spectrum as well as the input to the calculation. The spectrum could now be plotted as follows:
import numpy as np
import matplotlib.pyplot as plt
summed_spectrum = np.sum(nmr_result.spectrum.values, axis=0)
plt.plot(nmr_result.spectrum.omegas_ppm, summed_spectrum, linewidth=0.3, label="spectrum")
plt.title("C10H7Br")
plt.xlabel("$\\delta$ [ppm]")
plt.legend()
plt.show()
calculate_greens_function
The calculate_greens_function
method behaves in the exact same way as calculate_spectrum
with the only difference that the full Green's function is calculated instead of just the spectral function. It returns an object of datatype NMRResultGreensFunction1D
. To get the full spectrum one can simply call:
from hqs_nmr.calculate import calculate_greens_function
nmr_result = calculate_greens_function(
molecule_parms,
calculation_parms
)
summed_spectrum = np.sum(
- np.imag(nmr_result.spectrum.values), axis=0
)
However, typically the need to calculate the Green's function only arises, if some sort of post-processing needs to be performed.