Introduction

The HQS NMR Tool is the first package in the HQStage module HQS Spectrum Tools. With the HQS NMR Tool you can calculate and analyze NMR spectra for molecules or other spin systems. Given the NMR parameters of a molecule, HQS Spectrum Tools provides a set of solvers via a Python API to calculate the NMR spectrum. Rust and C++ components are used internally to achieve remarkable speed and accuracy, even for large, complex molecules.

Fastest NMR Spectrum Solver

“One of the fastest Hilbert-space NMR simulation tools I have ever seen, with a remarkably efficient frequency-domain implementation.” - Ilya Kuprov, Professor of Physics, University of Southampton

The HQS NMR Tool is also an integral part of HQSpectrum, our end-to-end solution for NMR Spectra analysis. If you are interested in using this cloud service please request access by sending an email to hqspectrum@quantumsimulations.de.

NMR spectra prediction is also a very promising use case for the quantum computer. To explore this possibility, we invite you to take a look at the HQS Qorrelator App.

If you are interested in the theory behind NMR or the math of how to calculate NMR spectra, take a look at the background chapter.

Applications

Nuclear magnetic resonance (NMR) spectroscopy is a key analytical tool in chemistry and related disciplines. Its broad application not only facilitates the identification of molecules, but also provides intricate details about their structure, dynamics, and chemical environment. The ability to compare an experimental NMR spectrum with theoretical predictions for various compounds is crucial for the accurate identification and characterization of chemical entities.

Our NMR spectrum solvers are designed for both speed and precision. They are based on a very efficient implementation in the frequency domain. With a variety of solver strategies at your disposal, you can leverage symmetries and clustering techniques either fully automated or customized according to your specifications. This flexibility allows you to tailor your approach based on the specific characteristics of your system, optimizing performance without sacrificing accuracy.

Getting started

To use HQS NMR Tool you need a running version of HQStage. You can check here how to set it up. Once you have HQStage correctly configured you can install the HQS NMR Tool using the command:

hqstage install hqs-nmr

Some functionality of the HQS NMR Tool (especially the provided example NMR parameters) requires the RDKit cheminformatics package. It is strongly recommended to install it via the conda-forge channel to ensure access to the latest version. Additionally, it is highly recommended to install the Intel oneAPI Math Kernel Library (MKL), as performance will otherwise be critically impaired. Both packages have to be installed manually. The easiest way to install them is to run the following command in an active conda environment:

conda install rdkit mkl

Alternative ways to install the required libraries

If you do not have conda available, you can also install RDKit via pip. This is not recommended, as currently only an older version of RDKit is supported via pip, which could cause issues in the future. It also requires a NumPy version below 2.0:

pip install "numpy<2.0"
pip install rdkit

To install the MKL you basically have three options:

You can use HQStage to install the MKL into the currently active virtual environment.
```
hqstage install mkl
```
You can manually provide a version of the MKL by making sure that the file libmkl_rt.so is found by the dynamic linker. This means that a system-wide installation should be found automatically.

You can install the MKL via pip. However, you need to create a symlink for the file libmkl_rt.so. The commands below perform the necessary steps.

pip install mkl
PLATLIB_DIR="$(python -c "import sysconfig; print(sysconfig.get_path('platlib'))")"
ln -fs libmkl_rt.so.2 "${PLATLIB_DIR}/../../libmkl_rt.so"

The following code snippet can be used to create your first program in HQS NMR Tool. For an expanded collection of examples please see the examples section.

from hqs_nmr.calculate import calculate_spectrum
from hqs_nmr.datatypes import NMRCalculationParameters
from hqs_nmr_parameters.examples import molecules

import numpy as np
import matplotlib.pyplot as plt

# Obtain example molecule of datatype NMRParameters.
molecule_parameters = molecules["C3H8"].spin_system()

# Define the calculation parameters. The only required field is the magnetic
# field in Tesla
calculation_parameters = NMRCalculationParameters(field_T=11.7433)

# Calculate the individual spin contributions of the spectrum.
nmr_result = calculate_spectrum(
    molecule_parameters,
    calculation_parameters
)

# Sum up the individual spin contributions.
spectrum = np.sum(nmr_result.spectrum.spin_contributions, axis=0)

# Plot the spectrum.
plt.plot(nmr_result.spectrum.omegas_ppm, spectrum, linewidth=0.3)
plt.title("500 MHz, Propane")
plt.xlabel("$\\delta$ [ppm]")
plt.ylabel("$Intensity \\, [a. u.]$")
plt.savefig("propane_500MHz_NMR_spectrum.png", dpi=2000)
plt.show()

Executing the code snippet using python should result in the following plot:

Please note that HQS Spectrum Tools is currently only supported on Linux.

Features

Frequency domain-based fast and precise NMR spectra calculation.
Automatic implementation to compute accurate 1D NMR spectra with minimal truncation errors for large numbers of spins, featuring linear scaling with system size.
NMR example parameters (chemical shifts and J-couplings) for molecules of different sizes, accessible via data classes containing structural information.
Custom NMR parameters can be specified conveniently for NMR spectrum simulation.
Easy construction of an NMR Spin Hamiltonian from provided or custom parameters using the struqture-py package.
Extensive customization options of the NMR solver for expert users.
Postprocessing methods to analyze simulated spectra and compare them to experiment.

You can get an overview of the HQS NMR Tool Python library in the components chapter or take a look at the API-documentation.

Examples

The best way to learn about HQS NMR Tool is to go through our example notebooks. You can find information on how to download and run the examples in the basic usage section of our HQStage documentation. The following examples are available:

1_getting_started: Introduces the main routines of HQS NMR Tool. These are mostly automated and include all you need to simulate NMR spectra.
2_customization: Explains customization options, e.g., to decrease runtime or improve resolution by discussing the solver strategies employed. This includes the clustering approach and the frequency-based implementation.
3_spin_lattice_models: Showcases how to extend the provided functionality to spin lattice models.
4_high_symmetry_molecules: Details the special case of a highly symmetric, strongly coupled molecule where the clustering approach fails and how to obtain the correct result.
5_user_defined_solver: Shows how to add a custom solver to the framework provided by the HQS NMR Tool.
6_struqture_nmr_hamiltonian: Elaborates on the clustering methods in more detail and on how to use them to obtain cluster Spin Hamiltonians in a struqture format.

Background

This chapter provides an overview of the fundamentals behind the HQS NMR Tool.

In section NMR, we describe the basics of nuclear magnetic resonance (NMR) from an experimental and theoretical perspective. We discuss, for example, the different relevant parameters entering an NMR calculation, such as the chemical shifts or couplings between nuclear spins, and introduce the relevant Hamiltonian of a molecule for NMR.

In Mathematical Background, we describe in greater detail the math behind calculation of spectra in the HQS NMR Tool.

Finally, in the section Calculating NMR Spectra, we discuss approaches for evaluating NMR spectra of large molecules.

NMR

Nuclear magnetic resonance (NMR) spectroscopy is one of the most important analytical techniques in chemistry and related fields. It is widely used to identify molecules, but also to obtain information about their structure, dynamics, and chemical environment.

For a detailed overview of the field there are many suitable textbooks, such as High Resolution NMR Techniques in Organic Chemistry by T. Claridge or Understanding NMR Spectroscopy by J. Keeler. In the following we only summarize some fundamental aspects of NMR, which should be sufficient to get you started with the HQS NMR Tool.

NMR spectrometers place the sample into a strong, but constant magnetic field and use a weak, electromagnetic pulse to perturb the nuclei. At or near resonance, when the oscillation frequency matches the intrinsic frequency of a nucleus, the system responds by producing an electromagnetic signal with a frequency characteristic of the magnetic field at the respective nucleus.

The NMR Hamiltonian used to simulate NMR spectra as well as the effects leading to the intrinsic frequency of a nucleus are discussed below, while the origin of the electromagnetic signal as well as a possible experimental setup are discussed in detail in section Measurement of a 1D NMR spectrum.

Zeeman interaction

For the simulation of NMR spectra for molecules, the most important part is the Zeeman effect describing the interaction of the nuclear spin with the external magnetic field $B$ . The corresponding Hamiltonian can be written as

$\hat{H}_{Z} = - γ B \cdot \hat{S},$

where $γ$ is the gyromagnetic ratio, the ratio of a system's magnetic moment to its angular momentum, and $\hat{S} = (\hat{S}^{x}, \hat{S}^{y}, \hat{S}^{z})^{T}$ is the total spin operator. Assuming that the strong (and constant) magnetic field is in $z$ -direction only, meaning $B = (0, 0, B^{z})$ , the Hamiltonian simplifies to

$\hat{H}_{Z} = - γ B^{z} \hat{S}^{z} = ω_{0} \hat{S}^{z},$

where $ω_{0} = - γ B^{z}$ is the so-called Larmor frequency, the angular frequency corresponding to the precession of the spin magnetization around the magnetic field at the position of the nucleus.

As an example, consider the simplest nucleus, ¹H, consisting of only one proton, for which the gyromagnetic ratio is $γ /2 π = 42.6$ MHz T⁻¹, meaning that a 500 MHz NMR spectrometer has a static magnetic field of about 11.7 Tesla. The energy of radiation at the Larmor frequency $ν = 500$ MHz ( $h ν \approx 3.3 \cdot 1 0^{- 25}$ J) is several orders of magnitude smaller than the average thermal energy of a molecule at a temperature of $T = 298$ K ( $k_{B} T \approx 4.1 \cdot 1 0^{- 21}$ J). Therefore, the occupations of the spin states are almost equal at room temperature, and only a small surplus is responsible for the sample magnetization.
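The numbers quoted above can be reproduced with a few lines of arithmetic. The following is a minimal sketch using only the Python standard library; the helper name `field_for_proton_frequency` is hypothetical and not part of the HQS NMR Tool, and the proton gyromagnetic ratio is entered with one more digit than in the text.

```python
# CODATA values of the Planck and Boltzmann constants (SI units).
PLANCK_H = 6.62607015e-34   # J s
BOLTZMANN_K = 1.380649e-23  # J / K

# Proton gyromagnetic ratio divided by 2*pi, in Hz per Tesla.
GAMMA_1H_OVER_2PI = 42.577e6


def field_for_proton_frequency(frequency_hz: float) -> float:
    """Static field in Tesla at which a bare proton precesses at frequency_hz."""
    return frequency_hz / GAMMA_1H_OVER_2PI


frequency_hz = 500e6
field_T = field_for_proton_frequency(frequency_hz)
photon_energy_J = PLANCK_H * frequency_hz
thermal_energy_J = BOLTZMANN_K * 298.0

print(f"field for 500 MHz: {field_T:.4f} T")
print(f"h*nu  = {photon_energy_J:.2e} J")
print(f"k_B*T = {thermal_energy_J:.2e} J")
print(f"h*nu / k_B*T = {photon_energy_J / thermal_energy_J:.1e}")
```

The resulting field is the value passed as `field_T` in the getting-started snippet.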

Chemical shift

Perhaps the most important aspect for NMR spectroscopy in chemistry is that the nuclei in molecules are shielded against the external magnetic field by the electrons surrounding them. This can be expressed by adding a correction term to the Hamiltonian as

$\hat{H}_{Z} = - γ (1 - σ) B \cdot \hat{S},$

where $σ$ is referred to as the shielding tensor quantifying the change in the local magnetic field experienced by the nucleus in the molecule relative to a bare nucleus in vacuum. A significant simplification occurs if the molecules of interest are in solution, or in liquid phase in general, as they can rotate freely and only the isotropic chemical shielding $σ = \frac{1}{3} Tr (σ)$ is of interest,

$\hat{H}_{Z} = - γ (1 - σ) B^{z} \hat{S}^{z} .$

In practice, chemical shifts are normally used instead of chemical shieldings: instead of invoking the Larmor frequency of a nucleus in vacuum, shifts are defined with respect to the resonance frequency $ν_{ref}$ of a reference compound:

$δ = \frac{ν - ν_{ref}}{ν_{ref}} \approx σ_{ref} - σ .$

The standard reference for ¹H-NMR is the Larmor frequency of the protons in TMS [tetramethylsilane, Si(CH₃)₄]. Chemical shifts are normally reported on a scale of ppm (parts per million): most ¹H chemical shifts are observed in the range between 0 and 10 ppm, and most ¹³C chemical shifts between 0 and 200 ppm. Since a constant shift of the form $γ σ_{ref} B^{z} \hat{S}^{z}$ leaves the spectrum unchanged up to a scaling factor and the scale of chemical shieldings is so small in absolute terms $(∣ σ ∣ ≪ 1)$ , for practical intents and purposes the chemical shift can be substituted directly into the Hamiltonian:

$\hat{H}_{Z} = - γ (1 + δ) B^{z} \hat{S}^{z} .$
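To make the ppm scale concrete, the sketch below converts between chemical shifts in ppm and frequency offsets in Hz at a given field. It is an illustration only: the function names are hypothetical, and the reference frequency is approximated by the Larmor frequency of a proton with $δ = 0$.

```python
GAMMA_1H_OVER_2PI = 42.577e6  # proton gyromagnetic ratio / (2*pi), Hz per Tesla


def reference_frequency_hz(field_T: float) -> float:
    """Larmor frequency of a proton with zero chemical shift (assumed reference)."""
    return GAMMA_1H_OVER_2PI * field_T


def shift_to_offset_hz(shift_ppm: float, field_T: float) -> float:
    """Frequency offset from the reference for a chemical shift given in ppm."""
    return shift_ppm * 1e-6 * reference_frequency_hz(field_T)


def offset_to_shift_ppm(offset_hz: float, field_T: float) -> float:
    """Inverse conversion: frequency offset in Hz back to a shift in ppm."""
    return offset_hz / reference_frequency_hz(field_T) * 1e6


field_T = 11.7433
for shift_ppm in (0.9, 1.3, 7.26):
    offset = shift_to_offset_hz(shift_ppm, field_T)
    print(f"{shift_ppm:5.2f} ppm -> {offset:8.1f} Hz at {field_T} T")
```

At this field, 1 ppm corresponds to roughly 500 Hz, which is why typical proton J-couplings of a few Hz appear as fine structure on the ppm axis.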

Spin-spin coupling

Up to this point, the nuclear spins have been regarded to be isolated from each other. However, their magnetic moments have an effect on neighboring spins. The interaction of the nuclear spins can happen through two different mechanisms. The first one is the direct (or through space) spin-spin coupling, where the interaction strength depends on the distance of the two nuclei and the angle of their distance vector relative to the external field. As it comes from the direct interaction of two magnetic dipoles, it is also referred to as dipolar coupling. However, the effect is generally not observable in liquid phase since the free rotation of the molecules averages over all orientations and thus results in a vanishing average coupling.

An effect observable in the NMR spectrum is indirect spin-spin coupling, which is mediated by the electrons of a chemical bond. Due to the Pauli principle, the electrons of a covalent bond always have an anti-parallel spin orientation, and one electron will be closer to one nucleus than to the other, preferring an anti-parallel orientation with the nearby nucleus. Depending on the number of electrons involved in the transmission of the interaction, either a parallel or an anti-parallel orientation of two nuclei may result in a lower energy. Importantly, this interaction does not average out in solution since it mainly depends on the electron density at the position of the nucleus and not on the orientation of the distance vector relative to the field, which is why it is also referred to as scalar coupling. Since only s-orbitals have a finite electron density at the nucleus, the coupling depends on the electron density in those orbitals alone.

The interaction Hamiltonian in the case of homonuclear coupling is given by

$\hat{H}_{J} = 2 π ℏ J \hat{I}_{1} \cdot \hat{I}_{2},$

where $\hat{I} = ℏ^{- 1} \hat{S}$ .

It should be noted that the $J$ -coupling tensor is a real $3 \times 3$ matrix that depends on the molecular orientation, but in liquid phase only its isotropic part $J$ is observed due to motional averaging. Typical $J$ -coupling strengths between protons in ¹H-NMR amount to a few Hz.

NMR spin Hamiltonian for molecules in liquid phase

The spin Hamiltonian in a static magnetic field in frequency units (rad s⁻¹) is given by

$\hat{H} = ℏ^{- 1} (\hat{H}_{Z} + \hat{H}_{J}) = - \sum_{k} γ_{k} (1 + δ_{k}) B^{z} \hat{I}_{k}^{z} + 2 π \sum_{k < l} J_{k l} \hat{I}_{k} \cdot \hat{I}_{l},$

where the sum runs over all nuclear spins of interest.

There are several interactions that have not been taken into account here. As already mentioned, the direct dipolar spin-spin interaction vanishes in liquids due to motional averaging. Other interactions beyond dipolar coupling, such as quadrupolar interactions, are relevant only for nuclei with spin quantum number $I \geq 1$ . Furthermore, interactions with unpaired electrons need special treatment as well. While most organic compounds are diamagnetic (closed-shell), paramagnetic NMR also exists.
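As an illustration of the liquid-phase spin Hamiltonian, the following sketch builds the dense matrix for a small system of spin-1/2 nuclei with NumPy. It is not the construction used inside the HQS NMR Tool (which relies on struqture-py); the helpers `site_operator` and `build_hamiltonian` and the two-proton parameters are assumptions made for this example.

```python
import numpy as np

# Dimensionless spin-1/2 operators (I = S / hbar).
IX = 0.5 * np.array([[0, 1], [1, 0]], dtype=complex)
IY = 0.5 * np.array([[0, -1j], [1j, 0]], dtype=complex)
IZ = 0.5 * np.array([[1, 0], [0, -1]], dtype=complex)
ID = np.eye(2, dtype=complex)


def site_operator(op, site, n_spins):
    """Embed a single-spin operator acting on `site` into the n-spin space."""
    result = np.array([[1.0 + 0j]])
    for k in range(n_spins):
        result = np.kron(result, op if k == site else ID)
    return result


def build_hamiltonian(gammas, shifts_ppm, j_couplings_hz, field_T):
    """Zeeman plus scalar-coupling Hamiltonian in rad/s for spin-1/2 nuclei."""
    n_spins = len(gammas)
    j_couplings_hz = np.asarray(j_couplings_hz, dtype=float)
    dim = 2**n_spins
    hamiltonian = np.zeros((dim, dim), dtype=complex)
    # Zeeman term: -gamma_k (1 + delta_k) B^z I_k^z
    for k in range(n_spins):
        prefactor = gammas[k] * (1.0 + shifts_ppm[k] * 1e-6) * field_T
        hamiltonian -= prefactor * site_operator(IZ, k, n_spins)
    # Scalar coupling: 2 pi J_kl I_k . I_l for k < l
    for k in range(n_spins):
        for l in range(k + 1, n_spins):
            for op in (IX, IY, IZ):
                hamiltonian += (
                    2.0 * np.pi * j_couplings_hz[k, l]
                    * site_operator(op, k, n_spins) @ site_operator(op, l, n_spins)
                )
    return hamiltonian


# Hypothetical two-proton system: shifts of 1 and 2 ppm, J = 7 Hz.
gamma_1h = 2.0 * np.pi * 42.577e6  # rad s^-1 T^-1
hamiltonian = build_hamiltonian(
    gammas=[gamma_1h, gamma_1h],
    shifts_ppm=[1.0, 2.0],
    j_couplings_hz=[[0.0, 7.0], [7.0, 0.0]],
    field_T=11.7433,
)
iz_total = site_operator(IZ, 0, 2) + site_operator(IZ, 1, 2)
print(hamiltonian.shape)
print(np.allclose(hamiltonian @ iz_total, iz_total @ hamiltonian))
```

The last line checks that the Hamiltonian commutes with the total $\hat{I}^{z}$, a property that holds for any Hamiltonian of this form.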

Mathematical Background

The Hamiltonian $\hat{H}$ we are using for the NMR systems is of the form

$\hat{H} = - \sum_{l} γ_{l} (1 + δ_{l}) B^{z} \hat{I}_{l}^{z} + 2 π \sum_{k < l} J_{k l} \hat{I}_{k} \cdot \hat{I}_{l},$

where $γ_{l}$ are the gyromagnetic ratios and $δ_{l}$ the chemical shifts of nuclear spin $l$ . $J_{k l}$ denotes the coupling between spins $k$ and $l$ and $\hat{I}^{α} = \hat{S}^{α} /ℏ$ where the $\hat{S}^{α}$ are the usual SU(2) spin operators with $α \in \{x, y, z\}$ .

Note in the following, whenever we denote a spin operator as $\hat{I}^{α}$ without an additional site index, we mean the sum over the individual spin operators for each nuclear spin $\hat{I}^{α} = \sum_{i} \hat{I}_{i}^{α}$ .

In NMR spectroscopy, there is a strong magnetic field $B^{z}$ in the $z$ -direction, and electromagnetic pulses / oscillating fields are applied to flip the spins into the $x / y$ -plane. The Larmor frequency associated with $B^{z}$ is typically of the order of 500 MHz, the bandwidth of the pulses is about 10 kHz, and the required resolution is less than 1 Hz.

The spectrum measured in an NMR experiment corresponds to the spectral function calculated in the HQS NMR Tool.

Measurement of a 1D NMR spectrum

A 1D NMR spectrum can be obtained using a pulse-acquire experiment. To discuss how to calculate it, we will first elaborate on how the experiment is performed. After a sample has been placed inside an NMR spectrometer, a strong magnetic field is applied along the $z$ -axis, which leads to a net magnetization along the same axis. Note that at room temperature this magnetization is typically quite small. One then applies a $π /2$ -pulse along the $x$ -axis which flips the magnetization to the minus $y$ -direction. The spins then start to precess in the $x y$ -plane, which creates a signal that can be recorded by measuring the magnetic field along the $x$ - or $y$ -axis. The induced magnetic field $B^{x y}$ is directly linked to the magnetization $M$ of the sample as

$B^{x y} (t) = μ_{0} M^{x y} (t),$

which in turn can be linked to the magnetic moment $m$

$m^{x y} (t) = \int M^{x y} (t) d V = M^{x y} (t) V,$

where the integral goes over the volume $V$ of the sample and we are assuming a uniform magnetization within the sample to evaluate the integral. At the same time, the magnetic moment of an individual molecule is given as

$m^{x y} (t) = ⟨ \sum_{k} γ_{k} \hat{S}_{k}^{x y} ⟩ (t) .$

Therefore, to calculate the spectrum, we simply have to calculate the time-dependent expectation value of the nuclear spin operators of each spin along the $x$ - or $y$ -direction and scale them with the corresponding gyromagnetic ratio. Note that in the actual implementation we perform the calculation using the dimensionless nuclear spin operator, so we explicitly calculate

$⟨ \sum_{k} γ_{k} \hat{I}_{k}^{x y} ⟩ (t) = ℏ^{- 1} ⟨ \sum_{k} γ_{k} \hat{S}_{k}^{x y} ⟩ (t) .$

The sum of these contributions is directly proportional to the measured magnetic field. To obtain the absolute measured magnetic field, an additional scaling would be required based on information on the sample volume and concentration of the molecule. However, this scaling is typically unnecessary as one is not interested in absolute values. So in HQS NMR Tool the spectrum is by default normalized to integrate to the number of relevant spins, i.e., to those of the same isotope type as the reference isotope.
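The normalization convention can be illustrated with synthetic data. The sketch below rescales a set of per-spin contributions so that the summed spectrum integrates to a chosen number of spins; the function names and the two Lorentzian test peaks are assumptions for this example and do not reflect how the library performs the normalization internally.

```python
import numpy as np


def integrate_trapezoid(x, y):
    """Trapezoidal rule, written out to stay independent of the NumPy version."""
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))


def normalize_to_spin_count(omegas, spin_contributions, n_spins):
    """Rescale all contributions so that their sum integrates to n_spins."""
    summed = np.sum(spin_contributions, axis=0)
    area = integrate_trapezoid(omegas, summed)
    return spin_contributions * (n_spins / area)


def lorentzian(omegas, center, eta):
    return (eta / np.pi) / ((omegas - center) ** 2 + eta**2)


# Synthetic, arbitrarily scaled contributions of two spins.
omegas = np.linspace(-50.0, 50.0, 2001)
contributions = np.vstack(
    [3.0 * lorentzian(omegas, -10.0, 1.0), 5.0 * lorentzian(omegas, 20.0, 1.0)]
)

normalized = normalize_to_spin_count(omegas, contributions, n_spins=2)
print(integrate_trapezoid(omegas, normalized.sum(axis=0)))
```

Only the overall scale changes; the relative intensities of the individual contributions are preserved.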

Calculation of the spectrum

Time domain

Let us now discuss how to evaluate these expectation values. For this, we will only consider the field along the $x$ -direction. The time-dependent expectation value can be calculated using the density operator $\hat{ρ} (t)$

$⟨ γ_{k} \hat{I}_{k}^{x} ⟩ (t) = ⟨ γ_{k} \hat{I}_{k}^{x} \hat{ρ} (t) ⟩ .$

This implies that we need to find an expression for the time evolution of the density operator, which is described by the Liouville-von-Neumann equation

$\frac{d \hat{ρ}}{d t} = - \frac{i}{ℏ} [\hat{H}, \hat{ρ}] .$

In this formulation the equation assumes a Hamiltonian in units of Joule, however the Hamiltonian as defined above is in units of radians per second, which allows us to drop the $ℏ$ factor. Therefore the time evolution can be written as

$\hat{ρ} (t) = e^{- i \hat{H} t} \hat{ρ} (t_{0}) e^{i \hat{H} t},$

which can easily be verified by reinserting this expression into the Liouville-von-Neumann equation. Note that $\hat{H}$ is the Hamiltonian of the process taking place within the time interval $[t_{0}, t]$ . Therefore we have to perform two time evolutions: we start with the density operator at some time $t_{0}$ and perform a time evolution using the Hamiltonian associated with the $π /2$ pulse. Then we perform a second time evolution using the NMR Hamiltonian, describing the precession of the spins in the $x y$ -plane.

Due to the setup of the experiment we can assume a hard (or instantaneous) pulse, meaning we will not be interested in the explicit time evolution during the pulse and can therefore write the time evolution operator of the pulse as

$\hat{P}_{π /2}^{x} = e^{- i \frac{π}{2} \hat{I}^{x}} .$

This leads to the following density operator after the pulse

$\hat{ρ} (t_{p}) = \hat{P}_{π /2}^{x} \hat{ρ} (t_{0}) (\hat{P}_{π /2}^{x})^{†},$

where $t_{p}$ is the time duration of the pulse. Performing the second time evolution under the NMR Hamiltonian we arrive at

$\hat{ρ} (t) = e^{- i \hat{H} t} \hat{ρ} (t_{p}) e^{i \hat{H} t} = e^{- i \hat{H} t} \hat{P}_{π /2}^{x} \hat{ρ} (t_{0}) (\hat{P}_{π /2}^{x})^{†} e^{i \hat{H} t} .$

We now just have to define the density operator at time $t_{0}$ . Due to the experimental setup it is simply given via the Boltzmann distribution

$\hat{ρ} (t_{0}) = \frac{e^{- β ℏ \hat{H}}}{Z}$

with the inverse temperature $β = \frac{1}{k _{B} T}$ , where $k_{B}$ is the Boltzmann constant and $T$ the temperature, as well as the partition function $Z$ . Note that the factor $ℏ$ is necessary as the Hamiltonian $\hat{H}$ is given in units of radians per second.

While the HQS NMR Tool provides a solver that evaluates a spectrum using the exact expression of the density operator, one should note that for typical NMR experiments it is sufficient to approximate it using a series expansion of the exponential

$\hat{ρ} (t_{0}) = \frac{e^{- β ℏ \hat{H}}}{Z} \approx \frac{1}{Z} (I - β ℏ \hat{H}) \approx \frac{1}{Z} - \frac{β ℏ B^{z}}{Z} \sum_{l} γ_{l} (1 + δ_{l}) \hat{I}_{l}^{z} .$

In the second approximation applied here, the second term is replaced by the magnetic field term of the Hamiltonian. This is valid as the contribution from the J-coupling term to the distribution is insignificant in comparison. Furthermore, the first term with the identity $I$ can be dropped in practical calculations as it does not contribute to the overall expectation value.
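How well the first-order expansion works at room temperature can be checked for a single spin-1/2. The following sketch compares the exact Boltzmann populations of the two Zeeman levels with their linearized counterparts; the function names are hypothetical, and the level splitting is taken to be $h ν$ with $ν = 500$ MHz.

```python
import math

PLANCK_H = 6.62607015e-34   # J s
BOLTZMANN_K = 1.380649e-23  # J / K


def exact_populations(frequency_hz, temperature_K):
    """Boltzmann populations (lower, upper) of two levels split by h * frequency."""
    x = PLANCK_H * frequency_hz / (BOLTZMANN_K * temperature_K)
    lower = 1.0 / (1.0 + math.exp(-x))
    return lower, 1.0 - lower


def linearized_populations(frequency_hz, temperature_K):
    """Populations from rho ~ (1 - beta * H) / 2, i.e. first order in beta."""
    x = PLANCK_H * frequency_hz / (BOLTZMANN_K * temperature_K)
    return 0.5 * (1.0 + 0.5 * x), 0.5 * (1.0 - 0.5 * x)


exact_lower, exact_upper = exact_populations(500e6, 298.0)
linear_lower, linear_upper = linearized_populations(500e6, 298.0)

print(f"population surplus (exact):      {exact_lower - exact_upper:.3e}")
print(f"population surplus (linearized): {linear_lower - linear_upper:.3e}")
print(f"error of the linearization:      {abs(exact_lower - linear_lower):.1e}")
```

The surplus is of the order of a few parts in $10^{5}$, and the error of the linearization is many orders of magnitude smaller than that.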

Note that in the NMR literature this step is often skipped and sometimes even $\hat{I}^{z}$ is called a density operator. This is technically incorrect as $\hat{I}^{z}$ is not positive semi-definite, a key requirement for a density operator. Furthermore, the prefactor $\frac{β ℏ B ^{z}}{Z}$ is often omitted, as it only serves as normalization.

However, with this approximation one can now further simplify the calculation using the identity

$\hat{P}_{α}^{x} \hat{I}^{z} (\hat{P}_{α}^{x})^{†} = e^{- i α \hat{I}^{x}} \hat{I}^{z} e^{i α \hat{I}^{x}} = cos (α) \hat{I}^{z} + i sin (α) [\hat{I}^{z}, \hat{I}^{x}],$

which in our case of a $π /2$ -pulse reduces to

$\hat{P}_{π /2}^{x} \hat{I}^{z} (\hat{P}_{π /2}^{x})^{†} = - \hat{I}^{y} .$
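The rotation identity can be verified numerically for a single spin-1/2. The sketch below assumes NumPy and builds the pulse operator from the eigendecomposition of $\hat{I}^{x}$; the helper names are chosen for this example only.

```python
import numpy as np

# Dimensionless spin-1/2 operators.
IX = 0.5 * np.array([[0, 1], [1, 0]], dtype=complex)
IY = 0.5 * np.array([[0, -1j], [1j, 0]], dtype=complex)
IZ = 0.5 * np.array([[1, 0], [0, -1]], dtype=complex)


def pulse_about_x(alpha):
    """exp(-i * alpha * Ix), computed via the eigendecomposition of Ix."""
    eigenvalues, eigenvectors = np.linalg.eigh(IX)
    phases = np.diag(np.exp(-1j * alpha * eigenvalues))
    return eigenvectors @ phases @ eigenvectors.conj().T


def rotate_iz(alpha):
    """P Iz P^dagger for a pulse with flip angle alpha about the x-axis."""
    pulse = pulse_about_x(alpha)
    return pulse @ IZ @ pulse.conj().T


alpha = 0.7
commutator = IZ @ IX - IX @ IZ
right_hand_side = np.cos(alpha) * IZ + 1j * np.sin(alpha) * commutator

print(np.allclose(rotate_iz(alpha), right_hand_side))
print(np.allclose(rotate_iz(np.pi / 2), -IY))
```

Both checks rely on the commutation relation $[\hat{I}^{z}, \hat{I}^{x}] = i \hat{I}^{y}$, which turns the commutator term into $- sin (α) \hat{I}^{y}$.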

This leads us to the final expression for the expectation value

$⟨ γ_{k} \hat{I}_{k}^{x} ⟩ (t) = ⟨ γ_{k} \hat{I}_{k}^{x} \hat{ρ} (t) ⟩ = ⟨ γ_{k} \hat{I}_{k}^{x} e^{- i \hat{H} t} \hat{P}_{π /2}^{x} \hat{ρ} (t_{0}) (\hat{P}_{π /2}^{x})^{†} e^{i \hat{H} t} ⟩ \approx \frac{1}{Z} ⟨ γ_{k} \hat{I}_{k}^{x} e^{- i \hat{H} t} \hat{P}_{π /2}^{x} ( 1 - β ℏ B^{z} \sum_{l} γ_{l} (1 + δ_{l}) \hat{I}_{l}^{z} ) (\hat{P}_{π /2}^{x})^{†} e^{i \hat{H} t} ⟩ = \frac{β ℏ B^{z}}{Z} γ_{k} \sum_{l} γ_{l} (1 + δ_{l}) ⟨ \hat{I}_{k}^{x} e^{- i \hat{H} t} \hat{I}_{l}^{y} e^{i \hat{H} t} ⟩ .$

Frequency domain

Up to now we have only discussed how to calculate the time signal. What we are actually interested in, however, is the spectrum given by its Fourier transform. Most implementations therefore first calculate a time evolution according to the derivation performed above and then do a Fourier transform of the result. However, as will be explained in the next section, it is much more efficient to do the Fourier transform analytically first and then perform all calculations in the frequency domain. To be able to do this, we have to rewrite the above formulation into a so-called Lehmann representation

$⟨ γ_{k} \hat{I}_{k}^{x} ⟩ (t) = \frac{β ℏ B^{z}}{Z} γ_{k} \sum_{l} γ_{l} (1 + δ_{l}) ⟨ \hat{I}_{k}^{x} e^{- i \hat{H} t} \hat{I}_{l}^{y} e^{i \hat{H} t} ⟩ = \frac{β ℏ B^{z}}{Z} γ_{k} \sum_{l} γ_{l} (1 + δ_{l}) \sum_{n} ⟨ n ∣ \hat{I}_{k}^{x} \sum_{m} ∣ m ⟩ ⟨ m ∣ e^{- i \hat{H} t} \hat{I}_{l}^{y} e^{i \hat{H} t} ∣ n ⟩ = \frac{β ℏ B^{z}}{Z} γ_{k} \sum_{l} γ_{l} (1 + δ_{l}) \sum_{n, m} ⟨ n ∣ \hat{I}_{k}^{x} ∣ m ⟩ ⟨ m ∣ \hat{I}_{l}^{y} ∣ n ⟩ e^{i (E_{n} - E_{m}) t},$

where $∣ m ⟩$ and $∣ n ⟩$ are the eigenvectors of the NMR Hamiltonian constructed in a many-body spin basis with the respective eigenvalues $E_{m}$ and $E_{n}$ . These can be obtained from numerical diagonalization using standard dense linear algebra routines. By introducing a small convergence aiding factor $η$ we can now do the Fourier transform

$⟨ γ_{k} \hat{I}_{k}^{x} ⟩ (ω) = \int e^{i (ω + i η) t} ⟨ γ_{k} \hat{I}_{k}^{x} ⟩ (t) d t = \frac{β ℏ B^{z}}{Z} γ_{k} \sum_{l} γ_{l} (1 + δ_{l}) \sum_{n, m} ⟨ n ∣ \hat{I}_{k}^{x} ∣ m ⟩ ⟨ m ∣ \hat{I}_{l}^{y} ∣ n ⟩ \int e^{i (ω + i η + E_{n} - E_{m}) t} d t = \frac{β ℏ B^{z}}{Z} γ_{k} \sum_{l} γ_{l} (1 + δ_{l}) \sum_{n, m} \frac{⟨ n ∣ \hat{I}_{k}^{x} ∣ m ⟩ ⟨ m ∣ \hat{I}_{l}^{y} ∣ n ⟩}{i (ω + i η + E_{n} - E_{m})} = - i \frac{β ℏ B^{z}}{Z} γ_{k} \sum_{l} γ_{l} (1 + δ_{l}) \sum_{n, m} \frac{⟨ n ∣ \hat{I}_{k}^{x} ∣ m ⟩ ⟨ m ∣ \hat{I}_{l}^{y} ∣ n ⟩}{ω + i η + E_{n} - E_{m}} .$

This represents the final form we are evaluating, although for the spectrum measured in NMR typically just the real part is required. Some postprocessing routines also require the imaginary part, in which case we will refer to this quantity as a Green's function.

Additionally, note that the factor $η$ leads to a broadening of the peaks, giving them generally the shape of a Lorentzian. This broadening has to be chosen to fit the expected resolution, as discussed in the next section.
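The Lehmann sum can be evaluated directly for a small system by dense diagonalization. The sketch below does this with NumPy for two coupled spin-1/2 nuclei described in a rotating frame, with hypothetical offsets of 100 Hz and 300 Hz and $J = 10$ Hz. It drops the constant prefactor, sets all weights $γ_{l} (1 + δ_{l})$ to one, and returns the complex function without choosing a component, so it should be read as an illustration of the formula rather than as the solver used by the HQS NMR Tool.

```python
import numpy as np

IX = 0.5 * np.array([[0, 1], [1, 0]], dtype=complex)
IY = 0.5 * np.array([[0, -1j], [1j, 0]], dtype=complex)
IZ = 0.5 * np.array([[1, 0], [0, -1]], dtype=complex)
ID = np.eye(2, dtype=complex)


def site_operator(op, site, n_spins):
    """Embed a single-spin operator acting on `site` into the n-spin space."""
    result = np.array([[1.0 + 0j]])
    for k in range(n_spins):
        result = np.kron(result, op if k == site else ID)
    return result


def lehmann_green_function(hamiltonian, op_a, op_b, omegas, eta):
    """-i * sum_{n,m} <n|A|m><m|B|n> / (omega + i*eta + E_n - E_m)."""
    energies, states = np.linalg.eigh(hamiltonian)
    a = states.conj().T @ op_a @ states            # a[n, m] = <n|A|m>
    b = states.conj().T @ op_b @ states            # b[m, n] = <m|B|n>
    weights = a * b.T                              # <n|A|m><m|B|n>
    gaps = energies[:, None] - energies[None, :]   # E_n - E_m
    denominators = omegas[:, None, None] + 1j * eta + gaps[None, :, :]
    return -1j * np.sum(weights[None, :, :] / denominators, axis=(1, 2))


# Rotating-frame Hamiltonian in rad/s (assumed example parameters).
offsets_hz, j_hz, n_spins = (100.0, 300.0), 10.0, 2
hamiltonian = np.zeros((4, 4), dtype=complex)
for k, offset in enumerate(offsets_hz):
    hamiltonian -= 2.0 * np.pi * offset * site_operator(IZ, k, n_spins)
for op in (IX, IY, IZ):
    hamiltonian += (
        2.0 * np.pi * j_hz * site_operator(op, 0, n_spins) @ site_operator(op, 1, n_spins)
    )

iy_total = site_operator(IY, 0, n_spins) + site_operator(IY, 1, n_spins)
frequencies_hz = np.linspace(0.0, 400.0, 4001)
green = lehmann_green_function(
    hamiltonian,
    site_operator(IX, 0, n_spins),
    iy_total,
    omegas=2.0 * np.pi * frequencies_hz,
    eta=2.0 * np.pi * 0.5,
)
peak_hz = frequencies_hz[np.argmax(np.abs(green))]
print(f"strongest response of spin 0 near {peak_hz:.1f} Hz")
```

For these parameters the contribution of spin 0 is a doublet centered close to its 100 Hz offset and split by roughly $J$, with a width set by $η$.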

Energy rediscretization

One problem that arises when simulating NMR spectra is that the peaks are usually very sharp, which means that resolving them numerically poses a challenge in and of itself. Performing calculations in the time domain therefore typically requires evaluating a large number of time steps. In the frequency domain, discretizing the frequency axis with a very fine uniform grid would likewise be computationally inefficient. Therefore, we perform several rounds of computing the spectral function and use the result of each round to rediscretize the frequency grid for the next. We first calculate the spectral function with a linear grid in frequency space and a rather large artificial broadening $η$ . Based on this result we rediscretize the frequency space in equal-weight partitions of the total spectral function. This gives us a new non-linear frequency grid with grid points accumulated around the spectral function peaks. We then repeat this process with a smaller broadening $η$ , which allows us to resolve sharper peaks. This way, we can iteratively reach the desired resolution.
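One round of the equal-weight rediscretization described above could look as follows. This is a sketch under stated assumptions: the function `equal_weight_grid` and the two-peak test spectrum are hypothetical, the spectrum is assumed to be strictly positive on the grid, and the scheme implemented in the HQS NMR Tool may differ in its details.

```python
import numpy as np


def lorentzian(omegas, center, eta):
    return (eta / np.pi) / ((omegas - center) ** 2 + eta**2)


def equal_weight_grid(omegas, spectrum, n_points):
    """Grid whose points split the integrated spectrum into equal parts."""
    # Cumulative trapezoidal integral, normalized to run from 0 to 1.
    increments = 0.5 * (spectrum[1:] + spectrum[:-1]) * np.diff(omegas)
    cumulative = np.concatenate(([0.0], np.cumsum(increments)))
    cumulative /= cumulative[-1]
    # Invert the cumulative weight at equally spaced target values.
    targets = np.linspace(0.0, 1.0, n_points)
    return np.interp(targets, cumulative, omegas)


# First round: linear grid and a rather large broadening.
coarse_grid = np.linspace(0.0, 10.0, 201)
broad_spectrum = lorentzian(coarse_grid, 3.0, 0.5) + lorentzian(coarse_grid, 7.0, 0.5)

# Rediscretize: grid points now accumulate around the two peaks.
refined_grid = equal_weight_grid(coarse_grid, broad_spectrum, n_points=201)

in_window = lambda grid: int(np.sum((grid >= 2.5) & (grid <= 3.5)))
print("points within 0.5 of the first peak:")
print("  linear grid: ", in_window(coarse_grid))
print("  refined grid:", in_window(refined_grid))
```

In a full calculation the spectrum would then be recomputed on `refined_grid` with a smaller $η$, and the procedure repeated until the target resolution is reached.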

Calculating NMR Spectra

While using the Lehmann representation for the expectation value

$⟨ γ_{k} \hat{I}_{k}^{x} ⟩ (ω) = - i \frac{β ℏ B^{z}}{Z} γ_{k} \sum_{l} γ_{l} (1 + δ_{l}) \sum_{n, m} \frac{⟨ n ∣ \hat{I}_{k}^{x} ∣ m ⟩ ⟨ m ∣ \hat{I}_{l}^{y} ∣ n ⟩}{ω + i η + E_{n} - E_{m}}$

is useful to resolve sharp features in an NMR spectrum, it does not address the problem of the exponential scaling of the Hamiltonian dimension. Evaluating this expression in a brute force approach would imply diagonalizing a Hamiltonian of dimension $2^{M}$ where $M$ is the number of spins in the molecule. This would restrict one to around 10 spins on a modern laptop. Therefore, different strategies need to be employed to evaluate NMR spectra for larger molecules.

Symmetry

First, one can use the fact that the NMR Hamiltonian always conserves the $I^{z}$ quantum number, which means that we can identify a block diagonal structure in the Hamiltonian, where each block can be diagonalized individually. While this is always possible for the standard NMR Hamiltonian, it only restricts the dimension of the largest block to $\binom{M}{M /2}$ , which still grows rather quickly.
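The gain from $I^{z}$ conservation is easy to quantify for spin-1/2 nuclei. The short sketch below, using only the standard library and hypothetical function names, compares the full Hilbert-space dimension with the dimension of the largest $I^{z}$ block.

```python
import math


def full_dimension(n_spins: int) -> int:
    """Dimension of the full Hilbert space of n spin-1/2 nuclei."""
    return 2**n_spins


def largest_block_dimension(n_spins: int) -> int:
    """Dimension of the largest fixed-Iz block (half of the spins up)."""
    return math.comb(n_spins, n_spins // 2)


for n_spins in (4, 8, 12, 16, 20):
    full = full_dimension(n_spins)
    block = largest_block_dimension(n_spins)
    print(f"M = {n_spins:2d}: full = {full:8d}, largest block = {block:7d}")
```

The largest block is smaller than the full space, but it still grows exponentially with the number of spins, which is why further symmetries and approximations are needed.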

For some molecules one can also identify magnetically equivalent groups. Such a symmetry group is defined as a group of spins in which each individual spin has the same chemical shift and couples in exactly the same way to the rest of the system. Identifying these groups is advantageous, as one can combine them into higher-order spin representations, which makes it possible to exploit the local SU(2) symmetry of these groups. As an example, consider a propane molecule, which has eight hydrogen atoms. By identifying all symmetrically coupled groups in this molecule, the number of spins can be reduced to two: the CH₂ group, which has a combined spin representation of one, and the two methyl groups, each representing a spin 3/2 and adding up to a total spin representation of spin 3. As the corresponding operators $\hat{I}^{2}$ of these two combined spins commute with the Hamiltonian, we can construct a basis in which the Hamiltonian again has a block diagonal structure, but with individual blocks that are even smaller than when exploiting just the $I^{z}$ conservation.

While these symmetry considerations are exact and can lead to a reasonable reduction in computational effort, they eventually break down for larger and larger molecules. Therefore, approximate methods also have to be used.

Clustering methods

The main approximation method used in the HQS NMR Tool is based on the observation that we do not just evaluate the full sum over spin contributions to the spectrum at once, but rather determine the contribution of each individual spin separately. This allows us to identify an effective Hamiltonian for each spin and evaluate the spectral function for it independently from the other spin contributions. It turns out that a good approximation for the Hamiltonian for a specific spin contribution is typically given by simply identifying the cluster of spins most strongly coupled to the spin of interest. A good measure for this coupling is the following weight matrix, motivated from perturbation theory

$Δ_{k l} = \frac{J _{k l}^{2}}{( γ _{k} δ _{k} - γ _{l} δ _{l} ) B ^{z}}$

Here, $J_{k l}$ are the entries in the J-coupling matrix connecting spins with index $k$ and $l$ , $γ_{k / l}$ the gyromagnetic ratios, $δ_{k / l}$ the chemical shift values, and $B^{z}$ the magnetic field strength in $z$ -direction.

Especially at high field, this method is extremely accurate, which makes it possible to choose cluster sizes of around 8–12 spins for basically all molecules.
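To illustrate how the weight matrix can guide the choice of a cluster, the sketch below computes $∣ Δ_{k l} ∣$ for a hypothetical four-spin system and picks, for a given spin, the partners with the largest weights. Several choices here are assumptions made for the example and are not taken from the library: the absolute value, the use of $γ /2 π$ so that all quantities are in Hz, the treatment of degenerate shifts as infinitely strong coupling, and the simple greedy selection.

```python
import numpy as np


def coupling_weights(j_hz, shifts_ppm, gamma_over_2pi, field_T):
    """|Delta_kl| = J_kl^2 / |(gamma_k delta_k - gamma_l delta_l) B^z| in Hz."""
    j_hz = np.asarray(j_hz, dtype=float)
    offsets_hz = np.asarray(gamma_over_2pi) * np.asarray(shifts_ppm) * 1e-6 * field_T
    detuning = np.abs(offsets_hz[:, None] - offsets_hz[None, :])
    weights = np.zeros_like(j_hz)
    coupled = j_hz != 0.0
    regular = coupled & (detuning > 0.0)
    weights[regular] = j_hz[regular] ** 2 / detuning[regular]
    # Assumption: coupled spins with identical shifts count as strongly coupled.
    weights[coupled & (detuning == 0.0)] = np.inf
    np.fill_diagonal(weights, 0.0)
    return weights


def cluster_for_spin(weights, spin, max_size):
    """Greedy choice: the spin plus its most strongly weighted partners."""
    order = np.argsort(-weights[spin], kind="stable")
    partners = [int(i) for i in order if i != spin and weights[spin, i] > 0.0]
    return sorted([spin] + partners[: max_size - 1])


# Hypothetical four-proton chain with couplings between neighbours only.
j_hz = np.zeros((4, 4))
j_hz[0, 1] = j_hz[1, 0] = 7.0
j_hz[1, 2] = j_hz[2, 1] = 7.0
j_hz[2, 3] = j_hz[3, 2] = 0.5
shifts_ppm = [1.0, 1.2, 3.5, 7.0]
gamma_over_2pi = [42.577e6] * 4  # Hz per Tesla

weights = coupling_weights(j_hz, shifts_ppm, gamma_over_2pi, field_T=11.7433)
print(np.round(weights, 4))
print("cluster for spin 0:", cluster_for_spin(weights, 0, max_size=2))
print("cluster for spin 2:", cluster_for_spin(weights, 2, max_size=2))
```

Because the detuning in the denominator grows with the field, the weights shrink at high field, which is consistent with the clusters becoming easier to separate there.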

Program components

The following components will enable you to quickly and conveniently run NMR spectrum calculations using the HQS NMR Tool Python library:

A convenience function for quick and easy spectrum calculations.
The molecule input to define parameters for NMR calculations.
Different datatypes to store the input and output of NMR calculations.

While these modules are typically all you need, you may also browse through the API-documentation to check out all available features. However, we recommend first going through the example notebooks to get an idea of how the package is typically used.

NMR spectra calculations

calculate_spectrum

For most use cases the calculate_spectrum function should be sufficient to perform NMR spectra calculations. By default, it uses the clustering approach discussed in the solver chapter. This approach is exact for molecules smaller than the specified maximum cluster size (set by default to 12) and for larger systems still accurate and efficient.

For a quick introduction to this method, you can just read on. For a proper walkthrough of all the customization options, check out the example notebooks.

The calculate_spectrum function takes as its first input an object with the molecular data of the system of interest, which can be of type MolecularData or NMRParameters. The latter is derived from the former using the spin_system method and only stores the isotopes, chemical shifts and J-coupling values. The second input argument of calculate_spectrum is of type NMRCalculationParameters and stores all parameters necessary to specify an NMR spectrum calculation. For more details on these objects, check out the corresponding sections Molecule input and Data types.

As part of HQS NMR Tool a module containing example molecules is provided. You can calculate the spectrum for one of the example molecules simply as follows:

from hqs_nmr.calculate import calculate_spectrum
from hqs_nmr.datatypes import NMRCalculationParameters
from hqs_nmr_parameters.examples import molecules

# Obtain example molecule of datatype NMRParameters.
molecule_parameters = molecules["C3H8"].spin_system()

# Define the calculation parameters. The only required parameter is the
# magnetic field in Tesla.
calculation_parameters = NMRCalculationParameters(field_T=11.7433)

# Calculate the spectrum.
nmr_result = calculate_spectrum(
    molecule_parameters,
    calculation_parameters
)

nmr_result is now an object of datatype NMRResultSpectrum1D and stores the calculated spectrum as well as the input to the calculation. The spectrum can now be plotted as follows:

import numpy as np
import matplotlib.pyplot as plt

summed_spectrum = np.sum(nmr_result.spectrum.spin_contributions, axis=0)

plt.plot(nmr_result.spectrum.omegas_ppm, summed_spectrum, linewidth=0.3, label="spectrum")
plt.title("500 MHz, Propane")
plt.xlabel("$\\delta$ [ppm]")
plt.ylabel("$Intensity \\, [a. u.]$")
plt.savefig("propane_500MHz_NMR_spectrum.png", dpi=2000)
plt.show()

Note that we store the individual spin contributions rather than the summed spectrum in nmr_result.spectrum.spin_contributions. Hence we have to sum them before plotting the spectrum.

The result should look as follows:

Input structure for molecular data

In this section we will introduce the molecular data structure used as input in the HQS NMR Tool. It involves:

Chemical name and molecular formula
Chemical structures
Conditions of the experiment/simulation
NMR parameters

Some groups of molecules are already included in our internal data sets. In addition external data can be employed for setting up an NMR calculation.

In the following, the Pydantic classes related to molecular data as well as how to provide external data will be explained.

Representation in Python of molecular data structures

Molecular data

Inside the hqs-nmr-parameters package, the MolecularData class, which is derived from Pydantic's BaseModel, has been implemented to describe a molecule. It contains the following attributes:

name: The name of the molecule (just a simple string).
isotopes: List containing pairs of an atom index and the associated isotope. Atom indices follow the order in which atoms appear in the chemical structure representations, starting from index 0.
shifts: List containing pairs of an atom index and the associated chemical shift in ppm.
j_couplings: List containing pairs of atomic indices and the associated J-coupling values in Hz. Note that atom index pairs are unique: if a value is provided for an atom pair (k, l), then no value is provided for pair (l, k).
structures: Contains a dictionary with chemical structure representations. It accepts entries for a SMILES string ("SMILES"), a Molfile ("Molfile"), and an XYZ file ("XYZ"). Each value will correspond to a ChemicalStructure object. See below for a full explanation of the chemical structure representations and the atom numeration.
formula: The molecular formula of the corresponding molecule.
temperature: An optional temperature in K.
solvent: Name of the solvent. An empty string represents an unknown or undefined solvent, or the absence of a solvent.
description: Optional further information.
method_json: Stores a JSON serialization of computational method settings. An empty string indicates that the field is not applicable. Creating and interpreting the content is the responsibility of the user of the model.

This is best illustrated by an example from one of our available data sets:

# Import a MolecularDataSet
from hqs_nmr_parameters.examples import molecules

# Choose ethanol as an example from the available molecules
parameters = molecules["C2H5OH"]

print(type(parameters))
print("Chemical name:", parameters.name)
print("Molecular Formula:", parameters.formula)
print("Type(s) of representations:", parameters.structures.keys())
print("Information about solvent:", parameters.solvent)
print("Shifts:", parameters.shifts)
print("###")
print(parameters.description)

<class 'hqs_nmr_parameters.code.data_classes.MolecularData'>
Chemical name: Ethanol
Molecular Formula: C2H6O
Type(s) of representations: dict_keys(['SMILES'])
Information about solvent: chloroform
Shifts: [(3, 1.25), (4, 1.25), (5, 1.25), (6, 3.72), (7, 3.72), (8, 1.32)]
###
1H parameters for ethanol in CDCl3.
Shifts from: https://doi.org/10.1021/om100106e
J-couplings estimated.

In this case, we obtain information for the ethanol molecule, for which a SMILES representation is stored (see below for details) together with ¹H-NMR parameters in chloroform. According to the description, the shifts are experimental and the J-couplings are estimated.

NMR parameters subset

The most convenient way to inspect the NMR data is to extract only the information necessary to perform an NMR calculation, i.e., a subset of the parameters object containing only the attributes isotopes, shifts, and j_couplings. This subset can be obtained using the spin_system method of MolecularData:

from pprint import pprint
from hqs_nmr_parameters.examples import molecules

parameters = molecules["C2H5OH"]
nmr_parameters = parameters.spin_system()

print(type(nmr_parameters))
pprint(nmr_parameters.model_dump())

<class 'hqs_nmr_parameters.code.data_classes.NMRParameters'>
{'isotopes': [(1, 'H'), (1, 'H'), (1, 'H'), (1, 'H'), (1, 'H'), (1, 'H')],
 'j_couplings': [((0, 3), 7.0),
                 ((0, 4), 7.0),
                 ((1, 3), 7.0),
                 ((1, 4), 7.0),
                 ((2, 3), 7.0),
                 ((2, 4), 7.0)],
 'shifts': [1.25, 1.25, 1.25, 3.72, 3.72, 1.32]}

As we can see, nmr_parameters is an instance of the Pydantic NMRParameters class, where the atomic indices always start from 0. The total number of spins, determined from the number of chemical shifts, can be accessed with the nspins property. The NMRParameters class is used as input for several functions to calculate an NMR spectrum. To perform a ¹H-NMR simulation (default case), we can do the following providing the molecular data either as NMRParameters or directly as MolecularData object:

from hqs_nmr_parameters.examples import molecules
from hqs_nmr import calculate_spectrum, NMRCalculationParameters

parameters = molecules["C2H5OH"]
nmr_parameters = parameters.spin_system()
print(type(nmr_parameters))
print(nmr_parameters.nspins)
calculation_parameters = NMRCalculationParameters(field_T=9.395)

# Using parameters instead of nmr_parameters would also be valid
spectrum = calculate_spectrum(nmr_parameters, calculation_parameters)
print(type(spectrum))

<class 'hqs_nmr_parameters.code.data_classes.NMRParameters'>
6
<class 'hqs_nmr.datatypes.NMRResultSpectrum1D'>

For more insight into NMRParameters and the output object NMRResultSpectrum1D, have a look at the Data types section.
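
Since every pair of spins appears only once in j_couplings, a dense and symmetric J-coupling matrix has to be filled in both triangles when it is built from this list. The following sketch shows one way to do this in plain Python with the ethanol values printed above. The helper j_coupling_matrix is hypothetical and not part of hqs-nmr-parameters.

```python
def j_coupling_matrix(n_spins, j_couplings):
    """Build a symmetric n_spins x n_spins matrix from unique ((k, l), J) entries."""
    matrix = [[0.0] * n_spins for _ in range(n_spins)]
    for (k, l), value in j_couplings:
        # Each pair is listed once, so both triangles are filled here.
        matrix[k][l] = value
        matrix[l][k] = value
    return matrix


# J-couplings in Hz, as printed for the ethanol example above.
ethanol_j_couplings = [
    ((0, 3), 7.0),
    ((0, 4), 7.0),
    ((1, 3), 7.0),
    ((1, 4), 7.0),
    ((2, 3), 7.0),
    ((2, 4), 7.0),
]

j_matrix = j_coupling_matrix(6, ethanol_j_couplings)
for row in j_matrix:
    print(row)
```

The last row, which belongs to the hydroxyl proton, contains only zeros because no coupling to this spin is listed in the example data.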

Setting up a Spin Hamiltonian

With the molecular data or just the NMR parameters mentioned above, it is possible to set up the Spin Hamiltonian of the spin system relevant for an NMR simulation. This can be achieved by calling the hqs_nmr_parameters.nmr_hamiltonian function that returns the Spin Hamiltonian as defined in struqture. It contains the terms normally needed to represent the Spin Hamiltonian of a closed-shell organic molecule in solution:

Zeeman interaction of nuclei with the static magnetic field
isotropic chemical shifts
indirect spin-spin scalar couplings

Note that the units of the Spin Hamiltonian conform to the conventions in the field of NMR. In consequence, the Hamiltonian is in angular frequency units of rad s⁻¹. To convert to Hz, the Hamiltonian needs to be divided by 2π. To convert to energy units (Joule), the Hamiltonian needs to be multiplied with the reduced Planck constant ℏ.
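
As a worked example of these conversions, the following sketch computes the magnitude of the ¹H Larmor frequency at a field of 11.7433 T and converts it from rad s⁻¹ to Hz and to Joule. The constants are CODATA 2018 values entered by hand, and the helper names are hypothetical; nothing here calls into hqs-nmr-parameters.

```python
import math

HBAR_J_S = 1.054571817e-34  # reduced Planck constant in J s (CODATA 2018)
GAMMA_1H = 2.6752218744e8  # proton gyromagnetic ratio in rad s^-1 T^-1 (CODATA 2018)


def rad_per_s_to_hz(omega: float) -> float:
    """Convert an angular frequency in rad/s to a frequency in Hz."""
    return omega / (2.0 * math.pi)


def rad_per_s_to_joule(omega: float) -> float:
    """Convert an angular frequency in rad/s to an energy in Joule."""
    return HBAR_J_S * omega


field_T = 11.7433
omega = GAMMA_1H * field_T  # magnitude of the Larmor frequency in rad/s

frequency_mhz = rad_per_s_to_hz(omega) / 1e6
energy_joule = rad_per_s_to_joule(omega)

print(f"{frequency_mhz:.3f} MHz")
print(f"{energy_joule:.4e} J")
```

The frequency comes out at approximately 500 MHz, which is why a field of 11.7433 T is commonly referred to as a 500 MHz spectrometer.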

The parameters argument in nmr_hamiltonian can accept either a MolecularData or an NMRParameters object. For example, an NMR Spin Hamiltonian with the example parameters for nitrobenzene at a field of 2.5 T can be set up as shown below:

from hqs_nmr_parameters import examples, nmr_hamiltonian

parameters = examples.molecules["C6H5NO2"]
hamiltonian = nmr_hamiltonian(parameters=parameters, field=2.5)

Furthermore, the following optional arguments can be set:

reference_isotope: Used to set up a relative Spin Hamiltonian for use within the rotating frame approximation. The Larmor frequency of the reference isotope is subtracted from the Zeeman terms of all nuclei. The default value is None, i.e., the full Hamiltonian will be set up instead.
reference_shift: Shift in ppm for the reference frequency in the rotating frame approximation. It can help to evaluate the Hamiltonian in the correct range of frequencies. Default: 0.0.
gyromagnetic_ratios: Gyromagnetic ratios in rad s⁻¹ T⁻¹ of relevant isotopes. Default: pre-saved values.

As mentioned above, the reference_isotope option should be set to construct the Spin Hamiltonian in the rotating frame approximation. For example, using the Larmor frequency of ¹H with a shift of 0 ppm as the reference frequency (default), a Spin Hamiltonian is obtained via:

from hqs_nmr_parameters import examples, nmr_hamiltonian

parameters = examples.molecules["C6H5NO2"]
hamiltonian_rf = nmr_hamiltonian(
    parameters=parameters,
    field=2.5,
    reference_isotope=(1, "H"),
)
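
The effect of subtracting a reference frequency can be illustrated with a few lines of plain Python. The sketch below is not the implementation of nmr_hamiltonian: it assumes that the frequency of a nucleus is γB(1 + δ·10⁻⁶)/2π, ignores the overall sign convention of the Zeeman term, and uses the acrylonitrile ¹H shifts from the examples data set. All function names are hypothetical.

```python
import math

GAMMA_1H = 2.6752218744e8  # proton gyromagnetic ratio in rad s^-1 T^-1 (CODATA 2018)


def larmor_frequency_hz(gamma, field_T, shift_ppm=0.0):
    """Magnitude of the Larmor frequency in Hz, including the chemical shift."""
    return gamma * field_T * (1.0 + shift_ppm * 1e-6) / (2.0 * math.pi)


def rotating_frame_offsets_hz(gammas, shifts_ppm, field_T, reference_gamma, reference_shift_ppm=0.0):
    """Frequencies relative to the reference frequency (the rotating frame)."""
    reference = larmor_frequency_hz(reference_gamma, field_T, reference_shift_ppm)
    return [
        larmor_frequency_hz(gamma, field_T, shift) - reference
        for gamma, shift in zip(gammas, shifts_ppm)
    ]


shifts = [5.79, 5.97, 5.48]  # acrylonitrile 1H shifts in ppm
full = [larmor_frequency_hz(GAMMA_1H, 2.5, shift) for shift in shifts]
offsets = rotating_frame_offsets_hz([GAMMA_1H] * 3, shifts, 2.5, GAMMA_1H)

print([f"{value:.1f}" for value in full])
print([f"{value:.1f}" for value in offsets])
```

In the laboratory frame all three frequencies are around 106 MHz, while in the rotating frame they are only a few hundred Hz. The relative Hamiltonian therefore lives in a much narrower frequency range.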

For a more detailed introduction to setting up and working with a Spin Hamiltonian, please have a look at the example notebooks, especially notebook number 6.

Isotopes/atoms selection

MolecularData objects can contain data for several isotopes. However, we are often interested in creating a spin system for a given nucleus only. For example, if we want to simulate a ¹H-NMR spectrum but we also have ¹³C-NMR parameters in our data, it is possible either to select only ¹H using keep_nuclei(isotopes=["1H"]) or to drop ¹³C using drop_nuclei(isotopes=["13C"]). Let us take a look at this in more detail:

from pprint import pprint
from hqs_nmr_parameters.examples import molecules

parameters = molecules["CH3Cl_13C"]
pprint(parameters.isotopes)

drop_C = parameters.drop_nuclei(isotopes=["13C"])
keep_H = parameters.keep_nuclei(isotopes=["1H"])
print(drop_C == keep_H)

nmr_parameters_1H = drop_C.spin_system()
pprint(nmr_parameters_1H.model_dump())

[(0, Isotope(mass_number=13, symbol='C')),
 (2, Isotope(mass_number=1, symbol='H')),
 (3, Isotope(mass_number=1, symbol='H')),
 (4, Isotope(mass_number=1, symbol='H'))]
True
{'isotopes': [(1, 'H'), (1, 'H'), (1, 'H')],
 'j_couplings': [((0, 1), -10.8), ((0, 2), -10.8), ((1, 2), -10.8)],
 'shifts': [3.05, 3.05, 3.05]}

We should take into account that the dropping/keeping of isotopes needs to be done carefully. The ¹³C nucleus has a nuclear spin of 1/2 (like ¹H) and is NMR-active, but it has a very low natural abundance, so the ¹³C-¹H coupling pattern is only barely visible in ¹H-NMR spectra. Therefore, it is common to simulate only ¹³C-decoupled ¹H-NMR spectra. However, the coupling of ¹H with other isotopes such as ¹⁹F or ³¹P should not be ignored in order to get the correct peak pattern.

keep_nuclei can also take an atoms argument, which accepts a list of atom indices for which the NMR parameters are to be kept. In the case of drop_nuclei, specified nuclei can be dropped via atoms, but a keep_atoms argument prevents certain atoms from being dropped (useful in combination with isotopes).
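
The equivalence of keeping ¹H and dropping ¹³C in the example above can be mimicked with a plain list of (atom index, (mass number, symbol)) pairs. This is only a sketch of the selection logic under the assumption that isotopes are matched by mass number and element symbol. The helpers parse_isotope, keep_by_isotope and drop_by_isotope are hypothetical, and the actual methods additionally have to filter the shifts and J-couplings.

```python
def parse_isotope(label):
    """Split a label such as "13C" into (13, "C")."""
    digits = "".join(character for character in label if character.isdigit())
    return int(digits), label[len(digits):]


def keep_by_isotope(isotopes, labels):
    """Keep only the atoms whose isotope is listed in labels."""
    wanted = {parse_isotope(label) for label in labels}
    return [(atom, isotope) for atom, isotope in isotopes if isotope in wanted]


def drop_by_isotope(isotopes, labels):
    """Remove all atoms whose isotope is listed in labels."""
    unwanted = {parse_isotope(label) for label in labels}
    return [(atom, isotope) for atom, isotope in isotopes if isotope not in unwanted]


# Isotopes of the CH3Cl_13C example: atom index and (mass number, symbol).
isotopes = [(0, (13, "C")), (2, (1, "H")), (3, (1, "H")), (4, (1, "H"))]

kept = keep_by_isotope(isotopes, ["1H"])
dropped = drop_by_isotope(isotopes, ["13C"])
print(kept == dropped)
print(kept)
```

Both selections agree here only because the molecule contains no parameters for a third isotope. With additional ¹⁹F or ³¹P data, dropping ¹³C would keep those nuclei, whereas keeping only ¹H would remove them.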

Structure representations

As shown above a MolecularData object has the attribute structures, which is a dictionary that can accept three entries (keys): "SMILES", "Molfile" and "XYZ". The value of each of these entries is a ChemicalStructure object that stores chemical structure representations of the type defined in the key:

SMILES (Simplified Molecular-Input Line-Entry System): Strings representing the connectivity of all non-hydrogen atoms in a molecule. They become laborious to understand for complex molecules.
Molfiles: Text files with a two-dimensional (2D) structure of a molecule (skeletal representation). It may omit hydrogen atoms. The number of hydrogen atoms is inferred from the atomic valencies of the heavy atoms.
XYZ files: Text files containing an integer with the total number of atoms in the first line, followed by a comment line, and then the chemical element symbols and three-dimensional atomic positions for all the atoms explicitly.

Connectivity and charge information can be extracted from SMILES strings or Molfiles, but not from XYZ files. More information about Molfiles and SMILES can be found here.
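
The layout of an XYZ file described above can be read with a few lines of Python. The following sketch is a minimal reader written for illustration; parse_xyz is a hypothetical helper and not a function of hqs-nmr-parameters. The geometry is the ethanol XYZ content used in the structures example of this documentation.

```python
def parse_xyz(content):
    """Return element symbols and positions from the text of an XYZ file."""
    lines = content.splitlines()
    n_atoms = int(lines[0])  # first line: total number of atoms
    atom_lines = lines[2 : 2 + n_atoms]  # second line is a comment and is skipped
    if len(atom_lines) != n_atoms:
        raise ValueError("XYZ content has fewer atom lines than announced")
    symbols, positions = [], []
    for line in atom_lines:
        symbol, x, y, z = line.split()
        symbols.append(symbol)
        positions.append((float(x), float(y), float(z)))
    return symbols, positions


ETHANOL_XYZ = (
    "9\n"
    "\n"
    "C       -0.88708900     0.17506400    -0.01253500\n"
    "C        0.46048900    -0.51551600    -0.04653500\n"
    "O        1.44296500     0.30726700     0.56557200\n"
    "H       -0.84747800     1.12776800    -0.55081700\n"
    "H       -1.65878200    -0.45332700    -0.46584200\n"
    "H       -1.17694400     0.40367600     1.01830600\n"
    "H        0.76871200    -0.72432800    -1.07546000\n"
    "H        0.41948600    -1.46207300     0.50017700\n"
    "H        1.47864000     1.14146800     0.06713500\n"
)

symbols, positions = parse_xyz(ETHANOL_XYZ)
print(symbols)
print(positions[2])
```

The reader returns only element symbols and coordinates, which reflects the statement above that an XYZ file carries no connectivity or charge information.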

The attributes of the ChemicalStructure class are:

representation: Type of chemical structure representation present in the content attribute. It can be "SMILES", "Molfile" or "XYZ".
content: The raw structure representation (as a string), i.e., a SMILES string or the content of an XYZ file or a Molfile.
charge: Total net electric charge of the molecule.
symbols: List of chemical element symbols for the full set of atoms.
atom_map: As we have seen, 2D representations can omit hydrogen atoms. This attribute contains information about how the full representation maps onto this reduced representation. It is a list with one entry per atom of the molecule (full set of atoms), where the atom counting starts from zero and the following rules apply:
- Each non-hydrogen atom maps onto the respective index in the reduced representation.
- Each explicit hydrogen maps onto the index of its explicitly represented counterpart.
- Each implicit hydrogen maps onto the reduced index of the respective backbone atom.

In the case of ethanol, a full structures representation may look like the following. It might look a bit crowded as the content attribute contains the full content of the corresponding files (or SMILES string):

from hqs_nmr_parameters import ChemicalStructure

ethanol_representation = {
"XYZ": ChemicalStructure(representation="XYZ", content="9\n\nC       -0.88708900     0.17506400    -0.01253500\nC        0.46048900    -0.51551600    -0.04653500\nO        1.44296500     0.30726700     0.56557200\nH       -0.84747800     1.12776800    -0.55081700\nH       -1.65878200    -0.45332700    -0.46584200\nH       -1.17694400     0.40367600     1.01830600\nH        0.76871200    -0.72432800    -1.07546000\nH        0.41948600    -1.46207300     0.50017700\nH        1.47864000     1.14146800     0.06713500\n", charge=0, symbols=["C", "C", "O", "H", "H", "H", "H", "H", "H"], atom_map=[0, 1, 2, 3, 4, 5, 6, 7, 8]),
"Molfile": ChemicalStructure(representation="Molfile", content="\n     RDKit          2D\n\n  0  0  0  0  0  0  0  0  0  0999 V3000\nM  V30 BEGIN CTAB\nM  V30 COUNTS 3 2 0 0 0\nM  V30 BEGIN ATOM\nM  V30 1 C -1.299038 -0.250000 0.000000 0\nM  V30 2 C 0.000000 0.500000 0.000000 0\nM  V30 3 O 1.299038 -0.250000 0.000000 0\nM  V30 END ATOM\nM  V30 BEGIN BOND\nM  V30 1 1 1 2 CFG=3\nM  V30 2 1 2 3 CFG=3\nM  V30 END BOND\nM  V30 END CTAB\nM  END\n", charge=0, symbols=["C", "C", "O", "H", "H", "H", "H", "H", "H"], atom_map=[0, 1, 2, 0, 0, 0, 1, 1, 2]),
"SMILES": ChemicalStructure(representation="SMILES", content="CCO", charge=0, symbols=["C", "C", "O", "H", "H", "H", "H", "H", "H"], atom_map=[0, 1, 2, 0, 0, 0, 1, 1, 2])
}

Each of the entries corresponds to a ChemicalStructure object:

from pprint import pprint

smiles_representation = ethanol_representation["SMILES"]
print(type(smiles_representation))
pprint(smiles_representation.model_dump())

<class 'hqs_nmr_parameters.code.data_classes.ChemicalStructure'>
{'atom_map': [0, 1, 2, 0, 0, 0, 1, 1, 2],
 'charge': 0,
 'content': 'CCO',
 'representation': 'SMILES',
 'symbols': ['C', 'C', 'O', 'H', 'H', 'H', 'H', 'H', 'H']}

ethanol_representation["Molfile"].atom_map and ethanol_representation["SMILES"].atom_map will return:

[0, 1, 2, 0, 0, 0, 1, 1, 2]

In both cases, only the carbon and the oxygen atoms have been indicated explicitly. Therefore, the first three atoms are listed explicitly (carbon atoms 0 and 1, and oxygen atom 2, i.e., the atom counting starts from zero) and the six following hydrogens are assigned to one of those backbone atoms. The full atomic indices associated with each reduced index can be obtained using the inverted_map property:

from pprint import pprint

smiles_representation = ethanol_representation["SMILES"]
print(smiles_representation.atom_map)
print(smiles_representation.inverted_map)

[0, 1, 2, 0, 0, 0, 1, 1, 2]
[{0, 3, 4, 5}, {1, 6, 7}, {8, 2}]

Hence, C (index = 0) is connected to H atoms 3, 4, and 5, C (index = 1) is connected to H atoms 6 and 7, and O (index = 2) is connected to H atom 8.

However, ethanol_representation["XYZ"].atom_map will return:

[0, 1, 2, 3, 4, 5, 6, 7, 8]

This is because all atoms are provided explicitly in the XYZ format.
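
The relation between atom_map and the inverted map can be reproduced in a few lines of plain Python. The function invert_atom_map below is a hypothetical re-implementation for illustration and does not show how ChemicalStructure computes its result internally.

```python
def invert_atom_map(atom_map):
    """For each reduced index, collect the full atom indices that map onto it."""
    n_reduced = max(atom_map) + 1
    inverted = [set() for _ in range(n_reduced)]
    for full_index, reduced_index in enumerate(atom_map):
        inverted[reduced_index].add(full_index)
    return inverted


# SMILES or Molfile of ethanol: hydrogens map onto their backbone atoms.
reduced = invert_atom_map([0, 1, 2, 0, 0, 0, 1, 1, 2])
# XYZ file of ethanol: every atom is explicit and maps onto itself.
explicit = invert_atom_map([0, 1, 2, 3, 4, 5, 6, 7, 8])

print(reduced)
print(explicit)
```

For the reduced representations each set contains a backbone atom together with its hydrogens, whereas for the XYZ representation every set holds exactly one atom.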

The ChemicalStructure class can also yield the chemical formula in Hill notation via the formula property. smiles_representation.formula will return:

'C2H6O'
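
In Hill notation, carbon is listed first and hydrogen second, followed by all other elements in alphabetical order; without carbon, all elements are sorted alphabetically. The following sketch derives such a formula from a list of element symbols. The function hill_formula is a hypothetical helper that ignores charges and is not the implementation used by ChemicalStructure.

```python
from collections import Counter


def hill_formula(symbols):
    """Return the molecular formula in Hill notation for a list of element symbols."""
    counts = Counter(symbols)
    if "C" in counts:
        # Carbon first, hydrogen second, all other elements alphabetically.
        order = ["C"]
        if "H" in counts:
            order.append("H")
        order += sorted(symbol for symbol in counts if symbol not in ("C", "H"))
    else:
        # Without carbon, every element is sorted alphabetically.
        order = sorted(counts)
    return "".join(
        f"{symbol}{counts[symbol] if counts[symbol] > 1 else ''}" for symbol in order
    )


ethanol = hill_formula(["C", "C", "O", "H", "H", "H", "H", "H", "H"])
acrylonitrile = hill_formula(["C", "C", "C", "H", "H", "H", "N"])
print(ethanol, acrylonitrile)
```

The two results agree with the formula values stored for ethanol and acrylonitrile in the examples data set.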

Serialization (saving in a JSON file) of a MolecularData object and deserialization

We can save a MolecularData instance in a JSON file for future uses. To do so, employ the write_file method of the MolecularData class:

from hqs_nmr_parameters.examples import molecules

parameters = molecules["C2H5OH"]
parameters.write_file("etoh_molecular_data.json")

This JSON file can easily be loaded and validated against the MolecularData model using the read_file method:

from hqs_nmr_parameters import MolecularData

loaded_parameters = MolecularData.read_file("etoh_molecular_data.json")
print(type(loaded_parameters))

<class 'hqs_nmr_parameters.code.data_classes.MolecularData'>

Data sets

Molecular data for a group of molecules can be collected using the MolecularDataSet Pydantic class, which contains instances of type MolecularData. It is composed of two attributes:

description: Summary of the content of the data set.
dataset: Dictionary where the keys are identifiers for the molecules (e.g., molecule names) and the values correspond to a MolecularData object per molecule.

A list with the keys of the dataset can be obtained directly from the MolecularDataSet.keys property. This allows us to conveniently access the molecular data of each molecule using its key as string.

In order to get a brief summary of the molecules belonging to a data set, we can use the get_names method. It returns a dictionary where the keys correspond to the keys of the dataset and the values are the chemical names. An equivalent dictionary providing the chemical formulas can be accessed via the get_formulas method.

Similar to keep_nuclei/drop_nuclei in MolecularData, the keep_isotopes/drop_isotopes methods keep/drop selected isotopes for all molecules in a data set. An extra description can be added with updated information about the set (whitespace for separation from the original description content must be included).

As in MolecularData, it is possible to save or load data sets thanks to the read_file and write_file methods that (de)serialize JSON files.

The data sets implemented in the hqs-nmr-parameters package are explained in the following.

General remarks

The hqs-nmr-parameters package contains several data sets divided into modules with data from different origins and for different purposes. In general, the MolecularDataSet object of a specific set can be imported with:

from hqs_nmr_parameters.<dataset> import <variant>

With the exception of assignments, the data sets have a variant called molecules which contains the recommended data to be used.

To import the different modules in a single MolecularDataSet object, one can do:

from hqs_nmr_parameters import molecules

which is equivalent to:

from hqs_nmr_parameters.merged import molecules

Including the one imported above, there are three variants that integrate the data available in hqs-nmr-parameters in some way:

molecules: Includes all available molecules with any data, using experimental shifts if possible.
calculated: Includes all available molecules that contain purely calculated parameters (shifts and J-couplings).
combined: Includes all available molecules for which a combination of experimental shifts and calculated J-couplings is available.

The data sets used here are described in more detail below along with the most important methods to access their contents. These methods (and properties) are accessible for each data set, including the merged ones presented here.

Examples module

The first data set worth mentioning is the examples module, which has a set of molecule definitions encapsulated in the MolecularDataSet object molecules. This set can be accessed via:

from pprint import pprint
from hqs_nmr_parameters.examples import molecules

print(type(molecules)) # MolecularDataSet
# Keys of the data set:
print(molecules.keys)

<class 'hqs_nmr_parameters.code.data_classes.MolecularDataSet'>
['CH3Cl', 'limonene_DFT', '1,2,4-trichlorobenzene', 'Anethole', 'Artemisinin_exp', 'endo-dicyclopentadiene_DFT', 'CH3Cl_13C', 'C2H3CN', 'Artemisinin', 'camphor_DFT', 'C6H6', 'C10H8', 'Triphenylphosphine_oxide', 'H2CCF2', 'C2H5Cl', 'Androstenedione', 'Cinnamaldehyde', 'CHCl3_13C', 'C6H5NO2', 'C2H5OH', 'C2H6', 'C10H7Br', 'CHCl3', 'camphor_exp', 'exo-dicyclopentadiene_DFT', '1,2-di-tert-butyl-diphosphane', 'C2H3NC', 'cyclopentadiene_DFT', 'cis-3-chloroacrylic_acid_exp', 'C3H8']

Note that the content of this list of keys is just an example and might appear in a different order or with different entries depending on the installed version of hqs-nmr-parameters. The same holds for the dictionary of molecule names which can be obtained as follows:

pprint(molecules.get_names())

{'1,2,4-trichlorobenzene': '1,2,4-Trichlorobenzene',
 '1,2-di-tert-butyl-diphosphane': 'tert-Butyl(tert-butylphosphanyl)phosphane',
 'Androstenedione': 'Androstenedione',
 'Anethole': 'Anethole',
 'Artemisinin': 'Artemisinin',
 'Artemisinin_exp': 'Artemisinin',
 'C10H7Br': '2-Bromonaphthalene',
 'C10H8': 'Naphthalene',
 'C2H3CN': 'Acrylonitrile',
 'C2H3NC': 'Vinyl isocyanide',
 'C2H5Cl': 'Chloroethane',
 'C2H5OH': 'Ethanol',
 'C2H6': 'Ethane',
 'C3H8': 'Propane',
 'C6H5NO2': 'Nitrobenzene',
 'C6H6': 'Benzene',
 'CH3Cl': 'Chloromethane',
 'CH3Cl_13C': 'Chloromethane',
 'CHCl3': 'Chloroform',
 'CHCl3_13C': 'Chloroform',
 'Cinnamaldehyde': 'Cinnamaldehyde',
 'H2CCF2': '1,1-Difluoroethene',
 'Triphenylphosphine_oxide': 'Triphenylphosphine oxide',
 'camphor_DFT': 'Camphor',
 'camphor_exp': 'Camphor',
 'cis-3-chloroacrylic_acid_exp': 'cis-3-Chloroacrylic acid',
 'cyclopentadiene_DFT': 'Cyclopentadiene',
 'endo-dicyclopentadiene_DFT': 'endo-Dicyclopentadiene',
 'exo-dicyclopentadiene_DFT': 'exo-Dicyclopentadiene',
 'limonene_DFT': 'Limonene'}

In addition, if we want to get a feeling for the size of the molecules in the set, we can print their formulas using the get_formulas method.

The full molecular definition for a given molecule can be loaded using its string key. Each entry of this data set includes a 2D representation (Molfile or SMILES string) of the molecule. Let us consider an example:

from pprint import pprint
from hqs_nmr_parameters.examples import molecules

# Obtain the MolecularData object for acrylonitrile
parameters = molecules["C2H3CN"]
# Print parameters
pprint(parameters.model_dump())

{'description': '1H parameters for acrylonitrile.\n'
                "Values were obtained from Hans Reich's Collection, NMR "
                'Spectroscopy.\n'
                'https://organicchemistrydata.org\n',
 'formula': 'C3H3N',
 'isotopes': [(3, (1, 'H')), (4, (1, 'H')), (5, (1, 'H'))],
 'j_couplings': [((3, 4), 0.9), ((3, 5), 11.8), ((4, 5), 17.9)],
 'method_json': '',
 'name': 'Acrylonitrile',
 'shifts': [(3, 5.79), (4, 5.97), (5, 5.48)],
 'solvent': '',
 'structures': {'Molfile': {'atom_map': [0, 1, 2, 3, 4, 5, 6],
                            'charge': 0,
                            'content': '\n'
                                       'JME 2022-02-26 Wed Sep 07 15:54:28 '
                                       'GMT+200 2022\n'
                                       '\n'
                                       '  0  0  0  0  0  0  0  0  0  0999 '
                                       'V3000\n'
                                       'M  V30 BEGIN CTAB\n'
                                       'M  V30 COUNTS 7 6 0 0 0\n'
                                       'M  V30 BEGIN ATOM\n'
                                       'M  V30 1 C 2.4249 2.1000 0.0000 0\n'
                                       'M  V30 2 C 3.6373 1.4000 0.0000 0\n'
                                       'M  V30 3 C 1.2124 1.4000 0.0000 0\n'
                                       'M  V30 4 H 0.0000 2.1000 0.0000 0\n'
                                       'M  V30 5 H 1.2124 0.0000 0.0000 0\n'
                                       'M  V30 6 H 2.4249 3.5000 0.0000 0\n'
                                       'M  V30 7 N 4.8497 0.7000 0.0000 0\n'
                                       'M  V30 END ATOM\n'
                                       'M  V30 BEGIN BOND\n'
                                       'M  V30 1 1 1 2\n'
                                       'M  V30 2 2 1 3\n'
                                       'M  V30 3 1 3 4\n'
                                       'M  V30 4 1 3 5\n'
                                       'M  V30 5 1 1 6\n'
                                       'M  V30 6 3 2 7\n'
                                       'M  V30 END BOND\n'
                                       'M  V30 END CTAB\n'
                                       'M  END\n',
                            'representation': 'Molfile',
                            'symbols': ['C', 'C', 'C', 'H', 'H', 'H', 'N']}},
 'temperature': None}

As we can see, data for setting up a ¹H-NMR spectrum of the acrylonitrile molecule has been stored together with its Molfile.

To set up an NMR calculation we are only interested in some of the previous data. To retrieve it, use the spin_system method (for more information, see the section on the NMRParameters class):

nmr_parameters = parameters.spin_system()
pprint(nmr_parameters.model_dump())

{'isotopes': [(1, 'H'), (1, 'H'), (1, 'H')],
 'j_couplings': [((0, 1), 0.9), ((0, 2), 11.8), ((1, 2), 17.9)],
 'shifts': [5.79, 5.97, 5.48]}
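
Comparing the two printouts for acrylonitrile shows how atom indices (3, 4, 5) turn into consecutive spin indices (0, 1, 2). The following sketch reproduces this re-indexing in plain Python. It assumes that spins are ordered by increasing atom index, and the helper to_spin_system is hypothetical; the actual spin_system method may work differently.

```python
def to_spin_system(isotopes, shifts, j_couplings):
    """Re-index atom-based NMR parameters to consecutive spin indices starting at 0."""
    atom_indices = sorted(atom for atom, _ in shifts)
    spin_index = {atom: index for index, atom in enumerate(atom_indices)}
    isotope_by_atom = dict(isotopes)
    shift_by_atom = dict(shifts)
    return {
        "isotopes": [isotope_by_atom[atom] for atom in atom_indices],
        "shifts": [shift_by_atom[atom] for atom in atom_indices],
        "j_couplings": [
            ((spin_index[k], spin_index[l]), value) for (k, l), value in j_couplings
        ],
    }


# Atom-indexed parameters as printed for acrylonitrile above.
spin_system = to_spin_system(
    isotopes=[(3, (1, "H")), (4, (1, "H")), (5, (1, "H"))],
    shifts=[(3, 5.79), (4, 5.97), (5, 5.48)],
    j_couplings=[((3, 4), 0.9), ((3, 5), 11.8), ((4, 5), 17.9)],
)
print(spin_system)
```

For this molecule the result contains the same values as the NMRParameters printout above.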

CHESHIRE module

In the cheshire module, one can find molecular data for molecules belonging to the CHESHIRE database. Five data sets (MolecularDataSet objects) have been created from this database depending on the collected NMR data:

experimental_shifts_only: It includes the experimental shifts (for ¹³C and ¹H) of all 105 molecules, but no J-coupling values.
calculated_full: It has theoretical NMR data for a selection of rigid molecules (molecules with only one conformer) of the previous set. Details of the calculations can be found under the description attribute of each item (see below).
combined_full: It contains experimental shifts and theoretical J-couplings for the rigid molecules (in this case, a few more molecules were sorted out due to an incomplete number of shifts in experimental_shifts_only).
calculated: Reduced version of calculated_full that contains only the NMR data required for simulating ¹H-NMR spectra.
combined: Reduced version of combined_full that contains only the NMR data required for simulating ¹H-NMR spectra.

In addition, for non-expert users, we have included the alias molecules, which returns the combined set, i.e., ¹H-NMR data with experimental shifts and calculated J-couplings.

These data sets can be imported as follows (we will focus on the molecules set that will be imported as cheshire_molecules to avoid confusion with the examples module):

from hqs_nmr_parameters.cheshire import molecules as cheshire_molecules

As a brief illustration of the type of data included in the sets, we will retrieve some important information:

print(cheshire_molecules.description)

Experimental shifts and theoretical J-couplings for the rigid molecules of the Cheshire set, except for ['Cyclopropanone', 'Bicyclobutane', 'Cyclopentanone', 'Fluorobenzene', 'Indole'] due to incompatible data.
Shifts and couplings only for nuclei ['1H', '19F', '31P', '29Si'].

The keys of the molecules give access to the molecular data. For simplicity, they correspond to a string representation of integers that go from 1 to 105. Note that depending on the imported data set, some numbers might be missing. For instance, only rigid molecules are included in the molecules set.

print(cheshire_molecules.keys[:10])

['1', '2', '4', '5', '6', '9', '10', '11', '12', '13']

As before, the molecule names can be obtained using the get_names function. But here, we will focus on a single entry:

print(cheshire_molecules["1"].name)

Dichloromethane

In the description of each entry we find important information about how the NMR parameters were obtained.

print(cheshire_molecules["1"].description)

Geometries in chloroform at B97-3c.
Experimental shifts from CHESHIRE: http://cheshirenmr.info/.
J-couplings (gas-phase) at PBE/pcJ-3.
Parameters averaged over rotamers using permutations.

Each entry in the data set includes both a 2D (Molfile) and a 3D (XYZ) representation of the molecule.

print(cheshire_molecules["1"].structures.keys())

dict_keys(['XYZ', 'Molfile'])

To access the NMR data, use the spin_system method:

pprint(cheshire_molecules["1"].spin_system().model_dump())

{'isotopes': [(1, 'H'), (1, 'H')],
 'j_couplings': [((0, 1), -5.171)],
 'shifts': [5.28, 5.28]}

Phytolab module

The phytolab module contains some selected molecules from a catalogue by Phytolab. It includes three variants of data:

calculated_full: All NMR parameters in this set are computed. Details can be obtained from the description of each item in the set. Where ¹³C data has also been calculated, shifts and couplings are included for each nucleus.
calculated: This set of computed parameters is intended for the calculation of one-dimensional ¹H-NMR spectra, as the parameters for ¹³C are omitted.
combined: Where possible, the chemical shifts have been adjusted manually to achieve a better match with experimental ¹H-NMR spectra. Therefore, this set contains a combination of adjusted or computed shifts, and computed J-couplings. This set is recommended to simulate ¹H-NMR spectra to obtain the closest agreement with experiment.

In addition, the module includes the set molecules, which is an alias for combined as described above.

To access these data sets, import them analogously to the other modules:

from hqs_nmr_parameters.phytolab import molecules as phytolab_molecules

print("Dataset content:")
print(phytolab_molecules.description + "\n")
print(f"Entries of the set: {phytolab_molecules.keys}\n")
print("Details on the NMR parameters for Psoralen:")
print(phytolab_molecules["psoralen"].description)

Dataset content:
NMR parameters for selected natural products from a catalogue by Phytolab.
Where possible, chemical shifts have been adjusted to match experimental spectra. The remaining parameters are computed.
For further details, please refer to descriptions of the individual items in the set.
Shifts and couplings only for nuclei ['1H'].

Entries of the set: ['angelicin', 'psoralen', 'friedelin']

Details on the NMR parameters for Psoralen:
Geometry in chloroform at B97-3c.
Shifts manually adjusted to match 1H-NMR spectrum at 80 MHz in CDCl3 provided by Phytolab.
J-couplings (gas-phase) at PBE/pcJ-3.

As for the examples module, the content of the sets will depend on the installed version of hqs-nmr-parameters.

Assignments module

The assignments module contains example data of other complex molecules.

Patchoulol

The patchoulol data set contains two molecules, the originally proposed structure for patchouli alcohol and the correct structure. We can access it as:

from hqs_nmr_parameters.assignments import patchoulol

To get an overview of the set, we can access a brief summary with print(patchoulol.description).

Only the two molecules mentioned above are present in the set; we can access them via their keys:

for key in patchoulol.keys:
    print(f"{key}: {patchoulol[key].name}")

correct: Patchouli alcohol
erroneous: 4,10,11,11-Tetramethyltricyclo[5.3.1.01,5]undecan-10-ol

With this data, we can now use the HQS NMR Tool to simulate both spectra and see the differences between these two similar molecules as well as compare with the experimental spectrum.

Menthol isomers

The menthol_isomers data set is a collection of the four possible diastereomers of menthol (5-methyl-2-(propan-2-yl)cyclohexan-1-ol). With three chiral centers at positions 1, 2, and 5 (in IUPAC convention), there are eight possible stereoisomers, which form the following four pairs of enantiomers:

Menthol:
- (+)-enantiomer, with stereocenters 1S, 2R, 5S.
- (−)-enantiomer, with stereocenters 1R, 2S, 5R.
Neomenthol:
- (+)-enantiomer, with stereocenters 1S, 2S, 5R.
- (−)-enantiomer, with stereocenters 1R, 2R, 5S.
Isomenthol:
- (+)-enantiomer, with stereocenters 1S, 2R, 5R.
- (−)-enantiomer, with stereocenters 1R, 2S, 5S.
Neoisomenthol:
- (+)-enantiomer, with stereocenters 1R, 2R, 5R.
- (−)-enantiomer, with stereocenters 1S, 2S, 5S.

Since enantiomers are not distinguishable by conventional NMR spectroscopy, there are four different possible NMR spectra. The given data set contains NMR parameters calculated with density functional theory (DFT) for one enantiomer of each pair and can be imported from the assignments module as menthol_isomers_full for ¹H- and ¹³C-NMR parameters or as menthol_isomers for only ¹H-NMR data.
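The count of eight stereoisomers and four distinct spectra follows from the three stereocenters. The following minimal sketch uses only the Python standard library and does not depend on the hqs-nmr-parameters package. It enumerates all R/S assignments for positions 1, 2 and 5 and pairs each one with its mirror image, in which every descriptor is inverted. The names `mirror` and `dataset_keys` are chosen for this illustration only; the four keys are the ones printed for the menthol_isomers set.

```python
from itertools import product


def mirror(label: str) -> str:
    """Return the descriptor string of the enantiomer (every R/S inverted)."""
    return label.translate(str.maketrans("RS", "SR"))


# All assignments for the three stereocenters: 2**3 = 8 stereoisomers.
stereoisomers = ["".join(p) for p in product("RS", repeat=3)]

# Each stereoisomer and its mirror image form one enantiomer pair.
enantiomer_pairs = {frozenset((label, mirror(label))) for label in stereoisomers}

# Keys of the menthol_isomers data set as listed in this documentation.
dataset_keys = ["SSR", "RSR", "SRR", "SSS"]
covered = {frozenset((key, mirror(key))) for key in dataset_keys}

print(len(stereoisomers))           # 8
print(len(enantiomer_pairs))        # 4
print(covered == enantiomer_pairs)  # True
```

Each key of the data set therefore represents exactly one of the four enantiomer pairs, and no pair is represented twice.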

For an overview of the data set, just print its description:

from hqs_nmr_parameters.assignments import menthol_isomers

print(menthol_isomers.description)

The molecular keys and names of the structures in the data set can be listed as:

for key in menthol_isomers.keys:
    print(f"{key}: {menthol_isomers[key].name}")

SSR: (+)-Neomenthol (SSR)
RSR: (-)-Menthol (RSR)
SRR: (+)-Isomenthol (SRR)
SSS: (-)-Neoisomenthol (SSS)

For more information on the applied computational level of theory, please inspect the individual descriptions with the description attribute.

This data can be used to simulate the NMR spectra of all diastereomers as explained earlier and compare them to experimental ones, e.g., to that of neomenthol available here. Due to the limited accuracy of DFT calculations, it is not always straightforward to identify the correct isomer if the exact structure of the experimental measurement is unknown, but the comparison with all four possibilities will provide valuable insights for structure elucidation. Furthermore, the postprocessing module of the HQS NMR Tool allows the user to modify the simulated spectrum to better match an experimental reference which will help to reduce the number of reasonable candidate structures.

Input of molecular NMR parameters via a YAML file

NMR parameters for molecules can be provided in a YAML file. A brief summary of relevant YAML features is provided before proceeding to more detailed explanations and examples.

Dictionaries in YAML are defined as key: value pairs. Most commonly, a dictionary contains one key/value pair per line:
```
key 1: value 1
key 2: value 2
```
value 1 is interpreted as a string, 1 is interpreted as an integer and 1.0 is interpreted as a floating-point number. To avoid problems with special characters (e.g., square brackets), strings may be enclosed in single or double quotes (they have different meanings, and single quotes should be preferred for a literal interpretation of the string).
Lists can be defined over multiple lines as:
```
- item 1
- item 2
```
Lists can also be enclosed in square brackets: [item 1, item 2, item 3]. The nested list [[1, 2, 3], [4, 5, 6]] is equivalent to:
```
- [1, 2, 3]
- [4, 5, 6]
```
Indentation is part of the syntax: key/value pairs or list entries over multiple lines need to have the same number of leading spaces (no tabs).
Comments start with a hash, #.

Definition of the molecular structure

Definition using SMILES

A molecular structure needs to be provided along with its NMR parameters in order to get a complete molecular data input. The YAML input accepts one 2D structural representation, i.e., a SMILES string or a Molfile.

The simplest way to define a structure in the input file is through its SMILES representation. This is done using the key smiles, followed by a representation of the molecule. For acetic acid:

# Acetic acid defined using SMILES.
smiles: CC(=O)O

SMILES strings often contain square brackets [...]. In such cases, the string should be enclosed within quotes, '...', to avoid problems with the YAML parser.

Definition using a Molfile

Manual definition of increasingly large molecules using SMILES can be cumbersome. For example, the string representation of penicillin V would be:

smiles: 'CC1(C)S[C@@H]2[C@H](NC(=O)COc3ccccc3)C(=O)N2[C@H]1C(=O)O'

Instead, it is easier to draw a graphical representation such as the one below using one of many available proprietary or open-source packages.

Such structural 2D representations are commonly stored in Molfiles. A molecule can be read from a Molfile by specifying the file name after the key molfile:

molfile: penicillin_v.mol

Note that the YAML input needs to contain either a Molfile or SMILES, but it is not possible to specify both at the same time. Both the V2000 and V3000 variants of the Molfile specification are supported in the input.

Hydrogens in the molecular structure

2D representations of molecular structures, whether as skeletal formulas or as SMILES, tend to omit hydrogens. Instead, the number of hydrogen atoms is inferred from the atomic valencies, especially those of the carbons. Any of the following three structures can be provided as a Molfile for the acrylamide molecule:

Where hydrogens are suppressed (not drawn out as separate atoms with a bond), their NMR parameters are specified through assignment to the respective skeletal atom. In the leftmost of the three structures shown above, it would not be possible to assign different parameters to the two protons in the CH₂ group. Instead, one of the two other structures shown above could be used to specify different parameters for those protons.

The only restriction with regard to hydrogens is that any skeletal atom can be connected either to suppressed or to stand-alone hydrogens, but not to a mixture of both. Thus, the following two structures would be rejected during input parsing:

The structure to the left mixes a non-suppressed hydrogen with a suppressed "implicit" hydrogen (CH) on the carbon atom; the structure to the right mixes a non-suppressed hydrogen with a suppressed "explicit" hydrogen (NH) on the nitrogen atom.

Numbering of atoms

Atoms in the structural representation of the molecule are labelled with integers for the assignment of parameters. Indices can be counted starting from zero or from one. To avoid errors or misunderstandings, it is mandatory to specify a count from key in the input file, followed by either 0 or 1. The choice between those two options is entirely arbitrary.

An example for acetamide with atom counting starting from zero:

# Atom indices are 0, 1, 2, 3 in their order of appearance in the SMILES string.
smiles: CC(=O)N
count from: 0

An example for atom counting starting from one:

# Atom indices are 1, 2, 3, 4 in their order of appearance in the SMILES string.
smiles: CC(=O)N
count from: 1

The atoms in a molecule are indexed by their order of appearance in the Molfile. Acetamide may be represented by a Molfile with the following content (note that the file header must contain three lines):


     RDKit          2D

  4  3  0  0  0  0  0  0  0  0999 V2000
    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    1.2990    0.7500    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    1.2990    2.2500    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
    2.5981   -0.0000    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0
  1  2  1  0
  2  3  2  0
  2  4  1  0
M  END

Counting the atoms starting from 1 (count from: 1), the appropriate indices are represented in the image below.
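To make the indexing rule concrete, the following sketch reads the atom and bond blocks of the acetamide Molfile shown above and labels the atoms according to a chosen starting index. It is a simplified illustration and not the parser used by the package: the helper `read_atoms_and_bonds` is hypothetical, and it splits lines on whitespace, which is only safe for small V2000 files (the format formally uses fixed-width columns).

```python
MOLFILE = "\n".join([
    "",                        # header line 1: title (empty here)
    "     RDKit          2D",  # header line 2: program and dimensionality
    "",                        # header line 3: comment (empty here)
    "  4  3  0  0  0  0  0  0  0  0999 V2000",
    "    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0",
    "    1.2990    0.7500    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0",
    "    1.2990    2.2500    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0",
    "    2.5981   -0.0000    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0",
    "  1  2  1  0",
    "  2  3  2  0",
    "  2  4  1  0",
    "M  END",
])


def read_atoms_and_bonds(molfile_text: str, count_from: int):
    """Return ({index: element}, [(index1, index2, bond order)]) for a small V2000 Molfile."""
    lines = molfile_text.splitlines()
    counts = lines[3].split()  # the counts line follows the three header lines
    n_atoms, n_bonds = int(counts[0]), int(counts[1])
    atom_lines = lines[4:4 + n_atoms]
    bond_lines = lines[4 + n_atoms:4 + n_atoms + n_bonds]
    # Atoms are indexed by their order of appearance in the atom block.
    atoms = {i + count_from: line.split()[3] for i, line in enumerate(atom_lines)}
    # The bond block itself always counts atoms from one.
    offset = count_from - 1
    bonds = []
    for line in bond_lines:
        first, second, order = line.split()[:3]
        bonds.append((int(first) + offset, int(second) + offset, int(order)))
    return atoms, bonds


atoms, bonds = read_atoms_and_bonds(MOLFILE, count_from=1)
print(atoms)  # {1: 'C', 2: 'C', 3: 'O', 4: 'N'}
print(bonds)  # [(1, 2, 1), (2, 3, 2), (2, 4, 1)]
```

With `count_from=0`, the same atoms would carry the indices 0 to 3 instead.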

Chemical shifts

Providing chemical shifts will be illustrated using the nitrobenzene molecule as an example. Its molecular structure is contained in the file PhNO2.mol and shown in the picture below, including a numbering of its atoms.

All chemical shifts are specified under the key shifts. Additionally, the values need to be nested under keys representing isotopes, which contain the atomic mass number and the element symbol (e.g., 1H or 13C). For each isotope, the chemical shifts are provided in pairs of an atomic index and the associated value in ppm (parts per million):

# Structure with suppressed protons.
molfile: PhNO2.mol
count from: 0
shifts:
  # chemical shifts for protons in ppm
  1H:
    1: 8.23
    2: 7.56
    3: 7.71
    4: 7.56
    5: 8.23
  # chemical shifts for carbon-13 nuclei in ppm
  13C:
    0: 148.5
    1: 123.7
    2: 129.4
    3: 134.3
    4: 129.4
    5: 123.7

To assign chemical shifts for suppressed protons (that are not provided explicitly in the skeletal structure), the indices of the respective non-hydrogen atoms are used instead. All suppressed protons connected to the same atom are assigned an identical shift value.

If a Molfile contains hydrogens as standalone atoms, the chemical shifts are assigned to those protons using their respective atom indices. This is illustrated using the file PhNO2_allH.mol. Its structure is shown below.

In this example, the ¹H shifts need to be assigned to atoms 9–13. Assigning them to atoms 1–5, as in the previous example, would produce an error.

# Structure with explicit protons.
molfile: PhNO2_allH.mol
count from: 0
shifts:
  # chemical shifts for protons in ppm
  1H:
    9: 8.23
    10: 7.56
    11: 7.71
    12: 7.56
    13: 8.23
  # chemical shifts for carbon-13 nuclei in ppm
  13C:
    0: 148.5
    1: 123.7
    2: 129.4
    3: 134.3
    4: 129.4
    5: 123.7

Indirect spin-spin coupling constants

Indirect spin-spin coupling constants are provided under the key J-couplings in the YAML file. Additionally, the coupling constant values need to be grouped together by isotopes. Keys for each combination of isotopes are combined as isotope1-isotope2: e.g., 1H-1H for coupling constants between two protons or 1H-13C for the associated heteronuclear coupling.

J-coupling constants in units of Hz for each combination of nuclei are provided as a list of lists with the following structure:

J-couplings:
  isotope1-isotope2:
    - [atom index 1, atom index 2, coupling constant in Hz]
    - [atom index 1, atom index 2, coupling constant in Hz]
    - [...]
  isotope1-isotope2:
    - [atom index 1, atom index 2, coupling constant in Hz]
    - [...]

The first atom index refers to the first isotope and the second atom index refers to the second isotope. As with shifts, values for suppressed hydrogens are assigned via the associated skeletal carbon or heteroatom. If multiple protons are connected to the same skeletal atom, they are assigned the same coupling constant. Inequivalent protons attached to the same skeletal atom need to be specified explicitly as standalone atoms in the molecule definition, so that they can be referred to via their respective atom indices.
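The isotope keys follow a simple pattern: the mass number followed by the element symbol, with two such labels joined by a hyphen for a coupling block. The sketch below shows one way such keys could be split into their parts. The helpers `parse_isotope` and `parse_isotope_pair` are hypothetical and only illustrate the naming convention; they are not functions of the hqs-nmr-parameters package.

```python
import re

# Mass number (digits) followed by an element symbol such as H, C or Cl.
ISOTOPE_PATTERN = re.compile(r"(\d+)([A-Z][a-z]?)")


def parse_isotope(label: str) -> tuple[int, str]:
    """Split a label such as '13C' into (mass number, element symbol)."""
    match = ISOTOPE_PATTERN.fullmatch(label.strip())
    if match is None:
        raise ValueError(f"Not an isotope label: {label!r}")
    return int(match.group(1)), match.group(2)


def parse_isotope_pair(key: str) -> tuple[tuple[int, str], tuple[int, str]]:
    """Split a coupling key such as '1H-13C' into its two isotopes."""
    first, second = key.split("-")
    return parse_isotope(first), parse_isotope(second)


print(parse_isotope("13C"))          # (13, 'C')
print(parse_isotope_pair("1H-13C"))  # ((1, 'H'), (13, 'C'))
```

The order of the two isotopes in the key matters, because it fixes which atom index in each list entry belongs to which isotope.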

Examples

¹H parameters for propane with SMILES input

smiles: CCC
count from: 0
shifts:
  1H:
    0: 0.9
    1: 1.3
    2: 0.9
J-couplings:
  1H-1H:
    - [0, 1, 7.26]
    - [1, 2, 7.26]

The protons at the terminal CH₃ groups are assigned chemical shifts of 0.9 ppm each, and the protons of the central CH₂ group are given a value of 1.3 ppm. J-couplings between all protons of the neighboring CH₃ and CH₂ groups are assigned as 7.26 Hz. While an indirect spin-spin coupling interaction exists between equivalent protons within the CH₃ and CH₂ groups, it is not observed in the spectrum and the associated values are left out.
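The effect of assigning values via skeletal atoms can be illustrated by expanding the propane input into individual proton spins. This is a sketch of the bookkeeping only, with the hydrogen counts per carbon written out by hand (an assumption of the example; the package derives them from the structure). The spin ordering chosen here is arbitrary and need not match the ordering used internally.

```python
from itertools import product

# Input as in the YAML example, keyed by skeletal atom index (count from: 0).
hydrogen_counts = {0: 3, 1: 2, 2: 3}  # CH3, CH2, CH3 (written out by hand)
group_shifts = {0: 0.9, 1: 1.3, 2: 0.9}
group_couplings = [(0, 1, 7.26), (1, 2, 7.26)]

# Give every suppressed proton its own spin index and the shift of its group.
spin_indices = {}
spin_shifts = []
for atom, n_hydrogens in hydrogen_counts.items():
    start = len(spin_shifts)
    spin_indices[atom] = list(range(start, start + n_hydrogens))
    spin_shifts.extend([group_shifts[atom]] * n_hydrogens)

# Every proton of one group couples to every proton of the neighboring group.
spin_couplings = [
    ((i, j), value)
    for atom_a, atom_b, value in group_couplings
    for i, j in product(spin_indices[atom_a], spin_indices[atom_b])
]

print(len(spin_shifts))     # 8 protons in total
print(len(spin_couplings))  # 3*2 + 2*3 = 12 coupled pairs
```

Two lines of coupling input thus describe twelve pairwise couplings in the resulting eight-spin system.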

¹H parameters for acrylonitrile

The structure of acrylonitrile is provided as a Molfile in acrylonitrile.mol. Its depiction is shown below.

Since protons 4 and 5 are inequivalent, they are specified as standalone atoms with different parameters. Hydrogen 6 is also represented as a standalone atom, although suppressing it would be an equally valid choice.

molfile: acrylonitrile.mol
count from: 1
shifts:
  1H:
    4: 5.79  # H(trans)
    5: 5.97  # H(cis)
    6: 5.48  # H(gem)
J-couplings:
  1H-1H:
    - [4, 5, 0.9]
    - [4, 6, 11.8]
    - [5, 6, 17.9]

Combined ¹H and ¹³C parameters for chloromethane

To illustrate the definition of heteronuclear coupling constants, the following example shows parameters for ¹³C-enriched chloromethane. The parameters include the shifts of the three protons and the ¹³C nucleus, as well as the coupling constants between these nuclei.

# CH3Cl with 13C, the hydrogens inside square brackets are implicit.
smiles: '[13CH3]Cl'
# C will have index 1 and Cl will have index 2
count from: 1
shifts:
  # Shifts of the three protons.
  1H:
    1: 3.05
  # Shift of carbon-13 in the molecule.
  13C:
    1: 25.6
J-couplings:
  # Coupling between the three protons (would normally not be observed).
  1H-1H:
    - [1, 1, -10.8]
  # Coupling between the three protons and the 13C atom.
  1H-13C:
    - [1, 1, 150.0]

Stored YAML files

For the molecules included in the examples module of the hqs-nmr-parameters package, YAML input files (and Molfiles, where the molecular structure is given as one) are available in the file system. Each YAML file is named after the key of its molecule in the data set. For acrylonitrile (with key "C2H3CN"), the file can be located and printed as follows:

from pathlib import Path
from hqs_nmr_parameters import examples

identifier = "C2H3CN"
acrylonitrile_yaml = Path(examples.__file__).parent.joinpath("parameters", identifier + ".yaml")
print(acrylonitrile_yaml.read_text())

name: Acrylonitrile
molfile: C2H3CN.mol
count from: 1
shifts:
  1H:
    4: 5.79  # H(trans)
    5: 5.97  # H(cis)
    6: 5.48  # H(gem)
J-couplings:
  1H-1H:
    - [4, 5, 0.9]
    - [4, 6, 11.8]
    - [5, 6, 17.9]
description: |
  1H parameters for acrylonitrile.
  Values were obtained from Hans Reich's Collection, NMR Spectroscopy.
  https://organicchemistrydata.org

As we have seen, this data is already included as a MolecularData instance:

from hqs_nmr_parameters.examples import molecules

identifier = "C2H3CN"
parameters = molecules[identifier]
print(parameters.shifts)
print(parameters.j_couplings)

[(3, 5.79), (4, 5.97), (5, 5.48)]
[((3, 4), 0.9), ((3, 5), 11.8), ((4, 5), 17.9)]

Note that even if the atom counting in the YAML file starts at 1, the numbering in MolecularData always starts at 0.
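For the acrylonitrile example, the relation between the two numberings is a plain index offset. The sketch below reproduces the printed shifts and couplings from the values in the YAML file. The function `to_zero_based` is a hypothetical helper written for this illustration and is not part of the package.

```python
def to_zero_based(shifts, couplings, count_from):
    """Convert index-based input counted from 0 or 1 into zero-based tuples."""
    if count_from not in (0, 1):
        raise ValueError("count_from must be 0 or 1")
    zero_shifts = [(index - count_from, value) for index, value in sorted(shifts.items())]
    zero_couplings = [
        ((first - count_from, second - count_from), value)
        for first, second, value in couplings
    ]
    return zero_shifts, zero_couplings


# Values from C2H3CN.yaml (count from: 1).
yaml_shifts = {4: 5.79, 5: 5.97, 6: 5.48}
yaml_couplings = [[4, 5, 0.9], [4, 6, 11.8], [5, 6, 17.9]]

shifts, j_couplings = to_zero_based(yaml_shifts, yaml_couplings, count_from=1)
print(shifts)       # [(3, 5.79), (4, 5.97), (5, 5.48)]
print(j_couplings)  # [((3, 4), 0.9), ((3, 5), 11.8), ((4, 5), 17.9)]
```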

Reading and processing YAML input files

In the HQS NMR Tool, a YAML file containing NMR parameters is parsed using the read_parameters_yaml function from the hqs-nmr-parameters package:

from hqs_nmr_parameters import read_parameters_yaml

parameters = read_parameters_yaml("input_file.yaml")

The read_parameters_yaml function can read the following keywords from a YAML file:

name: An optional name of the molecule.
shifts: Chemical shifts in format {isotope: {index: value}}.
j_couplings/J-couplings: J-coupling values in format {isotope1-isotope2: [[index1, index2, value], ...]}.
count from/count_from: Specifies whether to count atomic indices starting from zero or from one.
smiles: SMILES string of the molecule (better enclosed in quotation marks).
molfile: Path to a Molfile with the molecular structure. As mentioned above, only one representation entry can be set, i.e., either smiles or molfile.
molblock: Compressed Molfile content (not in clear text).
temperature: Temperature in K.
solvent: Name of the solvent.
description: An additional description.

parameters is then an instance of the Pydantic MolecularData class.

Data types

Molecular NMR parameters, spectra and calculation results are stored in Python objects in the HQS NMR Tool. These data structures are all Pydantic data classes and the data can be accessed as attributes of the class. See NMRResultSpectrum1D for an example.

Note that for the full signature of each datatype you can check out the API-documentation. Here we will just point out the most important attributes of each class.

NMRResultSpectrum1D

The NMRResultSpectrum1D is used to store all of the input and output of an NMR spectrum calculation when using the calculate_spectrum routine. It comprises:

The molecular data used for the calculation, which can include the full data in a MolecularData object or only the NMR parameters with data type NMRParameters (molecule_parameters).
All the parameters handed to calculate_spectrum in the form of an NMRCalculationParameters object (calculation_parameters).
The calculated spectrum of data type NMRSpectrum1D (spectrum).

Assuming we have obtained an object called result_spectrum of data type NMRResultSpectrum1D, the attributes can be accessed as follows:

molecule_parameters = result_spectrum.molecule_parameters
calculation_parameters = result_spectrum.calculation_parameters
spectrum = result_spectrum.spectrum

In the following we will elaborate on the data type of each of these objects in more detail.

NMRResultGreensFunction1D

The NMRResultGreensFunction1D is the analogous data type to NMRResultSpectrum1D when using calculate_greens_function instead of calculate_spectrum. The only difference is that instead of the attribute spectrum, it has the attribute greens_function, which stores an NMRGreensFunction1D data type.

Assuming we have an object called result_greens_function of data type NMRResultGreensFunction1D the greens_function attribute can be accessed as follows:

greens_function = result_greens_function.greens_function

NMRParameters

The NMRParameters data type is defined in the hqs-nmr-parameters repository and holds the reduced set of molecular parameters required for an NMR calculation.

The class holds:

A list of chemical shifts in ppm for every nucleus in the system that has NMR parameters.
A list of isotopes for every nucleus, in the same ordering as shifts. The NamedTuple Isotope has the two attributes mass_number and symbol.
A list containing pairs of atomic indices and the associated J-coupling values. The atomic indices refer to the ordering in the shifts and isotopes lists.
The nspins property for quick access to the number of spins in the system (same value as the number of shifts).

For an example you may run the following code:

from hqs_nmr_parameters.examples import molecules

# Obtain example molecule.
nmr_parameters = molecules["C10H7Br"].spin_system()

print(type(nmr_parameters))        # NMRParameters
print(nmr_parameters.nspins)
print(nmr_parameters.shifts)
print(nmr_parameters.isotopes)
print(nmr_parameters.j_couplings)

NMRParameters objects are used as an input when calculating a spectrum using the calculate_spectrum or calculate_greens_function method.
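The layout described above can be pictured with a small stand-in class. The sketch below is not the NMRParameters class of the package. It is a plain dataclass, named `SpinSystemSketch` for this illustration, that mirrors the listed attributes so that the relation between shifts, isotopes, j_couplings and nspins becomes visible. The two-proton values are illustrative.

```python
from dataclasses import dataclass
from typing import NamedTuple


class Isotope(NamedTuple):
    """Stand-in with the two attributes named in the text."""
    mass_number: int
    symbol: str


@dataclass
class SpinSystemSketch:
    """Illustrative stand-in, not the package's NMRParameters class."""
    shifts: list[float]                               # ppm, one entry per spin
    isotopes: list[Isotope]                           # same ordering as shifts
    j_couplings: list[tuple[tuple[int, int], float]]  # ((i, j), value in Hz)

    @property
    def nspins(self) -> int:
        # The number of spins equals the number of shifts.
        return len(self.shifts)


two_protons = SpinSystemSketch(
    shifts=[5.28, 5.28],
    isotopes=[Isotope(1, "H"), Isotope(1, "H")],
    j_couplings=[((0, 1), -5.171)],
)

print(two_protons.nspins)              # 2
print(two_protons.isotopes[0].symbol)  # H
# The indices in j_couplings refer to positions in shifts and isotopes.
print(all(i < two_protons.nspins and j < two_protons.nspins
          for (i, j), _ in two_protons.j_couplings))  # True
```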

NMRCalculationParameters

The NMRCalculationParameters data type is defined in the hqs_nmr repository and stores all possible customization options when performing spectra calculations with calculate_spectrum or calculate_greens_function.

The only parameter the user always has to specify is the magnetic field strength in Tesla, so the simplest and often already sufficient instantiation of this data type may look as follows:

from hqs_nmr.datatypes import NMRCalculationParameters

calculation_parameters = NMRCalculationParameters(field_T=11.7433)

Another important option the user can set is the reference_isotope, which is the Isotope, specified as Isotope(mass, symbol), that defines the frequency of the rotating frame. Furthermore, the object contains multiple attributes that allow the resolution to be improved. For a detailed explanation of the available customization options, check out the tutorial notebooks or take a look at the API-documentation.
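Spectrometers are often named by their proton frequency rather than their field strength. The field of 11.7433 T used above corresponds to a 500 MHz instrument. The following sketch converts between the two using the proton gyromagnetic ratio, γ/2π ≈ 42.577478 MHz/T. The helper names are chosen for this illustration and are not part of hqs_nmr.

```python
GAMMA_1H_MHZ_PER_T = 42.577478  # gamma / (2*pi) of the proton in MHz per Tesla


def proton_frequency_mhz(field_t: float) -> float:
    """Proton Larmor frequency in MHz for a field given in Tesla."""
    return GAMMA_1H_MHZ_PER_T * field_t


def field_for_proton_frequency(frequency_mhz: float) -> float:
    """Field in Tesla that gives the requested proton frequency in MHz."""
    return frequency_mhz / GAMMA_1H_MHZ_PER_T


def ppm_to_hz(delta_ppm: float, field_t: float) -> float:
    """Width of a proton shift difference in Hz (ppm times frequency in MHz)."""
    return delta_ppm * proton_frequency_mhz(field_t)


print(f"{proton_frequency_mhz(11.7433):.2f} MHz")     # 500.00 MHz
print(f"{field_for_proton_frequency(80.0):.3f} T")    # 1.879 T
print(f"{ppm_to_hz(1.0, 11.7433):.1f} Hz per ppm")    # 500.0 Hz per ppm
```

The second line gives the field that would correspond to an 80 MHz benchtop spectrometer, such as the one mentioned for the Phytolab reference spectra.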

NMRSolverSettings

The NMRSolverSettings data type is also defined in the hqs_nmr repository and stores customization options specific for the solver backend. It is stored as an attribute of an NMRCalculationParameters object and typically does not need to be altered.

It is, however, important to know that by default a spin-dependent clustering of the molecule into overlapping clusters is performed (for details check here). This approximation is very accurate, especially at high field, but in some cases it may not be wanted. In these cases, you can set the attribute perform_clustering=False, so that the spectrum of the entire molecule is calculated at once; symmetries are still exploited where applicable, but no approximations are made. Note that this can significantly increase the runtime, so it is typically more advisable to increase the size of the clusters instead. To do so, specify the attribute max_cluster_size:

from hqs_nmr.datatypes import NMRSolverSettings

solver_settings = NMRSolverSettings(max_cluster_size=16)

To see all customization options check the API-documentation.
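The cost of treating more spins at once can be estimated from the size of the Hilbert space, which for spin-1/2 nuclei doubles with every additional spin. The sketch below tabulates this growth for a few cluster sizes. The memory figure assumes a dense complex matrix with 16 bytes per entry and is only a rough upper bound for orientation, since symmetries and sparse representations reduce the actual cost.

```python
def hilbert_space_dimension(n_spins: int) -> int:
    """Number of basis states for n spin-1/2 nuclei."""
    return 2 ** n_spins


def dense_matrix_gib(n_spins: int) -> float:
    """Memory of a dense complex128 matrix over the full space, in GiB."""
    dimension = hilbert_space_dimension(n_spins)
    return dimension ** 2 * 16 / 2 ** 30


for n_spins in (8, 12, 16, 20):
    print(f"{n_spins:2d} spins: dimension {hilbert_space_dimension(n_spins):>9,d}, "
          f"dense matrix {dense_matrix_gib(n_spins):,.3f} GiB")
```

This exponential growth is the reason why enlarging the clusters moderately is usually preferable to switching clustering off for a large molecule.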

NMRSpectrum1D

The NMRSpectrum1D data type holds an NMR spectrum comprising the frequencies in ppm (omegas_ppm), an array with the individual spin contributions to the spectrum (spin_contributions) and the full width at half maximum in ppm (fwhm_ppm).

Assuming we have performed an NMR spectrum calculation using calculate_spectrum and obtained as output an NMRResultSpectrum1D object we called result_spectrum, we can obtain an object of data type NMRSpectrum1D as an attribute of this object:

spectrum = result_spectrum.spectrum

The spin_contributions are a $N_{spin} \times N_{omegas}$ NumPy array, where each row holds the contribution of the corresponding spin, i.e.:

spectrum.spin_contributions[0,:]

holds the contribution of the first spin (index 0). The total spectrum that would be measured experimentally can be calculated using

import numpy as np
np.sum(spectrum.spin_contributions, axis=0)
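The array layout can be tried out without running a calculation. The sketch below builds a synthetic array of the same shape from three Lorentzian lines, one per spin, and sums over the spin axis. The peak positions and the line width are made-up values, and the variables are plain NumPy arrays rather than attributes of an NMRSpectrum1D object.

```python
import numpy as np

# Frequency grid in ppm and one made-up resonance position per spin.
omegas_ppm = np.linspace(0.0, 10.0, 1001)
centers_ppm = np.array([2.0, 5.0, 5.0])
fwhm_ppm = 0.1
half_width = fwhm_ppm / 2

# One normalized Lorentzian per spin: rows are spins, columns are frequencies.
offsets = omegas_ppm[None, :] - centers_ppm[:, None]
spin_contributions = (half_width / np.pi) / (offsets**2 + half_width**2)

# Summing over axis 0 (the spin axis) gives the total spectrum.
total_spectrum = np.sum(spin_contributions, axis=0)

print(spin_contributions.shape)  # (3, 1001)
print(total_spectrum.shape)      # (1001,)
```

Because two of the three synthetic spins resonate at 5 ppm, the summed peak there is roughly twice as high as the one at 2 ppm.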

NMRGreensFunction1D

The NMRGreensFunction1D data type has essentially the same attributes as NMRSpectrum1D. However, spin_contributions is a complex array, as it stores the full NMR Green's function and not just the spectral function. This data type is typically only used for post-processing steps that also require the real part of the Green's function.

Assuming we have performed an NMR Green's function calculation using calculate_greens_function and obtained as output an NMRResultGreensFunction1D object called result_greens_function, we can obtain an object of data type NMRGreensFunction1D as an attribute of this object:

greens_function = result_greens_function.greens_function

The total spectrum can then be obtained as follows:

import numpy as np
np.sum(- np.imag(greens_function.spin_contributions), axis=0)
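Why the negative imaginary part yields a spectrum can be seen from a single resonance. The sketch below uses the generic textbook form G(ω) = 1 / (ω − ω₀ + iη) with broadening η. This is an assumption for illustration, and the normalization and conventions of the HQS NMR Tool may differ. For this form, −Im G is a Lorentzian with peak height 1/η and a full width at half maximum of 2η.

```python
def greens_function(omega: float, omega0: float, eta: float) -> complex:
    """Single-resonance Green's function with broadening eta (textbook form)."""
    return 1.0 / (omega - omega0 + 1j * eta)


def spectral_function(omega: float, omega0: float, eta: float) -> float:
    """Negative imaginary part: eta / ((omega - omega0)**2 + eta**2)."""
    return -greens_function(omega, omega0, eta).imag


omega0, eta = 1.0, 0.1
peak = spectral_function(omega0, omega0, eta)
half = spectral_function(omega0 + eta, omega0, eta)

print(round(peak, 6))  # 10.0, i.e. 1 / eta
print(round(half, 6))  # 5.0, half the peak height at a distance of eta
# The real part vanishes on resonance and changes sign across it.
print(greens_function(omega0, omega0, eta).real)
```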

Serialization (Saving and Loading)

The data types described above share a common interface for data serialization. The function to_json works for all of these classes: it takes an object and a file name (with or without the full path) and stores the data as a JSON file. For example, if we have an object called spectrum of the data class NMRSpectrum1D and want to save it as spectrum_data.json:

from hqs_nmr.datatypes import to_json

to_json(spectrum, "spectrum_data.json")

The inverse function from_json loads the data again. In addition to the file name, it needs the class of the object stored in the JSON file, e.g.:

from hqs_nmr.datatypes import from_json, NMRSpectrum1D

spectrum = from_json(NMRSpectrum1D, "spectrum_data.json")
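The save-and-load pattern can be tried without the package installed. The sketch below imitates the same call shape, (object, file name) for saving and (class, file name) for loading, using only the standard library. The class `SpectrumSketch` and the functions `save_json` and `load_json` are stand-ins invented for this illustration. They are not the Pydantic-based implementation in hqs_nmr.

```python
import json
import tempfile
from dataclasses import asdict, dataclass
from pathlib import Path


@dataclass
class SpectrumSketch:
    """Stand-in data class holding plain Python lists instead of arrays."""
    omegas_ppm: list[float]
    intensities: list[float]
    fwhm_ppm: float


def save_json(obj, file_name) -> None:
    """Store the fields of a dataclass instance in a JSON file."""
    Path(file_name).write_text(json.dumps(asdict(obj)))


def load_json(cls, file_name):
    """Rebuild an instance of cls from a JSON file written by save_json."""
    return cls(**json.loads(Path(file_name).read_text()))


original = SpectrumSketch([0.0, 0.5, 1.0], [0.1, 2.0, 0.1], 0.05)
with tempfile.TemporaryDirectory() as tmp_dir:
    path = Path(tmp_dir) / "spectrum_data.json"
    save_json(original, path)
    restored = load_json(SpectrumSketch, path)

print(restored == original)  # True
```

The class has to be passed when loading because the JSON file stores only the field values, not the type they belong to.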

HQS NMR Tool Documentation