Types (Input / Output)
In HQS Tasks, the inputs and outputs of tasks must be transmittable over the network. To this end, we use JSON as the data format, so it must be possible to serialize Python values to a JSON document and to parse them back from JSON. This happens automatically, but it restricts which types can be used. Keeping this mechanism in mind also helps to understand performance issues when dealing with larger amounts of data.
This repository makes heavy use of the Pydantic validation library, which is installed as one of its dependencies.
As a general rule of thumb, any type understood by pydantic.TypeAdapter is supported by HQS Tasks out of the box.
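To check whether a type falls under this rule of thumb, one can try to build a pydantic.TypeAdapter for it and round-trip a value through JSON. The following is a minimal sketch using only pydantic; whether HQS Tasks calls exactly these methods internally is an assumption, and the example value is made up for illustration.

```python
from pydantic import TypeAdapter

# If a TypeAdapter can be built for a type, the type can be (de-)serialized.
adapter = TypeAdapter(dict[str, list[int]])

# Hypothetical example value.
value = {"a": [1, 2], "b": []}

# Serialize to a JSON document (bytes) and parse it back with validation.
encoded = adapter.dump_json(value)
decoded = adapter.validate_json(encoded)

print(encoded)
print(decoded == value)
```

If constructing the TypeAdapter raises an error for a given type, that type likely needs one of the approaches described in the sections below.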
Built-in Python Types
The "typical" primitive types in Python such as bool, int, float, str are supported.
Furthermore, collections such as list, dict, tuple, and set are supported. We highly encourage developers to also specify the collection's item types, i.e., to use list[str] instead of just list when the list's items should be strings.
Also, None can be used, and type alternatives can be declared using Union (and Optional) from typing.
Although highly discouraged, Any from typing or object can also be used to allow any (JSON-serializable) types.
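The effect of these annotations can be observed directly with pydantic.TypeAdapter. The sketch below is illustrative only (the JSON inputs are made up) and shows how a parametrized collection type validates its items, how Optional handles JSON null, and how sets and tuples are read from JSON arrays.

```python
from typing import Optional

from pydantic import TypeAdapter, ValidationError

# A bare list accepts items of any JSON-serializable type.
loose_items = TypeAdapter(list).validate_json('[1, "a", null]')

# list[str] validates every item; the integer 1 is not accepted as a string.
try:
    TypeAdapter(list[str]).validate_json('[1, "a"]')
    rejected = False
except ValidationError:
    rejected = True

# Optional[int] additionally allows JSON null, which becomes None.
maybe_none = TypeAdapter(Optional[int]).validate_json("null")

# JSON has no set or tuple type: both travel as JSON arrays.
unique = TypeAdapter(set[int]).validate_json("[1, 1, 2]")
pair = TypeAdapter(tuple[int, str]).validate_json('[1, "a"]')

print(loose_items, rejected, maybe_none, unique, pair)
```

Specifying item types therefore catches malformed inputs at the task boundary instead of somewhere inside the task.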
Models
While it is in principle possible to model arbitrarily complex data structures using the above-mentioned built-in types, they do not "document" themselves and are (out of the box) unconstrained: the dict type allows arbitrary keys, and restricting the set of allowed keys is not possible with such a simple type annotation. Furthermore, in most use cases of tuples, each tuple item has a specific "meaning" that the annotation does not convey.
These gaps are closed by writing models: a Python class that lists explicit members, each with a type. Such a class extends one of the supported model base classes so that HQS Tasks knows how to serialize it.
The currently supported model base classes are:
- BaseModel from pydantic (serialized as a JSON object)
- NamedTuple from typing (serialized as a JSON array)
- Enum from enum (serialized as the enum value's type)
Each member (field, enum value) of these models can be any type supported by HQS Tasks, including another model or collection.
Models can be wrapped in (arbitrary combinations of) collection types, e.g., you can use a list of a model type.
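As an illustration of the three base classes and of nesting models in a collection, consider the following sketch. The class names Unit, Point, and Measurement are hypothetical and chosen only for this example; the round trip uses pydantic.TypeAdapter directly, which is an assumption about how such types are handled behind the scenes.

```python
from enum import Enum
from typing import NamedTuple

from pydantic import BaseModel, TypeAdapter


class Unit(Enum):  # serialized as the enum value, here a string
    METER = "m"
    SECOND = "s"


class Point(NamedTuple):  # serialized as a JSON array
    x: float
    y: float


class Measurement(BaseModel):  # serialized as a JSON object
    unit: Unit
    position: Point
    samples: list[float]


# A list of models is itself a supported type.
adapter = TypeAdapter(list[Measurement])

items = [
    Measurement(unit=Unit.METER, position=Point(0.0, 1.5), samples=[1.0, 2.0])
]
encoded = adapter.dump_json(items)
decoded = adapter.validate_json(encoded)

print(encoded)
print(decoded == items)
```

Compared to a plain dict or tuple, the field names and types are now part of the type itself, so unknown or ill-typed fields are reported during validation.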
FileRefs
The FileRef type plays a special role. For details, please refer to the File API.
This type can also be placed in (arbitrary combinations of) collection types and models.
Numpy Arrays
Numpy arrays and numeric types are omnipresent in Python-based scientific computing. Using them with HQS Tasks, however, requires some minor adaptations.
When numpy arrays are used as fields of a pydantic model, one has to provide custom methods for serializing and deserializing the field.
One way of doing so is to create an annotated type containing the (de-)serialization functions using PlainSerializer and BeforeValidator from the pydantic package.
Taking a 1D numpy array as an example, the annotated type may be written as follows:
from typing import Annotated, Any

import numpy as np
from pydantic import BeforeValidator, PlainSerializer, WithJsonSchema

def _list_to_array(input: list[float]) -> np.ndarray:
    return np.asarray(input)

def _array_to_list(input: np.ndarray) -> list[float]:
    return list(map(float, input.flatten()))

FloatArray1D = Annotated[
    np.ndarray[Any, np.dtype[np.floating]],
    BeforeValidator(_list_to_array),
    PlainSerializer(_array_to_list),
    WithJsonSchema({"items": {"type": "number"}, "type": "array"}),
]
Here, _list_to_array is a function converting the primitive type (list[float]) to the numpy array, while _array_to_list performs the opposite operation.
Note that the WithJsonSchema annotation overrides the generated JSON Schema for the given type.
Unfortunately, the annotated type alone is not sufficient to be compatible with pydantic.TypeAdapter as it lacks a core schema. A simple workaround is to create a pydantic model that allows arbitrary types:
from pydantic import BaseModel, ConfigDict

class PlotModel(BaseModel):
    model_config = ConfigDict(arbitrary_types_allowed=True)

    x: FloatArray1D
    y: FloatArray1D
A pydantic model defined in this manner then supports (de-)serialization to JSON and can be used in the context of HQS Tasks.
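A quick way to convince oneself of this is to round-trip an instance through JSON. The sketch below repeats the definitions from above so that it is self-contained; the sample data is made up, and the check uses only pydantic and numpy, not HQS Tasks itself.

```python
import json
from typing import Annotated, Any

import numpy as np
from pydantic import (
    BaseModel,
    BeforeValidator,
    ConfigDict,
    PlainSerializer,
    WithJsonSchema,
)


def _list_to_array(input: list[float]) -> np.ndarray:
    return np.asarray(input)


def _array_to_list(input: np.ndarray) -> list[float]:
    return list(map(float, input.flatten()))


FloatArray1D = Annotated[
    np.ndarray[Any, np.dtype[np.floating]],
    BeforeValidator(_list_to_array),
    PlainSerializer(_array_to_list),
    WithJsonSchema({"items": {"type": "number"}, "type": "array"}),
]


class PlotModel(BaseModel):
    model_config = ConfigDict(arbitrary_types_allowed=True)

    x: FloatArray1D
    y: FloatArray1D


# Hypothetical sample data.
plot = PlotModel(x=np.linspace(0.0, 1.0, 3), y=np.array([0.0, 0.25, 1.0]))

# Arrays are written as plain JSON arrays of numbers ...
encoded = plot.model_dump_json()
# ... and converted back to numpy arrays when parsing.
restored = PlotModel.model_validate_json(encoded)

# The JSON Schema of each field is the one given via WithJsonSchema.
x_schema = PlotModel.model_json_schema()["properties"]["x"]

print(json.loads(encoded))
print(type(restored.x).__name__, np.array_equal(restored.x, plot.x))
```

Note that two PlotModel instances should be compared field by field with np.array_equal, since the == operator on numpy arrays works element-wise.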
It is recommended to test building the client package locally to verify the correct integration of types used in the tasks.
To do so, install the hqs_tasks_generator package, dump the task definition to JSON, and build the client package:
hqstage install hqs_tasks_generator
my_task_cli --dump-task-definitions task-registry.json
hqs-tasks-generate --registry-file task-registry.json --python-package-name my_tasks_client
For alternative solutions and detailed explanations, we recommend consulting the pydantic documentation ("Handling third-party types").
Other Types
Some other types are supported to a certain degree, or can be made to work with some additional effort. For now, this is beyond the scope of this documentation.