HQS Tasks

HQS Tasks is a solution to execute scientific software applications on a remote high-performance computing environment, and to control the execution from anywhere, such as your favorite local Python environment.

By providing access to the cloud, and therefore to a virtually unlimited amount and variety of compute hardware, HQS Tasks lets users leverage vast compute resources with almost no additional setup effort.

Every software HQS releases as a task can be executed on the respective computing platform.

About this documentation

This documentation is targeted to developers who want to write new tasks to be deployed to HQS Tasks.

Since this can be done either on a lower or on a higher level, the documentation is divided into the corresponding sections:

  • Python (higher level; recommended): To write tasks, we recommend using a dedicated library which provides an easy-to-use interface. The library implements the communication aspects on top of the CLI mentioned below, and therefore simplifies the implementation of new tasks significantly.

    Currently, we provide such a library for Python only. This interface is documented in the section Python Interface of this documentation.

  • CLI (low level): Tasks can also be written as a CLI tool. The developer is responsible for implementing the corresponding task CLI using their favorite programming language and toolset. Therefore, an understanding of how CLIs are written and how their basic functions work is required. Furthermore, knowledge about JSON schemas is required, since they are the fundamental building block for defining new tasks on the CLI level.

    This interface and other core concepts which are relevant when writing your tasks on the level of CLI are documented in the section Core Concepts of this documentation.

  • In addition, the documentation explains some tooling for developers in the section Developer Tooling.

We assume the reader is familiar with the relevant technologies, depending on which interface is used. We furthermore assume a basic understanding of HQS Tasks from the user's perspective and refer to the HQS Tasks User Documentation for that.

First Steps

First of all, in this documentation we assume you are developing (one or more) tasks in a Python package which is then deployed in some CI/CD pipeline to a target environment. This pipeline also generates a client library.

The task can then be invoked by installing the client library and invoking the according client functions. This is also explained in the HQS Tasks user documentation.

Example Project

An example project setup is given in the hqs_task_example Python package. You can install that using hqstage:

hqstage install hqs-task-example

It can then be inspected by browsing the installed package folder inside your Python environment's site packages directory (for example <ENVIRONMENT_FOLDER>/lib/python3.13/site-packages/hqs_task_example, but this varies).

Project Setup

Usually a "tasks project" is a simple Python project which makes use of the Python package hqs_task. If not yet done, install it using hqstage:

hqstage install hqs-task

In the current version of that package (note: we might simplify this in the future), you need to ensure the following:

  • Import cli from hqs_task in your package's root module's __init__.py file:

    from hqs_task import cli
    

    Furthermore, make sure that in the root module of your package all tasks or modules which define them are imported.

  • Register this cli function as a CLI script in your package's pyproject.toml file:

    [project.scripts]
    hqs_task_example = "hqs_task_example:cli"
    

    The script name on the left-hand side specifies the name of the CLI program which will be defined once the package is installed.

    The value on the right-hand side is the module name (simply the package name if it is the package's root module) and the cli function name, separated by a colon.

  • Configure your CI/CD process to build the client library and to deploy your tasks in the target environment. The steps for that are explained in the documentation. HQS developers may use the prepared CI/CD templates as explained below.

HQS-specific Setup

Warning

This section only applies to HQS-internal developers. Other developers need to perform the steps manually or set up a CI/CD process separately. See the corresponding sections of this documentation.

For HQS Task repositories developed at HQS the task deployment and generation of the client package is made convenient by the provided CI/CD templates. A developer only has to specify a few CI/CD variables to enable automatic task deployment and client package generation.

The tasks/deploy_aws.yml template handles the deployment of tasks to AWS and will only be executed if a new version has been tagged. Note that for this to work, version tags must be protected (this can be set in the repository settings via Settings > Repository > Protected Tags).

Once the CLI script is registered as explained above, configure the CI/CD pipeline to pick this script name up as the task CLI program. In HQS' Gitlab setup with the CI/CD template, this is done by

  • setting the variable HQS_TASK_CLI to the task CLI program defined in the previous step

  • including the template files for deploying the task to a target environment; in this example tasks/deploy_aws.yml for deploying it to AWS:

    include:
      - project: templates/templates
        file:
          - tasks/deploy_aws.yml
    

Tip

It is recommended to verify the correctness of important CI/CD template variables before deploying tasks to AWS. When including files from the CI/CD template repository, the order of files can be important, as some template variables may be overwritten by subsequent template files. For instance, with the custom container template, the CICD_CONTAINER variable has been observed to be redefined by another step. Thus, the corresponding container/custom_cicd_container.yml template file is usually included last.

To let the pipeline handle building and deploying the client package to the HQS internal PyPi server, the HQS_CLIENT_PACKAGE variable needs to be set.

Writing a Task

A task is nothing more than a Python function which is decorated with the "task decorator" as you will see in the following example. The decorator registers the function as a task and enables it to be invoked as such from a task client script after deployment.

A simple "hello world" example may look like this:

from hqs_task import task

@task
def hello() -> None:
    print("Hello world!")

There are some restrictions on which functions can be declared as tasks:

  • The function must be defined on the root scope of a module. That means, it must not be defined inside a class or inside another function.

    A task is allowed to be defined inside a branch (conditional statement) on the root scope, which enables developers to only define tasks if some condition is met (for example optional dependencies being available).

  • The function is allowed to have zero, one, or many explicitly named arguments, but variable-length arguments are not supported (i.e., neither *args nor **kwargs may appear in the signature).

  • For any arguments, default values may be supplied. As with usual Python rules, an argument with a default value may not be followed by an argument without a default value.

  • The function signature must include full type annotations: All arguments as well as the return type have to be strictly typed.

  • The types used for arguments as well as the return value are restricted to the ones listed under Types (Input / Output).

  • The function may throw exceptions (of any type), but it shall not intentionally terminate the program in any other abnormal way (e.g., exiting the process on failure).
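A sketch illustrating these rules (the try/except fallback for the task decorator is only there so the snippet runs outside an HQS environment; the function names and logic are made up for illustration):

```python
# The try/except fallback below is only so this sketch runs outside an
# HQS environment; in a real project, import the decorator directly.
try:
    from hqs_task import task
except ImportError:
    def task(func):  # no-op stand-in for the real task decorator
        return func


# Valid: defined at module root, fully typed, defaults only at the end.
@task
def scale(values: list[float], factor: float = 2.0) -> list[float]:
    return [v * factor for v in values]


# Also valid: a task defined inside a conditional at the root scope,
# e.g. only when an optional dependency is available.
try:
    import numpy
    HAVE_NUMPY = True
except ImportError:
    HAVE_NUMPY = False

if HAVE_NUMPY:
    @task
    def vector_norm(values: list[float]) -> float:
        return float(numpy.linalg.norm(values))
```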

Inputs and Outputs

Each task typically has an input and an output. Note that in the Python interface, multiple arguments are represented as a single input document (wrapped as a tuple). The return value maps to the output directly.

Technically, the input is optional, while the output is mandatory (but None is allowed to be used for an output).

For more details see Types (Input / Output).

Types (Input / Output)

In HQS Tasks, inputs and outputs of tasks need to be transmittable over the network. To this end, we use JSON as the data format. Therefore, it needs to be possible to serialize the Python values to a JSON document and parse them back from JSON. This mechanism happens automatically, but implies some restrictions on what types can be used. Knowing this also helps to understand performance issues when dealing with larger amounts of data.

This repository makes heavy use of the Pydantic validation library, which is installed as one of its dependencies. As a general rule of thumb, any type understood by pydantic.TypeAdapter is supported by HQS Tasks out of the box.

Built-in Python Types

The "typical" primitive types in Python such as bool, int, float, str are supported.

Furthermore, collections such as list, dict, tuple, and set are supported. We highly encourage developers to also specify the collection's item types, i.e., use list[str] instead of just list when you want to express that the list's items shall be strings.

Also, None can be used, and type alternatives can be declared using Union (and Optional) from typing.

Although highly discouraged, Any from typing or object can also be used to allow any (JSON-serializable) types.
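Since any type understood by pydantic.TypeAdapter is supported, a quick local sanity check for a candidate annotation is to round-trip a value through a TypeAdapter. A minimal sketch, assuming pydantic (v2) is installed:

```python
from typing import Optional, Union

from pydantic import TypeAdapter

# Round-tripping through TypeAdapter is a quick local check for whether a
# type annotation can be (de)serialized, and hence used in a task signature.
adapter = TypeAdapter(dict[str, list[Optional[int]]])

json_doc = adapter.dump_json({"a": [1, None, 3]})
restored = adapter.validate_json(json_doc)
assert restored == {"a": [1, None, 3]}

# Union types work the same way:
TypeAdapter(Union[int, str]).validate_python("hello")
```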

Models

While it is in principle possible to model arbitrarily complex data structures using the above mentioned built-in types, they do not "document" themselves and are (out of the box) unconstrained: The dict type allows arbitrary keys, and restricting the set of allowed keys is not possible with such a simple type annotation. Furthermore, in most use-cases of tuples there is a specific "meaning" of each tuple item.

These gaps are solved by writing models: a Python class lists explicit members, each with a type. This class can then extend one of the supported model base classes to let HQS Tasks understand how to serialize it.

Currently supported model base classes are:

  • BaseModel from pydantic (serialized as a JSON object)
  • NamedTuple from typing (serialized as a JSON array)
  • Enum from enum (serialized as the enum value's type)

Each member (field, enum value) of these models can be any type supported by HQS Tasks, including another model or collection.

Models can be wrapped in (arbitrary combinations of) collection types, e.g., you can use a list of a model type.
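A sketch combining the three base classes (all class and field names here are made up for illustration):

```python
from enum import Enum
from typing import NamedTuple

from pydantic import BaseModel


class Method(Enum):         # serialized as the enum value's type (here: str)
    FAST = "fast"
    ACCURATE = "accurate"


class Point(NamedTuple):    # serialized as a JSON array: [x, y]
    x: float
    y: float


class JobInput(BaseModel):  # serialized as a JSON object
    method: Method
    points: list[Point]     # a model wrapped in a collection type


doc = JobInput(method=Method.FAST, points=[Point(1.0, 2.0)]).model_dump_json()
restored = JobInput.model_validate_json(doc)
assert restored.method is Method.FAST
```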

FileRefs

A special role is taken by the FileRef type. For that, please refer to the File API.

This type can also be placed in (arbitrary combinations of) collection types and models.

Numpy Arrays

Numpy arrays and numeric types are omnipresent in Python-based scientific computing. Their usage with HQS Tasks, however, requires some minor adaptations. When numpy arrays are used as fields of a pydantic model, one has to provide custom methods for serializing and deserializing the field. One way of doing so is to create an annotated type containing the (de-)serialization functions using PlainSerializer and BeforeValidator from the pydantic package.

Using the example of a 1D numpy array the annotated type may be written as follows:

from typing import Annotated, Any

import numpy as np
from pydantic import BeforeValidator, PlainSerializer, WithJsonSchema


def _list_to_array(input: list[float]) -> np.ndarray:
    return np.asarray(input)


def _array_to_list(input: np.ndarray) -> list[float]:
    return list(map(float, input.flatten()))


FloatArray1D = Annotated[
    np.ndarray[Any, np.dtype[np.floating]],
    BeforeValidator(_list_to_array),
    PlainSerializer(_array_to_list),
    WithJsonSchema({"items": {"type": "number"}, "type": "array"}),
]

Here, _list_to_array is a function converting the primitive type (list[float]) to the numpy array, while _array_to_list performs the opposite operation. Note that the WithJsonSchema annotation overrides the generated JSON Schema for the given type.

Unfortunately, the annotated type alone is not sufficient to be compatible with pydantic.TypeAdapter as it lacks a core schema. A simple workaround is to create a pydantic model that allows arbitrary types:

from pydantic import BaseModel, ConfigDict


class PlotModel(BaseModel):
    model_config = ConfigDict(arbitrary_types_allowed=True)

    x: FloatArray1D
    y: FloatArray1D

A pydantic model defined in this manner then supports (de-)serialization to JSON and can be used in the context of HQS Tasks.
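To sanity-check the pattern, the model can be round-tripped through JSON locally. The following sketch repeats the definitions from above so it is self-contained:

```python
from typing import Annotated, Any

import numpy as np
from pydantic import (
    BaseModel,
    BeforeValidator,
    ConfigDict,
    PlainSerializer,
    WithJsonSchema,
)

# Same annotated type as above, written with inline lambdas for brevity.
FloatArray1D = Annotated[
    np.ndarray[Any, np.dtype[np.floating]],
    BeforeValidator(lambda v: np.asarray(v)),
    PlainSerializer(lambda a: list(map(float, a.flatten()))),
    WithJsonSchema({"items": {"type": "number"}, "type": "array"}),
]


class PlotModel(BaseModel):
    model_config = ConfigDict(arbitrary_types_allowed=True)

    x: FloatArray1D
    y: FloatArray1D


model = PlotModel(x=np.array([0.0, 1.0]), y=np.array([1.0, 2.0]))
doc = model.model_dump_json()  # arrays become plain JSON arrays
restored = PlotModel.model_validate_json(doc)
assert isinstance(restored.x, np.ndarray)
```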

Note

It is recommended to test building the client package locally to verify the correct integration of types used in the tasks. To do so, install the hqs_tasks_generator package, dump the task definition to JSON, and build the client package:

hqstage install hqs_tasks_generator
my_task_cli --dump-task-definitions task-registry.json
hqs-tasks-generate --registry-file task-registry.json --python-package-name my_tasks_client

For alternative solutions and detailed explanations, we recommend consulting the pydantic documentation ("Handling third-party types").

Other Types

Some other types are supported to some degree, or can be made to work with some help. This is currently beyond the scope of this documentation.

Errors & Exceptions

As with every software program, a task may crash or run into an error situation.

To help the task caller understand what's going on, we distinguish between different kinds of errors and try our best to report most of them to the task client. Details are explained in Error Reporting.

Within Python, when the task finds itself in a situation where it cannot continue, it shall raise an exception. This is then forwarded to the client. In the following, we explain which information from the exception is made available to the client program, and how you as a developer should design your exceptions to help the user solve issues.

Error Type (Exception Name)

An error report in HQS Tasks includes a type which classifies the error, intended to allow some automated handling of the situation. When a task raises an exception, the exception's class name is used as the error type.

We recommend that developers define exception classes reflecting the different error situations of their tasks.

Keep in mind that the error type, and hence the class name, is machine-processed: it identifies the kind of error which occurred, and other code may rely on the exact name chosen here. Renaming an exception class between versions of a task should therefore be treated as a breaking change, so choose these class names carefully.

Python's built-in exception classes can be used (explicitly or indirectly) to express error cases. For example, assume your task takes a dictionary as input and expects some keys to be present. (Let's ignore for a moment that a proper schema could express this constraint very nicely.) Once the task attempts to access a missing key, Python will (indirectly) raise a KeyError even if you did not write code to handle this case. You could also check whether the key is present and (explicitly) raise a KeyError (or any other exception class) to explain the situation in more detail. In both cases the user receives an error report with type = KeyError which they can then deal with accordingly. Since the KeyError that Python raises automatically already contains all information the user needs to solve the issue, there is not much benefit in adding explicit code to check the key and raise an error.

Error Message

An error report in HQS Tasks includes a message which is intended to explain the situation to the (human) user. When a task raises an exception, the exception's string representation is used as the error message.

The built-in exception classes in Python provide a message argument as their first (and usually only) argument in their constructor. This message is returned when attempting to convert the exception object to a string.

>>> str(RuntimeError("Something bad happened."))
'Something bad happened.'

We recommend that custom exception classes used in HQS Tasks also have such a message argument which is forwarded to the constructor of the base class. This is the default behavior if you merely extend a built-in exception class without adding any constructor:

class MyErrorType(RuntimeError):
    pass

>>> str(MyErrorType("Something bad happened."))
'Something bad happened.'

You can also define the message in the class constructor, when there is only one "explanation" for each case where this error is raised:

class MyErrorType(RuntimeError):
    def __init__(self) -> None:
        super().__init__("Something bad happened.")

>>> str(MyErrorType())
'Something bad happened.'

Error Details (Arguments & Stacktrace)

An error report in HQS Tasks includes a details dictionary which is intended to give technical, machine-processable details.

When you want an exception to carry more information, for example when the error arose from a parameter having a particular invalid value, you can add that information to the error report. It is not recommended to add this information (only) to the error message, as the message is not intended to be machine-processed: parsing such a string is prone to failure, in particular when the message layout changes.

In Python, we populate that with the exception's arguments as well as the stacktrace. Other (custom) details are not supported currently in the Python language interface.

Exception's Arguments

The arguments to the constructor of a built-in exception class are exposed in the details dictionary under the args key. We extract these by accessing the args property on the exception value. The only requirement for this is that the exception class derives from BaseException, the base of all Python built-in errors. To add custom items, simply pass additional values to the constructor call:

class MyErrorType(RuntimeError):
    def __init__(self, some_value: float, number_of_whatever: int) -> None:
        super().__init__("Something bad happened.", some_value, number_of_whatever)

>>> MyErrorType(3.141, 42).args
('Something bad happened.', 3.141, 42)

Exception's Stacktrace

We furthermore provide the stacktrace of the exception in the details dictionary under the key stacktrace. As with the arguments, this requires the class to derive from BaseException. However, the stacktrace might get stripped in production environments targeting end-users, as it exposes the task's source code.

Custom Details

Note that there is currently no way to add entries at the root level of the details dictionary. This might be added in the future, and we would then probably encourage developers to use it, as the args wrapper was just a workaround in the early prototype of the Python language interface.
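Putting the pieces together, an error report for the MyErrorType example above might look roughly as follows. Only the type, message, and details entries (with args and stacktrace) reflect what this section describes; the exact top-level layout is an assumption for illustration:

```python
# Illustrative shape only: the fields below mirror what this section
# describes, but the exact top-level layout is not specified here.
report = {
    "type": "MyErrorType",                 # the exception's class name
    "message": "Something bad happened.",  # str(exception)
    "details": {
        "args": ["Something bad happened.", 3.141, 42],          # exception.args
        "stacktrace": "Traceback (most recent call last): ...",  # may be stripped
    },
}
```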

File API

A task can write (for outputs) or read (for inputs) binary files.

Background

Note that in the JSON representation of inputs and outputs, we do not want to encode the binary file content, even though that would be possible in theory. The idea of the File API is to only transmit references to binary files in these documents.

The most natural representation of a reference to a binary file would be a simple file path. Of course, it is possible to just pass such a file path in the form of a string in the input/output of a task. However, keep in mind that in the typical use-case of HQS Tasks, the task does not execute on the machine where the client is running, but on some remote machine. Likewise, when invoking a follow-up task which shall read a file written by a previous task, just passing the path of a local file does not work, since tasks are executed in isolated, containerized environments and therefore generally do not share a file system.

So when it comes to exchanging data via files, it is important that the file itself is transmitted as well, not just its path. HQS Tasks does exactly that automatically for you when you tell it that a string is a file path. This is triggered by wrapping the file path in a JSON object with the special key $file. To represent such a JSON document, hqs_task provides a dedicated FileRef type.

Writing Files (Output)

When a task has written a file as part of its implementation, it can tell HQS Tasks that this file is meant to be made available to the caller by including a FileRef value in the return value (usually by adding a field of type FileRef to a model which is the return type or part of it).

Simple example, where the whole output is just a FileRef:

from hqs_task import task
from hqs_task.types.fileref import FileRef

@task
def write_file() -> FileRef:
    local_filename = "file.txt"
    with open(local_filename, "wt") as f:
        f.write("Hello world!")

    return FileRef(local_filename)

Reading Files (Input)

When a task wants to receive a file as part of its input, it can tell HQS Tasks that by including a FileRef value in (one of) the arguments (by using FileRef as the argument's type or by adding a field of type FileRef to a model used in the argument type).

Simple example, where the whole input (since the function is just taking a single argument) is just a FileRef:

from hqs_task import task
from hqs_task.types.fileref import FileRef

@task
def read_file(file: FileRef) -> None:
    with open(file.filename, "rt") as f:
        print(f.read())

Provisioning Defaults

When running a task, the user may specify options for provisioning the hardware on which the task will run.

However, for each task, the developer can specify default options, which apply when the user does not specify any.

If none of them are set (neither by the task nor by the user), the HQS Tasks environment falls back to default values.

The following parameters, explained in the sections below, can be used to specify hardware requirements, which will determine on which hardware the task will run:

  • vcpu: The number of (virtual) CPU shares (a.k.a. threads)
  • memory: The amount of memory in MiB
  • gpu: The number of GPUs (experimental)

Additionally, the following parameter specifies limits beyond the hardware itself:

  • timeout: The maximum time allowed for the execution, see here for its definition

CPU and Memory

The parameter vcpu defines how many virtual CPU threads (usually twice the number of virtual CPU cores) the task shall be assigned. Depending on the execution model, this either limits the number of CPU threads visible to the task process (even at the kernel level), throttles the CPU usage, or may even do nothing.

The parameter memory specifies how much memory in MiB is made available to the (process or container which runs the) task. Similar to the vcpu parameter, the exact meaning depends on the environment in which the task is executed. But a developer can expect that the task is allowed to allocate the specified amount of memory, and that when attempting to allocate more, it may get killed, causing the execution to fail.

To specify the default values for these parameters, pass an instance of the HardwareOptions class to the hardware parameter of the task decorator, as visualized in the following example:

from hqs_task import task
from hqs_core_models.provisioning_options import HardwareOptions

@task(hardware=HardwareOptions(vcpu=4))
def example_with_vcpu_4() -> None:
    ...

@task(hardware=HardwareOptions(memory=4096))
def example_with_memory_4gib() -> None:
    ...

Of course, multiple options can be combined in the constructor of HardwareOptions:

@task(hardware=HardwareOptions(vcpu=4, memory=4096))
def example_with_vcpu_4_and_memory_4gib() -> None:
    ...

GPU (Experimental)

The parameter gpu defines how many GPUs are made available to the task. However, there is currently no way to specify what kind of GPU (vendor, generation, number of cores, amount of GPU memory, etc.) the task requests.

Note: GPU support is still experimental and only provided by some HQS Tasks environments.

from hqs_task import task
from hqs_core_models.provisioning_options import HardwareOptions

@task(hardware=HardwareOptions(gpu=1))
def example_with_gpu() -> None:
    ...

Timeout

The parameter timeout can be used to overwrite the environment's default task timeout.

This parameter is passed directly to the task decorator in the form of a timedelta object from the datetime library. The timedelta constructor has several keyword arguments to specify the timeout in a human-readable way:

from hqs_task import task
from datetime import timedelta

@task(timeout=timedelta(hours=24))
def example_with_timeout_24h() -> None:
    ...

Note that it is currently not possible to simply provide the timeout in seconds or specify that the task shall not have any timeout at all.

Metrics & Profiling

To help the developer determine the appropriate hardware requirements and track down performance bottlenecks, HQS Tasks has some built-in features which expose metrics and profiling data to the client. In particular, the metrics are stored in the execution_meta property of the task response object.

Note that the availability of metrics and profiling data depends on the environment on which the task is being executed.

Metrics

HQS Tasks tries to collect the following information for the task execution:

  • duration: The time the task executed (wall time)
  • cpu_avg: The CPU usage (ratio of CPU time over wall time)
  • memory_peak: The memory usage (peak memory allocation)

Note that these numbers are measured at the scope of the task's process or even the container, depending on the environment on which the task is being executed.

Profiling Data

Note: Currently, this is only available for fast-running tasks as a feature preview. In the future, it will be extended to allow the developer to profile sections of the task, as well as being made available to other execution environments.

The amount of time which was spent in the following phases of the task execution are measured and reported to the client:

  • service_receive_request: Receiving input data (Note: Only the backend-internal data transmission is measured. Will be improved in the future.)
  • task_input_deserialization: Validating (deserializing) the input data from JSON to form the task function's argument values
  • task_handler: Invoking the Python function which implements the task (i.e. the @task-decorated function)
  • task_output_serialization: Serializing the returned value (output) into JSON

Task CLI

At the lower level, a task in the HQS Tasks system is defined as a CLI program. This enables a task developer to implement it using their favorite programming language and toolset. The Python interface already provides a convenient method to automatically create the task CLI.

The other components in the HQS Tasks system will then execute the task by merely invoking the CLI according to this documentation. Also, in some cases it helps developers and testers to be able to invoke tasks by hand using the CLI.

In this section we describe what the interface for such a CLI program shall look like in order to be a valid task to be used in HQS Tasks.

Note that it is allowed to implement multiple tasks in the same CLI program.

The following documentation assumes that the CLI program is called my_task_cli.

Exposing all task definitions

A valid task CLI program shall dump the definitions of all implemented tasks to stdout when being invoked using the command line

my_task_cli --dump-task-definitions

and dump the same information into a file (here: tasks.json) when being invoked as

my_task_cli --dump-task-definitions tasks.json

The dumped file content shall be a JSON array of task definitions which we also refer to as a task registry.

Exposing a single task definition

A valid task CLI program can implement one or many (or zero) tasks. For each such task, the task definition as dumped using the method above contains the command line to be used to invoke that specific task.

Usually (but not necessarily) these command lines are composed as the CLI program name and a "command" argument, such as my_task_cli first_task etc. In this documentation, when showing examples, we assume this principle applies.

Note that it is valid (e.g., when the CLI only implements a single task) that the command line of a task contains just the CLI program name.

Then, the task will dump its task definition to stdout when being invoked as

my_task_cli first_task --dump-task-definition

and dump the same information into a file (here: task.json) when being invoked as

my_task_cli first_task --dump-task-definition task.json

The dumped file content shall be the JSON-serialized task definition.

Invoking a task

In order to invoke a task, three (in some cases two) options with filename arguments need to be added to its command line. These are:

  • the input filename pointing to an existing JSON file (optional)
  • the output filename where the task shall write the result as a JSON document on success
  • the error filename where the task shall write error reports as a JSON document on failure

Note that these arguments are to be specified after option names, leading to 6 (or 4 if no input is passed) additional arguments to the command line.

How the option names are called is specified in the task definition for all three cases. In this example we assume they are called --input, --output, and --error respectively, but in reality they can be called anything (and dashes are optional).

Note that, while it might feel "intuitive," a task will never be invoked without these arguments under the assumption that the input is provided on stdin, the output dumped to stdout, and the error to stderr. On the other hand, supporting such behavior is not forbidden either for a CLI program to be considered a valid task CLI.

Assuming the above, a typical command line to invoke a task then looks like:

my_task_cli first_task --input input.json --output output.json --error error.json

A task CLI may assume that the given input document adheres to the input schema as described in the task definition. When this is not the case, its behavior is undefined, i.e., it is allowed to still run (and produce any result), to crash, or to produce a proper error report. Of course, the latter may be preferable when the task is invoked by hand using its CLI, for example when testing it. But for performance reasons, validation checks may be skipped; the program is still considered a valid task CLI.

Exit cases

Successful

When a task succeeded, it shall write the result in the specified output file in the form of a JSON document adhering to the schema as described in the task definition. It shall not write the error file, not even an empty one or an empty array.

The exit code does not matter technically, but should be zero to stick to best practices.

Failures

When a task fails, for whatever reason, it shall whenever possible write one or more error reports to the specified error file in the form of a JSON document containing an array of error reports. It shall not write the output file, not even an empty one.

Abnormal termination

It is also valid for a task to fail and not to produce a valid error document, such as to simply crash. In these cases, HQS Tasks implicitly "generates" an error report to be handed to the caller which explains this situation.

Depending on the environment in which the task is being executed, this may include cases where the program exceeds the resources which have been provisioned for it (i.e., out of memory, timeout, etc.). That means, a task implementation does not need to take any extra steps to deal with these cases.

Schemas (Input / Output)

Every task specifies a schema for its input, and a schema for its output. These define valid documents to be passed as the input, and what the caller can expect for the output document to look like, respectively.

These schemas are specified using JSON Schema.

Valid JSON Schema Versions

Since JSON Schema exists in several versions (sometimes called "drafts"), we need to be specific about which versions are allowed.

In the current implementation, schemas can be written in the JSON Schema versions draft-07 or in 2020-12, with the latter being strongly recommended.
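For illustration, a minimal input schema written in the recommended 2020-12 version could look as follows (a hypothetical example, not taken from a real task):

```json
{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "type": "object",
  "properties": {
    "name": { "type": "string" },
    "repetitions": { "type": "integer", "minimum": 1 }
  },
  "required": ["name"]
}
```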

Extension: "Type Sources"

To simplify the tools in the HQS Tasks system, we allow schemas (either root schemas or subschemas which specify parts of the input/output documents) to have an additional key source which annotates where the type corresponding to the schema is implemented.

This is particularly used in the Python interface implementation: When writing your tasks in that framework, the schemas for these tasks carry a source annotation for each class (Pydantic model, enum type, etc.) used in the type annotation of an input or output. This allows the generated client package to simply refer to those types (by generating import statements) instead of having to re-generate Python code implementing them. This is especially helpful if the type was defined outside of the task implementation (e.g., when using a class from an external package).

Question

Are there restrictions on externally defined types for the source key? I.e., for the Python interface such types should still be compatible with pydantic.TypeAdapter, correct?

This additional annotation is still valid in JSON Schema and usually ignored by other tools which process JSON Schema.
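As a sketch, a schema carrying such an annotation could look as follows; the exact format of the source value (written here as a dotted Python import path) is an assumption for illustration:

```json
{
  "$defs": {
    "MyModel": {
      "type": "object",
      "properties": { "name": { "type": "string" } },
      "source": "my_package.models.MyModel"
    }
  },
  "$ref": "#/$defs/MyModel"
}
```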

Error Reporting

As with every software program, a task may encounter an error case. We distinguish between different kinds of errors:

  1. Invalid input: The task can not run with the given input. We furthermore distinguish between the situations:
    • Input is not a valid instance of the input schema (a.k.a. "validation error")
    • Input is a valid instance of the input schema, i.e., the schema did not rule this input out, yet the task still cannot run with it. This is a very reasonable scenario, as it is infeasible to reflect every corner case (of what is valid and what is not) in the schema. An example is a requirement that the product of two numbers in the input must not exceed a specific value. We call this a failed "extended validation".
  2. (In principle valid but) Unprocessable input: The task is designed to handle the given input in principle, and also attempted to process it, but ran into the situation that it can't process it further. We furthermore distinguish:
    • The task detects the situation and writes an error report. This case also covers raised exceptions (which are translated to error reports, see below) in situations the software developer did not fully design the task for, as well as most "simple bugs".
    • The task crashes without writing an error report.
  3. (In principle valid and processable but) Unsolvable input: The task can process the given input in principle, but ran into the situation that a final solution could not be found. Depending on the subject, this might not be considered an error but rather a "negative outcome", an example being an iterative algorithm that does not converge to a stable solution after many iterations.

Beyond these scenarios, the overall HQS Tasks system can run into further exceptional cases which we do not discuss here; an example is a client trying to invoke a task that does not exist.

Validation Errors

t.b.d.

Error Reports (a.k.a. Exceptions)

Before diving deeper, let us first clarify that on the CLI level of a task we talk about "error reports", while in most programming languages these correspond to "exceptions": In HQS Tasks, we define an error report independent of a programming language, and then map these to exceptions in the programming languages being used.

An error report is a JSON document which describes what went wrong. A task may terminate by writing an error file which contains a list of such reports. Usually there is only one error, as a typical task implementation terminates once such a situation is encountered. But HQS Tasks is designed to be flexible enough to also process multiple errors generated by a task, such as when a validation step finds multiple issues, or when errors from multiple sub-tasks are forwarded.

Each error report is a JSON object which has the following fields:

  • type (string): The name of the error category, like a primary "classification". For now, we do not define a fixed set of types; the developer can choose these freely. Usually, they correspond to exception class names in the respective programming language. While we currently do not enforce it, we recommend simple terms written in upper camel case, such as SystemTooLarge.
  • message (string): A message in human-readable form. It is never meant to be machine-read: we explicitly tell developers not to include information exclusively in the message (which would force the client to parse this string). Machine-readable information shall be put in the details field, see below.
  • details (object): A JSON object which collects any additional information, particularly all details relevant for processing this error in a surrounding logic. The object is not further specified here. While we also do not strictly enforce it, we recommend developers keep the structure simple and stable between versions, and use the same structure for all errors of the same type. The details object generated by our Python implementation includes the exception's arguments and a stacktrace.
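Putting the fields together, a task might write its error file like this; the concrete type, message, and details content are illustrative, not a prescribed format:

```python
import json

# A single error report with the three fields described above.
# The concrete content of "details" is illustrative only.
report = {
    "type": "SystemTooLarge",
    "message": "The system has 120 sites, but at most 64 are supported.",
    "details": {"num_sites": 120, "max_sites": 64},
}

# The error file contains an array of such reports.
with open("error.json", "w") as f:
    json.dump([report], f, indent=2)
```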

Crashes (a.k.a. Abnormal Termination)

When a task CLI exits having written neither a proper output nor an error report, we assume the task crashed.

The system which processes the response of the task CLI solves this situation by automatically generating an error report which explains this situation to the client.

Task Definition

In HQS Tasks, each task is defined by a so-called task definition. This is a JSON representation of

  • the name and version (the "identity") of the task
  • some description for the user
  • the specification of input and output schemas
  • information of how the task can be invoked via the CLI (the CLI command line and name of arguments to pass files)
  • the default provisioning options

The following is an example of how such a task definition can look like:

{
  "name": "hello",
  "description": "Some famous hello world example!",
  "version": "0.5.0",
  "input": {
    "file_argument": "--input-file",
    "json_schema": {
      "type": "string"
    }
  },
  "output": {
    "file_argument": "--output-file",
    "json_schema": {
      "type": "string"
    }
  },
  "error": {
    "file_argument": "--error-file"
  },
  "command": [
    "hqs_task_example",
    "hello"
  ],
  "provisioning_defaults": {}
}
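To connect the definition to the CLI conventions described earlier, the following sketch shows how a runner could assemble the invocation command line from these fields. This is our illustration of the mapping, not HQS's actual runner code; the file names are placeholders:

```python
import json

# The example task definition from above (abbreviated to the relevant fields).
definition = json.loads("""
{
  "name": "hello",
  "version": "0.5.0",
  "input": {"file_argument": "--input-file", "json_schema": {"type": "string"}},
  "output": {"file_argument": "--output-file", "json_schema": {"type": "string"}},
  "error": {"file_argument": "--error-file"},
  "command": ["hqs_task_example", "hello"],
  "provisioning_defaults": {}
}
""")

# The command line is the "command" array followed by the three file arguments.
cmd = definition["command"] + [
    definition["input"]["file_argument"], "input.json",
    definition["output"]["file_argument"], "output.json",
    definition["error"]["file_argument"], "error.json",
]
print(" ".join(cmd))
# → hqs_task_example hello --input-file input.json --output-file output.json --error-file error.json
```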

It originates from the following task code written in Python:

from hqs_task import task

@task
def hello(name: str) -> str:
    """Some famous hello world example!"""
    return f"Hello {name}!"

Furthermore, the above definition shows us that the Python code can be invoked by the CLI script name hqs_task_example. That has been configured in the Python project file (pyproject.toml) using the section

[project.scripts]
hqs_task_example = "hqs_task_example:cli"

and by including the following line in the root module's __init__.py file:

from hqs_task import cli

Task Registry

Basically, the so-called task registry is just a collection of one or more task definitions. Speaking in terms of JSON, the task registry JSON file contains an array of task definition documents.

This file is then used for all follow-up steps of preparing an environment for running these tasks, such as generating the client package, containerization as well as deployment of tasks to a target environment.

Dump the Task Registry of a CLI

The task registry describes the set of tasks a CLI program (or multiple CLI programs) is able to execute. As described in the Task CLI, each such program shall be able to export this document in order to "describe itself":

my_task_cli --dump-task-definitions

Combining multiple Task Registries

It is possible to merge the registry files of multiple CLI programs by using an external program such as jq to concatenate the arrays of the two JSON files. Suppose you exported the task registries of two CLIs to the files a.json and b.json; then run the following command to concatenate them into c.json:

jq -s 'map(.[])' a.json b.json > c.json
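If jq is not available, the same concatenation can be done with a few lines of Python (an equivalent sketch using only the standard library):

```python
import json


def merge_registries(*paths):
    """Concatenate the task definition arrays of several registry files."""
    merged = []
    for path in paths:
        with open(path) as f:
            merged.extend(json.load(f))
    return merged


# Equivalent to: jq -s 'map(.[])' a.json b.json > c.json
# with open("c.json", "w") as f:
#     json.dump(merge_registries("a.json", "b.json"), f)
```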

(Generating the) Task Client Package

The idea of HQS Tasks is to run a task by simply calling a Python function (which we call client function) in a regular Python script (which we call user script). This function, however, is isolated from the actual task implementation and does not run it directly; instead, it serves as a proxy for it. Technically speaking, the client function is just a wrapper (interface) which tells the task execution backend that a task shall be invoked, waits until the backend reports that the task finished executing, and finally returns the result.

This "bigger picture" is also described in the HQS Task User Documentation, specifically in the Architecture section.

What are Client Functions?

The client functions for tasks are auto-generated by a tool provided by HQS. For each task, an individual client function is generated. To give you a better idea, the following (simplified) code and explanation of these generated client functions are provided, corresponding to two hypothetical tasks:

from hqs_tasks_execution import execute

from hqs_task_example import InputModel, OutputModel

async def hello(message: str) -> str:
    return await execute("hello", "1.2.3", message, str)

async def other_task(input: InputModel) -> OutputModel:
    return await execute("other_task", "1.2.3", input, OutputModel)

We observe:

  • The shape of the body of the client functions is always the same: a general execute function is invoked. This function is implemented in the general client package hqs_tasks_execution (completely independent of any concrete task).
  • The task name is passed as a string to that function. The client function's name matches the task name.
  • Furthermore, the version is supplied (here: 1.2.3). This corresponds to the task version.
  • Then, the input (argument) is passed, i.e. forwarded exactly as it was provided by the caller.
  • Not visualized here, but easily imaginable: The input (argument) and output (return) types match the ones in the task definition. These types are imported from that package (where the tasks have been defined).
  • The output (return) type is passed as an argument to the general execute function. The reason for this is that after the task returns a result, we create an instance of that type class, not only to enforce type-safety but also to return an instance of the correct class at runtime, i.e., supporting isinstance checks instead of mere duck-typing.
  • The client function is asynchronous. This is due to the general client's internal logic which waits for the task to be completed. This shall not block other code in case the client script also processes other things or runs multiple tasks concurrently.

Note that some (less relevant) technical details are left unmentioned here so as not to confuse the reader.
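To illustrate how such a client function is used from a user script, the following self-contained sketch replaces the real execute function from hqs_tasks_execution with a local stub, so it can run standalone; the stub's fake result stands in for the backend's response:

```python
import asyncio


# Stub standing in for hqs_tasks_execution.execute; the real function
# submits the task to the backend and awaits its result.
async def execute(name, version, input_data, output_type):
    return output_type(f"Hello {input_data}!")  # fake result for this sketch


# A generated client function (same shape as shown above).
async def hello(message: str) -> str:
    return await execute("hello", "1.2.3", message, str)


# A user script simply awaits the client function.
result = asyncio.run(hello("World"))
print(result)  # → Hello World!
```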

What is the Client Package?

Now, the client package is not much more than just several of these (auto-generated) client functions. The code generator takes a task registry file and generates a Python package. In its current implementation, for each task a separate module is created, containing the client function, required import statements as well as, in some cases, generated Python code which defines Pydantic models.

The latter happens for all (sub-)schemas of the input and output for which it is not possible to just import an existing model from a Python package, which is the case when one of the following is true:

  • The schema was not generated from code written in Python but from code in a different language (or without our provided task decorator).
  • For tasks which have multiple arguments, these will be wrapped in a tuple for which a class is generated mostly for internal reasons.
  • For types which are not supported by Pydantic out of the box, a generated code block will define a helper type which merely annotates the actual type with some information relevant for Pydantic to support validation and serialization.
  • The generator can be explicitly told to not import a specific package and instead, when the task uses models from that package, generate code for these. Note that some features, such as constraints or extra logic, will get "lost in translation", so this should be used with caution. See also below under Adding and Black-Listing Sources.

How to Generate the Client Package

The generator is a separate tool installable via the Python package hqs_tasks_generator using hqstage:

hqstage install hqs_tasks_generator

Suppose you have a task registry file under the path tasks.json. Then you can generate the corresponding client package using the command line

hqs-tasks-generate --registry-file tasks.json \
    --python-package-name my_tasks_client

In this case, the generated Python package will be named my_tasks_client and will be located under the path ./generated/python/. The latter can be customized using the option --target. The package specifies its version explicitly. Per default, this is taken from the version of the tasks (which is only possible if all tasks have the same version). Otherwise, it can be explicitly specified using --python-package-version. For more options, read the usage by executing hqs-tasks-generate --help.

Adding and Black-Listing Sources

Normally, when using our task decorator, any model found in the signature of a task function (argument and return types) will be marked with a "source" in the generated task definition and registry. Hence, when using the generator to generate the client package, these types will be imported from the source package in which they have been defined. This also has the consequence that the source package will be a dependency of the generated client package.

There might be cases where this is not wanted, most notably when the source package is not public. In that case, we want to black-list this package before running the client package code generator.

There might also be the opposite case: a source could not be determined or simply is not annotated in the task registry file, for example when the task was not implemented using our Python framework with the task decorator, but a Python package exists which defines a model class compatible with the JSON schema. In this case, we can add that information to the task registry prior to running the code generator.

For both cases there is a supplementary CLI tool, and both tools work the same way: you pass the original registry file, say tasks.json, a target file name, say tasks_modified.json, as well as a list of package names for which sources shall be added to (or removed from) the task definitions.

hqs-tasks-add-sources tasks.json tasks_modified.json \
    --add-package package_which_defines_models
hqs-tasks-blacklist-sources tasks.json tasks_modified.json \
    --remove-package package_which_defines_models

Note that in the first case, when adding source information, models are identified by matching the class name against the location of the (referenced) sub-schema within the root JSON schema (requiring it to be located under a path like /$defs/MyModel). This method may be changed using the --identify-by switch, accepting one of the following values: name (default), title, title_type, title_type_properties, title_description. The identification then happens by the mentioned fields in the JSON schema.

Containerization of Tasks

In order to let the HQS Tasks backend execute a task, the task implementation needs to be deployed on some target environment.

When to Containerize and When To Not

Depending on which backend, i.e., environment type, is chosen, the task may first need to be containerized. Simply speaking, containerization is a way of packaging software together with its dependencies so it can be executed (almost) anywhere; most notably, in the case of Python software, the Python stack is included in the container.

Not all backends supported by HQS Tasks need containers to run the software. Those that do not, however, require other means to ensure that the requirements, i.e., the Python stack and the dependent packages, are installed on the execution target environment.

At the time of writing, the only backend which uses containers is the REST backend, which utilizes AWS Batch for executing tasks. This AWS service is based on Docker containers, hence we need to build one.

Prerequisites

You can ship several tasks, accessible by one or several task CLIs, in the same container. For the following, we assume that you already have a single task registry file which describes all tasks to be containerized, and you know what dependencies these tasks require.

In order to build a container, you need to have docker installed. Alternatives exist, but are not mentioned in this documentation.

To include your task implementation in the Docker container, you have essentially two options:

  1. Make it available as an installable Python package via some Python package index.
  2. Make the source code available to the build context.

We'll describe both methods below.

Choice of Base Container

When building a docker container, we start with a so-called base container and add our software and its dependencies to it. To keep things simple, we start off with an official Python base container, such as python:3.13. Again, alternatives exist, but are not documented here.

Writing the Dockerfile

The so-called "Dockerfile" is the recipe for building a container; it lists the steps to be performed by the Docker build system in order to create the Docker image.

Without much detailed explanation, the following Dockerfile shall serve as an example which can easily be adapted to other task projects. It shows both installation options; keep only the one that applies to your situation.

# We build on top of the following base container.
FROM python:3.13

# Option 1: Install everything via pip.
RUN pip install hqs_task_example

# Option 2: Install from source code.
COPY ./ ./
RUN pip install ./

For option 2, we assume the Dockerfile is stored in the Python package root folder (from which the package is installable). Adapt the paths if this assumption does not match your situation.

Building the Container

To build the container, simply run the following command in the folder containing the Dockerfile. The name after -t will be the name of your Docker container image, the so-called "tag".

docker build . -t hqs_task_example

Testing the Container

After building the container, you can run it with

docker run hqs_task_example

Deployment

The deployment of a task inherently depends on the backend on which it is supposed to be executed.

Note the additional general notes below, independent of the backend.

Backend: Local

For a task to be executable locally, nothing special needs to be done other than installing it in the current Python environment. This is the recommended setup when actively developing a task; usually, you want to install the package in editable mode.

For example, suppose you develop your task in a project folder my_task, run the following command:

pip install -e ./my_task

Then, generate the client package and install it too - here we again recommend the editable mode:

pip install -e ./generated/python

Backend: REST

Deploying tasks to the REST backend (the HQS cloud) is currently only possible by HQS developers.

If you are an HQS developer, deployment to REST is automatically done when using the CI/CD template.

Backend: Slurm (direct)

The deployment process for this backend further depends on how you set up and use Python environments.

For example, if you are using conda / micromamba (like shown in the example configuration in the user documentation), install the corresponding task package there.

General Notes

Independent of the backend, please note the following.

Version Matching

The client and the task itself need to be installed in the same version.

General Requirements in Client Environment (for First Deployment)

In addition (and possibly prior) to the above, you need to install hqs-tasks-execution in the client environment at least once, using hqstage:

hqstage install hqs-tasks-execution

Note that this needs to be done in the same environment from which your tasks are going to be submitted (the client script's environment).

General Requirements in Target Environment (for First Deployment)

In addition (and possibly prior) to the above, you need to install hqs-task-execute in the target environment at least once, using hqstage there:

hqstage install hqs-task-execute

Note that this needs to be done in the same environment in which your tasks are going to be executed (i.e., for Slurm: on the Slurm nodes, not in the environment you use locally to submit tasks). You may first need to install hqstage there, too.

Then, also at least once, install the additional requirements, which can be done using the provided helper program:

hqs-task-execute-install-requirements