The (Binary) File API
Under the hood, a task client uses JSON to communicate with the backend. However, since JSON does not encode binary data efficiently, HQS Tasks provides the binary file API for transmitting binary data between tasks.
Briefly explained, every task can write binary files (as outputs) and read binary files (as inputs). (Whether this applies depends on the concrete task we are dealing with; some tasks do not make use of this feature.)
These files are then simply referenced in the JSON input or output by their filename (or URL, as described below). This makes it straightforward for both task implementations and the client to interact with the files: they simply read from (or write to) the files referenced by the JSON input / output.
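For illustration only, the JSON documents involved might look roughly like the following sketch (written as Python dicts). The field names are hypothetical and do not reflect the actual HQS Tasks wire format:

```python
# Hypothetical shapes of JSON documents referencing binary files.
# Field names are illustrative only, not the actual HQS Tasks wire format.
local_result = {
    "result": {"file": "output_0.bin"},  # reference by (local) filename
}

remote_result = {
    # With a remote backend, the filename is replaced by a signed URL.
    "result": {"file": "https://storage.example.com/output_0.bin?signature=..."},
}
```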
Abstract Example
This is best understood with an example.
Let us consider two task definitions:
- read_file: The JSON input contains a reference to a file. The task reads that file. The output is the file content, as a string in a JSON document.
- write_file: The JSON input is a string. The task writes this string to a (local) file. The output is a reference (filename) to the file written.
In fact, these tasks are exposed in the hqs_task_example_client package.
Then, you could write the following client script:
```python
import asyncio

from hqs_task_example_client import read_file, write_file
from hqs_tasks_execution.config import global_config, BackendConfigurationREST

global_config.backend = BackendConfigurationREST()


async def main() -> None:
    # Specify a file content for this example.
    message = "Hello world!"

    # The task which writes the string to a file. The returned value is a
    # FileRef object (from hqs_task.types.fileref).
    file_ref = await write_file(message)

    # The task which reads the file given by the FileRef object returns its
    # content again as a string.
    read_back = await read_file(file_ref)

    # If the above worked as intended, this assertion holds.
    assert read_back == message


asyncio.run(main())
```
Note that it is currently not possible to provide (new) binary data as an input to a task from a user script.
Remote Files
As the true power of HQS Tasks lies in executing tasks on a remote machine (e.g., in the cloud using the REST backend), the question arises how a client script can read the content of a file produced by a task.
For remote backends, a key component is added: the uploading and downloading of files to / from a remote file storage.
If a remote backend is configured and a task returns file references, the corresponding files are automatically uploaded to the remote file storage. The files written by a task are uploaded after the actual task has completed, but before your client "sees" the result, and each local file reference is replaced with a URL representing that file.
For the current implementation in the REST backend (which we might adjust in the future), we add a bit more explanation:
- A file reference in the output of a task points to a "signed S3 URL". This can be downloaded with any tool in the client, e.g., with the requests library in a Python script, or by opening the URL in a web browser (see the sketch after this list). However, this link expires after some time. We do not specify the expiry time concretely here, since it depends on several implementation details; the important fact is that it will expire at some point.
- Any such URL can be used in an input to a follow-up task, as in the example above. You can even use expired URLs for that, as the REST backend automatically "extends" their validity so that the task implementation can access them.
- If you want to access a link from a file reference that has already expired, refresh the output of the task execution by calling the task function again. Due to caching (if enabled), this will not trigger a new execution, but will update the cached result with freshly signed URLs.
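The following is a minimal sketch of downloading a file produced by a task, assuming a REST backend configured as in the example above. Note that the way the signed URL is obtained from the FileRef object (here via a hypothetical `url` attribute) is an assumption for illustration; consult the FileRef documentation for the actual accessor:

```python
import asyncio

import requests  # any HTTP client works; requests is just an example

from hqs_task_example_client import write_file
from hqs_tasks_execution.config import global_config, BackendConfigurationREST

global_config.backend = BackendConfigurationREST()


async def main() -> None:
    # Run the task remotely; with a remote backend, the returned FileRef
    # carries a signed S3 URL instead of a local filename.
    file_ref = await write_file("Hello world!")

    # NOTE: `file_ref.url` is a hypothetical accessor used for illustration;
    # see the FileRef documentation for how to obtain the URL.
    response = requests.get(file_ref.url)
    response.raise_for_status()
    print(response.text)  # prints: Hello world!

    # If the link has expired in the meantime, calling the task function
    # again refreshes the signed URL. With caching enabled, this reuses the
    # cached result instead of triggering a new execution.
    file_ref = await write_file("Hello world!")


asyncio.run(main())
```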