draco.core.task
An improved base task implementing easy (and explicit) saving of outputs.
Functions
- group_tasks(*tasks): Create a Task that groups a bunch of tasks together.
Classes
- Delete: Delete pipeline products to free memory.
- LoggedTask: A task with logger support.
- MPILogFilter: Filter log entries by MPI rank.
- MPILoggedTask: A task base that has MPI aware logging.
- MPITask: Base class for MPI using tasks.
- ReturnFirstInputOnFinish: Workaround for caput.pipeline issues.
- ReturnLastInputOnFinish: Workaround for caput.pipeline issues.
- SetMPILogging: A task used to configure MPI aware logging.
- SingleTask: Process a task with at most one input and output.
- class draco.core.task.Delete[source]
Bases: SingleTask
Delete pipeline products to free memory.
Initialize pipeline task.
May be overridden with no arguments. Will be called after any config.Property attributes are set and after ‘input’ and ‘requires’ keys are set up.
- class draco.core.task.LoggedTask[source]
Bases: TaskBase
A task with logger support.
Initialize pipeline task.
May be overridden with no arguments. Will be called after any config.Property attributes are set and after ‘input’ and ‘requires’ keys are set up.
- property log
The logger object for this task.
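A minimal sketch of using this property, with a hypothetical subclass (PrintInput) and process body:
>>> from draco.core.task import LoggedTask
>>> class PrintInput(LoggedTask):
...     def process(self, data):
...         # Messages are routed through the task's own logger
...         self.log.info("received input of type %s", type(data).__name__)
...         return data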
- class draco.core.task.MPILogFilter(add_mpi_info=True, level_rank0=20, level_all=30)[source]
Bases: Filter
Filter log entries by MPI rank.
This will also optionally add MPI rank information and an elapsed time entry to each log record.
- Parameters:
add_mpi_info (boolean, optional) – Add MPI rank/size info to log records that don’t already have it.
level_rank0 (int) – Log level for messages from rank=0.
level_all (int) – Log level for messages from all other ranks.
Initialize a filter.
Initialize with the name of the logger which, together with its children, will have its events allowed through the filter. If no name is specified, allow every event.
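For illustration, the filter can be attached to a standard library logging handler. This sketch is not drawn from draco itself; it passes INFO and above from rank 0 but only WARNING and above from other ranks, matching the defaults:
>>> import logging
>>> from draco.core.task import MPILogFilter
>>> handler = logging.StreamHandler()
>>> handler.addFilter(MPILogFilter(level_rank0=logging.INFO, level_all=logging.WARNING))
>>> logging.getLogger().addHandler(handler)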
- class draco.core.task.MPILoggedTask[source]
Bases: MPITask, LoggedTask
A task base that has MPI aware logging.
Initialize pipeline task.
May be overridden with no arguments. Will be called after any config.Property attributes are set and after ‘input’ and ‘requires’ keys are set up.
- class draco.core.task.MPITask[source]
Bases: TaskBase
Base class for MPI using tasks.
Just ensures that the task gets a comm attribute.
Initialize pipeline task.
May be overridden with no arguments. Will be called after any config.Property attributes are set and after ‘input’ and ‘requires’ keys are set up.
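A minimal sketch, assuming an MPI environment; RankReport is a hypothetical subclass using the comm attribute:
>>> from draco.core.task import MPITask
>>> class RankReport(MPITask):
...     def setup(self):
...         # comm is the MPI communicator attached by MPITask
...         print(f"rank {self.comm.rank} of {self.comm.size}")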
- class draco.core.task.ReturnFirstInputOnFinish[source]
Bases: SingleTask
Workaround for caput.pipeline issues.
This caches its input on the first call to process and then returns it for a finish call.
Initialize pipeline task.
May be overridden with no arguments. Will be called after any config.Property attributes are set and after ‘input’ and ‘requires’ keys are set up.
- class draco.core.task.ReturnLastInputOnFinish[source]
Bases: SingleTask
Workaround for caput.pipeline issues.
This caches its input on every call to process and then returns the last one for a finish call.
Initialize pipeline task.
May be overridden with no arguments. Will be called after any config.Property attributes are set and after ‘input’ and ‘requires’ keys are set up.
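A rough sketch of the contrast between the two workarounds, driving the task directly rather than through the pipeline runner (purely illustrative; strings stand in for pipeline products):
>>> from draco.core.task import ReturnLastInputOnFinish
>>> task = ReturnLastInputOnFinish()
>>> for item in ("a", "b", "c"):
...     task.process(item)  # caches the input; produces no output
>>> task.finish()  # ReturnFirstInputOnFinish would return 'a' here
'c'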
- class draco.core.task.SetMPILogging[source]
Bases: TaskBase
A task used to configure MPI aware logging.
- level_rank0, level_all
Log levels for rank=0 and for all other ranks, respectively.
- Type:
int or str
Initialize pipeline task.
May be overridden with no arguments. Will be called after any config.Property attributes are set and after ‘input’ and ‘requires’ keys are set up.
- class draco.core.task.SingleTask[source]
Bases: MPILoggedTask, BasicContMixin
Process a task with at most one input and output.
Both input and output are expected to be memh5.BasicCont objects. This class allows writing of the output when requested.
Tasks inheriting from this class should override process() and optionally setup() or finish(). They should not override next().
If the value of input_root is anything other than the string “None” then the input will be read (using read_input()) from the file self.input_root + self.input_filename. If the input is specified both as a filename and as a product key in the pipeline configuration, an error will be raised upon initialization.
If the value of output_root is anything other than the string “None” then the output will be written (using write_output()) to the file self.output_root + self.output_filename. A minimal subclass sketch is given at the end of this entry.
- save
Whether to save the output to disk or not. Can be provided as a list if multiple outputs are being handled. Default is False.
- Type:
list | bool
- attrs
A mapping of attribute names and values to set in the .attrs at the root of the output container. String values will be formatted according to the standard Python .format(…) rules, and can interpolate several other values into the string. These are:
- count: an integer giving the current iteration of the task.
- tag: a string identifier for the output, derived from the container’s tag attribute. If that attribute is not present, count is used instead.
- key: the name of the output key.
- task: the (unqualified) name of the task.
- input_tags: a list of the tags for each input argument for the task.
Any existing attribute in the container can be interpolated by the name of its key. The specific values above will override any attribute with the same name. Incorrectly formatted values will cause an error to be thrown.
- Type:
dict, optional
- tag
Set a format for the tag attached to the output. This is a Python format string which can interpolate the variables listed under attrs above. For example, a tag of “cat{count}” will generate catalogs with the tags “cat1”, “cat2”, etc.
- Type:
str, optional
- output_name
A Python format string used to construct the filename. All variables given under attrs above can be interpolated into the filename. Can be provided as a list if multiple outputs are being handled. Valid identifiers are:
- count: an integer giving the current iteration of the task.
- tag: a string identifier for the output, derived from the container’s tag attribute. If that attribute is not present, count is used instead.
- key: the name of the output key.
- task: the (unqualified) name of the task.
- output_root: the value of the output_root argument. This is deprecated and is just used for legacy support. The default value of output_name means the previous behaviour works.
- Type:
list | string
- compression
Set compression options for each dataset. Provided as a dict with the dataset names as keys and values for chunks, compression, and compression_opts. Any datasets not included in the dict (including if the dict is empty) will use the default parameters set in the dataset spec. If set to False (or anything else that evaluates to False, other than an empty dict), chunks and compression will be disabled for all datasets. If no argument is provided, the default parameters set in the dataset spec are used. Note that this will modify these parameters on the container itself, such that if it is written out again downstream in the pipeline these will be used.
- Type:
dict or bool, optional
- output_root
Pipeline settable parameter giving the first part of the output path. Deprecated in favour of output_name.
- Type:
string
- nan_check
Check the output for NaNs (and infs), logging if they are present.
- Type:
bool
- nan_dump
If NaNs are found, dump the container to disk.
- Type:
bool
- nan_skip
If NaNs are found, don’t pass on the output.
- Type:
bool
- versions
Keys are module names (str) and values are their version strings. This is attached to output metadata.
- Type:
dict
- pipeline_config
Global pipeline configuration. This is attached to output metadata.
- Type:
dict
- Raises:
caput.pipeline.PipelineRuntimeError – If this is used as a base class for a task overriding self.process with variable length or optional arguments.
Initialize pipeline task.
May be overridden with no arguments. Will be called after any config.Property attributes are set and after ‘input’ and ‘requires’ keys are set up.
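As a minimal sketch of the pattern described above, a hypothetical subclass overriding process; save, output_name, and tag would then be set on it in the pipeline configuration to control writing:
>>> from draco.core.task import SingleTask
>>> class Passthrough(SingleTask):
...     """Hypothetical task that forwards its input unchanged."""
...     def process(self, data):
...         # data is expected to be a memh5.BasicCont; the output's
...         # tag and attrs are handled by the SingleTask machinery
...         return data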
- draco.core.task.group_tasks(*tasks)[source]
Create a Task that groups a bunch of tasks together.
This method creates a class that inherits from all the subtasks, and calls each process method in sequence, passing the output of one to the input of the next.
This should be used like:
>>> class SuperTask(group_tasks(SubTask1, SubTask2)):
...     pass
At the moment, if the ensemble has more than one setup method, the SuperTask will need to implement an override that correctly calls each.
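For example, a sketch with two hypothetical subtasks; the grouped class runs Scale.process and then Offset.process on each input (plain numbers stand in for pipeline products purely for illustration):
>>> from draco.core.task import SingleTask, group_tasks
>>> class Scale(SingleTask):
...     def process(self, x):
...         return 2 * x
>>> class Offset(SingleTask):
...     def process(self, x):
...         return x + 1
>>> class ScaleThenOffset(group_tasks(Scale, Offset)):
...     pass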