# Workflows

## Primitive components

Workflows are directed acyclic graphs (DAGs) of individual MUSES module executions, or "processes", constructed using composite structures called "components" that consist of three primitive elements:

- **process**: the fundamental unit of a workflow, which executes a single MUSES module
- **group**: a set of components that execute in parallel
- **chain**: a set of components that execute sequentially

## Composition

A workflow specification is a data object with two key-value pairs: `processes` and `components`. Process names and component names share a namespace in which they must be unique.

### Processes

#### Definition

The value of `processes` is an unordered list of process objects defining unique MUSES module configurations. A **process** definition consists of:

- `module`: (string) Label that identifies which MUSES module to execute. The available values are defined in the Calculation Engine configuration file and can also be viewed on [the module list webpage](https://ce.musesframework.io/ce/modules/).
- `name`: (string) Unique label **within the workflow** associated with this particular configuration of the specified module.
- `config`: (object) An object specified by key-value pairs that defines the configuration of the specified module. See the module-specific documentation for details of the config object schema.

Several processes can be defined that execute the same module with distinct configurations. The examples below include workflows in which the Lepton module is invoked several times but configured differently depending on which EoS module precedes it.

#### Inputs

There are three options for providing external input files to a process (aside from the module config), illustrated in the example workflow snippet below. All inputs are specified under an `inputs` or `pipes` mapping, where the keys are named according to the declared label of the target module input.
The schema for the input source spec varies according to the input option used, as explained below.

The first process shows how to input an uploaded file. The `uuid` is a random string assigned by the CE upon upload that uniquely identifies the file. If the upload is not owned by the user submitting the workflow, the owner must set the upload to public. The `checksum` string is the md5sum of the file, which must be known in advance. If the uploaded file does not match the specified checksum, the workflow will fail.

The second process shows how to input a file generated by a previously executed workflow. In general, for this to work the referenced job must be saved so that it is not purged by the periodic garbage collection the CE performs to conserve disk space. Because a job typically generates multiple output files, the `path` string is required to uniquely identify the desired file.

The third process shows how to input a file generated by a previous process in the current workflow. The `module` and `process` values must match the module name and process name of an item in the `processes` list to uniquely identify the process that generated the desired file. The `label` string identifies which declared output of that producing process to use; the mapping key (`eos` in the example) names the input of the consuming process that receives the file.

```yaml
processes:
  - name: crust_dft_eos
    module: crust_dft
    inputs:
      EOS_table:
        type: upload
        uuid: d1ed1c63-6192-4ac9-9cb1-a7d82dc27b72
        checksum: 164575f9d84c3ac087780e0219ee2e8a
    config:
      output_format: CSV
  - name: lepton-crust_dft
    module: lepton
    inputs:
      input_eos:
        type: job
        uuid: 57388fe3-6932-4b45-b1d0-63463cc828ac
        path: /cmf/opt/output/CMF_output_for_Lepton_baryons.csv
    config:
      global:
        use_charge_neutrality: true
  - name: qlimr-crust_dft
    module: qlimr
    pipes:
      eos:
        label: eos_beta_equilibrium
        module: lepton
        process: lepton-crust_dft
    config:
      inputs:
        R_start: 0.0004
```

### Components

The value of `components` is an ordered list of component objects that are specified recursively.
This means that the first component in the list may only reference processes, and subsequent component definitions may reference processes and/or previously defined components. Any component used in the definition of another component is called a subcomponent, and because components can only reference components defined earlier in the list, there can be no circular references. Thus, the last component defined in this list is the entire workflow that is executed; any subcomponent that is defined but not recursively referenced within this top-level component is ignored.

A **component** definition consists of:

- `type`: (string) Either `chain` or `group`.
- `name`: (string) Unique label **within the workflow** referencing this component in subsequent component definitions.
- `sequence`/`group`: (list) A list of process or component names. If the component is type `chain`, then `sequence` is the key and the value is a list of subcomponents to be executed sequentially. If the component is type `group`, then `group` is the key and the value is a list of subcomponents to be executed in parallel.

## Examples

The examples below use YAML format to define workflow configurations, because this format is easy for humans to read while supporting the rigorous syntax required to unambiguously define a data structure suitable for machines. Ultimately the workflow definition must be rendered in JSON format suitable for [the Calculation Engine API](https://ce.musesframework.io/swagger-ui/), but as demonstrated [in the tutorial](./tutorial/Readme.html), this conversion can be done transparently by any number of libraries such as Python `requests`.

### Chain

A chain is a sequence of components that are executed in order, where each component must complete successfully before the next one is processed. Chain components are required when components are causally dependent on one another.
In the example below, the Chiral EFT module pipes its output to the Lepton module, which must run only if the first process completes successfully.

```yaml
processes:
  - name: chiral_eft_eos
    module: chiral_eft
    config:
      run_name: 'test_chiral_eft_lepton'
      chiraleft_parameters:
        fitted_parameter_set: 'n3lo-450'
      calculation_options:
        use_multithreading: true
        use_quadratic_asymmetry_expansion: true
      eos_grid:
        density_start: 0.032
        density_end: 0.32
        density_step: 0.032
        isospin_asymmetry_start: 0.0
        isospin_asymmetry_end: 1.0
        isospin_asymmetry_step: 0.25
  - name: lepton-module
    module: lepton
    config:
      global:
        use_beta_equilibrium: true
        use_charge_neutrality: false
        verbose: 2
      output:
        output_derivatives: true
        output_hdf5: false
      particles:
        use_electron: true
        use_muon: true
    pipes:
      input_eos:
        label: ChEFT_Output_Lepton
        module: chiral_eft
        process: chiral_eft_eos
components:
  - type: chain
    name: workflow
    sequence:
      - chiral_eft_eos
      - lepton-module
```

### Group

A group is a set of components that are allowed to run in parallel. Concurrent execution is not guaranteed, however, because it depends on the Calculation Engine task queue system and dynamic worker load. Parallel here means that the outputs of the components in a group do not causally depend on one another.

In the example below, there are two chains that execute in parallel. One chain outputs the EoS generated by CMF to the Lepton module. The other chain outputs the EoS generated by Chiral EFT to an independent Lepton module process.

Note:

* The names of the Lepton module *processes* must be unique for unambiguous reference when defining components.
* The order of the process definitions in the `processes` block does not matter.
* The order of the chain *components* (`chain1` and `chain2`) in the `group1` definition does not matter.
```yaml
processes:
  - name: chiral_eft_eos
    module: chiral_eft
    config:
      run_name: 'test_chiral_eft_lepton'
      chiraleft_parameters:
        fitted_parameter_set: 'n3lo-450'
      calculation_options:
        use_multithreading: true
        use_quadratic_asymmetry_expansion: true
      eos_grid:
        density_start: 0.032
        density_end: 0.32
        density_step: 0.032
        isospin_asymmetry_start: 0.0
        isospin_asymmetry_end: 1.0
        isospin_asymmetry_step: 0.25
  - name: cmf
    module: cmf_solver
    config:
      variables:
        chemical_optical_potentials:
          muB_begin: 1000.0
          muB_end: 1400.0
          muB_step: 200.0
          muQ_begin: -300.0
          muQ_end: 0.0
          muQ_step: 150.0
      output_options:
        include_output_lepton: true
  - name: lepton1
    module: lepton
    config:
      global:
        use_beta_equilibrium: true
        use_charge_neutrality: false
    pipes:
      input_eos:
        label: ChEFT_Output_Lepton
        module: chiral_eft
        process: chiral_eft_eos
  - name: lepton2
    module: lepton
    config:
      global:
        use_beta_equilibrium: false
    pipes:
      input_eos:
        label: CMF_for_Lepton_baryons_only
        module: cmf_solver
        process: cmf
components:
  - type: chain
    name: chain1
    sequence:
      - chiral_eft_eos
      - lepton1
  - type: chain
    name: chain2
    sequence:
      - cmf
      - lepton2
  - type: group
    name: group1
    group:
      - chain1
      - chain2
```

### Singleton

A so-called "singleton" workflow consists of a single process. This means that the workflow is identical whether a "group" or a "chain" component is defined.

```yaml
processes:
  - name: cmf
    module: cmf_solver
    config:
      variables:
        chemical_optical_potentials:
          muB_begin: 1000.0
          muB_end: 1400.0
          muB_step: 200.0
          use_hyperons: false
          use_decuplet: false
          use_quarks: false
components:
  - type: group
    name: run_cmf_test
    group:
      - cmf
```

### Complex workflow

An incomplete "sketch" of the config for the complex workflow depicted in the diagram below is provided here. The purpose is to illustrate how to break down the desired structure and construct it logically piece by piece.
It is left as an exercise for the reader to complete the missing "pipes" connecting the output of processes to the consuming processes and to add the missing `config:` specification for each process.

![complex_workflow_example.png](complex_workflow_example.png)

```yaml
processes:
  - name: chiral
    module: chiral_eft
  - name: cmf
    module: cmf_solver
  - name: crust
    module: crust_dft
  - name: lepton1
    module: lepton
    pipes:
      input_eos:
        label: CMF_for_Lepton_baryons_only
        module: cmf_solver
        process: cmf
  - name: lepton2
    module: lepton
    pipes:
      input_eos:
        label: CMF_for_Lepton_baryons_only
        module: cmf_solver
        process: cmf
  - name: lepton3
    module: lepton
    pipes:
      input_eos:
        label: ChEFT_Output_Lepton
        module: chiral_eft
        process: chiral
  - name: lepton4
    module: lepton
    pipes:
      input_eos:
        label: e4mma w/o lepton
        module: crust_dft
        process: crust
  - name: synthesis1
    module: synthesis
    pipes: {}
  - name: synthesis2
    module: synthesis
    pipes: {}
  - name: synthesis3
    module: synthesis
    pipes: {}
  - name: qlimr
    module: qlimr
    pipes:
      input_eos:
        label: CMF_for_Lepton_baryons_only
        module: cmf_solver
        process: cmf
  - name: flavor
    module: flavor_equilibration
    pipes: {}
components:
  - type: group
    name: group-leptons
    group:
      - lepton1
      - lepton2
  - type: chain
    name: chain-cmf-leptons
    sequence:
      - cmf
      - group-leptons
      - synthesis1
  - type: chain
    name: chain-chiral-lepton
    sequence:
      - chiral
      - lepton3
  - type: chain
    name: chain-crust-lepton
    sequence:
      - crust
      - lepton4
  - type: group
    name: group-chiral-crust
    group:
      - chain-chiral-lepton
      - chain-crust-lepton
  - type: chain
    name: chain-chiral-crust-synthesis
    sequence:
      - group-chiral-crust
      - synthesis2
  - type: group
    name: eos-syntheses
    group:
      - chain-cmf-leptons
      - chain-chiral-crust-synthesis
  - type: chain
    name: final-synthesis
    sequence:
      - eos-syntheses
      - synthesis3
  - type: group
    name: group-observables
    group:
      - qlimr
      - flavor
  - type: chain
    name: final-observables
    sequence:
      - final-synthesis
      - group-observables
```
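
### Checking a workflow locally

The composition rules described in this document (a shared namespace for process and component names, components that reference only previously defined names, and the last component acting as the workflow) can be checked before submission. The sketch below is one possible way to do so in plain Python. The helper `validate_workflow` is hypothetical and is not part of the Calculation Engine API; the `spec` dictionary is an abbreviated version of the group example with the `config` and `pipes` entries omitted. It is an illustration under those assumptions, not a description of how the CE itself validates workflows.

```python
import json


def validate_workflow(spec):
    """Hypothetical helper: check naming/ordering rules and return the
    process names reached from the last (top-level) component."""
    known = set()
    for proc in spec["processes"]:
        if proc["name"] in known:
            raise ValueError(f"duplicate name: {proc['name']}")
        known.add(proc["name"])

    members = {}  # component name -> list of subcomponent names
    for comp in spec["components"]:
        # 'chain' components list members under 'sequence', 'group' under 'group'
        key = {"chain": "sequence", "group": "group"}.get(comp["type"])
        if key is None:
            raise ValueError(f"unknown component type: {comp['type']}")
        if comp["name"] in known:
            raise ValueError(f"duplicate name: {comp['name']}")
        for child in comp[key]:
            # Only processes and earlier components may be referenced
            if child not in known:
                raise ValueError(
                    f"{comp['name']} references undefined name {child}"
                )
        members[comp["name"]] = list(comp[key])
        known.add(comp["name"])

    if not spec["components"]:
        raise ValueError("at least one component is required")

    # The last component is the workflow; walk it depth-first
    reached = []
    stack = [spec["components"][-1]["name"]]
    while stack:
        name = stack.pop()
        if name in members:
            stack.extend(reversed(members[name]))
        elif name not in reached:
            reached.append(name)
    return reached


# Abbreviated group example (configs and pipes omitted for brevity)
spec = {
    "processes": [
        {"name": "chiral_eft_eos", "module": "chiral_eft"},
        {"name": "cmf", "module": "cmf_solver"},
        {"name": "lepton1", "module": "lepton"},
        {"name": "lepton2", "module": "lepton"},
    ],
    "components": [
        {"type": "chain", "name": "chain1",
         "sequence": ["chiral_eft_eos", "lepton1"]},
        {"type": "chain", "name": "chain2", "sequence": ["cmf", "lepton2"]},
        {"type": "group", "name": "group1", "group": ["chain1", "chain2"]},
    ],
}

print(validate_workflow(spec))
# Render the same structure as JSON text
payload = json.dumps(spec, indent=2)
```

Comparing the returned list with the names in `processes` shows which processes the top-level component never reaches and would therefore be ignored.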