Tools#

Wellies defines a Tool as anything that needs to be made available in the execution environment of one or more of your workflow's tasks. Tools can depend on each other and contain up to three script snippets that are used in different contexts:

  • setup: Defines how a tool can be installed on the suite's workspace.
  • load: Any preparation needed to make a tool discoverable for use.
  • unload: The opposite operation of load. Makes the tool unavailable again.
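
The three snippets can be pictured as plain strings attached to a tool. The sketch below is only an illustration of that idea: `ToolSketch` and `render` are hypothetical names introduced here, not part of the wellies API.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ToolSketch:
    """Hypothetical stand-in for a tool: a name plus three shell snippets."""

    name: str
    setup: str = ""   # how to install the tool in the suite's workspace
    load: str = ""    # how to make the tool discoverable
    unload: str = ""  # how to undo `load`
    depends: List[str] = field(default_factory=list)


def render(tool: ToolSketch) -> str:
    """Lay the three snippets out in the same style as the examples on this page."""
    sections = [("setup", tool.setup), ("load", tool.load), ("unload", tool.unload)]
    return "\n".join(
        f"--------{label} script-----------\n{body}\n" for label, body in sections
    )


# A module has no setup, so only load and unload are filled in.
toolbox = ToolSketch(
    name="ecmwf-toolbox",
    load="module load ecmwf-toolbox/2023.10.1.0",
    unload="module unload ecmwf-toolbox",
)
print(render(toolbox))
```

Keeping the snippets as independent strings is what allows a task to pick only the ones it needs, for example running setup once and load in every task that uses the tool.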

Tool types#

  • Environment variable: general environment variables with name and value that can be set at configuration level.
  • Module: Modules are only loaded and unloaded on the system, i.e., they do not have a setup. At configuration level the options are name and version. A module can be a general Module or a Private one, which requires the extra option modulefiles.
  • Package: Packages are installables that can be retrieved from local or remote locations. The options are similar to the ones for the different static data types, and the post_script option can be used to reference custom installation script snippets or commands. Packages don't add load or unload scripts, so they are usually associated with an environment where they can be made discoverable. A custom build_dir can be provided, which points the retrieval part of the setup script to that location.
  • Python virtual environment: This is a shortcut to define Python virtual environments that will be built using venv. Wellies supports two types:
    • type: system_venv: Creates an environment that is based on the system-wide python installation, meaning the creation option will contain the --system-site-packages argument. Extra external packages can be installed locally to the environment using the extra_packages option.
    • type: venv: Creates a file-based environment without the --system-site-packages argument.
  • Conda environment: Wellies supports three types of conda environments using different building strategies: from a specification file when env_file is present, from a list of packages when extra_packages is present, or no build at all, with a reference to an existing environment, when environment is present. Extra options for the conda commands can be set using the conda_cmd option. The env_file behaves just like any other data object, so any of the static data types can be used to define how the specification file needs to be retrieved.
  • Folder environment: This is a namespace environment type where packages can be installed with appropriate dependencies. The created namespace will be added to the system PATH for discoverability.
  • Custom environment: Custom environment with custom load, unload and setup commands or scripts.

All Tool types also accept a depends option, where a list of dependencies can be defined using each tool's name. Environment types accept a packages option, where previously defined package Tools can be installed in that same environment.

Warning

Please be aware that configuration files are parsed sequentially, so the dependency tree must be defined accordingly.
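
As a rough illustration of what sequential parsing implies, the sketch below walks a tools mapping in definition order and rejects any dependency that has not been defined yet. This is an assumption about the consequence of the warning, not the actual wellies implementation; `check_definition_order` is a hypothetical helper.

```python
from typing import Dict, List


def check_definition_order(tools: Dict[str, dict]) -> List[str]:
    """Return tool names in definition order, failing on forward references."""
    seen: List[str] = []
    for name, options in tools.items():  # dicts keep insertion order
        for dep in options.get("depends", []):
            if dep not in seen:
                raise ValueError(
                    f"{name!r} depends on {dep!r}, which is not defined before it"
                )
        seen.append(name)
    return seen


# Dependency defined first: accepted.
ordered = {
    "python": {"version": "3.10.10-01"},
    "pyflow": {"version": "3.2.0", "depends": ["python"]},
}
print(check_definition_order(ordered))

# Dependency defined after the tool that needs it: rejected.
misordered = {
    "pyflow": {"version": "3.2.0", "depends": ["python"]},
    "python": {"version": "3.10.10-01"},
}
try:
    check_definition_order(misordered)
except ValueError as err:
    print(err)
```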

Examples#

Environment variables#

tools.yaml
tools:
  env_variables:
    PYTHONPATH:
      value: "$LIB_DIR/python:$LIB_DIR/bin"
    HDF5:
      variable: HDF5_USE_FILE_LOCKING
      value: "FALSE"

The snippets generated by wellies for the PYTHONPATH environment variable will be

--------setup script-----------

--------load script-----------

export PYTHONPATH=$LIB_DIR/python:$LIB_DIR/bin:${PYTHONPATH:-}

--------unload script-----------

echo 'removing $LIB_DIR/python:$LIB_DIR/bin from $PYTHONPATH'
export PYTHONPATH=${PYTHONPATH/$LIB_DIR/python:$LIB_DIR/bin:/}
echo '$PATH after removing' && echo $PYTHONPATH

If variable is provided, the top-level key becomes an alias that you can refer to in the Python generator code of your suite, and it points to the actual variable name on the system. The snippets reflect that:

--------setup script-----------

--------load script-----------

export HDF5_USE_FILE_LOCKING=FALSE

--------unload script-----------

unset $HDF5_USE_FILE_LOCKING

Note

Any environment variable whose name ends with "PATH" is treated specially: the configured value is combined with the variable's existing value instead of overwriting it, as in the PYTHONPATH example above.
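
The two load snippets above can be reproduced with a few lines of Python. This is a minimal sketch that only mirrors the documented output; `env_load_snippet` is a hypothetical function, not the one wellies uses internally.

```python
from typing import Optional


def env_load_snippet(alias: str, value: str, variable: Optional[str] = None) -> str:
    """Build the export line for an environment variable tool.

    `alias` is the configuration key; `variable`, when given, is the real
    variable name on the system.
    """
    name = variable or alias
    if name.endswith("PATH"):
        # PATH-like variables keep their previous content after the new value.
        return f"export {name}={value}:${{{name}:-}}"
    return f"export {name}={value}"


print(env_load_snippet("PYTHONPATH", "$LIB_DIR/python:$LIB_DIR/bin"))
print(env_load_snippet("HDF5", "FALSE", variable="HDF5_USE_FILE_LOCKING"))
```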

Modules#

tools.yaml
tools:
  modules:
    ecmwf-toolbox:
      version: "2023.10.1.0"
    mymodule:
      modulefiles: /path/to/dev/module

The snippets generated by wellies for the system module will be

--------setup script-----------

--------load script-----------

set +ux
module unload ecmwf-toolbox || true
module load ecmwf-toolbox/2023.10.1.0
set -ux

--------unload script-----------

module unload ecmwf-toolbox

And for the private module

--------setup script-----------

--------load script-----------

module use /path/to/dev/module

set +ux
module unload mymodule || true
module load mymodule/default
set -ux

--------unload script-----------

module unload mymodule

You can also differentiate between the configuration key and the actual module name by providing a value for name. This makes cross-referencing in dependency trees easier when several versions of the same module are in use.

tools.yaml
tools:
  modules:
    python:
      name: python3
      version: "3.10.10-01"
    python_old:
      name: python3
      version: "old"
    pyflow:
      version: "3.2.0"
      depends: ["python"]
    pcraster:
      version: "4.3.0-01"
      depends: ["python_old"]

After being combined into a wellies.ToolStore object, the dependencies can be resolved accordingly. Using the configuration above, the pyflow tool will contain the following snippets:

--------setup script-----------

--------load script-----------

set +ux
module unload python3 || true
module load python3/3.10.10-01
set -ux

set +ux
module unload pyflow || true
module load pyflow/3.2.0
set -ux

--------unload script-----------

module unload python3

module unload pyflow
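
The way the dependency's load snippet ends up in front of the tool's own can be sketched as a small recursive function. This is an illustration under assumptions: `module_load` and `resolved_load` are hypothetical helpers that imitate the output above and do not use the wellies.ToolStore API.

```python
from typing import Dict


def module_load(name: str, version: str) -> str:
    """Load snippet for a single module, in the layout shown on this page."""
    return "\n".join(
        [
            "set +ux",
            f"module unload {name} || true",
            f"module load {name}/{version}",
            "set -ux",
        ]
    )


def resolved_load(key: str, modules: Dict[str, dict]) -> str:
    """Load snippets of all dependencies first, then the tool's own snippet.

    Shared dependencies are not de-duplicated in this simple sketch.
    """
    options = modules[key]
    parts = [resolved_load(dep, modules) for dep in options.get("depends", [])]
    # The configuration key is only an alias when `name` is provided.
    parts.append(module_load(options.get("name", key), options["version"]))
    return "\n\n".join(parts)


modules = {
    "python": {"name": "python3", "version": "3.10.10-01"},
    "python_old": {"name": "python3", "version": "old"},
    "pyflow": {"version": "3.2.0", "depends": ["python"]},
    "pcraster": {"version": "4.3.0-01", "depends": ["python_old"]},
}
print(resolved_load("pyflow", modules))
```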

Packages#

tools.yaml
tools:
  packages:
    earthkit:
      type: "git"
      source: "git@github.com:ecmwf/earthkit-data.git"
      branch: "develop"
      build_dir: "/tmp/git/files"
      post_script: [
        "pip uninstall earthkit",
        "pip install .",
        "pip show src | grep Version > version.txt",
      ]
    local_files:
      type: "rsync"
      source: "hpc-login:/path/to/pkg/src"
      post_script: "/path/to/installer.sh"

Assuming a LIB_DIR environment variable points to where packages should be installed, wellies generates the following snippets for earthkit:

--------setup script-----------
# Main script for retrieving data
mkdir -p /tmp/git/files

dest_dir=/tmp/git/files/earthkit
rm -rf $dest_dir
giturl=git@github.com:ecmwf/earthkit-data.git
gitbranch=develop
git clone $giturl --branch $gitbranch --single-branch --depth 1 $dest_dir
cd $dest_dir

# Post-script
pip uninstall earthkit
pip install .
pip show src | grep Version > version.txt

ecflow_client --label=version $(if [[ -f version.txt ]]; then cat version.txt; else echo NA; fi)

--------load script-----------


--------unload script-----------

The package signature is based on the StaticData object. For the types supported by the different retrieval strategies, please check the data types page. The setup process can be customized further with the post_script option, which accepts either a literal script or a reference to an existing file.
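
One way to picture the two forms of post_script is a helper that returns the shell text to place after the retrieval step. The sketch below is an assumption about the behaviour described here, not the wellies code; `resolve_post_script` is a hypothetical name, and the temporary file only stands in for a real installer script.

```python
import tempfile
from pathlib import Path
from typing import List, Union


def resolve_post_script(post_script: Union[str, List[str]]) -> str:
    """Return the shell text to append after the retrieval part of a setup script."""
    if isinstance(post_script, list):
        return "\n".join(post_script)  # list of literal commands
    if Path(post_script).is_file():
        return Path(post_script).read_text()  # reference to an existing script
    return post_script  # single literal command


# Literal commands given directly in the configuration.
print(resolve_post_script(["pip uninstall earthkit", "pip install ."]))

# A path to an existing file is replaced by the file's content.
with tempfile.TemporaryDirectory() as tmp:
    installer = Path(tmp) / "installer.sh"
    installer.write_text('echo "Hello from install file"\n')
    print(resolve_post_script(str(installer)))
```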

Using the packages configuration above once again, and given an installer.sh with the content shown first, local_files will have the following generated snippets:

installer.sh
echo "Hello from install file"
cd $LIB_DIR/local_files && make install

--------setup script-----------
# Main script for retrieving data
mkdir -p $LIB_DIR/build/${ENV_NAME:-}

dest_dir=$LIB_DIR/build/${ENV_NAME:-}/local_files
rsync -avzpL hpc-login:/path/to/pkg/src  $dest_dir/
cd $dest_dir

# Post-script
echo "Hello from install file"
cd $LIB_DIR/local_files && make install


ecflow_client --label=version $(if [[ -f version.txt ]]; then cat version.txt; else echo NA; fi)

--------load script-----------


--------unload script-----------

Folder environment#

Folder environments act as a namespace that groups different tools together. First, consider the environment itself:

tools.yaml
tools:
  environments:
    bin:
      type: folder

Assuming a LIB_DIR environment variable points to the root directory, wellies generates the following snippets:

--------setup script-----------
rm -rf $LIB_DIR/bin
mkdir -p $LIB_DIR/bin

--------load script-----------

export PATH=$LIB_DIR/bin:${PATH:-}

--------unload script-----------

echo 'removing $LIB_DIR/bin from $PATH'
export PATH=${PATH/$LIB_DIR/bin:/}
echo '$PATH after removing' && echo $PATH

Python virtual environment#

System environment#

The following config can be used to define a local Python virtual environment that extends a system-wide installation:

tools.yaml
tools:
  environments:
    myvenv:
      type: system_venv
      extra_packages: ["pcraster>=3.4"]
      venv_options: "--upgrade"

Assuming a LIB_DIR environment variable points to the installation root directory, wellies generates the following snippets:

--------setup script-----------
rm -rf $LIB_DIR/myvenv
python3 -m venv $LIB_DIR/myvenv  --system-site-packages --upgrade

source $LIB_DIR/myvenv/bin/activate
export LD_LIBRARY_PATH=$LIB_DIR/myvenv/lib:${LD_LIBRARY_PATH:=}
pip install 'pcraster>=3.4'

--------load script-----------

source $LIB_DIR/myvenv/bin/activate
export LD_LIBRARY_PATH=$LIB_DIR/myvenv/lib:${LD_LIBRARY_PATH:=}

--------unload script-----------
deactivate

Note

The reserved name packages always refers to package Tools defined within your configuration. For external packages, use extra_packages.

Warning

If provided, extra_packages must always be a list, even if it contains only one element.
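
The list requirement can be made concrete with a small check that also builds the install line seen in the setup script above. This is only a sketch: `pip_install_line` is a hypothetical helper, and the use of `shlex.quote` for quoting is an assumption that happens to reproduce the quoting shown on this page.

```python
import shlex
from typing import List


def pip_install_line(extra_packages: List[str]) -> str:
    """Build a `pip install` command from a list of requirement strings."""
    if not isinstance(extra_packages, list):
        # A bare string would otherwise be iterated character by character.
        raise TypeError("extra_packages must be a list, even for a single package")
    # Quote each requirement so that characters such as '>' are not
    # interpreted by the shell as redirections.
    return "pip install " + " ".join(shlex.quote(pkg) for pkg in extra_packages)


print(pip_install_line(["pcraster>=3.4"]))

try:
    pip_install_line("pcraster>=3.4")
except TypeError as err:
    print(err)
```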

Build with custom packages#

The following config can be used to define a local python virtual environment that does not use the system-wide site-packages.

tools.yaml
tools:
  environments:
    datasets_env:
      type: venv
      packages: [anemoi_datasets]
      depends: [python]

Assuming a LIB_DIR environment variable points to the installation root directory, wellies generates the following snippets:

--------setup script-----------
rm -rf $LIB_DIR/datasets_env
python3 -m venv $LIB_DIR/datasets_env

--------load script-----------

source $LIB_DIR/datasets_env/bin/activate
export LD_LIBRARY_PATH=$LIB_DIR/datasets_env/lib:${LD_LIBRARY_PATH:=}

--------unload script-----------
deactivate

Note

The reserved name packages always refers to package Tools defined within your configuration. For external packages, use extra_packages.

Conda environments#

Conda environments can be defined in three ways:

  • Existing system environment
  • Built from a specification list
  • Built from a specification file

The load and unload script snippets are the same for all of them. They differ only in the way they are set up, which is the focus here.
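
The choice between the three setups depends on which option is present in the configuration. The sketch below shows one possible way to express that selection; `conda_strategy` and the returned labels are hypothetical, and the order in which the keys are checked is an assumption.

```python
def conda_strategy(options: dict) -> str:
    """Pick a build strategy from the options of a conda environment."""
    if "env_file" in options:
        return "build from specification file"
    if "extra_packages" in options:
        return "build from package list"
    if "environment" in options:
        return "reuse existing environment (no build)"
    raise ValueError("expected one of: env_file, extra_packages, environment")


print(conda_strategy({"type": "conda", "environment": "base"}))
print(conda_strategy({"type": "conda", "extra_packages": ["python==3.10", "gdal"]}))
print(
    conda_strategy(
        {
            "type": "conda",
            "env_file": {"type": "copy", "source": "/path/to/project", "files": "env.yml"},
        }
    )
)
```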

System environment#

A system conda environment can be configured as

tools.yaml
tools:
  environments:
    system_conda:
      type: conda
      environment: base

Assuming a LIB_DIR environment variable points to the installation root directory, wellies generates the following snippets:

--------setup script-----------


--------load script-----------

set +ux
conda activate base
set -ux

--------unload script-----------
conda deactivate

Build with custom packages#

To specify a conda environment that needs to be built within your workflow from a list of package specifications, the configuration file can look like:

tools.yaml
tools:
  environments:
    myconda:
      type: conda
      packages: ["earthkit"]
      extra_packages: ["python==3.10", "pcraster>=4.3.0", "gdal"]
      conda_cmd: "mamba -c conda-forge"

Here we also changed the base conda command used during setup, so that it uses the mamba environment solver and gives priority to the conda-forge channel. This allows you to use any valid extra option for your conda commands.

Note

The reserved name packages always refers to package Tools defined within your configuration. For external packages, use extra_packages.

Warning

If provided, extra_packages must always be a list, even if it contains only one element.

Assuming a LIB_DIR environment variable points to the installation root directory, wellies generates the following snippets:

--------setup script-----------

rm -rf $LIB_DIR/myconda
mamba -c conda-forge create -p $LIB_DIR/myconda python==3.10 'pcraster>=4.3.0' gdal

--------load script-----------

set +ux
conda activate $LIB_DIR/myconda
set -ux

--------unload script-----------
conda deactivate

From file#

Another common way to specify a conda environment is through yml files. A valid configuration to obtain such a file from the local filesystem is

tools.yaml
tools:
  environments:
    myconda:
      type: conda
      env_file:
        type: copy
        source: /path/to/project
        files: env.yml

The env_file specification can be any valid wellies.StaticData entry. For more details on the options available, please check the data types page.

Assuming a LIB_DIR environment variable points to the installation root directory, wellies generates the following snippets:

--------setup script-----------
# Main script for retrieving data
mkdir -p $LIB_DIR/build

dest_dir=$LIB_DIR/build/myconda
rm -rf $dest_dir
mkdir -p $dest_dir
scp /path/to/project/env.yml  $dest_dir/
cd $dest_dir


rm -rf $LIB_DIR/myconda
conda env create --file $LIB_DIR/build/myconda/env.yml -p $LIB_DIR/myconda

--------load script-----------

set +ux
conda activate $LIB_DIR/myconda
set -ux

--------unload script-----------
conda deactivate

Custom Environments#

A custom environment is a flexible alternative for generating any other type of environment. It supports the definition of load, unload and setup scripts or commands from the configuration files.

tools.yaml
tools:
  environments:
    myenv:
      type: custom
      load: "/path/to/env/load.sh"
      unload: "unload_myenv"

The snippets generated by wellies for the custom environment will be

--------setup script-----------
None

--------load script-----------
/path/to/env/load.sh

--------unload script-----------
unload_myenv