Static data
Wellies defines static data as anything that should be made available in a common suite working environment. The data source can be local or remote. Each data reference declares a type, and wellies associates a particular retrieval script with it according to that type. The options are:
- link: Works only with local data. The associated script creates a symbolic link to the source directory in the suite target directory.
- copy: The associated script copies the source directory into the suite target directory. If files is specified, only those files are copied.
- rsync: The associated script rsyncs the source directory into the suite target directory. If files is specified, only those files are transferred. Extra options for the rsync command can be provided with rsync_options; the default is "-avzpL".
- git: The associated script clones a repository branch (set with branch) into the suite target directory. If files is specified, it clones into a temporary directory, git, then rsyncs everything in files; it therefore also accepts rsync_options.
- ecfs: The associated script copies data from the ECFS remote archive into the suite directory. If files is specified, only those files are copied.
- mars: The associated script is a MARS request. All the keys should be given in the request option. For more details on the MARS request option, check here.
- custom: A wildcard option that does nothing by itself; the user can supply a custom script through the pre_script or post_script options.
All scripts can be extended by using the options pre_script and post_script.
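The ordering of the three parts can be sketched as follows. This is only an illustration of the layout seen in the generated snippets (pre-script, main script, post-script); the function assemble_script is hypothetical and is not part of wellies.

```shell
#!/bin/sh
# Hypothetical helper, NOT wellies code: print a retrieval script made of
# an optional pre-script, the main part, and an optional post-script.
assemble_script() {
    pre="$1"
    main="$2"
    post="$3"
    if [ -n "$pre" ]; then
        printf '# Pre-script\n%s\n' "$pre"
    fi
    printf '# Main script for retrieving data\n%s\n' "$main"
    if [ -n "$post" ]; then
        printf '# Post-script\n%s\n' "$post"
    fi
}

# No pre-script here, so the output starts directly with the main section.
script=$(assemble_script "" "mkdir -p \$DATA_DIR" "echo 'Copy Done'")
echo "$script"
```

An empty argument simply drops the corresponding section, which matches the examples where only one of the two options is set.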
Examples
Link data
Configuration entry, assuming DATA_DIR is a well-defined suite variable.
static_data:
  large_datasets:
    type: link
    source: /path/to/dir
wellies data script snippet
# Main script for retrieving data
mkdir -p $DATA_DIR
dest_dir=$DATA_DIR/large_datasets
rm -rf $dest_dir
ln -sfn /path/to/dir $dest_dir
if [[ -L $dest_dir && -d $(readlink $dest_dir) ]]; then
    echo Link and directory exist
else
    echo Link or directory does not exist
    exit 1
fi
cd $dest_dir
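The link check in the script above can be tried in isolation. The sketch below is an assumption-laden sandbox: temporary directories stand in for the real source and for DATA_DIR, and the variables link_status and dangling_status are introduced only for illustration.

```shell
#!/bin/sh
# Sandbox: temporary directories replace the real source and DATA_DIR.
src_dir=$(mktemp -d)
DATA_DIR=$(mktemp -d)
dest_dir="$DATA_DIR/large_datasets"

# -s: symbolic link, -f: replace an existing destination,
# -n: do not follow dest_dir if it is already a link to a directory.
ln -sfn "$src_dir" "$dest_dir"

# The link is accepted only if it exists AND its target is a directory.
if [ -L "$dest_dir" ] && [ -d "$(readlink "$dest_dir")" ]; then
    link_status="ok"
else
    link_status="broken"
fi
echo "first check: $link_status"

# Removing the target leaves a dangling link, which the same test rejects.
rmdir "$src_dir"
if [ -L "$dest_dir" ] && [ -d "$(readlink "$dest_dir")" ]; then
    dangling_status="ok"
else
    dangling_status="broken"
fi
echo "after removing the source: $dangling_status"

rm -rf "$DATA_DIR"
```

Testing both the link and its target is what lets the generated script fail early when the source directory has been moved or deleted.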
Copy data
Configuration entry, assuming DATA_DIR is a well-defined suite variable:
static_data:
  copy_data:
    type: copy
    source: /path/to/dir/file.txt
    post_script: "echo 'Copy Done'"
wellies data script snippet
# Main script for retrieving data
mkdir -p $DATA_DIR
dest_dir=$DATA_DIR/copy_data
rm -rf $dest_dir
mkdir -p $dest_dir
scp /path/to/dir/file.txt $dest_dir/
cd $dest_dir
# Post-script
echo 'Copy Done'
Or, if using the files option:
static_data:
  copy_data:
    type: copy
    source: /path/to/dir
    files: file.txt
    post_script: "echo 'Copy Done'"
The resulting script is equivalent:
# Main script for retrieving data
mkdir -p $DATA_DIR
dest_dir=$DATA_DIR/copy_data
rm -rf $dest_dir
mkdir -p $dest_dir
scp /path/to/dir/file.txt $dest_dir/
cd $dest_dir
# Post-script
echo 'Copy Done'
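The effect of restricting a copy to selected files can be reproduced locally. This is a minimal sketch, not the script wellies generates: it uses cp instead of scp, works in temporary directories, and the space-separated files variable is a hypothetical stand-in for the files option.

```shell
#!/bin/sh
# Sandbox source directory with two files; only one of them is requested.
src_dir=$(mktemp -d)
DATA_DIR=$(mktemp -d)
echo "wanted" > "$src_dir/file.txt"
echo "unwanted" > "$src_dir/other.txt"

# Same preparation steps as in the snippet above.
dest_dir="$DATA_DIR/copy_data"
rm -rf "$dest_dir"
mkdir -p "$dest_dir"

# Hypothetical stand-in for the "files" option (names without spaces).
files="file.txt"
for f in $files; do
    cp "$src_dir/$f" "$dest_dir/"
done

copied=$(ls "$dest_dir")
echo "copied: $copied"
```

Only file.txt ends up in the destination, while other.txt stays behind in the source directory.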
Rsync data
Configuration entry, assuming DATA_DIR is a well-defined suite variable. In this example, post_script refers to an existing script file. If the path is not absolute, it is resolved relative to the directory where the deployment was executed:
static_data:
  copy_data:
    type: rsync
    source: hpc-login:/path/to/dir/
    files:
      - dis.nc
      - scov.nc
    rsync_options: "-avz"
    post_script: "install.sh"
wellies data script snippet
# Main script for retrieving data
mkdir -p $DATA_DIR
dest_dir=$DATA_DIR/copy_data
rsync -avz /path/to/dir/dis.nc /path/to/dir/scov.nc $dest_dir/
cd $dest_dir
# Post-script
echo 'running after every other command'
echo 'bye bye'
Git data
Configuration entry, assuming DATA_DIR is a well-defined suite variable:
static_data:
  git_data:
    type: git
    source: "git.example.com/repo.git"
    branch: main
    pre_script: "git config --global user.name 'John Doe'"
wellies data script snippet
# Pre-script
git config --global user.name 'John Doe'
# Main script for retrieving data
mkdir -p $DATA_DIR
dest_dir=$DATA_DIR/git_data
rm -rf $dest_dir
giturl=git.example.com/repo.git
gitbranch=main
git clone $giturl --branch $gitbranch --single-branch --depth 1 $dest_dir
cd $dest_dir
If files is specified, the repository is first cloned into a temporary directory:
static_data:
  git_data:
    type: git
    source: "git.example.com/repo.git"
    branch: main
    files:
      - static/dem.nc
      - static/metadata_template.grb
# Main script for retrieving data
mkdir -p $DATA_DIR
dest_dir=$DATA_DIR/git/git_data
rm -rf $dest_dir
giturl=git.example.com/repo.git
gitbranch=main
git clone $giturl --branch $gitbranch --single-branch --depth 1 $dest_dir
cd $dest_dir
dest_dir=$DATA_DIR/git_data
rsync -avzpL $DATA_DIR/git/git_data/static/dem.nc $DATA_DIR/git/git_data/static/metadata_template.grb $dest_dir/
cd $dest_dir
echo 'cleaning build directory'
rm -rf $DATA_DIR/git/git_data
ECFS data
Configuration entry, assuming DATA_DIR is a well-defined suite variable:
static_data:
  ecfs_data:
    type: ecfs
    source: "ec:/path/to/data"
    post_script: "cd $DATA_DIR/ecfs_data && tar -xvf *.tar && cd -"
Note
For remote datasets, the host needs to be specified, even for the ECFS type.
wellies data script snippet
# Main script for retrieving data
mkdir -p $DATA_DIR
dest_dir=$DATA_DIR/ecfs_data
rm -rf $dest_dir
mkdir -p $dest_dir
ecp ec:/path/to/data $dest_dir/
cd $dest_dir
# Post-script
cd $DATA_DIR/ecfs_data && tar -xvf *.tar && cd -
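The post-script above relies on *.tar matching a single archive: with several matches, tar would read the first one and treat the remaining names as members to extract. A possible variant, sketched here in a sandbox with two dummy archives (first.tar and second.tar are made-up names), loops over the archives inside a subshell so that the caller's working directory is left unchanged.

```shell
#!/bin/sh
# Sandbox: build two small archives in a temporary ecfs_data directory.
DATA_DIR=$(mktemp -d)
work=$(mktemp -d)
mkdir -p "$DATA_DIR/ecfs_data"
echo "a" > "$work/a.txt"
echo "b" > "$work/b.txt"
tar -cf "$DATA_DIR/ecfs_data/first.tar" -C "$work" a.txt
tar -cf "$DATA_DIR/ecfs_data/second.tar" -C "$work" b.txt

start_dir=$(pwd)
# The subshell confines the cd, so no "cd -" is needed afterwards.
(
    cd "$DATA_DIR/ecfs_data" || exit 1
    for archive in *.tar; do
        tar -xf "$archive"
    done
)
end_dir=$(pwd)

ls "$DATA_DIR/ecfs_data"
```

Whether this is preferable depends on the data: for a single known archive, the one-line post-script from the configuration entry is sufficient.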
MARS data
Configuration entry, assuming DATA_DIR is a well-defined suite variable:
static_data:
  mars_data:
    type: mars
    request:
      class: od
      type: an
      expver: "1"
      date: "19990215"
      time: "12"
      param: t
      levtype: "pressure level"
      levelist: [1000, 850, 700, 500]
      target: t.grb
    post_script: "pproc-interpol --grid SMUFF-OPERA-2km-proj t.grb t_2km.grb"
Note
The mars data type does not accept a source option.
wellies data script snippet
# Main script for retrieving data
mkdir -p $DATA_DIR
dest_dir=$DATA_DIR/mars_data
mkdir -p $dest_dir
cd $dest_dir
mars << EOF
retrieve,
class=od,
type=an,
expver=1,
date=19990215,
time=12,
param=t,
levtype=pressure level,
levelist=1000/850/700/500,
target="t.grb"
EOF
# Post-script
pproc-interpol --grid SMUFF-OPERA-2km-proj t.grb t_2km.grb
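In the request above, the YAML list given for levelist appears in the MARS request as values separated by slashes. The conversion can be sketched with a small helper; join_with_slash is a hypothetical function written for this illustration and says nothing about how wellies performs the step internally.

```shell
#!/bin/sh
# Hypothetical helper (not part of wellies): join all arguments with "/".
join_with_slash() {
    joined=""
    for value in "$@"; do
        if [ -z "$joined" ]; then
            joined="$value"
        else
            joined="$joined/$value"
        fi
    done
    printf '%s\n' "$joined"
}

# The list [1000, 850, 700, 500] becomes a single slash-separated value.
levelist=$(join_with_slash 1000 850 700 500)
echo "levelist=$levelist,"
```

A single value passes through unchanged, so the same helper would cover scalar keys such as param.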
Custom data
static_data:
  custom_data:
    type: custom
    pre_script: "retrieve_data.sh"
    post_script:
      - "cd $DATA_DIR/custom_data"
      - "tar -xvf *.tar"
      - "cd -"
wellies data script snippet