Configuration using YAML

Why YAML?

PyKale has been designed such that users can configure machine learning models and experiments without writing any new Python code. This is achieved via a human and machine readable language called YAML. Well thought out default configuration values are first stored using the YACS Python module in a config.py file. Several customized configurations can then be created in respective .yaml files.

This also enables more advanced users to establish their own default and add new configuration parameters with minimal coding. By separating code and configuration, this approach can lead to better reproducibility.

A simple example

The following example is a simple YAML file tutorial.yaml used by the digits tutorial notebook:

DAN:
  METHOD: "CDAN"

DATASET:
  NUM_REPEAT: 1
  SOURCE: "svhn"
  VALID_SPLIT_RATIO: 0.5

SOLVER:
  MIN_EPOCHS: 0
  MAX_EPOCHS: 3

OUTPUT:
  PB_FRESH: None

Related configuration settings are grouped together. The group headings and allowed values are stored in a separate Python file config.py which many users will not need to refer to. The headings and parameters in this example are explained below:

Heading / Parameter Meaning Default
DAN Domain Adaptation Net None
METHOD Type of DAN: CDAN, CDAN-E, or DANN CDAN
DATASET Dataset (for training, testing and validation ) None
NUM_REPEAT Number of times the training and validation cycle will be run 10
SOURCE The source dataset name mnist
VALID_SPLIT_RATIO The proportion of training data used for validation 0.1
SOLVER Model training parameters None
MIN_EPOCHS The minimum number of training epochs 20
MAX_EPOCHS The maximum number of training epochs 120
OUTPUT Output configuration None
PB_FRESH Progress bar refresh option 0 (disabled)

The tutorial YAML file tutorial.yaml above overrides certain defaults in config.py to make the machine learning process faster and clearer for demonstration purposes.

Customization for your applications

Application of an example to your data can be as simple as creating a new YAML file to (change the defaults to) specify your data location, and other preferred configuration customization, e.g., in the choice of models and/or the number of iterations.