Configuration using YAML

Why YAML?

PyKale is designed so that users can configure machine learning models and experiments without writing any new Python code. This is achieved with YAML, a human- and machine-readable data serialization language. Carefully chosen default configuration values are first defined with the YACS Python module in a config.py file. Customized configurations can then be created in separate .yaml files, each of which overrides only the defaults it needs to change.

This also lets more advanced users set their own defaults and add new configuration parameters with minimal coding. Separating code from configuration can also improve reproducibility, because the settings of an experiment are recorded in a YAML file rather than scattered through the code.

A simple example

The following is a simple YAML file, tutorial.yaml, used by the digits tutorial notebook:

DAN:
  METHOD: "CDAN"

DATASET:
  NUM_REPEAT: 1
  SOURCE: "svhn"
  VALID_SPLIT_RATIO: 0.5

SOLVER:
  MIN_EPOCHS: 0
  MAX_EPOCHS: 3

OUTPUT:
  PB_FRESH: None

Related configuration settings are grouped together. The group headings, the allowed parameters and their default values are defined in a separate Python file, config.py, which most users will not need to consult. The headings and parameters used in this example are explained below:

Heading / Parameter	Meaning	Default
DAN	Domain Adaptation Net	n/a
METHOD	Type of DAN: `CDAN`, `CDAN-E`, or `DANN`	`CDAN`
DATASET	Dataset (for training, testing and validation)	n/a
NUM_REPEAT	Number of times the training and validation cycle will be run	`10`
SOURCE	The source dataset name	`mnist`
VALID_SPLIT_RATIO	The proportion of training data used for validation	`0.1`
SOLVER	Model training parameters	n/a
MIN_EPOCHS	The minimum number of training epochs	`20`
MAX_EPOCHS	The maximum number of training epochs	`120`
OUTPUT	Output configuration	n/a
PB_FRESH	Progress bar refresh option	`0` (disabled)

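The tutorial.yaml file above overrides some of the defaults in config.py to make training faster and easier to follow for demonstration purposes. For example, it runs the training and validation cycle once instead of 10 times and caps training at 3 epochs instead of 120.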

Customization for your applications

Applying an example to your own data can be as simple as creating a new YAML file that overrides the defaults to specify your data location and any other preferred settings, e.g., the choice of model or the number of training iterations.