# Configuration using YAML ## Why YAML? PyKale has been designed such that users can configure machine learning models and experiments without writing any new Python code. This is achieved via a human and machine readable language called [YAML](https://en.wikipedia.org/wiki/YAML). Well thought out default configuration values are first stored using the [YACS](https://github.com/rbgirshick/yacs) Python module in a `config.py` file. Several customized configurations can then be created in respective `.yaml` files. This also enables more advanced users to establish their own default and add new configuration parameters with minimal coding. By separating code and configuration, this approach can lead to better [reproducibility](https://en.wikipedia.org/wiki/Reproducibility). ## A simple example The following example is a simple [YAML file `tutorial.yaml`](https://github.com/pykale/pykale/blob/main/examples/digits_dann/configs/tutorial.yaml) used by the [digits tutorial notebook](https://github.com/pykale/pykale/blob/main/examples/digits_dann/tutorial.ipynb): ```{YAML} DAN: METHOD: "CDAN" DATASET: NUM_REPEAT: 1 SOURCE: "mnist" VALID_SPLIT_RATIO: 0.5 SOLVER: MIN_EPOCHS: 0 MAX_EPOCHS: 3 OUTPUT: PB_FRESH: None ``` Related configuration settings are grouped together. The group headings and allowed values are stored in a [separate Python file `config.py`](https://github.com/pykale/pykale/blob/main/examples/digits_dann/config.py) which many users will not need to refer to. The headings and parameters in this example are explained below: | Heading / Parameter | Meaning | Default | | --- | --- | --- | | **DAN** | Domain Adaptation Net | *None* | | METHOD | Type of DAN: `CDAN`, `CDAN-E`, or `DANN` | `CDAN` | |**DATASET** | Dataset (for training, testing and validation ) | *None* | | NUM_REPEAT | Number of times the training and validation cycle will be run | `10` | | SOURCE | The source dataset name | `mnist` | | VALID_SPLIT_RATIO | The proportion of training data used for validation | `0.1` | | **SOLVER** | Model training parameters | *None* | | MIN_EPOCHS | The minimum number of training epochs | `20` | | MAX_EPOCHS | The maximum number of training epochs | `120` | | **OUTPUT** | Output configuration | *None* | | PB_FRESH | Progress bar refresh option | `0` (disabled) | The tutorial YAML file `tutorial.yaml` above overrides certain defaults in `config.py` to make the machine learning process faster and clearer for demonstration purposes. ## Customization for your applications Application of an example to your data can be as simple as creating a new YAML file to (change the defaults to) specify your data location, and other preferred configuration customization, e.g., in the choice of models and/or the number of iterations.