# Tutorial

For *interactive* tutorials, see [Jupyter Notebook tutorials](notebooks.md).

## Usage of Pipeline-based API in Examples

The `kale` API has a unique pipeline-based API design. Each example typically has three essential modules (`main.py`, `config.py`, `model.py`), one optional directory (`configs`), and possibly other modules (`trainer.py`):

- `main.py` is the main module to be run, showing the main workflow.
- `config.py` is the configuration module that sets up the data, prediction problem, and hyper-parameters, etc. The settings in this module is the default configuration.
  - `configs` is the directory to place *customized* configurations for individual runs. We use `.yaml` files for this purpose.
- `model.py` is the model module to define the machine learning model and configure its training parameters.
  - `trainer.py` is the trainer module to define the training and testing workflow. This module is *only needed when NOT using `PyTorch Lightning`*.

Next, we explain the usage of the pipeline-based API in the modules above, mainly using the [domain adaptation for digits classification example](https://github.com/pykale/pykale/tree/main/examples/digits_dann).

- The `kale.pipeline` module provides mature, off-the-shelf machine learning pipelines for plug-in usage, e.g. `import kale.pipeline.domain_adapter as domain_adapter` in [`digits_dann`'s `model` module](https://github.com/pykale/pykale/blob/main/examples/digits_dann/model.py).
- The `kale.utils` module provides common utility functions, such as `from kale.utils.seed import set_seed` in [`digits_dann`'s `main` module](https://github.com/pykale/pykale/blob/main/examples/digits_dann/main.py).
- The `kale.loaddata` module provides the input to the machine learning system, such as`from kale.loaddata.image_access import DigitDatase` in  [`digits_dann`'s `main` module](https://github.com/pykale/pykale/blob/main/examples/digits_dann/main.py).
- The `kale.prepdata` module provides pre-processing functions to transform the raw input data into a suitable form for machine learning, such as `import kale.prepdata.image_transform as image_transform` in `kale.loaddata.image_access` used in  [`digits_dann`'s `main` module](https://github.com/pykale/pykale/blob/main/examples/digits_dann/main.py) for image data augmentation.
- The `kale.embed` module provides *embedding* functions (the *encoder*) to *learn* suitable representations from the (pre-processed) input data, such as `from kale.embed.image_cnn import SmallCNNFeature` in [`digits_dann`'s `model` module](https://github.com/pykale/pykale/blob/main/examples/digits_dann/model.py). This is a machine learning module.
- The `kale.predict` module provides prediction functions (the *decoder*) to *learn* a mapping from the input representation to a target prediction, such as `from kale.predict.class_domain_nets import ClassNetSmallImage` in [`digits_dann`'s `model` module](https://github.com/pykale/pykale/blob/main/examples/digits_dann/model.py). This is also a machine learning module.
- The `kale.evaluate` module implements evaluation metrics not yet available, such as the Concordance Index (CI) for measuring the proportion of [concordant pairs](https://en.wikipedia.org/wiki/Concordant_pair).
- The `kale.interpret` module aims to provide functions for interpretation of the learned model or the prediction results, such as visualization. This module has no implementation yet.

## Building New Modules or Projects

New modules/projects can be built following the steps below.

- Step 1 - Examples: Choose one of the [examples](https://github.com/pykale/pykale/tree/main/examples) of your interest (e.g., most relevant to your project) to
  - browse through the configuration, main, and model modules
  - download the data if needed
  - run the example following instructions in the example's README
- Step 2a - New model: To develop new machine learning models under PyKale,
  - define the blocks in your pipeline to figure out what the methods are for data loading, pre-processing data, embedding (encoder/representation), prediction (decoder), evaluation, and interpretation (if needed)
  - modify existing pipelines with your customized blocks or build a new pipeline with PyKale blocks and blocks from other libraries
- Step 2b - New applications: To develop new applications using PyKale,
  - clarify the input data and the prediction target to find matching functionalities in PyKale (request if not found)
  - tailor data loading, pre-processing, and evaluation (and interpretation if needed) to your application

## The Scope of Support

### Data

PyKale currently supports graphs, images, and videos, using PyTorch Dataloaders wherever possible. Audios are not supported yet (welcome your contribution).

### Machine learning models

PyKale supports modules from the following areas of machine learning

- Deep learning: convolutional neural networks (CNNs), graph neural networks (GNNs) GNN including graph convolutional networks (GCNs), transformers
- Transfer learning: domain adaptation
- Multimodal learning: integration of heterogeneous data
- Dimensionality reduction: multilinear subspace learning, such as multilinear principal component analysis (MPCA)

### Example applications

PyKale includes example application from three areas below

- Image/video recognition: imaging recognition with CIFAR10/100, digits (MNIST, USPS), action videos (EPIC Kitchen)
- Bioinformatics/graph analysis: link prediction problems in BindingDB and knowledge graphs
- Medical imaging: cardiac MRI classification