Tutorial
For interactive tutorials, see Jupyter Notebook tutorials.
Usage of Pipeline-based API in Examples
The kale API has a unique pipeline-based design. Each example typically has three essential modules (main.py, config.py, model.py), one optional directory (configs), and possibly other modules (e.g. trainer.py):
main.py is the main module to be run, showing the main workflow.
config.py is the configuration module that sets up the data, the prediction problem, hyper-parameters, etc. The settings in this module are the default configuration.
configs is the directory for customized configurations for individual runs. We use .yaml files for this purpose.
model.py is the model module that defines the machine learning model and configures its training parameters.
trainer.py is the trainer module that defines the training and testing workflow. This module is only needed when NOT using PyTorch Lightning.
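The relationship between the default settings in config.py and the per-run overrides in configs can be sketched as below. This is a minimal illustration using plain Python dictionaries, not necessarily the configuration mechanism the examples use; the keys (DATASET, SOLVER, and so on), their values, and the function names get_default_config and merge_overrides are all hypothetical.

```python
import copy

# Hypothetical defaults, standing in for what a config.py module would define.
_DEFAULTS = {
    "DATASET": {"SOURCE": "mnist", "TARGET": "usps"},
    "SOLVER": {"SEED": 2020, "MAX_EPOCHS": 100, "BASE_LR": 0.001},
}


def get_default_config():
    """Return a fresh copy of the defaults so callers cannot mutate them."""
    return copy.deepcopy(_DEFAULTS)


def merge_overrides(config, overrides):
    """Recursively apply per-run overrides (e.g. parsed from a .yaml file)."""
    for key, value in overrides.items():
        if key not in config:
            # Reject unknown keys so that typos in a run file fail loudly.
            raise KeyError(f"Unknown configuration key: {key}")
        if isinstance(value, dict) and isinstance(config[key], dict):
            merge_overrides(config[key], value)
        else:
            config[key] = value
    return config


# A per-run override, as it might look after loading a file from configs/.
run_overrides = {"SOLVER": {"MAX_EPOCHS": 10}}
cfg = merge_overrides(get_default_config(), run_overrides)
print(cfg["SOLVER"])
```

Only the overridden setting changes; every other value falls back to the default, which is why a run file in configs can stay short.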
Next, we explain the usage of the pipeline-based API in the modules above, mainly using the domain adaptation for digits classification example.
The kale.pipeline module provides mature, off-the-shelf machine learning pipelines for plug-in usage, e.g. import kale.pipeline.domain_adapter as domain_adapter in digits_dann’s model module.
The kale.utils module provides common utility functions, such as from kale.utils.seed import set_seed in digits_dann’s main module.
The kale.loaddata module provides the input to the machine learning system, such as from kale.loaddata.image_access import DigitDataset in digits_dann’s main module.
The kale.prepdata module provides pre-processing functions to transform the raw input data into a suitable form for machine learning, such as import kale.prepdata.image_transform as image_transform in kale.loaddata.image_access, which is used in digits_dann’s main module for image data augmentation.
The kale.embed module provides embedding functions (the encoder) to learn suitable representations from the (pre-processed) input data, such as from kale.embed.image_cnn import SmallCNNFeature in digits_dann’s model module. This is a machine learning module.
The kale.predict module provides prediction functions (the decoder) to learn a mapping from the input representation to a target prediction, such as from kale.predict.class_domain_nets import ClassNetSmallImage in digits_dann’s model module. This is also a machine learning module.
The kale.evaluate module implements evaluation metrics that are not yet available elsewhere, such as the Concordance Index (CI) for measuring the proportion of concordant pairs.
The kale.interpret module aims to provide functions for interpreting the learned model or the prediction results, such as visualization. This module has no implementation yet.
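To make the Concordance Index mentioned for kale.evaluate concrete, the sketch below computes it in plain Python under one common definition: among all pairs with different true values, count the fraction whose predictions are ordered the same way, with tied predictions counting as half. This is an illustrative assumption, not the kale.evaluate implementation, and the function name concordance_index is hypothetical.

```python
from itertools import combinations


def concordance_index(y_true, y_pred):
    """Proportion of comparable pairs whose predicted order matches the true order.

    Pairs with equal true values are skipped; tied predictions count as half.
    """
    concordant = 0.0
    comparable = 0
    for (t_i, p_i), (t_j, p_j) in combinations(zip(y_true, y_pred), 2):
        if t_i == t_j:
            continue  # equal targets carry no ordering information
        comparable += 1
        if p_i == p_j:
            concordant += 0.5
        elif (t_i < t_j) == (p_i < p_j):
            concordant += 1.0
    if comparable == 0:
        raise ValueError("No comparable pairs: all true values are equal.")
    return concordant / comparable


# One of the six pairs (the two middle items) is ordered incorrectly.
score = concordance_index([1, 2, 3, 4], [1, 3, 2, 4])
print(score)
```

A value of 1.0 means every comparable pair is ranked correctly, 0.0 means every pair is reversed, and 0.5 corresponds to uninformative predictions.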
Building New Modules or Projects
New modules/projects can be built following the steps below.
Step 1 - Examples: Choose an example of interest (e.g., the one most relevant to your project) to
browse through the configuration, main, and model modules
download the data if needed
run the example following instructions in the example’s README
Step 2a - New model: To develop new machine learning models under PyKale,
define the blocks in your pipeline to figure out what the methods are for data loading, pre-processing data, embedding (encoder/representation), prediction (decoder), evaluation, and interpretation (if needed)
modify existing pipelines with your customized blocks or build a new pipeline with PyKale blocks and blocks from other libraries
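The idea of defining blocks and then combining them into a pipeline can be sketched in plain Python as below. Every name here (Pipeline, load_toy_data, scale, mean_embedding, threshold_classifier) is hypothetical and the blocks are toy functions; real PyKale pipelines are built from the library's own modules, so treat this only as an illustration of the block structure.

```python
from dataclasses import dataclass, replace
from typing import Callable, List, Sequence


@dataclass
class Pipeline:
    """Chain of hypothetical blocks: load, pre-process, embed, predict."""

    load: Callable[[], List[Sequence[float]]]
    preprocess: Callable[[Sequence[float]], Sequence[float]]
    embed: Callable[[Sequence[float]], Sequence[float]]
    predict: Callable[[Sequence[float]], int]

    def run(self) -> List[int]:
        """Push every loaded sample through the remaining blocks in order."""
        return [self.predict(self.embed(self.preprocess(x))) for x in self.load()]


def load_toy_data():
    """Data loading block: two toy samples."""
    return [[0.0, 2.0, 4.0], [10.0, 8.0, 9.0]]


def scale(x):
    """Pre-processing block: divide by the largest absolute value."""
    peak = max(abs(v) for v in x) or 1.0
    return [v / peak for v in x]


def mean_embedding(x):
    """Embedding block (encoder): summarize a sample by its mean."""
    return [sum(x) / len(x)]


def threshold_classifier(z):
    """Prediction block (decoder): binary label from the embedding."""
    return int(z[0] > 0.75)


pipeline = Pipeline(load_toy_data, scale, mean_embedding, threshold_classifier)
predictions = pipeline.run()
print(predictions)

# Customizing a pipeline means swapping one block and keeping the rest.
lenient = replace(pipeline, predict=lambda z: int(z[0] > 0.4))
print(lenient.run())
```

Keeping each stage behind a small, uniform interface is what makes it possible to replace a single block with a customized one without touching the others.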
Step 2b - New applications: To develop new applications using PyKale,
clarify the input data and the prediction target to find matching functionalities in PyKale (request if not found)
tailor data loading, pre-processing, and evaluation (and interpretation if needed) to your application
The Scope of Support
Data
PyKale currently supports graphs, images, and videos, using PyTorch Dataloaders wherever possible. Audio is not supported yet (contributions are welcome).
Machine learning models
PyKale supports modules from the following areas of machine learning:
Deep learning: convolutional neural networks (CNNs), graph neural networks (GNNs), including graph convolutional networks (GCNs), and transformers
Transfer learning: domain adaptation
Multimodal learning: integration of heterogeneous data
Dimensionality reduction: multilinear subspace learning, such as multilinear principal component analysis (MPCA)
Example applications
PyKale includes example applications from the three areas below:
Image/video recognition: image recognition with CIFAR10/100, digits (MNIST, USPS), and action videos (EPIC Kitchen)
Bioinformatics/graph analysis: link prediction problems in BindingDB and knowledge graphs
Medical imaging: cardiac MRI classification