Embed

Submodules

kale.embed.attention module

class kale.embed.attention.PositionalEncoding(d_model: int, max_len: int = 5000)

Bases: Module

Implements the positional encoding described in the NIPS 2017 paper ‘Attention Is All You Need’, which introduced Transformers (https://arxiv.org/abs/1706.03762). For every timestep in a given sequence, it adds information about the relative temporal location of that timestep directly into the timestep's features, then returns the slightly modified sequence with the same shape.

Parameters:

d_model – The number of features that each timestep has (required).
max_len – The maximum sequence length that the positional encodings should support (optional, defaults to 5000).

forward(x)

Expects input of shape (sequence_length, batch_size, num_features) and returns output of the same shape. sequence_length must not exceed self.max_len, and num_features must be exactly self.d_model.

Parameters:: x – a sequence input of shape (sequence_length, batch_size, num_features) (required).
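
The fixed sin-cos scheme from the cited paper can be sketched in a few lines. This is a minimal illustration and not kale's implementation: the helper names sinusoidal_table and add_positions are hypothetical, and nested lists stand in for tensors of shape (sequence_length, batch_size, num_features).

```python
import math
from typing import List


def sinusoidal_table(max_len: int, d_model: int) -> List[List[float]]:
    """Build a (max_len, d_model) table of fixed sin-cos positional encodings."""
    table = [[0.0] * d_model for _ in range(max_len)]
    for pos in range(max_len):
        for dim in range(0, d_model, 2):
            # Each even/odd feature pair shares one wavelength.
            angle = pos / (10000 ** (dim / d_model))
            table[pos][dim] = math.sin(angle)
            if dim + 1 < d_model:
                table[pos][dim + 1] = math.cos(angle)
    return table


def add_positions(x, table):
    """Add the encoding of timestep t to every sample at timestep t."""
    if len(x) > len(table):
        raise ValueError("sequence_length exceeds max_len")
    return [
        [[value + table[t][k] for k, value in enumerate(sample)] for sample in step]
        for t, step in enumerate(x)
    ]


# Hypothetical toy input: 3 timesteps, batch of 2, 4 features, all zeros.
x = [[[0.0] * 4 for _ in range(2)] for _ in range(3)]
encoded = add_positions(x, sinusoidal_table(max_len=10, d_model=4))
```

Because the input is all zeros, encoded simply exposes the table: every sample at timestep t carries the same encoding, and the output keeps the input's shape.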

class kale.embed.attention.BANLayer(input_v_dim, input_q_dim, hidden_dim, num_out_heads, activation='ReLU', dropout=0.2, num_att_maps=3)

Bases: Module

The Bilinear Attention Network (BAN) layer applies bilinear attention between two feature sets (v and q), which could represent features extracted from drugs and proteins, respectively. The layer lets the two feature sets interact, allowing the model to learn joint representations for downstream tasks such as predicting drug-protein interactions.

Parameters:

input_v_dim (int) – Dimensionality of the first input “value” feature set (v).
input_q_dim (int) – Dimensionality of the second input “query” feature set (q).
hidden_dim (int) – Dimensionality of the hidden layer used in the bilinear attention mechanism.
num_out_heads (int) – Number of output heads in the bilinear attention mechanism.
activation (str, optional) – Activation function to use in the fully connected networks for value (v) and query (q). Default is “ReLU”.
dropout (float, optional) – Dropout rate to apply after each layer in the fully connected networks. Default is 0.2.
num_att_maps (int, optional) – Number of attention maps to generate (used in pooling). Default is 3.

attention_pooling(v, q, att_map)

forward(v, q, softmax=False)

kale.embed.cnn module

Convolutional Neural Network (CNN) architectures for embedding and feature extraction.

This module provides a collection of CNN-based models for various tasks including:

- Drug-target interaction prediction (CNNEncoder)
- Protein sequence feature extraction (ProteinCNN)
- CNN-Transformer hybrid architectures (CNNTransformer, ContextCNNGeneric)

All CNN implementations inherit from BaseCNN (from kale.embed.base_cnn), which provides reusable utilities for creating convolutional blocks, applying activations, weight initialization, pooling operations, and embedding layer creation.

Classes:

CNNEncoder: 1D CNN encoder for sequence data (DeepDTA architecture).
ProteinCNN: 1D CNN for protein sequence feature extraction.
ContextCNNGeneric: Template for CNN + sequence-to-sequence contextualizer.
CNNTransformer: CNN backbone followed by Transformer-Encoder.

Note

Import BaseCNN directly from kale.embed.base_cnn for base utilities.

Example

>>> from kale.embed.base_cnn import BaseCNN
>>> from kale.embed.cnn import CNNEncoder, ProteinCNN
>>> # Create a custom CNN using BaseCNN utilities
>>> class MyCNN(BaseCNN):
...     def __init__(self):
...         super().__init__()
...         self.conv_layers, self.batch_norms = self._create_sequential_conv_blocks(
...             in_channels=3, out_channels_size_list=[32, 64], kernel_sizes=3, conv_type='2d'
...         )
>>> # Use existing implementations
>>> encoder = CNNEncoder(num_embeddings=64, embedding_dim=128, sequence_length=85,
...                       num_kernels=32, kernel_length=8)

class kale.embed.cnn.CNNEncoder(num_embeddings, embedding_dim, sequence_length, num_kernels, kernel_length)

Bases: BaseCNN

DeepDTA’s CNN encoder module, which comprises three 1D-convolutional layers and one max-pooling layer. The module encodes drug/target sequence information, and the input should be sequences that have already been integer/label encoded. The original paper is “DeepDTA: deep drug–target binding affinity prediction”.

This class now inherits from BaseCNN to leverage shared CNN utilities.

Parameters:

num_embeddings (int) – Number of embedding labels/categories, depends on the types of encoding sequence.
embedding_dim (int) – Dimension of embedding labels/categories.
sequence_length (int) – Max length of the input sequence.
num_kernels (int) – Number of kernels (filters).
kernel_length (int) – Length of kernel (filter).

forward(x)

Forward pass through the CNNEncoder.

Parameters:: x (torch.Tensor) – Input tensor containing embedded sequence data of shape (batch_size, sequence_length).
Returns:: Encoded feature vector of shape (batch_size, num_kernels * 3).
Return type:: torch.Tensor

output_size() → int

Return the output feature dimension of the encoder.

Returns:: Number of output features (num_kernels * 3).
Return type:: int
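
The shape arithmetic behind the encoder can be traced by hand. The sketch below assumes unpadded, stride-1 convolutions followed by a max-pool over all remaining positions; those settings and the helper name conv1d_out_length are assumptions for illustration, not read from kale's source. Only the final feature count, num_kernels * 3, is taken from the documentation above.

```python
def conv1d_out_length(length: int, kernel_length: int, stride: int = 1, padding: int = 0) -> int:
    """Output length of a 1D convolution (standard formula, dilation 1)."""
    return (length + 2 * padding - kernel_length) // stride + 1


# Values from the module-level example above.
sequence_length, kernel_length, num_kernels = 85, 8, 32

length = sequence_length
for _ in range(3):  # three stacked 1D convolutions
    length = conv1d_out_length(length, kernel_length)

# Max-pooling over every remaining position leaves one value per channel,
# so the flattened feature size depends only on the last channel count.
output_features = num_kernels * 3
```

Under these assumptions the sequence shrinks from 85 to 64 positions before pooling, and the encoder emits 96 features per sample regardless of the input length.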

class kale.embed.cnn.ProteinCNN(embedding_dim, num_filters, kernel_size, use_padding: bool = True)

Bases: BaseCNN

A protein feature extractor using Convolutional Neural Networks (CNNs).

This class extracts features from protein sequences using a series of 1D convolutional layers. The input protein sequence is first embedded and then passed through multiple convolutional and batch normalization layers to produce a fixed-size feature vector.

This class now inherits from BaseCNN to leverage shared CNN utilities.

Parameters:

embedding_dim (int) – Dimensionality of the embedding space for protein sequences.
num_filters (list of int) – A list specifying the number of filters for each convolutional layer.
kernel_size (list of int) – A list specifying the kernel size for each convolutional layer.
use_padding (bool, optional) – Whether to set a padding index on the embedding layer. When True, index 0 is treated as a padding token (its embedding is fixed at zero). Defaults to True. Note: This controls the padding_idx parameter of the embedding layer, not the convolutional layer padding (which is controlled by the conv_padding argument in BaseCNN utilities).
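
The effect of a padding index on an embedding lookup can be illustrated without any deep-learning framework. This is a hedged NumPy sketch of the general idea (the padding row is held at zero); the function embed and the toy vocabulary are hypothetical and do not reproduce ProteinCNN's internals.

```python
import numpy as np


def embed(indices, table, padding_idx=None):
    """Look up one embedding row per index; the padding row, if any, is all zeros."""
    table = np.array(table, dtype=float)  # copy so the caller's table is untouched
    if padding_idx is not None:
        table[padding_idx] = 0.0
    return table[np.asarray(indices)]


rng = np.random.default_rng(0)
table = rng.normal(size=(26, 8))   # hypothetical vocabulary of 26 tokens, embedding_dim=8
batch = [[5, 12, 3, 0, 0]]         # one sequence padded with index 0
features = embed(batch, table, padding_idx=0)
```

Padded positions contribute zero vectors, so later layers see no signal from them; without a padding index, index 0 would map to an ordinary learned vector.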

forward(v)

Forward pass through the ProteinCNN.

Parameters:

v (torch.Tensor) – Input tensor containing protein sequence indices of shape (batch_size, sequence_length).

Returns:

Extracted protein features of shape: (batch_size, sequence_length, num_filters[-1]).

Return type:

torch.Tensor

output_size() → int

Return the output feature dimension of the protein CNN.

Returns:: Number of output features (last filter size).
Return type:: int

class kale.embed.cnn.ContextCNNGeneric(cnn: Module, cnn_output_shape: Tuple[int, int, int, int], contextualizer: Module | Any, output_type: str)

Bases: Module

A template to construct a feature extractor consisting of a CNN followed by a sequence-to-sequence contextualizer like a Transformer-Encoder. Before inputting the CNN output tensor to the contextualizer, the tensor’s spatial dimensions are unrolled into a sequence.

Parameters:

cnn (nn.Module) – Any convolutional neural network that takes in batches of images of shape (batch_size, channels, height, width) and outputs tensor representations of shape (batch_size, out_channels, out_height, out_width).
cnn_output_shape (tuple) – A tuple of shape (batch_size, num_channels, height, width) describing the output shape of the given CNN (required).
contextualizer (nn.Module) – A sequence-to-sequence model that takes inputs of shape (num_timesteps, batch_size, num_features), uses attention to contextualize the sequence, and returns a sequence of exactly the same shape. This will mainly be a Transformer-Encoder (required).
output_type (string) – One of ‘sequence’ or ‘spatial’. If ‘spatial’, the final output of the model, which is a sequence, is reshaped to resemble the image-batch shape of the output of the CNN. If ‘sequence’, the output sequence is returned as is (required).

Examples

>>> import torch
>>> from torch import nn
>>> from kale.embed.cnn import ContextCNNGeneric
>>> cnn = nn.Sequential(nn.Conv2d(3, 32, kernel_size=3),
...                     nn.Conv2d(32, 64, kernel_size=3),
...                     nn.MaxPool2d(2))
>>> # A 16x16 input shrinks to 14x14, then 12x12 (unpadded convolutions), then 6x6 (pooling)
>>> cnn_output_shape = (-1, 64, 6, 6)
>>> contextualizer = nn.TransformerEncoderLayer(...)
>>> output_type = 'spatial'
>>>
>>> attention_cnn = ContextCNNGeneric(cnn, cnn_output_shape, contextualizer, output_type)
>>> output = attention_cnn(torch.randn((32, 3, 16, 16)))
>>>
>>> tuple(output.size())[1:] == cnn_output_shape[1:]  # True

forward(x: Tensor)

Pass the input through the cnn and then the contextualizer.

Parameters:: x – input image batch exactly as for CNNs (required).
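
The unrolling of spatial dimensions into a sequence, and the reverse step used for the ‘spatial’ output type, can be sketched with NumPy arrays. The row-major ordering of positions and the helper names below are assumptions for illustration; kale's own ordering may differ.

```python
import numpy as np


def unroll_to_sequence(feature_map):
    """(batch_size, channels, height, width) -> (height * width, batch_size, channels)."""
    batch_size, channels, height, width = feature_map.shape
    return feature_map.reshape(batch_size, channels, height * width).transpose(2, 0, 1)


def roll_to_spatial(sequence, height, width):
    """(height * width, batch_size, channels) -> (batch_size, channels, height, width)."""
    _, batch_size, channels = sequence.shape
    return sequence.transpose(1, 2, 0).reshape(batch_size, channels, height, width)


cnn_output = np.arange(2 * 3 * 4 * 5, dtype=float).reshape(2, 3, 4, 5)
sequence = unroll_to_sequence(cnn_output)      # what a contextualizer would receive
restored = roll_to_spatial(sequence, 4, 5)     # the 'spatial' output shape
```

Each spatial position becomes one timestep whose features are the channel values at that position, which is the (num_timesteps, batch_size, num_features) layout the contextualizer expects.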

class kale.embed.cnn.CNNTransformer(cnn: Module, cnn_output_shape: Tuple[int, int, int, int], num_layers: int, num_heads: int, dim_feedforward: int, dropout: float, output_type: str, positional_encoder: Module = None)

Bases: ContextCNNGeneric

A feature extractor consisting of a given CNN backbone followed by a standard Transformer-Encoder. See documentation of “ContextCNNGeneric” for more information.

Parameters:

cnn (nn.Module) – Any convolutional neural network that takes in batches of images of shape (batch_size, channels, height, width) and outputs tensor representations of shape (batch_size, out_channels, out_height, out_width) (required).
cnn_output_shape (tuple) – A tuple of shape (batch_size, num_channels, height, width) describing the output shape of the given CNN (required).
num_layers (int) – Number of attention layers in the Transformer-Encoder (required).
num_heads (int) – Number of attention heads in each transformer block (required).
dim_feedforward (int) – Number of neurons in the intermediate dense layer of each transformer feedforward block (required).
dropout (float) – Dropout rate of the transformer layers (required).
output_type (string) – One of ‘sequence’ or ‘spatial’. If ‘spatial’, the final output of the model, which is the sequence output of the Transformer-Encoder, is reshaped to resemble the image-batch shape of the output of the CNN (required).
positional_encoder (nn.Module) – None or a nn.Module that expects inputs of shape (sequence_length, batch_size, embedding_dim) and returns the same input after adding some positional information to the embeddings. If None, then the default and fixed sin-cos positional encodings of base transformers are applied (optional).

Examples

See pykale/examples/cifar_cnntransformer/model.py

kale.embed.factorization module

Python implementation of the tensor factorization algorithm Multilinear Principal Component Analysis (MPCA) and the matrix factorization algorithm Maximum Independence Domain Adaptation (MIDA).

class kale.embed.factorization.MPCA(var_ratio=0.97, max_iter=1, vectorize=False, n_components=None)

Bases: BaseEstimator, TransformerMixin

MPCA implementation compatible with scikit-learn

Parameters:

var_ratio (float, optional) – Percentage of variance explained (between 0 and 1). Defaults to 0.97.
max_iter (int, optional) – Maximum number of iterations. Defaults to 1.
vectorize (bool) – Whether to return the transformed/projected tensor as a vector. Defaults to False.
n_components (int) – Number of components to keep. Applies only when vectorize=True. Defaults to None.

proj_mats

A list of transposed projection matrices with shapes (P_1, I_1), …, (P_N, I_N), where (P_1, …, P_N) is the output tensor shape for each sample.

Type:: list of arrays

idx_order

The ordering index of projected (and vectorized) features in decreasing variance.

Type:: array-like

mean_

Per-feature empirical mean, estimated from the training set, shape (I_1, I_2, …, I_N).

Type:: array-like

shape_in

Input tensor shapes, i.e. (I_1, I_2, …, I_N).

Type:: tuple

shape_out

Output tensor shapes, i.e. (P_1, P_2, …, P_N).

Type:: tuple

Reference:: Haiping Lu, K.N. Plataniotis, and A.N. Venetsanopoulos, “MPCA: Multilinear Principal Component Analysis of Tensor Objects”, IEEE Transactions on Neural Networks, Vol. 19, No. 1, Page: 18-39, January 2008. For initial Matlab implementation, please go to https://uk.mathworks.com/matlabcentral/fileexchange/26168.

Examples

>>> import numpy as np
>>> from kale.embed.factorization import MPCA
>>> x = np.random.random((40, 20, 25, 20))
>>> x.shape
(40, 20, 25, 20)
>>> mpca = MPCA()
>>> x_projected = mpca.fit_transform(x)
>>> x_projected.shape
(40, 18, 23, 18)
>>> mpca.vectorize = True  # flatten the projected features
>>> x_projected = mpca.transform(x)
>>> x_projected.shape
(40, 7452)
>>> mpca.n_components = 50  # keep the 50 highest-variance features
>>> x_projected = mpca.transform(x)
>>> x_projected.shape
(40, 50)
>>> x_rec = mpca.inverse_transform(x_projected)
>>> x_rec.shape
(40, 20, 25, 20)

fit(x, y=None)

Fit the model with input training data x.

Args

x (array-like tensor): Input data, shape (n_samples, I_1, I_2, …, I_N), where n_samples is the number of samples and I_1, I_2, …, I_N are the dimensions of the corresponding modes (1, 2, …, N), respectively.

y (None): Ignored variable.

Returns:: self (object). Returns the instance itself.

transform(x)

Perform dimension reduction on x

Parameters:: x (array-like tensor) – Data to perform dimension reduction, shape (n_samples, I_1, I_2, …, I_N).
Returns:: Projected data in lower dimension, shape (n_samples, P_1, P_2, …, P_N) if self.vectorize==False. If self.vectorize==True, features will be sorted based on their explained variance ratio, shape (n_samples, P_1 * P_2 * … * P_N) if self.n_components is None, and shape (n_samples, n_components) if self.n_components is a valid integer.
Return type:: array-like tensor
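
The shapes described for transform follow from mode-wise projection followed by optional flattening. The NumPy sketch below uses random stand-in projection matrices of shape (P_n, I_n) rather than fitted ones, and the helpers project_modes and vectorize are hypothetical names; it illustrates the shape logic only and is not kale's implementation.

```python
import numpy as np


def project_modes(x, proj_mats):
    """Project x of shape (n_samples, I_1, ..., I_N) with matrices of shape (P_n, I_n)."""
    for mode, mat in enumerate(proj_mats, start=1):
        # Contract axis `mode` of x with the I_n axis of mat; tensordot puts the
        # new P_n axis last, so move it back to position `mode`.
        x = np.moveaxis(np.tensordot(x, mat, axes=(mode, 1)), -1, mode)
    return x


def vectorize(x_proj, idx_order, n_components=None):
    """Flatten each sample, reorder features, and optionally keep the first few."""
    flat = x_proj.reshape(x_proj.shape[0], -1)[:, idx_order]
    return flat if n_components is None else flat[:, :n_components]


rng = np.random.default_rng(0)
x = rng.random((40, 20, 25, 20))
mean = x.mean(axis=0)
# Stand-in projection matrices; a fitted MPCA would learn these from data.
proj_mats = [rng.random((p, i)) for p, i in [(18, 20), (23, 25), (18, 20)]]

x_proj = project_modes(x - mean, proj_mats)               # shape (40, 18, 23, 18)
variances = x_proj.reshape(40, -1).var(axis=0)
idx_order = np.argsort(variances)[::-1]                   # decreasing variance
x_vec = vectorize(x_proj, idx_order, n_components=50)     # shape (40, 50)
```

Leaving n_components as None in this sketch returns all 18 * 23 * 18 = 7452 flattened features per sample.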

inverse_transform(x)

Reconstruct projected data to the original shape and add the estimated mean back

Parameters:: x (array-like tensor) – Data to be reconstructed, shape (n_samples, P_1, P_2, …, P_N), if self.vectorize == False, where P_1, P_2, …, P_N are the reduced dimensions of corresponding mode (1, 2, …, N), respectively. If self.vectorize == True, shape (n_samples, self.n_components) or shape (n_samples, P_1 * P_2 * … * P_N).
Returns:: Reconstructed tensor in original shape, shape (n_samples, I_1, I_2, …, I_N)
Return type:: array-like tensor

set_fit_request(*, x: bool | None | str = '$UNCHANGED$') → MPCA

Configure whether metadata should be requested to be passed to the fit method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to fit.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:: x (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for x parameter in fit.
Returns:: self – The updated object.
Return type:: object

set_inverse_transform_request(*, x: bool | None | str = '$UNCHANGED$') → MPCA

Configure whether metadata should be requested to be passed to the inverse_transform method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

True: metadata is requested, and passed to inverse_transform if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to inverse_transform.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:: x (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for x parameter in inverse_transform.
Returns:: self – The updated object.
Return type:: object

set_transform_request(*, x: bool | None | str = '$UNCHANGED$') → MPCA

Configure whether metadata should be requested to be passed to the transform method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

True: metadata is requested, and passed to transform if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to transform.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:: x (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for x parameter in transform.
Returns:: self – The updated object.
Return type:: object

class kale.embed.factorization.BaseKernelDomainAdapter(num_components=None, ignore_y=False, augment=None, kernel='linear', gamma=None, degree=3, coef0=1, kernel_params=None, alpha=1.0, fit_inverse_transform=False, eigen_solver='auto', tol=0, max_iter=None, iterated_power='auto', remove_zero_eig=False, scale_components=False, random_state=None, copy=True, num_jobs=None)

Bases: ClassNamePrefixFeaturesOutMixin, TransformerMixin, BaseEstimator

Base class for kernel domain adaptation methods. Extendable to support different kernel-based domain adaptation methods (e.g., MIDA, TCA, SCA).

Parameters:

num_components (int, optional) – Number of components to keep. If None, all components are kept. Defaults to None.
ignore_y (bool, optional) – Whether to ignore the target variable y during fitting. Defaults to False.
augment (str, optional) – Whether to augment the input data with factors. Can be “pre” (prepend factors), “post” (append factors), or None (no augmentation). Defaults to None.
kernel (str or callable, optional) – Kernel function to use. Can be “linear”, “rbf”, “poly”, “sigmoid”, or a callable. Defaults to “linear”.
gamma (float, optional) – Kernel coefficient for “rbf”, “poly”, and “sigmoid” kernels. If None, defaults to 1 / num_features. Defaults to None.
degree (int, optional) – Degree of the polynomial kernel. Ignored by other kernels. Defaults to 3.
coef0 (float, optional) – Independent term in the polynomial and sigmoid kernels. Ignored by other kernels. Defaults to 1.
kernel_params (dict, optional) – Additional parameters for the kernel function. Defaults to None.
alpha (float, optional) – Regularization parameter for the kernel. Defaults to 1.0.
fit_inverse_transform (bool, optional) – Whether to fit the inverse transform for reconstruction. Defaults to False.
eigen_solver (str, optional) – Eigenvalue solver to use. Can be “auto”, “dense”, “arpack”, or “randomized”. Defaults to “auto”.
tol (float, optional) – Tolerance for convergence of the eigenvalue solver. Defaults to 0.
max_iter (int, optional) – Maximum number of iterations for the eigenvalue solver. If None, no limit is applied. Defaults to None.
iterated_power (int or str, optional) – Number of iterations for the randomized solver. Can be an integer or “auto”. Defaults to “auto”.
remove_zero_eig (bool, optional) – Whether to remove zero eigenvalues during postprocessing. Defaults to False.
scale_components (bool, optional) – Whether to scale the components by the square root of their eigenvalues. Defaults to False.
random_state (int, np.random.RandomState, or None, optional) – Random seed for reproducibility. Defaults to None.
copy (bool, optional) – Whether to copy the input data during validation. Defaults to True.
num_jobs (int or None, optional) – Number of jobs to run in parallel for pairwise kernel computations. Defaults to None.
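
The “pre” and “post” options of augment can be pictured as simple column stacking. This is a sketch of what the parameter description suggests, using a hypothetical helper augment_features; the library may handle augmentation differently internally.

```python
import numpy as np


def augment_features(x, factors, augment=None):
    """Prepend ("pre") or append ("post") factor columns to the feature matrix."""
    if augment is None:
        return x
    if augment == "pre":
        return np.hstack((factors, x))
    if augment == "post":
        return np.hstack((x, factors))
    raise ValueError("augment must be 'pre', 'post', or None")


x = np.arange(12, dtype=float).reshape(4, 3)             # (num_samples, num_features)
factors = np.array([[1, 0], [1, 0], [0, 1], [0, 1]])     # (num_samples, num_factors)

x_pre = augment_features(x, factors, augment="pre")
x_post = augment_features(x, factors, augment="post")
```

Either option widens the data to num_features + num_factors columns, which matches the augmented shape mentioned for inverse_transform.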

property orig_coef_: Coefficients projected to the original feature space with shape (num_components, num_features).

fit(x, y=None, group_labels=None, **fit_params)

Fit the model to the data x and target variable y.

Parameters:

x (array-like) – The input data with shape (num_samples, num_features).
y (array-like, optional) – The target variable (binary or multiclass classification label) with shape (num_samples). Set -1 for unknown labels for semi-supervised MIDA. Default is None.
group_labels (array-like, optional) – Categorical variables representing domain or grouping factors with shape (num_samples, num_factors). Preprocessing (e.g., one-hot encode domain, gender, or age groups) must be applied in advance. Default is None.
**fit_params – Additional parameters for fitting.

Returns:

The fitted model.

Return type:

self
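
Because group_labels must be preprocessed in advance, a typical preparation step is to one-hot encode each categorical factor and stack the results column-wise. The following NumPy sketch shows one way to do that; the helper one_hot and the toy domain and gender values are hypothetical.

```python
import numpy as np


def one_hot(labels):
    """One-hot encode a 1D sequence of categorical labels."""
    categories, codes = np.unique(np.asarray(labels), return_inverse=True)
    return np.eye(len(categories))[codes.reshape(-1)]


domain = ["a", "a", "b", "c"]     # e.g., acquisition site per sample
gender = ["f", "m", "f", "m"]

# Shape (num_samples, num_factors): 3 domain columns followed by 2 gender columns.
group_labels = np.hstack((one_hot(domain), one_hot(gender)))
```

The resulting array has one row per sample and can be passed as group_labels to fit, transform, or fit_transform.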

transform(x, group_labels=None)

Transform the input data x to factor-independent feature space using the fitted domain adapter.

Parameters:

x (array-like) – The input data with shape (num_samples, num_features).
group_labels (array-like, optional) – Categorical variables representing domain or grouping factors with shape (num_samples, num_factors). Preprocessing (e.g., one-hot encode domain, gender, or age groups) must be applied in advance. Default is None.

Returns:: The transformed data with shape (num_samples, num_components).
Return type:: array-like

inverse_transform(z)

Inverse transform the transformed data z back to the original space.

Parameters:: z (array-like) – The transformed data with shape (num_samples, num_components).

Returns:

The inverse transformed data with shape (num_samples, num_features), or (num_samples, num_features + num_factors) if augment is “pre” or “post”.

Return type:

array-like

fit_transform(x, y=None, group_labels=None, **fit_params)

Fit the model to the data x and target variable y, remove the effect of factors, and transform x.

Parameters:

x (array-like) – The input data with shape (num_samples, num_features).
y (array-like, optional) – The target variable (binary or multiclass classification label) with shape (num_samples). Set -1 for unknown labels for semi-supervised MIDA. Default is None.
group_labels (array-like, optional) – Categorical variables representing domain or grouping factors with shape (num_samples, num_factors). Preprocessing (e.g., one-hot encode domain, gender, or age groups) must be applied in advance. Default is None.
**fit_params – Additional parameters for fitting.

Returns:

The transformed data.

Return type:

array-like

set_fit_request(*, group_labels: bool | None | str = '$UNCHANGED$', x: bool | None | str = '$UNCHANGED$') → BaseKernelDomainAdapter

Configure whether metadata should be requested to be passed to the fit method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to fit.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:

group_labels (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for group_labels parameter in fit.
x (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for x parameter in fit.

Returns:

self – The updated object.

Return type:

object

set_inverse_transform_request(*, z: bool | None | str = '$UNCHANGED$') → BaseKernelDomainAdapter

Configure whether metadata should be requested to be passed to the inverse_transform method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

True: metadata is requested, and passed to inverse_transform if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to inverse_transform.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:: z (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for z parameter in inverse_transform.
Returns:: self – The updated object.
Return type:: object

set_transform_request(*, group_labels: bool | None | str = '$UNCHANGED$', x: bool | None | str = '$UNCHANGED$') → BaseKernelDomainAdapter

Configure whether metadata should be requested to be passed to the transform method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

True: metadata is requested, and passed to transform if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to transform.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:

group_labels (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for group_labels parameter in transform.
x (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for x parameter in transform.

Returns:

self – The updated object.

Return type:

object

class kale.embed.factorization.MIDA(num_components=None, mu=1.0, eta=1.0, ignore_y=False, augment=None, kernel='linear', gamma=None, degree=3, coef0=1, kernel_params=None, alpha=1, fit_inverse_transform=False, eigen_solver='auto', tol=0, max_iter=None, iterated_power='auto', remove_zero_eig=False, scale_components=False, random_state=None, copy=True, num_jobs=None)

Bases: BaseKernelDomainAdapter

Maximum Independence Domain Adaptation (MIDA). A kernel-based domain adaptation method that removes the effect of factors/covariates from the data by learning a feature space that is maximally independent of them, as measured by the Hilbert-Schmidt independence criterion (HSIC).

To prevent label leakage, please set the label for the target indices to -1.

Parameters:

num_components (int, optional) – Number of components to keep. If None, all components are kept.
mu (float, optional) – L2 kernel regularization coefficient. Default is 1.0.
eta (float, optional) – Class-dependency regularization coefficient. Default is 1.0.
ignore_y (bool, optional) – Whether to ignore the target variable y during fitting. Default is False.
augment (str, optional) – Whether to augment the input data with factors. Can be “pre” (prepend factors), “post” (append factors), or None (no augmentation). Defaults to None.
kernel (str, optional) – Kernel type to be used. Default is ‘linear’.
gamma (float, optional) – Kernel coefficient for ‘rbf’, ‘poly’, and ‘sigmoid’ kernels. Default is None.
degree (int, optional) – Degree of the polynomial kernel. Default is 3.
coef0 (float, optional) – Independent term in the polynomial and sigmoid kernels. Default is 1.
kernel_params (dict, optional) – Additional kernel parameters. Default is None.
alpha (float, optional) – Regularization parameter. Default is 1.0.
fit_inverse_transform (bool, optional) – Whether to fit the inverse transform. Default is False.
eigen_solver (str, optional) – Eigendecomposition solver to use. Default is ‘auto’.
tol (float, optional) – Tolerance for convergence. Default is 0.
max_iter (int, optional) – Maximum number of iterations for the solver. Default is None.
iterated_power (int or str, optional) – Number of iterations for randomized solver. Default is ‘auto’.
remove_zero_eig (bool, optional) – Whether to remove zero eigenvalues. Default is False.
scale_components (bool, optional) – Whether to scale the components. Default is False.
random_state (int or np.random.RandomState, optional) – Random seed for reproducibility. Default is None.
copy (bool, optional) – Whether to copy the input data. Default is True.
num_jobs (int, optional) – Number of jobs to run in parallel for joblib.Parallel. Default is None.
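
The kernel, gamma, degree, and coef0 parameters follow the usual kernel definitions. The NumPy sketch below writes out those textbook formulas for a single data matrix, with gamma falling back to 1 / num_features as described for the base class; kernel_matrix is a hypothetical helper, not the routine kale calls.

```python
import numpy as np


def kernel_matrix(x, kernel="linear", gamma=None, degree=3, coef0=1):
    """Pairwise kernel values between the rows of x."""
    x = np.asarray(x, dtype=float)
    if gamma is None:
        gamma = 1.0 / x.shape[1]
    gram = x @ x.T
    if kernel == "linear":
        return gram
    if kernel == "poly":
        return (gamma * gram + coef0) ** degree
    if kernel == "sigmoid":
        return np.tanh(gamma * gram + coef0)
    if kernel == "rbf":
        sq_norms = np.diag(gram)
        # Squared Euclidean distances, clipped to guard against rounding below zero.
        sq_dists = np.maximum(sq_norms[:, None] + sq_norms[None, :] - 2 * gram, 0.0)
        return np.exp(-gamma * sq_dists)
    raise ValueError(f"unknown kernel: {kernel}")


x = np.array([[1.0, 0.0], [0.0, 1.0]])
k_linear = kernel_matrix(x)
k_poly = kernel_matrix(x, kernel="poly")
k_rbf = kernel_matrix(x, kernel="rbf")
```

Note that degree only affects the polynomial kernel and coef0 only the polynomial and sigmoid kernels, as the parameter descriptions state.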

References

[1] Yan, K., Kou, L. and Zhang, D., 2018. Learning domain-invariant subspace using domain features and independence maximization. IEEE Transactions on Cybernetics, 48(1), pp.288-299.

Examples

>>> import numpy as np
>>> from kale.embed.factorization import MIDA
>>> # Generate random synthetic data
>>> x_source = np.random.normal(loc=5, scale=1, size=(20, 40))
>>> x_target = np.random.normal(loc=-5, scale=1, size=(20, 40))
>>> y = np.array([0] * 10 + [1] * 10 + [0] * 10 + [1] * 10)
>>> # Concatenate source and target data
>>> x = np.vstack((x_source, x_target))
>>> target_indices = np.arange(20, 40)
>>> # Mask the target indices with -1
>>> y[target_indices] = -1
>>> # Create factors (e.g., one-hot encoded domain labels)
>>> factors = np.concatenate((np.zeros((20, 1)), np.ones((20, 1))), axis=0)
>>> mida = MIDA()
>>> x_projected = mida.fit_transform(x, y, group_labels=factors)
>>> x_projected.shape
(40, 18)
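
The Hilbert-Schmidt independence criterion named in the class description can be estimated from two kernel matrices. The sketch below uses the common biased empirical estimator, tr(K H L H) / (n - 1)^2, with linear kernels on toy data; it illustrates the criterion only, and the function hsic is a hypothetical helper rather than part of kale's API.

```python
import numpy as np


def hsic(k_features, k_factors):
    """Biased empirical HSIC between two (n, n) kernel matrices."""
    n = k_features.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    return np.trace(k_features @ centering @ k_factors @ centering) / (n - 1) ** 2


domain = np.array([[1.0], [1.0], [-1.0], [-1.0]])        # toy domain factor
dependent = domain.copy()                                 # feature that mirrors the domain
unrelated = np.array([[1.0], [-1.0], [1.0], [-1.0]])      # feature uncorrelated with the domain

k_domain = domain @ domain.T
hsic_dependent = hsic(dependent @ dependent.T, k_domain)
hsic_unrelated = hsic(unrelated @ unrelated.T, k_domain)
```

A feature that simply encodes the domain scores high, while one that is uncorrelated with the domain scores zero under a linear kernel; lower values indicate weaker dependence on the factors.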

set_fit_request(*, group_labels: bool | None | str = '$UNCHANGED$', x: bool | None | str = '$UNCHANGED$') → MIDA

Configure whether metadata should be requested to be passed to the fit method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to fit.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:

group_labels (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for group_labels parameter in fit.
x (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for x parameter in fit.

Returns:

self – The updated object.

Return type:

object

set_inverse_transform_request(*, z: bool | None | str = '$UNCHANGED$') → MIDA

Configure whether metadata should be requested to be passed to the inverse_transform method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

True: metadata is requested, and passed to inverse_transform if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to inverse_transform.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:: z (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for z parameter in inverse_transform.
Returns:: self – The updated object.
Return type:: object

set_transform_request(*, group_labels: bool | None | str = '$UNCHANGED$', x: bool | None | str = '$UNCHANGED$') → MIDA

Configure whether metadata should be requested to be passed to the transform method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

True: metadata is requested, and passed to transform if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to transform.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:

group_labels (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for group_labels parameter in transform.
x (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for x parameter in transform.

Returns:

self – The updated object.

Return type:

object

kale.embed.gcn module

class kale.embed.gcn.GCNEncoderLayer(in_channels, out_channels, improved=False, cached=False, bias=True, **kwargs)

Bases: MessagePassing

Modification of PyTorch Geometric’s nn.GCNConv, which reduces the computational cost of the GCN layer for the GripNet model. It implements the graph convolutional operator from the “Semi-supervised Classification with Graph Convolutional Networks” (ICLR 2017) paper.

\[\mathbf{X}^{\prime} = \mathbf{\hat{D}}^{-1/2} \mathbf{\hat{A}} \mathbf{\hat{D}}^{-1/2} \mathbf{X} \mathbf{\Theta},\]

where $\mathbf{\hat{A}} = \mathbf{A} + \mathbf{I}$ denotes the adjacency matrix with inserted self-loops and $\hat{D}_{ii} = \sum_{j=0} \hat{A}_{ij}$ its diagonal degree matrix.
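The propagation rule above can be checked on a tiny graph with dense NumPy arrays. This is only an illustrative sketch of the formula, not the layer's implementation (which works on sparse edge_index tensors); the function name gcn_propagate and the toy inputs are assumptions made for the example.

```python
import numpy as np


def gcn_propagate(adj, x, theta, improved=False):
    """Dense sketch of X' = D^-1/2 (A + fill * I) D^-1/2 X Theta."""
    # Self-loop weight: 1 by default, 2 when improved=True.
    fill = 2.0 if improved else 1.0
    a_hat = adj + fill * np.eye(adj.shape[0])
    # Degree of each node in the graph with self-loops.
    deg = a_hat.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    # Symmetric normalization: scale rows and columns by D^-1/2.
    a_norm = d_inv_sqrt[:, None] * a_hat * d_inv_sqrt[None, :]
    return a_norm @ x @ theta


# Hypothetical path graph 0 - 1 - 2 with one-hot node features.
adj = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
x = np.eye(3)
theta = np.ones((3, 2))
out = gcn_propagate(adj, x, theta)
print(out.shape)
```

Because every node receives a self-loop, all degrees are positive and the inverse square root is always defined.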

Note: For more information please see PyTorch Geometric’s nn.GCNConv docs.

Parameters:

in_channels (int) – Size of each input sample.
out_channels (int) – Size of each output sample.
improved (bool, optional) – If set to True, the layer computes $\mathbf{\hat{A}}$ as $\mathbf{A} + 2\mathbf{I}$. (default: False)
cached (bool, optional) – If set to True, the layer will cache the computation of $\mathbf{\hat{D}}^{-1/2} \mathbf{\hat{A}} \mathbf{\hat{D}}^{-1/2}$ on first execution, and will use the cached version for further executions. This parameter should only be set to True in transductive learning scenarios. (default: False)
bias (bool, optional) – If set to False, the layer will not learn an additive bias. (default: True)
**kwargs (optional) – Additional arguments of torch_geometric.nn.conv.MessagePassing.

reset_parameters()

static norm(edge_index, num_nodes, edge_weight, improved=False, dtype=None): Add self-loops and apply symmetric normalization

forward(x, edge_index, edge_weight=None)

Parameters:

x (torch.Tensor) – The input node feature embedding.
edge_index (torch.Tensor) – Graph edge index in COO format with shape [2, num_edges].
edge_weight (torch.Tensor, optional) – The one-dimensional relation weight for each edge in edge_index (default: None).

class kale.embed.gcn.RGCNEncoderLayer(in_channels, out_channels, num_relations, num_bases, after_relu, bias=False, **kwargs)

Bases: MessagePassing

Modification of PyTorch Geometric’s nn.RGCNConv, which reduces the computational and memory cost of the RGCN encoder layer for the GripNet model. It implements the relational graph convolutional operator from the “Modeling Relational Data with Graph Convolutional Networks” paper.

\[\mathbf{x}^{\prime}_i = \mathbf{\Theta}_{\textrm{root}} \cdot \mathbf{x}_i + \sum_{r \in \mathcal{R}} \sum_{j \in \mathcal{N}_r(i)} \frac{1}{|\mathcal{N}_r(i)|} \mathbf{\Theta}_r \cdot \mathbf{x}_j,\]

where $\mathcal{R}$ denotes the set of relations, i.e. edge types. Edge type needs to be a one-dimensional torch.long tensor which stores a relation identifier $\in \{ 0, \ldots, |\mathcal{R}| - 1\}$ for each edge.
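As a rough illustration of the relational update above, the following NumPy sketch applies a root transform and then adds, for each relation, the mean of the transformed neighbour features. It uses a row-vector convention (x @ theta), ignores the basis decomposition, and the names rgcn_layer, theta_root and theta_rel are assumptions for the example rather than attributes of the layer.

```python
import numpy as np


def rgcn_layer(x, edge_index, edge_type, theta_root, theta_rel):
    """Sketch of x'_i = x_i Theta_root + sum_r mean_{j in N_r(i)} x_j Theta_r.

    x: (num_nodes, in_dim); edge_index: (2, num_edges) with rows (source j, target i);
    edge_type: (num_edges,); theta_root: (in_dim, out_dim); theta_rel: (R, in_dim, out_dim).
    """
    out = x @ theta_root
    src, dst = edge_index
    for r in range(theta_rel.shape[0]):
        mask = edge_type == r
        msg_sum = np.zeros_like(out)
        count = np.zeros(x.shape[0])
        # Sum the transformed neighbour features per target node for relation r.
        np.add.at(msg_sum, dst[mask], x[src[mask]] @ theta_rel[r])
        np.add.at(count, dst[mask], 1.0)
        # Normalize by |N_r(i)| only where a node has neighbours under relation r.
        has_nbr = count > 0
        out[has_nbr] += msg_sum[has_nbr] / count[has_nbr, None]
    return out


# Hypothetical toy graph: edges 1->0 and 2->0 of type 0, edge 0->1 of type 1.
x = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
edge_index = np.array([[1, 2, 0], [0, 0, 1]])
edge_type = np.array([0, 0, 1])
theta_root = np.eye(2)
theta_rel = np.stack([np.eye(2), 2.0 * np.eye(2)])
out = rgcn_layer(x, edge_index, edge_type, theta_root, theta_rel)
print(out)
```

Node 2 has no incoming edges, so its output is just the root-transformed feature.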

Note: For more information please see PyTorch Geometric’s nn.RGCNConv docs.

Parameters:

in_channels (int) – Size of each input sample.
out_channels (int) – Size of each output sample.
num_relations (int) – Number of edge relations.
num_bases (int) – Number of bases used in the basis-decomposition regularization scheme.
after_relu (bool) – Whether input embedding is activated by relu function or not.
bias (bool) – If set to False, the layer will not learn an additive bias. (default: False)
**kwargs (optional) – Additional arguments of torch_geometric.nn.conv.MessagePassing.

reset_parameters()

forward(x, edge_index, edge_type, range_list)

Parameters:

x (torch.Tensor) – The input node feature embedding.
edge_index (torch.Tensor) – Graph edge index in COO format with shape [2, num_edges].
edge_type (torch.Tensor) – The one-dimensional relation type/index for each edge in edge_index.
range_list (torch.Tensor) – The index range list of each edge type with shape [num_types, 2].
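One way to read range_list is as the start and end positions of each edge type within an edge list that is sorted by type. The sketch below builds such a table under that assumption (half-open [start, end) ranges, edges pre-sorted by edge_type); the helper build_range_list is hypothetical and not part of kale.

```python
import numpy as np


def build_range_list(edge_type, num_types):
    """Return an array of shape [num_types, 2] holding [start, end) per edge type.

    Assumes edge_type is sorted in ascending order.
    """
    counts = np.bincount(edge_type, minlength=num_types)
    ends = np.cumsum(counts)
    starts = ends - counts
    return np.stack([starts, ends], axis=1)


# Six edges: two of type 0, one of type 1, three of type 2.
edge_type = np.array([0, 0, 1, 2, 2, 2])
range_list = build_range_list(edge_type, num_types=3)
print(range_list)
```

With such a table, the edges of relation r can be sliced as edge_index[:, start:end] without building a boolean mask for every relation.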

class kale.embed.gcn.GCNEncoder(in_channel=78, out_channel=128, dropout_rate=0.2)

Bases: Module

The GraphDTA’s GCN encoder module, which comprises three graph convolutional layers and one fully connected layer. The model is a variant of DeepDTA and is applied to encoding drug molecule graph information. The original paper is “GraphDTA: Predicting drug–target binding affinity with graph neural networks”.

Parameters:

in_channel (int) – Dimension of each input node feature.
out_channel (int) – Dimension of each output node feature.
dropout_rate (float) – dropout rate during training.

forward(x, edge_index, batch)

class kale.embed.gcn.MolecularGCN(in_feats, dim_embedding=128, padding=True, hidden_feats=None, activation=None)

Bases: Module

A molecular feature extractor using a Graph Convolutional Network (GCN).

This class implements a GCN to extract features from molecular graphs. It includes an initial linear transformation followed by a series of graph convolutional layers. The output is a fixed-size feature vector for each molecule.

Parameters:

in_feats (int) – Number of input features each node has.
dim_embedding (int) – Dimensionality of the embedding space after the initial linear transformation.
padding (bool) – Whether to apply padding (set certain weights to zero).
hidden_feats (list of int) – A list specifying the number of hidden units for each GCN layer.
activation (callable, optional) – Activation function to apply after each GCN layer.

forward(batch_graph)

kale.embed.image_cnn module

kale.embed.model_lib.ban module

kale.embed.model_lib.gripnet module

The GripNet model proposed in the paper “GripNet: Graph Information Propagation on Supergraph for Heterogeneous Graphs” (Pattern Recognition, 2022, https://doi.org/10.1016/j.patcog.2022.108973), an efficient framework for learning node representations on heterogeneous graphs for downstream link prediction, node classification, and visualization. The code is based on https://github.com/NYXFLOWER/GripNet.

class kale.embed.model_lib.gripnet.GripNetInternalModule(in_channels: int, num_edge_type: int, start_supervertex: bool, setting: SuperVertexParaSetting)

Bases: Module

The internal module of a supervertex, which is composed of an internal feature layer and multiple internal aggregation layers.

Parameters:

in_channels (int) – the dimension of node features on this supervertex.
num_edge_type (int) – the number of edge types on this supervertex.
start_supervertex (bool) – whether this supervertex is a start supervertex on the supergraph.
setting (SuperVertexParaSetting) – supervertex parameter settings.

forward(x: Tensor, edge_index: Tensor, edge_type: Tensor = None, range_list: Tensor = None, edge_weight: Tensor = None) → Tensor

Parameters:

x (torch.Tensor) – the input node feature embedding.
edge_index (torch.Tensor) – edge index in COO format with shape [2, #edges].
edge_type (torch.Tensor, optional) – one-dimensional relation type for each edge, indexed from 0. Defaults to None.
range_list (torch.Tensor, optional) – The index range list of each edge type with shape [num_types, 2]. Defaults to None.
edge_weight (torch.Tensor, optional) – one-dimensional weight for each edge. Defaults to None.

Note: The internal feature layer is computed in the forward function of GripNet class. If the supervertex is not a start supervertex, x should be the sum or concat of the outputs of the internal feature layer and all external aggregation layers.

class kale.embed.model_lib.gripnet.GripNetExternalModule(in_channels: int, out_channels: int, num_out_node: int)

Bases: Module

The external module of a supervertex, which is an external feature layer.

Parameters:

in_channels (int) – Size of each input sample. In GripNet, it should be the dimension of the output embedding of the corresponding parent supervertex.
out_channels (int) – Size of each output sample. In GripNet, it is the dimension of the output embedding of the supervertex.
num_out_node (int) – the number of output nodes.

forward(x: Tensor, edge_index: Tensor, edge_weight: Tensor = None, use_relu=True)

Parameters:

x (torch.Tensor) – the input node feature embedding.
edge_index (torch.Tensor) – edge index in COO format with shape [2, #edges].
edge_weight (torch.Tensor, optional) – one-dimensional weight for each edge. Defaults to None.
use_relu (bool, optional) – whether to use ReLU before returning node feature embeddings. Defaults to True.

class kale.embed.model_lib.gripnet.GripNet(supergraph: SuperGraph)

Bases: Module

The GripNet model.

Parameters:: supergraph (SuperGraph) – the supergraph.

Reference:: Xu, H., Sang, S., Bai, P., Li, R., Yang, L. and Lu, H., 2022. GripNet: Graph Information Propagation on Supergraph for Heterogeneous Graphs. Pattern Recognition, p.108973.

forward()

kale.embed.model_lib.mogonet module

Construct a message passing network using PyTorch Geometric for the MOGONET method. MOGONET is a multiomics fusion framework for cancer classification and biomarker identification that utilizes supervised graph convolutional networks for omics datasets.

This code is written by refactoring the MOGONET code (https://github.com/txWang/MOGONET/blob/main/models.py) within the ‘MessagePassing’ base class provided by PyTorch Geometric.

Reference: Wang, T., Shao, W., Huang, Z., Tang, H., Zhang, J., Ding, Z., Huang, K. (2021). MOGONET integrates multi-omics data using graph convolutional networks allowing patient classification and biomarker identification. Nature communications. https://www.nature.com/articles/s41467-021-23774-w

class kale.embed.model_lib.mogonet.MogonetGCNConv(in_channels: int, out_channels: int, bias: bool = True, aggr: str | List[str] | Aggregation | None = 'add', **kwargs)

Bases: MessagePassing

Create message passing layers for the MOGONET method. Each layer is defined as:

\[H^{(l+1)}=f(H^{(l)}, A) = \sigma(AH^{(l)}W^{(l)})\]

where $\mathbf{H^{(l)}}$ is the input of the $l$-th layer and $\mathbf{W^{(l)}}$ is the weight matrix of the $l$-th layer. $\sigma(.)$ denotes a non-linear activation function.
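The layer rule above can be written out with dense arrays as a quick sanity check. This is a minimal sketch, assuming ReLU for the non-linearity and a pre-normalized dense adjacency matrix; the function name gcn_layer and the toy values are illustrative only.

```python
import numpy as np


def gcn_layer(adj, h, w, activation=lambda z: np.maximum(z, 0.0)):
    """Sketch of H^(l+1) = sigma(A H^(l) W^(l)) with ReLU as an assumed sigma."""
    return activation(adj @ h @ w)


# Two nodes that average each other's features (hypothetical normalized adjacency).
adj = np.array([[0.5, 0.5], [0.5, 0.5]])
h0 = np.array([[1.0, -1.0], [3.0, 1.0]])
w0 = np.array([[1.0, -1.0], [0.0, 1.0]])
h1 = gcn_layer(adj, h0, w0)
print(h1)
```

Stacking layers amounts to feeding h1 back in as the input of the next call with its own weight matrix.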

For more information please refer to the MOGONET paper.

Parameters:

in_channels (int) – Size of each input sample.
out_channels (int) – Size of each output sample.
bias (bool, optional) – If set to False, the layer will not learn an additive bias. (default: True)
aggr (string or list or Aggregation, optional) – The aggregation scheme to use, e.g., "add", "sum", "mean", "min", "max" or "mul".
**kwargs (optional) – Additional arguments of torch_geometric.nn.conv.MessagePassing.

reset_parameters() → None: Reset all parameters of the model.

forward(x: Tensor, edge_index: SparseTensor) → Tensor

message(x_j: Tensor) → Tensor: Construct messages from node $j$ to node $i$ for each edge in edge_index.

message_and_aggregate(adj_t: SparseTensor | Tensor, x: Tensor) → Tensor: Fuse computations of message() and aggregate() into a single function.

update(aggr_out: Tensor) → Tensor: Update node embeddings for each node $i \in \mathcal{V}$.

class kale.embed.model_lib.mogonet.MogonetGCN(in_channels: int, hidden_channels: List[int], dropout: float)

Bases: Module

Create the structure of the graph convolutional network in the MOGONET method. For more information please refer to the MOGONET paper.

Parameters:

in_channels (int) – Size of each input sample.
hidden_channels (List[int]) – A list of sizes of hidden layers.
dropout (float) – Probability of an element to be zeroed.

forward(x: Tensor, edge_index: SparseTensor) → Tensor

kale.embed.multimodal_encoder module

kale.embed.multimodal_fusion module

This module implements four different multimodal fusion methods:

1. Concat
2. BimodalInteractionFusion
3. LowRankTensorFusion
4. ProductOfExperts

Each of these fusion methods is designed to work with input modalities as PyTorch tensors and performs a different operation to combine them into a joint representation of the input data.

Reference: https://github.com/pliang279/MultiBench/blob/main/fusions/common_fusions.py

class kale.embed.multimodal_fusion.Concat

Bases: Module

Concat is a simple PyTorch module for fusing multimodal data by concatenating tensors along dimension 1. This fusion method is often used in multimodal learning where data from different modalities (e.g., image, audio) are processed separately and then fused together for further processing or decision-making. Each modality data is first flattened from its second dimension onward and then these flattened tensors are concatenated together. This approach to fusion maintains the independence of the modalities before the fusion point, allowing the network to learn separate representations for each modality before combining them.
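The flatten-then-concatenate behaviour described above can be sketched with NumPy arrays standing in for PyTorch tensors. The function concat_fusion and the example shapes are assumptions for illustration, not the module's code.

```python
import numpy as np


def concat_fusion(modalities):
    """Flatten each modality from its second dimension onward, then join along dim 1."""
    flat = [m.reshape(m.shape[0], -1) for m in modalities]
    return np.concatenate(flat, axis=1)


# Hypothetical batch of 4 samples: an image-like tensor and an audio feature vector.
image = np.zeros((4, 3, 8, 8))
audio = np.ones((4, 20))
fused = concat_fusion([image, audio])
print(fused.shape)  # (4, 212)
```

The batch dimension is preserved, and the fused feature size is the sum of the flattened sizes of all modalities (here 3 * 8 * 8 + 20).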

forward(modalities)

class kale.embed.multimodal_fusion.BimodalInteractionFusion(input_dims, output_dim, output, flatten=False, clip=None, grad_clip=None, flip=False)

Bases: Module

BimodalInteractionFusion is a PyTorch module that performs fusion of two data modalities through a hypernetwork-based interaction mechanism. The ‘input_dims’ argument specifies the input dimensions of the two modalities. The ‘output_dim’ argument specifies the output dimension after the fusion. The ‘output’ argument defines the type of bimodal matrix interactions to be performed, which can be ‘matrix’, ‘vector’, or ‘scalar’.

This fusion method supports three types of bimodal interactions:

Matrix: It implements a general hypernetwork mechanism where the interaction is multiplicative. It uses separate weight matrices and biases for the two modalities.
Vector: It uses diagonal forms and gating mechanisms, applying element-wise multiplication to combine the modalities.
Scalar: It applies scales and biases to the input modalities before combining them.

This fusion method uses xavier normal distribution for initializing the weight matrices and normal distribution for the biases. It also provides options to clip the parameter values and their gradients within specified ranges to prevent them from exploding or vanishing. This fusion approach allows for complex interactions between the modalities and is well-suited for tasks that require the integration of heterogeneous data.

Parameters:

input_dims (list or tuple) – list or tuple of 2 integers indicating input dimensions of the 2 modalities
output_dim (int) – output dimension after the fusion
output (str) – type of BimodalMatrix Interactions, options from ‘matrix’, ‘vector’, ‘scalar’
flatten (bool) – whether we need to flatten the input modalities
clip (tuple, optional) – clip parameter values, None if no clip
grad_clip (tuple, optional) – clip grad values, None if no clip
flip (bool) – whether to swap the two input modalities in forward function or not

forward(modalities)

class kale.embed.multimodal_fusion.LowRankTensorFusion(input_dims, output_dim, rank, flatten=True)

Bases: Module

LowRankTensorFusion is a PyTorch module that performs multimodal fusion using a low-rank tensor-based approach. The ‘input_dims’ argument specifies the input dimensions of each modality. The ‘output_dim’ argument defines the output dimension after the fusion. The ‘rank’ argument is a hyperparameter specifying the rank for the low-rank tensor decomposition.

This fusion method performs fusion by assuming a low-rank structure for the interaction tensor, effectively compressing the interaction space. It leverages a set of low-rank factors, one for each modality, that are learned during training. These factors are initialized with the Xavier normal distribution and are applied to their corresponding modalities during the forward pass. A tensor product is computed across all modalities and their respective factors, resulting in a fused tensor. Next, a weighted summation of this fused tensor is computed using fusion weights, followed by the addition of a fusion bias. Both fusion weights and bias are learnable parameters, initialized with the Xavier normal distribution and zero, respectively. The final output is reshaped to the specified ‘output_dim’ and returned. If ‘flatten’ is set to True, each modality is first flattened before concatenation with a ones tensor and the subsequent multiplication with its factor.

This approach provides an efficient and compact representation for capturing interactions among multiple modalities, making it suitable for tasks involving high-dimensional multimodal data.
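The forward pass described above can be sketched in NumPy as follows. This is an illustration under stated assumptions, not the module's implementation: factors are taken to have shape (rank, input_dim + 1, output_dim), the fusion weights are a vector of length rank, random values stand in for learned parameters, and the function name low_rank_fusion is hypothetical.

```python
import numpy as np


def low_rank_fusion(modalities, factors, fusion_weights, fusion_bias):
    """Sketch of low-rank fusion: per-modality projections multiplied element-wise."""
    batch_size = modalities[0].shape[0]
    fused = None
    for modality, factor in zip(modalities, factors):
        flat = modality.reshape(batch_size, -1)
        # Append a constant 1 so lower-order interactions are retained.
        with_ones = np.concatenate([np.ones((batch_size, 1)), flat], axis=1)
        # Project with each rank-specific factor: result is (rank, batch, output_dim).
        projected = np.einsum("bd,rdo->rbo", with_ones, factor)
        fused = projected if fused is None else fused * projected
    # Weighted sum over the rank dimension, then add the bias.
    return np.einsum("r,rbo->bo", fusion_weights, fused) + fusion_bias


rng = np.random.default_rng(0)
input_dims, output_dim, rank, batch_size = (5, 7), 3, 2, 4
modalities = [rng.normal(size=(batch_size, d)) for d in input_dims]
factors = [rng.normal(size=(rank, d + 1, output_dim)) for d in input_dims]
fusion_weights = rng.normal(size=rank)
fusion_bias = np.zeros(output_dim)
fused = low_rank_fusion(modalities, factors, fusion_weights, fusion_bias)
print(fused.shape)  # (4, 3)
```

The element-wise product of the per-modality projections avoids ever materializing the full outer-product tensor, which is where the savings over an explicit tensor fusion come from.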

Parameters:

input_dims (list or tuple) – A list or tuple of integers indicating input dimensions of the modalities.
output_dim (int) – output dimension after the fusion.
rank (int) – A hyperparameter specifying the rank for the low-rank tensor decomposition.
flatten (bool) – Whether each input modality is flattened before fusion. Default: True

Note

Adapted from https://github.com/Justin1904/Low-rank-Multimodal-Fusion.

forward(modalities)

class kale.embed.multimodal_fusion.ProductOfExperts(*args, **kwargs)

Bases: Module

ProductOfExperts combines multiple independent Gaussian distributions (“experts”) into a single Gaussian by computing their product in closed form. This is commonly used in multimodal variational autoencoders (VAE) and probabilistic models to fuse information from different modalities or sources, yielding a consensus latent distribution.

For each expert, the inputs are mean (mean) and log-variance (log_var) tensors, and the output is the mean and log-variance of the combined product Gaussian. This formulation enables principled uncertainty fusion and robust inference in the presence of missing or noisy modalities.
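The closed-form product of Gaussians can be sketched as a precision-weighted average: the combined precision is the sum of the experts' precisions, and the combined mean is the precision-weighted mean. The NumPy version below is a minimal sketch of that standard result; where exactly eps enters is an assumption here and may differ from the module's implementation.

```python
import numpy as np


def product_of_experts(mean, log_var, eps=1e-8):
    """Combine Gaussian experts stacked along axis 0.

    mean, log_var: arrays of shape (num_experts, batch_size, latent_dim).
    """
    var = np.exp(log_var) + eps          # eps placement is an assumption
    precision = 1.0 / var
    prod_var = 1.0 / precision.sum(axis=0)
    prod_mean = (mean * precision).sum(axis=0) * prod_var
    return prod_mean, np.log(prod_var)


# Two unit-variance experts with means 0 and 2 (hypothetical values).
mean = np.array([[[0.0]], [[2.0]]])
log_var = np.zeros((2, 1, 1))
prod_mean, prod_log_var = product_of_experts(mean, log_var)
print(prod_mean, np.exp(prod_log_var))
```

With equal variances the combined mean lands midway between the experts and the variance halves, which matches the intuition that agreeing sources make the fused estimate more certain.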

Example

>>> poe = ProductOfExperts()
>>> combined_mu, combined_log_var = poe(mu_experts, log_var_experts)

forward(mean, log_var, eps=1e-08)

Parameters:

mean (Tensor) – Mean values from all experts, shape (num_experts, batch_size, latent_dim)
log_var (Tensor) – Log-variance values from all experts, shape (num_experts, batch_size, latent_dim)
eps (float, optional) – Small value for numerical stability. Default is 1e-8.

Returns:

prod_mean (Tensor) – Mean of the combined product Gaussian, shape (batch_size, latent_dim).
prod_log_var (Tensor) – Log-variance of the combined product Gaussian, shape (batch_size, latent_dim).

Return type:

Tuple[Tensor, Tensor]

kale.embed.nn module

class kale.embed.nn.RandomLayer(input_dim_list, output_dim=256)

Bases: Module

The RandomLayer is designed to apply random matrix multiplications to a list of input tensors. Each input tensor is multiplied by a randomly initialized matrix, and the results are combined through element-wise multiplication.
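The operation described above can be sketched in NumPy: each input is projected by its own fixed random matrix and the projections are multiplied element-wise. The function random_layer is a hypothetical stand-in, and any output scaling the actual layer may apply is omitted.

```python
import numpy as np


def random_layer(input_list, random_matrices):
    """Project each input with its random matrix and combine by element-wise product."""
    projected = [x @ r for x, r in zip(input_list, random_matrices)]
    out = projected[0]
    for p in projected[1:]:
        out = out * p
    return out


rng = np.random.default_rng(0)
input_dim_list, output_dim, batch_size = [10, 4], 6, 3
# One random matrix per input, mapping input_dim -> output_dim.
random_matrices = [rng.normal(size=(d, output_dim)) for d in input_dim_list]
inputs = [rng.normal(size=(batch_size, d)) for d in input_dim_list]
out = random_layer(inputs, random_matrices)
print(out.shape)  # (3, 6)
```

All inputs must share the same batch size, while their feature sizes may differ because each one is mapped to output_dim before the product.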

Parameters:

input_dim_list (list of int) – A list of integers representing the dimensionality of each input tensor. The length of this list determines how many input tensors the layer expects.
output_dim (int, optional) – The dimensionality of the output tensor after the random transformations. Default is 256.

forward(input_list)

class kale.embed.nn.FCNet(dims, activation='ReLU', dropout=0)

Bases: Module

A simple class for a non-linear fully connected network.

Modified from https://github.com/jnhwkim/ban-vqa/blob/master/fc.py

This class creates a fully connected neural network with optional dropout and activation functions. Weight normalization is applied to each linear layer.

Parameters:

dims (list of int) – A list specifying the input and output dimensions of each layer. For example, [input_dim, hidden_dim1, hidden_dim2, …, output_dim].
activation (str, optional) – The name of the activation function to use (e.g., ‘ReLU’, ‘Tanh’). Default is ‘ReLU’. If an empty string is provided, no activation is applied.
dropout (float, optional) – Dropout probability to apply after each layer. Default is 0 (no dropout).
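To make the dims convention concrete, the short sketch below pairs consecutive entries of dims into (in_features, out_features) for each linear layer. The helper layer_shapes is hypothetical and only illustrates how the list is read.

```python
def layer_shapes(dims):
    """Pair consecutive entries of dims into (in_features, out_features) per layer."""
    return list(zip(dims[:-1], dims[1:]))


# A hypothetical network with two hidden layers.
shapes = layer_shapes([128, 64, 32, 10])
print(shapes)  # [(128, 64), (64, 32), (32, 10)]
```

A list of n dimensions therefore describes n - 1 linear layers.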

forward(x)

kale.embed.signal_cnn module

This module provides CNN-based encoders for transforming 1D signals into latent representations, primarily for use in variational autoencoders (VAE) and related deep learning applications.

class kale.embed.signal_cnn.SignalVAEEncoder(input_dim=60000, latent_dim=256)

Bases: BaseCNN

SignalVAEEncoder encodes 1D signals into a latent representation suitable for variational autoencoders (VAE).

This encoder uses a series of 1D convolutional layers to extract hierarchical temporal features from generic 1D signals, followed by fully connected layers that output the mean and log-variance vectors for the latent Gaussian distribution. This structure is commonly used for unsupervised or multimodal learning on time-series or sequential data.

This class now inherits from BaseCNN to leverage shared CNN utilities for improved code reusability and FAIR compliance.

Parameters:

input_dim (int, optional) – Length of the input 1D signal (number of time points). Default is 60000.
latent_dim (int, optional) – Dimensionality of the latent space representation. Default is 256.

Forward Input:: x (Tensor): Input signal tensor of shape (batch_size, 1, input_dim).
Forward Output:: mean (Tensor): Mean vector of the latent Gaussian distribution, shape (batch_size, latent_dim). log_var (Tensor): Log-variance vector of the latent Gaussian, shape (batch_size, latent_dim).

Example

>>> encoder = SignalVAEEncoder(input_dim=60000, latent_dim=128)
>>> mean, log_var = encoder(signals)

forward(x)

Forward pass through the SignalVAEEncoder.

Parameters:

x (torch.Tensor) – Input 1D signal tensor of shape (batch_size, 1, input_dim).

Returns:

A tuple containing:

mean (torch.Tensor): Mean vector of the latent Gaussian distribution, shape (batch_size, latent_dim).
log_var (torch.Tensor): Log-variance vector of the latent Gaussian, shape (batch_size, latent_dim).

Return type:

Tuple[torch.Tensor, torch.Tensor]

output_size() → int

Return the output feature dimension before the latent projection.

Returns:: Number of flattened features after convolutions (64 * flattened_dim).
Return type:: int

kale.embed.uncertainty_fitting module

kale.embed.video.feature_extractor module

kale.embed.video.i3d module

Define Inflated 3D ConvNets (I3D) for action recognition from https://ieeexplore.ieee.org/document/8099985. Created by Xianyuan Liu by modifying https://github.com/piergiaj/pytorch-i3d/blob/master/pytorch_i3d.py and https://github.com/deepmind/kinetics-i3d/blob/master/i3d.py

class kale.embed.video.i3d.MaxPool3dSamePadding(kernel_size: int | Tuple[int, ...], stride: int | Tuple[int, ...] | None = None, padding: int | Tuple[int, ...] = 0, dilation: int | Tuple[int, ...] = 1, return_indices: bool = False, ceil_mode: bool = False)

Bases: MaxPool3d

Construct 3d max pool with same padding. PyTorch does not provide same padding. Same padding means the output size matches input size for stride=1.
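The amount of ‘same’ padding along one dimension can be computed with the usual TensorFlow-style rule, sketched below in plain Python. The function names and the split of the padding (any odd remainder goes to the back) are assumptions for illustration; the class's own compute_pad may be organized differently.

```python
def compute_same_pad(size, kernel, stride):
    """Return (front, back) zero padding for one dimension under the 'same' rule."""
    if size % stride == 0:
        total = max(kernel - stride, 0)
    else:
        total = max(kernel - size % stride, 0)
    front = total // 2
    back = total - front  # the extra element, if any, is added at the back
    return front, back


def pooled_size(size, kernel, stride):
    """Output length after padding and pooling without ceil mode."""
    front, back = compute_same_pad(size, kernel, stride)
    return (size + front + back - kernel) // stride + 1


print(compute_same_pad(5, 3, 1), pooled_size(5, 3, 1))
print(compute_same_pad(8, 3, 2), pooled_size(8, 3, 2))
```

For stride 1 the output length equals the input length; for larger strides it equals the input length divided by the stride, rounded up.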

compute_pad(dim, s): Get the zero padding number.

forward(x): Compute ‘same’ padding. Add zero to the back position first.

class kale.embed.video.i3d.Unit3D(in_channels, output_channels, kernel_shape=(1, 1, 1), stride=(1, 1, 1), padding=0, activation_fn=<function relu>, use_batch_norm=True, use_bias=False, name='unit_3d')

Bases: Module

Basic unit containing Conv3D + BatchNorm + non-linearity.

compute_pad(dim, s): Get the zero padding number.

forward(x)

Connects the module to inputs. Dynamically pads based on input size in the forward function.

Parameters:: x – Inputs to the Unit3D component.
Returns:: Outputs from the module.

class kale.embed.video.i3d.InceptionModule(in_channels, out_channels, name)

Bases: SELayerMixin, Module

Construct Inception module. Concatenation after four branches (1x1x1 conv; 1x1x1 + 3x3x3 convs; 1x1x1 + 3x3x3 convs; 3x3x3 max-pool + 1x1x1 conv). In forward, we check if SELayers are used, which are channel-wise (SELayerC), temporal-wise (SELayerT), channel-temporal-wise (SELayerTC & SELayerCT).

forward(x)

class kale.embed.video.i3d.InceptionI3d(num_classes=400, spatial_squeeze=True, final_endpoint='Logits', name='inception_i3d', in_channels=3, dropout_keep_prob=0.5)

Bases: Module

Inception-v1 I3D architecture. The model is introduced in:

Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset Joao Carreira, Andrew Zisserman https://arxiv.org/pdf/1705.07750v1.pdf.

See also the Inception architecture, introduced in:

Going deeper with convolutions Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, Andrew Rabinovich. http://arxiv.org/pdf/1409.4842v1.pdf.

VALID_ENDPOINTS = ('Conv3d_1a_7x7', 'MaxPool3d_2a_3x3', 'Conv3d_2b_1x1', 'Conv3d_2c_3x3', 'MaxPool3d_3a_3x3', 'Mixed_3b', 'Mixed_3c', 'MaxPool3d_4a_3x3', 'Mixed_4b', 'Mixed_4c', 'Mixed_4d', 'Mixed_4e', 'Mixed_4f', 'MaxPool3d_5a_2x2', 'Mixed_5b', 'Mixed_5c', 'Logits', 'Predictions')

replace_logits(num_classes): Update the output size with num_classes according to the specific setting.

build()

forward(x): The output is the result of the final average pooling layer with 1024 dimensions.

extract_features(x)

kale.embed.video.i3d.i3d(name, num_channels, num_classes, pretrained=False, progress=True): Get the InceptionI3d module with or without a pretrained model.

kale.embed.video.i3d.i3d_joint(rgb_pt, flow_pt, num_classes, pretrained=False, progress=True)

Get I3D models for different inputs.

Parameters:

rgb_pt (string, optional) – the name of pre-trained model for RGB input.
flow_pt (string, optional) – the name of pre-trained model for flow input.
num_classes (int) – the class number of dataset.
pretrained (bool) – choose if pretrained parameters are used. (Default: False)
progress (bool, optional) – whether or not to display a progress bar to stderr. (Default: True)

Returns:

models – A dictionary containing the RGB and flow models.

Return type:

dictionary

kale.embed.video.res3d module

Define MC3_18, R3D_18, R2plus1D_18 on Action Recognition from https://arxiv.org/abs/1711.11248 Created by Xianyuan Liu from modifying https://github.com/pytorch/vision/blob/master/torchvision/models/video/resnet.py

class kale.embed.video.res3d.Conv3DSimple(in_planes, out_planes, midplanes=None, stride=1, padding=1)

Bases: Conv3d

3D convolutions for R3D (3x3x3 kernel)

static get_downsample_stride(stride)

class kale.embed.video.res3d.Conv2Plus1D(in_planes, out_planes, midplanes, stride=1, padding=1)

Bases: Sequential

(2+1)D convolutions for R2plus1D (1x3x3 kernel + 3x1x1 kernel)

static get_downsample_stride(stride)

class kale.embed.video.res3d.Conv3DNoTemporal(in_planes, out_planes, midplanes=None, stride=1, padding=1)

Bases: Conv3d

3D convolutions without temporal dimension for MCx (1x3x3 kernel)

static get_downsample_stride(stride)

class kale.embed.video.res3d.SELayerMixin

Bases: object

Provide reusable helpers for applying squeeze-excitation submodules.

class kale.embed.video.res3d.BasicBlock(inplanes, planes, conv_builder, stride=1, downsample=None)

Bases: SELayerMixin, Module

Basic ResNet building block. Each block consists of two convolutional layers with a ReLU activation function after each layer and residual connections. Optional squeeze-excitation layers are applied via SELayerMixin.

expansion = 1

forward(x)

class kale.embed.video.res3d.Bottleneck(inplanes, planes, conv_builder, stride=1, downsample=None)

Bases: Module

Bottleneck building block. Not used by default. Each block consists of two 1*n*n and one n*n*n convolutional layers with a ReLU activation function after each layer and residual connections.

expansion = 4

forward(x)

class kale.embed.video.res3d.BasicStem

Bases: Sequential

The default conv-batchnorm-relu stem, normally the first layer (64 3x7x7 kernels).

class kale.embed.video.res3d.BasicFLowStem

Bases: Sequential

The default stem for optical flow.

class kale.embed.video.res3d.R2Plus1dStem

Bases: Sequential

The R(2+1)D stem differs from the default one in that it uses separated 3D convolutions (45 1x7x7 kernels + 64 3x1x1 kernels).

class kale.embed.video.res3d.R2Plus1dFlowStem

Bases: Sequential

R(2+1)D stem for optical flow.

class kale.embed.video.res3d.VideoResNet(block, conv_makers, layers, stem, num_classes=400, zero_init_residual=False)

Bases: Module

replace_fc(num_classes, block=<class 'kale.embed.video.res3d.BasicBlock'>): Update the output size with num_classes according to the specific setting.

forward(x)

kale.embed.video.res3d.r3d_18_rgb(pretrained=False, progress=True, **kwargs)

Construct 18 layer Resnet3D model for RGB as in https://arxiv.org/abs/1711.11248

Parameters:

pretrained (bool) – If True, returns a model pre-trained on Kinetics-400
progress (bool) – If True, displays a progress bar of the download to stderr

Returns:

R3D-18 network

Return type:

nn.Module

kale.embed.video.res3d.r3d_18_flow(pretrained=False, progress=True, **kwargs): Construct 18 layer Resnet3D model for optical flow.

kale.embed.video.res3d.mc3_18_rgb(pretrained=False, progress=True, **kwargs)

Constructor for 18 layer Mixed Convolution network for RGB as in https://arxiv.org/abs/1711.11248

Parameters:

pretrained (bool) – If True, returns a model pre-trained on Kinetics-400
progress (bool) – If True, displays a progress bar of the download to stderr

Returns:

MC3 Network definition

Return type:

nn.Module

kale.embed.video.res3d.mc3_18_flow(pretrained=False, progress=True, **kwargs): Constructor for 18 layer Mixed Convolution network for optical flow.

kale.embed.video.res3d.r2plus1d_18_rgb(pretrained=False, progress=True, **kwargs)

Constructor for the 18 layer deep R(2+1)D network for RGB as in https://arxiv.org/abs/1711.11248

Parameters:

pretrained (bool) – If True, returns a model pre-trained on Kinetics-400
progress (bool) – If True, displays a progress bar of the download to stderr

Returns:

R(2+1)D-18 network

Return type:

nn.Module

kale.embed.video.res3d.r2plus1d_18_flow(pretrained=False, progress=True, **kwargs): Constructor for the 18 layer deep R(2+1)D network for optical flow.

kale.embed.video.res3d.r3d(rgb=False, flow=False, pretrained=False, progress=True): Get R3D_18 models.

kale.embed.video.res3d.mc3(rgb=False, flow=False, pretrained=False, progress=True): Get MC3_18 models.

kale.embed.video.res3d.r2plus1d(rgb=False, flow=False, pretrained=False, progress=True): Get R2PLUS1D_18 models.

kale.embed.video.se_i3d module

Add SELayers to I3D

class kale.embed.video.se_i3d.SEInceptionI3DRGB(num_channels, num_classes, attention)

Bases: Module

Add the several SELayers to I3D for RGB input.

Parameters:

num_channels (int) – the channel number of the input.
num_classes (int) – the class number of dataset.
attention – the name of the SELayer. (Options: [“SELayerC”, “SELayerT”, “SELayerMC”, “SELayerMAC”, “SELayerCT” and “SELayerTC”])

Returns:: I3D model with SELayers.
Return type:: model (VideoResNet)

forward(x)

class kale.embed.video.se_i3d.SEInceptionI3DFlow(num_channels, num_classes, attention)

Bases: Module

Add the several SELayers to I3D for optical flow input.

forward(x)

kale.embed.video.se_i3d.se_inception_i3d(name, num_channels, num_classes, attention, pretrained=False, progress=True, rgb=True): Get InceptionI3d module w/o SELayer and pretrained model.

kale.embed.video.se_i3d.se_i3d_joint(rgb_pt, flow_pt, num_classes, attention, pretrained=False, progress=True)

Get I3D models with SELayers for different inputs.

Parameters:

rgb_pt (string, optional) – the name of pre-trained model for RGB input.
flow_pt (string, optional) – the name of pre-trained model for optical flow input.
num_classes (int) – the class number of dataset.
attention (string, optional) – the name of the SELayer.
pretrained (bool) – choose if pretrained parameters are used. (Default: False)
progress (bool, optional) – whether or not to display a progress bar to stderr. (Default: True)

Returns:

A dictionary containing models for RGB and optical flow.

Return type:

models (dictionary)

kale.embed.video.se_res3d module

Add SELayers to MC3_18, R3D_18, R2plus1D_18

kale.embed.video.se_res3d.se_r3d_18_rgb(attention, pretrained=False, progress=True, **kwargs)

kale.embed.video.se_res3d.se_r3d_18_flow(attention, pretrained=False, progress=True, **kwargs)

kale.embed.video.se_res3d.se_mc3_18_rgb(attention, pretrained=False, progress=True, **kwargs)

kale.embed.video.se_res3d.se_mc3_18_flow(attention, pretrained=False, progress=True, **kwargs)

kale.embed.video.se_res3d.se_r2plus1d_18_rgb(attention, pretrained=False, progress=True, **kwargs)

kale.embed.video.se_res3d.se_r2plus1d_18_flow(attention, pretrained=False, progress=True, **kwargs)

kale.embed.video.se_res3d.se_r3d(attention, rgb=False, flow=False, pretrained=False, progress=True)

Get R3D_18 models with SELayers for different inputs.

Parameters:

attention (string) – the name of the SELayer.
rgb (bool) – choose if RGB model is needed. (Default: False)
flow (bool) – choose if optical flow model is needed. (Default: False)
pretrained (bool) – choose if pretrained parameters are used. (Default: False)
progress (bool, optional) – whether or not to display a progress bar to stderr. (Default: True)

Returns:

A dictionary containing models for RGB and optical flow.

Return type:

models (dictionary)
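
The `rgb` and `flow` flags described above select which models are built and returned. A minimal sketch of that pattern follows; the dictionary keys `"rgb"` and `"flow"`, the use of `None` for a model that was not requested, and the names `select_models`, `build_rgb`, and `build_flow` are all assumptions made for illustration, not the library's confirmed behavior.

```python
def select_models(build_rgb, build_flow, rgb=False, flow=False):
    """Build only the requested models and return them in a dictionary.

    build_rgb / build_flow are zero-argument callables standing in for the
    real model constructors (hypothetical placeholders).
    """
    rgb_model = build_rgb() if rgb else None
    flow_model = build_flow() if flow else None
    # Assumed layout: one entry per modality, None when not requested.
    return {"rgb": rgb_model, "flow": flow_model}


# Usage with string placeholders instead of real networks.
models = select_models(lambda: "rgb-net", lambda: "flow-net", rgb=True)
print(models)
```

Returning both entries regardless of the flags lets calling code look up either modality without first checking which keys exist.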

kale.embed.video.se_res3d.se_mc3(attention, rgb=False, flow=False, pretrained=False, progress=True): Get MC3_18 models with SELayers for different inputs.

kale.embed.video.se_res3d.se_r2plus1d(attention, rgb=False, flow=False, pretrained=False, progress=True): Get R2+1D_18 models with SELayers for different inputs.

kale.embed.video.selayer module

Python implementation of Squeeze-and-Excitation Layers (SELayer).

Initial implementation: channel-wise (SELayerC).

Improved implementation: temporal-wise (SELayerT), max-pooling-based channel-wise (SELayerMC), multi-pooling-based channel-wise (SELayerMAC), and their combinations (SELayerCT, SELayerTC).

References

Hu Jie, Li Shen, and Gang Sun. “Squeeze-and-excitation networks.” In CVPR, pp. 7132-7141. 2018. For initial implementation, please go to https://github.com/hujie-frank/SENet

kale.embed.video.selayer.get_selayer(attention)

Returns a SELayer class based on the attention identifier.

Parameters:: attention – Name of the SELayer implementation to retrieve.
Returns:: Subclass corresponding to the requested attention name.
Return type:: SELayer
Raises:: ValueError – If the provided attention name is unsupported.
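
The lookup described above, a name mapped to a class with `ValueError` for unknown names, can be sketched with a plain dictionary registry. This is an illustration only: the placeholder classes, the `_SELAYERS` table, and the function name `get_selayer_sketch` are hypothetical, and the real function may be implemented differently.

```python
class BaseSELayer:
    """Placeholder standing in for the real base layer class."""


class SELayerC(BaseSELayer):
    """Placeholder for the channel-wise layer."""


class SELayerT(BaseSELayer):
    """Placeholder for the temporal-wise layer."""


# Hypothetical registry mapping attention names to classes.
_SELAYERS = {"SELayerC": SELayerC, "SELayerT": SELayerT}


def get_selayer_sketch(attention):
    """Return the class registered under `attention`, or raise ValueError."""
    try:
        return _SELAYERS[attention]
    except KeyError:
        raise ValueError(f"Unsupported attention name: {attention!r}") from None


layer_cls = get_selayer_sketch("SELayerC")
print(layer_cls.__name__)
```

Note that the function returns a class rather than an instance, so the caller still supplies constructor arguments such as `channel` and `reduction`.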

class kale.embed.video.selayer.BaseSELayer(channel, reduction=16)

Bases: Module

Base class for squeeze-and-excitation layers.

Parameters:

channel – Total number of channels expected by the layer.
reduction – Reduction ratio applied inside the excitation block.

forward(x)

class kale.embed.video.selayer.SELayerC(channel, reduction=16)

Bases: BaseSELayer

Construct channel-wise SELayer.
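
For orientation, a channel-wise squeeze-and-excitation block in the style of Hu et al. can be sketched in PyTorch as below. This follows the standard formulation from the paper applied to 5D video tensors of shape (batch, channel, time, height, width); it is not taken from the library's source, the class name `ChannelSESketch` is hypothetical, and the actual `SELayerC` may differ in details such as how the gate is combined with the input.

```python
import torch
from torch import nn


class ChannelSESketch(nn.Module):
    """Standard channel-wise squeeze-and-excitation for 5D inputs (a sketch)."""

    def __init__(self, channel, reduction=16):
        super().__init__()
        # Squeeze: average over time, height and width for each channel.
        self.pool = nn.AdaptiveAvgPool3d(1)
        # Excitation: bottleneck MLP producing one gate in (0, 1) per channel.
        self.fc = nn.Sequential(
            nn.Linear(channel, channel // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channel // reduction, channel),
            nn.Sigmoid(),
        )

    def forward(self, x):
        b, c = x.shape[:2]
        gates = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1, 1)
        # Rescale every channel by its gate; output shape equals input shape.
        return x * gates


x = torch.randn(2, 32, 4, 8, 8)
out = ChannelSESketch(channel=32, reduction=16)(x)
print(out.shape)
```

The `reduction` argument controls the width of the hidden layer (`channel // reduction`), so `channel` should be at least as large as `reduction` in this sketch.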

class kale.embed.video.selayer.SELayerT(channel, reduction=2)

Bases: BaseSELayer

Construct temporal-wise SELayer.

class kale.embed.video.selayer.SELayerMC(channel, reduction=16)

Bases: BaseSELayer

Construct channel-wise SELayer with max pooling.

class kale.embed.video.selayer.SELayerMAC(channel, reduction=16)

Bases: BaseSELayer

Construct channel-wise SELayer with the mix of average pooling and max pooling.

class kale.embed.video.selayer.SELayerCT(channel, temporal, channel_reduction=16, temporal_reduction=2)

Bases: Module

Compose channel SELayer followed by temporal SELayer.

forward(x)

class kale.embed.video.selayer.SELayerTC(channel, temporal, channel_reduction=16, temporal_reduction=2)

Bases: SELayerCT

Compose temporal SELayer followed by channel SELayer.

forward(x)
