Embed

Submodules

kale.embed.attention_cnn module

class kale.embed.attention_cnn.ContextCNNGeneric(cnn: Module, cnn_output_shape: Tuple[int, int, int, int], contextualizer: Module | Any, output_type: str)

Bases: Module

A template to construct a feature extractor consisting of a CNN followed by a sequence-to-sequence contextualizer like a Transformer-Encoder. Before inputting the CNN output tensor to the contextualizer, the tensor’s spatial dimensions are unrolled into a sequence.

Parameters:
  • cnn (nn.Module) – any convolutional neural network that takes in batches of images of shape (batch_size, channels, height, width) and outputs tensor representations of shape (batch_size, out_channels, out_height, out_width).

  • cnn_output_shape (tuple) – A tuple of shape (batch_size, num_channels, height, width) describing the output shape of the given CNN (required).

  • contextualizer (nn.Module) – A sequence-to-sequence model that takes inputs of shape (num_timesteps, batch_size, num_features), uses attention to contextualize the sequence, and returns a sequence of exactly the same shape. This will mainly be a Transformer-Encoder (required).

  • output_type (string) – One of 'sequence' or 'spatial'. If 'spatial', the final output of the model, which is a sequence, will be reshaped to match the image-batch shape of the CNN output. If 'sequence', the output sequence is returned as is (required).

Examples

>>> cnn = nn.Sequential(nn.Conv2d(3, 32, kernel_size=3),
>>>                     nn.Conv2d(32, 64, kernel_size=3),
>>>                     nn.MaxPool2d(2))
>>> cnn_output_shape = (-1, 64, 6, 6)
>>> contextualizer = nn.TransformerEncoderLayer(...)
>>> output_type = 'spatial'
>>>
>>> attention_cnn = ContextCNNGeneric(cnn, cnn_output_shape, contextualizer, output_type)
>>> output = attention_cnn(torch.randn((32,3,16,16)))
>>>
>>> output.size() == (32, 64, 6, 6) # True
forward(x: Tensor)

Pass the input through the cnn and then the contextualizer.

Parameters:

x – input image batch exactly as for CNNs (required).

class kale.embed.attention_cnn.CNNTransformer(cnn: Module, cnn_output_shape: Tuple[int, int, int, int], num_layers: int, num_heads: int, dim_feedforward: int, dropout: float, output_type: str, positional_encoder: Module | None = None)

Bases: ContextCNNGeneric

A feature extractor consisting of a given CNN backbone followed by a standard Transformer-Encoder. See documentation of “ContextCNNGeneric” for more information.

Parameters:
  • cnn (nn.Module) – any convolutional neural network that takes in batches of images of shape (batch_size, channels, height, width) and outputs tensor representations of shape (batch_size, out_channels, out_height, out_width) (required).

  • cnn_output_shape (tuple) – a tuple of shape (batch_size, num_channels, height, width) describing the output shape of the given CNN (required).

  • num_layers (int) – number of attention layers in the Transformer-Encoder (required).

  • num_heads (int) – number of attention heads in each transformer block (required).

  • dim_feedforward (int) – number of neurons in the intermediate dense layer of each transformer feedforward block (required).

  • dropout (float) – dropout rate of the transformer layers (required).

  • output_type (string) – one of 'sequence' or 'spatial'. If 'spatial', the final output of the model, which is the sequence output of the Transformer-Encoder, will be reshaped to match the image-batch shape of the CNN output; if 'sequence', it is returned as is (required).

  • positional_encoder (nn.Module) – None or a nn.Module that expects inputs of shape (sequence_length, batch_size, embedding_dim) and returns the same input after adding some positional information to the embeddings. If None, then the default and fixed sin-cos positional encodings of base transformers are applied (optional).

Examples

See pykale/examples/cifar_cnntransformer/model.py
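
In addition, a minimal usage sketch is shown below (illustrative only: the toy CNN, hyperparameters, and shapes are assumptions rather than values taken from the example above).

>>> cnn = nn.Sequential(nn.Conv2d(3, 64, kernel_size=3, padding=1),
>>>                     nn.MaxPool2d(2))
>>> cnn_transformer = CNNTransformer(cnn, cnn_output_shape=(-1, 64, 16, 16),
>>>                                  num_layers=2, num_heads=4, dim_feedforward=128,
>>>                                  dropout=0.1, output_type='spatial')
>>> output = cnn_transformer(torch.randn((8, 3, 32, 32)))
>>> output.size()  # expected: torch.Size([8, 64, 16, 16])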

kale.embed.factorization module

Python implementation of a tensor factorization algorithm, Multilinear Principal Component Analysis (MPCA), and a matrix factorization algorithm, Maximum Independence Domain Adaptation (MIDA).

class kale.embed.factorization.MPCA(var_ratio=0.97, max_iter=1, vectorize=False, n_components=None)

Bases: BaseEstimator, TransformerMixin

MPCA implementation compatible with scikit-learn

Parameters:
  • var_ratio (float, optional) – Percentage of variance explained (between 0 and 1). Defaults to 0.97.

  • max_iter (int, optional) – Maximum number of iterations. Defaults to 1.

  • vectorize (bool) – Whether to return the transformed/projected tensor as a vector. Defaults to False.

  • n_components (int) – Number of components to keep. Applies only when vectorize=True. Defaults to None.

proj_mats

A list of transposed projection matrices, shapes (P_1, I_1), …, (P_N, I_N), where P_1, …, P_N are the output tensor shape for each sample.

Type:

list of arrays

idx_order

The ordering index of projected (and vectorized) features in decreasing variance.

Type:

array-like

mean_

Per-feature empirical mean, estimated from the training set, shape (I_1, I_2, …, I_N).

Type:

array-like

shape_in

Input tensor shapes, i.e. (I_1, I_2, …, I_N).

Type:

tuple

shape_out

Output tensor shapes, i.e. (P_1, P_2, …, P_N).

Type:

tuple

Reference:

Haiping Lu, K.N. Plataniotis, and A.N. Venetsanopoulos, “MPCA: Multilinear Principal Component Analysis of Tensor Objects”, IEEE Transactions on Neural Networks, Vol. 19, No. 1, Page: 18-39, January 2008. For initial Matlab implementation, please go to https://uk.mathworks.com/matlabcentral/fileexchange/26168.

Examples

>>> import numpy as np
>>> from kale.embed.factorization import MPCA
>>> x = np.random.random((40, 20, 25, 20))
>>> x.shape
(40, 20, 25, 20)
>>> mpca = MPCA()
>>> x_projected = mpca.fit_transform(x)
>>> x_projected.shape
(40, 18, 23, 18)
>>> mpca.set_params(**{"vectorize": True})
>>> x_projected = mpca.transform(x)
>>> x_projected.shape
(40, 7452)
>>> mpca.set_params(**{"n_components": 50})
>>> x_projected = mpca.transform(x)
>>> x_projected.shape
(40, 50)
>>> x_rec = mpca.inverse_transform(x_projected)
>>> x_rec.shape
(40, 20, 25, 20)
fit(x, y=None)

Fit the model with input training data x.

Parameters:
  • x (array-like tensor) – Input data, shape (n_samples, I_1, I_2, …, I_N), where n_samples is the number of samples and I_1, I_2, …, I_N are the dimensions of the corresponding mode (1, 2, …, N), respectively.

  • y (None) – Ignored variable.

Returns:

self (object). Returns the instance itself.

transform(x)

Perform dimension reduction on x

Parameters:

x (array-like tensor) – Data to perform dimension reduction, shape (n_samples, I_1, I_2, …, I_N).

Returns:

Projected data in lower dimension, shape (n_samples, P_1, P_2, …, P_N) if self.vectorize==False. If self.vectorize==True, features will be sorted based on their explained variance ratio, shape (n_samples, P_1 * P_2 * … * P_N) if self.n_components is None, and shape (n_samples, n_components) if self.n_components is a valid integer.

Return type:

array-like tensor

inverse_transform(x)

Reconstruct projected data to the original shape and add the estimated mean back

Parameters:

x (array-like tensor) – Data to be reconstructed, shape (n_samples, P_1, P_2, …, P_N), if self.vectorize == False, where P_1, P_2, …, P_N are the reduced dimensions of corresponding mode (1, 2, …, N), respectively. If self.vectorize == True, shape (n_samples, self.n_components) or shape (n_samples, P_1 * P_2 * … * P_N).

Returns:

Reconstructed tensor in original shape, shape (n_samples, I_1, I_2, …, I_N)

Return type:

array-like tensor

set_fit_request(*, x: bool | None | str = '$UNCHANGED$') MPCA

Request metadata passed to the fit method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to fit.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

New in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters:

x (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for x parameter in fit.

Returns:

self – The updated object.

Return type:

object

set_inverse_transform_request(*, x: bool | None | str = '$UNCHANGED$') MPCA

Request metadata passed to the inverse_transform method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to inverse_transform if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to inverse_transform.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

New in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters:

x (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for x parameter in inverse_transform.

Returns:

self – The updated object.

Return type:

object

set_transform_request(*, x: bool | None | str = '$UNCHANGED$') MPCA

Request metadata passed to the transform method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to transform if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to transform.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

New in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters:

x (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for x parameter in transform.

Returns:

self – The updated object.

Return type:

object

class kale.embed.factorization.MIDA(n_components, kernel='linear', lambda_=1.0, mu=1.0, eta=1.0, augmentation=False, kernel_params=None)

Bases: BaseEstimator, TransformerMixin

Maximum Independence Domain Adaptation (MIDA).

Parameters:
  • n_components (int) – Number of components to keep.

  • kernel (str) – "linear", "rbf", or "poly". Kernel to use for MIDA. Defaults to "linear".

  • mu (float) – Hyperparameter of the l2 penalty. Defaults to 1.0.

  • eta (float) – Hyperparameter of the label dependence. Defaults to 1.0.

  • augmentation (bool) – Whether to use covariates as augmentation features. Defaults to False.

  • kernel_params (dict or None) – Parameters for the kernel. Defaults to None.

References

[1] Yan, K., Kou, L. and Zhang, D., 2018. Learning domain-invariant subspace using domain features and independence maximization. IEEE Transactions on Cybernetics, 48(1), pp. 288-299.

fit(x, y=None, covariates=None)
Parameters:
  • x – array-like. Input data, shape (n_samples, n_features)

  • y – array-like. Labels, shape (nl_samples,)

  • covariates – array-like. Domain co-variates, shape (n_samples, n_co-variates)

Note

Unsupervised MIDA is performed if y is None. Semi-supervised MIDA is performed if y is not None.

fit_transform(x, y=None, covariates=None)
Parameters:
  • x – array-like, shape (n_samples, n_features)

  • y – array-like, shape (n_samples,)

  • covariates – array-like, shape (n_samples, n_covariates)

Returns:

array-like, shape (n_samples, n_components)

Return type:

x_transformed

set_fit_request(*, covariates: bool | None | str = '$UNCHANGED$', x: bool | None | str = '$UNCHANGED$') MIDA

Request metadata passed to the fit method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to fit.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

New in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters:
  • covariates (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for covariates parameter in fit.

  • x (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for x parameter in fit.

Returns:

self – The updated object.

Return type:

object

set_transform_request(*, covariates: bool | None | str = '$UNCHANGED$', x: bool | None | str = '$UNCHANGED$') MIDA

Request metadata passed to the transform method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to transform if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to transform.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

New in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters:
  • covariates (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for covariates parameter in transform.

  • x (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for x parameter in transform.

Returns:

self – The updated object.

Return type:

object

transform(x, covariates=None)
Parameters:
  • x – array-like, shape (n_samples, n_features)

  • covariates – array-like, augmentation features, shape (n_samples, n_covariates)

Returns:

array-like, shape (n_samples, n_components)

Return type:

x_transformed
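
A minimal unsupervised usage sketch (illustrative; the toy two-domain data and one-hot domain covariates are assumptions):

>>> import numpy as np
>>> from kale.embed.factorization import MIDA
>>> x = np.concatenate((np.random.random((50, 20)), np.random.random((50, 20)) + 1))
>>> covariates = np.repeat(np.eye(2), 50, axis=0)  # one-hot domain indicators
>>> mida = MIDA(n_components=5)
>>> x_transformed = mida.fit_transform(x, covariates=covariates)
>>> x_transformed.shape
(100, 5)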

kale.embed.feature_fusion module

This module implements three different multimodal fusion methods: (1) Concat, (2) BimodalInteractionFusion, and (3) LowRankTensorFusion. Each of these fusion methods is designed to work with input modalities as PyTorch tensors and performs different operations to combine them into a joint representation of the input data. Reference: https://github.com/pliang279/MultiBench/blob/main/fusions/common_fusions.py

class kale.embed.feature_fusion.Concat

Bases: Module

Concat is a simple PyTorch module for fusing multimodal data by concatenating tensors along dimension 1. This fusion method is often used in multimodal learning, where data from different modalities (e.g., image, audio) are processed separately and then fused together for further processing or decision-making. Each modality's data is first flattened from its second dimension onward, and the flattened tensors are then concatenated. This approach maintains the independence of the modalities before the fusion point, allowing the network to learn separate representations for each modality before combining them.

forward(modalities)
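
A minimal sketch of the expected usage (illustrative; the modality shapes are assumptions):

>>> image_features = torch.randn(8, 64, 4, 4)  # image modality
>>> audio_features = torch.randn(8, 32)        # audio modality
>>> fused = Concat()([image_features, audio_features])
>>> fused.shape  # 64*4*4 + 32 = 1056
torch.Size([8, 1056])
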
class kale.embed.feature_fusion.BimodalInteractionFusion(input_dims, output_dim, output, flatten=False, clip=None, grad_clip=None, flip=False)

Bases: Module

BimodalInteractionFusion is a PyTorch module that performs fusion of two data modalities through a hypernetwork-based interaction mechanism. The ‘input_dims’ argument specifies the input dimensions of the two modalities. The ‘output_dim’ argument specifies the output dimension after the fusion. The ‘output’ argument defines the type of bimodal matrix interactions to be performed, which can be ‘matrix’, ‘vector’, or ‘scalar’.

This fusion method supports three types of bimodal interactions:
  • Matrix: implements a general hypernetwork mechanism where the interaction is multiplicative, using separate weight matrices and biases for the two modalities.

  • Vector: uses diagonal forms and gating mechanisms, applying element-wise multiplication to combine the modalities.

  • Scalar: applies scales and biases to the input modalities before combining them.

This fusion method uses xavier normal distribution for initializing the weight matrices and normal distribution for the biases. It also provides options to clip the parameter values and their gradients within specified ranges to prevent them from exploding or vanishing. This fusion approach allows for complex interactions between the modalities and is well-suited for tasks that require the integration of heterogeneous data.

Parameters:
  • input_dims (list or tuple) – two integers indicating the input dimensions of the two modalities

  • output_dim (int) – output dimension after the fusion

  • output (str) – type of bimodal interaction, one of 'matrix', 'vector', or 'scalar'

  • flatten (bool) – whether we need to flatten the input modalities

  • clip (tuple, optional) – clip parameter values, None if no clip

  • grad_clip (tuple, optional) – clip grad values, None if no clip

  • flip (bool) – whether to swap the two input modalities in forward function or not

forward(modalities)
class kale.embed.feature_fusion.LowRankTensorFusion(input_dims, output_dim, rank, flatten=True)

Bases: Module

LowRankTensorFusion is a PyTorch module that performs multimodal fusion using a low-rank tensor-based approach.

The ‘input_dims’ argument specifies the input dimensions of each modality. The ‘output_dim’ argument defines the output dimension after the fusion. The ‘rank’ argument is a hyperparameter specifying the rank for the low-rank tensor decomposition. This fusion method performs fusion by assuming a low-rank structure for the interaction tensor, effectively compressing the interaction space. It leverages a set of low-rank factors, one for each modality, that are learned during training. These factors are initialized with xavier normal distribution and are applied to their corresponding modalities during the forward pass. A tensor product is computed across all modalities and their respective factors, resulting in a fused tensor. Next, a weighted summation of this fused tensor is computed using fusion weights, followed by the addition of a fusion bias. Both fusion weights and bias are learnable parameters initialized with xavier normal distribution and zero respectively. The final output is reshaped to the specified ‘output_dim’ and returned. If ‘flatten’ is set to True, each modality is first flattened before concatenation with a ones tensor and the subsequent multiplication with its factor. This approach provides an efficient and compact representation for capturing interactions among multiple modalities, making it suitable for tasks involving high-dimensional multimodal data.

Parameters:
  • input_dims (list or tuple) – A list or tuple of integers indicating the input dimensions of the modalities.

  • output_dim (int) – output dimension after the fusion.

  • rank (int) – A hyperparameter specifying the rank for the low-rank tensor decomposition.

  • flatten (bool) – Whether to flatten each input modality before fusion. Default: True

forward(modalities)
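
A minimal usage sketch (illustrative; the modality dimensions, rank, and expected output shape are assumptions based on the description above):

>>> fusion = LowRankTensorFusion(input_dims=(32, 64), output_dim=16, rank=4)
>>> modality_1 = torch.randn(8, 32)
>>> modality_2 = torch.randn(8, 64)
>>> fused = fusion([modality_1, modality_2])  # expected shape: (8, 16)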

kale.embed.gcn module

class kale.embed.gcn.GCNEncoderLayer(in_channels, out_channels, improved=False, cached=False, bias=True, **kwargs)

Bases: MessagePassing

Modification of PyTorch Geometric's nn.GCNConv, which reduces the computational cost of the GCN layer for the GripNet model. The graph convolutional operator from the "Semi-supervised Classification with Graph Convolutional Networks" (ICLR 2017) paper.

\[\mathbf{X}^{\prime} = \mathbf{\hat{D}}^{-1/2} \mathbf{\hat{A}} \mathbf{\hat{D}}^{-1/2} \mathbf{X} \mathbf{\Theta},\]

where \(\mathbf{\hat{A}} = \mathbf{A} + \mathbf{I}\) denotes the adjacency matrix with inserted self-loops and \(\hat{D}_{ii} = \sum_{j=0} \hat{A}_{ij}\) its diagonal degree matrix.

Note: For more information please see PyTorch Geometric's nn.GCNConv docs.

Parameters:
  • in_channels (int) – Size of each input sample.

  • out_channels (int) – Size of each output sample.

  • improved (bool, optional) – If set to True, the layer computes \(\mathbf{\hat{A}}\) as \(\mathbf{A} + 2\mathbf{I}\). (default: False)

  • cached (bool, optional) – If set to True, the layer will cache the computation of \(\mathbf{\hat{D}}^{-1/2} \mathbf{\hat{A}} \mathbf{\hat{D}}^{-1/2}\) on first execution, and will use the cached version for further executions. This parameter should only be set to True in transductive learning scenarios. (default: False)

  • bias (bool, optional) – If set to False, the layer will not learn an additive bias. (default: True)

  • **kwargs (optional) – Additional arguments of torch_geometric.nn.conv.MessagePassing.

reset_parameters()
static norm(edge_index, num_nodes, edge_weight, improved=False, dtype=None)

Add self-loops and apply symmetric normalization

forward(x, edge_index, edge_weight=None)
Parameters:
  • x (torch.Tensor) – The input node feature embedding.

  • edge_index (torch.Tensor) – Graph edge index in COO format with shape [2, num_edges].

  • edge_weight (torch.Tensor, optional) – The one-dimensional relation weight for each edge in edge_index (default: None).
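
A minimal usage sketch (illustrative; the toy graph below is an assumption):

>>> import torch
>>> from kale.embed.gcn import GCNEncoderLayer
>>> x = torch.randn(4, 16)                                   # 4 nodes with 16 features each
>>> edge_index = torch.tensor([[0, 1, 1, 2], [1, 0, 2, 1]])  # edges in COO format
>>> layer = GCNEncoderLayer(in_channels=16, out_channels=32)
>>> out = layer(x, edge_index)                               # expected shape: (4, 32)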

class kale.embed.gcn.RGCNEncoderLayer(in_channels, out_channels, num_relations, num_bases, after_relu, bias=False, **kwargs)

Bases: MessagePassing

Modification of PyTorch Geometric's nn.RGCNConv, which reduces the computational and memory cost of the RGCN encoder layer for the GripNet model. The relational graph convolutional operator from the "Modeling Relational Data with Graph Convolutional Networks" paper.

\[\mathbf{x}^{\prime}_i = \mathbf{\Theta}_{\textrm{root}} \cdot \mathbf{x}_i + \sum_{r \in \mathcal{R}} \sum_{j \in \mathcal{N}_r(i)} \frac{1}{|\mathcal{N}_r(i)|} \mathbf{\Theta}_r \cdot \mathbf{x}_j,\]

where \(\mathcal{R}\) denotes the set of relations, i.e. edge types. Edge type needs to be a one-dimensional torch.long tensor which stores a relation identifier \(\in \{ 0, \ldots, |\mathcal{R}| - 1\}\) for each edge.

Note: For more information please see PyTorch Geometric's nn.RGCNConv docs.

Parameters:
  • in_channels (int) – Size of each input sample.

  • out_channels (int) – Size of each output sample.

  • num_relations (int) – Number of edge relations.

  • num_bases (int) – Use the basis-decomposition regularization scheme, where num_bases denotes the number of bases.

  • after_relu (bool) – Whether input embedding is activated by relu function or not.

  • bias (bool) – If set to False, the layer will not learn an additive bias. (default: False)

  • **kwargs (optional) – Additional arguments of torch_geometric.nn.conv.MessagePassing.

reset_parameters()
forward(x, edge_index, edge_type, range_list)
Parameters:
  • x (torch.Tensor) – The input node feature embedding.

  • edge_index (torch.Tensor) – Graph edge index in COO format with shape [2, num_edges].

  • edge_type (torch.Tensor) – The one-dimensional relation type/index for each edge in edge_index.

  • range_list (torch.Tensor) – The index range list of each edge type with shape [num_types, 2].

kale.embed.gripnet module

The GripNet model proposed in the "GripNet: Graph Information Propagation on Supergraph for Heterogeneous Graphs" paper (Pattern Recognition 2022, https://doi.org/10.1016/j.patcog.2022.108973), which is an efficient framework to learn node representations on heterogeneous graphs for downstream link prediction, node classification, and visualization. The code is based on https://github.com/NYXFLOWER/GripNet.

class kale.embed.gripnet.GripNetInternalModule(in_channels: int, num_edge_type: int, start_supervertex: bool, setting: SuperVertexParaSetting)

Bases: Module

The internal module of a supervertex, which is composed of an internal feature layer and multiple internal aggregation layers.

Parameters:
  • in_channels (int) – the dimension of node features on this supervertex.

  • num_edge_type (int) – the number of edge types on this supervertex.

  • start_supervertex (bool) – whether this supervertex is a start supervertex on the supergraph.

  • setting (SuperVertexParaSetting) – supervertex parameter settings.

forward(x: Tensor, edge_index: Tensor, edge_type: Tensor | None = None, range_list: Tensor | None = None, edge_weight: Tensor | None = None) Tensor
Parameters:
  • x (torch.Tensor) – the input node feature embedding.

  • edge_index (torch.Tensor) – edge index in COO format with shape [2, #edges].

  • edge_type (torch.Tensor, optional) – one-dimensional relation type for each edge, indexed from 0. Defaults to None.

  • range_list (torch.Tensor, optional) – The index range list of each edge type with shape [num_types, 2]. Defaults to None.

  • edge_weight (torch.Tensor, optional) – one-dimensional weight for each edge. Defaults to None.

Note: The internal feature layer is computed in the forward function of the GripNet class. If the supervertex is not a start supervertex, x should be the sum or concatenation of the outputs of the internal feature layer and all external aggregation layers.

class kale.embed.gripnet.GripNetExternalModule(in_channels: int, out_channels: int, num_out_node: int)

Bases: Module

The external module of a supervertex, which is an external feature layer.

Parameters:
  • in_channels (int) – Size of each input sample. In GripNet, it should be the dimension of the output embedding of the corresponding parent supervertex.

  • out_channels (int) – Size of each output sample. In GripNet, it is the dimension of the output embedding of the supervertex.

  • num_out_node (int) – the number of output nodes.

forward(x: Tensor, edge_index: Tensor, edge_weight: Tensor | None = None, use_relu=True)
Parameters:
  • x (torch.Tensor) – the input node feature embedding.

  • edge_index (torch.Tensor) – edge index in COO format with shape [2, #edges].

  • edge_weight (torch.Tensor, optional) – one-dimensional weight for each edge. Defaults to None.

  • use_relu (bool, optional) – whether to use ReLU before returning node feature embeddings. Defaults to True.

class kale.embed.gripnet.GripNet(supergraph: SuperGraph)

Bases: Module

The GripNet model.

Parameters:

supergraph (SuperGraph) – the supergraph.

Reference:

Xu, H., Sang, S., Bai, P., Li, R., Yang, L. and Lu, H., 2022. GripNet: Graph Information Propagation on Supergraph for Heterogeneous Graphs. Pattern Recognition, p.108973.

forward()

kale.embed.image_cnn module

CNNs for extracting features from small images of size 32x32 (e.g. MNIST) and regular images of size 224x224 (e.g. ImageNet). The code is based on https://github.com/criteo-research/pytorch-ada/blob/master/adalib/ada/models/modules.py, which is for domain adaptation.

class kale.embed.image_cnn.Flatten

Bases: Module

Flatten layer. This module is used to replace the last fully connected (fc) layer of a pre-trained model with a flatten layer. It flattens the input tensor to a 2D tensor of shape (B, N), where B is the batch size and N is the product of all dimensions except the batch size.

Examples

>>> x = torch.randn(8, 3, 224, 224)
>>> x = Flatten()(x)
>>> print(x.shape)
torch.Size([8, 150528])
forward(x)
class kale.embed.image_cnn.Identity

Bases: Module

Identity layer. This module is used to replace any unwanted layers in a pre-defined model with an identity layer. It returns the input tensor as the output.

Examples

>>> x = torch.randn(8, 3, 224, 224)
>>> x = Identity()(x)
>>> print(x.shape)
torch.Size([8, 3, 224, 224])
forward(x)
class kale.embed.image_cnn.SmallCNNFeature(num_channels=3, kernel_size=5)

Bases: Module

A feature extractor for small 32x32 images (e.g. CIFAR, MNIST) that outputs a feature vector of length 128.

Parameters:
  • num_channels (int) – the number of input channels (default=3).

  • kernel_size (int) – the size of the convolution kernel (default=5).

Examples
>>> feature_network = SmallCNNFeature(num_channels)
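
A slightly fuller sketch (illustrative; the batch of random 32x32 images is an assumption):

>>> feature_network = SmallCNNFeature(num_channels=3)
>>> x = torch.randn(16, 3, 32, 32)
>>> feature_network(x).shape  # feature vector of length 128
torch.Size([16, 128])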
forward(input_)
output_size()
class kale.embed.image_cnn.SimpleCNNBuilder(conv_layers_spec, activation_fun='relu', use_batchnorm=True, pool_locations=(0, 3), num_channels=3)

Bases: Module

A builder for simple CNNs to experiment with different basic architectures.

Parameters:
  • num_channels (int) – the number of input channels. Defaults to 3.

  • conv_layers_spec (list) – a list for each convolutional layer given as [num_channels, kernel_size]. For example, [[16, 3], [16, 1]] represents 2 layers with 16 filters and kernel sizes of 3 and 1 respectively.

  • activation_fun (str) – a string specifying the activation function to use. one of (‘relu’, ‘elu’, ‘leaky_relu’). Defaults to “relu”.

  • use_batchnorm (boolean) – a boolean flag indicating whether to use batch normalization. Defaults to True.

  • pool_locations (tuple) – the indices of the convolutional layers after which pooling layers are placed. Defaults to (0, 3), i.e. one pooling layer after the first and another after the fourth convolutional layer.

activations = {'elu': ELU(alpha=1.0), 'leaky_relu': LeakyReLU(negative_slope=0.01), 'relu': ReLU()}
forward(x)
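
A minimal usage sketch (illustrative; the layer specification and input shape are assumptions):

>>> builder = SimpleCNNBuilder(conv_layers_spec=[[16, 3], [16, 1], [32, 3], [32, 1]],
>>>                            activation_fun='relu', use_batchnorm=True,
>>>                            pool_locations=(0, 3), num_channels=3)
>>> features = builder(torch.randn(8, 3, 32, 32))
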
class kale.embed.image_cnn.ResNet18Feature(weights=ResNet18_Weights.IMAGENET1K_V1)

Bases: Module

Modified ResNet18 (without the last layer) feature extractor for regular 224x224 images.

Parameters:

weights (optional) – The pre-trained weights to use (default: ResNet18_Weights.IMAGENET1K_V1).

forward(x)
output_size()
class kale.embed.image_cnn.ResNet34Feature(weights=ResNet34_Weights.IMAGENET1K_V1)

Bases: Module

Modified ResNet34 (without the last layer) feature extractor for regular 224x224 images.

Parameters:

weights (optional) – The pre-trained weights to use (default: ResNet34_Weights.IMAGENET1K_V1).

forward(x)
output_size()
class kale.embed.image_cnn.ResNet50Feature(weights=ResNet50_Weights.IMAGENET1K_V2)

Bases: Module

Modified ResNet50 (without the last layer) feature extractor for regular 224x224 images.

Parameters:

weights (optional) – The pre-trained weights to use (default: ResNet50_Weights.IMAGENET1K_V2).

forward(x)
output_size()
class kale.embed.image_cnn.ResNet101Feature(weights=ResNet101_Weights.IMAGENET1K_V2)

Bases: Module

Modified ResNet101 (without the last layer) feature extractor for regular 224x224 images.

Parameters:

weights (optional) – The pre-trained weights to use (default: ResNet101_Weights.IMAGENET1K_V2).

forward(x)
output_size()
class kale.embed.image_cnn.ResNet152Feature(weights=ResNet152_Weights.IMAGENET1K_V2)

Bases: Module

Modified ResNet152 (without the last layer) feature extractor for regular 224x224 images.

Parameters:

weights (optional) – The pre-trained weights to use (default: ResNet152_Weights.IMAGENET1K_V2).

forward(x)
output_size()
class kale.embed.image_cnn.LeNet(input_channels, output_channels, additional_layers, output_each_layer=False, linear=None, squeeze_output=True)

Bases: Module

LeNet is a customizable Convolutional Neural Network (CNN) model based on the LeNet architecture, designed for feature extraction from image and audio modalities.

LeNet supports several layers of 2D convolution, followed by batch normalization, max pooling, and adaptive average pooling, with a configurable number of channels. The depth of the network (number of convolutional blocks) is adjustable with the ‘additional_layers’ parameter. An optional linear layer can be added at the end for further transformation of the output, which could be useful for various tasks such as classification or regression. The ‘output_each_layer’ option allows for returning the output of each layer instead of just the final output, which can be beneficial for certain tasks or for analyzing the intermediate representations learned by the network. By default, the output tensor is squeezed before being returned, removing dimensions of size one, but this can be configured with the ‘squeeze_output’ parameter.

Parameters:
  • input_channels (int) – Input channel number.

  • output_channels (int) – Output channel number for block.

  • additional_layers (int) – Number of additional blocks for LeNet.

  • output_each_layer (bool, optional) – Whether to return the output of all layers. Defaults to False.

  • linear (tuple, optional) – Tuple of (input_dim, output_dim) for optional linear layer post-processing. Defaults to None.

  • squeeze_output (bool, optional) – Whether to squeeze output before returning. Defaults to True.

forward(x)
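
A minimal usage sketch (illustrative; the channel settings and input shape are assumptions):

>>> lenet = LeNet(input_channels=3, output_channels=32, additional_layers=2)
>>> features = lenet(torch.randn(8, 3, 64, 64))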

kale.embed.mogonet module

Construct a message passing network using PyTorch Geometric for the MOGONET method. MOGONET is a multiomics fusion framework for cancer classification and biomarker identification that utilizes supervised graph convolutional networks for omics datasets.

This code is written by refactoring the MOGONET code (https://github.com/txWang/MOGONET/blob/main/models.py) using the 'MessagePassing' base class provided by PyTorch Geometric.

Reference: Wang, T., Shao, W., Huang, Z., Tang, H., Zhang, J., Ding, Z., Huang, K. (2021). MOGONET integrates multi-omics data using graph convolutional networks allowing patient classification and biomarker identification. Nature communications. https://www.nature.com/articles/s41467-021-23774-w

class kale.embed.mogonet.MogonetGCNConv(in_channels: int, out_channels: int, bias: bool = True, aggr: str | List[str] | Aggregation | None = 'add', **kwargs)

Bases: MessagePassing

Create message passing layers for the MOGONET method. Each layer is defined as:

\[H^{(l+1)}=f(H^{(l)}, A) = \sigma(AH^{(l)}W^{(l)})\]

where \(\mathbf{H^{(l)}}\) is the input of the \(l\)-th layer and \(\mathbf{W^{(l)}}\) is the weight matrix of the \(l\)-th layer. \(\sigma(.)\) denotes a non-linear activation function.

For more information please refer to the MOGONET paper.

Parameters:
  • in_channels (int) – Size of each input sample.

  • out_channels (int) – Size of each output sample.

  • bias (bool, optional) – If set to False, the layer will not learn an additive bias. (default: True)

  • aggr (string or list or Aggregation, optional) – The aggregation scheme to use, e.g., "add", "sum", "mean", "min", "max" or "mul".

  • **kwargs (optional) – Additional arguments of torch_geometric.nn.conv.MessagePassing.

reset_parameters() None

Reset all parameters of the model.

forward(x: Tensor, edge_index: SparseTensor) Tensor
message(x_j: Tensor) Tensor

Construct messages from node \(j\) to node \(i\) for each edge in edge_index.

message_and_aggregate(adj_t: SparseTensor | Tensor, x: Tensor) Tensor

Fuse computations of message() and aggregate() into a single function.

update(aggr_out: Tensor) Tensor

Update node embeddings for each node \(i \in \mathcal{V}\).

class kale.embed.mogonet.MogonetGCN(in_channels: int, hidden_channels: List[int], dropout: float)

Bases: Module

Create the structure of the graph convolutional network in the MOGONET method. For more information please refer to the MOGONET paper.

Parameters:
  • in_channels (int) – Size of each input sample.

  • hidden_channels (List[int]) – A list of sizes of hidden layers.

  • dropout (float) – Probability of an element to be zeroed.

forward(x: Tensor, edge_index: SparseTensor) Tensor
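
A minimal usage sketch (illustrative; the toy sparse adjacency and feature sizes are assumptions):

>>> import torch
>>> from torch_sparse import SparseTensor
>>> from kale.embed.mogonet import MogonetGCN
>>> x = torch.randn(6, 10)                        # 6 samples with 10 omics features each
>>> row = torch.tensor([0, 1, 2, 3, 4, 5])
>>> col = torch.tensor([1, 0, 3, 2, 5, 4])
>>> adj = SparseTensor(row=row, col=col, sparse_sizes=(6, 6))
>>> model = MogonetGCN(in_channels=10, hidden_channels=[16, 16, 8], dropout=0.5)
>>> out = model(x, adj)                           # expected shape: (6, 8)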

kale.embed.positional_encoding module

class kale.embed.positional_encoding.PositionalEncoding(d_model: int, max_len: int = 5000)

Bases: Module

Implements the positional encoding as described in the NIPS2017 paper ‘Attention Is All You Need’ about Transformers (https://arxiv.org/abs/1706.03762). Essentially, for all timesteps in a given sequence, adds information about the relative temporal location of a timestep directly into the features of that timestep, and then returns this slightly-modified, same-shape sequence.

Parameters:
  • d_model – the number of features that each timestep has (required).

  • max_len – the maximum sequence length that the positional encodings should support (required).

forward(x)

Expects input of shape (sequence_length, batch_size, num_features) and returns output of the same shape. sequence_length must be at most self.max_len, and num_features must equal self.d_model.

Parameters:

x – a sequence input of shape (sequence_length, batch_size, num_features) (required).
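
A minimal usage sketch (illustrative; the sequence shape is an assumption):

>>> pos_encoder = PositionalEncoding(d_model=64, max_len=100)
>>> sequence = torch.randn(10, 8, 64)  # (sequence_length, batch_size, num_features)
>>> encoded = pos_encoder(sequence)
>>> encoded.shape
torch.Size([10, 8, 64])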

kale.embed.seq_nn module

DeepDTA-based models for the drug-target interaction prediction problem.

class kale.embed.seq_nn.CNNEncoder(num_embeddings, embedding_dim, sequence_length, num_kernels, kernel_length)

Bases: Module

The DeepDTA CNN encoder module, which comprises three 1D-convolutional layers and one max-pooling layer. The module is used to encode drug/target sequence information, and the input should be integer/label-encoded sequences. The original paper is "DeepDTA: deep drug–target binding affinity prediction".

Parameters:
  • num_embeddings (int) – Number of embedding labels/categories, depends on the types of encoding sequence.

  • embedding_dim (int) – Dimension of embedding labels/categories.

  • sequence_length (int) – Max length of input sequence.

  • num_kernels (int) – Number of kernels (filters).

  • kernel_length (int) – Length of kernel (filter).

forward(x)
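
A minimal usage sketch (illustrative; the vocabulary size, sequence length, and kernel settings are assumptions loosely based on typical DeepDTA configurations):

>>> drug_encoder = CNNEncoder(num_embeddings=64, embedding_dim=128, sequence_length=85,
>>>                           num_kernels=32, kernel_length=8)
>>> smiles_batch = torch.randint(0, 64, (16, 85))  # integer/label-encoded drug sequences
>>> drug_features = drug_encoder(smiles_batch)
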
class kale.embed.seq_nn.GCNEncoder(in_channel=78, out_channel=128, dropout_rate=0.2)

Bases: Module

The GraphDTA GCN encoder module, which comprises three graph convolutional layers and one fully connected layer. The model is a variant of DeepDTA and is used to encode drug molecule graph information. The original paper is "GraphDTA: Predicting drug–target binding affinity with graph neural networks".

Parameters:
  • in_channel (int) – Dimension of each input node feature.

  • out_channel (int) – Dimension of each output node feature.

  • dropout_rate (float) – dropout rate during training.

forward(x, edge_index, batch)
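
A minimal usage sketch (illustrative; the toy molecular graphs are assumptions):

>>> import torch
>>> from torch_geometric.data import Batch, Data
>>> from kale.embed.seq_nn import GCNEncoder
>>> graphs = [Data(x=torch.randn(5, 78),
>>>                edge_index=torch.tensor([[0, 1, 2, 3], [1, 2, 3, 4]]))
>>>           for _ in range(4)]
>>> batch = Batch.from_data_list(graphs)
>>> encoder = GCNEncoder(in_channel=78, out_channel=128)
>>> graph_features = encoder(batch.x, batch.edge_index, batch.batch)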

kale.embed.uncertainty_fitting module

Module from the implementation of L. A. Schobs, A. J. Swift and H. Lu, “Uncertainty Estimation for Heatmap-Based Landmark Localization,” in IEEE Transactions on Medical Imaging, vol. 42, no. 4, pp. 1021-1034, April 2023, doi: 10.1109/TMI.2022.3222730.

Functions that use the validation data to fit the uncertainty boundaries with error bounds, then bin the test data and save the results.

kale.embed.uncertainty_fitting.fit_and_predict(target_idx: int, uncertainty_error_pairs: List[List], ue_pairs_val: str, ue_pairs_test: str, num_bins: int, config: CfgNode, groundtruth_test_errors: bool, save_folder: str | None = None) Tuple[DataFrame, DataFrame, DataFrame]

Loads validation and testing (uncertainty, error) pairs for each fold. Uses the validation set to generate quantile thresholds, and uses isotonic regression on the validation set to estimate error bounds. Then bins the test data accordingly. Saves predicted bins and error bounds to a CSV file.

Parameters:
  • target_idx (int) – Index of target to perform uncertainty estimation on.

  • uncertainty_error_pairs (list[list]) – List of lists describing the different uncertainty combinations to test.

  • ue_pairs_val (str) – Path to validation pairs (uncertainty, error) data.

  • ue_pairs_test (str) – Path to test pairs (uncertainty, error) data.

  • num_bins (int) – Number of bins for quantile binning.

  • config (CfgNode) – Configuration object with hyperparameters and other settings.

  • groundtruth_test_errors (bool) – Whether ground truth errors are available for test data.

  • save_folder (str, optional) – Path to folder to save results to. If None, results are not saved.

Returns:

  • all_uncert_boundaries (pd.DataFrame) – uncertainty boundaries for each fold and uncertainty pairing.

  • error_bound_estimates (pd.DataFrame) – estimated error bounds for each fold and uncertainty pairing.

  • all_testing_results (pd.DataFrame) – predicted test bin values for each fold and uncertainty pairing.

Return type:

Tuple[pd.DataFrame, pd.DataFrame, pd.DataFrame]

kale.embed.video_feature_extractor module

Define feature extractors for video, including I3D, R3D_18, MC3_18, and R2PLUS1D_18, with or without SELayers.

kale.embed.video_feature_extractor.get_video_feat_extractor(model_name, image_modality, attention, num_classes)

Get the feature extractor, with or without pre-trained weights and SELayers. The pre-trained models are saved under $XDG_CACHE_HOME/torch/hub/checkpoints/. On Linux, the default path is ~/.cache/torch/hub/checkpoints/; on Windows, it is C:/Users/$USER_NAME/.cache/torch/hub/checkpoints/. Four pre-trained models are provided: "rgb_imagenet", "flow_imagenet", "rgb_charades", "flow_charades".

Parameters:
  • model_name (string) – The name of the feature extractor. (Choices=[“I3D”, “R3D_18”, “R2PLUS1D_18”, “MC3_18”])

  • image_modality (string) – Image type. (Choices=[“rgb”, “flow”, “joint”])

  • attention (string) – The attention type. (Choices=["SELayerC", "SELayerT", "SELayerCoC", "SELayerMC", "SELayerCT", "SELayerTC", "SELayerMAC"])

  • num_classes (int) – The number of classes of the specific dataset. (Default: not used)

Returns:

  • feature_network (dictionary) – The network to extract features.

  • class_feature_dim (int) – The dimension of the feature network output for ClassNet. It is a convention when the input dimension and the network are fixed.

  • domain_feature_dim (int) – The dimension of the feature network output for DomainNet.
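
An illustrative call sketch (the unpacking into three return values follows the description above; the argument values are assumptions):

>>> feature_network, class_feature_dim, domain_feature_dim = get_video_feat_extractor(
>>>     model_name="I3D", image_modality="rgb", attention="SELayerC", num_classes=8)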

kale.embed.video_i3d module

Define Inflated 3D ConvNets (I3D) for action recognition, from https://ieeexplore.ieee.org/document/8099985. Created by Xianyuan Liu by modifying https://github.com/piergiaj/pytorch-i3d/blob/master/pytorch_i3d.py and https://github.com/deepmind/kinetics-i3d/blob/master/i3d.py.

class kale.embed.video_i3d.MaxPool3dSamePadding(kernel_size: int | Tuple[int, ...], stride: int | Tuple[int, ...] | None = None, padding: int | Tuple[int, ...] = 0, dilation: int | Tuple[int, ...] = 1, return_indices: bool = False, ceil_mode: bool = False)

Bases: MaxPool3d

Construct 3D max pooling with 'same' padding, which PyTorch does not provide. 'Same' padding means the output size matches the input size for stride=1.

compute_pad(dim, s)

Get the zero padding number.

forward(x)

Compute ‘same’ padding. Add zero to the back position first.

class kale.embed.video_i3d.Unit3D(in_channels, output_channels, kernel_shape=(1, 1, 1), stride=(1, 1, 1), padding=0, activation_fn=<function relu>, use_batch_norm=True, use_bias=False, name='unit_3d')

Bases: Module

Basic unit containing Conv3D + BatchNorm + non-linearity.

compute_pad(dim, s)

Get the zero padding number.

forward(x)

Connects the module to inputs. Dynamically pad based on input size in forward function.

Parameters:

x – Inputs to the Unit3D component.

Returns:

Outputs from the module.

class kale.embed.video_i3d.InceptionModule(in_channels, out_channels, name)

Bases: Module

Construct Inception module. Concatenation after four branches (1x1x1 conv; 1x1x1 + 3x3x3 convs; 1x1x1 + 3x3x3 convs; 3x3x3 max-pool + 1x1x1 conv). In forward, we check if SELayers are used, which are channel-wise (SELayerC), temporal-wise (SELayerT), channel-temporal-wise (SELayerTC & SELayerCT).

forward(x)
class kale.embed.video_i3d.InceptionI3d(num_classes=400, spatial_squeeze=True, final_endpoint='Logits', name='inception_i3d', in_channels=3, dropout_keep_prob=0.5)

Bases: Module

Inception-v1 I3D architecture. The model is introduced in:

Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset Joao Carreira, Andrew Zisserman https://arxiv.org/pdf/1705.07750v1.pdf.

See also the Inception architecture, introduced in:

Going deeper with convolutions Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, Andrew Rabinovich. http://arxiv.org/pdf/1409.4842v1.pdf.

VALID_ENDPOINTS = ('Conv3d_1a_7x7', 'MaxPool3d_2a_3x3', 'Conv3d_2b_1x1', 'Conv3d_2c_3x3', 'MaxPool3d_3a_3x3', 'Mixed_3b', 'Mixed_3c', 'MaxPool3d_4a_3x3', 'Mixed_4b', 'Mixed_4c', 'Mixed_4d', 'Mixed_4e', 'Mixed_4f', 'MaxPool3d_5a_2x2', 'Mixed_5b', 'Mixed_5c', 'Logits', 'Predictions')
replace_logits(num_classes)

Update the output size with num_classes according to the specific setting.

build()
forward(x)

The output is the result of the final average pooling layer with 1024 dimensions.

extract_features(x)
kale.embed.video_i3d.i3d(name, num_channels, num_classes, pretrained=False, progress=True)

Get the InceptionI3d module, with or without a pretrained model.

kale.embed.video_i3d.i3d_joint(rgb_pt, flow_pt, num_classes, pretrained=False, progress=True)

Get I3D models for different inputs.

Parameters:
  • rgb_pt (string, optional) – the name of pre-trained model for RGB input.

  • flow_pt (string, optional) – the name of pre-trained model for flow input.

  • num_classes (int) – the class number of dataset.

  • pretrained (bool) – choose if pretrained parameters are used. (Default: False)

  • progress (bool, optional) – whether or not to display a progress bar to stderr. (Default: True)

Returns:

A dictionary containing the RGB and flow models.

Return type:

models (dictionary)
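
An illustrative call sketch (the pre-trained model names follow the list in get_video_feat_extractor above; the class number is an assumption):

>>> models = i3d_joint(rgb_pt="rgb_imagenet", flow_pt="flow_imagenet",
>>>                    num_classes=8, pretrained=False)  # dictionary of RGB and flow models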

kale.embed.video_res3d module

Define MC3_18, R3D_18, and R2plus1D_18 for action recognition, from https://arxiv.org/abs/1711.11248. Created by Xianyuan Liu by modifying https://github.com/pytorch/vision/blob/master/torchvision/models/video/resnet.py.

class kale.embed.video_res3d.Conv3DSimple(in_planes, out_planes, midplanes=None, stride=1, padding=1)

Bases: Conv3d

3D convolutions for R3D (3x3x3 kernel)

static get_downsample_stride(stride)
class kale.embed.video_res3d.Conv2Plus1D(in_planes, out_planes, midplanes, stride=1, padding=1)

Bases: Sequential

(2+1)D convolutions for R2plus1D (1x3x3 kernel + 3x1x1 kernel)

static get_downsample_stride(stride)
class kale.embed.video_res3d.Conv3DNoTemporal(in_planes, out_planes, midplanes=None, stride=1, padding=1)

Bases: Conv3d

3D convolutions without temporal dimension for MCx (1x3x3 kernel)

static get_downsample_stride(stride)
class kale.embed.video_res3d.BasicBlock(inplanes, planes, conv_builder, stride=1, downsample=None)

Bases: Module

Basic ResNet building block. Each block consists of two convolutional layers with a ReLU activation function after each layer and residual connections. In forward, we check if SELayers are used, which are channel-wise (SELayerC) and temporal-wise (SELayerT).

expansion = 1
forward(x)
class kale.embed.video_res3d.Bottleneck(inplanes, planes, conv_builder, stride=1, downsample=None)

Bases: Module

BottleNeck building block. Default: No use. Each block consists of two 1*n*n and one n*n*n convolutional layers with a ReLU activation function after each layer and residual connections.

expansion = 4
forward(x)
class kale.embed.video_res3d.BasicStem

Bases: Sequential

The default conv-batchnorm-relu stem, normally used as the first layer (64 3x7x7 kernels).

class kale.embed.video_res3d.BasicFLowStem

Bases: Sequential

The default stem for optical flow.

class kale.embed.video_res3d.R2Plus1dStem

Bases: Sequential

The R(2+1)D stem is different from the default one as it uses separated 3D convolutions (45 1x7x7 kernels + 64 3x1x1 kernels).

class kale.embed.video_res3d.R2Plus1dFlowStem

Bases: Sequential

R(2+1)D stem for optical flow.

class kale.embed.video_res3d.VideoResNet(block, conv_makers, layers, stem, num_classes=400, zero_init_residual=False)

Bases: Module

replace_fc(num_classes, block=<class 'kale.embed.video_res3d.BasicBlock'>)

Update the output size with num_classes according to the specific setting.

forward(x)
kale.embed.video_res3d.r3d_18_rgb(pretrained=False, progress=True, **kwargs)

Construct 18 layer Resnet3D model for RGB as in https://arxiv.org/abs/1711.11248

Parameters:
  • pretrained (bool) – If True, returns a model pre-trained on Kinetics-400

  • progress (bool) – If True, displays a progress bar of the download to stderr

Returns:

R3D-18 network

Return type:

nn.Module

kale.embed.video_res3d.r3d_18_flow(pretrained=False, progress=True, **kwargs)

Construct 18 layer Resnet3D model for optical flow.

kale.embed.video_res3d.mc3_18_rgb(pretrained=False, progress=True, **kwargs)

Constructor for 18 layer Mixed Convolution network for RGB as in https://arxiv.org/abs/1711.11248

Parameters:
  • pretrained (bool) – If True, returns a model pre-trained on Kinetics-400

  • progress (bool) – If True, displays a progress bar of the download to stderr

Returns:

MC3 Network definition

Return type:

nn.Module

kale.embed.video_res3d.mc3_18_flow(pretrained=False, progress=True, **kwargs)

Constructor for 18 layer Mixed Convolution network for optical flow.

kale.embed.video_res3d.r2plus1d_18_rgb(pretrained=False, progress=True, **kwargs)

Constructor for the 18 layer deep R(2+1)D network for RGB as in https://arxiv.org/abs/1711.11248

Parameters:
  • pretrained (bool) – If True, returns a model pre-trained on Kinetics-400

  • progress (bool) – If True, displays a progress bar of the download to stderr

Returns:

R(2+1)D-18 network

Return type:

nn.Module

kale.embed.video_res3d.r2plus1d_18_flow(pretrained=False, progress=True, **kwargs)

Constructor for the 18 layer deep R(2+1)D network for optical flow.

kale.embed.video_res3d.r3d(rgb=False, flow=False, pretrained=False, progress=True)

Get R3D_18 models.

kale.embed.video_res3d.mc3(rgb=False, flow=False, pretrained=False, progress=True)

Get MC3_18 models.

kale.embed.video_res3d.r2plus1d(rgb=False, flow=False, pretrained=False, progress=True)

Get R2PLUS1D_18 models.

kale.embed.video_se_i3d module

Add SELayers to I3D

class kale.embed.video_se_i3d.SEInceptionI3DRGB(num_channels, num_classes, attention)

Bases: Module

Add several SELayers to I3D for RGB input.

Parameters:
  • num_channels (int) – the channel number of the input.

  • num_classes (int) – the class number of the dataset.

  • attention (string) – the name of the SELayer. (Options: ["SELayerC", "SELayerT", "SELayerCoC", "SELayerMC", "SELayerMAC", "SELayerCT", "SELayerTC"])

Returns:

I3D model with SELayers.

Return type:

model (VideoResNet)

forward(x)
class kale.embed.video_se_i3d.SEInceptionI3DFlow(num_channels, num_classes, attention)

Bases: Module

Add several SELayers to I3D for optical flow input.

forward(x)
kale.embed.video_se_i3d.se_inception_i3d(name, num_channels, num_classes, attention, pretrained=False, progress=True, rgb=True)

Get the InceptionI3d module, with or without SELayers and a pretrained model.

kale.embed.video_se_i3d.se_i3d_joint(rgb_pt, flow_pt, num_classes, attention, pretrained=False, progress=True)

Get I3D models with SELayers for different inputs.

Parameters:
  • rgb_pt (string, optional) – the name of pre-trained model for RGB input.

  • flow_pt (string, optional) – the name of pre-trained model for optical flow input.

  • num_classes (int) – the class number of dataset.

  • attention (string, optional) – the name of the SELayer.

  • pretrained (bool) – choose if pretrained parameters are used. (Default: False)

  • progress (bool, optional) – whether or not to display a progress bar to stderr. (Default: True)

Returns:

A dictionary containing the models for RGB and optical flow.

Return type:

models (dictionary)

kale.embed.video_se_res3d module

Add SELayers to MC3_18, R3D_18, R2plus1D_18

kale.embed.video_se_res3d.se_r3d_18_rgb(attention, pretrained=False, progress=True, **kwargs)
kale.embed.video_se_res3d.se_r3d_18_flow(attention, pretrained=False, progress=True, **kwargs)
kale.embed.video_se_res3d.se_mc3_18_rgb(attention, pretrained=False, progress=True, **kwargs)
kale.embed.video_se_res3d.se_mc3_18_flow(attention, pretrained=False, progress=True, **kwargs)
kale.embed.video_se_res3d.se_r2plus1d_18_rgb(attention, pretrained=False, progress=True, **kwargs)
kale.embed.video_se_res3d.se_r2plus1d_18_flow(attention, pretrained=False, progress=True, **kwargs)
kale.embed.video_se_res3d.se_r3d(attention, rgb=False, flow=False, pretrained=False, progress=True)

Get R3D_18 models with SELayers for different inputs.

Parameters:
  • attention (string) – the name of the SELayer.

  • rgb (bool) – choose if RGB model is needed. (Default: False)

  • flow (bool) – choose if optical flow model is needed. (Default: False)

  • pretrained (bool) – choose if pretrained parameters are used. (Default: False)

  • progress (bool, optional) – whether or not to display a progress bar to stderr. (Default: True)

Returns:

A dictionary containing the models for RGB and optical flow.

Return type:

models (dictionary)

kale.embed.video_se_res3d.se_mc3(attention, rgb=False, flow=False, pretrained=False, progress=True)

Get MC3_18 models with SELayers for different inputs.

kale.embed.video_se_res3d.se_r2plus1d(attention, rgb=False, flow=False, pretrained=False, progress=True)

Get R2+1D_18 models with SELayers for different inputs.

kale.embed.video_selayer module

Python implementation of Squeeze-and-Excitation Layers (SELayer). Initial implementation: channel-wise (SELayerC). Improved implementations: temporal-wise (SELayerT), convolution-based channel-wise (SELayerCoC), max-pooling-based channel-wise (SELayerMC), and multi-pooling-based channel-wise (SELayerMAC).

[Redundancy and repeat of code will be reduced in the future.]

References

Hu Jie, Li Shen, and Gang Sun. “Squeeze-and-excitation networks.” In CVPR, pp. 7132-7141. 2018. For initial implementation, please go to https://github.com/hujie-frank/SENet

kale.embed.video_selayer.get_selayer(attention)

Get the SELayer corresponding to the given attention name.

Parameters:

attention (string) – the name of the SELayer. (Options: [“SELayerC”, “SELayerT”, “SELayerCoC”, “SELayerMC”, “SELayerMAC”])

Returns:

the SELayer.

Return type:

se_layer (SELayer, optional)

class kale.embed.video_selayer.SELayer(channel, reduction=16)

Bases: Module

Helper class for SELayer design.

forward(x)
class kale.embed.video_selayer.SELayerC(channel, reduction=16)

Bases: SELayer

Construct channel-wise SELayer.

forward(x)
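
A minimal usage sketch (illustrative; the 5D video feature shape (batch, channel, time, height, width) is an assumption):

>>> se = SELayerC(channel=64)
>>> video_features = torch.randn(2, 64, 8, 16, 16)
>>> out = se(video_features)
>>> out.shape  # channel attention keeps the input shape
torch.Size([2, 64, 8, 16, 16])
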
class kale.embed.video_selayer.SELayerT(channel, reduction=2)

Bases: SELayer

Construct temporal-wise SELayer.

forward(x)
class kale.embed.video_selayer.SELayerCoC(channel, reduction=16)

Bases: SELayer

Construct convolution-based channel-wise SELayer.

forward(x)
class kale.embed.video_selayer.SELayerMC(channel, reduction=16)

Bases: SELayer

Construct channel-wise SELayer with max pooling.

forward(x)
class kale.embed.video_selayer.SELayerMAC(channel, reduction=16)

Bases: SELayer

Construct channel-wise SELayer with the mix of average pooling and max pooling.

forward(x)

Module contents