Embed

Submodules

kale.embed.attention_cnn module

class kale.embed.attention_cnn.ContextCNNGeneric(cnn: Module, cnn_output_shape: Tuple[int, int, int, int], contextualizer: Module, output_type: str)

Bases: Module

A template to construct a feature extractor consisting of a CNN followed by a sequence-to-sequence contextualizer like a Transformer-Encoder. Before inputting the CNN output tensor to the contextualizer, the tensor’s spatial dimensions are unrolled into a sequence.

Parameters
  • cnn – any convolutional neural network that takes in batches of images of shape (batch_size, channels, height, width) and outputs tensor representations of shape (batch_size, out_channels, out_height, out_width).

  • cnn_output_shape – A tuple of shape (batch_size, num_channels, height, width) describing the output shape of the given CNN (required).

  • contextualizer – A sequence-to-sequence model that takes inputs of shape (num_timesteps, batch_size, num_features) and uses attention to contextualize the sequence and returns a sequence of the exact same shape. This will mainly be a Transformer-Encoder (required).

  • output_type – One of ‘sequence’ or ‘spatial’. If ‘spatial’, the final output of the model, which is a sequence, will be reshaped to resemble the image-batch shape of the output of the CNN. If ‘sequence’, the output sequence is returned as is (required).

Examples

>>> cnn = nn.Sequential(nn.Conv2d(3, 32, kernel_size=3),
>>>                     nn.Conv2d(32, 64, kernel_size=3),
>>>                     nn.MaxPool2d(2))
>>> cnn_output_shape = (-1, 64, 8, 8)
>>> contextualizer = nn.TransformerEncoderLayer(...)
>>> output_type = 'spatial'
>>>
>>> attention_cnn = ContextCNNGeneric(cnn, cnn_output_shape, contextualizer, output_type)
>>> output = attention_cnn(torch.randn((32, 3, 20, 20)))
>>>
>>> output.size() == (32, 64, 8, 8) # True
forward(x: Tensor)

Pass the input through the cnn and then the contextualizer.

Parameters

x – input image batch exactly as for CNNs (required).
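The unrolling of the CNN output into a sequence can be illustrated with a small self-contained sketch (an illustration of the idea only, not the literal kale implementation; the layer sizes are arbitrary):

>>> import torch
>>> import torch.nn as nn
>>> cnn = nn.Conv2d(3, 64, kernel_size=3, padding=1)
>>> contextualizer = nn.TransformerEncoderLayer(d_model=64, nhead=4)
>>> features = cnn(torch.randn(2, 3, 8, 8))             # (batch, channels, height, width)
>>> b, c, h, w = features.shape
>>> seq = features.flatten(2).permute(2, 0, 1)          # unroll to (H*W, batch, channels)
>>> seq = contextualizer(seq)                           # same shape after self-attention
>>> spatial = seq.permute(1, 2, 0).reshape(b, c, h, w)  # 'spatial' output type
>>> spatial.shape
torch.Size([2, 64, 8, 8])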

training: bool
class kale.embed.attention_cnn.CNNTransformer(cnn: Module, cnn_output_shape: Tuple[int, int, int, int], num_layers: int, num_heads: int, dim_feedforward: int, dropout: float, output_type: str, positional_encoder: Optional[Module] = None)

Bases: ContextCNNGeneric

A feature extractor consisting of a given CNN backbone followed by a standard Transformer-Encoder. See documentation of “ContextCNNGeneric” for more information.

Parameters
  • cnn – any convolutional neural network that takes in batches of images of shape (batch_size, channels, height, width) and outputs tensor representations of shape (batch_size, out_channels, out_height, out_width) (required).

  • cnn_output_shape – a tuple of shape (batch_size, num_channels, height, width) describing the output shape of the given CNN (required).

  • num_layers – number of attention layers in the Transformer-Encoder (required).

  • num_heads – number of attention heads in each transformer block (required).

  • dim_feedforward – number of neurons in the intermediate dense layer of each transformer feedforward block (required).

  • dropout – dropout rate of the transformer layers (required).

  • output_type – one of ‘sequence’ or ‘spatial’. If ‘spatial’, the final output of the model, which is the sequence output of the Transformer-Encoder, will be reshaped to resemble the image-batch shape of the output of the CNN (required).

  • positional_encoder – None or a nn.Module that expects inputs of shape (sequence_length, batch_size, embedding_dim) and returns the same input after adding some positional information to the embeddings. If None, then the default and fixed sin-cos positional encodings of base transformers are applied (optional).

Examples

See pykale/examples/cifar_cnntransformer/model.py
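A hedged construction sketch (the hyperparameter values are illustrative and the CNN follows the ContextCNNGeneric example above; see the linked example file for a full configuration):

>>> import torch
>>> import torch.nn as nn
>>> from kale.embed.attention_cnn import CNNTransformer
>>> cnn = nn.Sequential(nn.Conv2d(3, 32, kernel_size=3),
>>>                     nn.Conv2d(32, 64, kernel_size=3),
>>>                     nn.MaxPool2d(2))
>>> model = CNNTransformer(cnn, cnn_output_shape=(-1, 64, 8, 8),
>>>                        num_layers=2, num_heads=4, dim_feedforward=256,
>>>                        dropout=0.1, output_type='spatial')
>>> output = model(torch.randn(32, 3, 20, 20))
>>> output.size() == (32, 64, 8, 8)  # True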

training: bool

kale.embed.factorization module

Python implementation of a tensor factorization algorithm, Multilinear Principal Component Analysis (MPCA), and a matrix factorization algorithm, Maximum Independence Domain Adaptation (MIDA).

class kale.embed.factorization.MPCA(var_ratio=0.97, max_iter=1, vectorize=False, n_components=None)

Bases: BaseEstimator, TransformerMixin

MPCA implementation compatible with scikit-learn

Parameters
  • var_ratio (float, optional) – Percentage of variance explained (between 0 and 1). Defaults to 0.97.

  • max_iter (int, optional) – Maximum number of iterations. Defaults to 1.

  • vectorize (bool) – Whether to return the transformed/projected tensor as a vector. Defaults to False.

  • n_components (int) – Number of components to keep. Applies only when vectorize=True. Defaults to None.

proj_mats

A list of transposed projection matrices, shapes (P_1, I_1), …, (P_N, I_N), where P_1, …, P_N are the dimensions of the output tensor for each sample.

Type

list of arrays

idx_order

The ordering index of projected (and vectorized) features in decreasing variance.

Type

array-like

mean_

Per-feature empirical mean, estimated from the training set, shape (I_1, I_2, …, I_N).

Type

array-like

shape_in

Input tensor shapes, i.e. (I_1, I_2, …, I_N).

Type

tuple

shape_out

Output tensor shapes, i.e. (P_1, P_2, …, P_N).

Type

tuple

Reference:

Haiping Lu, K.N. Plataniotis, and A.N. Venetsanopoulos, “MPCA: Multilinear Principal Component Analysis of Tensor Objects”, IEEE Transactions on Neural Networks, Vol. 19, No. 1, Page: 18-39, January 2008. For initial Matlab implementation, please go to https://uk.mathworks.com/matlabcentral/fileexchange/26168.

Examples

>>> import numpy as np
>>> from kale.embed.factorization import MPCA
>>> x = np.random.random((40, 20, 25, 20))
>>> x.shape
(40, 20, 25, 20)
>>> mpca = MPCA()
>>> x_projected = mpca.fit_transform(x)
>>> x_projected.shape
(40, 18, 23, 18)
>>> mpca.set_params(**{"vectorize": True})
>>> x_projected = mpca.transform(x)
>>> x_projected.shape
(40, 7452)
>>> mpca.set_params(**{"n_components": 50})
>>> x_projected = mpca.transform(x)
>>> x_projected.shape
(40, 50)
>>> x_rec = mpca.inverse_transform(x_projected)
>>> x_rec.shape
(40, 20, 25, 20)
fit(x, y=None)

Fit the model with input training data x.

Parameters
  • x (array-like tensor) – Input data, shape (n_samples, I_1, I_2, …, I_N), where n_samples is the number of samples and I_1, I_2, …, I_N are the dimensions of the corresponding modes (1, 2, …, N), respectively.

  • y (None) – Ignored variable.

Returns

self (object). Returns the instance itself.

transform(x)

Perform dimension reduction on x

Parameters

x (array-like tensor) – Data to perform dimension reduction, shape (n_samples, I_1, I_2, …, I_N).

Returns

Projected data in lower dimension, shape (n_samples, P_1, P_2, …, P_N) if self.vectorize==False. If self.vectorize==True, features will be sorted based on their explained variance ratio, shape (n_samples, P_1 * P_2 * … * P_N) if self.n_components is None, and shape (n_samples, n_components) if self.n_components is a valid integer.

Return type

array-like tensor

inverse_transform(x)

Reconstruct projected data to the original shape and add the estimated mean back

Parameters

x (array-like tensor) – Data to be reconstructed, shape (n_samples, P_1, P_2, …, P_N), if self.vectorize == False, where P_1, P_2, …, P_N are the reduced dimensions of corresponding mode (1, 2, …, N), respectively. If self.vectorize == True, shape (n_samples, self.n_components) or shape (n_samples, P_1 * P_2 * … * P_N).

Returns

Reconstructed tensor in original shape, shape (n_samples, I_1, I_2, …, I_N)

Return type

array-like tensor

class kale.embed.factorization.MIDA(n_components, kernel='linear', lambda_=1.0, mu=1.0, eta=1.0, augmentation=False, kernel_params=None)

Bases: BaseEstimator, TransformerMixin

Maximum independence domain adaptation.

Parameters
  • n_components (int) – Number of components to keep.

  • kernel (str) – “linear”, “rbf”, or “poly”. Kernel to use for MIDA. Defaults to “linear”.

  • mu (float) – Hyperparameter of the l2 penalty. Defaults to 1.0.

  • eta (float) – Hyperparameter of the label dependence. Defaults to 1.0.

  • augmentation (bool) – Whether to use covariates as augmented features. Defaults to False.

  • kernel_params (dict or None) – Parameters for the kernel. Defaults to None.

References

[1] Yan, K., Kou, L. and Zhang, D., 2018. Learning domain-invariant subspace using domain features and independence maximization. IEEE Transactions on Cybernetics, 48(1), pp. 288-299.

fit(x, y=None, covariates=None)
Parameters
  • x – array-like. Input data, shape (n_samples, n_features)

  • y – array-like. Labels, shape (nl_samples,)

  • covariates – array-like. Domain co-variates, shape (n_samples, n_co-variates)

Note

Unsupervised MIDA is performed if y is None. Semi-supervised MIDA is performed if y is not None.

fit_transform(x, y=None, covariates=None)
Parameters
  • x – array-like, shape (n_samples, n_features)

  • y – array-like, shape (n_samples,)

  • covariates – array-like, shape (n_samples, n_covariates)

Returns

array-like, shape (n_samples, n_components)

Return type

x_transformed

transform(x, covariates=None)
Parameters
  • x – array-like, shape (n_samples, n_features)

  • covariates – array-like, augmentation features, shape (n_samples, n_covariates)

Returns

array-like, shape (n_samples, n_components)

Return type

x_transformed
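A minimal usage sketch with synthetic data, assuming one-hot domain indicators are used as the domain covariates (the sizes below are illustrative):

>>> import numpy as np
>>> from kale.embed.factorization import MIDA
>>> xs = np.random.random((100, 20))          # source-domain samples
>>> xt = np.random.random((100, 20)) + 1.0    # target-domain samples with a mean shift
>>> x = np.concatenate([xs, xt])
>>> covariates = np.zeros((200, 2))           # one-hot domain indicator per sample
>>> covariates[:100, 0] = 1
>>> covariates[100:, 1] = 1
>>> mida = MIDA(n_components=5, kernel="linear")
>>> x_transformed = mida.fit_transform(x, covariates=covariates)
>>> x_transformed.shape
(200, 5)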

kale.embed.gcn module

class kale.embed.gcn.GCNEncoderLayer(in_channels, out_channels, improved=False, cached=False, bias=True, **kwargs)

Bases: MessagePassing

Modification of PyTorch Geometric’s nn.GCNConv, which reduces the computational cost of the GCN layer for the GripNet model. The graph convolutional operator from the “Semi-supervised Classification with Graph Convolutional Networks” (ICLR 2017) paper.

\[\mathbf{X}^{\prime} = \mathbf{\hat{D}}^{-1/2} \mathbf{\hat{A}} \mathbf{\hat{D}}^{-1/2} \mathbf{X} \mathbf{\Theta},\]

where \(\mathbf{\hat{A}} = \mathbf{A} + \mathbf{I}\) denotes the adjacency matrix with inserted self-loops and \(\hat{D}_{ii} = \sum_{j=0} \hat{A}_{ij}\) its diagonal degree matrix.

Note: For more information, please see PyTorch Geometric’s nn.GCNConv docs.

Parameters
  • in_channels (int) – Size of each input sample.

  • out_channels (int) – Size of each output sample.

  • improved (bool, optional) – If set to True, the layer computes \(\mathbf{\hat{A}}\) as \(\mathbf{A} + 2\mathbf{I}\). (default: False)

  • cached (bool, optional) – If set to True, the layer will cache the computation of \(\mathbf{\hat{D}}^{-1/2} \mathbf{\hat{A}} \mathbf{\hat{D}}^{-1/2}\) on first execution, and will use the cached version for further executions. This parameter should only be set to True in transductive learning scenarios. (default: False)

  • bias (bool, optional) – If set to False, the layer will not learn an additive bias. (default: True)

  • **kwargs (optional) – Additional arguments of torch_geometric.nn.conv.MessagePassing.

reset_parameters()
static norm(edge_index, num_nodes, edge_weight, improved=False, dtype=None)

Add self-loops and apply symmetric normalization

forward(x, edge_index, edge_weight=None)
Parameters
  • x (torch.Tensor) – The input node feature embedding.

  • edge_index (torch.Tensor) – Graph edge index in COO format with shape [2, num_edges].

  • edge_weight (torch.Tensor, optional) – The one-dimensional relation weight for each edge in edge_index (default: None).
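A minimal usage sketch on a toy graph (the node features and edges are illustrative):

>>> import torch
>>> from kale.embed.gcn import GCNEncoderLayer
>>> x = torch.randn(4, 16)                           # 4 nodes, 16 features each
>>> edge_index = torch.tensor([[0, 1, 1, 2, 2, 3],
>>>                            [1, 0, 2, 1, 3, 2]])  # undirected edges in COO format
>>> layer = GCNEncoderLayer(in_channels=16, out_channels=32)
>>> out = layer(x, edge_index)                       # expected shape (4, 32)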

class kale.embed.gcn.RGCNEncoderLayer(in_channels, out_channels, num_relations, num_bases, after_relu, bias=False, **kwargs)

Bases: MessagePassing

Modification of PyTorch Geometric’s nn.RGCNConv, which reduces the computational and memory cost of the RGCN encoder layer for the GripNet model. The relational graph convolutional operator from the “Modeling Relational Data with Graph Convolutional Networks” paper.

\[\mathbf{x}^{\prime}_i = \mathbf{\Theta}_{\textrm{root}} \cdot \mathbf{x}_i + \sum_{r \in \mathcal{R}} \sum_{j \in \mathcal{N}_r(i)} \frac{1}{|\mathcal{N}_r(i)|} \mathbf{\Theta}_r \cdot \mathbf{x}_j,\]

where \(\mathcal{R}\) denotes the set of relations, i.e. edge types. Edge type needs to be a one-dimensional torch.long tensor which stores a relation identifier \(\in \{ 0, \ldots, |\mathcal{R}| - 1\}\) for each edge.

Note: For more information, please see PyTorch Geometric’s nn.RGCNConv docs.

Parameters
  • in_channels (int) – Size of each input sample.

  • out_channels (int) – Size of each output sample.

  • num_relations (int) – Number of edge relations.

  • num_bases (int) – Use the basis-decomposition regularization scheme, where num_bases denotes the number of bases.

  • after_relu (bool) – Whether the input embedding has been activated by a ReLU function or not.

  • bias (bool) – If set to False, the layer will not learn an additive bias. (default: False)

  • **kwargs (optional) – Additional arguments of torch_geometric.nn.conv.MessagePassing.

reset_parameters()
forward(x, edge_index, edge_type, range_list)
Parameters
  • x (torch.Tensor) – The input node feature embedding.

  • edge_index (torch.Tensor) – Graph edge index in COO format with shape [2, num_edges].

  • edge_type (torch.Tensor) – The one-dimensional relation type/index for each edge in edge_index.

  • range_list (torch.Tensor) – The index range list of each edge type with shape [num_types, 2].
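A minimal usage sketch on a toy graph with two relation types; it assumes that edges are grouped by relation type in edge_index and that range_list stores the [start, end) index range of each type, following the GripNet usage:

>>> import torch
>>> from kale.embed.gcn import RGCNEncoderLayer
>>> x = torch.randn(4, 16)
>>> edge_index = torch.tensor([[0, 1, 2, 3],
>>>                            [1, 0, 3, 2]])
>>> edge_type = torch.tensor([0, 0, 1, 1])
>>> range_list = torch.tensor([[0, 2], [2, 4]])      # assumed [start, end) per relation type
>>> layer = RGCNEncoderLayer(in_channels=16, out_channels=32, num_relations=2,
>>>                          num_bases=2, after_relu=False)
>>> out = layer(x, edge_index, edge_type, range_list)  # expected shape (4, 32)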

kale.embed.gripnet module

kale.embed.image_cnn module

CNNs for extracting features from small images of size 32x32 (e.g. MNIST) and regular images of size 224x224 (e.g. ImageNet). The code is based on https://github.com/criteo-research/pytorch-ada/blob/master/adalib/ada/models/modules.py, which is for domain adaptation.

class kale.embed.image_cnn.SmallCNNFeature(num_channels=3, kernel_size=5)

Bases: Module

A feature extractor for small 32x32 images (e.g. CIFAR, MNIST) that outputs a feature vector of length 128.

Parameters
  • num_channels – the number of input channels (default=3).

  • kernel_size – the size of the convolution kernel (default=5).

Examples::
>>> feature_network = SmallCNNFeature(num_channels=3)
forward(input_)
output_size()
training: bool
class kale.embed.image_cnn.ResNet18Feature(pretrained=True)

Bases: Module

Modified ResNet18 (without the last layer) feature extractor for regular 224x224 images.

Parameters

pretrained (bool) – If True, returns a model pre-trained on ImageNet

forward(x)
output_size()
training: bool
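A minimal usage sketch (pass pretrained=True to load ImageNet weights; the length of the output feature vector is given by output_size()):

>>> import torch
>>> from kale.embed.image_cnn import ResNet18Feature
>>> feature_network = ResNet18Feature(pretrained=False)
>>> images = torch.randn(8, 3, 224, 224)
>>> features = feature_network(images)   # expected shape (8, feature_network.output_size())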
class kale.embed.image_cnn.ResNet34Feature(pretrained=True)

Bases: Module

Modified ResNet34 (without the last layer) feature extractor for regular 224x224 images.

Parameters

pretrained (bool) – If True, returns a model pre-trained on ImageNet

forward(x)
output_size()
training: bool
class kale.embed.image_cnn.ResNet50Feature(pretrained=True)

Bases: Module

Modified ResNet50 (without the last layer) feature extractor for regular 224x224 images.

Parameters

pretrained (bool) – If True, returns a model pre-trained on ImageNet

forward(x)
output_size()
training: bool
class kale.embed.image_cnn.ResNet101Feature(pretrained=True)

Bases: Module

Modified ResNet101 (without the last layer) feature extractor for regular 224x224 images.

Parameters

pretrained (bool) – If True, returns a model pre-trained on ImageNet

forward(x)
output_size()
training: bool
class kale.embed.image_cnn.ResNet152Feature(pretrained=True)

Bases: Module

Modified ResNet152 (without the last layer) feature extractor for regular 224x224 images.

Parameters

pretrained (bool) – If True, returns a model pre-trained on ImageNet

forward(x)
output_size()
training: bool

kale.embed.positional_encoding module

class kale.embed.positional_encoding.PositionalEncoding(d_model: int, max_len: int = 5000)

Bases: Module

Implements the positional encoding as described in the NIPS2017 paper ‘Attention Is All You Need’ about Transformers (https://arxiv.org/abs/1706.03762). Essentially, for all timesteps in a given sequence, adds information about the relative temporal location of a timestep directly into the features of that timestep, and then returns this slightly-modified, same-shape sequence.

Parameters
  • d_model – the number of features that each timestep has (required).

  • max_len – the maximum sequence length that the positional encodings should support (required).

forward(x)

Expects input of shape (sequence_length, batch_size, num_features) and returns output of the same shape. sequence_length may be at most self.max_len, and num_features is expected to be exactly self.d_model.

Parameters

x – a sequence input of shape (sequence_length, batch_size, num_features) (required).

training: bool
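A minimal usage sketch:

>>> import torch
>>> from kale.embed.positional_encoding import PositionalEncoding
>>> pos_encoder = PositionalEncoding(d_model=64, max_len=100)
>>> x = torch.zeros(20, 8, 64)           # (sequence_length, batch_size, num_features)
>>> out = pos_encoder(x)                 # same shape, with positional information added
>>> out.shape
torch.Size([20, 8, 64])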

kale.embed.seq_nn module

DeepDTA based models for drug-target interaction prediction problem.

class kale.embed.seq_nn.CNNEncoder(num_embeddings, embedding_dim, sequence_length, num_kernels, kernel_length)

Bases: Module

The DeepDTA CNN encoder module, which comprises three 1D-convolutional layers and one max-pooling layer. The module is applied to encode drug/target sequence information, and the input should be sequences transformed with integer/label encoding. The original paper is “DeepDTA: deep drug–target binding affinity prediction” .

Parameters
  • num_embeddings (int) – Number of embedding labels/categories, depends on the types of encoding sequence.

  • embedding_dim (int) – Dimension of embedding labels/categories.

  • sequence_length (int) – Max length of input sequence.

  • num_kernels (int) – Number of kernels (filters).

  • kernel_length (int) – Length of kernel (filter).

forward(x)
training: bool
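A minimal encoding sketch; the hyperparameter values below are illustrative rather than DeepDTA's defaults, and the input is assumed to be a batch of label-encoded sequences padded to sequence_length:

>>> import torch
>>> from kale.embed.seq_nn import CNNEncoder
>>> encoder = CNNEncoder(num_embeddings=64, embedding_dim=128, sequence_length=85,
>>>                      num_kernels=32, kernel_length=8)
>>> tokens = torch.randint(0, 64, (16, 85))   # batch of label-encoded sequences
>>> features = encoder(tokens)                # pooled representation per sequence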
class kale.embed.seq_nn.GCNEncoder(in_channel=78, out_channel=128, dropout_rate=0.2)

Bases: Module

The GraphDTA GCN encoder module, which comprises three graph convolutional layers and one fully connected layer. The model is a variant of DeepDTA and is applied to encode drug molecule graph information. The original paper is “GraphDTA: Predicting drug–target binding affinity with graph neural networks” .

Parameters
  • in_channel (int) – Dimension of each input node feature.

  • out_channel (int) – Dimension of each output node feature.

  • dropout_rate (float) – dropout rate during training.

forward(x, edge_index, batch)
training: bool
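A minimal usage sketch on a toy molecular graph (the values are illustrative); batch assigns every node to graph 0, as for a single-graph batch in torch_geometric:

>>> import torch
>>> from kale.embed.seq_nn import GCNEncoder
>>> encoder = GCNEncoder(in_channel=78, out_channel=128, dropout_rate=0.2)
>>> x = torch.randn(6, 78)                    # node features
>>> edge_index = torch.tensor([[0, 1, 2, 3, 4, 5],
>>>                            [1, 2, 3, 4, 5, 0]])
>>> batch = torch.zeros(6, dtype=torch.long)  # all nodes belong to graph 0
>>> graph_embedding = encoder(x, edge_index, batch)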

kale.embed.video_feature_extractor module

Define the feature extractors for video, including I3D, R3D_18, MC3_18, and R2PLUS1D_18, with or without SELayers.

kale.embed.video_feature_extractor.get_video_feat_extractor(model_name, image_modality, attention, num_classes)

Get the feature extractor, with or without the pre-trained model and SELayers. The pre-trained models are saved under $XDG_CACHE_HOME/torch/hub/checkpoints/. On Linux, the default path is ~/.cache/torch/hub/checkpoints/; on Windows, it is C:/Users/$USER_NAME/.cache/torch/hub/checkpoints/. Four pre-trained models are provided: “rgb_imagenet”, “flow_imagenet”, “rgb_charades”, “flow_charades”.

Parameters
  • model_name (string) – The name of the feature extractor. (Choices=[“I3D”, “R3D_18”, “R2PLUS1D_18”, “MC3_18”])

  • image_modality (string) – Image type. (Choices=[“rgb”, “flow”, “joint”])

  • attention (string) – The attention type. (Choices=[“SELayerC”, “SELayerT”, “SELayerCoC”, “SELayerMC”, “SELayerCT”, “SELayerTC”, “SELayerMAC”])

  • num_classes (int) – The class number of specific dataset. (Default: No use)

Returns
  • feature_network (dictionary) – The network to extract features.

  • class_feature_dim (int) – The dimension of the feature network output for ClassNet. It is a convention when the input dimension and the network are fixed.

  • domain_feature_dim (int) – The dimension of the feature network output for DomainNet.
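A hedged call sketch, assuming the function returns the feature network together with the two feature dimensions listed above (the num_classes value is illustrative):

>>> from kale.embed.video_feature_extractor import get_video_feat_extractor
>>> feat_network, class_feature_dim, domain_feature_dim = get_video_feat_extractor(
>>>     model_name="I3D", image_modality="rgb", attention="SELayerC", num_classes=8)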

kale.embed.video_i3d module

Define Inflated 3D ConvNets (I3D) for action recognition, from https://ieeexplore.ieee.org/document/8099985. Created by Xianyuan Liu by modifying https://github.com/piergiaj/pytorch-i3d/blob/master/pytorch_i3d.py and https://github.com/deepmind/kinetics-i3d/blob/master/i3d.py

class kale.embed.video_i3d.MaxPool3dSamePadding(kernel_size: Union[int, Tuple[int, ...]], stride: Optional[Union[int, Tuple[int, ...]]] = None, padding: Union[int, Tuple[int, ...]] = 0, dilation: Union[int, Tuple[int, ...]] = 1, return_indices: bool = False, ceil_mode: bool = False)

Bases: MaxPool3d

Construct a 3D max-pool layer with ‘same’ padding, which PyTorch does not provide natively. ‘Same’ padding means the output size matches the input size for stride=1.

compute_pad(dim, s)

Get the zero padding number.

forward(x)

Compute ‘same’ padding. Add zero to the back position first.

kernel_size: Union[int, Tuple[int, int, int]]
stride: Union[int, Tuple[int, int, int]]
padding: Union[int, Tuple[int, int, int]]
dilation: Union[int, Tuple[int, int, int]]
class kale.embed.video_i3d.Unit3D(in_channels, output_channels, kernel_shape=(1, 1, 1), stride=(1, 1, 1), padding=0, activation_fn=<function relu>, use_batch_norm=True, use_bias=False, name='unit_3d')

Bases: Module

Basic unit containing Conv3D + BatchNorm + non-linearity.

compute_pad(dim, s)

Get the zero padding number.

forward(x)

Connects the module to inputs, padding dynamically based on the input size in the forward function.

Parameters

x – Inputs to the Unit3D component.

Returns

Outputs from the module.

training: bool
class kale.embed.video_i3d.InceptionModule(in_channels, out_channels, name)

Bases: Module

Construct Inception module. Concatenation after four branches (1x1x1 conv; 1x1x1 + 3x3x3 convs; 1x1x1 + 3x3x3 convs; 3x3x3 max-pool + 1x1x1 conv). In forward, we check if SELayers are used, which are channel-wise (SELayerC), temporal-wise (SELayerT), channel-temporal-wise (SELayerTC & SELayerCT).

forward(x)
training: bool
class kale.embed.video_i3d.InceptionI3d(num_classes=400, spatial_squeeze=True, final_endpoint='Logits', name='inception_i3d', in_channels=3, dropout_keep_prob=0.5)

Bases: Module

Inception-v1 I3D architecture. The model is introduced in:

Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset Joao Carreira, Andrew Zisserman https://arxiv.org/pdf/1705.07750v1.pdf.

See also the Inception architecture, introduced in:

Going deeper with convolutions Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, Andrew Rabinovich. http://arxiv.org/pdf/1409.4842v1.pdf.

VALID_ENDPOINTS = ('Conv3d_1a_7x7', 'MaxPool3d_2a_3x3', 'Conv3d_2b_1x1', 'Conv3d_2c_3x3', 'MaxPool3d_3a_3x3', 'Mixed_3b', 'Mixed_3c', 'MaxPool3d_4a_3x3', 'Mixed_4b', 'Mixed_4c', 'Mixed_4d', 'Mixed_4e', 'Mixed_4f', 'MaxPool3d_5a_2x2', 'Mixed_5b', 'Mixed_5c', 'Logits', 'Predictions')
replace_logits(num_classes)

Update the output size with num_classes according to the specific setting.

build()
forward(x)

The output is the result of the final average pooling layer with 1024 dimensions.

extract_features(x)
training: bool
kale.embed.video_i3d.i3d(name, num_channels, num_classes, pretrained=False, progress=True)

Get the InceptionI3d module, with or without a pretrained model.

kale.embed.video_i3d.i3d_joint(rgb_pt, flow_pt, num_classes, pretrained=False, progress=True)

Get I3D models for different inputs.

Parameters
  • rgb_pt (string, optional) – the name of pre-trained model for RGB input.

  • flow_pt (string, optional) – the name of pre-trained model for flow input.

  • num_classes (int) – the class number of dataset.

  • pretrained (bool) – choose if pretrained parameters are used. (Default: False)

  • progress (bool, optional) – whether or not to display a progress bar to stderr. (Default: True)

Returns

A dictionary containing the RGB and flow models.

Return type

models (dictionary)
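A hedged usage sketch; the pre-trained model names follow the choices listed under get_video_feat_extractor, and the dictionary keys are assumed to be "rgb" and "flow":

>>> from kale.embed.video_i3d import i3d_joint
>>> models = i3d_joint(rgb_pt="rgb_imagenet", flow_pt="flow_imagenet",
>>>                    num_classes=10, pretrained=False)
>>> rgb_model, flow_model = models["rgb"], models["flow"]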

kale.embed.video_res3d module

Define MC3_18, R3D_18, and R2plus1D_18 for action recognition, from https://arxiv.org/abs/1711.11248. Created by Xianyuan Liu by modifying https://github.com/pytorch/vision/blob/master/torchvision/models/video/resnet.py

class kale.embed.video_res3d.Conv3DSimple(in_planes, out_planes, midplanes=None, stride=1, padding=1)

Bases: Conv3d

3D convolutions for R3D (3x3x3 kernel)

static get_downsample_stride(stride)
bias: Optional[Tensor]
out_channels: int
kernel_size: Tuple[int, ...]
stride: Tuple[int, ...]
padding: Union[str, Tuple[int, ...]]
dilation: Tuple[int, ...]
transposed: bool
output_padding: Tuple[int, ...]
groups: int
padding_mode: str
weight: Tensor
class kale.embed.video_res3d.Conv2Plus1D(in_planes, out_planes, midplanes, stride=1, padding=1)

Bases: Sequential

(2+1)D convolutions for R2plus1D (1x3x3 kernel + 3x1x1 kernel)

static get_downsample_stride(stride)
training: bool
class kale.embed.video_res3d.Conv3DNoTemporal(in_planes, out_planes, midplanes=None, stride=1, padding=1)

Bases: Conv3d

3D convolutions without temporal dimension for MCx (1x3x3 kernel)

static get_downsample_stride(stride)
bias: Optional[Tensor]
out_channels: int
kernel_size: Tuple[int, ...]
stride: Tuple[int, ...]
padding: Union[str, Tuple[int, ...]]
dilation: Tuple[int, ...]
transposed: bool
output_padding: Tuple[int, ...]
groups: int
padding_mode: str
weight: Tensor
class kale.embed.video_res3d.BasicBlock(inplanes, planes, conv_builder, stride=1, downsample=None)

Bases: Module

Basic ResNet building block. Each block consists of two convolutional layers with a ReLU activation function after each layer and residual connections. In forward, we check if SELayers are used, which are channel-wise (SELayerC) and temporal-wise (SELayerT).

expansion = 1
forward(x)
training: bool
class kale.embed.video_res3d.Bottleneck(inplanes, planes, conv_builder, stride=1, downsample=None)

Bases: Module

Bottleneck building block (not used by default). Each block consists of two 1*n*n and one n*n*n convolutional layers, with a ReLU activation function after each layer and residual connections.

expansion = 4
forward(x)
training: bool
class kale.embed.video_res3d.BasicStem

Bases: Sequential

The default conv-batchnorm-relu stem, normally the first layer of the network (64 3x7x7 kernels).

training: bool
class kale.embed.video_res3d.BasicFLowStem

Bases: Sequential

The default stem for optical flow.

training: bool
class kale.embed.video_res3d.R2Plus1dStem

Bases: Sequential

The R(2+1)D stem is different from the default one as it uses separated 3D convolutions. (45 1x7x7 kernels + 64 3x1x1 kernels)

training: bool
class kale.embed.video_res3d.R2Plus1dFlowStem

Bases: Sequential

R(2+1)D stem for optical flow.

training: bool
class kale.embed.video_res3d.VideoResNet(block, conv_makers, layers, stem, num_classes=400, zero_init_residual=False)

Bases: Module

replace_fc(num_classes, block=<class 'kale.embed.video_res3d.BasicBlock'>)

Update the output size with num_classes according to the specific setting.

forward(x)
training: bool
kale.embed.video_res3d.r3d_18_rgb(pretrained=False, progress=True, **kwargs)

Construct 18 layer Resnet3D model for RGB as in https://arxiv.org/abs/1711.11248

Parameters
  • pretrained (bool) – If True, returns a model pre-trained on Kinetics-400

  • progress (bool) – If True, displays a progress bar of the download to stderr

Returns

R3D-18 network

Return type

nn.Module

kale.embed.video_res3d.r3d_18_flow(pretrained=False, progress=True, **kwargs)

Construct 18 layer Resnet3D model for optical flow.

kale.embed.video_res3d.mc3_18_rgb(pretrained=False, progress=True, **kwargs)

Constructor for 18 layer Mixed Convolution network for RGB as in https://arxiv.org/abs/1711.11248

Parameters
  • pretrained (bool) – If True, returns a model pre-trained on Kinetics-400

  • progress (bool) – If True, displays a progress bar of the download to stderr

Returns

MC3 Network definition

Return type

nn.Module

kale.embed.video_res3d.mc3_18_flow(pretrained=False, progress=True, **kwargs)

Constructor for 18 layer Mixed Convolution network for optical flow.

kale.embed.video_res3d.r2plus1d_18_rgb(pretrained=False, progress=True, **kwargs)

Constructor for the 18 layer deep R(2+1)D network for RGB as in https://arxiv.org/abs/1711.11248

Parameters
  • pretrained (bool) – If True, returns a model pre-trained on Kinetics-400

  • progress (bool) – If True, displays a progress bar of the download to stderr

Returns

R(2+1)D-18 network

Return type

nn.Module

kale.embed.video_res3d.r2plus1d_18_flow(pretrained=False, progress=True, **kwargs)

Constructor for the 18 layer deep R(2+1)D network for optical flow.

kale.embed.video_res3d.r3d(rgb=False, flow=False, pretrained=False, progress=True)

Get R3D_18 models.

kale.embed.video_res3d.mc3(rgb=False, flow=False, pretrained=False, progress=True)

Get MC3_18 models.

kale.embed.video_res3d.r2plus1d(rgb=False, flow=False, pretrained=False, progress=True)

Get R2PLUS1D_18 models.

kale.embed.video_selayer module

Python implementation of Squeeze-and-Excitation Layers (SELayer).

Initial implementation: channel-wise (SELayerC). Improved implementations: temporal-wise (SELayerT), convolution-based channel-wise (SELayerCoC), max-pooling-based channel-wise (SELayerMC), multi-pooling-based channel-wise (SELayerMAC).

[Redundant and repeated code will be reduced in the future.]

References

Jie Hu, Li Shen, and Gang Sun. “Squeeze-and-Excitation Networks.” In CVPR, pp. 7132-7141, 2018. For the initial implementation, please go to https://github.com/hujie-frank/SENet

kale.embed.video_selayer.get_selayer(attention)

Get the SELayer corresponding to the given attention name.

Parameters

attention (string) – the name of the SELayer. (Options: [“SELayerC”, “SELayerT”, “SELayerCoC”, “SELayerMC”, “SELayerMAC”])

Returns

the SELayer.

Return type

se_layer (SELayer, optional)

class kale.embed.video_selayer.SELayer(channel, reduction=16)

Bases: Module

Helper class for SELayer design.

forward(x)
training: bool
class kale.embed.video_selayer.SELayerC(channel, reduction=16)

Bases: SELayer

Construct channel-wise SELayer.

forward(x)
training: bool
class kale.embed.video_selayer.SELayerT(channel, reduction=2)

Bases: SELayer

Construct temporal-wise SELayer.

forward(x)
training: bool
class kale.embed.video_selayer.SELayerCoC(channel, reduction=16)

Bases: SELayer

Construct convolution-based channel-wise SELayer.

forward(x)
training: bool
class kale.embed.video_selayer.SELayerMC(channel, reduction=16)

Bases: SELayer

Construct channel-wise SELayer with max pooling.

forward(x)
training: bool
class kale.embed.video_selayer.SELayerMAC(channel, reduction=16)

Bases: SELayer

Construct channel-wise SELayer with the mix of average pooling and max pooling.

forward(x)
training: bool

kale.embed.video_se_i3d module

Add SELayers to I3D

class kale.embed.video_se_i3d.SEInceptionI3DRGB(num_channels, num_classes, attention)

Bases: Module

Add several SELayers to I3D for RGB input.

Parameters
  • num_channels (int) – the channel number of the input.

  • num_classes (int) – the class number of the dataset.

  • attention (string) – the name of the SELayer. (Options: [“SELayerC”, “SELayerT”, “SELayerCoC”, “SELayerMC”, “SELayerMAC”, “SELayerCT”, “SELayerTC”])

Returns

I3D model with SELayers.

Return type

model (VideoResNet)

forward(x)
training: bool
class kale.embed.video_se_i3d.SEInceptionI3DFlow(num_channels, num_classes, attention)

Bases: Module

Add several SELayers to I3D for optical flow input.

forward(x)
training: bool
kale.embed.video_se_i3d.se_inception_i3d(name, num_channels, num_classes, attention, pretrained=False, progress=True, rgb=True)

Get the InceptionI3d module, with or without SELayers and a pretrained model.

kale.embed.video_se_i3d.se_i3d_joint(rgb_pt, flow_pt, num_classes, attention, pretrained=False, progress=True)

Get I3D models with SELayers for different inputs.

Parameters
  • rgb_pt (string, optional) – the name of pre-trained model for RGB input.

  • flow_pt (string, optional) – the name of pre-trained model for optical flow input.

  • num_classes (int) – the class number of dataset.

  • attention (string, optional) – the name of the SELayer.

  • pretrained (bool) – choose if pretrained parameters are used. (Default: False)

  • progress (bool, optional) – whether or not to display a progress bar to stderr. (Default: True)

Returns

A dictionary containing models for RGB and optical flow.

Return type

models (dictionary)

kale.embed.video_se_res3d module

Add SELayers to MC3_18, R3D_18, R2plus1D_18

kale.embed.video_se_res3d.se_r3d_18_rgb(attention, pretrained=False, progress=True, **kwargs)
kale.embed.video_se_res3d.se_r3d_18_flow(attention, pretrained=False, progress=True, **kwargs)
kale.embed.video_se_res3d.se_mc3_18_rgb(attention, pretrained=False, progress=True, **kwargs)
kale.embed.video_se_res3d.se_mc3_18_flow(attention, pretrained=False, progress=True, **kwargs)
kale.embed.video_se_res3d.se_r2plus1d_18_rgb(attention, pretrained=False, progress=True, **kwargs)
kale.embed.video_se_res3d.se_r2plus1d_18_flow(attention, pretrained=False, progress=True, **kwargs)
kale.embed.video_se_res3d.se_r3d(attention, rgb=False, flow=False, pretrained=False, progress=True)

Get R3D_18 models with SELayers for different inputs.

Parameters
  • attention (string) – the name of the SELayer.

  • rgb (bool) – choose if RGB model is needed. (Default: False)

  • flow (bool) – choose if optical flow model is needed. (Default: False)

  • pretrained (bool) – choose if pretrained parameters are used. (Default: False)

  • progress (bool, optional) – whether or not to display a progress bar to stderr. (Default: True)

Returns

A dictionary containing models for RGB and optical flow.

Return type

models (dictionary)

kale.embed.video_se_res3d.se_mc3(attention, rgb=False, flow=False, pretrained=False, progress=True)

Get MC3_18 models with SELayers for different inputs.

kale.embed.video_se_res3d.se_r2plus1d(attention, rgb=False, flow=False, pretrained=False, progress=True)

Get R2+1D_18 models with SELayers for different inputs.

Module contents