Embed

Submodules

kale.embed.attention_cnn module

class kale.embed.attention_cnn.ContextCNNGeneric(cnn: Module, cnn_output_shape: Tuple[int, int, int, int], contextualizer: Module, output_type: str)

Bases: Module

A template to construct a feature extractor consisting of a CNN followed by a sequence-to-sequence contextualizer like a Transformer-Encoder. Before inputting the CNN output tensor to the contextualizer, the tensor’s spatial dimensions are unrolled into a sequence.

Parameters
  • cnn – any convolutional neural network that takes in batches of images of shape (batch_size, channels, height, width) and outputs tensor representations of shape (batch_size, out_channels, out_height, out_width).

  • cnn_output_shape – A tuple of shape (batch_size, num_channels, height, width) describing the output shape of the given CNN (required).

  • contextualizer – A sequence-to-sequence model that takes inputs of shape (num_timesteps, batch_size, num_features) and uses attention to contextualize the sequence and returns a sequence of the exact same shape. This will mainly be a Transformer-Encoder (required).

  • output_type – One of ‘sequence’ or ‘spatial’. If ‘spatial’, the final output of the model, which is a sequence, will be reshaped to resemble the image-batch shape of the output of the CNN. If ‘sequence’, the output sequence is returned as is (required).

Examples

>>> cnn = nn.Sequential(nn.Conv2d(3, 32, kernel_size=3),
>>>                     nn.Conv2d(32, 64, kernel_size=3),
>>>                     nn.MaxPool2d(2))
>>> cnn_output_shape = (-1, 64, 8, 8)
>>> contextualizer = nn.TransformerEncoderLayer(...)
>>> output_type = 'spatial'
>>>
>>> attention_cnn = ContextCNNGeneric(cnn, cnn_output_shape, contextualizer, output_type)
>>> output = attention_cnn(torch.randn((32, 3, 20, 20)))
>>>
>>> output.size() == (32, 64, 8, 8) # True
forward(x: Tensor)

Pass the input through the cnn and then the contextualizer.

Parameters

x – input image batch exactly as for CNNs (required).
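The unrolling of the CNN output into a sequence can be illustrated with a small self-contained sketch (an illustration of the idea only, not the literal kale implementation; the layer sizes are arbitrary):

>>> import torch
>>> import torch.nn as nn
>>> cnn = nn.Conv2d(3, 64, kernel_size=3, padding=1)
>>> contextualizer = nn.TransformerEncoderLayer(d_model=64, nhead=4)
>>> features = cnn(torch.randn(2, 3, 8, 8))             # (batch, channels, height, width)
>>> b, c, h, w = features.shape
>>> seq = features.flatten(2).permute(2, 0, 1)          # unroll to (H*W, batch, channels)
>>> seq = contextualizer(seq)                           # same shape after self-attention
>>> spatial = seq.permute(1, 2, 0).reshape(b, c, h, w)  # 'spatial' output type
>>> spatial.shape
torch.Size([2, 64, 8, 8])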

training: bool
class kale.embed.attention_cnn.CNNTransformer(cnn: Module, cnn_output_shape: Tuple[int, int, int, int], num_layers: int, num_heads: int, dim_feedforward: int, dropout: float, output_type: str, positional_encoder: Optional[Module] = None)

Bases: ContextCNNGeneric

A feature extractor consisting of a given CNN backbone followed by a standard Transformer-Encoder. See documentation of “ContextCNNGeneric” for more information.

Parameters
  • cnn – any convolutional neural network that takes in batches of images of shape (batch_size, channels, height, width) and outputs tensor representations of shape (batch_size, out_channels, out_height, out_width) (required).

  • cnn_output_shape – a tuple of shape (batch_size, num_channels, height, width) describing the output shape of the given CNN (required).

  • num_layers – number of attention layers in the Transformer-Encoder (required).

  • num_heads – number of attention heads in each transformer block (required).

  • dim_feedforward – number of neurons in the intermediate dense layer of each transformer feedforward block (required).

  • dropout – dropout rate of the transformer layers (required).

  • output_type – one of ‘sequence’ or ‘spatial’. If ‘spatial’, the final output of the model, which is the sequence output of the Transformer-Encoder, will be reshaped to resemble the image-batch shape of the output of the CNN (required).

  • positional_encoder – None or a nn.Module that expects inputs of shape (sequence_length, batch_size, embedding_dim) and returns the same input after adding some positional information to the embeddings. If None, then the default and fixed sin-cos positional encodings of base transformers are applied (optional).

Examples

See pykale/examples/cifar_cnntransformer/model.py
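A hedged construction sketch (the hyperparameter values are illustrative and the CNN follows the ContextCNNGeneric example above; see the linked example file for a full configuration):

>>> import torch
>>> import torch.nn as nn
>>> from kale.embed.attention_cnn import CNNTransformer
>>> cnn = nn.Sequential(nn.Conv2d(3, 32, kernel_size=3),
>>>                     nn.Conv2d(32, 64, kernel_size=3),
>>>                     nn.MaxPool2d(2))
>>> model = CNNTransformer(cnn, cnn_output_shape=(-1, 64, 8, 8),
>>>                        num_layers=2, num_heads=4, dim_feedforward=256,
>>>                        dropout=0.1, output_type='spatial')
>>> output = model(torch.randn(32, 3, 20, 20))
>>> output.size() == (32, 64, 8, 8)  # True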

training: bool

kale.embed.factorization module

Python implementation of a tensor factorization algorithm, Multilinear Principal Component Analysis (MPCA), and a matrix factorization algorithm, Maximum Independence Domain Adaptation (MIDA).

class kale.embed.factorization.MPCA(var_ratio=0.97, max_iter=1, vectorize=False, n_components=None)

Bases: BaseEstimator, TransformerMixin

MPCA implementation compatible with scikit-learn

Parameters
  • var_ratio (float, optional) – Percentage of variance explained (between 0 and 1). Defaults to 0.97.

  • max_iter (int, optional) – Maximum number of iterations. Defaults to 1.

  • vectorize (bool) – Whether to return the transformed/projected tensor as a vector. Defaults to False.

  • n_components (int) – Number of components to keep. Applies only when vectorize=True. Defaults to None.

proj_mats

A list of transposed projection matrices, shapes (P_1, I_1), …, (P_N, I_N), where P_1, …, P_N are the dimensions of the output tensor for each sample.

Type

list of arrays

idx_order

The ordering index of projected (and vectorized) features in decreasing variance.

Type

array-like

mean_

Per-feature empirical mean, estimated from the training set, shape (I_1, I_2, …, I_N).

Type

array-like

shape_in

Input tensor shapes, i.e. (I_1, I_2, …, I_N).

Type

tuple

shape_out

Output tensor shapes, i.e. (P_1, P_2, …, P_N).

Type

tuple

Reference:

Haiping Lu, K.N. Plataniotis, and A.N. Venetsanopoulos, “MPCA: Multilinear Principal Component Analysis of Tensor Objects”, IEEE Transactions on Neural Networks, Vol. 19, No. 1, Page: 18-39, January 2008. For initial Matlab implementation, please go to https://uk.mathworks.com/matlabcentral/fileexchange/26168.

Examples

>>> import numpy as np
>>> from kale.embed.factorization import MPCA
>>> x = np.random.random((40, 20, 25, 20))
>>> x.shape
(40, 20, 25, 20)
>>> mpca = MPCA()
>>> x_projected = mpca.fit_transform(x)
>>> x_projected.shape
(40, 18, 23, 18)
>>> mpca.set_params(**{"vectorize": True})
>>> x_projected = mpca.transform(x)
>>> x_projected.shape
(40, 7452)
>>> mpca.set_params(**{"n_components": 50})
>>> x_projected = mpca.transform(x)
>>> x_projected.shape
(40, 50)
>>> x_rec = mpca.inverse_transform(x_projected)
>>> x_rec.shape
(40, 20, 25, 20)
fit(x, y=None)

Fit the model with input training data x.

Parameters
  • x (array-like tensor) – Input data, shape (n_samples, I_1, I_2, …, I_N), where n_samples is the number of samples and I_1, I_2, …, I_N are the dimensions of the corresponding modes (1, 2, …, N), respectively.

  • y (None) – Ignored variable.

Returns

self (object). Returns the instance itself.

transform(x)

Perform dimension reduction on x

Parameters

x (array-like tensor) – Data to perform dimension reduction, shape (n_samples, I_1, I_2, …, I_N).

Returns

Projected data in lower dimension, shape (n_samples, P_1, P_2, …, P_N) if self.vectorize==False. If self.vectorize==True, features will be sorted based on their explained variance ratio, shape (n_samples, P_1 * P_2 * … * P_N) if self.n_components is None, and shape (n_samples, n_components) if self.n_components is a valid integer.

Return type

array-like tensor

inverse_transform(x)

Reconstruct projected data to the original shape and add the estimated mean back

Parameters

x (array-like tensor) – Data to be reconstructed, shape (n_samples, P_1, P_2, …, P_N), if self.vectorize == False, where P_1, P_2, …, P_N are the reduced dimensions of corresponding mode (1, 2, …, N), respectively. If self.vectorize == True, shape (n_samples, self.n_components) or shape (n_samples, P_1 * P_2 * … * P_N).

Returns

Reconstructed tensor in original shape, shape (n_samples, I_1, I_2, …, I_N)

Return type

array-like tensor

class kale.embed.factorization.MIDA(n_components, kernel='linear', lambda_=1.0, mu=1.0, eta=1.0, augmentation=False, kernel_params=None)

Bases: BaseEstimator, TransformerMixin

Maximum independence domain adaptation.

Parameters
  • n_components (int) – Number of components to keep.

  • kernel (str) – “linear”, “rbf”, or “poly”. Kernel to use for MIDA. Defaults to “linear”.

  • mu (float) – Hyperparameter of the l2 penalty. Defaults to 1.0.

  • eta (float) – Hyperparameter of the label dependence. Defaults to 1.0.

  • augmentation (bool) – Whether to use covariates as augmented features. Defaults to False.

  • kernel_params (dict or None) – Parameters for the kernel. Defaults to None.

References

[1] Yan, K., Kou, L. and Zhang, D., 2018. Learning domain-invariant subspace using domain features and independence maximization. IEEE Transactions on Cybernetics, 48(1), pp. 288-299.

fit(x, y=None, covariates=None)
Parameters
  • x – array-like. Input data, shape (n_samples, n_features)

  • y – array-like. Labels, shape (nl_samples,)

  • covariates – array-like. Domain co-variates, shape (n_samples, n_co-variates)

Note

Unsupervised MIDA is performed if y is None. Semi-supervised MIDA is performed if y is not None.

fit_transform(x, y=None, covariates=None)
Parameters
  • x – array-like, shape (n_samples, n_features)

  • y – array-like, shape (n_samples,)

  • covariates – array-like, shape (n_samples, n_covariates)

Returns

array-like, shape (n_samples, n_components)

Return type

x_transformed

transform(x, covariates=None)
Parameters
  • x – array-like, shape (n_samples, n_features)

  • covariates – array-like, augmentation features, shape (n_samples, n_covariates)

Returns

array-like, shape (n_samples, n_components)

Return type

x_transformed
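A minimal usage sketch with synthetic data, assuming one-hot domain indicators are used as the domain covariates (the sizes below are illustrative):

>>> import numpy as np
>>> from kale.embed.factorization import MIDA
>>> xs = np.random.random((100, 20))          # source-domain samples
>>> xt = np.random.random((100, 20)) + 1.0    # target-domain samples with a mean shift
>>> x = np.concatenate([xs, xt])
>>> covariates = np.zeros((200, 2))           # one-hot domain indicator per sample
>>> covariates[:100, 0] = 1
>>> covariates[100:, 1] = 1
>>> mida = MIDA(n_components=5, kernel="linear")
>>> x_transformed = mida.fit_transform(x, covariates=covariates)
>>> x_transformed.shape
(200, 5)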

kale.embed.gcn module

class kale.embed.gcn.GCNEncoderLayer(in_channels, out_channels, improved=False, cached=False, bias=True, **kwargs)

Bases: MessagePassing

Modification of PyTorch Geometric’s nn.GCNConv, which reduces the computational cost of the GCN layer for the GripNet model. The graph convolutional operator from the “Semi-supervised Classification with Graph Convolutional Networks” (ICLR 2017) paper.

\[\mathbf{X}^{\prime} = \mathbf{\hat{D}}^{-1/2} \mathbf{\hat{A}} \mathbf{\hat{D}}^{-1/2} \mathbf{X} \mathbf{\Theta},\]

where \(\mathbf{\hat{A}} = \mathbf{A} + \mathbf{I}\) denotes the adjacency matrix with inserted self-loops and \(\hat{D}_{ii} = \sum_{j=0} \hat{A}_{ij}\) its diagonal degree matrix.

Note: For more information, please see PyTorch Geometric’s nn.GCNConv docs.

Parameters
  • in_channels (int) – Size of each input sample.

  • out_channels (int) – Size of each output sample.

  • improved (bool, optional) – If set to True, the layer computes \(\mathbf{\hat{A}}\) as \(\mathbf{A} + 2\mathbf{I}\). (default: False)

  • cached (bool, optional) – If set to True, the layer will cache the computation of \(\mathbf{\hat{D}}^{-1/2} \mathbf{\hat{A}} \mathbf{\hat{D}}^{-1/2}\) on first execution, and will use the cached version for further executions. This parameter should only be set to True in transductive learning scenarios. (default: False)

  • bias (bool, optional) – If set to False, the layer will not learn an additive bias. (default: True)

  • **kwargs (optional) – Additional arguments of torch_geometric.nn.conv.MessagePassing.

reset_parameters()
static norm(edge_index, num_nodes, edge_weight, improved=False, dtype=None)

Add self-loops and apply symmetric normalization

forward(x, edge_index, edge_weight=None)
Parameters
  • x (torch.Tensor) – The input node feature embedding.

  • edge_index (torch.Tensor) – Graph edge index in COO format with shape [2, num_edges].

  • edge_weight (torch.Tensor, optional) – The one-dimensional relation weight for each edge in edge_index (default: None).
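A minimal usage sketch on a toy graph (the node features and edges are illustrative):

>>> import torch
>>> from kale.embed.gcn import GCNEncoderLayer
>>> x = torch.randn(4, 16)                           # 4 nodes, 16 features each
>>> edge_index = torch.tensor([[0, 1, 1, 2, 2, 3],
>>>                            [1, 0, 2, 1, 3, 2]])  # undirected edges in COO format
>>> layer = GCNEncoderLayer(in_channels=16, out_channels=32)
>>> out = layer(x, edge_index)                       # expected shape (4, 32)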

class kale.embed.gcn.RGCNEncoderLayer(in_channels, out_channels, num_relations, num_bases, after_relu, bias=False, **kwargs)

Bases: MessagePassing

Modification of PyTorch Geometric’s nn.RGCNConv, which reduces the computational and memory cost of the RGCN encoder layer for the GripNet model. The relational graph convolutional operator from the “Modeling Relational Data with Graph Convolutional Networks” paper.

\[\mathbf{x}^{\prime}_i = \mathbf{\Theta}_{\textrm{root}} \cdot \mathbf{x}_i + \sum_{r \in \mathcal{R}} \sum_{j \in \mathcal{N}_r(i)} \frac{1}{|\mathcal{N}_r(i)|} \mathbf{\Theta}_r \cdot \mathbf{x}_j,\]

where \(\mathcal{R}\) denotes the set of relations, i.e. edge types. Edge type needs to be a one-dimensional torch.long tensor which stores a relation identifier \(\in \{ 0, \ldots, |\mathcal{R}| - 1\}\) for each edge.

Note: For more information, please see PyTorch Geometric’s nn.RGCNConv docs.

Parameters
  • in_channels (int) – Size of each input sample.

  • out_channels (int) – Size of each output sample.

  • num_relations (int) – Number of edge relations.

  • num_bases (int) – Use the basis-decomposition regularization scheme, where num_bases denotes the number of bases.

  • after_relu (bool) – Whether the input embedding has been activated by a ReLU function or not.

  • bias (bool) – If set to False, the layer will not learn an additive bias. (default: False)

  • **kwargs (optional) – Additional arguments of torch_geometric.nn.conv.MessagePassing.

reset_parameters()
forward(x, edge_index, edge_type, range_list)
Parameters
  • x (torch.Tensor) – The input node feature embedding.

  • edge_index (torch.Tensor) – Graph edge index in COO format with shape [2, num_edges].

  • edge_type (torch.Tensor) – The one-dimensional relation type/index for each edge in edge_index.

  • range_list (torch.Tensor) – The index range list of each edge type with shape [num_types, 2].
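A minimal usage sketch on a toy graph with two relation types; it assumes that edges are grouped by relation type in edge_index and that range_list stores the [start, end) index range of each type, following the GripNet usage:

>>> import torch
>>> from kale.embed.gcn import RGCNEncoderLayer
>>> x = torch.randn(4, 16)
>>> edge_index = torch.tensor([[0, 1, 2, 3],
>>>                            [1, 0, 3, 2]])
>>> edge_type = torch.tensor([0, 0, 1, 1])
>>> range_list = torch.tensor([[0, 2], [2, 4]])      # assumed [start, end) per relation type
>>> layer = RGCNEncoderLayer(in_channels=16, out_channels=32, num_relations=2,
>>>                          num_bases=2, after_relu=False)
>>> out = layer(x, edge_index, edge_type, range_list)  # expected shape (4, 32)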

kale.embed.gripnet module

kale.embed.image_cnn module

CNNs for extracting features from small images of size 32x32 (e.g. MNIST) and regular images of size 224x224 (e.g. ImageNet). The code is based on https://github.com/criteo-research/pytorch-ada/blob/master/adalib/ada/models/modules.py, which is for domain adaptation.

class kale.embed.image_cnn.SmallCNNFeature(num_channels=3, kernel_size=5)

Bases: Module

A feature extractor for small 32x32 images (e.g. CIFAR, MNIST) that outputs a feature vector of length 128.

Parameters
  • num_channels – the number of input channels (default=3).

  • kernel_size – the size of the convolution kernel (default=5).

Examples::
>>> feature_network = SmallCNNFeature(num_channels=3)
forward(input_)
output_size()
training: bool
class kale.embed.image_cnn.ResNet18Feature(pretrained=True)

Bases: Module

Modified ResNet18 (without the last layer) feature extractor for regular 224x224 images.

Parameters

pretrained (bool) – If True, returns a model pre-trained on ImageNet

forward(x)
output_size()
training: bool
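A minimal usage sketch (pass pretrained=True to load ImageNet weights; the length of the output feature vector is given by output_size()):

>>> import torch
>>> from kale.embed.image_cnn import ResNet18Feature
>>> feature_network = ResNet18Feature(pretrained=False)
>>> images = torch.randn(8, 3, 224, 224)
>>> features = feature_network(images)   # expected shape (8, feature_network.output_size())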
class kale.embed.image_cnn.ResNet34Feature(pretrained=True)

Bases: Module

Modified ResNet34 (without the last layer) feature extractor for regular 224x224 images.

Parameters

pretrained (bool) – If True, returns a model pre-trained on ImageNet

forward(x)
output_size()
training: bool
class kale.embed.image_cnn.ResNet50Feature(pretrained=True)

Bases: Module

Modified ResNet50 (without the last layer) feature extractor for regular 224x224 images.

Parameters

pretrained (bool) – If True, returns a model pre-trained on ImageNet

forward(x)
output_size()
training: bool
class kale.embed.image_cnn.ResNet101Feature(pretrained=True)

Bases: Module

Modified ResNet101 (without the last layer) feature extractor for regular 224x224 images.

Parameters

pretrained (bool) – If True, returns a model pre-trained on ImageNet

forward(x)
output_size()
training: bool
class kale.embed.image_cnn.ResNet152Feature(pretrained=True)

Bases: Module

Modified ResNet152 (without the last layer) feature extractor for regular 224x224 images.

Parameters

pretrained (bool) – If True, returns a model pre-trained on ImageNet

forward(x)
output_size()
training: bool

kale.embed.positional_encoding module

class kale.embed.positional_encoding.PositionalEncoding(d_model: int, max_len: int = 5000)

Bases: Module

Implements the positional encoding as described in the NIPS2017 paper ‘Attention Is All You Need’ about Transformers (https://arxiv.org/abs/1706.03762). Essentially, for all timesteps in a given sequence, adds information about the relative temporal location of a timestep directly into the features of that timestep, and then returns this slightly-modified, same-shape sequence.

Parameters
  • d_model – the number of features that each timestep has (required).

  • max_len – the maximum sequence length that the positional encodings should support (required).

forward(x)

Expects input of shape (sequence_length, batch_size, num_features) and returns output of the same shape. sequence_length may be at most self.max_len, and num_features is expected to be exactly self.d_model.

Parameters

x – a sequence input of shape (sequence_length, batch_size, num_features) (required).

training: bool
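A minimal usage sketch:

>>> import torch
>>> from kale.embed.positional_encoding import PositionalEncoding
>>> pos_encoder = PositionalEncoding(d_model=64, max_len=100)
>>> x = torch.zeros(20, 8, 64)           # (sequence_length, batch_size, num_features)
>>> out = pos_encoder(x)                 # same shape, with positional information added
>>> out.shape
torch.Size([20, 8, 64])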

kale.embed.seq_nn module

DeepDTA based models for drug-target interaction prediction problem.

class kale.embed.seq_nn.CNNEncoder(num_embeddings, embedding_dim, sequence_length, num_kernels, kernel_length)

Bases: Module

The DeepDTA CNN encoder module, which comprises three 1D-convolutional layers and one max-pooling layer. The module is applied to encode drug/target sequence information, and the input should be sequences transformed with integer/label encoding. The original paper is “DeepDTA: deep drug–target binding affinity prediction” .

Parameters
  • num_embeddings (int) – Number of embedding labels/categories, depends on the types of encoding sequence.

  • embedding_dim (int) – Dimension of embedding labels/categories.

  • sequence_length (int) – Max length of input sequence.

  • num_kernels (int) – Number of kernels (filters).

  • kernel_length (int) – Length of kernel (filter).

forward(x)
training: bool
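A minimal encoding sketch; the hyperparameter values below are illustrative rather than DeepDTA's defaults, and the input is assumed to be a batch of label-encoded sequences padded to sequence_length:

>>> import torch
>>> from kale.embed.seq_nn import CNNEncoder
>>> encoder = CNNEncoder(num_embeddings=64, embedding_dim=128, sequence_length=85,
>>>                      num_kernels=32, kernel_length=8)
>>> tokens = torch.randint(0, 64, (16, 85))   # batch of label-encoded sequences
>>> features = encoder(tokens)                # pooled representation per sequence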
class kale.embed.seq_nn.GCNEncoder(in_channel=78, out_channel=128, dropout_rate=0.2)

Bases: Module

The GraphDTA GCN encoder module, which comprises three graph convolutional layers and one fully connected layer. The model is a variant of DeepDTA and is applied to encode drug molecule graph information. The original paper is “GraphDTA: Predicting drug–target binding affinity with graph neural networks” .

Parameters
  • in_channel (int) – Dimension of each input node feature.

  • out_channel (int) – Dimension of each output node feature.

  • dropout_rate (float) – dropout rate during training.

forward(x, edge_index, batch)
training: bool
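A minimal usage sketch on a toy molecular graph (the values are illustrative); batch assigns every node to graph 0, as for a single-graph batch in torch_geometric:

>>> import torch
>>> from kale.embed.seq_nn import GCNEncoder
>>> encoder = GCNEncoder(in_channel=78, out_channel=128, dropout_rate=0.2)
>>> x = torch.randn(6, 78)                    # node features
>>> edge_index = torch.tensor([[0, 1, 2, 3, 4, 5],
>>>                            [1, 2, 3, 4, 5, 0]])
>>> batch = torch.zeros(6, dtype=torch.long)  # all nodes belong to graph 0
>>> graph_embedding = encoder(x, edge_index, batch)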

kale.embed.video_feature_extractor module

Define the feature extractors for video, including I3D, R3D_18, MC3_18, and R2PLUS1D_18, with or without SELayers.

kale.embed.video_feature_extractor.get_video_feat_extractor(model_name, image_modality, attention, num_classes)

Get the feature extractor, with or without the pre-trained model and SELayers. The pre-trained models are saved under $XDG_CACHE_HOME/torch/hub/checkpoints/. On Linux, the default path is ~/.cache/torch/hub/checkpoints/; on Windows, it is C:/Users/$USER_NAME/.cache/torch/hub/checkpoints/. Four pre-trained models are provided: “rgb_imagenet”, “flow_imagenet”, “rgb_charades”, “flow_charades”.

Parameters
  • model_name (string) – The name of the feature extractor. (Choices=[“I3D”, “R3D_18”, “R2PLUS1D_18”, “MC3_18”])

  • image_modality (string) – Image type. (Choices=[“rgb”, “flow”, “joint”])

  • attention (string) – The attention type. (Choices=[“SELayerC”, “SELayerT”, “SELayerCoC”, “SELayerMC”, “SELayerCT”, “SELayerTC”, “SELayerMAC”])

  • num_classes (int) – The class number of specific dataset. (Default: No use)

Returns
  • feature_network (dictionary) – The network to extract features.

  • class_feature_dim (int) – The dimension of the feature network output for ClassNet. It is a convention when the input dimension and the network are fixed.

  • domain_feature_dim (int) – The dimension of the feature network output for DomainNet.
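A hedged call sketch, assuming the function returns the feature network together with the two feature dimensions listed above (the num_classes value is illustrative):

>>> from kale.embed.video_feature_extractor import get_video_feat_extractor
>>> feat_network, class_feature_dim, domain_feature_dim = get_video_feat_extractor(
>>>     model_name="I3D", image_modality="rgb", attention="SELayerC", num_classes=8)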

kale.embed.video_i3d module

Define Inflated 3D ConvNets (I3D) for action recognition, from https://ieeexplore.ieee.org/document/8099985. Created by Xianyuan Liu by modifying https://github.com/piergiaj/pytorch-i3d/blob/master/pytorch_i3d.py and https://github.com/deepmind/kinetics-i3d/blob/master/i3d.py

class kale.embed.video_i3d.MaxPool3dSamePadding(kernel_size: Union[int, Tuple[int, ...]], stride: Optional[Union[int, Tuple[int, ...]]] = None, padding: Union[int, Tuple[int, ...]] = 0, dilation: Union[int, Tuple[int, ...]] = 1, return_indices: bool = False, ceil_mode: bool = False)

Bases: MaxPool3d

Construct a 3D max-pool layer with ‘same’ padding, which PyTorch does not provide natively. ‘Same’ padding means the output size matches the input size for stride=1.

compute_pad(dim, s)

Get the zero padding number.

forward(x)

Compute ‘same’ padding. Add zero to the back position first.

kernel_size: Union[int, Tuple[int, int, int]]
stride: Union[int, Tuple[int, int, int]]
padding: Union[int, Tuple[int, int, int]]
dilation: Union[int, Tuple[int, int, int]]
class kale.embed.video_i3d.Unit3D(in_channels, output_channels, kernel_shape=(1, 1, 1), stride=(1, 1, 1), padding=0, activation_fn=<function relu>, use_batch_norm=True, use_bias=False, name='unit_3d')

Bases: Module

Basic unit containing Conv3D + BatchNorm + non-linearity.

compute_pad(dim, s)

Get the zero padding number.

forward(x)

Connects the module to inputs, padding dynamically based on the input size in the forward function.

Parameters

x – Inputs to the Unit3D component.

Returns

Outputs from the module.

training: bool
class kale.embed.video_i3d.InceptionModule(in_channels, out_channels, name)

Bases: Module

Construct Inception module. Concatenation after four branches (1x1x1 conv; 1x1x1 + 3x3x3 convs; 1x1x1 + 3x3x3 convs; 3x3x3 max-pool + 1x1x1 conv). In forward, we check if SELayers are used, which are channel-wise (SELayerC), temporal-wise (SELayerT), channel-temporal-wise (SELayerTC & SELayerCT).

forward(x)
training: bool
class kale.embed.video_i3d.InceptionI3d(num_classes=400, spatial_squeeze=True, final_endpoint='Logits', name='inception_i3d', in_channels=3, dropout_keep_prob=0.5)

Bases: Module

Inception-v1 I3D architecture. The model is introduced in:

Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset Joao Carreira, Andrew Zisserman https://arxiv.org/pdf/1705.07750v1.pdf.

See also the Inception architecture, introduced in:

Going deeper with convolutions Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, Andrew Rabinovich. http://arxiv.org/pdf/1409.4842v1.pdf.

VALID_ENDPOINTS = ('Conv3d_1a_7x7', 'MaxPool3d_2a_3x3', 'Conv3d_2b_1x1', 'Conv3d_2c_3x3', 'MaxPool3d_3a_3x3', 'Mixed_3b', 'Mixed_3c', 'MaxPool3d_4a_3x3', 'Mixed_4b', 'Mixed_4c', 'Mixed_4d', 'Mixed_4e', 'Mixed_4f', 'MaxPool3d_5a_2x2', 'Mixed_5b', 'Mixed_5c', 'Logits', 'Predictions')
replace_logits(num_classes)

Update the output size with num_classes according to the specific setting.

build()
forward(x)

The output is the result of the final average pooling layer with 1024 dimensions.

extract_features(x)
training: bool
kale.embed.video_i3d.i3d(name, num_channels, num_classes, pretrained=False, progress=True)

Get the InceptionI3d module, with or without a pretrained model.

kale.embed.video_i3d.i3d_joint(rgb_pt, flow_pt, num_classes, pretrained=False, progress=True)

Get I3D models for different inputs.

Parameters
  • rgb_pt (string, optional) – the name of pre-trained model for RGB input.

  • flow_pt (string, optional) – the name of pre-trained model for flow input.

  • num_classes (int) – the class number of dataset.

  • pretrained (bool) – choose if pretrained parameters are used. (Default: False)

  • progress (bool, optional) – whether or not to display a progress bar to stderr. (Default: True)

Returns

A dictionary containing the RGB and flow models.

Return type

models (dictionary)
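A hedged usage sketch; the pre-trained model names follow the choices listed under get_video_feat_extractor, and the dictionary keys are assumed to be "rgb" and "flow":

>>> from kale.embed.video_i3d import i3d_joint
>>> models = i3d_joint(rgb_pt="rgb_imagenet", flow_pt="flow_imagenet",
>>>                    num_classes=10, pretrained=False)
>>> rgb_model, flow_model = models["rgb"], models["flow"]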

kale.embed.video_res3d module

Define MC3_18, R3D_18, and R2plus1D_18 for action recognition, from https://arxiv.org/abs/1711.11248. Created by Xianyuan Liu by modifying https://github.com/pytorch/vision/blob/master/torchvision/models/video/resnet.py

class kale.embed.video_res3d.Conv3DSimple(in_planes, out_planes, midplanes=None, stride=1, padding=1)

Bases: Conv3d

3D convolutions for R3D (3x3x3 kernel)

static get_downsample_stride(stride)
bias: Optional[Tensor]
out_channels: int
kernel_size: Tuple[int, ...]
stride: Tuple[int, ...]
padding: Union[str, Tuple[int, ...]]
dilation: Tuple[int, ...]
transposed: bool
output_padding: Tuple[int, ...]
groups: int
padding_mode: str
weight: Tensor
class kale.embed.video_res3d.Conv2Plus1D(in_planes, out_planes, midplanes, stride=1, padding=1)

Bases: Sequential

(2+1)D convolutions for R2plus1D (1x3x3 kernel + 3x1x1 kernel)

static get_downsample_stride(stride)
training: bool
class kale.embed.video_res3d.Conv3DNoTemporal(in_planes, out_planes, midplanes=None, stride=1, padding=1)

Bases: Conv3d

3D convolutions without temporal dimension for MCx (1x3x3 kernel)

static get_downsample_stride(stride)
bias: Optional[Tensor]
out_channels: int
kernel_size: Tuple[int, ...]
stride: Tuple[int, ...]
padding: Union[str, Tuple[int, ...]]
dilation: Tuple[int, ...]
transposed: bool
output_padding: Tuple[int, ...]
groups: int
padding_mode: str
weight: Tensor
class kale.embed.video_res3d.BasicBlock(inplanes, planes, conv_builder, stride=1, downsample=None)

Bases: Module

Basic ResNet building block. Each block consists of two convolutional layers with a ReLU activation function after each layer and residual connections. In forward, we check if SELayers are used, which are channel-wise (SELayerC) and temporal-wise (SELayerT).

expansion = 1
forward(x)
training: bool
class kale.embed.video_res3d.Bottleneck(inplanes, planes, conv_builder, stride=1, downsample=None)

Bases: Module

Bottleneck building block (not used by default). Each block consists of two 1*n*n and one n*n*n convolutional layers, with a ReLU activation function after each layer and residual connections.

expansion = 4
forward(x)
training: bool
class kale.embed.video_res3d.BasicStem

Bases: Sequential

The default conv-batchnorm-relu stem, normally the first layer of the network (64 3x7x7 kernels).

training: bool
class kale.embed.video_res3d.BasicFLowStem

Bases: Sequential

The default stem for optical flow.

training: bool
class kale.embed.video_res3d.R2Plus1dStem

Bases: Sequential

The R(2+1)D stem is different from the default one as it uses separated 3D convolutions. (45 1x7x7 kernels + 64 3x1x1 kernels)

training: bool
class kale.embed.video_res3d.R2Plus1dFlowStem

Bases: Sequential

R(2+1)D stem for optical flow.

training: bool
class kale.embed.video_res3d.VideoResNet(block, conv_makers, layers, stem, num_classes=400, zero_init_residual=False)

Bases: Module

replace_fc(num_classes, block=<class 'kale.embed.video_res3d.BasicBlock'>)

Update the output size with num_classes according to the specific setting.

forward(x)
training: bool
kale.embed.video_res3d.r3d_18_rgb(pretrained=False, progress=True, **kwargs)

Construct 18 layer Resnet3D model for RGB as in https://arxiv.org/abs/1711.11248

Parameters
  • pretrained (bool) – If True, returns a model pre-trained on Kinetics-400

  • progress (bool) – If True, displays a progress bar of the download to stderr

Returns

R3D-18 network

Return type

nn.Module

kale.embed.video_res3d.r3d_18_flow(pretrained=False, progress=True, **kwargs)

Construct 18 layer Resnet3D model for optical flow.

kale.embed.video_res3d.mc3_18_rgb(pretrained=False, progress=True, **kwargs)

Constructor for 18 layer Mixed Convolution network for RGB as in https://arxiv.org/abs/1711.11248

Parameters
  • pretrained (bool) – If True, returns a model pre-trained on Kinetics-400

  • progress (bool) – If True, displays a progress bar of the download to stderr

Returns

MC3 Network definition

Return type

nn.Module

kale.embed.video_res3d.mc3_18_flow(pretrained=False, progress=True, **kwargs)

Constructor for 18 layer Mixed Convolution network for optical flow.

kale.embed.video_res3d.r2plus1d_18_rgb(pretrained=False, progress=True, **kwargs)

Constructor for the 18 layer deep R(2+1)D network for RGB as in https://arxiv.org/abs/1711.11248

Parameters
  • pretrained (bool) – If True, returns a model pre-trained on Kinetics-400

  • progress (bool) – If True, displays a progress bar of the download to stderr

Returns

R(2+1)D-18 network

Return type

nn.Module

kale.embed.video_res3d.r2plus1d_18_flow(pretrained=False, progress=True, **kwargs)

Constructor for the 18 layer deep R(2+1)D network for optical flow.

kale.embed.video_res3d.r3d(rgb=False, flow=False, pretrained=False, progress=True)

Get R3D_18 models.

kale.embed.video_res3d.mc3(rgb=False, flow=False, pretrained=False, progress=True)

Get MC3_18 models.

kale.embed.video_res3d.r2plus1d(rgb=False, flow=False, pretrained=False, progress=True)

Get R2PLUS1D_18 models.

kale.embed.video_selayer module

Python implementation of Squeeze-and-Excitation Layers (SELayer).

Initial implementation: channel-wise (SELayerC). Improved implementations: temporal-wise (SELayerT), convolution-based channel-wise (SELayerCoC), max-pooling-based channel-wise (SELayerMC), multi-pooling-based channel-wise (SELayerMAC).

[Redundant and repeated code will be reduced in the future.]

References

Jie Hu, Li Shen, and Gang Sun. “Squeeze-and-Excitation Networks.” In CVPR, pp. 7132-7141, 2018. For the initial implementation, please go to https://github.com/hujie-frank/SENet

kale.embed.video_selayer.get_selayer(attention)

Get the SELayer corresponding to the given attention name.

Parameters

attention (string) – the name of the SELayer. (Options: [“SELayerC”, “SELayerT”, “SELayerCoC”, “SELayerMC”, “SELayerMAC”])

Returns

the SELayer.

Return type

se_layer (SELayer, optional)

class kale.embed.video_selayer.SELayer(channel, reduction=16)

Bases: Module

Helper class for SELayer design.

forward(x)
training: bool
class kale.embed.video_selayer.SELayerC(channel, reduction=16)

Bases: SELayer

Construct channel-wise SELayer.

forward(x)
training: bool
class kale.embed.video_selayer.SELayerT(channel, reduction=2)

Bases: SELayer

Construct temporal-wise SELayer.

forward(x)
training: bool
class kale.embed.video_selayer.SELayerCoC(channel, reduction=16)

Bases: SELayer

Construct convolution-based channel-wise SELayer.

forward(x)
training: bool
class kale.embed.video_selayer.SELayerMC(channel, reduction=16)

Bases: SELayer

Construct channel-wise SELayer with max pooling.

forward(x)
training: bool
class kale.embed.video_selayer.SELayerMAC(channel, reduction=16)

Bases: SELayer

Construct channel-wise SELayer with the mix of average pooling and max pooling.

forward(x)
training: bool

kale.embed.video_se_i3d module

Add SELayers to I3D

class kale.embed.video_se_i3d.SEInceptionI3DRGB(num_channels, num_classes, attention)

Bases: Module

Add several SELayers to I3D for RGB input.

Parameters
  • num_channels (int) – the channel number of the input.

  • num_classes (int) – the class number of the dataset.

  • attention (string) – the name of the SELayer. (Options: [“SELayerC”, “SELayerT”, “SELayerCoC”, “SELayerMC”, “SELayerMAC”, “SELayerCT”, “SELayerTC”])

Returns

I3D model with SELayers.

Return type

model (VideoResNet)

forward(x)
training: bool
class kale.embed.video_se_i3d.SEInceptionI3DFlow(num_channels, num_classes, attention)

Bases: Module

Add several SELayers to I3D for optical flow input.

forward(x)
training: bool
kale.embed.video_se_i3d.se_inception_i3d(name, num_channels, num_classes, attention, pretrained=False, progress=True, rgb=True)

Get the InceptionI3d module, with or without SELayers and a pretrained model.

kale.embed.video_se_i3d.se_i3d_joint(rgb_pt, flow_pt, num_classes, attention, pretrained=False, progress=True)

Get I3D models with SELayers for different inputs.

Parameters
  • rgb_pt (string, optional) – the name of pre-trained model for RGB input.

  • flow_pt (string, optional) – the name of pre-trained model for optical flow input.

  • num_classes (int) – the class number of dataset.

  • attention (string, optional) – the name of the SELayer.

  • pretrained (bool) – choose if pretrained parameters are used. (Default: False)

  • progress (bool, optional) – whether or not to display a progress bar to stderr. (Default: True)

Returns

A dictionary containing models for RGB and optical flow.

Return type

models (dictionary)

kale.embed.video_se_res3d module

Add SELayers to MC3_18, R3D_18, R2plus1D_18

kale.embed.video_se_res3d.se_r3d_18_rgb(attention, pretrained=False, progress=True, **kwargs)
kale.embed.video_se_res3d.se_r3d_18_flow(attention, pretrained=False, progress=True, **kwargs)
kale.embed.video_se_res3d.se_mc3_18_rgb(attention, pretrained=False, progress=True, **kwargs)
kale.embed.video_se_res3d.se_mc3_18_flow(attention, pretrained=False, progress=True, **kwargs)
kale.embed.video_se_res3d.se_r2plus1d_18_rgb(attention, pretrained=False, progress=True, **kwargs)
kale.embed.video_se_res3d.se_r2plus1d_18_flow(attention, pretrained=False, progress=True, **kwargs)
kale.embed.video_se_res3d.se_r3d(attention, rgb=False, flow=False, pretrained=False, progress=True)

Get R3D_18 models with SELayers for different inputs.

Parameters
  • attention (string) – the name of the SELayer.

  • rgb (bool) – choose if RGB model is needed. (Default: False)

  • flow (bool) – choose if optical flow model is needed. (Default: False)

  • pretrained (bool) – choose if pretrained parameters are used. (Default: False)

  • progress (bool, optional) – whether or not to display a progress bar to stderr. (Default: True)

Returns

A dictionary containing models for RGB and optical flow.

Return type

models (dictionary)

kale.embed.video_se_res3d.se_mc3(attention, rgb=False, flow=False, pretrained=False, progress=True)

Get MC3_18 models with SELayers for different inputs.

kale.embed.video_se_res3d.se_r2plus1d(attention, rgb=False, flow=False, pretrained=False, progress=True)

Get R2+1D_18 models with SELayers for different inputs.

Module contents