Embed
Submodules
kale.embed.attention_cnn module
- class kale.embed.attention_cnn.ContextCNNGeneric(cnn: Module, cnn_output_shape: Tuple[int, int, int, int], contextualizer: Module, output_type: str)
Bases:
Module
A template to construct a feature extractor consisting of a CNN followed by a sequence-to-sequence contextualizer like a Transformer-Encoder. Before inputting the CNN output tensor to the contextualizer, the tensor’s spatial dimensions are unrolled into a sequence.
- Parameters
cnn – any convolutional neural network that takes in batches of images of shape (batch_size, channels, height, width) and outputs tensor representations of shape (batch_size, out_channels, out_height, out_width).
cnn_output_shape – A tuple of shape (batch_size, num_channels, height, width) describing the output shape of the given CNN (required).
contextualizer – A sequence-to-sequence model that takes inputs of shape (num_timesteps, batch_size, num_features) and uses attention to contextualize the sequence and returns a sequence of the exact same shape. This will mainly be a Transformer-Encoder (required).
output_type – One of ‘sequence’ or ‘spatial’. If ‘spatial’, the final output of the model, which is a sequence, will be reshaped to resemble the image-batch shape of the CNN’s output. If ‘sequence’, the output sequence is returned as-is (required).
Examples
>>> cnn = nn.Sequential(nn.Conv2d(3, 32, kernel_size=3),
>>>                     nn.Conv2d(32, 64, kernel_size=3),
>>>                     nn.MaxPool2d(2))
>>> cnn_output_shape = (-1, 64, 8, 8)
>>> contextualizer = nn.TransformerEncoderLayer(...)
>>> output_type = 'spatial'
>>>
>>> attention_cnn = ContextCNNGeneric(cnn, cnn_output_shape, contextualizer, output_type)
>>> output = attention_cnn(torch.randn((32, 3, 16, 16)))
>>>
>>> output.size() == cnn_output_shape  # True
- forward(x: Tensor)
Pass the input through the cnn and then the contextualizer.
- Parameters
x – input image batch exactly as for CNNs (required).
- training: bool
- class kale.embed.attention_cnn.CNNTransformer(cnn: Module, cnn_output_shape: Tuple[int, int, int, int], num_layers: int, num_heads: int, dim_feedforward: int, dropout: float, output_type: str, positional_encoder: Optional[Module] = None)
Bases:
ContextCNNGeneric
A feature extractor consisting of a given CNN backbone followed by a standard Transformer-Encoder. See documentation of “ContextCNNGeneric” for more information.
- Parameters
cnn – any convolutional neural network that takes in batches of images of shape (batch_size, channels, height, width) and outputs tensor representations of shape (batch_size, out_channels, out_height, out_width) (required).
cnn_output_shape – a tuple of shape (batch_size, num_channels, height, width) describing the output shape of the given CNN (required).
num_layers – number of attention layers in the Transformer-Encoder (required).
num_heads – number of attention heads in each transformer block (required).
dim_feedforward – number of neurons in the intermediate dense layer of each transformer feedforward block (required).
dropout – dropout rate of the transformer layers (required).
output_type – one of ‘sequence’ or ‘spatial’. If ‘spatial’, the final output of the model, which is the sequence output of the Transformer-Encoder, will be reshaped to resemble the image-batch shape of the CNN’s output (required).
positional_encoder – None or a nn.Module that expects inputs of shape (sequence_length, batch_size, embedding_dim) and returns the same input after adding some positional information to the embeddings. If None, then the default and fixed sin-cos positional encodings of base transformers are applied (optional).
Examples
See pykale/examples/cifar_cnntransformer/model.py
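Below is a minimal, hypothetical usage sketch; the backbone, shapes and hyperparameters are illustrative and are not taken from that example:
>>> import torch
>>> import torch.nn as nn
>>> from kale.embed.attention_cnn import CNNTransformer
>>>
>>> # hypothetical backbone; cnn_output_shape must match what the backbone actually produces
>>> cnn = nn.Sequential(nn.Conv2d(3, 64, kernel_size=3, padding=1), nn.MaxPool2d(2))
>>> model = CNNTransformer(cnn, cnn_output_shape=(-1, 64, 16, 16), num_layers=2, num_heads=4,
>>>                        dim_feedforward=256, dropout=0.1, output_type='spatial')
>>> output = model(torch.randn(8, 3, 32, 32))
>>> output.size()  # expected: torch.Size([8, 64, 16, 16])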
- training: bool
kale.embed.factorization module
Python implementation of a tensor factorization algorithm, Multilinear Principal Component Analysis (MPCA), and a matrix factorization algorithm, Maximum Independence Domain Adaptation (MIDA).
- class kale.embed.factorization.MPCA(var_ratio=0.97, max_iter=1, vectorize=False, n_components=None)
Bases:
BaseEstimator
,TransformerMixin
MPCA implementation compatible with scikit-learn
- Parameters
var_ratio (float, optional) – Percentage of variance explained (between 0 and 1). Defaults to 0.97.
max_iter (int, optional) – Maximum number of iterations. Defaults to 1.
vectorize (bool) – Whether to return the transformed/projected tensor as a vector. Defaults to False.
n_components (int) – Number of components to keep. Applies only when vectorize=True. Defaults to None.
- proj_mats
A list of transposed projection matrices, shapes (P_1, I_1), …, (P_N, I_N), where P_1, …, P_N are the output tensor shapes for each sample.
- Type
list of arrays
- idx_order
The ordering index of projected (and vectorized) features in decreasing variance.
- Type
array-like
- mean_
Per-feature empirical mean, estimated from the training set, shape (I_1, I_2, …, I_N).
- Type
array-like
- shape_in
Input tensor shapes, i.e. (I_1, I_2, …, I_N).
- Type
tuple
- shape_out
Output tensor shapes, i.e. (P_1, P_2, …, P_N).
- Type
tuple
References
Haiping Lu, K.N. Plataniotis, and A.N. Venetsanopoulos, “MPCA: Multilinear Principal Component Analysis of Tensor Objects”, IEEE Transactions on Neural Networks, Vol. 19, No. 1, Page: 18-39, January 2008. For initial Matlab implementation, please go to https://uk.mathworks.com/matlabcentral/fileexchange/26168.
Examples
>>> import numpy as np
>>> from kale.embed.factorization import MPCA
>>> x = np.random.random((40, 20, 25, 20))
>>> x.shape
(40, 20, 25, 20)
>>> mpca = MPCA()
>>> x_projected = mpca.fit_transform(x)
>>> x_projected.shape
(40, 18, 23, 18)
>>> mpca.set_params(**{"vectorize": True})
>>> x_projected = mpca.transform(x)
>>> x_projected.shape
(40, 7452)
>>> mpca.set_params(**{"n_components": 50})
>>> x_projected = mpca.transform(x)
>>> x_projected.shape
(40, 50)
>>> x_rec = mpca.inverse_transform(x_projected)
>>> x_rec.shape
(40, 20, 25, 20)
- fit(x, y=None)
Fit the model with input training data x.
- Parameters
x (array-like tensor) – Input data, shape (n_samples, I_1, I_2, …, I_N), where n_samples is the number of samples and I_1, I_2, …, I_N are the dimensions of the corresponding modes (1, 2, …, N), respectively.
y (None) – Ignored variable.
- Returns
self (object). Returns the instance itself.
- transform(x)
Perform dimension reduction on x
- Parameters
x (array-like tensor) – Data to perform dimension reduction, shape (n_samples, I_1, I_2, …, I_N).
- Returns
Projected data in lower dimension, shape (n_samples, P_1, P_2, …, P_N) if self.vectorize==False. If self.vectorize==True, features will be sorted based on their explained variance ratio, with shape (n_samples, P_1 * P_2 * … * P_N) if self.n_components is None, and shape (n_samples, n_components) if self.n_components is a valid integer.
- Return type
array-like tensor
- inverse_transform(x)
Reconstruct projected data to the original shape and add the estimated mean back
- Parameters
x (array-like tensor) – Data to be reconstructed, shape (n_samples, P_1, P_2, …, P_N), if self.vectorize == False, where P_1, P_2, …, P_N are the reduced dimensions of corresponding mode (1, 2, …, N), respectively. If self.vectorize == True, shape (n_samples, self.n_components) or shape (n_samples, P_1 * P_2 * … * P_N).
- Returns
Reconstructed tensor in original shape, shape (n_samples, I_1, I_2, …, I_N)
- Return type
array-like tensor
- class kale.embed.factorization.MIDA(n_components, kernel='linear', lambda_=1.0, mu=1.0, eta=1.0, augmentation=False, kernel_params=None)
Bases:
BaseEstimator
,TransformerMixin
Maximum independence domain adaptation.
- Parameters
n_components (int) – Number of components to keep.
kernel (str) – “linear”, “rbf”, or “poly”. Kernel to use for MIDA. Defaults to “linear”.
mu (float) – Hyperparameter of the l2 penalty. Defaults to 1.0.
eta (float) – Hyperparameter of the label dependence. Defaults to 1.0.
augmentation (bool) – Whether to use covariates as augmentation features. Defaults to False.
kernel_params (dict or None) – Parameters for the kernel. Defaults to None.
References
- [1] Yan, K., Kou, L. and Zhang, D., 2018. Learning domain-invariant subspace using domain features and
independence maximization. IEEE transactions on cybernetics, 48(1), pp.288-299.
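Examples
A minimal, hypothetical sketch of unsupervised MIDA with one-hot domain covariates; the toy data, shapes and expected output below are illustrative:
>>> import numpy as np
>>> from kale.embed.factorization import MIDA
>>>
>>> x = np.random.random((100, 20))                # samples from two domains stacked together
>>> covariates = np.repeat(np.eye(2), 50, axis=0)  # one-hot domain indicator, shape (100, 2)
>>> mida = MIDA(n_components=5)
>>> x_transformed = mida.fit_transform(x, covariates=covariates)
>>> x_transformed.shape
(100, 5)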
- fit(x, y=None, covariates=None)
- Parameters
x – array-like. Input data, shape (n_samples, n_features)
y – array-like. Labels, shape (nl_samples,)
covariates – array-like. Domain co-variates, shape (n_samples, n_co-variates)
Note
Unsupervised MIDA is performed if y is None. Semi-supervised MIDA is performed if y is not None.
- fit_transform(x, y=None, covariates=None)
- Parameters
x – array-like, shape (n_samples, n_features)
y – array-like, shape (n_samples,)
covariates – array-like, shape (n_samples, n_covariates)
- Returns
array-like, shape (n_samples, n_components)
- Return type
x_transformed
- transform(x, covariates=None)
- Parameters
x – array-like, shape (n_samples, n_features)
covariates – array-like, augmentation features, shape (n_samples, n_covariates)
- Returns
array-like, shape (n_samples, n_components)
- Return type
x_transformed
kale.embed.gcn module
- class kale.embed.gcn.GCNEncoderLayer(in_channels, out_channels, improved=False, cached=False, bias=True, **kwargs)
Bases:
MessagePassing
Modification of PyTorch Geometric’s nn.GCNConv, which reduces the computational cost of the GCN layer for the GripNet model. The graph convolutional operator from the “Semi-supervised Classification with Graph Convolutional Networks” (ICLR 2017) paper.
\[\mathbf{X}^{\prime} = \mathbf{\hat{D}}^{-1/2} \mathbf{\hat{A}} \mathbf{\hat{D}}^{-1/2} \mathbf{X} \mathbf{\Theta},\]where \(\mathbf{\hat{A}} = \mathbf{A} + \mathbf{I}\) denotes the adjacency matrix with inserted self-loops and \(\hat{D}_{ii} = \sum_{j=0} \hat{A}_{ij}\) its diagonal degree matrix.
Note: For more information please see PyTorch Geometric’s nn.GCNConv docs.
- Parameters
in_channels (int) – Size of each input sample.
out_channels (int) – Size of each output sample.
improved (bool, optional) – If set to True, the layer computes \(\mathbf{\hat{A}}\) as \(\mathbf{A} + 2\mathbf{I}\). (default: False)
cached (bool, optional) – If set to True, the layer will cache the computation of \(\mathbf{\hat{D}}^{-1/2} \mathbf{\hat{A}} \mathbf{\hat{D}}^{-1/2}\) on first execution, and will use the cached version for further executions. This parameter should only be set to True in transductive learning scenarios. (default: False)
bias (bool, optional) – If set to False, the layer will not learn an additive bias. (default: True)
**kwargs (optional) – Additional arguments of torch_geometric.nn.conv.MessagePassing.
- reset_parameters()
- static norm(edge_index, num_nodes, edge_weight, improved=False, dtype=None)
Add self-loops and apply symmetric normalization
- forward(x, edge_index, edge_weight=None)
- Parameters
x (torch.Tensor) – The input node feature embedding.
edge_index (torch.Tensor) – Graph edge index in COO format with shape [2, num_edges].
edge_weight (torch.Tensor, optional) – The one-dimensional relation weight for each edge in edge_index. (default: None)
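Examples
A minimal, hypothetical usage sketch on a toy graph; the node features and edges below are illustrative:
>>> import torch
>>> from kale.embed.gcn import GCNEncoderLayer
>>>
>>> x = torch.randn(4, 16)                          # 4 nodes with 16-dim features
>>> edge_index = torch.tensor([[0, 1, 1, 2, 2, 3],
>>>                            [1, 0, 2, 1, 3, 2]])  # 3 undirected edges in COO format
>>> layer = GCNEncoderLayer(in_channels=16, out_channels=32)
>>> out = layer(x, edge_index)
>>> out.shape  # expected: torch.Size([4, 32])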
- class kale.embed.gcn.RGCNEncoderLayer(in_channels, out_channels, num_relations, num_bases, after_relu, bias=False, **kwargs)
Bases:
MessagePassing
Modification of PyTorch Geometric’s nn.RGCNConv, which reduces the computational and memory cost of the RGCN encoder layer for the GripNet model. The relational graph convolutional operator from the “Modeling Relational Data with Graph Convolutional Networks” paper.
\[\mathbf{x}^{\prime}_i = \mathbf{\Theta}_{\textrm{root}} \cdot \mathbf{x}_i + \sum_{r \in \mathcal{R}} \sum_{j \in \mathcal{N}_r(i)} \frac{1}{|\mathcal{N}_r(i)|} \mathbf{\Theta}_r \cdot \mathbf{x}_j,\]
where \(\mathcal{R}\) denotes the set of relations, i.e. edge types. Edge type needs to be a one-dimensional torch.long tensor which stores a relation identifier \(\in \{ 0, \ldots, |\mathcal{R}| - 1\}\) for each edge.
Note: For more information please see PyTorch Geometric’s nn.RGCNConv docs.
- Parameters
in_channels (int) – Size of each input sample.
out_channels (int) – Size of each output sample.
num_relations (int) – Number of edge relations.
num_bases (int) – Use the basis-decomposition regularization scheme; num_bases denotes the number of bases.
after_relu (bool) – Whether the input embedding is activated by a ReLU function or not.
bias (bool) – If set to False, the layer will not learn an additive bias. (default: False)
**kwargs (optional) – Additional arguments of torch_geometric.nn.conv.MessagePassing.
- reset_parameters()
- forward(x, edge_index, edge_type, range_list)
- Parameters
x (torch.Tensor) – The input node feature embedding.
edge_index (torch.Tensor) – Graph edge index in COO format with shape [2, num_edges].
edge_type (torch.Tensor) – The one-dimensional relation type/index for each edge in edge_index.
range_list (torch.Tensor) – The index range list of each edge type with shape [num_types, 2].
kale.embed.gripnet module
kale.embed.image_cnn module
CNNs for extracting features from small images of size 32x32 (e.g. MNIST) and regular images of size 224x224 (e.g. ImageNet). The code is based on https://github.com/criteo-research/pytorch-ada/blob/master/adalib/ada/models/modules.py, which is for domain adaptation.
- class kale.embed.image_cnn.SmallCNNFeature(num_channels=3, kernel_size=5)
Bases:
Module
A feature extractor for small 32x32 images (e.g. CIFAR, MNIST) that outputs a feature vector of length 128.
- Parameters
num_channels – the number of input channels (default=3).
kernel_size – the size of the convolution kernel (default=5).
Examples
>>> feature_network = SmallCNNFeature(num_channels)
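A hypothetical forward pass; per the class description above, a batch of 32x32 images yields 128-dimensional feature vectors:
>>> import torch
>>> feature_network = SmallCNNFeature()
>>> features = feature_network(torch.randn(8, 3, 32, 32))
>>> features.shape
torch.Size([8, 128])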
- forward(input_)
- output_size()
- training: bool
- class kale.embed.image_cnn.ResNet18Feature(pretrained=True)
Bases:
Module
Modified ResNet18 (without the last layer) feature extractor for regular 224x224 images.
- Parameters
pretrained (bool) – If True, returns a model pre-trained on ImageNet
Note
Code adapted by pytorch-ada from https://github.com/thuml/Xlearn/blob/master/pytorch/src/network.py
- forward(x)
- output_size()
- training: bool
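Examples
A minimal, hypothetical usage sketch (it applies analogously to the deeper ResNet variants below; the batch below is illustrative):
>>> import torch
>>> from kale.embed.image_cnn import ResNet18Feature
>>>
>>> feature_network = ResNet18Feature(pretrained=False)
>>> features = feature_network(torch.randn(4, 3, 224, 224))
>>> features.shape[1] == feature_network.output_size()  # expected: True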
- class kale.embed.image_cnn.ResNet34Feature(pretrained=True)
Bases:
Module
Modified ResNet34 (without the last layer) feature extractor for regular 224x224 images.
- Parameters
pretrained (bool) – If True, returns a model pre-trained on ImageNet
Note
Code adapted by pytorch-ada from https://github.com/thuml/Xlearn/blob/master/pytorch/src/network.py
- forward(x)
- output_size()
- training: bool
- class kale.embed.image_cnn.ResNet50Feature(pretrained=True)
Bases:
Module
Modified ResNet50 (without the last layer) feature extractor for regular 224x224 images.
- Parameters
pretrained (bool) – If True, returns a model pre-trained on ImageNet
Note
Code adapted by pytorch-ada from https://github.com/thuml/Xlearn/blob/master/pytorch/src/network.py
- forward(x)
- output_size()
- training: bool
- class kale.embed.image_cnn.ResNet101Feature(pretrained=True)
Bases:
Module
Modified ResNet101 (without the last layer) feature extractor for regular 224x224 images.
- Parameters
pretrained (bool) – If True, returns a model pre-trained on ImageNet
Note
Code adapted by pytorch-ada from https://github.com/thuml/Xlearn/blob/master/pytorch/src/network.py
- forward(x)
- output_size()
- training: bool
- class kale.embed.image_cnn.ResNet152Feature(pretrained=True)
Bases:
Module
Modified ResNet152 (without the last layer) feature extractor for regular 224x224 images.
- Parameters
pretrained (bool) – If True, returns a model pre-trained on ImageNet
Note
Code adapted by pytorch-ada from https://github.com/thuml/Xlearn/blob/master/pytorch/src/network.py
- forward(x)
- output_size()
- training: bool
kale.embed.positional_encoding module
- class kale.embed.positional_encoding.PositionalEncoding(d_model: int, max_len: int = 5000)
Bases:
Module
Implements the positional encoding as described in the NIPS2017 paper ‘Attention Is All You Need’ about Transformers (https://arxiv.org/abs/1706.03762). Essentially, for all timesteps in a given sequence, adds information about the relative temporal location of a timestep directly into the features of that timestep, and then returns this slightly-modified, same-shape sequence.
- Parameters
d_model – the number of features that each timestep has (required).
max_len – the maximum sequence length that the positional encodings should support (required).
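Examples
A minimal usage sketch; the shapes below are illustrative:
>>> import torch
>>> from kale.embed.positional_encoding import PositionalEncoding
>>>
>>> pos_encoder = PositionalEncoding(d_model=64, max_len=100)
>>> x = torch.randn(50, 8, 64)   # (sequence_length, batch_size, num_features)
>>> out = pos_encoder(x)         # same shape, with positional information added
>>> out.shape
torch.Size([50, 8, 64])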
- forward(x)
Expects input of shape (sequence_length, batch_size, num_features) and returns output of the same shape. sequence_length may be at most self.max_len, and num_features must equal self.d_model.
- Parameters
x – a sequence input of shape (sequence_length, batch_size, num_features) (required).
- training: bool
kale.embed.seq_nn module
DeepDTA-based models for the drug-target interaction prediction problem.
- class kale.embed.seq_nn.CNNEncoder(num_embeddings, embedding_dim, sequence_length, num_kernels, kernel_length)
Bases:
Module
The DeepDTA’s CNN encoder module, which comprises three 1D-convolutional layers and one max-pooling layer. The module is applied to encode drug/target sequence information, and the input should be integer/label-encoded sequences. The original paper is “DeepDTA: deep drug–target binding affinity prediction”.
- Parameters
num_embeddings (int) – Number of embedding labels/categories; depends on the type of sequence encoding.
embedding_dim (int) – Dimension of embedding labels/categories.
sequence_length (int) – Max length of input sequence.
num_kernels (int) – Number of kernels (filters).
kernel_length (int) – Length of kernel (filter).
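Examples
A minimal, hypothetical sketch for integer/label-encoded sequences; the vocabulary size, sequence length and kernel settings below are illustrative and are not the DeepDTA defaults:
>>> import torch
>>> from kale.embed.seq_nn import CNNEncoder
>>>
>>> encoder = CNNEncoder(num_embeddings=64, embedding_dim=128, sequence_length=85,
>>>                      num_kernels=32, kernel_length=8)
>>> x = torch.randint(0, 64, (16, 85))   # a batch of 16 label-encoded sequences
>>> features = encoder(x)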
- forward(x)
- training: bool
- class kale.embed.seq_nn.GCNEncoder(in_channel=78, out_channel=128, dropout_rate=0.2)
Bases:
Module
The GraphDTA’s GCN encoder module, which comprises three graph convolutional layers and one fully connected layer. The model is a variant of DeepDTA and is applied to encode drug molecule graph information. The original paper is “GraphDTA: Predicting drug–target binding affinity with graph neural networks”.
- Parameters
in_channel (int) – Dimension of each input node feature.
out_channel (int) – Dimension of each output node feature.
dropout_rate (float) – dropout rate during training.
- forward(x, edge_index, batch)
- training: bool
kale.embed.video_feature_extractor module
Define feature extractors for video, including I3D, R3D_18, MC3_18 and R2PLUS1D_18, with or without SELayers.
- kale.embed.video_feature_extractor.get_video_feat_extractor(model_name, image_modality, attention, num_classes)
Get the feature extractor, with or without the pre-trained model and SELayers. The pre-trained models are saved in the path $XDG_CACHE_HOME/torch/hub/checkpoints/. For Linux, the default path is ~/.cache/torch/hub/checkpoints/. For Windows, the default path is C:/Users/$USER_NAME/.cache/torch/hub/checkpoints/. Four pre-trained models are provided: “rgb_imagenet”, “flow_imagenet”, “rgb_charades”, “flow_charades”.
- Parameters
model_name (string) – The name of the feature extractor. (Choices=[“I3D”, “R3D_18”, “R2PLUS1D_18”, “MC3_18”])
image_modality (string) – Image type. (Choices=[“rgb”, “flow”, “joint”])
attention (string) – The attention type. (Choices=[“SELayerC”, “SELayerT”, “SELayerCoC”, “SELayerMC”, “SELayerCT”, “SELayerTC”, “SELayerMAC”])
num_classes (int) – The number of classes of the specific dataset. (Default: not used)
- Returns
The network to extract features.
class_feature_dim (int): The dimension of the feature network output for ClassNet, fixed by convention once the input dimension and the network are fixed.
domain_feature_dim (int): The dimension of the feature network output for DomainNet.
- Return type
feature_network (dictionary)
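Examples
A hypothetical call; the argument values follow the choices listed above, and it is assumed that the three documented outputs are returned together:
>>> from kale.embed.video_feature_extractor import get_video_feat_extractor
>>>
>>> feature_network, class_feature_dim, domain_feature_dim = get_video_feat_extractor(
>>>     model_name="I3D", image_modality="rgb", attention="SELayerC", num_classes=8)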
kale.embed.video_i3d module
Define Inflated 3D ConvNets (I3D) for action recognition, from https://ieeexplore.ieee.org/document/8099985. Created by Xianyuan Liu by modifying https://github.com/piergiaj/pytorch-i3d/blob/master/pytorch_i3d.py and https://github.com/deepmind/kinetics-i3d/blob/master/i3d.py.
- class kale.embed.video_i3d.MaxPool3dSamePadding(kernel_size: Union[int, Tuple[int, ...]], stride: Optional[Union[int, Tuple[int, ...]]] = None, padding: Union[int, Tuple[int, ...]] = 0, dilation: Union[int, Tuple[int, ...]] = 1, return_indices: bool = False, ceil_mode: bool = False)
Bases:
MaxPool3d
Construct a 3D max pool with ‘same’ padding, which PyTorch does not provide. ‘Same’ padding means the output size matches the input size for stride=1.
- compute_pad(dim, s)
Get the zero padding number.
- forward(x)
Compute ‘same’ padding. Zeros are added to the back positions first.
- kernel_size: Union[int, Tuple[int, int, int]]
- stride: Union[int, Tuple[int, int, int]]
- padding: Union[int, Tuple[int, int, int]]
- dilation: Union[int, Tuple[int, int, int]]
- class kale.embed.video_i3d.Unit3D(in_channels, output_channels, kernel_shape=(1, 1, 1), stride=(1, 1, 1), padding=0, activation_fn=<function relu>, use_batch_norm=True, use_bias=False, name='unit_3d')
Bases:
Module
Basic unit containing Conv3D + BatchNorm + non-linearity.
- compute_pad(dim, s)
Get the zero padding number.
- forward(x)
Connect the module to inputs. Padding is computed dynamically in forward based on the input size.
- Parameters
x – Inputs to the Unit3D component.
- Returns
Outputs from the module.
- training: bool
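Examples
A minimal, hypothetical sketch of a single Unit3D block; the channel counts, kernel shape and input size below are illustrative:
>>> import torch
>>> from kale.embed.video_i3d import Unit3D
>>>
>>> unit = Unit3D(in_channels=3, output_channels=64, kernel_shape=(7, 7, 7), stride=(2, 2, 2))
>>> x = torch.randn(2, 3, 16, 112, 112)   # (batch, channels, time, height, width)
>>> out = unit(x)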
- class kale.embed.video_i3d.InceptionModule(in_channels, out_channels, name)
Bases:
Module
Construct Inception module. Concatenation after four branches (1x1x1 conv; 1x1x1 + 3x3x3 convs; 1x1x1 + 3x3x3 convs; 3x3x3 max-pool + 1x1x1 conv). In forward, we check if SELayers are used, which are channel-wise (SELayerC), temporal-wise (SELayerT), channel-temporal-wise (SELayerTC & SELayerCT).
- forward(x)
- training: bool
- class kale.embed.video_i3d.InceptionI3d(num_classes=400, spatial_squeeze=True, final_endpoint='Logits', name='inception_i3d', in_channels=3, dropout_keep_prob=0.5)
Bases:
Module
Inception-v1 I3D architecture. The model is introduced in:
Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset Joao Carreira, Andrew Zisserman https://arxiv.org/pdf/1705.07750v1.pdf.
- See also the Inception architecture, introduced in:
Going deeper with convolutions Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, Andrew Rabinovich. http://arxiv.org/pdf/1409.4842v1.pdf.
- VALID_ENDPOINTS = ('Conv3d_1a_7x7', 'MaxPool3d_2a_3x3', 'Conv3d_2b_1x1', 'Conv3d_2c_3x3', 'MaxPool3d_3a_3x3', 'Mixed_3b', 'Mixed_3c', 'MaxPool3d_4a_3x3', 'Mixed_4b', 'Mixed_4c', 'Mixed_4d', 'Mixed_4e', 'Mixed_4f', 'MaxPool3d_5a_2x2', 'Mixed_5b', 'Mixed_5c', 'Logits', 'Predictions')
- replace_logits(num_classes)
Update the output size with num_classes according to the specific setting.
- build()
- forward(x)
The output is the result of the final average pooling layer with 1024 dimensions.
- extract_features(x)
- training: bool
- kale.embed.video_i3d.i3d(name, num_channels, num_classes, pretrained=False, progress=True)
Get the InceptionI3d module, with or without a pre-trained model.
- kale.embed.video_i3d.i3d_joint(rgb_pt, flow_pt, num_classes, pretrained=False, progress=True)
Get I3D models for different inputs.
- Parameters
rgb_pt (string, optional) – the name of pre-trained model for RGB input.
flow_pt (string, optional) – the name of pre-trained model for flow input.
num_classes (int) – the class number of dataset.
pretrained (bool) – choose if pretrained parameters are used. (Default: False)
progress (bool, optional) – whether or not to display a progress bar to stderr. (Default: True)
- Returns
A dictionary containing the RGB and flow models.
- Return type
models (dictionary)
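Examples
A hypothetical call for joint RGB and optical flow models, assuming the returned dictionary is keyed by modality (e.g. “rgb” and “flow”); the pre-trained model names are those listed in kale.embed.video_feature_extractor:
>>> from kale.embed.video_i3d import i3d_joint
>>>
>>> models = i3d_joint(rgb_pt="rgb_imagenet", flow_pt="flow_imagenet", num_classes=8, pretrained=False)
>>> rgb_model, flow_model = models["rgb"], models["flow"]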
kale.embed.video_res3d module
Define MC3_18, R3D_18 and R2plus1D_18 for action recognition, from https://arxiv.org/abs/1711.11248. Created by Xianyuan Liu by modifying https://github.com/pytorch/vision/blob/master/torchvision/models/video/resnet.py.
- class kale.embed.video_res3d.Conv3DSimple(in_planes, out_planes, midplanes=None, stride=1, padding=1)
Bases:
Conv3d
3D convolutions for R3D (3x3x3 kernel)
- static get_downsample_stride(stride)
- bias: Optional[Tensor]
- out_channels: int
- kernel_size: Tuple[int, ...]
- stride: Tuple[int, ...]
- padding: Union[str, Tuple[int, ...]]
- dilation: Tuple[int, ...]
- transposed: bool
- output_padding: Tuple[int, ...]
- groups: int
- padding_mode: str
- weight: Tensor
- class kale.embed.video_res3d.Conv2Plus1D(in_planes, out_planes, midplanes, stride=1, padding=1)
Bases:
Sequential
(2+1)D convolutions for R2plus1D (1x3x3 kernel + 3x1x1 kernel)
- static get_downsample_stride(stride)
- training: bool
- class kale.embed.video_res3d.Conv3DNoTemporal(in_planes, out_planes, midplanes=None, stride=1, padding=1)
Bases:
Conv3d
3D convolutions without temporal dimension for MCx (1x3x3 kernel)
- static get_downsample_stride(stride)
- bias: Optional[Tensor]
- out_channels: int
- kernel_size: Tuple[int, ...]
- stride: Tuple[int, ...]
- padding: Union[str, Tuple[int, ...]]
- dilation: Tuple[int, ...]
- transposed: bool
- output_padding: Tuple[int, ...]
- groups: int
- padding_mode: str
- weight: Tensor
- class kale.embed.video_res3d.BasicBlock(inplanes, planes, conv_builder, stride=1, downsample=None)
Bases:
Module
Basic ResNet building block. Each block consists of two convolutional layers with a ReLU activation function after each layer and residual connections. In forward, we check if SELayers are used, which are channel-wise (SELayerC) and temporal-wise (SELayerT).
- expansion = 1
- forward(x)
- training: bool
- class kale.embed.video_res3d.Bottleneck(inplanes, planes, conv_builder, stride=1, downsample=None)
Bases:
Module
Bottleneck building block (not used by default). Each block consists of two 1*n*n and one n*n*n convolutional layers with a ReLU activation function after each layer and residual connections.
- expansion = 4
- forward(x)
- training: bool
- class kale.embed.video_res3d.BasicStem
Bases:
Sequential
The default conv-batchnorm-relu stem, normally used as the first layer (64 3x7x7 kernels).
- training: bool
- class kale.embed.video_res3d.BasicFLowStem
Bases:
Sequential
The default stem for optical flow.
- training: bool
- class kale.embed.video_res3d.R2Plus1dStem
Bases:
Sequential
The R(2+1)D stem differs from the default one in that it uses separated 3D convolutions (45 1x7x7 kernels + 64 3x1x1 kernels).
- training: bool
- class kale.embed.video_res3d.R2Plus1dFlowStem
Bases:
Sequential
R(2+1)D stem for optical flow.
- training: bool
- class kale.embed.video_res3d.VideoResNet(block, conv_makers, layers, stem, num_classes=400, zero_init_residual=False)
Bases:
Module
- replace_fc(num_classes, block=<class 'kale.embed.video_res3d.BasicBlock'>)
Update the output size with num_classes according to the specific setting.
- forward(x)
- training: bool
- kale.embed.video_res3d.r3d_18_rgb(pretrained=False, progress=True, **kwargs)
Construct the 18-layer ResNet3D model for RGB input, as in https://arxiv.org/abs/1711.11248
- Parameters
pretrained (bool) – If True, returns a model pre-trained on Kinetics-400
progress (bool) – If True, displays a progress bar of the download to stderr
- Returns
R3D-18 network
- Return type
nn.Module
- kale.embed.video_res3d.r3d_18_flow(pretrained=False, progress=True, **kwargs)
Construct the 18-layer ResNet3D model for optical flow.
- kale.embed.video_res3d.mc3_18_rgb(pretrained=False, progress=True, **kwargs)
Construct the 18-layer Mixed Convolution network for RGB input, as in https://arxiv.org/abs/1711.11248
- Parameters
pretrained (bool) – If True, returns a model pre-trained on Kinetics-400
progress (bool) – If True, displays a progress bar of the download to stderr
- Returns
MC3 Network definition
- Return type
nn.Module
- kale.embed.video_res3d.mc3_18_flow(pretrained=False, progress=True, **kwargs)
Construct the 18-layer Mixed Convolution network for optical flow.
- kale.embed.video_res3d.r2plus1d_18_rgb(pretrained=False, progress=True, **kwargs)
Construct the 18-layer deep R(2+1)D network for RGB input, as in https://arxiv.org/abs/1711.11248
- Parameters
pretrained (bool) – If True, returns a model pre-trained on Kinetics-400
progress (bool) – If True, displays a progress bar of the download to stderr
- Returns
R(2+1)D-18 network
- Return type
nn.Module
- kale.embed.video_res3d.r2plus1d_18_flow(pretrained=False, progress=True, **kwargs)
Construct the 18-layer deep R(2+1)D network for optical flow.
- kale.embed.video_res3d.r3d(rgb=False, flow=False, pretrained=False, progress=True)
Get R3D_18 models.
- kale.embed.video_res3d.mc3(rgb=False, flow=False, pretrained=False, progress=True)
Get MC3_18 models.
- kale.embed.video_res3d.r2plus1d(rgb=False, flow=False, pretrained=False, progress=True)
Get R2PLUS1D_18 models.
kale.embed.video_selayer module
Python implementation of Squeeze-and-Excitation Layers (SELayer).
Initial implementation: channel-wise (SELayerC).
Improved implementations: temporal-wise (SELayerT), convolution-based channel-wise (SELayerCoC), max-pooling-based channel-wise (SELayerMC), multi-pooling-based channel-wise (SELayerMAC).
[Redundant and repeated code will be reduced in the future.]
References
Jie Hu, Li Shen, and Gang Sun. “Squeeze-and-excitation networks.” In CVPR, pp. 7132-7141. 2018. For the initial implementation, please go to https://github.com/hujie-frank/SENet
- kale.embed.video_selayer.get_selayer(attention)
Get the SELayer corresponding to the given attention name.
- Parameters
attention (string) – the name of the SELayer. (Options: [“SELayerC”, “SELayerT”, “SELayerCoC”, “SELayerMC”, “SELayerMAC”])
- Returns
the SELayer.
- Return type
se_layer (SELayer, optional)
- class kale.embed.video_selayer.SELayer(channel, reduction=16)
Bases:
Module
Helper class for SELayer design.
- forward(x)
- training: bool
- class kale.embed.video_selayer.SELayerC(channel, reduction=16)
Bases:
SELayer
Construct channel-wise SELayer.
- forward(x)
- training: bool
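Examples
A minimal sketch, assuming the layer operates on 5-D video tensors as used by the video models in this package; the shapes below are illustrative:
>>> import torch
>>> from kale.embed.video_selayer import SELayerC
>>>
>>> se_layer = SELayerC(channel=64)
>>> x = torch.randn(2, 64, 8, 28, 28)   # (batch, channels, time, height, width)
>>> out = se_layer(x)                   # same shape, with channels reweighted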
- class kale.embed.video_selayer.SELayerT(channel, reduction=2)
Bases:
SELayer
Construct temporal-wise SELayer.
- forward(x)
- training: bool
- class kale.embed.video_selayer.SELayerCoC(channel, reduction=16)
Bases:
SELayer
Construct convolution-based channel-wise SELayer.
- forward(x)
- training: bool
kale.embed.video_se_i3d module
Add SELayers to I3D
- class kale.embed.video_se_i3d.SEInceptionI3DRGB(num_channels, num_classes, attention)
Bases:
Module
Add several SELayers to I3D for RGB input.
- Parameters
num_channels (int) – the channel number of the input.
num_classes (int) – the class number of the dataset.
attention (string) – the name of the SELayer. (Options: [“SELayerC”, “SELayerT”, “SELayerCoC”, “SELayerMC”, “SELayerMAC”, “SELayerCT”, “SELayerTC”])
- Returns
I3D model with SELayers.
- Return type
model (VideoResNet)
- forward(x)
- training: bool
- class kale.embed.video_se_i3d.SEInceptionI3DFlow(num_channels, num_classes, attention)
Bases:
Module
Add several SELayers to I3D for optical flow input.
- forward(x)
- training: bool
- kale.embed.video_se_i3d.se_inception_i3d(name, num_channels, num_classes, attention, pretrained=False, progress=True, rgb=True)
Get the InceptionI3d module, with or without SELayers and a pre-trained model.
- kale.embed.video_se_i3d.se_i3d_joint(rgb_pt, flow_pt, num_classes, attention, pretrained=False, progress=True)
Get I3D models with SELayers for different inputs.
- Parameters
rgb_pt (string, optional) – the name of pre-trained model for RGB input.
flow_pt (string, optional) – the name of pre-trained model for optical flow input.
num_classes (int) – the class number of dataset.
attention (string, optional) – the name of the SELayer.
pretrained (bool) – choose if pretrained parameters are used. (Default: False)
progress (bool, optional) – whether or not to display a progress bar to stderr. (Default: True)
- Returns
A dictionary containing the models for RGB and optical flow.
- Return type
models (dictionary)
kale.embed.video_se_res3d module
Add SELayers to MC3_18, R3D_18, R2plus1D_18
- kale.embed.video_se_res3d.se_r3d_18_rgb(attention, pretrained=False, progress=True, **kwargs)
- kale.embed.video_se_res3d.se_r3d_18_flow(attention, pretrained=False, progress=True, **kwargs)
- kale.embed.video_se_res3d.se_mc3_18_rgb(attention, pretrained=False, progress=True, **kwargs)
- kale.embed.video_se_res3d.se_mc3_18_flow(attention, pretrained=False, progress=True, **kwargs)
- kale.embed.video_se_res3d.se_r2plus1d_18_rgb(attention, pretrained=False, progress=True, **kwargs)
- kale.embed.video_se_res3d.se_r2plus1d_18_flow(attention, pretrained=False, progress=True, **kwargs)
- kale.embed.video_se_res3d.se_r3d(attention, rgb=False, flow=False, pretrained=False, progress=True)
Get R3D_18 models with SELayers for different inputs.
- Parameters
attention (string) – the name of the SELayer.
rgb (bool) – choose if RGB model is needed. (Default: False)
flow (bool) – choose if optical flow model is needed. (Default: False)
pretrained (bool) – choose if pretrained parameters are used. (Default: False)
progress (bool, optional) – whether or not to display a progress bar to stderr. (Default: True)
- Returns
A dictionary containing the models for RGB and optical flow.
- Return type
models (dictionary)
- kale.embed.video_se_res3d.se_mc3(attention, rgb=False, flow=False, pretrained=False, progress=True)
Get MC3_18 models with SELayers for different inputs.
- kale.embed.video_se_res3d.se_r2plus1d(attention, rgb=False, flow=False, pretrained=False, progress=True)
Get R2+1D_18 models with SELayers for different inputs.