Preprocess Data
Submodules
kale.prepdata.chem_transform module
Functions for labeling and encoding chemical strings such as compound SMILES and atom strings. Refer to https://github.com/hkmztrk/DeepDTA and https://github.com/thinng/GraphDTA.
- kale.prepdata.chem_transform.integer_label_smiles(smiles, max_length=85, isomeric=False)
Integer encoding for SMILES string sequence.
- Parameters
smiles (str) – Simplified molecular-input line-entry system (SMILES) string, a line notation for describing the structure of chemical species using short ASCII strings.
max_length (int) – Maximum encoding length of the input SMILES string (default: 85).
isomeric (bool) – Whether the input SMILES string includes isomeric information (default: False).
- kale.prepdata.chem_transform.integer_label_protein(sequence, max_length=1200)
Integer encoding for protein string sequence.
- Parameters
sequence (str) – Protein string sequence.
max_length (int) – Maximum encoding length of the input protein string (default: 1200).
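The integer encoding performed by integer_label_smiles and integer_label_protein can be sketched as a per-character dictionary lookup, truncated or zero-padded to max_length. This is a minimal illustration, not PyKale's implementation: the character tables, the helper integer_label, and the choice to map unknown characters to 0 are all assumptions. The isomeric flag presumably selects a different SMILES character table, which this sketch models by passing charset explicitly.

```python
import numpy as np

# Hypothetical lookup tables for illustration only; the tables actually used by
# kale.prepdata.chem_transform (adapted from DeepDTA) may differ.
SMILES_CHARSET = {ch: i + 1 for i, ch in enumerate("#%()+-./123456789=@BCFHINOPS[\\]clnors")}
PROTEIN_CHARSET = {ch: i + 1 for i, ch in enumerate("ACDEFGHIKLMNPQRSTVWXY")}


def integer_label(sequence, charset, max_length):
    """Map each character to its integer label, truncating or zero-padding to max_length."""
    encoding = np.zeros(max_length, dtype=np.int64)  # 0 is reserved for padding
    for idx, ch in enumerate(sequence[:max_length]):
        # Unknown characters also map to 0 here (an assumption of this sketch).
        encoding[idx] = charset.get(ch, 0)
    return encoding


smiles_code = integer_label("CC(=O)O", SMILES_CHARSET, max_length=85)
protein_code = integer_label("MKT", PROTEIN_CHARSET, max_length=1200)
```

Both encodings have a fixed length, which is what lets a batch of variable-length molecules or proteins be stacked into one array for an embedding layer.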
kale.prepdata.image_transform module
kale.prepdata.supergraph_construct module
kale.prepdata.tensor_reshape module
- kale.prepdata.tensor_reshape.spatial_to_seq(image_tensor: Tensor)
Takes a torch tensor of shape (batch_size, channels, height, width), as used and output by CNNs, and creates a sequence view of shape (sequence_length, batch_size, channels), as required by torch’s transformer module. In other words, it unrolls the spatial grid into the sequence dimension (sequence_length = height * width) and rearranges the dimension ordering.
- Parameters
image_tensor – tensor of shape (batch_size, channels, height, width) (required).
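A minimal sketch of the reshaping that spatial_to_seq describes, assuming it amounts to a flatten followed by a permute. The name spatial_to_seq_sketch is hypothetical; this is not the library function.

```python
import torch


def spatial_to_seq_sketch(image_tensor: torch.Tensor) -> torch.Tensor:
    # (B, C, H, W) -> (B, C, H*W): unroll the spatial grid row by row
    flat = image_tensor.flatten(start_dim=2)
    # (B, C, H*W) -> (H*W, B, C): sequence first, the default layout of torch.nn.Transformer
    return flat.permute(2, 0, 1)


x = torch.arange(2 * 3 * 4 * 5).reshape(2, 3, 4, 5)
seq = spatial_to_seq_sketch(x)  # shape (20, 2, 3)
```

For a contiguous input, flatten and permute both return views, so no data is copied.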
- kale.prepdata.tensor_reshape.seq_to_spatial(sequence_tensor: Tensor, desired_height: int, desired_width: int)
Takes a torch tensor of shape (sequence_length, batch_size, num_features), as used and output by Transformers, and creates a view of shape (batch_size, num_features, height, width), as used and output by CNNs. In other words, it rearranges the dimension ordering and rolls sequence_length into (height, width); height * width must equal the sequence length of the input sequence.
- Parameters
sequence_tensor – sequence tensor of shape (sequence_length, batch_size, num_features) (required).
desired_height – the height into which the sequence length should be rolled (required).
desired_width – the width into which the sequence length should be rolled (required).
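The inverse reshaping can be sketched the same way. seq_to_spatial_sketch is a hypothetical stand-in rather than the library code, and its explicit length check mirrors the requirement that height * width equal the sequence length.

```python
import torch


def seq_to_spatial_sketch(sequence_tensor: torch.Tensor, desired_height: int, desired_width: int) -> torch.Tensor:
    seq_len, batch_size, num_features = sequence_tensor.shape
    if desired_height * desired_width != seq_len:
        raise ValueError("desired_height * desired_width must equal the sequence length")
    # (S, B, F) -> (B, F, S), then roll S back into (H, W)
    return sequence_tensor.permute(1, 2, 0).reshape(batch_size, num_features, desired_height, desired_width)


x = torch.arange(2 * 3 * 4 * 5).reshape(2, 3, 4, 5)
seq = x.flatten(2).permute(2, 0, 1)  # (20, 2, 3), built inline for the round trip
restored = seq_to_spatial_sketch(seq, 4, 5)
```

The sketch uses reshape rather than view because the permuted tensor is generally not contiguous, so reshape may return a copy instead of a view.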
kale.prepdata.video_transform module
- kale.prepdata.video_transform.get_transform(kind, image_modality)
Defines transforms for commonly used datasets.
- Parameters
kind – name of the dataset (transformation).
image_modality (str) – image type (RGB or optical flow).
- class kale.prepdata.video_transform.ImglistToTensor
Bases: Module
Converts a list of PIL images in the range [0, 255] to a torch.FloatTensor of shape (NUM_IMAGES x CHANNELS x HEIGHT x WIDTH) in the range [0, 1]. Can be used as the first transform for kale.loaddata.videos.VideoFrameDataset.
- forward(img_list)
For RGB input, converts each PIL image in the list to a torch Tensor and stacks them into a single tensor. For optical-flow input, converts every two PIL images (x(u)_img, y(v)_img) in the list to a torch Tensor and stacks them. For example, if the input list holds 16 images, the stacked dimension is [16, 1, 224, 224] with frame order [frame 1_x, frame 1_y, frame 2_x, frame 2_y, frame 3_x, …, frame 8_x, frame 8_y]. The output is [[frame 1_x, frame 1_y], [frame 2_x, frame 2_y], …, [frame 8_x, frame 8_y]] with dimension [8, 2, 224, 224].
- Parameters
img_list – list of PIL images.
- Returns
tensor of size NUM_IMAGES x CHANNELS x HEIGHT x WIDTH
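The flow-pairing behaviour of forward can be sketched as stacking per-image tensors and then regrouping consecutive (x, y) images. This is an illustrative sketch only: the helpers image_to_tensor and imglist_to_tensor and the modality argument are hypothetical, and the sketch assumes each flow component is a single-channel (grayscale) image.

```python
import numpy as np
import torch


def image_to_tensor(img) -> torch.Tensor:
    """Convert a PIL image (or an HxW / HxWxC uint8 array) to a CxHxW float tensor in [0, 1]."""
    arr = np.asarray(img, dtype=np.float32) / 255.0
    if arr.ndim == 2:  # single-channel image, e.g. one optical-flow component
        arr = arr[:, :, None]
    return torch.from_numpy(arr).permute(2, 0, 1)


def imglist_to_tensor(img_list, modality="rgb") -> torch.Tensor:
    frames = torch.stack([image_to_tensor(img) for img in img_list])  # (N, C, H, W)
    if modality == "flow":
        # Consecutive (x, y) images belong to one frame: (N, 1, H, W) -> (N // 2, 2, H, W)
        n, _, h, w = frames.shape
        frames = frames.reshape(n // 2, 2, h, w)
    return frames


flow_imgs = [np.full((4, 5), i, dtype=np.uint8) for i in range(16)]
flow_out = imglist_to_tensor(flow_imgs, modality="flow")  # shape (8, 2, 4, 5)
```

Because the stacked tensor is contiguous, reshape pairs image 2i with image 2i + 1, which reproduces the [frame i_x, frame i_y] grouping described above.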
- class kale.prepdata.video_transform.TensorPermute
Bases: Module
Convert a torch.FloatTensor of shape (NUM_IMAGES x CHANNELS x HEIGHT x WIDTH) to a torch.FloatTensor of shape (CHANNELS x NUM_IMAGES x HEIGHT x WIDTH).
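The conversion amounts to swapping the first two dimensions. A minimal hypothetical stand-in (TensorPermuteSketch, not the library class) could look like this:

```python
import torch
from torch import nn


class TensorPermuteSketch(nn.Module):
    """Hypothetical stand-in: (NUM_IMAGES, C, H, W) -> (C, NUM_IMAGES, H, W)."""

    def forward(self, tensor: torch.Tensor) -> torch.Tensor:
        return tensor.permute(1, 0, 2, 3)


clip = torch.rand(16, 3, 112, 112)  # 16 RGB frames
volume = TensorPermuteSketch()(clip)  # shape (3, 16, 112, 112)
```

Once a batch dimension is added, a clip in this layout matches the (N, C, D, H, W) input expected by torch.nn.Conv3d, with the frames acting as the depth dimension.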