Utilities
Submodules
kale.utils.distance module
Provide distance (i.e., similarity) calculation methods using various distance metrics.
- kale.utils.distance.calculate_distance(x1: Tensor, x2: Tensor | None = None, eps: float = 1e-08, metric: DistanceMetric = DistanceMetric.COSINE) → Tensor
Returns similarity between \(x_1\) and \(x_2\), computed along `dim`=1. This method calculates the similarity between each pair of data points in two input matrices.
Note that this implementation differs from the existing implementations in PyTorch, which calculate the similarity between each row of one matrix and only the corresponding row of the other matrix (i.e., a row-by-row comparison rather than a comparison of all pairs of rows).
- Parameters:
x1 (torch.Tensor) – The tensor input data.
x2 (torch.Tensor, optional) – The tensor input data. (default: None)
eps (float, optional) – Small value to avoid division by zero. (default: 1e-8)
metric (DistanceMetric, optional) – The metric to compute distance between input matrices. (default: DistanceMetric.COSINE)
- Returns:
torch.Tensor – The computed similarity tensor between \(x_1\) and \(x_2\).
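The difference between all-pairs and row-by-row similarity can be sketched without PyTorch. The following is a minimal illustration on plain Python lists, not the PyKale implementation: the name `all_pairs_cosine` is hypothetical, and clamping the product of the norms at `eps` is an assumption about how the small constant is applied.

```python
import math


def all_pairs_cosine(x1, x2=None, eps=1e-8):
    """Cosine similarity between every row of x1 and every row of x2.

    Hypothetical helper for illustration; inputs are lists of equal-length rows.
    """
    if x2 is None:
        x2 = x1  # compare x1 against itself

    def l2_norm(row):
        return math.sqrt(sum(v * v for v in row))

    result = []
    for r1 in x1:
        out_row = []
        for r2 in x2:
            dot = sum(a * b for a, b in zip(r1, r2))
            # Assumption: eps guards the denominator against zero-norm rows.
            denom = max(l2_norm(r1) * l2_norm(r2), eps)
            out_row.append(dot / denom)
        result.append(out_row)
    return result


x1 = [[1.0, 0.0], [0.0, 1.0]]
x2 = [[1.0, 0.0], [1.0, 1.0], [0.0, 2.0]]
sim = all_pairs_cosine(x1, x2)  # one entry per (row of x1, row of x2) pair
```

For inputs with n and m rows the result holds n × m values, whereas a row-by-row routine needs equally shaped inputs and returns n values.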
kale.utils.download module
Data downloading and compressed data extraction functions, based on https://github.com/pytorch/vision/blob/master/torchvision/datasets/utils.py and https://github.com/pytorch/pytorch/blob/master/torch/hub.py
- kale.utils.download.download_file_by_url(url, output_directory, output_file_name, file_format=None)
Download file/compressed file by url.
- Parameters:
url (string) – URL of the object to download
output_directory (string, optional) – Full path where the object will be saved. An absolute path is recommended; a relative path also works.
output_file_name (string, optional) – File name which object will be saved as
file_format (string, optional) – File format. For compressed files, the supported formats are [“tar.xz”, “tar”, “tar.gz”, “tgz”, “gz”, “zip”].
- Example: (Grab the raw link from GitHub; note the “raw” in the URL.)
>>> url = "https://github.com/pykale/data/raw/main/videos/video_test_data/ADL/annotations/labels_train_test/adl_P_04_train.pkl"
>>> download_file_by_url(url, "data", "a.pkl", "pkl")
>>> url = "https://github.com/pykale/data/raw/main/videos/video_test_data.zip"
>>> download_file_by_url(url, "data", "video_test_data.zip", "zip")
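The download-then-extract flow described above can be approximated with the standard library alone. This is a rough sketch rather than the PyKale code: `download_sketch` and `TAR_FORMATS` are hypothetical names, only zip and tar-family archives are handled, and the choice to extract next to the downloaded file is an assumption.

```python
import shutil
import tarfile
import urllib.request
import zipfile
from pathlib import Path

# Tar-family formats from the documented list; a bare "gz" file is not handled here.
TAR_FORMATS = {"tar", "tar.gz", "tgz", "tar.xz"}


def download_sketch(url, output_directory, output_file_name, file_format=None):
    """Fetch url into output_directory/output_file_name and extract known archives."""
    out_dir = Path(output_directory)
    out_dir.mkdir(parents=True, exist_ok=True)
    target = out_dir / output_file_name

    # Stream the response to disk instead of holding it all in memory.
    with urllib.request.urlopen(url) as response, open(target, "wb") as fh:
        shutil.copyfileobj(response, fh)

    # Assumption: archives are unpacked into the same directory as the download.
    if file_format == "zip":
        with zipfile.ZipFile(target) as archive:
            archive.extractall(out_dir)
    elif file_format in TAR_FORMATS:
        with tarfile.open(target) as archive:  # compression is auto-detected
            archive.extractall(out_dir)
    return target
```

Because `urllib.request.urlopen` also accepts `file://` URLs, the sketch can be tried offline against a local archive before pointing it at a remote link.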
- kale.utils.download.download_file_gdrive(id, output_directory, output_file_name, file_format=None)
Download file/compressed file by Google Drive id.
- Parameters:
id (string) – Google Drive file id of the object to download
output_directory (string, optional) – Full path where the object will be saved. An absolute path is recommended; a relative path also works.
output_file_name (string, optional) – File name which object will be saved as
file_format (string, optional) – File format. For compressed files, the supported formats are [“tar.xz”, “tar”, “tar.gz”, “tgz”, “gz”, “zip”].
- Example:
>>> gdrive_id = "1U4D23R8u8MJX9KVKb92bZZX-tbpKWtga"
>>> download_file_gdrive(gdrive_id, "data", "demo_datasets.zip", "zip")
>>> gdrive_id = "1SV7fmAnWj-6AU9X5BGOrvGMoh2Gu9Nih"
>>> download_file_gdrive(gdrive_id, "data", "dummy_data.csv", "csv")
kale.utils.initialize_nn module
Provide methods for initializing neural network parameters (i.e., weights and biases).
- kale.utils.initialize_nn.xavier_init(module) → None
Fills the weight of the input Tensor with values using a normal distribution.
- Parameters:
module (torch.Tensor) – The input module.
- kale.utils.initialize_nn.bias_init(module) → None
Fills the bias of the input Tensor with zeros.
- Parameters:
module (torch.Tensor) – The input module.
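As a rough illustration of the idea behind these two functions: Xavier (Glorot) initialization with a normal distribution draws weights with zero mean and standard deviation gain · sqrt(2 / (fan_in + fan_out)), and biases are simply set to zero. The sketch below uses only the standard library with hypothetical names; whether `xavier_init` uses exactly this variant and gain is an assumption. In PyTorch, `torch.nn.init.xavier_normal_` and `torch.nn.init.zeros_` perform the corresponding in-place operations on tensors.

```python
import math
import random


def xavier_normal_std(fan_in, fan_out, gain=1.0):
    """Standard deviation used by Xavier/Glorot normal initialization."""
    return gain * math.sqrt(2.0 / (fan_in + fan_out))


def init_linear_sketch(fan_in, fan_out, seed=0):
    """Return (weight, bias) for a hypothetical fan_out x fan_in linear layer."""
    rng = random.Random(seed)
    std = xavier_normal_std(fan_in, fan_out)
    # Weights: zero-mean Gaussian samples scaled by the Xavier standard deviation.
    weight = [[rng.gauss(0.0, std) for _ in range(fan_in)] for _ in range(fan_out)]
    # Biases: all zeros.
    bias = [0.0] * fan_out
    return weight, bias


weight, bias = init_linear_sketch(fan_in=4, fan_out=3)
```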
kale.utils.logger module
Logging functions, based on https://github.com/HaozhiQi/ISONet/blob/master/isonet/utils/logger.py
- kale.utils.logger.out_file_core()
Creates an output file name concatenating a formatted date and uuid, but without an extension.
- Returns:
A string to be used in a file name.
- Return type:
string
- kale.utils.logger.construct_logger(name, save_dir, log_to_terminal=False)
Constructs a logger. Saves the output as a text file at a specified path, and also saves the output of git diff HEAD to the same folder. Optionally prints the logging statements to the terminal as well; this is disabled by default.
The logger is configured to output messages at the DEBUG level, and it saves the output as a text file with a name based on the current timestamp and the specified name. It also saves the output of git diff HEAD to a file with the same name and the extension .gitdiff.patch.
- Parameters:
name (str) – The name of the logger, typically the name of the method being logged.
save_dir (str) – The directory where the log file and git diff file will be saved.
log_to_terminal (bool, optional) – Whether to also log messages to the terminal. Defaults to False.
- Returns:
The constructed logger.
- Return type:
logging.Logger
- Raises:
None.
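The behaviour described for `out_file_core` and `construct_logger` can be sketched with the standard `logging` module. This is an approximation under stated assumptions, not the PyKale code: the `_sketch` names, the timestamp format, the shortened UUID, and the `.txt` extension are illustrative, the git diff step is left out, and the sketch returns the log path alongside the logger for convenience.

```python
import datetime
import logging
import os
import uuid


def out_file_core_sketch():
    """Timestamp plus a short random id, without a file extension."""
    # Assumption: the exact date format and id length are illustrative only.
    stamp = datetime.datetime.now().strftime("%Y-%m-%d-%H-%M-%S")
    return f"{stamp}-{uuid.uuid4().hex[:8]}"


def construct_logger_sketch(name, save_dir, log_to_terminal=False):
    """Logger at DEBUG level writing to a text file, optionally echoing to the terminal."""
    os.makedirs(save_dir, exist_ok=True)
    log_path = os.path.join(save_dir, f"{name}-{out_file_core_sketch()}.txt")

    logger = logging.getLogger(name)
    logger.setLevel(logging.DEBUG)

    formatter = logging.Formatter("%(asctime)s %(name)s %(levelname)s: %(message)s")
    file_handler = logging.FileHandler(log_path)
    file_handler.setFormatter(formatter)
    logger.addHandler(file_handler)

    if log_to_terminal:
        # Echo the same messages to the terminal (stderr by default).
        stream_handler = logging.StreamHandler()
        stream_handler.setFormatter(formatter)
        logger.addHandler(stream_handler)

    return logger, log_path
```

Calling `logger.debug("message")` on the returned logger appends a formatted line to the file at `log_path`.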
kale.utils.print module
Screen printing functions, from https://github.com/HaozhiQi/ISONet/blob/master/isonet/utils/misc.py
- kale.utils.print.tprint(*args)
Temporarily prints things on the screen so that it won’t be flooded
- kale.utils.print.pprint(*args)
Permanently prints things on the screen to have all info displayed
- kale.utils.print.pprint_without_newline(*args)
Permanently prints things on the screen, separated by space rather than newline
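The difference between temporary and permanent printing comes down to the carriage return and the trailing newline. The following is a minimal sketch of that idea with hypothetical `_sketch` names; it is not guaranteed to match the PyKale functions line for line.

```python
def tprint_sketch(*args):
    """Return to column 0 and print without a newline, so the next call overwrites it."""
    print("\r", end="")
    print(*args, end="")


def pprint_sketch(*args):
    """Return to column 0, print, and end the line so the text stays on screen."""
    print("\r", end="")
    print(*args)


for step in range(3):
    tprint_sketch("step", step)  # each call starts again at the beginning of the line
pprint_sketch("finished 3 steps")
```

This pattern suits progress messages inside a training loop: intermediate updates reuse one terminal line, and only the final summary is kept.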
kale.utils.save_xlsx module
Authors: Lawrence Schobs, lawrenceschobs@gmail.com
Functions to save results to Excel files.
- kale.utils.save_xlsx.generate_summary_df(results_dictionary: dict, cols_to_save: list, sheet_name: str, save_location: str) DataFrame
Generates pandas dataframe with summary statistics. Designed for use in Quantile Binning (/pykale/examples/landmark_uncertainty/main.py).
- Parameters:
results_dictionary (dict) – A dictionary containing results for each quantile bin and uncertainty method. The keys are strings indicating the name of each uncertainty method. The values are dictionaries containing results for each quantile bin.
cols_to_save (list) – A list of 2-element lists, each containing a string indicating the key in the results_dictionary and a string indicating the name to use for that column in the output dataframe.
sheet_name (str) – The name of the sheet to create in the output Excel file.
save_location (str) – The file path and name to use for the output Excel file.
- Returns:
A dataframe with statistics including the mean error and std error of All and of the individual targets. Also includes the success detection rates (SDR). The dataframe should have the following structure:
df = {
"All um col_save_name Mean": value,
"All um col_save_name Std": value,
"B1 um col_save_name Mean": value,
"B1 um col_save_name Std": value,
…
}
- Return type:
pd.DataFrame
- kale.utils.save_xlsx.save_dict_xlsx(data_dict: Dict[Any, Any], save_location: str, sheet_name: str) None
Save a dictionary to an Excel file using the XlsxWriter engine.
- Parameters:
data_dict (Dict[Any, Any]) – The dictionary that needs to be saved to an Excel file. The keys represent the row index and the values represent the data in the row. If a dictionary value is a list or a series, each element in the list/series will be a column in the row.
save_location (str) – The location where the Excel file will be saved. This should include the full path and the filename, for example, “/path/to/save/data.xlsx”. Overwrites the file if it already exists.
sheet_name (str) – The name of the sheet where the dictionary will be saved in the Excel file.
- Returns:
This function does not return anything. It saves the dictionary as an Excel file at the specified location.
- Return type:
None
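The row layout described for `data_dict` can be illustrated without writing an Excel file. The helper below is a hypothetical sketch, not part of PyKale: it only shows how each key becomes a row label and how list values spread across columns, and it treats tuples like lists as an assumption.

```python
def dict_to_rows(data_dict):
    """Turn {row_label: value_or_list} into rows of [row_label, cell, cell, ...]."""
    rows = []
    for key, value in data_dict.items():
        # A list (or tuple) contributes one cell per element; a scalar fills one cell.
        cells = list(value) if isinstance(value, (list, tuple)) else [value]
        rows.append([key, *cells])
    return rows


rows = dict_to_rows({"bin_1": [0.1, 0.2, 0.3], "bin_2": [0.4, 0.5], "n": 7})
```

Rows may end up with different lengths when the lists differ in size; in a spreadsheet the missing cells are simply left empty.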
kale.utils.seed module
Setting seed for reproducibility
- kale.utils.seed.set_seed(seed=1000)
Sets the seed for generating random numbers so that results are as reproducible as possible.
The CuDNN options are set according to the official PyTorch guidance on reproducibility: https://pytorch.org/docs/stable/notes/randomness.html. Other references are https://discuss.pytorch.org/t/difference-between-torch-manual-seed-and-torch-cuda-manual-seed/13848/6, https://pytorch.org/docs/stable/cuda.html#torch.cuda.manual_seed and https://github.com/open-mmlab/mmcv/blob/master/mmcv/runner/utils.py#L58
- Parameters:
seed (int, optional) – The desired seed. Defaults to 1000.
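A seeding routine of this kind typically seeds every random number generator in use and switches CuDNN to deterministic mode. The sketch below follows the PyTorch reproducibility notes under stated assumptions: `set_seed_sketch` is a hypothetical name, and NumPy and PyTorch are treated as optional so the code also runs where they are not installed. It is not necessarily identical to `kale.utils.seed.set_seed`.

```python
import random


def set_seed_sketch(seed=1000):
    """Seed the available random number generators for more reproducible runs."""
    random.seed(seed)  # Python's built-in generator

    try:
        import numpy as np
        np.random.seed(seed)  # NumPy's global generator
    except ImportError:
        pass  # NumPy is optional in this sketch

    try:
        import torch
    except ImportError:
        return  # PyTorch is optional in this sketch
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)  # harmless when no GPU is present
    # Prefer deterministic CuDNN kernels over auto-tuned, possibly varying ones.
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False


set_seed_sketch(1000)
first = random.random()
set_seed_sketch(1000)
second = random.random()  # same value as `first`, since the seed was reset
```

Deterministic CuDNN settings can slow training down, which is one reason results are described as reproducible only as far as possible.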