Interpret

Submodules

kale.interpret.model_weights module

kale.interpret.model_weights.select_top_weight(weights, select_ratio: float = 0.05)

Select the top weights by magnitude and set the remaining weights to zero.

Parameters:
  • weights (array-like) – model weights, can be a vector or a higher order tensor

  • select_ratio (float, optional) – ratio of top weights to be selected. Defaults to 0.05.

Returns:

Top weights, in the same shape as the input model weights.

Return type:

array-like
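The selection described above can be illustrated with a minimal NumPy sketch. It is not the library's implementation: the helper name top_weights_sketch, the rounding of the number of kept entries, and the tie handling are assumptions made for illustration.

```python
import numpy as np


def top_weights_sketch(weights, select_ratio=0.05):
    """Keep the largest-magnitude entries of `weights`; set the rest to zero.

    Hypothetical helper for illustration only, not part of kale.
    """
    w = np.asarray(weights, dtype=float)
    # Number of entries to keep (assumption: round, and keep at least one).
    n_keep = max(1, int(round(w.size * select_ratio)))
    # The magnitude of the n_keep-th largest entry acts as the cut-off.
    cutoff = np.sort(np.abs(w), axis=None)[-n_keep]
    # Entries tied with the cut-off are all kept, so slightly more than
    # n_keep values can survive when magnitudes repeat.
    return np.where(np.abs(w) >= cutoff, w, 0.0)


weights = np.array([[0.1, -2.0, 0.3], [4.0, -0.5, 0.05]])
top = top_weights_sketch(weights, select_ratio=0.34)  # keeps 2 of 6 entries
print(top)
```

The output has the same shape as the input, so it can be passed on to a plotting routine that expects the original weight layout.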

kale.interpret.uncertainty_quantiles module

Authors: Lawrence Schobs, lawrenceschobs@gmail.com

Module from the implementation of L. A. Schobs, A. J. Swift and H. Lu, “Uncertainty Estimation for Heatmap-Based Landmark Localization,” in IEEE Transactions on Medical Imaging, vol. 42, no. 4, pp. 1021-1034, April 2023, doi: 10.1109/TMI.2022.3222730.

Functions related to interpreting the uncertainty quantiles from the quantile binning method in terms of:
  1. Correlation of uncertainty with error (fit_line_with_ci)

  2. Isotonic regression on uncertainty and error pairs (quantile_binning_and_est_errors)

  3. Plot boxplots: generic_box_plot_loop, format_plot, box_plot_per_model, box_plot_comparing_q

  4. Plot cumulative error plots: plot_cumulative

  5. Top-level caller functions for the quantile binning (QBinning) analysis loop: generate_fig_individual_bin_comparison, generate_fig_comparing_bins

kale.interpret.uncertainty_quantiles.fit_line_with_ci(errors: ndarray, uncertainties: ndarray, quantile_thresholds: List[float], cmaps: List[Dict[str, str]], to_log: bool = False, error_scaling_factor: float = 1.0, save_path: str | None = None) Dict[str, List[Any]]

Calculates Spearman correlation between errors and uncertainties. Plots piecewise linear regression with bootstrap confidence intervals. Breakpoints in linear regression are defined by the uncertainty quantiles of the data.

Parameters:
  • errors (np.ndarray) – Array of errors.

  • uncertainties (np.ndarray) – Array of uncertainties.

  • quantile_thresholds (List[float]) – List of quantile thresholds.

  • cmaps (List[str]) – List of colormap names.

  • to_log (bool, optional) – Whether to apply logarithmic transformation on axes. Defaults to False.

  • error_scaling_factor (float, optional) – Scaling factor for error. Defaults to 1.0.

  • save_path (Optional[str], optional) – Path to save the plot, if None, the plot will be shown. Defaults to None.

Returns:

Dictionary containing Spearman and Pearson correlation coefficients and p-values.

Return type:

Dict[str, Tuple[float, float]]
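As a rough illustration of the two ingredients named above, the following NumPy-only sketch computes a Spearman coefficient from ranks and a percentile bootstrap confidence interval for the slope of a fitted line. It is a simplification under stated assumptions: it fits a single straight line rather than a piecewise one, it does not plot anything, and the helper names (average_ranks, spearman_sketch, bootstrap_slope_ci) and the synthetic data are hypothetical.

```python
import numpy as np


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    values = np.asarray(values, dtype=float).ravel()
    _, inverse, counts = np.unique(values, return_inverse=True, return_counts=True)
    last_rank = np.cumsum(counts)  # highest rank inside each tie group
    mean_rank = last_rank - (counts - 1) / 2.0
    return mean_rank[inverse]


def spearman_sketch(errors, uncertainties):
    """Spearman coefficient computed as the Pearson correlation of the ranks."""
    ranks_e = average_ranks(errors)
    ranks_u = average_ranks(uncertainties)
    return float(np.corrcoef(ranks_e, ranks_u)[0, 1])


def bootstrap_slope_ci(errors, uncertainties, n_boot=500, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the slope of errors vs. uncertainties."""
    rng = np.random.default_rng(seed)
    n = len(errors)
    slopes = np.empty(n_boot)
    for i in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample pairs with replacement
        # np.polyfit returns [slope, intercept] for deg=1.
        slopes[i] = np.polyfit(uncertainties[idx], errors[idx], deg=1)[0]
    lower, upper = np.quantile(slopes, [alpha / 2, 1 - alpha / 2])
    return float(lower), float(upper)


# Synthetic data (assumption): errors grow linearly with uncertainty plus noise.
rng = np.random.default_rng(42)
uncertainties = rng.uniform(0.0, 1.0, size=200)
errors = 2.0 * uncertainties + rng.normal(0.0, 0.1, size=200)

rho = spearman_sketch(errors, uncertainties)
lower, upper = bootstrap_slope_ci(errors, uncertainties)
print(f"Spearman rho: {rho:.3f}, slope CI: [{lower:.3f}, {upper:.3f}]")
```

A piecewise version would repeat the fit separately on each segment between consecutive uncertainty quantile thresholds.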

kale.interpret.uncertainty_quantiles.quantile_binning_and_est_errors(errors: List[float], uncertainties: List[float], num_bins: int, type: str = 'quantile', acceptable_thresh: float = 5, combine_middle_bins: bool = False) Tuple[List[List[float]], List[float]]

Calculate quantile thresholds, isotonically regress errors and uncertainties, and get the estimated error bounds.

Parameters:
  • errors (List[float]) – List of errors.

  • uncertainties (List[float]) – List of uncertainties.

  • num_bins (int) – Number of quantile bins.

  • type (str, optional) – Type of thresholds to calculate, “quantile” recommended. Defaults to “quantile”.

  • acceptable_thresh (float, optional) – Acceptable error threshold. Only relevant if type=“error-wise”. Defaults to 5.

  • combine_middle_bins (bool, optional) – Whether to combine middle bins. Defaults to False.

Returns:

List of quantile thresholds and estimated error bounds.

Return type:

Tuple[List[List[float]], List[float]]
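The quantile thresholding step can be sketched with NumPy as follows. This is a minimal sketch and not the library's code: the helper name quantile_bins_sketch is hypothetical, and the per-bin mean error stands in for the isotonic-regression estimate of the error bounds, which is not reproduced here.

```python
import numpy as np


def quantile_bins_sketch(errors, uncertainties, num_bins):
    """Split samples into equally populated uncertainty bins.

    Returns the interior thresholds, each sample's bin index, and the mean
    error per bin (a simple stand-in for an isotonic-regression estimate).
    """
    errors = np.asarray(errors, dtype=float)
    uncertainties = np.asarray(uncertainties, dtype=float)
    # Interior quantile levels, e.g. [0.25, 0.5, 0.75] for num_bins=4.
    levels = np.arange(1, num_bins) / num_bins
    thresholds = np.quantile(uncertainties, levels)
    # Bin 0 holds the least uncertain samples, bin num_bins-1 the most uncertain.
    bin_ids = np.digitize(uncertainties, thresholds)
    # Assumes every bin is populated; heavy ties in the uncertainties can
    # leave a bin empty, in which case its mean is undefined.
    mean_errors = [float(errors[bin_ids == b].mean()) for b in range(num_bins)]
    return thresholds, bin_ids, mean_errors


uncertainties = np.arange(1, 9, dtype=float)  # 1.0, 2.0, ..., 8.0
errors = 0.5 * uncertainties
thresholds, bin_ids, mean_errors = quantile_bins_sketch(errors, uncertainties, num_bins=4)
print(thresholds)   # interior thresholds between the four bins
print(bin_ids)      # bin index of every sample
print(mean_errors)  # mean error inside each bin
```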

kale.interpret.uncertainty_quantiles.generic_box_plot_loop(cmaps: List[str], target_uncert_dicts: Dict[str, List[List[float]]], uncertainty_types_list: List[List[str]], models: List[str], x_axis_labels: List[str], x_label: str, y_label: str, num_bins: int, list_comp_bool: bool, width: float, y_lim_min: float, font_size_1: int, font_size_2: int, show_sample_info: str = 'None', save_path: str | None = None, y_lim: int = 120, convert_to_percent: bool = True, to_log: bool = False, show_individual_dots: bool = True) None

This function generates box plots for multiple types of data coming from various models. It is highly customizable and can handle different specifications for plot attributes.

Customizations include:

  1. Color specification: The user can provide a list of color specifications for each box plot using the cmaps parameter.

  2. Axis labels: The x and y axis labels can be customized using the x_label and y_label parameters.

  3. Box width: The width of each box plot can be adjusted using the width parameter.

  4. Font sizes: Two different font sizes can be used in the plot, adjustable by font_size_1 and font_size_2.

  5. Limits of y-axis: The upper and lower limits of the y-axis can be set using the y_lim and y_lim_min parameters.

  6. Logarithmic scale: If to_log is set to True, the y-axis will be in logarithmic scale.

  7. Display of individual data points: The user can choose to display individual data points in each box plot by setting show_individual_dots to True.

  8. Data transformation: The data can be transformed to percentages using the convert_to_percent parameter.

  9. Display of sample information: The user can choose whether to display information about the number of samples in each box plot by setting show_sample_info to “None”, “All”, or “Average”.

The function creates box plots for each combination of model and uncertainty type. It can save the resulting plot to a specified location.

Parameters:
  • cmaps (List[str]) – Colors for the box plots.

  • target_uncert_dicts (Dict[str, List[List[float]]]) – Dictionary with lists of [error, uncertainty values] for all targets and corresponding data.

  • uncertainty_types_list (List[List[str]]) – List of lists containing uncertainty types.

  • models (List[str]) – List of models for which box plots are being made.

  • x_axis_labels (List[str]) – Labels for the x-axis.

  • x_label (str) – The label for the x-axis.

  • y_label (str) – The label for the y-axis.

  • num_bins (int) – The number of bins to be used for the box plot.

  • list_comp_bool (bool) – Flag to determine if list comprehension should be used.

  • width (float) – The width of the boxes in the box plot.

  • y_lim_min (float) – The minimum limit for the y-axis.

  • font_size_1 (int) – Font size for the first element.

  • font_size_2 (int) – Font size for the second element.

  • show_sample_info (str) – Information about the samples to be displayed. Default is “None”.

  • save_path (Optional[str]) – The path where the plot will be saved. If None, the plot won’t be saved. Default is None.

  • y_lim (int) – The maximum limit for the y-axis. Default is 120.

  • convert_to_percent (bool) – Flag to determine if data should be converted to percentages. Default is True.

  • to_log (bool) – Flag to determine if a logarithmic scale should be used. Default is False.

  • show_individual_dots (bool) – Flag to determine if individual data points should be shown. Default is True.

Returns:

None. The function displays and/or saves a plot.

kale.interpret.uncertainty_quantiles.format_plot(ax, save_path: str | None, show_sample_info: str, to_log: bool, circ_patches: List[Patch], y_lim: float, y_lim_min: float, convert_to_percent: bool, x_label: str, y_label: str, font_size_1: int, font_size_2: int, bin_label_locs: List[float], x_axis_labels: List[str], num_bins: int, uncertainty_types_list: List[List[str]], all_sample_percs: List[List[float]], all_sample_label_x_locs: List[List[Any]], max_bin_height: float, comparing_q: bool = False) None

This function takes a matplotlib Axes object and formats the plot according to the provided parameters.

Parameters:
  • ax – A matplotlib axes object to be formatted.

  • save_path – The path where the plot should be saved. If None, the plot will be shown using plt.show().

  • show_sample_info – Determines how sample information is displayed. Can be “None”, “Average”, or “All”.

  • to_log – If True, sets the y-axis to log scale.

  • circ_patches – List of matplotlib patches to be added to the legend.

  • y_lim – The upper limit for the y-axis.

  • y_lim_min – The lower limit for the y-axis.

  • convert_to_percent – If True, converts y-axis values to percentages.

  • x_label – The label for the x-axis.

  • y_label – The label for the y-axis.

  • font_size_1 – The font size for the axis labels.

  • font_size_2 – The font size for the tick labels.

  • bin_label_locs – The x-axis locations of the bin labels.

  • x_axis_labels – The labels for the x-axis.

  • num_bins – The number of bins.

  • uncertainty_types_list – The list of uncertainty types.

  • all_sample_percs – The percentage of samples for each bin.

  • all_sample_label_x_locs – The x-axis locations of the sample percentage labels.

  • max_bin_height – The maximum height of a bin in the plot.

  • comparing_q – If True, it uses a ticker.FixedFormatter for the x-axis.

Returns:

None

kale.interpret.uncertainty_quantiles.box_plot_per_model(cmaps: List[str], target_uncert_dicts: Dict[str, List[List[float]]], uncertainty_types_list: List[List[str]], models: List[str], x_axis_labels: List[str], x_label: str, y_label: str, num_bins: int, show_sample_info: str = 'None', save_path: str | None = None, y_lim: int = 120, convert_to_percent: bool = True, to_log: bool = False, show_individual_dots: bool = True) None

Generates a box plot to visualize and compare the performance of different models across uncertainty bins.

This function creates a box plot for each model, grouped by uncertainty types, and displays the distribution of data within each bin. Individual data points can be shown as dots and additional information such as the percentage of samples per bin can be displayed on top of the box plots.

Parameters:
  • cmaps (List[str]) – List of colors for matplotlib.

  • target_uncert_dicts (Dict[str, List[List[float]]]) – Dict of pandas dataframes for the data to display.

  • uncertainty_types_list (List[List[str]]) – List of lists describing the different uncertainty combinations to test.

  • models (List[str]) – The models we want to compare, keys in target_uncert_dicts.

  • x_axis_labels (List[str]) – List of strings for the x-axis labels, one for each bin.

  • x_label (str) – x-axis label.

  • y_label (str) – y-axis label.

  • num_bins (int) – Number of uncertainty bins.

  • show_sample_info (str) – Show sample information. Options: “None”, “All”, “Average”. Default is “None”.

  • save_path (Optional[str]) – Path to save plot to. If None, displays on screen (default=None).

  • y_lim (int) – y-axis limit of graph (default=120).

  • convert_to_percent (bool) – Flag to turn data into percentages. Default is True.

  • to_log (bool) – Flag to set y-axis scale to log. Default is False.

  • show_individual_dots (bool) – Flag to show individual data points as dots. Default is True.

kale.interpret.uncertainty_quantiles.box_plot_comparing_q(target_uncert_dicts_list: List[Dict[str, List[List[float]]]], uncertainty_type_tuple: List, model: List[str], x_axis_labels: List[str], x_label: str, y_label: str, num_bins_display: int, hatch_type: str, color: str, show_sample_info: str = 'None', save_path: str | None = None, y_lim: int = 120, convert_to_percent: bool = True, to_log: bool = False, show_individual_dots: bool = True) None

Creates a box plot of the data with Q (the number of bins) on the x-axis. Only one model and one uncertainty type are compared.

Parameters:
  • target_uncert_dicts_list (List[Dict[str, List[List[float]]]]) – List of dicts of pandas dataframes for the data to display, one for each value of Q.

  • uncertainty_type_tuple (Tuple[str, str]) – Tuple describing the single uncertainty/error type to display.

  • model (Tuple[str, str]) – The model we are comparing over our values of Q.

  • x_axis_labels (List[str]) – List of strings for the x-axis labels, one for each bin.

  • x_label (str) – X-axis label.

  • y_label (str) – Y-axis label.

  • num_bins_display (List[int]) – List of values of Q (#bins) we are comparing on our x-axis.

  • hatch_type (str) – Hatch type for the box plot.

  • color (str) – color for the box plot.

  • show_sample_info (str, optional) – Whether or not to show sample info on the plot. Options are “None”, “All”, or “Average”. Defaults to “None”.

  • save_path (str, optional) – Path to save plot to. If None, displays on screen. Defaults to None.

  • y_lim (int, optional) – Y-axis limit of graph. Defaults to 120.

  • convert_to_percent (bool, optional) – Whether to turn data to percentages. Defaults to True.

  • to_log (bool, optional) – Whether to set the y-axis to logarithmic scale. Defaults to False.

  • show_individual_dots (bool, optional) – Whether to show individual data points. Defaults to True.

kale.interpret.uncertainty_quantiles.plot_cumulative(cmaps: List[str], data_struct: Dict[str, DataFrame], models: List[str], uncertainty_types: List[Tuple[str, str]], bins: List[int] | ndarray, title: str, compare_to_all: bool = False, save_path: str | None = None, error_scaling_factor: float = 1) None

Plots cumulative errors.

Parameters:
  • cmaps – A list of colors for matplotlib.

  • data_struct – A dictionary containing the dataframes for each model.

  • models – A list of models we want to compare, keys in data_struct.

  • uncertainty_types – A list of tuples describing the different uncertainty combinations to test.

  • bins – A list of bins to show error from.

  • title – The title of the plot.

  • compare_to_all – Whether to compare the given subset of bins to all the data (default=False).

  • save_path – The path to save plot to. If None, displays on screen (default=None).

  • error_scaling_factor (float, optional) – Scaling factor for error. Defaults to 1.0.
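The quantity drawn in a cumulative error plot can be sketched as below: for every error value, the percentage of samples whose error is at or below it. This is an illustrative NumPy sketch under that assumption, with a hypothetical helper name; it does not call the library and leaves the actual plotting out.

```python
import numpy as np


def cumulative_error_curve(errors, error_scaling_factor=1.0):
    """Return sorted (scaled) errors and the percentage of samples at or below each."""
    sorted_errors = np.sort(np.asarray(errors, dtype=float) * error_scaling_factor)
    n = sorted_errors.size
    cumulative_percent = 100.0 * np.arange(1, n + 1) / n
    return sorted_errors, cumulative_percent


errors = np.array([3.0, 1.0, 4.0, 2.0])
x, y = cumulative_error_curve(errors)
print(x)  # errors in ascending order
print(y)  # cumulative percentage of samples
```

Plotting x against y for each model and bin subset, for instance with a step plot, gives curves that can be compared directly.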

kale.interpret.uncertainty_quantiles.generate_fig_individual_bin_comparison(data: Tuple, display_settings: dict) None

Generate figures to compare localization errors, error bounds accuracy, and Jaccard index across uncertainty bins.

Parameters:
  • data – A tuple containing various inputs needed to generate the figures, including:

      - uncertainty_error_pairs (List[Tuple[int, float]]): A list of tuples specifying the uncertainty thresholds and corresponding error thresholds to use for binning the data.
      - models_to_compare (List[str]): A list of model names to compare.
      - dataset (str): The name of the dataset being used.
      - target_indices (List[int]): A list of target indices to include in the analysis.
      - num_bins (int): The number of uncertainty bins to use.
      - cmaps (List[str]): A list of colormap names to use for the figures.
      - save_folder (str): The directory in which to save the generated figures.
      - save_file_preamble (str): A string to use as the prefix for the filenames of the generated figures.
      - combine_middle_bins (bool): Whether to combine the middle bins or not.
      - save_figures_bool (bool): Whether to save the generated figures or not. If False, displays them instead.
      - confidence_invert (bool): Whether to invert the confidence values to uncertainty or not.
      - samples_as_dots_bool (bool): Whether to show individual samples as dots in the box plots or not.
      - show_sample_info_mode (str): The mode for showing sample information in the box plots.
      - box_plot_error_lim (float): The y-axis limit for the error box plots.
      - show_individual_target_plots (bool): Whether to generate separate plots for each individual target.
      - interpret (bool): Whether to perform interpretation analysis, i.e. visualization.
      - num_folds (int): The number of folds to use in cross-validation.
      - ind_targets_to_show (List[int]): A list of target indices to include in individual target plots.
      - error_scaling_factor (float, optional): Scaling factor for error. Defaults to 1.0.

  • display_settings – A dictionary containing boolean flags indicating which figures to generate.

Returns:

None

kale.interpret.uncertainty_quantiles.generate_fig_comparing_bins(data: Tuple, display_settings: Dict[str, Any]) None

Generate figures comparing localization error, error bounds accuracy, and Jaccard index for different binning configurations.

Parameters:
  • data (Tuple) – A tuple containing various inputs needed to generate the figures. The tuple should include the following elements:

      - uncertainty_error_pair (Tuple[float, float]): A tuple representing the mean and standard deviation of the noise uncertainty used during training and evaluation.
      - model (str): The name of the model being evaluated.
      - dataset (str): The name of the dataset being used.
      - targets (List[int]): A list of target indices being evaluated.
      - all_values_q (List[int]): A list of integers representing the number of bins being used for each evaluation.
      - cmaps (List[str]): A list of colormap names to use for plotting.
      - all_fitted_save_paths (List[str]): A list of file paths where the binned data is stored.
      - save_folder (str): The directory where the figures should be saved.
      - save_file_preamble (str): The prefix to use for all figure file names.
      - combine_middle_bins (bool): Whether to combine the middle bins or not.
      - save_figures_bool (bool): Whether to save the generated figures or not. If False, shows them instead.
      - samples_as_dots_bool (bool): Whether to show individual samples as dots in the box plots or not.
      - show_sample_info_mode (str): The mode for showing sample information in the box plots.
      - box_plot_error_lim (float): The y-axis limit for the error box plots.
      - show_individual_target_plots (bool): Whether to generate individual plots for each target.
      - interpret (bool): Whether the results are being interpreted.
      - num_folds (int): The number of cross-validation folds to use.
      - ind_targets_to_show (List[int]): A list of target indices to show in individual plots.
      - error_scaling_factor (float, optional): Scaling factor for error. Defaults to 1.0.

  • display_settings – Dictionary containing the following keys:

      - ‘hatch’: String representing the type of hatch pattern to use in the plots.
      - ‘color’: String representing the color to use for the plots.

Returns:

None.

kale.interpret.visualize module

kale.interpret.visualize.plot_weights(weight_img, background_img=None, color_marker_pos='rs', color_marker_neg='gs', im_kwargs=None, marker_kwargs=None)

Visualize model weights.

Parameters:
  • weight_img (array-like) – Model weight/coefficients in 2D, could be a 2D slice of a 3D or higher order tensor.

  • background_img (array-like, optional) – 2D background image. Defaults to None.

  • color_marker_pos (str, optional) – Color and marker for weights with positive values. Defaults to “rs” (red squares).

  • color_marker_neg (str, optional) – Color and marker for weights with negative values. Defaults to “gs” (green squares).

  • im_kwargs (dict, optional) – Keyword arguments for background images. Defaults to None.

  • marker_kwargs (dict, optional) – Keyword arguments for markers. Defaults to None.

Returns:

Figure to plot.

Return type:

[matplotlib.figure.Figure]
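To make the role of the two marker styles concrete, the sketch below separates a 2D weight image into the pixel coordinates of its positive and negative entries, which are the positions a plot would mark with color_marker_pos and color_marker_neg. It is an assumption about the general idea rather than the library's code, and the helper name split_weight_positions is hypothetical.

```python
import numpy as np


def split_weight_positions(weight_img):
    """Return (x, y) pixel coordinates of positive and of negative weights."""
    w = np.asarray(weight_img, dtype=float)
    pos_rows, pos_cols = np.nonzero(w > 0)
    neg_rows, neg_cols = np.nonzero(w < 0)
    # For image plotting, columns map to x and rows map to y.
    return (pos_cols, pos_rows), (neg_cols, neg_rows)


weight_img = np.array([[0.0, 1.5, 0.0], [-2.0, 0.0, 0.0]])
(pos_x, pos_y), (neg_x, neg_y) = split_weight_positions(weight_img)
print(pos_x, pos_y)  # location of the single positive weight
print(neg_x, neg_y)  # location of the single negative weight
```

With matplotlib, such coordinates could be passed to a call like ax.plot(pos_x, pos_y, "rs") on top of the background image; zero-valued weights are left unmarked.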

kale.interpret.visualize.plot_multi_images(images, n_cols=1, n_rows=None, marker_locs=None, image_titles=None, marker_titles=None, marker_cmap=None, figsize=None, im_kwargs=None, marker_kwargs=None, legend_kwargs=None, title_kwargs=None)

Plot multiple images with markers in one figure.

Parameters:
  • images (array-like) – Images to plot, shape (n_samples, dim1, dim2).

  • n_cols (int, optional) – Number of columns for plotting multiple images. Defaults to 1.

  • n_rows (int, optional) – Number of rows for plotting multiple images. If None, n_rows = n_samples / n_cols.

  • marker_locs (array-like, optional) – Locations of markers, shape (n_samples, 2 * n_markers). Defaults to None.

  • marker_titles (list, optional) – Names of the markers, where len(marker_titles) == n_markers. Defaults to None.

  • marker_cmap (str, optional) – Name of the color map used for plotting markers. Defaults to None.

  • image_titles (list, optional) – List of titles for each image, where len(image_titles) == n_samples. Defaults to None.

  • figsize (tuple, optional) – Figure size. Defaults to None.

  • im_kwargs (dict, optional) – Keyword arguments for plotting images. Defaults to None.

  • marker_kwargs (dict, optional) – Keyword arguments for markers. Defaults to None.

  • legend_kwargs (dict, optional) – Keyword arguments for legend. Defaults to None.

  • title_kwargs (dict, optional) – Keyword arguments for title. Defaults to None.

Returns:

Figure to plot.

Return type:

[matplotlib.figure.Figure]
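The documented input shapes can be illustrated with a small NumPy sketch that derives the grid size and unpacks the flat marker array. Two points are assumptions not stated above: the row count is rounded up when n_samples is not divisible by n_cols, and each row of marker_locs is taken to store coordinate pairs consecutively (first marker's two coordinates, then the second marker's, and so on). The helper name grid_and_markers_sketch is hypothetical.

```python
import math

import numpy as np


def grid_and_markers_sketch(images, marker_locs, n_cols=1, n_rows=None):
    """Work out the subplot grid and reshape markers to (n_samples, n_markers, 2)."""
    images = np.asarray(images)
    n_samples = images.shape[0]
    if n_rows is None:
        # Assumption: round up so every image gets a cell in the grid.
        n_rows = math.ceil(n_samples / n_cols)
    # Assumption: coordinate pairs are stored consecutively in each row.
    coords = np.asarray(marker_locs, dtype=float).reshape(n_samples, -1, 2)
    return n_rows, coords


images = np.zeros((5, 4, 4))                 # five 4x4 images
marker_locs = np.arange(20).reshape(5, 4)    # two markers per image
n_rows, coords = grid_and_markers_sketch(images, marker_locs, n_cols=2)
print(n_rows)        # rows needed for five images in two columns
print(coords.shape)  # (n_samples, n_markers, 2)
```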

kale.interpret.visualize.distplot_1d(data, labels=None, xlabel=None, ylabel=None, title=None, figsize=None, colors=None, title_kwargs=None, hist_kwargs=None)

Plot distribution of 1D data.

Parameters:
  • data (array-like or list) – Data to plot.

  • labels (list, optional) – List of labels for each data. Defaults to None.

  • xlabel (str, optional) – Label for x-axis. Defaults to None.

  • ylabel (str, optional) – Label for y-axis. Defaults to None.

  • title (str, optional) – Title of the plot. Defaults to None.

  • figsize (tuple, optional) – Figure size. Defaults to None.

  • colors (str, optional) – Color of the line. Defaults to None.

  • title_kwargs (dict, optional) – Keyword arguments for title. Defaults to None.

  • hist_kwargs (dict, optional) – Keyword arguments for histogram. Defaults to None.

Returns:

Figure to plot.

Return type:

[matplotlib.figure.Figure]

Module contents