Evaluate

Submodules

kale.evaluate.cross_validation module

Functions implementing cross-validation methods for assessing model fit.

kale.evaluate.cross_validation.leave_one_group_out(x, y, groups, estimator, use_domain_adaptation=False) → dict

Perform leave-one-group-out cross-validation for a given estimator.

Parameters:
  • x (np.ndarray or torch.tensor) – Input data [n_samples, n_features].

  • y (np.ndarray or torch.tensor) – Target labels [n_samples].

  • groups (np.ndarray or torch.tensor) – Group labels to be left out [n_samples].

  • estimator (estimator object) – Machine learning estimator to be evaluated from kale or scikit-learn.

  • use_domain_adaptation (bool) – Whether to use domain adaptation, i.e., leveraging test data, during training.

Returns:

A dictionary containing results for each target group, with three keys:
  • 'Target': A list of unique target groups or classes. The final entry is "Average".

  • 'Num_samples': A list where each entry indicates the number of samples in its corresponding target group. The final entry represents the total number of samples.

  • 'Accuracy': A list where each entry indicates the accuracy score for its corresponding target group. The final entry represents the overall mean accuracy.

Return type:

dict
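
Example

A minimal usage sketch with a scikit-learn classifier and synthetic data (the data, labels, and group assignments below are illustrative only):

>>> import numpy as np
>>> from sklearn.linear_model import LogisticRegression
>>> from kale.evaluate.cross_validation import leave_one_group_out
>>> rng = np.random.default_rng(0)
>>> x = rng.normal(size=(60, 4))        # 60 samples, 4 features
>>> y = rng.integers(0, 2, size=60)     # binary labels
>>> groups = np.repeat([0, 1, 2], 20)   # three groups of 20 samples each
>>> results = leave_one_group_out(x, y, groups, LogisticRegression())
>>> results["Target"]                   # unique groups, then "Average"
>>> results["Accuracy"]                 # per-group accuracy, then the overall mean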

kale.evaluate.metrics module

kale.evaluate.metrics.concord_index(y, y_pred)

Calculate the Concordance Index (CI), a metric measuring the proportion of concordant pairs between real and predicted values.

Parameters:
  • y (array) – Real values.

  • y_pred (array) – Predicted values.
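
For intuition, the CI is the proportion of concordant pairs among all pairs with distinct real values: a pair (i, j) with y[i] > y[j] is concordant when y_pred[i] > y_pred[j], and a prediction tie counts as half. A minimal NumPy sketch of this standard definition (a reference illustration, not necessarily how concord_index is implemented):

import numpy as np

def concordance_index_reference(y, y_pred):
    """Proportion of concordant pairs among all pairs with distinct real values."""
    y, y_pred = np.asarray(y), np.asarray(y_pred)
    concordant, num_pairs = 0.0, 0
    for i in range(len(y)):
        for j in range(len(y)):
            if y[i] > y[j]:  # only pairs with distinct real values are comparable
                num_pairs += 1
                if y_pred[i] > y_pred[j]:
                    concordant += 1.0   # predictions ordered the same way
                elif y_pred[i] == y_pred[j]:
                    concordant += 0.5   # prediction ties count as half
    return concordant / num_pairs

concordance_index_reference([1.0, 2.0, 3.0], [1.2, 1.9, 3.1])  # 1.0: all pairs concordant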

kale.evaluate.metrics.auprc_auroc_ap(target: Tensor, score: Tensor)

auprc: area under the precision-recall curve
auroc: area under the receiver operating characteristic curve
ap: average precision

Copied from https://github.com/NYXFLOWER/GripNet
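
For reference, the three quantities can be reproduced with scikit-learn on binary targets and scores; the sketch below shows equivalent computations (an illustration, not the function's own code):

import numpy as np
from sklearn.metrics import auc, average_precision_score, precision_recall_curve, roc_auc_score

target = np.array([0, 0, 1, 1])              # illustrative binary labels
score = np.array([0.1, 0.4, 0.35, 0.8])      # illustrative prediction scores

precision, recall, _ = precision_recall_curve(target, score)
auprc = auc(recall, precision)               # area under the precision-recall curve
auroc = roc_auc_score(target, score)         # area under the ROC curve
ap = average_precision_score(target, score)  # average precision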

kale.evaluate.similarity_metrics module

Authors: Lawrence Schobs, lawrenceschobs@gmail.com

Functions related to similarity metrics including similarity measures and correlations.

kale.evaluate.similarity_metrics.jaccard_similarity(list1: list, list2: list) → float

Calculates the Jaccard Index (JI) between two lists.

Parameters:
  • list1 (list) – List of elements in set A.

  • list2 (list) – List of elements in set B.

Returns:

The Jaccard Index between list1 and list2.

Return type:

float

Example

>>> jaccard_similarity([1,2,3], [2,3,4])
0.5
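
Equivalently, the Jaccard Index is the size of the intersection divided by the size of the union of the two sets. A one-function sketch of that definition (a reference illustration, not the library code):

def jaccard_reference(list1, list2):
    # |A ∩ B| / |A ∪ B| over the unique elements of each list
    set1, set2 = set(list1), set(list2)
    return len(set1 & set2) / len(set1 | set2)

jaccard_reference([1, 2, 3], [2, 3, 4])  # 2 shared / 4 distinct = 0.5
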
kale.evaluate.similarity_metrics.evaluate_correlations(bin_predictions: Dict[str, DataFrame], uncertainty_error_pairs: List[Tuple[str, str, str]], cmaps: List[Dict[Any, Any]], num_bins: int, confidence_invert_tuples: List[Tuple[str, bool]], num_folds: int = 8, error_scaling_factor: float = 1, combine_middle_bins: bool = False, save_path: str | None = None, to_log: bool = False) → Dict[str, Dict[str, Dict[str, Any]]]

Calculates the correlation between error and uncertainty for each bin and for each target, using a piecewise linear regression model.

Designed for use in Quantile Binning (/pykale/examples/landmark_uncertainty/main.py).

Parameters:
  • bin_predictions – A dictionary of Pandas DataFrames containing model predictions for each testing fold.

  • uncertainty_error_pairs – A list of tuples specifying the names of the uncertainty, error, and uncertainty inversion keys for each pair.

  • cmaps – Colour maps to use for plotting the results.

  • num_bins – The number of quantile bins to divide the data into.

  • confidence_invert_tuples – A list of tuples specifying whether to invert the uncertainty values for each method; the first element is the uncertainty method name (str) and the second is a boolean indicating whether to invert, e.g., [("E-MHA", True), ("E-CPV", False)].

  • num_folds – The number of folds to use for cross-validation (default: 8).

  • error_scaling_factor – The factor by which to scale the errors (default: 1).

  • combine_middle_bins – Whether to combine the middle bins into one bin (default: False).

  • save_path – The path to save the correlation plots (default: None).

  • to_log – Whether to use logarithmic scaling for the x and y axes of the plots (default: False).

Returns:

A dictionary containing the correlation statistics for each model and uncertainty method, with the following structure:

{
    <model_name>: {
        <uncertainty_name>: {
            "all_folds": {
                "r": <correlation coefficient>,
                "p": <p-value>,
                "fit_params": <regression line parameters>,
                "ci": <confidence intervals for the regression line parameters>
            },
            "quantiles": {
                <quantile_index>: {
                    "r": <correlation coefficient>,
                    "p": <p-value>,
                    "fit_params": <regression line parameters>,
                    "ci": <confidence intervals for the regression line parameters>
                }
            }
        }
    }
}

The "all_folds" key contains the correlation statistics for all testing folds combined. The "quantiles" key contains the correlation statistics for each quantile bin separately.
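
A sketch of how the returned dictionary can be traversed; "<model_name>" and "<uncertainty_name>" are placeholders for whatever keys appear in your bin_predictions and uncertainty_error_pairs, and the input variables are assumed to be already prepared:

corr = evaluate_correlations(bin_predictions, uncertainty_error_pairs, cmaps,
                             num_bins=5, confidence_invert_tuples=[("E-MHA", True)])

stats = corr["<model_name>"]["<uncertainty_name>"]       # substitute real keys here
print(stats["all_folds"]["r"], stats["all_folds"]["p"])  # pooled correlation and p-value
for q_idx, q_stats in stats["quantiles"].items():
    print(q_idx, q_stats["r"])                           # per-quantile-bin correlation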

kale.evaluate.uncertainty_metrics module

Authors: Lawrence Schobs, lawrenceschobs@gmail.com

Module from the implementation of L. A. Schobs, A. J. Swift and H. Lu, "Uncertainty Estimation for Heatmap-Based Landmark Localization," in IEEE Transactions on Medical Imaging, vol. 42, no. 4, pp. 1021-1034, April 2023, doi: 10.1109/TMI.2022.3222730.

Functions related to evaluating the quantile binning method in terms of:
  1. Binning accuracy to ground truth bins: evaluate_jaccard, bin_wise_jaccard.

  2. Binning error bound accuracy: evaluate_bounds, bin_wise_bound_eval.

  3. Binning attributes such as mean errors of bins: get_mean_errors, bin_wise_errors.

kale.evaluate.uncertainty_metrics.evaluate_bounds(estimated_bounds: Dict[str, DataFrame], bin_predictions: Dict[str, DataFrame], uncertainty_pairs: List, num_bins: int, targets: List[int], num_folds: int = 8, show_fig: bool = False, combine_middle_bins: bool = False) → Dict

Evaluates error bounds for given uncertainty pairs and estimated bounds.

Parameters:
  • estimated_bounds (Dict[str, pd.DataFrame]) – Dictionary of error bounds for each model.

  • bin_predictions (Dict[str, pd.DataFrame]) – Dictionary of bin predictions for each model.

  • uncertainty_pairs (List[List[str]]) – List of uncertainty pairs to be evaluated.

  • num_bins (int) – Number of bins to be used.

  • targets (List[str]) – List of targets to be evaluated.

  • num_folds (int, optional) – Number of folds for cross-validation. Defaults to 8.

  • show_fig (bool, optional) – Flag to show the figure. Defaults to False.

  • combine_middle_bins (bool, optional) – Flag to combine the middle bins. Defaults to False.

Returns:

Dictionary containing evaluation results.

Return type:

Dict
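
Conceptually, a bin's bound accuracy is the fraction of its true errors that fall within the estimated error bound for that bin. A minimal NumPy sketch of this per-bin check (a conceptual illustration, not the library implementation):

import numpy as np

def bound_accuracy(bin_errors, estimated_bound):
    """Fraction of a bin's true errors at or below its estimated error bound."""
    return float(np.mean(np.asarray(bin_errors) <= estimated_bound))

bound_accuracy([1.2, 0.8, 3.5, 0.4], estimated_bound=2.0)  # 0.75: 3 of 4 errors within bound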

kale.evaluate.uncertainty_metrics.bin_wise_bound_eval(fold_bounds_all_targets: list, fold_errors: DataFrame, fold_bins: DataFrame, targets: list, uncertainty_type: str, num_bins: int = 5, show_fig: bool = False) → dict

Helper function for evaluate_bounds. Evaluates the accuracy of estimated error bounds for each quantile bin for a given uncertainty type, over a single fold and for multiple targets.

Parameters:
  • fold_bounds_all_targets (list) – A list of lists of estimated error bounds for each target.

  • fold_errors (pd.DataFrame) – A Pandas DataFrame containing the true errors for this fold.

  • fold_bins (pd.DataFrame) – A Pandas DataFrame containing the predicted quantile bins for this fold.

  • targets (list) – A list of targets to measure uncertainty estimation.

  • uncertainty_type (str) – The name of the uncertainty type to calculate accuracy for.

  • num_bins (int) – The number of quantile bins.

  • show_fig (bool) – Whether to show a figure depicting error bound accuracy (default=False).

Returns:

A dictionary containing the following error bound accuracy statistics:
  • 'mean all targets': The mean accuracy over all targets and quantile bins.

  • 'mean all bins': A list of mean accuracy values for each quantile bin (all targets included).

  • 'mean all': A list of accuracy values for each quantile bin and target, weighted by the number of targets in each bin.

  • 'all bins concatenated targets separated': A list of accuracy values for each quantile bin, concatenated for each target separately.

Return type:

dict

Example

>>> bin_wise_bound_eval(fold_bounds_all_targets, fold_errors, fold_bins, [0,1], 'S-MHA', num_bins=5)

kale.evaluate.uncertainty_metrics.get_mean_errors(bin_predictions: Dict[str, DataFrame], uncertainty_pairs: List, num_bins: int, targets: List[int], num_folds: int = 8, error_scaling_factor: float = 1.0, combine_middle_bins: bool = False) → Dict

Evaluates the mean error of each uncertainty estimation bin. For each dictionary in the bin_predictions dict, we calculate the mean localization error of each bin, both for each target and for all targets: (a) the mean and standard deviation over all folds and all targets, and (b) the mean and standard deviation for each target over all folds.

Parameters:
  • bin_predictions (Dict) – Dict of Pandas DataFrames where each DataFrame has errors and predicted bins for all uncertainty measures for a model.

  • uncertainty_pairs (List[Tuple[str, str]]) – List of tuples describing the different uncertainty combinations to test.

  • num_bins (int) – Number of quantile bins.

  • targets (List[str]) – List of targets to measure uncertainty estimation.

  • num_folds (int, optional) – Number of folds. Defaults to 8.

  • error_scaling_factor (float, optional) – Factor by which to scale the errors. Defaults to 1.0.

  • combine_middle_bins (bool, optional) – Combine middle bins if True. Defaults to False.

Returns:

Dictionary with mean error for all targets combined and targets separated. The returned keys are:
  • "all mean error bins nosep": For every fold, the mean error for each bin. All targets are combined in the same list.

  • "all mean error bins targets sep": For every fold, the mean error for each bin. Each target is in a separate list.

  • "all error concat bins targets nosep": For every fold, every error value in a list. Each target is in the same list. The list is flattened for all the folds.

  • "all error concat bins targets sep foldwise": For every fold, every error value in a list. Each target is in a separate list. Each list has a list of results by fold.

  • "all error concat bins targets sep all": For every fold, every error value in a list. Each target is in a separate list. The list is flattened for all the folds.

Return type:

Dict[str, Union[Dict[str, List[List[float]]], List[Dict[str, List[float]]]]]
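
A sketch of how the returned dictionary can be indexed, assuming bin_predictions and uncertainty_pairs have already been prepared (the key names are those documented above):

mean_errors = get_mean_errors(bin_predictions, uncertainty_pairs,
                              num_bins=5, targets=[0, 1])

pooled = mean_errors["all mean error bins nosep"]            # per-bin mean errors, targets pooled
per_target = mean_errors["all mean error bins targets sep"]  # per-bin mean errors, per target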

kale.evaluate.uncertainty_metrics.evaluate_jaccard(bin_predictions, uncertainty_pairs, num_bins, targets, num_folds=8, combine_middle_bins=False)

Evaluates uncertainty estimation's ability to predict true error quantiles. For each dictionary in the bin_predictions dict, we calculate the Jaccard Index (JI) between the predicted bins and the ground-truth error quantiles for each bin: (a) the mean and standard deviation over all folds and all targets, and (b) the mean and standard deviation for each target over all folds.

Parameters:
  • bin_predictions (Dict) – Dict of Pandas DataFrames where each DataFrame has errors and predicted bins for all uncertainty measures for a model.

  • uncertainty_pairs (list) – List of lists describing the different uncertainty combinations to test.

  • num_bins (int) – Number of quantile bins.

  • targets (list) – List of targets to measure uncertainty estimation.

  • num_folds (int) – Number of folds. Defaults to 8.

Returns:

Dicts with JI for all targets combined and targets separated.

Return type:

Dict

kale.evaluate.uncertainty_metrics.bin_wise_errors(fold_errors, fold_bins, num_bins, targets, uncertainty_key, error_scaling_factor)

Helper function for get_mean_errors. Calculates the mean error for each bin and for each target.

Parameters:
  • fold_errors (pd.DataFrame) – DataFrame of errors for this fold.

  • fold_bins (pd.DataFrame) – DataFrame of predicted quantile bins for this fold.

  • num_bins (int) – Number of quantile bins.

  • targets (list) – List of targets to measure uncertainty estimation.

  • uncertainty_key (str) – Name of uncertainty type to calculate accuracy for.

  • error_scaling_factor (float) – Factor by which to scale the errors.

Returns:

Dict with mean error statistics.

Return type:

Dict

kale.evaluate.uncertainty_metrics.bin_wise_jaccard(fold_errors: DataFrame, fold_bins: DataFrame, num_bins: int, num_bins_quantiles: int, targets: list, uncertainty_key: str, combine_middle_bins: bool) → dict

Helper function for evaluate_jaccard. Calculates the Jaccard Index statistics for each quantile bin and target.

If combine_middle_bins is True, then the middle bins are combined into one bin. e.g. if num_bins_quantiles = 10, it will return 3 bins: 1, 2-9, 10. You may find the first bin and the last bin are the most accurate, so combining the middle bins may be useful.

Parameters:
  • fold_errors (pd.DataFrame) – DataFrame of errors for this fold.

  • fold_bins (pd.DataFrame) – DataFrame of predicted quantile bins for this fold.

  • num_bins (int) – Number of quantile bins.

  • num_bins_quantiles (int) – Number of quantile bins before any combining of the middle bins (see above).

  • targets (list) – List of targets to measure uncertainty estimation.

  • uncertainty_key (str) – Name of uncertainty type to calculate accuracy for.

  • combine_middle_bins (bool) – Whether to combine the middle bins into one bin.

Returns:

Dict with JI statistics.

Return type:

Dict

Raises:

None.

Example

>>> bin_wise_jaccard(fold_errors, fold_bins, 10, 5, [0,1], 'S-MHA', True)

Module contents