candle.feature_selection_utils.select_decorrelated_features

candle.feature_selection_utils.select_decorrelated_features(data, method='pearson', threshold=None, random_seed=None)

This function selects features whose mutual absolute correlation coefficients are smaller than a threshold. Missing values in data are allowed: the correlation coefficient of two features is calculated from the observations that are present in both features. Features with at most one value present and features with zero standard deviation are not considered for selection.

Parameters:
  • data – numpy array or pandas data frame of numeric values, with a shape of [n_samples, n_features].

  • method (string) –

    Method used to calculate the correlation coefficient. Default is ‘pearson’.

    • pearson: Pearson correlation coefficient

    • kendall: Kendall Tau correlation coefficient

    • spearman: Spearman rank correlation coefficient

  • threshold (float) – If two features have an absolute correlation coefficient higher than threshold, one of the two is removed. If threshold is None, a feature is removed only when it is exactly identical to another feature. Default is None.

  • random_seed (int) – Seed of the random number generator used to order the features. If it is None, features are not re-ordered before feature selection, so the first feature is always selected. Default is None.

Returns:

1-D numpy array containing the indices of selected features.