candle.P1_utils.coxen_single_drug_gene_selection

candle.P1_utils.coxen_single_drug_gene_selection(source_data, target_data, drug_response_data, drug_response_col, tumor_col, prediction_power_measure='pearson', num_predictive_gene=100, generalization_power_measure='ccc', num_generalizable_gene=50, multi_drug_mode=False)

This function selects genes for drug response prediction using the COXEN approach, which is designed to select genes for predicting the response of tumor cells to a specific drug. The function assumes that the input data contain no missing values.

Parameters:
  • source_data – pandas data frame of gene expressions of tumors, for which drug response is known. Its size is [n_source_samples, n_features].

  • target_data – pandas data frame of gene expressions of tumors, for which drug response needs to be predicted. Its size is [n_target_samples, n_features]. source_data and target_data must have the same set of features, in the same order.

  • drug_response_data – pandas data frame of drug response values for a drug. It must include a column of drug response values and a column of tumor IDs.

  • drug_response_col – non-negative integer or string. If integer, it is the column index of drug response in drug_response_data. If string, it is the column name of drug response.

  • tumor_col – non-negative integer or string. If integer, it is the column index of tumor IDs in drug_response_data. If string, it is the column name of tumor IDs.

  • prediction_power_measure (string) – ‘pearson’ uses the absolute value of the Pearson correlation coefficient to measure the prediction power of a gene; ‘mutual_info’ uses the mutual information to measure the prediction power of a gene. Default is ‘pearson’.

  • num_predictive_gene (int) – the number of predictive genes to be selected.

  • generalization_power_measure (string) – ‘pearson’ indicates the Pearson correlation coefficient; ‘ccc’ indicates the concordance correlation coefficient. Default is ‘ccc’.

  • num_generalizable_gene (int) – the number of generalizable genes to be selected.

  • multi_drug_mode (bool) – indicates whether the function runs as an auxiliary function of COXEN gene selection for multiple drugs. Default is False.
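
The 'pearson' and 'ccc' measures named in the parameters above can be illustrated with a minimal, standard-library sketch. It uses the textbook definitions of the two coefficients; the helper names (`_moments`, `abs_pearson`, `ccc`) are hypothetical, and the use of population (divide-by-n) moments is an assumption, so this is not necessarily how the library computes them.

```python
from statistics import fmean


def _moments(x, y):
    """Return means, population variances, and population covariance of x and y."""
    mx, my = fmean(x), fmean(y)
    n = len(x)
    sxx = sum((a - mx) ** 2 for a in x) / n
    syy = sum((b - my) ** 2 for b in y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    return mx, my, sxx, syy, sxy


def abs_pearson(x, y):
    """Absolute value of the Pearson correlation coefficient."""
    _, _, sxx, syy, sxy = _moments(x, y)
    return abs(sxy) / (sxx * syy) ** 0.5


def ccc(x, y):
    """Concordance correlation coefficient (textbook definition)."""
    mx, my, sxx, syy, sxy = _moments(x, y)
    return 2 * sxy / (sxx + syy + (mx - my) ** 2)


# Illustrative values only: y is x shifted by a constant offset.
x = [1.0, 2.0, 3.0, 4.0]
shifted = [v + 1.0 for v in x]

print(abs_pearson(x, shifted))  # perfectly linear relationship
print(ccc(x, shifted))          # lower, because CCC penalizes the offset
```

For these values the absolute Pearson coefficient is 1 while the concordance coefficient is about 0.714, which shows why the two measures can rank the same genes differently.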

Returns:

1-D numpy array containing the indices of selected genes, if multi_drug_mode is False; 1-D numpy array of indices of sorting all genes according to their prediction power, if multi_drug_mode is True.
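
The two kinds of return value can be related with a rough NumPy sketch of a two-stage selection. It is not the library's implementation. It makes several assumptions: genes are ranked from highest to lowest prediction power, the generalizable genes are chosen from among the top predictive genes, and generalization power is scored by comparing each gene's co-expression profile between the source and target sets, as COXEN is commonly described. It uses Pearson correlation throughout rather than the default 'ccc'. The function names and the synthetic data are hypothetical.

```python
import numpy as np


def rank_by_prediction_power(expr, response):
    """Gene indices ordered from highest to lowest |Pearson r| with the response."""
    x = expr - expr.mean(axis=0)
    y = response - response.mean()
    r = (x * y[:, None]).sum(axis=0) / np.sqrt((x ** 2).sum(axis=0) * (y ** 2).sum())
    return np.argsort(-np.abs(r))


def generalization_scores(source, target, genes):
    """Correlate each gene's co-expression profile in source with that in target."""
    cs = np.corrcoef(source[:, genes], rowvar=False)
    ct = np.corrcoef(target[:, genes], rowvar=False)
    k = len(genes)
    scores = np.empty(k)
    for i in range(k):
        others = np.arange(k) != i  # leave out the gene's correlation with itself
        scores[i] = np.corrcoef(cs[i, others], ct[i, others])[0, 1]
    return scores


# Synthetic data (assumption): 12 genes, with gene 0 driving the response exactly.
rng = np.random.default_rng(0)
source = rng.normal(size=(30, 12))
target = rng.normal(size=(20, 12))
response = 2.0 * source[:, 0] + 1.0

# Analogue of the multi_drug_mode=True result: a ranking of all genes.
ranking = rank_by_prediction_power(source, response)

# Analogue of the multi_drug_mode=False result: a subset of selected gene indices.
predictive = ranking[:6]
scores = generalization_scores(source, target, predictive)
selected = predictive[np.argsort(-scores)[:3]]

# The selected indices can then be used to subset the expression columns.
source_selected = source[:, selected]
```

With pandas data frames, the same subsetting would typically be done with positional indexing such as `source_data.iloc[:, selected]`.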