candle.P1_utils.coxen_single_drug_gene_selection
- candle.P1_utils.coxen_single_drug_gene_selection(source_data, target_data, drug_response_data, drug_response_col, tumor_col, prediction_power_measure='pearson', num_predictive_gene=100, generalization_power_measure='ccc', num_generalizable_gene=50, multi_drug_mode=False)
This function selects genes for drug response prediction using the COXEN approach, which is designed for selecting genes to predict the response of tumor cells to a specific drug. The function assumes that the input data contain no missing values.
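Because the function assumes complete data, it can help to check the inputs before calling it. The following is a minimal sketch, not part of candle: `check_coxen_inputs` is a hypothetical helper name, and it only verifies that the two expression data frames contain no missing values and share the same feature columns in the same order.

```python
import pandas as pd


def check_coxen_inputs(source_data, target_data):
    """Hypothetical helper: raise ValueError if the documented assumptions fail."""
    # Both data frames must have the same features, in the same order.
    if list(source_data.columns) != list(target_data.columns):
        raise ValueError(
            "source_data and target_data must have the same features in the same order"
        )
    # The selection function assumes there are no missing values.
    if source_data.isna().any().any() or target_data.isna().any().any():
        raise ValueError("missing values found; impute or drop them first")


# Toy example with two genes; the values are made up for illustration.
source = pd.DataFrame({"g1": [1.0, 2.0, 3.0], "g2": [0.5, 0.1, 0.9]})
target = pd.DataFrame({"g1": [1.5, 2.5], "g2": [0.4, 0.7]})
check_coxen_inputs(source, target)  # passes silently
```

If the column orders differ, reindexing the target with `target[source.columns]` before the call is one way to align them.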
- Parameters:
source_data – pandas data frame of gene expressions of tumors, for which drug response is known. Its size is [n_source_samples, n_features].
target_data – pandas data frame of gene expressions of tumors, for which drug response needs to be predicted. Its size is [n_target_samples, n_features]. source_data and target_data must have the same set of features, in the same order.
drug_response_data – pandas data frame of drug response values for a drug. It must include a column of drug response values and a column of tumor IDs.
drug_response_col – non-negative integer or string. If integer, it is the column index of drug response in drug_response_data. If string, it is the column name of drug response.
tumor_col – non-negative integer or string. If integer, it is the column index of tumor IDs in drug_response_data. If string, it is the column name of tumor IDs.
prediction_power_measure (string) – ‘pearson’ uses the absolute value of the Pearson correlation coefficient to measure the prediction power of a gene; ‘mutual_info’ uses the mutual information to measure the prediction power of a gene. Default is ‘pearson’.
num_predictive_gene (int) – the number of predictive genes to be selected. Default is 100.
generalization_power_measure (string) – ‘pearson’ indicates the Pearson correlation coefficient; ‘ccc’ indicates the concordance correlation coefficient. Default is ‘ccc’.
num_generalizable_gene (int) – the number of generalizable genes to be selected. Default is 50.
multi_drug_mode (bool) – indicates whether the function runs as an auxiliary function of COXEN gene selection for multiple drugs. Default is False.
- Returns:
If multi_drug_mode is False, a 1-D numpy array containing the indices of the selected genes. If multi_drug_mode is True, a 1-D numpy array of indices that sorts all genes according to their prediction power.
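To make the roles of the two measures and the two gene counts concrete, here is a minimal NumPy sketch of a two-step selection of this kind. It is an illustration under stated assumptions, not candle's implementation: the names `ccc` and `coxen_sketch` are hypothetical, prediction power is taken as the absolute Pearson correlation with the response, and generalization power is taken as the concordance between each candidate gene's co-expression pattern in the source and target data. It also assumes plain arrays with no constant columns.

```python
import numpy as np


def ccc(x, y):
    """Concordance correlation coefficient between two 1-D arrays."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    cov = np.mean((x - x.mean()) * (y - y.mean()))
    return 2.0 * cov / (x.var() + y.var() + (x.mean() - y.mean()) ** 2)


def coxen_sketch(source, target, response, num_predictive=100, num_generalizable=50):
    """Illustrative two-step gene selection; returns column indices into `source`."""
    source = np.asarray(source, dtype=float)
    target = np.asarray(target, dtype=float)
    response = np.asarray(response, dtype=float)

    # Step 1: rank genes by |Pearson correlation| with the response (source only).
    xc = source - source.mean(axis=0)
    yc = response - response.mean()
    denom = np.sqrt((xc ** 2).sum(axis=0) * (yc ** 2).sum())
    power = np.abs(xc.T @ yc / denom)
    predictive = np.argsort(-power)[:num_predictive]

    # Step 2: gene-gene correlation matrices of the candidates in each data set.
    cor_src = np.corrcoef(source[:, predictive], rowvar=False)
    cor_tgt = np.corrcoef(target[:, predictive], rowvar=False)

    # Score each candidate by how well its co-expression pattern with the
    # other candidates agrees between source and target.
    k = len(predictive)
    score = np.empty(k)
    for i in range(k):
        others = np.arange(k) != i  # exclude the gene's correlation with itself
        score[i] = ccc(cor_src[i, others], cor_tgt[i, others])

    keep = np.argsort(-score)[:num_generalizable]
    return predictive[keep]


# Synthetic data for illustration only: 20 genes, response driven by gene 3.
rng = np.random.default_rng(0)
source = rng.normal(size=(40, 20))
target = rng.normal(size=(30, 20))
response = source[:, 3] + 0.1 * rng.normal(size=40)
selected = coxen_sketch(source, target, response, num_predictive=5, num_generalizable=3)
print(selected)
```

In this sketch the returned indices refer to columns of the original expression matrix, matching the documented return value for the single-drug case.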