candle.data_utils.load_Xy_data_noheader

candle.data_utils.load_Xy_data_noheader(train_file, test_file, classes, usecols=None, scaling=None, dtype=<class 'numpy.float32'>)

Load training and testing data from the specified files, using the first column as the label. Construct the corresponding training and testing pandas DataFrames, each separated into data (i.e. features) and labels. The output labels are one-hot encoded (categorical). The columns to load can be selected, and the data can be rescaled. The training and testing partitions (coming from the respective files) are preserved. This function assumes that the files do not contain a header.
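To make the expected file layout concrete, the following is a minimal sketch using only the Python standard library. It assumes comma-separated files and integer class labels, neither of which is stated above; the helper names, file name and toy values are hypothetical. The last helper shows how the documented function could be called on such files, and it is only defined here, not run, because it needs the CANDLE library installed.

```python
import csv
import os
import tempfile


def write_headerless_csv(path, rows):
    """Write rows to a comma-separated file with no header line."""
    with open(path, "w", newline="") as handle:
        csv.writer(handle).writerows(rows)


def split_label_and_features(path):
    """Read a headerless file: column 0 is the label, the rest are features."""
    labels, features = [], []
    with open(path, newline="") as handle:
        for row in csv.reader(handle):
            labels.append(int(row[0]))  # assumption: integer class labels
            features.append([float(value) for value in row[1:]])
    return labels, features


def load_with_candle(train_path, test_path, classes):
    """Call the documented function; requires CANDLE to be installed."""
    from candle.data_utils import load_Xy_data_noheader  # imported lazily

    return load_Xy_data_noheader(train_path, test_path, classes, scaling="maxabs")


# Hypothetical toy data: label first, then two feature values per row.
workdir = tempfile.mkdtemp()
train_path = os.path.join(workdir, "train.csv")
write_headerless_csv(train_path, [[0, 1.5, -2.0], [1, 0.5, 4.0], [0, -3.0, 1.0]])

labels, features = split_label_and_features(train_path)
```

The split performed by `split_label_and_features` only mirrors the description above; it is not the library's implementation, which returns pandas DataFrames.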

Parameters:
  • train_file (string) – Name of the file from which to load the training data.

  • test_file (string) – Name of the file from which to load the testing data.

  • classes (int) – Number of total classes to consider when building the categorical (one-hot) label encoding.

  • usecols – List of column indices to load from the files. (Default: None, all the columns are used).

  • scaling (string) –

    String describing the type of scaling to apply. Options recognized: ‘maxabs’, ‘minmax’, ‘std’. (Default: None, no scaling).

    • maxabs: scales data to range [-1 to 1].

    • minmax: scales data to range [-1 to 1].

    • std: standardizes data to mean 0 and standard deviation 1.

  • dtype – Data type to use for the output pandas DataFrames. (Default: DEFAULT_DATATYPE defined in default_utils, shown as numpy.float32 in the signature above).
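The three scaling options can be illustrated on a single feature column. This is a plain-Python sketch of the arithmetic only, not the library's code: the function names are hypothetical, the minmax target range follows the description above, and the population standard deviation is an assumption.

```python
import statistics


def scale_maxabs(column):
    """Divide by the largest absolute value, giving values in [-1, 1]."""
    largest = max(abs(value) for value in column)
    return [value / largest for value in column]


def scale_minmax(column, low=-1.0, high=1.0):
    """Map the column minimum to `low` and the column maximum to `high`."""
    smallest, largest = min(column), max(column)
    span = largest - smallest  # assumes the column is not constant
    return [low + (value - smallest) * (high - low) / span for value in column]


def scale_std(column):
    """Subtract the mean and divide by the population standard deviation."""
    mean = statistics.fmean(column)
    spread = statistics.pstdev(column)  # assumes the column is not constant
    return [(value - mean) / spread for value in column]


column = [2.0, 4.0, 6.0, 8.0]
maxabs_scaled = scale_maxabs(column)  # [0.25, 0.5, 0.75, 1.0]
minmax_scaled = scale_minmax(column)  # endpoints map to -1.0 and 1.0
std_scaled = scale_std(column)        # mean 0, standard deviation 1
```

Note that maxabs preserves the sign and the zero point of the data, whereas minmax always stretches the column to fill the whole target range.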

Returns:

Tuple of pandas DataFrames (X_train, Y_train, X_test, Y_test), where:

  • X_train - Data features for training loaded in a pandas DataFrame and pre-processed as specified.

  • Y_train - Data labels for training loaded in a pandas DataFrame. One-hot encoding (categorical) is used.

  • X_test - Data features for testing loaded in a pandas DataFrame and pre-processed as specified.

  • Y_test - Data labels for testing loaded in a pandas DataFrame. One-hot encoding (categorical) is used.
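As an illustration of what the one-hot encoded rows of Y_train and Y_test look like, here is a minimal sketch. It assumes the labels are integers from 0 to classes - 1, which the text above does not state; the function name is hypothetical, and the library returns DataFrames rather than nested lists.

```python
def one_hot(labels, classes):
    """Encode integer labels as rows with a single 1.0 among `classes` entries."""
    rows = []
    for label in labels:
        if not 0 <= label < classes:
            raise ValueError(f"label {label} is outside the range 0..{classes - 1}")
        row = [0.0] * classes
        row[label] = 1.0  # mark the position of the class for this sample
        rows.append(row)
    return rows


encoded = one_hot([0, 2, 1], classes=3)
# Passing the class count explicitly fixes the row width, even when a
# class never appears in the labels being encoded.
wide = one_hot([0, 1], classes=4)
```

Fixing the width through `classes` is what keeps the training and testing label frames the same shape when one of the files happens to lack a class.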