
Datasets


The Dataset class provides utilities for data management: adding training samples, encoding and decoding data points for training, and serializing and restoring the data so that training can be continued later.

A Dataset is an internal component of the algorithm in use and will usually not be interacted with directly. It provides a convenient interface to the HyperParameterList that wraps the hyper parameters, and handles its serialization and recovery across multiple training runs.

Class Information



Dataset

pyshac.config.data.Dataset(parameter_list=None, basedir='shac')

Dataset manager for the engines.

Holds the samples and their associated evaluated values in a format that can be serialized / restored as well as encoded / decoded for training.

Arguments:

  • parameter_list (hp.HyperParameterList | list | None): A Python list of hyper parameters, or a HyperParameterList that has already been built. Can also be None if the parameters are to be assigned later.
  • basedir (str): The base directory where the data of the engine will be stored.

Dataset methods

add_sample

add_sample(parameters, value)

Adds a single row of data to the dataset. Each row contains the hyper parameter configuration as well as its associated evaluation measure.

Arguments:

  • parameters (list): A list of hyper parameters that have been sampled
  • value (float): The evaluation measure for the above sample.
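A minimal sketch of the bookkeeping behind add_sample and clear, assuming the samples and their evaluation values are kept in two parallel Python lists. The class name `ToyDataset` and its attributes are hypothetical stand-ins for illustration, not pyshac's implementation.

```python
class ToyDataset:
    """Hypothetical stand-in for Dataset: parallel lists of samples and values."""

    def __init__(self):
        self.X = []  # one list of hyper parameter values per row
        self.Y = []  # one evaluation value per row

    def add_sample(self, parameters, value):
        # Store the configuration and its score at the same index
        self.X.append(list(parameters))
        self.Y.append(float(value))

    def clear(self):
        # Drop every row, as Dataset.clear() is documented to do
        self.X.clear()
        self.Y.clear()

    @property
    def size(self):
        return len(self.X)


ds = ToyDataset()
ds.add_sample([0.01, "relu", 64], 0.91)
ds.add_sample([0.10, "tanh", 32], 0.87)
print(ds.size)  # 2
```

Keeping X and Y as parallel lists means row i of X always corresponds to value i of Y, which is what later encoding and best-parameter selection rely on.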

clear

clear()

Removes all the data of the dataset.


decode_dataset

decode_dataset(X=None)

Decodes the input samples so that discrete hyper parameters are mapped back to their original values, while continuous valued hyper parameters are left unchanged.

Arguments:

  • X (np.ndarray | None): The input list of encoded samples. Can be None, in which case it defaults to the internal samples, which are encoded and then decoded.

Returns:

np.ndarray
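To illustrate the decoding step, here is a sketch under the assumption that each discrete hyper parameter is described by its list of allowed values and that an encoded value is an index into that list; continuous values (marked `None` in the hypothetical `SPACES` layout) pass through unchanged. `SPACES`, `decode_sample` and this standalone `decode_dataset` are illustrative only, not the library's code.

```python
# Hypothetical search space: a list of choices for discrete parameters,
# None for continuous ones.
SPACES = [None, ["relu", "tanh", "sigmoid"], [32, 64, 128]]


def decode_sample(encoded, spaces=SPACES):
    """Map integer indices back to the original discrete values."""
    decoded = []
    for value, choices in zip(encoded, spaces):
        if choices is None:
            decoded.append(value)  # continuous: left alone
        else:
            decoded.append(choices[int(value)])  # discrete: index -> value
    return decoded


def decode_dataset(X, spaces=SPACES):
    return [decode_sample(row, spaces) for row in X]


print(decode_dataset([[0.01, 0, 1], [0.1, 2.0, 0]]))
# [[0.01, 'relu', 64], [0.1, 'sigmoid', 32]]
```

The `int(...)` cast matters because encoded rows stored in a float numpy array would hold indices such as `2.0`.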


encode_dataset

encode_dataset(X=None, Y=None, objective='max')

Encodes the entire dataset so that discrete hyper parameters are mapped to integer indices, while continuous valued hyper parameters are left unchanged.

Arguments:

  • X (list | np.ndarray | None): The input list of samples. Can be None, in which case it defaults to the internal samples.
  • Y (list | np.ndarray | None): The input list of evaluation measures. Can be None, in which case it defaults to the internal evaluation values.
  • objective (str): Whether to maximize or minimize the value of the labels.

Raises:

  • ValueError: If objective is not in [max, min]

Returns:

A tuple of numpy arrays (np.ndarray, np.ndarray)
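The inverse mapping, together with the objective check, can be sketched with the same hypothetical `SPACES` layout (discrete parameters as lists of choices, `None` for continuous ones). This page does not describe how the evaluation values Y are transformed, so the sketch only validates `objective` and returns Y unchanged; it is a standalone illustration, not pyshac's implementation.

```python
SPACES = [None, ["relu", "tanh", "sigmoid"], [32, 64, 128]]


def encode_dataset(X, Y, objective="max", spaces=SPACES):
    """Sketch: discrete values -> integer indices, continuous values untouched."""
    if objective not in ("max", "min"):
        raise ValueError("objective must be 'max' or 'min'")
    encoded_X = []
    for row in X:
        encoded_row = []
        for value, choices in zip(row, spaces):
            # index() finds the position of a discrete value among its choices
            encoded_row.append(value if choices is None else choices.index(value))
        encoded_X.append(encoded_row)
    # The transformation applied to Y is not specified on this page,
    # so this sketch returns the evaluation values unchanged.
    return encoded_X, [float(y) for y in Y]


X_enc, Y_enc = encode_dataset([[0.01, "relu", 64]], [0.91])
print(X_enc)  # [[0.01, 0, 1]]
```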


get_best_parameters

get_best_parameters(objective='max')

Selects the best hyper parameters according to the maximization or minimization of the objective value.

Returns None if there are no samples in the dataset.

Arguments:

  • objective (str): Either 'max' or 'min', indicating whether to maximize or minimize the objective value.

Raises:

  • ValueError: If the objective is not max or min.

Returns:

A list of hyper parameter values, or None if the dataset is empty.
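The selection rule can be sketched as a standalone function, assuming the samples and evaluation values are held in parallel lists. The function below is illustrative only; on ties it returns the first best row, which may differ from the library's behavior.

```python
def get_best_parameters(X, Y, objective="max"):
    """Sketch: return the sample with the highest (or lowest) evaluation value."""
    if objective not in ("max", "min"):
        raise ValueError("objective must be 'max' or 'min'")
    if len(X) == 0:
        return None  # empty dataset
    pick = max if objective == "max" else min
    # Choose the row index whose evaluation value is best
    best_index = pick(range(len(Y)), key=lambda i: Y[i])
    return X[best_index]


X = [[0.01, "relu", 64], [0.10, "tanh", 32], [0.05, "relu", 128]]
Y = [0.91, 0.87, 0.93]
print(get_best_parameters(X, Y, "max"))  # [0.05, 'relu', 128]
print(get_best_parameters(X, Y, "min"))  # [0.1, 'tanh', 32]
```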


get_dataset

get_dataset()

Gets the entire dataset as a tuple of numpy arrays: the samples and their evaluation values.

Returns:

(np.ndarray, np.ndarray)


get_parameters

get_parameters()

Gets the hyper parameter list manager

Returns:

HyperParameterList


load_from_directory

load_from_directory(basedir='shac')

Static method to load the dataset from a directory.

Arguments:

  • basedir (str): The base directory in which the engine's data was saved (for example, the default 'shac'). The paths to the data and parameters are built from it automatically.

Raises:

  • FileNotFoundError: If the directory does not contain the data and parameters.
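The directory handling can be sketched as follows. The subdirectory and file names (`datasets/dataset.csv`, `datasets/parameters.pkl`) and the helper `find_dataset_files` are hypothetical; this page does not say what the actual files are called, and pyshac's own implementation may differ.

```python
import os


def find_dataset_files(basedir="shac"):
    """Sketch: build the data/parameter paths and fail early if either is missing."""
    # Hypothetical layout; the real file names are not documented on this page
    data_path = os.path.join(basedir, "datasets", "dataset.csv")
    params_path = os.path.join(basedir, "datasets", "parameters.pkl")
    for path in (data_path, params_path):
        if not os.path.exists(path):
            raise FileNotFoundError(f"Missing dataset file: {path}")
    return data_path, params_path


try:
    find_dataset_files("no-such-dir")
    raised = False
except FileNotFoundError:
    raised = True  # expected: nothing was saved under "no-such-dir"
```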

prepare_parameter

prepare_parameter(sample)

Wraps a list of sampled hyper parameter values in an OrderedDict, keyed by the name of each hyper parameter.

Arguments:

  • sample (list): A list of sampled hyper parameters

Returns:

OrderedDict(str, int | float | str)
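A sketch of the wrapping, assuming the parameter names are available in the order the hyper parameters were defined. `NAMES` and the length check are hypothetical additions for this illustration, not documented behavior.

```python
from collections import OrderedDict

# Hypothetical parameter names, in the order the hyper parameters were defined
NAMES = ["learning_rate", "activation", "units"]


def prepare_parameter(sample, names=NAMES):
    """Sketch: pair each sampled value with its hyper parameter name."""
    # Defensive check added for this sketch: zip() would silently truncate
    if len(sample) != len(names):
        raise ValueError("sample length does not match the number of parameters")
    return OrderedDict(zip(names, sample))


params = prepare_parameter([0.01, "relu", 64])
print(params["activation"])  # relu
```

Using an OrderedDict keeps the values in definition order, so the dictionary can be flattened back into a sample list without losing the column order.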


restore_dataset

restore_dataset()

Restores the entire dataset from a CSV file saved at the path given by data_path. Also loads the parameters (the list of hyper parameters).

Raises:

  • FileNotFoundError: If the dataset is not at the provided path.
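A sketch of the CSV side of restoring, assuming a header row of names, one column per hyper parameter and a final column holding the evaluation value. This column layout, the `score` header and the helper `restore_rows` are assumptions; restoring the parameter list itself is not shown.

```python
import csv
import os
import tempfile


def restore_rows(data_path):
    """Sketch: read (names, X, Y) back from a CSV written one row per sample."""
    if not os.path.exists(data_path):
        raise FileNotFoundError(f"Dataset not found at {data_path}")
    with open(data_path, newline="") as f:
        reader = csv.reader(f)
        header = next(reader)
        X, Y = [], []
        for row in reader:
            X.append(row[:-1])        # hyper parameter values (read back as strings)
            Y.append(float(row[-1]))  # evaluation value in the last column
    return header[:-1], X, Y


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "dataset.csv")
    with open(path, "w", newline="") as f:
        f.write("learning_rate,activation,score\n0.01,relu,0.91\n")
    names, X, Y = restore_rows(path)

print(names, X, Y)  # ['learning_rate', 'activation'] [['0.01', 'relu']] [0.91]
```

CSV stores everything as text, which is one reason the parameter list has to be restored alongside the data: it carries the information needed to interpret each column.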

save_dataset

save_dataset()

Serializes the entire dataset into a CSV file saved at the path given by data_path. Also saves the parameters (the list of hyper parameters).

Raises:

  • ValueError: If trying to save a dataset when its parameters have not been set.
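A matching sketch of the saving side, under the same assumed layout (names in the header, evaluation value in the last column). The helper `save_rows` and the `score` column name are hypothetical; saving the parameter list itself is not shown.

```python
import csv
import os
import tempfile


def save_rows(data_path, names, X, Y):
    """Sketch: write one CSV row per sample, evaluation value last."""
    if names is None:
        # Mirrors the documented ValueError when parameters are unset
        raise ValueError("Parameters must be set before saving the dataset")
    with open(data_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(list(names) + ["score"])  # "score" is a hypothetical column name
        for x, y in zip(X, Y):
            writer.writerow(list(x) + [y])


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "dataset.csv")
    save_rows(path, ["learning_rate", "activation"], [[0.01, "relu"]], [0.91])
    with open(path, newline="") as f:
        rows = list(csv.reader(f))

print(rows)  # [['learning_rate', 'activation', 'score'], ['0.01', 'relu', '0.91']]
```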

set_dataset

set_dataset(X, Y)

Sets the dataset from the given samples and evaluation values, which may be numpy arrays or Python lists/tuples.

Arguments:

  • X (list | tuple | np.ndarray): A numpy array or python list/tuple that contains the samples of the dataset.
  • Y (list | tuple | np.ndarray): A numpy array or python list/tuple that contains the evaluations of the dataset.
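A sketch of the input normalization, assuming the dataset is stored internally as Python lists (an assumption, not stated on this page). The function is a standalone illustration; the row-count check is added for this sketch only.

```python
def set_dataset(X, Y):
    """Sketch: normalise list/tuple/array inputs into parallel Python lists."""

    def to_list(data):
        # numpy arrays expose tolist(); lists and tuples are copied row by row
        if hasattr(data, "tolist"):
            return data.tolist()
        return [list(row) if isinstance(row, (list, tuple)) else row for row in data]

    X, Y = to_list(X), to_list(Y)
    if len(X) != len(Y):
        raise ValueError("X and Y must contain the same number of rows")
    return X, Y


X, Y = set_dataset(((0.01, "relu"), (0.1, "tanh")), (0.91, 0.87))
print(X, Y)  # [[0.01, 'relu'], [0.1, 'tanh']] [0.91, 0.87]
```

Checking for `tolist` rather than importing numpy keeps the sketch dependency-free while still accepting numpy arrays.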

set_parameters

set_parameters(parameters)

Sets the hyper parameter list manager

Arguments:

  • parameters (hp.HyperParameterList | list): A HyperParameterList or a Python list of hyper parameters.
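Because set_parameters accepts either a built list manager or a plain Python list, the input has to be normalized. A sketch of that pattern, using a hypothetical `ToyParameterList` in place of hp.HyperParameterList:

```python
class ToyParameterList:
    """Hypothetical stand-in for hp.HyperParameterList."""

    def __init__(self, parameters):
        self.parameters = list(parameters)


def as_parameter_list(parameters):
    """Sketch: accept either a built list manager or a plain Python list."""
    if isinstance(parameters, ToyParameterList):
        return parameters  # already wrapped
    return ToyParameterList(parameters)  # wrap a plain list


built = ToyParameterList(["learning_rate", "activation"])
print(as_parameter_list(built) is built)  # True
print(as_parameter_list(["units"]).parameters)  # ['units']
```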

flatten_parameters

flatten_parameters(params)

Flattens an OrderedDict or a list of lists into a single list containing the items.

Arguments:

  • params (OrderedDict | list of lists): The parameters provided by the engine, either as an OrderedDict or as a list of lists.

Returns:

A flat Python list containing only the sampled values.
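A sketch of the flattening, assuming a list of lists needs only one level of unpacking. `flatten_sketch` is a hypothetical standalone function, not the library's flatten_parameters.

```python
from collections import OrderedDict


def flatten_sketch(params):
    """Sketch: turn an OrderedDict or a list of lists into a flat list of values."""
    if isinstance(params, dict):  # OrderedDict is a dict subclass
        return list(params.values())
    flat = []
    for item in params:
        if isinstance(item, (list, tuple)):
            flat.extend(item)  # unpack one level of nesting
        else:
            flat.append(item)
    return flat


print(flatten_sketch(OrderedDict([("lr", 0.01), ("act", "relu")])))  # [0.01, 'relu']
print(flatten_sketch([[0.01], ["relu", 64]]))  # [0.01, 'relu', 64]
```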