Datasets
The Dataset class provides utilities for data management, such as adding training samples, encoding and decoding data points for training, and serializing and restoring the data so that training can be continued later.
A Dataset is an internal component of the algorithm being used and will usually not be interacted with directly.
It provides a convenient interface to the HyperParameterList that wraps the hyper parameters, and handles its serialization and recovery across multiple training runs.
Class Information
Dataset
pyshac.config.data.Dataset(parameter_list=None, basedir='shac')
Dataset manager for the engines.
Holds the samples and their associated evaluation values in a format that can be serialized / restored as well as encoded / decoded for training.
Arguments:
- parameter_list (hp.HyperParameterList | list | None): A Python list of Hyper Parameters, or a HyperParameterList that has been built. Can also be None if the parameters are to be assigned later.
- basedir (str): The base directory where the data of the engine will be stored.
Dataset methods
add_sample
add_sample(parameters, value)
Adds a single row of data to the dataset. Each row contains the hyper parameter configuration as well as its associated evaluation measure.
Arguments:
- parameters (list): A list of hyper parameter values that have been sampled.
- value (float): The evaluation measure for the above sample.
clear
clear()
Removes all data from the dataset.
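To make the row-oriented storage concrete, here is a minimal sketch of how add_sample, clear and get_dataset could fit together. MiniDataset is a hypothetical, simplified stand-in written for illustration and is not pyshac's implementation; it only assumes, as described above, that each row pairs a list of sampled values with one float evaluation.

```python
import numpy as np


class MiniDataset:
    """Hypothetical, simplified stand-in for Dataset (not pyshac's code)."""

    def __init__(self):
        self.X = []  # one list of sampled hyper parameter values per row
        self.Y = []  # one evaluation measure per row

    def add_sample(self, parameters, value):
        # Store a copy of the sampled configuration and its evaluation.
        self.X.append(list(parameters))
        self.Y.append(float(value))

    def clear(self):
        # Drop every stored row.
        self.X = []
        self.Y = []

    def get_dataset(self):
        # Return samples and evaluations as a pair of numpy arrays.
        # dtype=object keeps mixed values (floats, strings, ints) intact.
        return np.array(self.X, dtype=object), np.array(self.Y)


ds = MiniDataset()
ds.add_sample([0.01, 'relu', 32], 0.91)
ds.add_sample([0.1, 'tanh', 64], 0.87)
X, Y = ds.get_dataset()
print(X.shape, Y.tolist())  # (2, 3) [0.91, 0.87]
```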
decode_dataset
decode_dataset(X=None)
Decodes the input samples such that discrete hyper parameters are mapped back to their original values and continuous-valued hyper parameters are left unchanged.
Arguments:
- X (np.ndarray | None): The input list of encoded samples. Can be None, in which case it defaults to the internal samples, which are encoded and then decoded.
Returns:
np.ndarray
encode_dataset
encode_dataset(X=None, Y=None, objective='max')
Encodes the entire dataset such that discrete hyper parameters are mapped to integer indices and continuous-valued hyper parameters are left unchanged.
Arguments:
- X (list | np.ndarray | None): The input list of samples. Can be None, in which case it defaults to the internal samples.
- Y (list | np.ndarray | None): The input list of evaluation measures. Can be None, in which case it defaults to the internal evaluation values.
- objective (str): Whether to maximize or minimize the value of the labels.
Raises:
- ValueError: If objective is not in ['max', 'min'].
Returns:
A tuple of numpy arrays (np.ndarray, np.ndarray)
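The encoding and decoding rules described for encode_dataset and decode_dataset can be illustrated with a small round trip. This is a sketch under stated assumptions: PARAM_SPECS, encode_rows and decode_rows are hypothetical names, and a discrete parameter is modelled simply as a list of allowed values while a continuous one is marked with None. It covers only the sample (X) side; how pyshac transforms the evaluation values (Y) for the chosen objective is not shown here.

```python
import numpy as np

# Hypothetical parameter specs: a discrete parameter lists its allowed
# values, while a continuous parameter is marked with None.
PARAM_SPECS = [
    ('activation', ['relu', 'tanh', 'sigmoid']),  # discrete
    ('learning_rate', None),  # continuous
]


def encode_rows(X):
    """Map discrete values to integer indices; keep continuous values as-is."""
    encoded = []
    for row in X:
        encoded.append([
            values.index(item) if values is not None else item
            for (_, values), item in zip(PARAM_SPECS, row)
        ])
    return np.array(encoded, dtype=np.float64)


def decode_rows(X):
    """Map integer indices back to discrete values; keep continuous values as-is."""
    decoded = []
    for row in X:
        decoded.append([
            values[int(item)] if values is not None else float(item)
            for (_, values), item in zip(PARAM_SPECS, row)
        ])
    return np.array(decoded, dtype=object)


samples = [['tanh', 0.01], ['sigmoid', 0.1]]
encoded = encode_rows(samples)
print(encoded.tolist())  # [[1.0, 0.01], [2.0, 0.1]]
print(decode_rows(encoded).tolist())  # [['tanh', 0.01], ['sigmoid', 0.1]]
```

Because decoding needs the same list of allowed values that encoding used, the parameter definitions have to be available whenever encoded data is turned back into configurations.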
get_best_parameters
get_best_parameters(objective='max')
Selects the best hyper parameters according to the maximization or minimization of the objective value.
Returns None if there are no samples in the dataset.
Arguments:
- objective (str): Either 'max' or 'min', indicating whether to maximize or minimize the objective value.
Raises:
- ValueError: If the objective is not 'max' or 'min'.
Returns:
A list of hyperparameter settings, or None if the dataset is empty.
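The selection rule above amounts to an argmax or argmin over the evaluation values. The following is a minimal sketch of that documented contract; best_parameters is a hypothetical helper operating on plain X and Y lists, not pyshac's implementation.

```python
import numpy as np


def best_parameters(X, Y, objective='max'):
    """Return the row of X whose value in Y is best, or None if X is empty.

    Hypothetical helper mirroring the documented behaviour of
    get_best_parameters; not pyshac's implementation.
    """
    if objective not in ('max', 'min'):
        raise ValueError("objective must be 'max' or 'min'")
    if len(X) == 0:
        return None
    y = np.asarray(Y, dtype=np.float64)
    # argmax picks the highest score, argmin the lowest.
    index = int(np.argmax(y)) if objective == 'max' else int(np.argmin(y))
    return list(X[index])


X = [[0.01, 'relu'], [0.1, 'tanh'], [0.001, 'relu']]
Y = [0.91, 0.87, 0.93]
print(best_parameters(X, Y, 'max'))  # [0.001, 'relu']
print(best_parameters(X, Y, 'min'))  # [0.1, 'tanh']
print(best_parameters([], [], 'max'))  # None
```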
get_dataset
get_dataset()
Gets the entire dataset as a pair of numpy arrays (the samples and their evaluation values).
Returns:
(np.ndarray, np.ndarray)
get_parameters
get_parameters()
Gets the hyper parameter list manager.
Returns:
HyperParameterList
load_from_directory
load_from_directory(basedir='shac')
Static method to load the dataset from a directory.
Arguments:
- basedir (str): The base directory in which the 'shac' directory is located. The paths to the data and parameters are built from it automatically.
Raises:
- FileNotFoundError: If the directory does not contain the data and parameters.
prepare_parameter
prepare_parameter(sample)
Wraps a list of sampled hyper parameter values in an OrderedDict, keyed by the name of each parameter.
Arguments:
- sample (list): A list of sampled hyper parameters.
Returns:
OrderedDict(str, int | float | str)
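Pairing sampled values with their names is essentially a zip into an OrderedDict. In this sketch PARAM_NAMES and wrap_sample are hypothetical: the names stand in for the parameter list's names, and the length check is an assumption added for illustration.

```python
from collections import OrderedDict

# Hypothetical parameter names, listed in the order in which values are sampled.
PARAM_NAMES = ['learning_rate', 'activation', 'units']


def wrap_sample(sample):
    """Pair each sampled value with its parameter name, preserving order."""
    # Length check added for illustration only.
    if len(sample) != len(PARAM_NAMES):
        raise ValueError('sample length does not match the number of parameters')
    return OrderedDict(zip(PARAM_NAMES, sample))


params = wrap_sample([0.01, 'relu', 32])
print(params['activation'])  # relu
print(list(params.items()))  # [('learning_rate', 0.01), ('activation', 'relu'), ('units', 32)]
```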
restore_dataset
restore_dataset()
Restores the entire dataset from a CSV file saved at the path provided by data_path. Also loads the parameters (the list of hyper parameters).
Raises:
- FileNotFoundError: If the dataset is not at the provided path.
save_dataset
save_dataset()
Serializes the entire dataset into a CSV file saved at the path provided by data_path. Also saves the parameters (the list of hyper parameters).
Raises:
- ValueError: If trying to save a dataset when its parameters have not been set.
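The CSV round trip performed by save_dataset and restore_dataset can be sketched with the standard csv module. The file name, the header layout (parameter names followed by a "scores" column) and the helpers save_rows and restore_rows are all hypothetical; pyshac's actual files and columns may differ.

```python
import csv
import os
import tempfile

# Hypothetical layout: a header of parameter names plus a final "scores" column.
PARAM_NAMES = ['learning_rate', 'activation']


def save_rows(data_path, X, Y):
    """Write one CSV row per sample, with its evaluation in the last column."""
    with open(data_path, 'w', newline='') as f:
        writer = csv.writer(f)
        writer.writerow(PARAM_NAMES + ['scores'])
        for row, value in zip(X, Y):
            writer.writerow(list(row) + [value])


def restore_rows(data_path):
    """Read the rows back; raise FileNotFoundError if the file is missing."""
    if not os.path.exists(data_path):
        raise FileNotFoundError(data_path)
    X, Y = [], []
    with open(data_path, newline='') as f:
        reader = csv.reader(f)
        next(reader)  # skip the header row
        for row in reader:
            X.append(row[:-1])  # sample values come back as strings
            Y.append(float(row[-1]))
    return X, Y


with tempfile.TemporaryDirectory() as basedir:
    path = os.path.join(basedir, 'dataset.csv')
    save_rows(path, [[0.01, 'relu'], [0.1, 'tanh']], [0.91, 0.87])
    X, Y = restore_rows(path)
    print(X, Y)  # [['0.01', 'relu'], ['0.1', 'tanh']] [0.91, 0.87]
```

CSV stores every value as text, so restored samples need the parameter definitions to recover their original types; this is consistent with the parameters being saved and loaded together with the data.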
set_dataset
set_dataset(X, Y)
Sets the dataset from the given samples and their evaluations.
Arguments:
- X (list | tuple | np.ndarray): A numpy array or python list/tuple that contains the samples of the dataset.
- Y (list | tuple | np.ndarray): A numpy array or python list/tuple that contains the evaluations of the dataset.
set_parameters
set_parameters(parameters)
Sets the hyper parameter list manager.
Arguments:
- parameters (hp.HyperParameterList | list): A HyperParameterList or a Python list of Hyper Parameters.
flatten_parameters
flatten_parameters(params)
Takes an OrderedDict or a list of lists, and flattens it into a list containing the items.
Arguments:
- params (OrderedDict | list of lists): The parameters provided by the engine, either as an OrderedDict or as a list-of-lists representation.
Returns:
A flattened Python list containing just the sampled values.
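A minimal sketch of the flattening step follows. flatten_sketch is a hypothetical helper, and it assumes that any value which is itself a list or tuple should be expanded one level while scalar values are kept as they are; the exact nesting pyshac's engines produce is not specified above.

```python
from collections import OrderedDict


def flatten_sketch(params):
    """Flatten an OrderedDict or a list of lists into a single flat list.

    Assumption: list/tuple values are expanded one level; scalars are kept.
    """
    if isinstance(params, OrderedDict):
        params = list(params.values())
    flat = []
    for item in params:
        if isinstance(item, (list, tuple)):
            flat.extend(item)
        else:
            flat.append(item)
    return flat


print(flatten_sketch([[0.01, 'relu'], [32]]))  # [0.01, 'relu', 32]
print(flatten_sketch(OrderedDict([('lr', 0.01), ('units', [32, 64])])))  # [0.01, 32, 64]
```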