During cross-validation, the available data is split into k folds.

#include <CrossValidator.h>

Public Types
using	SliceSelector = std::vector<std::vector<Index>>
	Type alias for slice selector.

Public Member Functions
double	fit (Estimator &estimator, ConstMatrixRef training_inputs, ConstMatrixRef training_outputs, const scorer::Scorer &scorer=nullptr) const
	Fit the estimator using cross-validation on the provided training_inputs and training_outputs.
double	fit (Estimator &estimator, ConstMatrixRef training_inputs, ConstMatrixRef training_outputs, const Tuner &tuner, const scorer::Scorer &scorer=nullptr) const
	Fit the estimator using cross-validation on the provided training_inputs and training_outputs, using tuner to optimize the hyperparameters.
std::vector< double >	score (const Estimator &estimator, ConstMatrixRef inputs, ConstMatrixRef outputs, const scorer::Scorer &scorer=nullptr) const
	Evaluate the estimator using cross-validation on the provided inputs and outputs.
virtual Dimension	num_folds (ConstMatrixRef training_inputs) const =0
	Get the number of folds used in the cross-validation on the provided training_inputs.
virtual std::string	to_string () const
	Obtain the string representation of this object.

Protected Member Functions
double	fit (Estimator &estimator, ConstMatrixRef training_inputs, ConstMatrixRef training_outputs, const Tuner *tuner, const scorer::Scorer &scorer) const
	Fit the estimator using cross-validation on the provided training_inputs and training_outputs, using tuner (if provided) to optimize the hyperparameters.
virtual std::pair< SliceSelector, SliceSelector >	compute_folds (ConstMatrixRef training_inputs) const =0
	Compute the training and validation folds for cross-validation.

Detailed Description

During cross-validation, the available data is split into k folds.

The model is trained k times, each time using k-1 folds for training and the remaining fold for validation. The final model is the one that achieved the best validation score across all folds. This technique helps to mitigate overfitting and provides a more robust estimate of the model's performance.
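The procedure described above can be sketched in plain C++ without the library. This is a minimal illustration under stated assumptions, not the lucid implementation: MeanModel, cross_validate, the contiguous fold layout, and the negative mean squared error score are all hypothetical choices made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <utility>
#include <vector>

// Toy "estimator" (hypothetical, not a lucid::Estimator): it predicts the
// mean of the training targets for every input.
struct MeanModel {
  double mean = 0.0;
};

// Train k times on k-1 folds, validate on the held-out fold, and keep the
// model with the highest validation score.
// Score used here: negative mean squared error, so higher is better.
std::pair<MeanModel, double> cross_validate(const std::vector<double>& y, std::size_t k) {
  const std::size_t n = y.size();
  assert(k >= 2 && n >= k);  // every fold needs at least one sample
  MeanModel best_model;
  double best_score = -std::numeric_limits<double>::infinity();
  for (std::size_t fold = 0; fold < k; ++fold) {
    // The held-out fold covers the half-open index range [begin, end).
    const std::size_t begin = fold * n / k;
    const std::size_t end = (fold + 1) * n / k;
    // Fit on every sample outside the held-out range.
    double sum = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
      if (i < begin || i >= end) sum += y[i];
    }
    MeanModel model;
    model.mean = sum / static_cast<double>(n - (end - begin));
    // Validate on the held-out range.
    double squared_error = 0.0;
    for (std::size_t i = begin; i < end; ++i) {
      squared_error += (y[i] - model.mean) * (y[i] - model.mean);
    }
    const double score = -squared_error / static_cast<double>(end - begin);
    if (score > best_score) {
      best_score = score;
      best_model = model;
    }
  }
  return {best_model, best_score};
}
```

Calling cross_validate on a vector of targets with k = 2 trains two models, each on one half of the data, and returns the one that scored best on the other half together with its score.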

Member Function Documentation

◆ compute_folds()

virtual std::pair< SliceSelector, SliceSelector > lucid::CrossValidator::compute_folds ( ConstMatrixRef training_inputs ) const

nodiscard, protected, pure virtual

Compute the training and validation folds for cross-validation.

Each fold is represented by a pair of index vectors, one taken from each returned slice selector: the vector in the first selector contains the indices for the training set, and the corresponding vector in the second selector contains the indices for the validation set.

Parameters

training_inputs \( n \times d_x \) training input data, where \( n \) is the number of samples and \( d_x \) is the dimension of the input vector space

Returns: pair of slice selectors for training and validation folds

Implemented in lucid::KFold, and lucid::LeaveOneOut.
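To make the return shape concrete, here is a hedged sketch of one way a derived class might build the two slice selectors. It assumes Index is std::size_t and uses a contiguous, unshuffled split; contiguous_folds is a hypothetical helper, and the actual lucid::KFold may assign samples to folds differently.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

using Index = std::size_t;  // assumption: the library's Index type may differ
using SliceSelector = std::vector<std::vector<Index>>;

// Contiguous k-fold split of n samples, without shuffling.
// Returns {training, validation}: training[i] and validation[i] describe
// fold i, and together they partition the indices 0..n-1.
std::pair<SliceSelector, SliceSelector> contiguous_folds(Index n, Index k) {
  assert(k >= 2 && n >= k);  // every fold needs at least one sample
  SliceSelector training(k), validation(k);
  for (Index fold = 0; fold < k; ++fold) {
    // The validation part of this fold is the half-open range [begin, end).
    const Index begin = fold * n / k;
    const Index end = (fold + 1) * n / k;
    for (Index i = 0; i < n; ++i) {
      if (i >= begin && i < end) {
        validation[fold].push_back(i);
      } else {
        training[fold].push_back(i);
      }
    }
  }
  return {training, validation};
}
```

With n = 5 and k = 2, the first fold validates on indices 0 and 1 and trains on 2, 3, and 4; the second fold swaps the two sets.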

◆ fit() [1/3]

double lucid::CrossValidator::fit	(	Estimator &	estimator,
		ConstMatrixRef	training_inputs,
		ConstMatrixRef	training_outputs,
		const scorer::Scorer &	scorer = nullptr ) const

Fit the estimator using cross-validation on the provided training_inputs and training_outputs.

The scorer is used to evaluate the performance of the model on the validation folds. If no scorer is provided, the estimator's default scoring method is used. At the end, the estimator is updated to the best model found during cross-validation.

Precondition: The number of samples \( n \) in training_inputs must be at least equal to the number of folds num_folds.

Parameters

[in,out]	estimator	estimator to fit. It will be updated to the best model found
	training_inputs	\( n \times d_x \) training input data, where \( n \) is the number of samples and \( d_x \) is the dimension of the input vector space
	training_outputs	\( n \times d_y \) training output data, where \( d_y \) is the dimension of the output vector space
	scorer	scoring function to evaluate the model's performance

Returns: best score achieved during cross-validation
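The fallback to the estimator's default scoring method can be illustrated as follows. This is only a sketch: it assumes the scorer behaves like a std::function that is empty when constructed from nullptr, which the default argument suggests but the page does not state. ToyEstimator, Scorer, and evaluate are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical estimator with a built-in default score: negative mean
// squared error of a constant prediction.
struct ToyEstimator {
  double prediction = 0.0;
  double score(const std::vector<double>& targets) const {
    assert(!targets.empty());
    double squared_error = 0.0;
    for (double t : targets) {
      squared_error += (t - prediction) * (t - prediction);
    }
    return -squared_error / static_cast<double>(targets.size());
  }
};

// Assumption: a scorer is modelled as a std::function, so an empty one can
// be created from nullptr and tested with operator bool.
using Scorer = std::function<double(const ToyEstimator&, const std::vector<double>&)>;

// Use the supplied scorer when there is one, otherwise fall back to the
// estimator's own scoring method.
double evaluate(const ToyEstimator& estimator, const std::vector<double>& targets,
                const Scorer& scorer = nullptr) {
  return scorer ? scorer(estimator, targets) : estimator.score(targets);
}
```

Calling evaluate with two arguments uses the estimator's own score; passing a callable as the third argument overrides it.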

◆ fit() [2/3]

double lucid::CrossValidator::fit	(	Estimator &	estimator,
		ConstMatrixRef	training_inputs,
		ConstMatrixRef	training_outputs,
		const Tuner &	tuner,
		const scorer::Scorer &	scorer = nullptr ) const

Fit the estimator using cross-validation on the provided training_inputs and training_outputs, using tuner to optimize the hyperparameters.

The scorer is used to evaluate the performance of the model on the validation folds. If no scorer is provided, the estimator's default scoring method is used. At the end, the estimator is updated to the best model found during cross-validation.

Precondition: The number of samples \( n \) in training_inputs must be at least equal to the number of folds num_folds.

Parameters

[in,out]	estimator	estimator to fit. It will be updated to the best model found
	training_inputs	\( n \times d_x \) training input data, where \( n \) is the number of samples and \( d_x \) is the dimension of the input vector space
	training_outputs	\( n \times d_y \) training output data, where \( d_y \) is the dimension of the output vector space
	tuner	tuner to optimize the hyperparameters
	scorer	scoring function to evaluate the model's performance

Returns: best score achieved during cross-validation

◆ fit() [3/3]

double lucid::CrossValidator::fit	(	Estimator &	estimator,
		ConstMatrixRef	training_inputs,
		ConstMatrixRef	training_outputs,
		const Tuner *	tuner,
		const scorer::Scorer &	scorer ) const

protected

Fit the estimator using cross-validation on the provided training_inputs and training_outputs, using tuner (if provided) to optimize the hyperparameters.

The scorer is used to evaluate the performance of the model on the validation folds. If no scorer is provided, the estimator's default scoring method is used. At the end, the estimator is updated to the best model found during cross-validation.

Precondition: The number of samples \( n \) in training_inputs must be at least equal to the number of folds num_folds.

Parameters

[in,out]	estimator	estimator to fit. It will be updated to the best model found
	training_inputs	\( n \times d_x \) training input data, where \( n \) is the number of samples and \( d_x \) is the dimension of the input vector space
	training_outputs	\( n \times d_y \) training output data, where \( d_y \) is the dimension of the output vector space
	tuner	tuner to optimize the hyperparameters, if provided
	scorer	scoring function to evaluate the model's performance

Returns: best score achieved during cross-validation

◆ num_folds()

virtual Dimension lucid::CrossValidator::num_folds ( ConstMatrixRef training_inputs ) const

nodiscard, pure virtual

Get the number of folds used in the cross-validation on the provided training_inputs.

This is determined by the specific cross-validation strategy implemented in the derived class.

Parameters

training_inputs \( n \times d_x \) training input data, where \( n \) is the number of samples and \( d_x \) is the dimension of the input vector space

Returns: number of folds

Implemented in lucid::KFold, and lucid::LeaveOneOut.
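As an illustration of how the fold count can depend on the strategy, the sketch below follows the conventional definitions of k-fold (a fixed k) and leave-one-out (one fold per sample). The Toy classes are hypothetical stand-ins that take the sample count directly; whether the lucid classes behave exactly this way is an assumption.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-ins for illustration; these are not the lucid classes.
struct ToyCrossValidator {
  virtual ~ToyCrossValidator() = default;
  // n is the number of samples, i.e. the number of rows of the inputs.
  virtual std::size_t num_folds(std::size_t n) const = 0;
};

// Conventional k-fold: a fixed number of folds, independent of n.
struct ToyKFold : ToyCrossValidator {
  explicit ToyKFold(std::size_t k) : k_(k) {
    assert(k >= 2);  // a single fold would leave no training data
  }
  std::size_t num_folds(std::size_t) const override { return k_; }

 private:
  std::size_t k_;
};

// Conventional leave-one-out: one fold per sample.
struct ToyLeaveOneOut : ToyCrossValidator {
  std::size_t num_folds(std::size_t n) const override { return n; }
};
```

For 100 samples, a ToyKFold built with k = 5 reports 5 folds, while ToyLeaveOneOut reports 100.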

◆ score()

std::vector< double > lucid::CrossValidator::score	(	const Estimator &	estimator,
		ConstMatrixRef	inputs,
		ConstMatrixRef	outputs,
		const scorer::Scorer &	scorer = nullptr ) const

nodiscard

Evaluate the estimator using cross-validation on the provided inputs and outputs.

No fitting or tuning is performed: the estimator is only consolidated on each fold, and no hyperparameters are optimized. The scorer is used to evaluate the performance of the model on each fold. If no scorer is provided, the estimator's default scoring method is used. The result is a vector of scores, one for each fold.

Precondition: The number of samples \( n \) in inputs must be at least equal to the number of folds num_folds.

Parameters

estimator	estimator to evaluate
inputs	\( n \times d_x \) input data, where \( n \) is the number of samples and \( d_x \) is the dimension of the input vector space
outputs	\( n \times d_y \) output data, where \( d_y \) is the dimension of the output vector space
scorer	scoring function to evaluate the model's performance

Returns: vector of scores, one for each fold
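A hedged sketch of per-fold scoring with fixed hyperparameters follows. It reads "consolidated" as recomputing the model parameters on each training fold while leaving the hyperparameters untouched, which is an interpretation rather than something the page confirms. The toy model (1-D ridge regression through the origin), the name fold_scores, and the negative mean squared error score are assumptions for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Score a fixed-hyperparameter model on each fold and return all k scores.
// Toy model: y ~ w * x with w = sum(x*y) / (sum(x*x) + lambda).
// lambda is the hyperparameter; it is held fixed and never tuned.
std::vector<double> fold_scores(const std::vector<double>& x, const std::vector<double>& y,
                                double lambda, std::size_t k) {
  const std::size_t n = x.size();
  assert(y.size() == n && k >= 2 && n >= k && lambda > 0.0);
  std::vector<double> scores;
  scores.reserve(k);
  for (std::size_t fold = 0; fold < k; ++fold) {
    // The validation part of this fold is the half-open range [begin, end).
    const std::size_t begin = fold * n / k;
    const std::size_t end = (fold + 1) * n / k;
    // Recompute the weight from the training part only.
    double sum_xy = 0.0;
    double sum_xx = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
      if (i < begin || i >= end) {
        sum_xy += x[i] * y[i];
        sum_xx += x[i] * x[i];
      }
    }
    const double w = sum_xy / (sum_xx + lambda);
    // Negative mean squared error on the validation part (higher is better).
    double squared_error = 0.0;
    for (std::size_t i = begin; i < end; ++i) {
      const double residual = y[i] - w * x[i];
      squared_error += residual * residual;
    }
    scores.push_back(-squared_error / static_cast<double>(end - begin));
  }
  return scores;
}
```

Unlike the fitting overloads, nothing is selected here: the caller receives every fold's score and can, for example, average them or inspect their spread.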

◆ to_string()

std::string lucid::CrossValidator::to_string ( ) const

nodiscard, virtual

Obtain the string representation of this object.

Returns: string representation of this object

Reimplemented in lucid::KFold, and lucid::LeaveOneOut.

The documentation for this class was generated from the following files:

lucid/model/CrossValidator.h
lucid/model/CrossValidator.cpp
