probnmn.evaluators._evaluator

class probnmn.evaluators._evaluator._Evaluator(config: probnmn.config.Config, dataloader: torch.utils.data.dataloader.DataLoader, models: Dict[str, Type[torch.nn.modules.module.Module]], gpu_ids: List[int] = [0])

Bases: object

A base class for generic evaluation of models. This class can hold multiple models interacting with each other, rather than a single model, which suits our use case (for example, the module_training phase has two models: ProgramGenerator and NeuralModuleNetwork). It offers full flexibility, with sensible defaults which may be changed or disabled while extending this class.

Extend this class and override the _do_iteration() method with the core evaluation loop: what happens every iteration, given a batch from the dataloader this class holds. See the Examples section below for a sketch.

Parameters
config: Config

A Config object with all the relevant configuration parameters.

dataloader: torch.utils.data.DataLoader

A DataLoader which provides batches of evaluation examples. It wraps one of probnmn.data.datasets, depending on the evaluation phase.

models: Dict[str, Type[nn.Module]]

All the models which interact with each other during evaluation. These are one or more models from probnmn.models, depending on the evaluation phase.

gpu_ids: List[int], optional (default=[0])

List of GPU IDs to use for evaluation. Set this to [-1] to use the CPU.

Notes

  1. All models are passed by assignment, so they can be shared with an external trainer. Do not set self._models = ... anywhere while extending this class.

  2. An instantiation of this class will always be paired with a _Trainer. Pass the trainer's models while instantiating this class.
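
Examples

A minimal sketch of extending and pairing this class. The subclass name, the "program_generator" key, the batch field, and the trainer's models attribute are illustrative assumptions, not documented API:

    from typing import Any, Dict

    from probnmn.evaluators._evaluator import _Evaluator


    class MyEvaluator(_Evaluator):
        # Only the core per-batch logic is overridden; evaluate() is inherited.
        def _do_iteration(self, batch: Dict[str, Any]) -> Dict[str, Any]:
            # self._models is populated by the base class constructor
            # (Note 1: never reassign it). The key name is illustrative.
            program_generator = self._models["program_generator"]
            return program_generator(batch["question"])


    # Share the trainer's model objects so evaluation sees the current
    # weights (Note 2). `config`, `val_dataloader` and `trainer` are
    # assumed to be constructed elsewhere; the `models` attribute name
    # on the trainer is an assumption.
    evaluator = MyEvaluator(
        config=config,
        dataloader=val_dataloader,
        models=trainer.models,
        gpu_ids=[0],
    )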

evaluate(self, num_batches: Optional[int] = None) → Dict[str, Any]

Perform evaluation using the first num_batches of the dataloader and return all evaluation metrics from the models.

Parameters
num_batches: int, optional (default=None)

Number of batches to use from the dataloader. If None, use all batches.

Returns
Dict[str, Any]

Final evaluation metrics for all the models.
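
For instance (a sketch; evaluator is an instance of a subclass such as the one in the Examples above, and the metric keys shown are assumptions):

    # Evaluate on only the first 50 batches of the dataloader;
    # num_batches=None (the default) would consume all batches.
    eval_metrics = evaluator.evaluate(num_batches=50)

    # A Dict[str, Any] of final metrics from all held models, for
    # example {"program_generator": {...}, "nmn": {...}} (illustrative).
    for model_name, metrics in eval_metrics.items():
        print(model_name, metrics)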

_do_iteration(self, batch: Dict[str, Any]) → Dict[str, Any]

Core evaluation logic for one iteration; operates on a batch. This base class provides a dummy implementation: just a forward pass through some "model".

Parameters
batch: Dict[str, Any]

A batch of evaluation examples sampled from the dataloader. See evaluate() for how this batch is sampled.

Returns
Dict[str, Any]

An output dictionary, typically returned by the models. This may contain predictions from the models, validation loss, etc.
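
An override typically runs each held model on the batch and merges their outputs into one dictionary. A sketch, assuming the module_training phase with models keyed "program_generator" and "nmn" (the key names, batch fields, and call signatures are assumptions):

    def _do_iteration(self, batch: Dict[str, Any]) -> Dict[str, Any]:
        # Predict a program from the question, then execute it on the
        # image features to produce an answer distribution.
        pg_output = self._models["program_generator"](batch["question"])
        nmn_output = self._models["nmn"](batch["image"], pg_output["predictions"])
        return {"program_generator": pg_output, "nmn": nmn_output}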