probnmn.models.nmn

class probnmn.models.nmn.NeuralModuleNetwork(vocabulary: allennlp.data.vocabulary.Vocabulary, image_feature_size: Tuple[int, int, int] = (1024, 14, 14), module_channels: int = 128, class_projection_channels: int = 1024, classifier_linear_size: int = 1024)[source]

Bases: torch.nn.modules.module.Module

A NeuralModuleNetwork holds neural modules, a stem network, and a classifier network. It wires these together to answer a question about a given scene, using a program that describes how to arrange the neural modules.

Parameters
vocabulary: allennlp.data.vocabulary.Vocabulary

AllenNLP’s vocabulary. This vocabulary has three namespaces - “questions”, “programs” and “answers” - which contain the respective token-to-integer mappings.

image_feature_size: tuple (K, R, C), optional (default = (1024, 14, 14))

Shape of input image features, in the form (channel, height, width).

module_channels: int, optional (default = 128)

Number of channels for each neural module’s convolutional blocks.

class_projection_channels: int, optional (default = 1024)

Number of channels in projected final feature map (input to classifier).

classifier_linear_size: int, optional (default = 1024)

Size of input to the linear classifier.
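
A minimal construction sketch (not taken from the repository's documentation): the vocabulary directory path below is hypothetical and must contain the “questions”, “programs” and “answers” namespaces described above; all other arguments simply repeat the defaults from the signature.

import torch
from allennlp.data import Vocabulary
from probnmn.models.nmn import NeuralModuleNetwork

# Hypothetical path to serialized AllenNLP vocabulary files.
vocabulary = Vocabulary.from_files("data/vocabulary")

nmn = NeuralModuleNetwork(
    vocabulary=vocabulary,
    image_feature_size=(1024, 14, 14),
    module_channels=128,
    class_projection_channels=1024,
    classifier_linear_size=1024,
)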

classmethod from_config(config: probnmn.config.Config)[source]

Instantiate this class directly from a Config.
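
A brief sketch of instantiating from a config (an assumption, not a documented recipe): the YAML path is hypothetical, and the Config constructor is assumed here to accept a path to a YAML config file as in the repository's training scripts.

from probnmn.config import Config
from probnmn.models.nmn import NeuralModuleNetwork

# Hypothetical config file; Config constructor signature is assumed.
_C = Config("configs/nmn.yml")
nmn = NeuralModuleNetwork.from_config(_C)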

forward(self, features: torch.Tensor, programs: torch.Tensor, answers: Optional[torch.Tensor] = None)[source]

Given image features and program sequences, lay out a modular network and pass the image features through it, then feed the final feature representation from the modular network to the classifier to get the answer distribution.

Parameters
features: torch.Tensor

Input image features of shape (batch, channels, height, width).

programs: torch.Tensor

Program sequences padded up to maximum length, shape (batch_size, max_program_length).

answers: torch.Tensor, optional (default = None)

Target answers for corresponding images and programs, shape (batch_size, ).

Returns
Dict[str, Any]

Model predictions, answer cross-entropy loss and (if training) batch metrics. When answer targets are not provided, the loss holds the negative log-probabilities of the predicted answers. A dict with structure:

{
    "predictions": torch.Tensor (shape: (batch_size, )),
    "loss": torch.Tensor (shape: (batch_size, )),
    "metrics": {
        "answer_accuracy": float,
        "average_invalid": float,
    }
}

Notes

The structure of the modular network differs for each program sequence, so we loop over the programs in a batch and perform a forward pass for each example individually.
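
A forward-pass sketch with dummy inputs (an illustration, not from the repository): it reuses the nmn instance constructed in the sketch above, and the program token indices are placeholders; real programs come from the “programs” vocabulary namespace, and invalid sequences are counted in the average_invalid metric.

import torch

features = torch.randn(2, 1024, 14, 14)      # (batch, channels, height, width)
programs = torch.randint(1, 10, (2, 12))     # (batch_size, max_program_length), placeholder indices
answers = torch.randint(0, 5, (2,))          # (batch_size, ), placeholder answer indices

output = nmn(features, programs, answers)
print(output["predictions"].shape)           # torch.Size([2])
print(output["loss"].mean())                 # per-example loss; reduce before calling backward()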

get_metrics(self, reset: bool = True) → Dict[str, float][source]

Return recorded answer accuracy and average invalid programs per batch.

Parameters
reset: bool, optional (default = True)

Whether to reset the accumulated metrics after retrieving them.

Returns
Dict[str, float]

A dictionary with metrics {"answer_accuracy", "average_invalid"}.
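
A short usage sketch (assumed, not documented): it reuses the nmn instance from the sketches above and assumes forward has already been called with answer targets so that metrics have accumulated.

# Retrieve accumulated metrics and reset the counters for the next epoch.
metrics = nmn.get_metrics(reset=True)
print(metrics["answer_accuracy"], metrics["average_invalid"])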