
class probnmn.models.nmn.NeuralModuleNetwork(vocabulary: allennlp.data.vocabulary.Vocabulary, image_feature_size: Tuple[int, int, int] = (1024, 14, 14), module_channels: int = 128, class_projection_channels: int = 1024, classifier_linear_size: int = 1024)

Bases: torch.nn.modules.module.Module

A NeuralModuleNetwork holds neural modules, a stem network, and a classifier network. It hooks these all together to answer a question given some scene and a program describing how to arrange the neural modules.


vocabulary: allennlp.data.vocabulary.Vocabulary

AllenNLP’s vocabulary. It has three namespaces, “questions”, “programs” and “answers”, each holding the token-to-integer mapping for its kind of sequence.

image_feature_size: tuple (K, R, C), optional (default = (1024, 14, 14))

Shape of input image features, in the form (channel, height, width).

module_channels: int, optional (default = 128)

Number of channels for each neural module’s convolutional blocks.

class_projection_channels: int, optional (default = 1024)

Number of channels in projected final feature map (input to classifier).

classifier_linear_size: int, optional (default = 1024)

Size of input to the linear classifier.

classmethod from_config(config: probnmn.config.Config)

Instantiate this class directly from a Config.

forward(self, features: torch.Tensor, programs: torch.Tensor, answers: Union[torch.Tensor, NoneType] = None)

Given image features and program sequences, lay out a modular network for each program and pass the image features through it. The final feature representation produced by the modular network is then passed through the classifier to obtain the answer distribution.

features: torch.Tensor

Input image features of shape (batch_size, channels, height, width).

programs: torch.Tensor

Program sequences padded up to maximum length, shape (batch_size, max_program_length).

answers: torch.Tensor, optional (default = None)

Target answers for corresponding images and programs, shape (batch_size, ).

Dict[str, Any]

Model predictions, answer cross-entropy loss and, during training, batch metrics. When answer targets are not provided, the loss holds the negative log-probabilities of the predicted answers. A dict with structure:

    {
        "predictions": torch.Tensor (shape: (batch_size, )),
        "loss": torch.Tensor (shape: (batch_size, )),
        "metrics": {
            "answer_accuracy": float,
            "average_invalid": float,
        }
    }

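The two loss cases described above can be illustrated with a small pure-Python sketch. It is not the library's implementation: `log_softmax` and `predictions_and_loss` are hypothetical helpers, and plain lists stand in for tensors. With targets, each example's loss is the cross-entropy against its target answer; without targets, it is the negative log-probability of the predicted answer.

```python
import math
from typing import List, Optional, Tuple


def log_softmax(logits: List[float]) -> List[float]:
    """Numerically stable log-softmax over one example's answer logits."""
    peak = max(logits)
    log_norm = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return [x - log_norm for x in logits]


def predictions_and_loss(
    batch_logits: List[List[float]],
    answers: Optional[List[int]] = None,
) -> Tuple[List[int], List[float]]:
    """Return per-example predictions and per-example losses (hypothetical helper)."""
    predictions: List[int] = []
    losses: List[float] = []
    for index, logits in enumerate(batch_logits):
        log_probs = log_softmax(logits)
        predicted = max(range(len(logits)), key=logits.__getitem__)
        # With targets: cross-entropy against the target answer.
        # Without targets: negative log-probability of the predicted answer.
        target = answers[index] if answers is not None else predicted
        predictions.append(predicted)
        losses.append(-log_probs[target])
    return predictions, losses


# Toy batch of two examples with three candidate answers each.
batch_logits = [[2.0, 0.5, -1.0], [0.1, 0.2, 3.0]]
preds, loss_with_targets = predictions_and_loss(batch_logits, answers=[1, 2])
_, loss_without_targets = predictions_and_loss(batch_logits)
```

Both outputs keep one entry per example, matching the (batch_size, ) shapes listed in the returned dict.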

The structure of the modular network differs for each program sequence, so the forward pass loops over the programs in a batch and runs each example separately.
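A minimal sketch of such a per-example loop follows. It rests on assumptions that are not stated in this reference: programs are taken to be token lists in prefix order, executed right to left with a stack, and the `MODULES` table with its arities is entirely hypothetical, with plain arithmetic standing in for convolutional modules.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical module table: name -> (arity, function). Real neural modules
# would be small convolutional networks; plain arithmetic stands in here.
MODULES: Dict[str, Tuple[int, Callable[..., float]]] = {
    "scene": (0, lambda feats: feats),
    "double": (1, lambda feats, x: 2.0 * x),
    "add": (2, lambda feats, x, y: x + y),
}


def run_program(program: List[str], features: float) -> float:
    """Execute one prefix-ordered program by scanning it right to left."""
    stack: List[float] = []
    for token in reversed(program):
        arity, module = MODULES[token]
        if len(stack) < arity:
            raise ValueError(f"invalid program: {program}")
        # The first value popped is the leftmost argument of the module.
        inputs = [stack.pop() for _ in range(arity)]
        stack.append(module(features, *inputs))
    if len(stack) != 1:
        raise ValueError(f"invalid program: {program}")
    return stack[0]


# Each example has its own layout, so the batch is processed one by one.
batch_programs = [["double", "scene"], ["add", "scene", "double", "scene"]]
batch_features = [1.5, 2.0]
outputs = [
    run_program(program, features)
    for program, features in zip(batch_programs, batch_features)
]
```

Because the two programs have different lengths and shapes, no single fixed computation graph covers both, which is why the examples cannot simply be stacked into one batched call.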

get_metrics(self, reset: bool = True) → Dict[str, float]

Return the recorded answer accuracy and the average number of invalid programs per batch.

reset: bool, optional (default = True)

Whether to reset the accumulated metrics after retrieving them.

Dict[str, float]

A dictionary with metrics {"answer_accuracy", "average_invalid"}.
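The effect of the reset flag can be sketched with a small stand-in accumulator. `RunningAverage` and `MetricTracker` are hypothetical names used only for illustration, and the sketch assumes the metrics are simple running averages; the actual bookkeeping inside the model may differ.

```python
from typing import Dict


class RunningAverage:
    """Hypothetical accumulator for one scalar metric."""

    def __init__(self) -> None:
        self.total = 0.0
        self.count = 0

    def update(self, value: float) -> None:
        self.total += value
        self.count += 1

    def get(self, reset: bool = True) -> float:
        average = self.total / self.count if self.count else 0.0
        if reset:
            # Clear the accumulated state so the next window starts fresh.
            self.total, self.count = 0.0, 0
        return average


class MetricTracker:
    """Hypothetical stand-in exposing the same get_metrics(reset) shape."""

    def __init__(self) -> None:
        self._metrics = {
            "answer_accuracy": RunningAverage(),
            "average_invalid": RunningAverage(),
        }

    def update(self, answer_accuracy: float, average_invalid: float) -> None:
        self._metrics["answer_accuracy"].update(answer_accuracy)
        self._metrics["average_invalid"].update(average_invalid)

    def get_metrics(self, reset: bool = True) -> Dict[str, float]:
        return {name: m.get(reset) for name, m in self._metrics.items()}


tracker = MetricTracker()
tracker.update(answer_accuracy=0.5, average_invalid=0.0)
tracker.update(answer_accuracy=1.0, average_invalid=2.0)

peek = tracker.get_metrics(reset=False)   # read without clearing
final = tracker.get_metrics()             # read and clear
after_reset = tracker.get_metrics()       # nothing accumulated any more
```

Passing `reset=False` is useful for intermediate logging within an epoch, while the default clears the state so that each reporting window is measured independently.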