probnmn.modules.nmn_modules

Collection of PyTorch modules used by our Neural Module Network.

Adapted from: davidmascharka/tbd-nets.

class probnmn.modules.nmn_modules.AndModule

Bases: torch.nn.modules.module.Module

A neural module that approximates a logical AND over two attention masks.

An AndModule is a neural module that takes two input attention masks and (basically) performs a set intersection. This would be used in a question like “What color is the cube to the left of the sphere and right of the yellow cylinder?” After localizing the regions left of the sphere and right of the yellow cylinder, an AndModule would be used to find the intersection of the two. Its output would then go into an AttentionModule that finds cubes.
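The intersection described above can be sketched as follows. This is a minimal illustration, assuming the intersection is computed as an elementwise minimum of the two masks; the function name `and_attention` and the toy 2x2 masks are hypothetical and not part of probnmn.

```python
import torch


def and_attention(attn1: torch.Tensor, attn2: torch.Tensor) -> torch.Tensor:
    # Assumption: soft set intersection as an elementwise minimum.
    # A location stays highlighted only if both masks highlight it.
    return torch.min(attn1, attn2)


# Toy masks standing in for "left of the sphere" and
# "right of the yellow cylinder".
left_of_sphere = torch.tensor([[0.9, 0.8], [0.1, 0.0]])
right_of_cylinder = torch.tensor([[0.2, 0.9], [0.7, 0.1]])

both = and_attention(left_of_sphere, right_of_cylinder)
```

Only the top-right location is strongly attended in both inputs, so it is the only one that remains strongly attended in `both`.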

class probnmn.modules.nmn_modules.OrModule

Bases: torch.nn.modules.module.Module

A neural module that approximates a logical OR over two attention masks.

An OrModule is a neural module that takes two input attention masks and (basically) performs a set union. This would be used in a question like “How many cubes are left of the brown sphere or right of the cylinder?” After localizing the regions left of the brown sphere and right of the cylinder, an OrModule would be used to find the union of the two. Its output would then go into an AttentionModule that finds cubes.
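The union described above can be sketched in the same spirit. This is a minimal illustration, assuming the union is computed as an elementwise maximum of the two masks; the function name `or_attention` and the toy masks are hypothetical.

```python
import torch


def or_attention(attn1: torch.Tensor, attn2: torch.Tensor) -> torch.Tensor:
    # Assumption: soft set union as an elementwise maximum.
    # A location is highlighted if either mask highlights it.
    return torch.max(attn1, attn2)


# Toy masks standing in for "left of the brown sphere" and
# "right of the cylinder".
left_of_sphere = torch.tensor([[0.9, 0.0], [0.1, 0.0]])
right_of_cylinder = torch.tensor([[0.2, 0.0], [0.7, 0.1]])

either = or_attention(left_of_sphere, right_of_cylinder)
```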

class probnmn.modules.nmn_modules.AttentionModule(dim: int)

Bases: torch.nn.modules.module.Module

A neural module that takes a feature map and attention, attends to the features, and produces an attention.

An AttentionModule takes input features and an attention and produces an attention. It multiplicatively combines its input feature map and attention to attend to the relevant region of the feature map. It then processes the attended features via a series of convolutions and produces an attention mask highlighting the objects that possess the attribute the module is looking for.

For example, an AttentionModule may be tasked with finding cubes. Given an input attention of all ones, it will highlight all the cubes in the provided input features. Given an attention mask highlighting all the red objects, it will produce an attention mask highlighting all the red cubes.

Parameters
dim: int

The number of channels of each convolutional filter.
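The attend-then-convolve behaviour described for AttentionModule might look roughly like the sketch below. The class name `AttentionSketch`, the number of layers, the kernel sizes, and the final sigmoid are assumptions made for illustration; they are not taken from the probnmn source.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class AttentionSketch(nn.Module):
    """Hypothetical stand-in for an attention-producing module."""

    def __init__(self, dim: int):
        super().__init__()
        # Assumed layout: two 3x3 convolutions, then a 1x1 convolution
        # that collapses the channels into a single attention map.
        self.conv1 = nn.Conv2d(dim, dim, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(dim, dim, kernel_size=3, padding=1)
        self.conv3 = nn.Conv2d(dim, 1, kernel_size=1)

    def forward(self, feats: torch.Tensor, attn: torch.Tensor) -> torch.Tensor:
        # feats: (N, dim, H, W); attn: (N, 1, H, W), broadcast over channels.
        attended = feats * attn
        out = F.relu(self.conv1(attended))
        out = F.relu(self.conv2(out))
        # Sigmoid keeps the output mask in [0, 1].
        return torch.sigmoid(self.conv3(out))


feats = torch.randn(2, 8, 14, 14)
all_ones = torch.ones(2, 1, 14, 14)  # "attend everywhere" input
mask = AttentionSketch(dim=8)(feats, all_ones)
```

Because the output is again a single-channel mask, it can be fed to another attention-consuming module, which is how "red" followed by "cube" would compose.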

class probnmn.modules.nmn_modules.QueryModule(dim: int)

Bases: torch.nn.modules.module.Module

A neural module that takes as input a feature map and an attention and produces a feature map as output.

A QueryModule takes a feature map and an attention mask as input. It attends to the feature map via an elementwise multiplication with the attention mask, then processes this attended feature map via a series of convolutions to extract relevant information.

For example, a QueryModule tasked with determining the color of objects would output a feature map encoding what color the attended object is. A module intended to count would output a feature map encoding the number of attended objects in the scene.

Parameters
dim: int

The number of channels of each convolutional filter.
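A rough sketch of the QueryModule data flow, under stated assumptions: the class name `QuerySketch`, the use of two 3x3 convolutions, and the ReLU activations are illustrative choices rather than the library's confirmed implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class QuerySketch(nn.Module):
    """Hypothetical stand-in for a feature-producing query module."""

    def __init__(self, dim: int):
        super().__init__()
        # Assumed layout: two 3x3 convolutions that keep `dim` channels.
        self.conv1 = nn.Conv2d(dim, dim, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(dim, dim, kernel_size=3, padding=1)

    def forward(self, feats: torch.Tensor, attn: torch.Tensor) -> torch.Tensor:
        # Elementwise multiplication; attn (N, 1, H, W) broadcasts over channels.
        attended = feats * attn
        out = F.relu(self.conv1(attended))
        return F.relu(self.conv2(out))


feats = torch.randn(2, 8, 14, 14)
attn = torch.rand(2, 1, 14, 14)
encoded = QuerySketch(dim=8)(feats, attn)
```

Unlike an attention-producing module, the output here keeps all `dim` channels, so a downstream classifier or comparison module can read the encoded property from it.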

class probnmn.modules.nmn_modules.RelateModule(dim: int)

Bases: torch.nn.modules.module.Module

A neural module that takes as input a feature map and an attention and produces an attention as output.

A RelateModule takes input features and an attention and produces an attention. It multiplicatively combines the attention and the features to attend to a relevant region, then uses a series of dilated convolutional filters to indicate a spatial relationship to the input attended region.

Parameters
dim: int

The number of channels of each convolutional filter.
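The role of dilation in RelateModule can be illustrated with the sketch below. The class name `RelateSketch`, the dilation rates (1, 2, 4), and the layer count are assumptions chosen for brevity. Setting the padding equal to the dilation keeps the spatial size unchanged for 3x3 kernels while the receptive field grows, which is what lets the output attention land far from the input attention.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class RelateSketch(nn.Module):
    """Hypothetical stand-in for a spatial-relation module."""

    def __init__(self, dim: int):
        super().__init__()
        # Assumed dilation rates; padding == dilation preserves H and W
        # for 3x3 kernels.
        self.dilated = nn.ModuleList(
            nn.Conv2d(dim, dim, kernel_size=3, padding=d, dilation=d)
            for d in (1, 2, 4)
        )
        self.to_attention = nn.Conv2d(dim, 1, kernel_size=1)

    def forward(self, feats: torch.Tensor, attn: torch.Tensor) -> torch.Tensor:
        out = feats * attn  # attend to the reference region
        for conv in self.dilated:
            out = F.relu(conv(out))
        # Single-channel mask in [0, 1].
        return torch.sigmoid(self.to_attention(out))


feats = torch.randn(2, 8, 14, 14)
attn = torch.rand(2, 1, 14, 14)
related = RelateSketch(dim=8)(feats, attn)
```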

class probnmn.modules.nmn_modules.SameModule(dim: int)

Bases: torch.nn.modules.module.Module

A neural module that takes as input a feature map and an attention and produces an attention as output.

A SameModule takes input features and an attention and produces an attention. It determines the index of the maximally-attended object, extracts the feature vector at that spatial location, then performs a cross-correlation at each spatial location to determine which other regions have this same property. This correlated feature map then goes through a convolutional block whose output is an attention mask.

As an example, this module can be used with the CLEVR dataset to perform the same_shape operation, which will highlight every region of an image that shares the same shape as an object of interest (excluding the original object).

Parameters
dim: int

The number of channels in the input feature map.
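The three steps described for SameModule (locate the maximally-attended position, extract its feature vector, correlate it with every location) can be sketched as follows. The class name `SameSketch`, the per-sample argmax, the concatenation of the input attention before the final convolution, and the 1x1 kernel are assumptions for illustration only.

```python
import torch
import torch.nn as nn


class SameSketch(nn.Module):
    """Hypothetical stand-in for a "same property" module."""

    def __init__(self, dim: int):
        super().__init__()
        # Assumption: correlated features plus the input attention
        # (dim + 1 channels) are reduced to one attention channel.
        self.conv = nn.Conv2d(dim + 1, 1, kernel_size=1)

    def forward(self, feats: torch.Tensor, attn: torch.Tensor) -> torch.Tensor:
        n, c, h, w = feats.shape
        # 1. Flat index of the maximally-attended location, per sample.
        idx = attn.reshape(n, -1).argmax(dim=1)
        # 2. Feature vector at that location: (N, C, 1).
        flat = feats.reshape(n, c, h * w)
        picked = flat.gather(2, idx.view(n, 1, 1).expand(n, c, 1))
        # 3. Correlate that vector with the features at every location.
        correlated = feats * picked.view(n, c, 1, 1)
        # Keeping the input attention lets the convolution learn to
        # suppress the original object.
        out = torch.cat([correlated, attn], dim=1)
        return torch.sigmoid(self.conv(out))


feats = torch.randn(2, 8, 14, 14)
attn = torch.rand(2, 1, 14, 14)
same = SameSketch(dim=8)(feats, attn)
```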

class probnmn.modules.nmn_modules.ComparisonModule(dim: int)

Bases: torch.nn.modules.module.Module

A neural module that takes as input two feature maps and produces a feature map as output.

A ComparisonModule takes two feature maps as input and concatenates them. It then processes the concatenated features and produces a feature map encoding whether the two input feature maps encode the same property.

This block is useful for making integer comparisons, for example to answer the question, “Are there more red things than small spheres?” It can also be used to determine whether some relationship holds between two objects (e.g., they are the same shape, size, color, or material).

Parameters
dim: int

The number of channels of each convolutional filter.
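The concatenate-then-process flow of ComparisonModule might be sketched as below. The class name `ComparisonSketch`, the channel-wise concatenation, the 1x1 projection back to `dim` channels, and the single 3x3 convolution are assumptions for illustration, not the confirmed probnmn implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ComparisonSketch(nn.Module):
    """Hypothetical stand-in for a two-input comparison module."""

    def __init__(self, dim: int):
        super().__init__()
        # Assumption: project the 2 * dim concatenated channels back to dim,
        # then refine with one 3x3 convolution.
        self.projection = nn.Conv2d(2 * dim, dim, kernel_size=1)
        self.conv = nn.Conv2d(dim, dim, kernel_size=3, padding=1)

    def forward(self, in1: torch.Tensor, in2: torch.Tensor) -> torch.Tensor:
        # Concatenate along the channel dimension: (N, 2 * dim, H, W).
        out = torch.cat([in1, in2], dim=1)
        out = F.relu(self.projection(out))
        return F.relu(self.conv(out))


# Two feature maps, e.g. the outputs of two query-style modules.
first = torch.randn(2, 8, 14, 14)
second = torch.randn(2, 8, 14, 14)
compared = ComparisonSketch(dim=8)(first, second)
```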