probnmn.data.datasets

class probnmn.data.datasets.ProgramPriorDataset(tokens_h5path: str)[source]

Bases: torch.utils.data.dataset.Dataset

Provides programs as tokenized sequences to train the ProgramPrior model (probnmn.models.program_prior.ProgramPrior).

Parameters
tokens_h5path: str

Path to an HDF file to initialize the underlying reader.

class probnmn.data.datasets.QuestionCodingDataset(tokens_h5path: str, num_supervision: int = 699989, supervision_question_max_length: int = 30)[source]

Bases: torch.utils.data.dataset.Dataset

Provides questions and programs as tokenized sequences for Question Coding. It also provides a “supervision” flag, which can behave as a mask when batched, to tune the amount of program supervision on ProgramGenerator.
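As an illustration of how a 0/1 supervision flag can act as a mask once examples are batched, here is a minimal sketch in plain Python. The function `masked_mean` and the sample values are hypothetical and not part of probnmn; the actual training code may combine the flag with its loss differently.

```python
from typing import Sequence


def masked_mean(per_example_loss: Sequence[float], supervision: Sequence[int]) -> float:
    """Average the loss over examples whose supervision flag is 1 (hypothetical helper)."""
    num_supervised = sum(supervision)
    if num_supervised == 0:
        # No supervised example in this batch, so there is nothing to average.
        return 0.0
    # Multiplying by the flag zeroes out the loss of unsupervised examples.
    total = sum(loss * flag for loss, flag in zip(per_example_loss, supervision))
    return total / num_supervised


# Illustrative per-example program losses and their supervision flags.
batch_loss = [0.5, 2.0, 1.5, 4.0]
batch_supervision = [1, 0, 1, 0]
supervised_loss = masked_mean(batch_loss, batch_supervision)
```

Only the first and third examples contribute to `supervised_loss` here; lowering `num_supervision` would leave more flags at 0 and so reduce how many examples carry a program loss.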

Parameters
tokens_h5path: str

Path to an HDF file to initialize the underlying reader.

num_supervision: int, optional (default = 699989)

Number of examples that carry program supervision over questions, used to train ProgramGenerator.

supervision_question_max_length: int, optional (default = 30)

Maximum length of question for picking examples with program supervision.

Notes

For a fixed numpy random seed, the randomly generated supervision list will always be the same.
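The reproducibility described in this note can be sketched as follows. This is an assumption-laden illustration, not the library's implementation: `make_supervision_list` is a hypothetical helper, it uses Python's standard `random` module as a dependency-free stand-in for numpy's generator, and the `<=` length comparison is a guess at how the maximum question length is applied.

```python
import random
from typing import List, Sequence


def make_supervision_list(
    question_lengths: Sequence[int],
    num_supervision: int,
    supervision_question_max_length: int = 30,
    seed: int = 0,
) -> List[int]:
    """Return one 0/1 flag per example; 1 marks an example picked for supervision."""
    # Only questions that are short enough are eligible (assumed inclusive bound).
    eligible = [
        index
        for index, length in enumerate(question_lengths)
        if length <= supervision_question_max_length
    ]
    # A dedicated, seeded generator makes the selection reproducible.
    rng = random.Random(seed)
    chosen = rng.sample(eligible, min(num_supervision, len(eligible)))

    flags = [0] * len(question_lengths)
    for index in chosen:
        flags[index] = 1
    return flags


# Illustrative question lengths; two of them exceed the maximum of 30.
lengths = [12, 45, 8, 30, 33, 19]
first = make_supervision_list(lengths, num_supervision=2, seed=42)
second = make_supervision_list(lengths, num_supervision=2, seed=42)
```

Because both calls use the same seed, `first` and `second` are identical, which is the property that lets two datasets built with the same seed agree on which examples are supervised.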

get_supervision_list(self)[source]

Return a list of 1’s and 0’s indicating which examples have program supervision during question coding. Used by SupervisionWeightedRandomSampler to form mini-batches with nearly equal numbers of examples with and without program supervision.
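One way such a list could drive balanced sampling is to give each group (supervised and unsupervised) half of the total sampling probability. The sketch below is hypothetical and does not reproduce SupervisionWeightedRandomSampler; `supervision_weights` is an illustrative name, and the standard-library `random.choices` stands in for a weighted sampler.

```python
import random
from typing import List


def supervision_weights(supervision_list: List[int]) -> List[float]:
    """Per-example sampling weights that split probability evenly between groups."""
    num_with = sum(supervision_list)
    num_without = len(supervision_list) - num_with
    if num_with == 0 or num_without == 0:
        # Only one group exists, so fall back to uniform sampling.
        return [1.0] * len(supervision_list)
    # Each group shares a total weight of 0.5 equally among its members.
    return [
        0.5 / num_with if flag == 1 else 0.5 / num_without
        for flag in supervision_list
    ]


# Illustrative list: 2 supervised examples out of 10.
supervision_list = [1, 0, 0, 0, 0, 0, 0, 1, 0, 0]
weights = supervision_weights(supervision_list)

# Draw a mini-batch of 8 indices (with replacement) using the weights.
rng = random.Random(0)
batch_indices = rng.choices(range(len(supervision_list)), weights=weights, k=8)
```

With these weights a supervised example is four times as likely to be drawn as an unsupervised one, so in expectation half of each mini-batch is supervised even though only 2 of the 10 examples carry the flag.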

class probnmn.data.datasets.ModuleTrainingDataset(tokens_h5path: str, features_h5path: str, in_memory: bool = True)[source]

Bases: torch.utils.data.dataset.Dataset

Provides questions, image features and answers for module training. Programs are inferred by the ProgramGenerator trained during Question Coding.

Parameters
tokens_h5path: str

Path to an HDF file to initialize the underlying reader.

features_h5path: str

Path to an HDF file containing a ‘dataset’ of pre-extracted image features.

in_memory: bool, optional (default = True)

Whether to load all image features in memory.

class probnmn.data.datasets.JointTrainingDataset(tokens_h5path: str, features_h5path: str, num_supervision: int = 699989, supervision_question_max_length: int = 30, in_memory: bool = True)[source]

Bases: torch.utils.data.dataset.Dataset

Provides questions, programs, a supervision flag, image features and answers for Joint Training. If the same numpy random seed is used, the supervision list is identical to the one in QuestionCodingDataset.

Parameters
tokens_h5path: str

Path to an HDF file to initialize the underlying reader.

features_h5path: str

Path to an HDF file containing a ‘dataset’ of pre-extracted image features.

num_supervision: int, optional (default = 699989)

Number of examples that carry program supervision over questions, used to train ProgramGenerator.

supervision_question_max_length: int, optional (default = 30)

Maximum length of question for picking examples with program supervision.

in_memory: bool, optional (default = True)

Whether to load all image features in memory.

Notes

For a fixed numpy random seed, the randomly generated supervision list will always be the same.

get_supervision_list(self)[source]

Return a list of 1’s and 0’s indicating which examples have program supervision during question coding. Used by SupervisionWeightedRandomSampler to form mini-batches with nearly equal numbers of examples with and without program supervision.