probnmn.data.datasets
-
class probnmn.data.datasets.ProgramPriorDataset(tokens_h5path: str)
Bases: torch.utils.data.dataset.Dataset
Provides programs as tokenized sequences to train ProgramPrior (probnmn.models.program_prior.ProgramPrior).
- Parameters
- tokens_h5path: str
Path to an HDF file to initialize the underlying reader.
-
class probnmn.data.datasets.QuestionCodingDataset(tokens_h5path: str, num_supervision: int = 699989, supervision_question_max_length: int = 30)
Bases: torch.utils.data.dataset.Dataset
Provides questions and programs as tokenized sequences for Question Coding. It also provides a “supervision” flag, which can act as a mask when batched, to tune the amount of program supervision on ProgramGenerator.
- Parameters
- tokens_h5path: str
Path to an HDF file to initialize the underlying reader.
- num_supervision: int, optional (default = 699989)
Number of examples that have program supervision over questions, for ProgramGenerator.
- supervision_question_max_length: int, optional (default = 30)
Maximum question length for picking examples with program supervision.
Notes
For a fixed numpy random seed, the randomly generated supervision list will always be the same.
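The interplay of num_supervision, supervision_question_max_length and the random seed can be illustrated with a small sketch. This is not the library's implementation: the helper name make_supervision_list, the "length <= limit" eligibility rule, and the use of a local RandomState instead of numpy's global seed are all assumptions made for illustration.

```python
import numpy as np


def make_supervision_list(question_lengths, num_supervision, max_length, seed):
    """Hypothetical helper: mark `num_supervision` short-enough questions with 1."""
    # Local generator so the sketch does not disturb numpy's global state.
    rng = np.random.RandomState(seed)
    lengths = np.asarray(question_lengths)
    # Assumption: a question is eligible if its length is at most `max_length`.
    eligible = np.flatnonzero(lengths <= max_length)
    # Never request more supervised examples than there are eligible ones.
    k = min(num_supervision, len(eligible))
    chosen = rng.choice(eligible, size=k, replace=False)
    flags = np.zeros(len(lengths), dtype=np.int64)
    flags[chosen] = 1
    return flags.tolist()


lengths = [5, 12, 45, 8, 31, 20, 7, 50]
first = make_supervision_list(lengths, num_supervision=3, max_length=30, seed=0)
second = make_supervision_list(lengths, num_supervision=3, max_length=30, seed=0)
```

With the same seed, first and second are identical, and questions longer than the limit (indices 2, 4 and 7 here) are never flagged.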
-
get_supervision_list(self)
Return a list of 1’s and 0’s indicating which examples have program supervision during question coding. Used by SupervisionWeightedRandomSampler to form mini-batches with a nearly equal number of examples with and without program supervision.
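One way a supervision list could be turned into roughly balanced mini-batches is to give each group the same total sampling weight. The sketch below uses only the standard library and is not the code of SupervisionWeightedRandomSampler; the helper name balanced_weights is hypothetical, and it assumes both groups are non-empty.

```python
import random


def balanced_weights(supervision_list):
    """Hypothetical helper: per-example weights giving both groups equal total mass."""
    num_supervised = sum(supervision_list)
    num_unsupervised = len(supervision_list) - num_supervised
    # Assumption: both groups contain at least one example.
    return [
        1.0 / num_supervised if flag == 1 else 1.0 / num_unsupervised
        for flag in supervision_list
    ]


supervision_list = [1, 0, 0, 0, 0, 0, 0, 1, 0, 0]
weights = balanced_weights(supervision_list)

# Draw one mini-batch of indices with replacement, proportional to the weights.
rng = random.Random(0)
batch_indices = rng.choices(range(len(supervision_list)), weights=weights, k=8)
```

In a PyTorch pipeline, weights of this kind could plausibly be passed to torch.utils.data.WeightedRandomSampler, although whether the library's sampler does so is not stated here.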
-
class probnmn.data.datasets.ModuleTrainingDataset(tokens_h5path: str, features_h5path: str, in_memory: bool = True)
Bases: torch.utils.data.dataset.Dataset
Provides questions, image features and answers for module training. Programs are inferred by ProgramGenerator trained during Question Coding.
- Parameters
- tokens_h5path: str
Path to an HDF file to initialize the underlying reader.
- features_h5path: str
Path to an HDF file containing a ‘dataset’ of pre-extracted image features.
- in_memory: bool, optional (default = True)
Whether to load all image features in memory.
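The trade-off behind in_memory can be sketched with a toy reader: eager mode reads every feature once at construction, while lazy mode reads from the backing store on each access. FeatureReader and CountingStore are hypothetical names; a plain dict stands in for the HDF file, and nothing here reflects the library's actual reader.

```python
class CountingStore(dict):
    """Stand-in for an on-disk file that counts how often items are read."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.reads = 0

    def __getitem__(self, key):
        self.reads += 1
        return super().__getitem__(key)


class FeatureReader:
    """Hypothetical reader illustrating an `in_memory` switch."""

    def __init__(self, backing_store, in_memory=True):
        self._store = backing_store
        # Eager mode copies every entry up front; lazy mode defers all reads.
        self._cache = (
            {key: backing_store[key] for key in backing_store} if in_memory else None
        )

    def __len__(self):
        return len(self._store)

    def __getitem__(self, index):
        if self._cache is not None:
            return self._cache[index]
        return self._store[index]


eager_store = CountingStore({0: [0.1, 0.2], 1: [0.3, 0.4], 2: [0.5, 0.6]})
eager = FeatureReader(eager_store, in_memory=True)
eager[0], eager[0]  # served from the in-memory copy

lazy_store = CountingStore({0: [0.1, 0.2], 1: [0.3, 0.4], 2: [0.5, 0.6]})
lazy = FeatureReader(lazy_store, in_memory=False)
lazy[0], lazy[0]  # each access goes back to the store
```

Eager loading pays the full read cost once and needs enough memory for all features; lazy loading keeps memory low at the price of a read per access.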
-
class probnmn.data.datasets.JointTrainingDataset(tokens_h5path: str, features_h5path: str, num_supervision: int = 699989, supervision_question_max_length: int = 30, in_memory: bool = True)
Bases: torch.utils.data.dataset.Dataset
Provides questions, programs, supervision flag, image features and answers for Joint Training. If the numpy random seed is set to the same value as for Question Coding, the supervision list matches the one in QuestionCodingDataset.
- Parameters
- tokens_h5path: str
Path to an HDF file to initialize the underlying reader.
- features_h5path: str
Path to an HDF file containing a ‘dataset’ of pre-extracted image features.
- num_supervision: int, optional (default = 699989)
Number of examples that have program supervision over questions, for ProgramGenerator.
- supervision_question_max_length: int, optional (default = 30)
Maximum question length for picking examples with program supervision.
- in_memory: bool, optional (default = True)
Whether to load all image features in memory.
Notes
For a fixed numpy random seed, the randomly generated supervision list will always be the same.
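Why the seed has to be set with care can be shown with numpy's global random state: two draws agree only if the state is reset to the same seed before each one. The helper draw_supervised_indices is hypothetical and merely stands in for whatever random selection the datasets perform.

```python
import numpy as np


def draw_supervised_indices(num_examples, num_supervision):
    """Hypothetical stand-in: pick supervised indices using numpy's global state."""
    return sorted(np.random.permutation(num_examples)[:num_supervision].tolist())


np.random.seed(0)
question_coding_indices = draw_supervised_indices(1000, 100)

np.random.seed(0)  # reset the global state before the second draw
joint_training_indices = draw_supervised_indices(1000, 100)

# No reset here: the global state has advanced, so the selection differs.
unseeded_indices = draw_supervised_indices(1000, 100)
```

Any other call that consumes numpy's global random state between seeding and dataset construction would shift the selection in the same way.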
-
get_supervision_list(self)
Return a list of 1’s and 0’s indicating which examples have program supervision during question coding. Used by SupervisionWeightedRandomSampler to form mini-batches with a nearly equal number of examples with and without program supervision.
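The per-example supervision flag can act as a mask once examples are batched: only flagged examples contribute to a program-supervision loss. A minimal sketch in plain Python follows; the function name masked_mean_loss and the averaging rule are assumptions, not the library's loss.

```python
def masked_mean_loss(per_example_losses, supervision_flags):
    """Hypothetical helper: average the loss over supervised examples only."""
    # Multiplying by the 0/1 flag zeroes out unsupervised examples.
    total = sum(
        loss * flag for loss, flag in zip(per_example_losses, supervision_flags)
    )
    count = sum(supervision_flags)
    # Assumption: a batch without any supervised example contributes zero loss.
    return total / count if count > 0 else 0.0


batch_losses = [0.5, 2.0, 1.5, 4.0]
batch_flags = [1, 0, 1, 0]
program_loss = masked_mean_loss(batch_losses, batch_flags)
```

Here only the first and third examples count, so program_loss is the mean of 0.5 and 1.5.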