virtex.data.datasets


Pretraining Datasets

class virtex.data.datasets.coco_captions.CocoCaptionsDataset(data_root: str, split: str)[source]

Bases: torch.utils.data.dataset.Dataset

A PyTorch dataset to read the COCO Captions dataset and provide it completely unprocessed. This dataset is used by various task-specific datasets in the datasets module.

Parameters
  • data_root – Path to the COCO dataset root directory.

  • split – Name of COCO 2017 split to read. One of {"train", "val"}.
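
A minimal usage sketch. The dataset path below and the structure of a returned item are assumptions for illustration, not part of the documented API:

    from virtex.data.datasets.coco_captions import CocoCaptionsDataset

    # Read raw, unprocessed image-caption data from COCO train2017.
    # "datasets/coco" is a hypothetical path to the COCO root directory.
    dataset = CocoCaptionsDataset(data_root="datasets/coco", split="train")

    print(len(dataset))    # number of images in the split
    instance = dataset[0]  # index like any torch Dataset; item format is assumed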

class virtex.data.datasets.captioning.CaptioningDataset(data_root: str, split: str, tokenizer: virtex.data.tokenizers.SentencePieceBPETokenizer, image_transform: Callable = Compose([SmallestMaxSize(always_apply=False, p=1.0, max_size=256, interpolation=1), CenterSquareCrop(always_apply=False, p=1.0, height=224, width=224), Normalize(always_apply=False, p=1.0, mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225), max_pixel_value=255.0)], p=1.0, bbox_params=None, keypoint_params=None, additional_targets={}), max_caption_length: int = 30)[source]

Bases: torch.utils.data.dataset.Dataset

A dataset which provides image-caption (forward and backward) pairs from a COCO Captions annotation file. This is used for pretraining tasks which use captions: bicaptioning, forward captioning, and token classification.

Parameters
  • data_root – Path to dataset directory containing images and annotations.

  • split – Name of COCO 2017 split to read. One of {"train", "val"}.

  • tokenizer – Tokenizer which maps word tokens to their integer IDs.

  • image_transform – List of image transformations, from either albumentations or virtex.data.transforms.

  • max_caption_length – Maximum number of tokens to keep in the tokenized caption. Extra tokens will be trimmed from the right end of the token list.
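
A sketch of typical pretraining-style usage with a DataLoader. The tokenizer's constructor argument (a path to a trained SentencePiece model) and all paths are assumptions:

    from torch.utils.data import DataLoader
    from virtex.data.datasets.captioning import CaptioningDataset
    from virtex.data.tokenizers import SentencePieceBPETokenizer

    # Hypothetical vocabulary path; a SentencePiece model must be trained first.
    tokenizer = SentencePieceBPETokenizer("datasets/vocab/coco_10k.model")

    dataset = CaptioningDataset(
        data_root="datasets/coco",  # assumed COCO layout: images + annotations
        split="train",
        tokenizer=tokenizer,
        max_caption_length=30,      # longer captions are trimmed from the right
    )
    # Caption tokens may be variable-length per item, so a custom collate_fn
    # may be required here; this detail is left out of the sketch.
    loader = DataLoader(dataset, batch_size=32, num_workers=4)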

class virtex.data.datasets.classification.TokenClassificationDataset(data_root: str, split: str, tokenizer: virtex.data.tokenizers.SentencePieceBPETokenizer, image_transform: Callable = Compose([SmallestMaxSize(always_apply=False, p=1.0, max_size=256, interpolation=1), CenterSquareCrop(always_apply=False, p=1.0, height=224, width=224), Normalize(always_apply=False, p=1.0, mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225), max_pixel_value=255.0)], p=1.0, bbox_params=None, keypoint_params=None, additional_targets={}), max_caption_length: int = 30)[source]

Bases: torch.utils.data.dataset.Dataset

A dataset which provides image-labelset pairs from a COCO Captions annotation file. The set of caption tokens (unordered) is treated as a labelset.

Parameters
  • data_root – Path to dataset directory containing images and annotations.

  • split – Name of COCO 2017 split to read. One of {"train", "val"}.

  • tokenizer – Tokenizer which maps word tokens to their integer IDs.

  • image_transform – List of image transformations, from either albumentations or virtex.data.transforms.

  • max_caption_length – Maximum number of tokens to keep in the tokenized caption. Extra tokens will be trimmed from the right end of the token list.
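
To make the labelset idea concrete: the ordered token IDs of a caption are reduced to an unordered set of unique IDs, which serves as the classification target. A toy illustration:

    # A tokenized caption such as [42, 7, 981, 7, 15] becomes the
    # unordered labelset {7, 15, 42, 981} (duplicates and order dropped).
    caption_tokens = [42, 7, 981, 7, 15]
    labelset = sorted(set(caption_tokens))
    print(labelset)  # [7, 15, 42, 981]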

class virtex.data.datasets.classification.MultiLabelClassificationDataset(data_root: str, split: str, image_transform: Callable = Compose([SmallestMaxSize(always_apply=False, p=1.0, max_size=256, interpolation=1), CenterSquareCrop(always_apply=False, p=1.0, height=224, width=224), Normalize(always_apply=False, p=1.0, mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225), max_pixel_value=255.0)], p=1.0, bbox_params=None, keypoint_params=None, additional_targets={}))[source]

Bases: torch.utils.data.dataset.Dataset

A dataset which provides image-labelset pairs from COCO instance annotation files. This is used for the multi-label classification pretraining task.

Parameters
  • data_root – Path to dataset directory containing images and annotations.

  • split – Name of COCO 2017 split to read. One of {"train", "val"}.

  • image_transform – List of image transformations, from either albumentations or virtex.data.transforms.
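
A usage sketch; unlike TokenClassificationDataset, the labelset here comes from COCO instance annotations (object categories present in the image), so no tokenizer is needed. Paths and item structure are assumptions:

    from virtex.data.datasets.classification import MultiLabelClassificationDataset

    # "datasets/coco" is a hypothetical path; instance annotation JSONs
    # (e.g. for the train2017 split) are assumed to live under it.
    dataset = MultiLabelClassificationDataset(data_root="datasets/coco", split="train")

    # Image paired with its set of category labels (format assumed).
    instance = dataset[0]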


Downstream Datasets

class virtex.data.datasets.downstream.ImageNetDataset(data_root: str = 'datasets/imagenet', split: str = 'train', image_transform: Callable = Compose([SmallestMaxSize(always_apply=False, p=1.0, max_size=256, interpolation=1), CenterSquareCrop(always_apply=False, p=1.0, height=224, width=224), Normalize(always_apply=False, p=1.0, mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225), max_pixel_value=255.0)], p=1.0, bbox_params=None, keypoint_params=None, additional_targets={}))[source]

Bases: torchvision.datasets.imagenet.ImageNet

Simple wrapper over torchvision's ImageNet dataset. The image transform is handled here instead of being passed to the superclass.

Parameters
  • data_root – Path to the ImageNet dataset directory.

  • split – Which split to read from. One of {"train", "val"}.

  • image_transform – List of image transformations, from either albumentations or virtex.data.transforms.
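
A sketch of downstream usage, e.g. for linear evaluation of a pretrained visual backbone. The item format (image tensor, integer class label) is assumed to follow torchvision's ImageNet:

    from torch.utils.data import DataLoader
    from virtex.data.datasets.downstream import ImageNetDataset

    train_dataset = ImageNetDataset(data_root="datasets/imagenet", split="train")
    val_dataset = ImageNetDataset(data_root="datasets/imagenet", split="val")

    # Shuffle for training; the default image_transform resizes, crops,
    # and normalizes images as listed in the signature above.
    train_loader = DataLoader(train_dataset, batch_size=256, shuffle=True, num_workers=8)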

class virtex.data.datasets.downstream.INaturalist2018Dataset(data_root: str = 'datasets/inaturalist', split: str = 'train', image_transform: Callable = Compose([SmallestMaxSize(always_apply=False, p=1.0, max_size=256, interpolation=1), CenterSquareCrop(always_apply=False, p=1.0, height=224, width=224), Normalize(always_apply=False, p=1.0, mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225), max_pixel_value=255.0)], p=1.0, bbox_params=None, keypoint_params=None, additional_targets={}))[source]

Bases: torch.utils.data.dataset.Dataset

A dataset which provides image-label pairs from the iNaturalist 2018 dataset.

Parameters
  • data_root – Path to the iNaturalist 2018 dataset directory.

  • split – Which split to read from. One of {"train", "val"}.

  • image_transform – List of image transformations, from either albumentations or virtex.data.transforms.
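
Instantiation mirrors the other downstream datasets; the directory layout under data_root is an assumption:

    from virtex.data.datasets.downstream import INaturalist2018Dataset

    dataset = INaturalist2018Dataset(data_root="datasets/inaturalist", split="train")
    print(len(dataset))  # number of image-label pairs in the split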

class virtex.data.datasets.downstream.VOC07ClassificationDataset(data_root: str = 'datasets/VOC2007', split: str = 'trainval', image_transform: Callable = Compose([SmallestMaxSize(always_apply=False, p=1.0, max_size=256, interpolation=1), CenterSquareCrop(always_apply=False, p=1.0, height=224, width=224), Normalize(always_apply=False, p=1.0, mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225), max_pixel_value=255.0)], p=1.0, bbox_params=None, keypoint_params=None, additional_targets={}))[source]

Bases: torch.utils.data.dataset.Dataset

A dataset which provides image-label pairs from the PASCAL VOC 2007 dataset.

Parameters
  • data_root – Path to VOC 2007 directory containing sub-directories named Annotations, ImageSets, and JPEGImages.

  • split – Which split to read from. One of {"trainval", "test"}.

  • image_transform – List of image transformations, from either albumentations or virtex.data.transforms.
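
VOC 2007 classification is a multi-label task over 20 object categories, so each item pairs an image with a multi-label target rather than a single class index. A sketch; the exact label encoding is an assumption:

    from virtex.data.datasets.downstream import VOC07ClassificationDataset

    trainval = VOC07ClassificationDataset(data_root="datasets/VOC2007", split="trainval")
    test = VOC07ClassificationDataset(data_root="datasets/VOC2007", split="test")

    # Each item is assumed to pair an image with a 20-dim multi-label target.
    instance = trainval[0]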

class virtex.data.datasets.downstream.ImageDirectoryDataset(data_root: str, image_transform: Callable = Compose([SmallestMaxSize(always_apply=False, p=1.0, max_size=256, interpolation=1), CenterSquareCrop(always_apply=False, p=1.0, height=224, width=224), Normalize(always_apply=False, p=1.0, mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225), max_pixel_value=255.0)], p=1.0, bbox_params=None, keypoint_params=None, additional_targets={}))[source]

Bases: torch.utils.data.dataset.Dataset

A dataset which reads images from any directory. This class is useful for running image captioning inference with our models on arbitrary images.

Parameters
  • data_root – Path to a directory containing images.

  • image_transform – List of image transformations, from either albumentations or virtex.data.transforms.
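
A sketch of captioning inference over a folder of arbitrary images; the batch format is an assumption:

    from torch.utils.data import DataLoader
    from virtex.data.datasets.downstream import ImageDirectoryDataset

    # Point at any directory of images; no annotations are required.
    dataset = ImageDirectoryDataset(data_root="my_photos/")
    loader = DataLoader(dataset, batch_size=16, num_workers=2)

    for batch in loader:
        # Feed the transformed images to a trained captioning model here;
        # the keys/format of `batch` are assumptions for illustration.
        ...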