torchaudio.datasets

All datasets are subclasses of torch.utils.data.Dataset, i.e., they implement the __getitem__ and __len__ methods. Hence, they can all be passed to a torch.utils.data.DataLoader, which can load multiple samples in parallel using torch.multiprocessing workers. For example:

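import torch
import torchaudio

# Download the YESNO dataset into the current directory if it is not already there.
yesno_data = torchaudio.datasets.YESNO('.', download=True)
data_loader = torch.utils.data.DataLoader(yesno_data,
                                          batch_size=1,
                                          shuffle=True,
                                          num_workers=2)  # number of worker processes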
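
As a rough illustration of why implementing __getitem__ and __len__ is enough, the following single-process sketch walks a map-style dataset in (optionally shuffled) index order and groups samples into batches. It is a simplification under stated assumptions and not how torch.utils.data.DataLoader is implemented; the names iterate_batches and seed are hypothetical, and a plain list of (audio, target) pairs stands in for a real dataset.

```python
import random


def iterate_batches(dataset, batch_size, shuffle=False, seed=0):
    """Yield lists of samples using only len(dataset) and dataset[i].

    Hypothetical helper for illustration; not part of torch or torchaudio.
    """
    indices = list(range(len(dataset)))
    if shuffle:
        # A seeded generator keeps the shuffled order reproducible.
        random.Random(seed).shuffle(indices)
    for start in range(0, len(indices), batch_size):
        yield [dataset[i] for i in indices[start:start + batch_size]]


# A plain list already provides __len__ and __getitem__, so it can stand in
# for a dataset of (audio, target) pairs.
toy_dataset = [([0.1, 0.2], 0), ([0.3], 1), ([0.4, 0.5, 0.6], 0)]
ordered_batches = list(iterate_batches(toy_dataset, batch_size=2))
shuffled_batches = list(iterate_batches(toy_dataset, batch_size=2, shuffle=True))
```

The real DataLoader adds worker processes, collation into tensors, and samplers on top of this basic index-based access.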

The following datasets are available:

Datasets

VCTK
YESNO

All the datasets have a similar API. They share two common arguments, transform and target_transform, which transform the input and the target respectively.
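
The sketch below shows one plausible way a dataset could apply transform and target_transform inside __getitem__. ToyAudioDataset and peak_normalize are hypothetical stand-ins written in plain Python so the example runs without downloading anything; they are not part of torchaudio, and the actual classes may apply the callables differently.

```python
class ToyAudioDataset:
    """Hypothetical stand-in for an audio dataset; not a torchaudio class."""

    def __init__(self, samples, transform=None, target_transform=None):
        self.samples = list(samples)  # list of (audio, target) pairs
        self.transform = transform
        self.target_transform = target_transform

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, index):
        audio, target = self.samples[index]
        # Each callable is applied only if the caller supplied one.
        if self.transform is not None:
            audio = self.transform(audio)
        if self.target_transform is not None:
            target = self.target_transform(target)
        return audio, target


def peak_normalize(audio):
    """Scale a list of samples so the largest magnitude becomes 1."""
    peak = max(abs(x) for x in audio) or 1.0  # avoid dividing by zero on silence
    return [x / peak for x in audio]


dataset = ToyAudioDataset(
    [([0.5, -2.0, 1.0], 3), ([0.0, 0.0], 7)],
    transform=peak_normalize,
    target_transform=lambda t: t % 2,  # fold class indices into two groups
)
first_audio, first_target = dataset[0]
```

With a real dataset, the same pattern would let a callable such as transforms.Spectrogram be passed as transform.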

VCTK

class torchaudio.datasets.VCTK(root, downsample=True, transform=None, target_transform=None, download=False, dev_mode=False)

VCTK Dataset. alternate url

Parameters

root (str) – Root directory of dataset where processed/training.pt and processed/test.pt exist.
downsample (bool, optional) – Whether to downsample the signal (Default: True)
transform (Callable, optional) – A function/transform that takes in a raw audio signal and returns a transformed version, e.g. transforms.Spectrogram. (Default: None)
target_transform (Callable, optional) – A function/transform that takes in the target and transforms it. (Default: None)
download (bool, optional) – If true, downloads the dataset from the internet and puts it in the root directory. If the dataset is already downloaded, it is not downloaded again. (Default: False)
dev_mode (bool, optional) – If true, clean up is not performed on downloaded files. Useful to keep raw audio and transcriptions. (Default: False)

__getitem__(index)

Parameters: index (int) – Index
Returns: The output tuple (audio, target), where target is the index of the target class.
Return type: Tuple[torch.Tensor, int]

YESNO

class torchaudio.datasets.YESNO(root, transform=None, target_transform=None, download=False, dev_mode=False)

YesNo Hebrew Dataset.

Parameters

root (str) – Root directory of dataset where processed/training.pt and processed/test.pt exist.
transform (Callable, optional) – A function/transform that takes in a raw audio signal and returns a transformed version, e.g. transforms.Spectrogram. (Default: None)
target_transform (Callable, optional) – A function/transform that takes in the target and transforms it. (Default: None)
download (bool, optional) – If true, downloads the dataset from the internet and puts it in the root directory. If the dataset is already downloaded, it is not downloaded again. (Default: False)
dev_mode (bool, optional) – If true, clean up is not performed on downloaded files. Useful to keep raw audio and transcriptions. (Default: False)

__getitem__(index)

Parameters: index (int) – Index
Returns: The output tuple (audio, target), where target is the index of the target class.
Return type: Tuple[torch.Tensor, int]
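
Recordings usually differ in length, so the tuples returned by __getitem__ cannot always be stacked into one batch as they are. The sketch below shows one common workaround, padding every recording in a batch to the longest one. pad_collate is a hypothetical helper that works on plain Python lists for illustration; with tensors, a function of this shape could be passed to DataLoader through its collate_fn argument, and torch.nn.utils.rnn.pad_sequence offers similar padding.

```python
def pad_collate(batch, pad_value=0.0):
    """Pad variable-length audio in a batch of (audio, target) pairs.

    Hypothetical helper for illustration; not part of torchaudio.
    Returns the padded audio, the original lengths, and the targets.
    """
    lengths = [len(audio) for audio, _ in batch]
    longest = max(lengths)
    padded = [
        list(audio) + [pad_value] * (longest - len(audio))
        for audio, _ in batch
    ]
    targets = [target for _, target in batch]
    return padded, lengths, targets


padded, lengths, targets = pad_collate([([1.0, 2.0, 3.0], 0), ([4.0], 1)])
```

Returning the original lengths alongside the padded audio lets later steps ignore the padding.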
