cyto_dl.datamodules.dataframe.utils module
- class cyto_dl.datamodules.dataframe.utils.AlternatingBatchSampler(subset: torch.utils.data.dataset.Subset, target_columns: typing.Sequence[str] | None = None, grouping_column: typing.Sequence[str] | None = None, batch_size: int = 1, drop_last: bool = False, shuffle: bool = False, sampler: torch.utils.data.sampler.Sampler = <class 'torch.utils.data.sampler.SubsetRandomSampler'>)
Bases: BatchSampler
Subclass of PyTorch’s BatchSampler that alternates between sampling from mutually exclusive columns of a dataframe dataset.
- Parameters:
subset (Subset) – Subset of a MONAI dataset wrapping a dataframe
target_columns (Sequence[str]) – Names of the columns in the subset dataframe that represent the types of ground truth images to alternate between
batch_size (int) – Number of samples per batch
drop_last (bool=False) – Whether to drop the last batch if it is incomplete
shuffle (bool=False) – Whether to randomly select between the columns in target_columns. If False, batches follow the order of target_columns
sampler (Sampler=SubsetRandomSampler) – Sampler used to draw samples from each column in target_columns
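The alternating behaviour described above can be illustrated with a minimal pure-Python sketch. This is not the cyto_dl implementation: the helper names `is_missing` and `alternating_batches`, the example column names, and the choice to stop once the column with the fewest batches is exhausted are all assumptions made for illustration. It covers only the `shuffle=False` case, where batches follow the order of `target_columns`.

```python
import math


def is_missing(value):
    """Treat None and float NaN as a missing target (illustrative assumption)."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def alternating_batches(rows, target_columns, batch_size=1, drop_last=False):
    """Yield (column, row_indices) batches, cycling through target_columns in order.

    Hypothetical helper: `rows` is a list of dicts standing in for dataframe rows.
    """
    batches = {}
    for col in target_columns:
        # Row indices whose value in this target column is populated.
        idx = [i for i, row in enumerate(rows) if not is_missing(row.get(col))]
        # Split the indices into consecutive chunks of batch_size.
        chunks = [idx[i:i + batch_size] for i in range(0, len(idx), batch_size)]
        if drop_last and chunks and len(chunks[-1]) < batch_size:
            chunks.pop()
        batches[col] = chunks

    # Assumption: emit the same number of batches per column, so every
    # round contains exactly one batch from each target column.
    n_rounds = min((len(b) for b in batches.values()), default=0)
    for r in range(n_rounds):
        for col in target_columns:
            yield col, batches[col][r]


nan = float("nan")
rows = [
    {"raw": "a.tif", "seg": "a_seg.tif", "nuc": nan},
    {"raw": "b.tif", "seg": nan, "nuc": "b_nuc.tif"},
    {"raw": "c.tif", "seg": "c_seg.tif", "nuc": nan},
    {"raw": "d.tif", "seg": nan, "nuc": "d_nuc.tif"},
]
schedule = list(alternating_batches(rows, ["seg", "nuc"], batch_size=1))
```

Each batch in `schedule` contains rows for a single target column, so a multi-task model sees one task per batch.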
- class cyto_dl.datamodules.dataframe.utils.RemoveNaNKeysd
Bases: Transform
Transform to remove ‘nan’ keys from data dictionary.
When combined with adding allow_missing_keys=True to transforms and the alternating batch sampler, this allows multi-task training when only one target is available at a time.
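As a rough sketch of the idea, the snippet below drops entries whose value is NaN from a sample dictionary and then applies a step only to the keys that remain. It is not the RemoveNaNKeysd source: `remove_nan_keys` and `apply_to_present_keys` are hypothetical helpers, the latter loosely imitating what `allow_missing_keys=True` permits, and the interpretation of "‘nan’ keys" as keys with a float NaN value is an assumption.

```python
import math


def remove_nan_keys(data):
    """Return a copy of `data` without entries whose value is a float NaN."""
    return {
        key: value
        for key, value in data.items()
        if not (isinstance(value, float) and math.isnan(value))
    }


def apply_to_present_keys(data, keys, fn):
    """Apply `fn` to each listed key that exists, silently skipping absent keys."""
    out = dict(data)
    for key in keys:
        if key in out:
            out[key] = fn(out[key])
    return out


# A row where only the "seg" target is available; "nuc" is NaN.
sample = {"raw": "raw.tif", "seg": "seg.tif", "nuc": float("nan")}
cleaned = remove_nan_keys(sample)

# The downstream step lists both targets but only touches the one present.
processed = apply_to_present_keys(cleaned, ["seg", "nuc"], str.upper)
```

Without the removal step, the downstream transform would receive a NaN in place of an image path for the unavailable target.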
- cyto_dl.datamodules.dataframe.utils.get_dataset(dataframe, transform, split, cache_dir=None, smartcache_args=None)
- cyto_dl.datamodules.dataframe.utils.make_multiple_dataframe_splits(split_path, transforms, columns=None, just_inference=False, cache_dir=None, smartcache_args=None)