cyto_dl.datamodules.folder module

cyto_dl.datamodules.folder.make_folder_dataloader(path: UPath | str, transforms: Sequence[Callable] | Callable, cache_dir: UPath | str | None = None, regex: str | None = None, startswith: str | None = None, endswith: str | None = None, contains: str | None = None, excludes: str | None = None, **dataloader_kwargs)

Create a dataloader based on a folder of samples. If no transforms are applied, each sample is a dictionary with a key “input” containing the corresponding path and a key “orig_fname” containing the original filename (with no extension).

Files can be filtered by name using the regex, startswith, endswith, contains, and excludes arguments.

Parameters:
  • path (Union[UPath, str]) – Path to the folder containing the samples

  • transforms (Union[Sequence[Callable], Callable]) – Transforms to apply to each sample

  • cache_dir (Optional[Union[UPath, str]] = None) – Path to a directory in which to store cached transformed inputs, to accelerate batch loading.

  • regex (Optional[str] = None) – A regular expression that file names must match

  • startswith (Optional[str] = None) – A substring that matching file names must start with

  • endswith (Optional[str] = None) – A substring that matching file names must end with

  • contains (Optional[str] = None) – A substring that matching file names must contain

  • excludes (Optional[str] = None) – A substring that matching file names must not contain

  • dataloader_kwargs – Additional keyword arguments passed to torch.utils.data.DataLoader when it is instantiated (aside from shuffle, which is only used for the train dataloader). These include num_workers, batch_size, shuffle, etc. See the PyTorch docs for details: https://pytorch.org/docs/stable/data.html#torch.utils.data.DataLoader
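
The name-based filtering and the per-sample dictionary described above can be illustrated with a small standalone sketch. It is not the cyto_dl implementation: keep_name and list_samples are hypothetical helpers, and the sketch assumes that every rule that is set must hold at once, that regex is applied with re.search, and that rules are checked against the file name including its extension. None of these details is stated in this reference.

```python
import re
import tempfile
from pathlib import Path


def keep_name(name, regex=None, startswith=None, endswith=None,
              contains=None, excludes=None):
    """Hypothetical helper: True if `name` passes every rule that is set.

    Assumption: rules combine with AND; the real function may differ.
    """
    if regex is not None and re.search(regex, name) is None:
        return False
    if startswith is not None and not name.startswith(startswith):
        return False
    if endswith is not None and not name.endswith(endswith):
        return False
    if contains is not None and contains not in name:
        return False
    if excludes is not None and excludes in name:
        return False
    return True


def list_samples(folder, **rules):
    """Hypothetical helper: one dict per kept file, using the documented keys."""
    return [
        # "input" holds the path, "orig_fname" the name without its extension
        {"input": str(p), "orig_fname": p.stem}
        for p in sorted(Path(folder).iterdir())
        if p.is_file() and keep_name(p.name, **rules)
    ]


# Build a throwaway folder with a few empty files to filter.
with tempfile.TemporaryDirectory() as tmp:
    for fname in ("cell_001.tiff", "cell_002.tiff", "cell_002_mask.tiff", "notes.txt"):
        (Path(tmp) / fname).touch()

    samples = list_samples(tmp, startswith="cell_", endswith=".tiff", excludes="mask")
    names = [s["orig_fname"] for s in samples]

print(names)
```

With these assumed semantics, the mask file and the text file are dropped and only the two raw image entries remain. When calling make_folder_dataloader itself, the same keyword names (regex, startswith, endswith, contains, excludes) are passed directly, and any remaining keywords such as batch_size or num_workers are forwarded to the DataLoader.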