cyto_dl.nn.vits.blocks.patchify.patchify_base module#

class cyto_dl.nn.vits.blocks.patchify.patchify_base.PatchifyBase(patch_size: List[int], emb_dim: int, n_patches: List[int], spatial_dims: int = 3, context_pixels: List[int] = [0, 0, 0], input_channels: int = 1, tasks: List[str] | None = [], learnable_pos_embedding: bool = True)[source]#

Bases: Module, ABC

Base class for converting images to a masked sequence of patches with positional embeddings.

Parameters:
  • patch_size (List[int]) – Size of each patch in pixels (ZYX order for 3D, YX order for 2D)

  • emb_dim (int) – Dimension of encoder

  • n_patches (List[int]) – Number of patches in each spatial dimension (ZYX order for 3D, YX order for 2D)

  • spatial_dims (int) – Number of spatial dimensions

  • context_pixels (List[int]) – Number of extra pixels around each patch to include in the convolutional embedding to the encoder dimension.

  • input_channels (int) – Number of input channels

  • tasks (List[str]) – List of tasks to encode

  • learnable_pos_embedding (bool) – If True, learnable positional embeddings are used. If False, fixed sin/cos positional embeddings are used. Empirically, fixed positional embeddings work better for brightfield images.
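As a rough illustration of what patchification and masking mean here (a minimal numpy sketch of the concept, not cyto_dl's actual implementation, which uses convolutional embeddings and positional embeddings), an image can be split into a sequence of flattened patch tokens and a random subset kept visible according to a mask ratio:

```python
import numpy as np

def patchify(img, patch_size):
    """Split a (C, Z, Y, X) image into a sequence of flattened patches."""
    c, z, y, x = img.shape
    pz, py, px = patch_size
    # Group voxels into non-overlapping patches, then flatten each patch.
    patches = img.reshape(c, z // pz, pz, y // py, py, x // px, px)
    patches = patches.transpose(1, 3, 5, 0, 2, 4, 6).reshape(-1, c * pz * py * px)
    return patches

def random_mask(n_patches, mask_ratio, rng):
    """Randomly choose which patches remain visible to the encoder."""
    n_visible = int(n_patches * (1 - mask_ratio))
    perm = rng.permutation(n_patches)
    visible = np.zeros(n_patches, dtype=bool)
    visible[perm[:n_visible]] = True
    return visible

rng = np.random.default_rng(0)
img = rng.standard_normal((1, 8, 16, 16))       # 1-channel 3D image
patches = patchify(img, (4, 4, 4))              # (32 patches, 64 voxels each)
visible = random_mask(len(patches), mask_ratio=0.75, rng=rng)
tokens = patches[visible]                       # only visible patches are encoded
```

With a mask ratio of 0.75, only a quarter of the 32 patches (8 tokens) reach the encoder, which is what makes masked-autoencoder pretraining cheap relative to encoding the full image.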

create_conv(input_channels, emb_dim, patch_size, context_pixels)[source]#
create_patch2img(n_patches, patch_size)[source]#

Converts a boolean per-patch keep/drop array into an image-shaped mask the same size as the input image.
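The idea behind such a conversion can be sketched as follows (a hypothetical numpy illustration, not the library's implementation): each patch's boolean decision is broadcast over that patch's voxels, yielding an image-shaped mask.

```python
import numpy as np

def patch2img_mask(keep, n_patches, patch_size):
    """Expand a per-patch boolean array into an image-shaped mask.

    keep       -- flat boolean array, one entry per patch
    n_patches  -- patch-grid shape, e.g. (2, 4, 4) for ZYX
    patch_size -- voxels per patch along each axis, e.g. (4, 4, 4)
    """
    mask = keep.reshape(n_patches)
    # Repeat each patch's decision over its full voxel extent, axis by axis.
    for axis, p in enumerate(patch_size):
        mask = mask.repeat(p, axis=axis)
    return mask

keep = np.zeros(32, dtype=bool)
keep[0] = True                                   # keep only the first patch
mask = patch2img_mask(keep, (2, 4, 4), (4, 4, 4))
# mask has shape (8, 16, 16); exactly the first patch's 4x4x4 voxels are True
```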

abstract extract_visible_tokens()[source]#
forward(img, mask_ratio, task=None)[source]#
get_mask(img, n_visible_patches, num_patches)[source]#
abstract get_mask_args()[source]#
abstract property img2token#