cyto_dl.nn.vits.blocks.patchify.patchify_base module#

class cyto_dl.nn.vits.blocks.patchify.patchify_base.PatchifyBase(patch_size: List[int], emb_dim: int, n_patches: List[int], spatial_dims: int = 3, context_pixels: List[int] = [0, 0, 0], input_channels: int = 1, tasks: List[str] | None = [], learnable_pos_embedding: bool = True)[source]#

Bases: Module, ABC

Base class for converting images to a masked sequence of patches with positional embeddings.

Parameters:
  • patch_size (List[int]) – Size of each patch in pixels (ZYX order for 3D, YX order for 2D)

  • emb_dim (int) – Dimension of encoder

  • n_patches (List[int]) – Number of patches in each spatial dimension (ZYX order for 3D, YX order for 2D)

  • spatial_dims (int) – Number of spatial dimensions

  • context_pixels (List[int]) – Number of extra pixels around each patch to include in the convolutional embedding to the encoder dimension.

  • input_channels (int) – Number of input channels

  • tasks (List[str]) – List of tasks to encode

  • learnable_pos_embedding (bool) – If True, learnable positional embeddings are used. If False, fixed sin/cos positional embeddings are used. Empirically, fixed positional embeddings work better for brightfield images.
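As a rough illustration of what patchification and masking mean here (a minimal numpy sketch of the concept, not cyto_dl's actual implementation, which uses convolutional embeddings and positional embeddings), an image can be split into a sequence of flattened patch tokens and a random subset kept visible according to a mask ratio:

```python
import numpy as np

def patchify(img, patch_size):
    """Split a (C, Z, Y, X) image into a sequence of flattened patches."""
    c, z, y, x = img.shape
    pz, py, px = patch_size
    # Group voxels into non-overlapping patches, then flatten each patch.
    patches = img.reshape(c, z // pz, pz, y // py, py, x // px, px)
    patches = patches.transpose(1, 3, 5, 0, 2, 4, 6).reshape(-1, c * pz * py * px)
    return patches

def random_mask(n_patches, mask_ratio, rng):
    """Randomly choose which patches remain visible to the encoder."""
    n_visible = int(n_patches * (1 - mask_ratio))
    perm = rng.permutation(n_patches)
    visible = np.zeros(n_patches, dtype=bool)
    visible[perm[:n_visible]] = True
    return visible

rng = np.random.default_rng(0)
img = rng.standard_normal((1, 8, 16, 16))       # 1-channel 3D image
patches = patchify(img, (4, 4, 4))              # (32 patches, 64 voxels each)
visible = random_mask(len(patches), mask_ratio=0.75, rng=rng)
tokens = patches[visible]                       # only visible patches are encoded
```

With a mask ratio of 0.75, only a quarter of the 32 patches (8 tokens) reach the encoder, which is what makes masked-autoencoder pretraining cheap relative to encoding the full image.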

create_conv(input_channels, emb_dim, patch_size, context_pixels)[source]#
create_patch2img(n_patches, patch_size)[source]#

Converts a boolean per-patch keep/drop array into an image-shaped mask the same size as the input image.
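The idea behind such a conversion can be sketched as follows (a hypothetical numpy illustration, not the library's implementation): each patch's boolean decision is broadcast over that patch's voxels, yielding an image-shaped mask.

```python
import numpy as np

def patch2img_mask(keep, n_patches, patch_size):
    """Expand a per-patch boolean array into an image-shaped mask.

    keep       -- flat boolean array, one entry per patch
    n_patches  -- patch-grid shape, e.g. (2, 4, 4) for ZYX
    patch_size -- voxels per patch along each axis, e.g. (4, 4, 4)
    """
    mask = keep.reshape(n_patches)
    # Repeat each patch's decision over its full voxel extent, axis by axis.
    for axis, p in enumerate(patch_size):
        mask = mask.repeat(p, axis=axis)
    return mask

keep = np.zeros(32, dtype=bool)
keep[0] = True                                   # keep only the first patch
mask = patch2img_mask(keep, (2, 4, 4), (4, 4, 4))
# mask has shape (8, 16, 16); exactly the first patch's 4x4x4 voxels are True
```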

abstract extract_visible_tokens()[source]#
forward(img, mask_ratio, task=None)[source]#
get_mask(img, n_visible_patches, num_patches)[source]#
abstract get_mask_args()[source]#
abstract property img2token#