cyto_dl.nn.vits.decoder module#

class cyto_dl.nn.vits.decoder.CrossMAE_Decoder(num_patches: int | List[int], spatial_dims: int = 3, patch_size: int | List[int] | None = 4, enc_dim: int | None = 768, emb_dim: int | None = 192, num_layer: int | None = 4, num_head: int | None = 3, has_cls_token: bool | None = True, learnable_pos_embedding: bool | None = True)[source]#

Bases: MAE_Decoder

Decoder inspired by [CrossMAE](https://crossmae.github.io/), in which masked tokens attend only to visible tokens.
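To make that attention pattern concrete, here is a minimal, hypothetical sketch (not this class's internals) of masked-query cross-attention using `torch.nn.MultiheadAttention`; all shapes and token counts are illustrative:

```python
import torch
import torch.nn as nn

emb_dim, num_head = 192, 3
cross_attn = nn.MultiheadAttention(emb_dim, num_head)

# Token-first layout: [tokens, batch, channels].
visible = torch.randn(64, 2, emb_dim)   # projected features of visible patches
masked = torch.randn(192, 2, emb_dim)   # mask tokens + positional embeddings

# Masked tokens are the queries; only visible tokens serve as keys/values,
# so masked tokens never attend to one another or to themselves.
decoded, _ = cross_attn(query=masked, key=visible, value=visible)
print(decoded.shape)  # torch.Size([192, 2, 192])
```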

Parameters:
  • num_patches (List[int], int) – Number of patches in each dimension. If int, the same number of patches is used for all dimensions.

  • patch_size (List[int], int) – Size of each patch in each dimension. If int, the same patch size is used for all dimensions.

  • enc_dim (int) – Dimension of the encoder embedding

  • emb_dim (int) – Dimension of the decoder embedding

  • num_layer (int) – Number of transformer layers

  • num_head (int) – Number of heads in transformer

  • has_cls_token (bool) – Whether encoder features have a cls token

  • learnable_pos_embedding (bool) – If True, learnable positional embeddings are used. If False, fixed sin/cos positional embeddings are used. Empirically, fixed positional embeddings work better for brightfield images.

forward(features, forward_indexes, backward_indexes)[source]#
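A hypothetical usage sketch for a 3D input. The token-first [tokens, batch, channels] feature layout, the [n_patches, batch] index shapes, and the masking ratio are assumptions borrowed from common MAE implementations, not verified against cyto_dl:

```python
import torch
from cyto_dl.nn.vits.decoder import CrossMAE_Decoder

num_patches = [4, 8, 8]                    # patch grid for a 3D volume
decoder = CrossMAE_Decoder(
    num_patches=num_patches,
    spatial_dims=3,
    patch_size=4,
    enc_dim=768,
    emb_dim=192,
    num_layer=4,
    num_head=3,
    has_cls_token=True,
)

batch, n_total = 2, 4 * 8 * 8              # 256 patches total
n_visible = n_total // 4                   # e.g. 75% of patches masked

# Encoder output: cls token + visible-token features.
features = torch.randn(n_visible + 1, batch, 768)

# Random shuffle of patch positions and its inverse, one column per batch item.
forward_indexes = torch.stack([torch.randperm(n_total) for _ in range(batch)], dim=1)
backward_indexes = torch.argsort(forward_indexes, dim=0)

output = decoder(features, forward_indexes, backward_indexes)
```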
class cyto_dl.nn.vits.decoder.MAE_Decoder(num_patches: int | List[int], spatial_dims: int = 3, patch_size: int | List[int] | None = 4, enc_dim: int | None = 768, emb_dim: int | None = 192, num_layer: int | None = 4, num_head: int | None = 3, has_cls_token: bool | None = False, learnable_pos_embedding: bool | None = True)[source]#

Bases: Module

Parameters:
  • num_patches (List[int], int) – Number of patches in each dimension. If int, the same number of patches is used for all dimensions.

  • patch_size (List[int], int) – Size of each patch in each dimension. If int, the same patch size is used for all dimensions.

  • enc_dim (int) – Dimension of the encoder embedding

  • emb_dim (int) – Dimension of the decoder embedding

  • num_layer (int) – Number of transformer layers

  • num_head (int) – Number of heads in transformer

  • has_cls_token (bool) – Whether encoder features have a cls token

  • learnable_pos_embedding (bool) – If True, learnable positional embeddings are used. If False, fixed sin/cos positional embeddings are used. Empirically, fixed positional embeddings work better for brightfield images.

add_mask_tokens(features, backward_indexes)[source]#
adjust_indices_for_cls(indexes)[source]#
forward(features, forward_indexes, backward_indexes)[source]#
init_weight()[source]#
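add_mask_tokens conceptually pads the visible-token features with learnable mask tokens and unshuffles the sequence back to the original patch order using backward_indexes. A minimal sketch of that operation under the same assumed token-first layout; the helper below is illustrative, not cyto_dl's API (cls-token handling, covered by adjust_indices_for_cls, is omitted):

```python
import torch

def unshuffle_with_mask_tokens(features, backward_indexes, mask_token):
    """Illustrative stand-in for add_mask_tokens (not the actual implementation).

    features:         [n_visible, B, C] visible-token features
    backward_indexes: [n_total, B] permutation restoring original patch order
    mask_token:       [1, 1, C] learnable placeholder for masked positions
    """
    n_total, B = backward_indexes.shape
    C = features.shape[-1]
    n_masked = n_total - features.shape[0]
    # Give every masked position a copy of the mask token...
    padded = torch.cat([features, mask_token.expand(n_masked, B, C)], dim=0)
    # ...then reorder all tokens back to the original patch layout.
    return torch.gather(padded, 0, backward_indexes[..., None].expand(-1, -1, C))

mask_token = torch.zeros(1, 1, 192)
features = torch.randn(64, 2, 192)                  # 64 visible tokens, batch of 2
backward_indexes = torch.stack(
    [torch.randperm(256) for _ in range(2)], dim=1  # 256 total patches
)
restored = unshuffle_with_mask_tokens(features, backward_indexes, mask_token)
print(restored.shape)  # torch.Size([256, 2, 192])
```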