actk.steps package

Module contents

class actk.steps.Raw(filepath_columns=['SourceReadPath', 'NucleusSegmentationReadPath', 'MembraneSegmentationReadPath'], metadata_columns=['FOVId'], **kwargs)[source]

Bases: datastep.step.Step

run(dataset: Union[str, pathlib.Path, pandas.core.frame.DataFrame, dask.dataframe.core.DataFrame], **kwargs)[source]

Simple passthrough to store the dataset in local_staging/raw. This does not copy any of the image files to local_staging/raw, only the manifest. This is an optional step that will only run if you want to upload the raw data.

Parameters

dataset (Union[str, Path, pd.DataFrame, dd.DataFrame]) – The dataset to use for the rest of the pipeline run.

Required dataset columns: [“CellId”, “CellIndex”, “FOVId”, “SourceReadPath”, “NucleusSegmentationReadPath”, “MembraneSegmentationReadPath”, “ChannelIndexDNA”, “ChannelIndexMembrane”, “ChannelIndexStructure”, “ChannelIndexBrightfield”]

Returns

manifest_save_path – The path to the manifest in local_staging with the raw data.

Return type

Path
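Because run expects a fixed set of columns, it can help to check a manifest before starting the pipeline. The following is a minimal sketch using only the standard library; missing_columns and the toy manifest are hypothetical and are not part of actk, and the column names are copied from the list above.

```python
import csv
import io

# Column names copied from the "Required dataset columns" list for Raw.run.
RAW_REQUIRED_COLUMNS = [
    "CellId",
    "CellIndex",
    "FOVId",
    "SourceReadPath",
    "NucleusSegmentationReadPath",
    "MembraneSegmentationReadPath",
    "ChannelIndexDNA",
    "ChannelIndexMembrane",
    "ChannelIndexStructure",
    "ChannelIndexBrightfield",
]


def missing_columns(csv_text, required=RAW_REQUIRED_COLUMNS):
    """Return the required columns absent from the CSV header.

    Hypothetical helper for illustration; actk may validate differently.
    """
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, [])
    present = set(header)
    return [name for name in required if name not in present]


# A toy manifest (made-up values) that lacks the four ChannelIndex* columns.
example_manifest = (
    "CellId,CellIndex,FOVId,SourceReadPath,"
    "NucleusSegmentationReadPath,MembraneSegmentationReadPath\n"
    "1,1,10,fov.tiff,nuc.tiff,mem.tiff\n"
)
print(missing_columns(example_manifest))
```

An empty list means every required column is present in the header; the check says nothing about whether the paths in the rows exist.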

class actk.steps.SingleCellFeatures(direct_upstream_tasks=[<class 'actk.steps.standardize_fov_array.standardize_fov_array.StandardizeFOVArray'>], filepath_columns=['CellFeaturesPath'], **kwargs)[source]

Bases: datastep.step.Step

run(dataset: Union[str, pathlib.Path, pandas.core.frame.DataFrame, dask.dataframe.core.DataFrame], cell_ceiling_adjustment: int = 0, distributed_executor_address: Optional[str] = None, batch_size: Optional[int] = None, overwrite: bool = False, **kwargs)[source]

Provided a dataset, generate a features JSON file for each cell.

Parameters
  • dataset (Union[str, Path, pd.DataFrame, dd.DataFrame]) – The primary cell dataset to use for generating features JSON for each cell.

    Required dataset columns: [“CellId”, “CellIndex”, “FOVId”, “StandardizedFOVPath”]

  • cell_ceiling_adjustment (int) – The adjustment to use for raising the cell shape ceiling. If <= 0, this will be ignored and cell data will be selected but not adjusted. Default: 0

  • distributed_executor_address (Optional[str]) – An optional executor address to pass to some computation engine. Default: None

  • batch_size (Optional[int]) – An optional batch size to process n features at a time. Default: None (Process all at once)

  • overwrite (bool) – If this step has already partially or completely run, should it overwrite the previous files or not. Default: False (Do not overwrite or regenerate files)

Returns

manifest_save_path – Path to the produced manifest with the CellFeaturesPath column added.

Return type

Path
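The batch_size parameter can be read as "split the work into slices of at most n items, or one slice when it is None". The sketch below illustrates that reading only; iter_batches is a hypothetical helper, and how actk actually schedules batches on the executor is not shown in this reference.

```python
def iter_batches(items, batch_size=None):
    """Yield successive slices of at most batch_size items.

    Hypothetical helper: batch_size=None mirrors the documented default
    of processing everything at once.
    """
    if batch_size is None:
        # One batch covering all items (use 1 for an empty list so that
        # range() below gets a valid step).
        batch_size = len(items) or 1
    if batch_size < 1:
        raise ValueError("batch_size must be a positive integer or None")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


cell_ids = [1, 2, 3, 4, 5]  # made-up CellId values
for batch in iter_batches(cell_ids, batch_size=2):
    print(batch)
```

A smaller batch size limits how many items are in flight at once, at the cost of more scheduling rounds.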

class actk.steps.StandardizeFOVArray(filepath_columns=['StandardizedFOVPath'], **kwargs)[source]

Bases: datastep.step.Step

run(dataset: Union[str, pathlib.Path, pandas.core.frame.DataFrame, dask.dataframe.core.DataFrame], current_pixel_sizes: Optional[Tuple[float]] = (0.10833333333333332, 0.10833333333333332, 0.29), desired_pixel_sizes: Tuple[float] = (0.29, 0.29, 0.29), distributed_executor_address: Optional[str] = None, batch_size: Optional[int] = None, overwrite: bool = False, **kwargs) → pathlib.Path[source]

Convert a dataset of raw FOV images and their nucleus and membrane segmentations into a single normalized image with a standard order and shape.

Parameters

  • dataset (Union[str, Path, pd.DataFrame, dd.DataFrame]) – The dataset to use for generating standard-order, normalized image arrays.

    Required dataset columns: [“FOVId”, “SourceReadPath”, “NucleusSegmentationReadPath”, “MembraneSegmentationReadPath”, “ChannelIndexDNA”, “ChannelIndexMembrane”, “ChannelIndexStructure”, “ChannelIndexBrightfield”]

  • current_pixel_sizes (Optional[Tuple[float]]) – The current physical pixel sizes of the raw image, as a tuple. Default: (0.10833333333333332, 0.10833333333333332, 0.29). If None, the sizes are read from the raw image with aicsimageio.AICSImage.get_physical_pixel_size.

  • desired_pixel_sizes (Tuple[float]) – The desired pixel sizes to resize each image to, in XYZ order. Default: (0.29, 0.29, 0.29)

  • distributed_executor_address (Optional[str]) – An optional executor address to pass to some computation engine. Default: None

  • batch_size (Optional[int]) – An optional batch size to process n features at a time. Default: None (Process all at once)

  • overwrite (bool) – If this step has already partially or completely run, should it overwrite the previous files or not. Default: False (Do not overwrite or regenerate files)

Returns

manifest_save_path – Path to the produced manifest with the StandardizedFOVPath column added.

Return type

Path
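To make the two pixel-size parameters concrete, the sketch below computes a per-axis resize factor as current size divided by desired size. That ratio is an assumption about how the two tuples relate; resize_factors is a hypothetical helper, and the actual resampling code in actk is not reproduced here.

```python
def resize_factors(current_pixel_sizes, desired_pixel_sizes):
    """Return per-axis scale factors, assuming factor = current / desired.

    Hypothetical helper. A factor below 1 shrinks that axis and a factor
    of 1.0 leaves it unchanged.
    """
    if len(current_pixel_sizes) != len(desired_pixel_sizes):
        raise ValueError("pixel size tuples must have the same length")
    if any(size <= 0 for size in desired_pixel_sizes):
        raise ValueError("desired pixel sizes must be positive")
    return tuple(
        current / desired
        for current, desired in zip(current_pixel_sizes, desired_pixel_sizes)
    )


# Documented defaults for StandardizeFOVArray.run, in XYZ order.
current = (0.10833333333333332, 0.10833333333333332, 0.29)
desired = (0.29, 0.29, 0.29)

factors = resize_factors(current, desired)
print(factors)
```

Under this assumption the defaults downsample X and Y to roughly 37% of their original extent and leave Z untouched, which makes the voxels isotropic.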

class actk.steps.SingleCellImages(direct_upstream_tasks=[<class 'actk.steps.single_cell_features.single_cell_features.SingleCellFeatures'>], filepath_columns=['CellImage3DPath', 'CellImage2DAllProjectionsPath', 'CellImage2DYXProjectionPath'], **kwargs)[source]

Bases: datastep.step.Step

run(dataset: Union[str, pathlib.Path, pandas.core.frame.DataFrame, dask.dataframe.core.DataFrame], cell_ceiling_adjustment: int = 0, bounding_box_percentile: float = 95.0, projection_method: str = 'max', distributed_executor_address: Optional[str] = None, batch_size: Optional[int] = None, overwrite: bool = False, bbox: Union[tuple, list, dict] = None, **kwargs)[source]

Provided a dataset of cell features and standardized FOV images, generate 3D single cell crops and 2D projections.

Parameters
  • dataset (Union[str, Path, pd.DataFrame, dd.DataFrame]) – The primary cell dataset to generate 3D single cell images for.

    Required dataset columns: [“CellId”, “StandardizedFOVPath”, “CellFeaturesPath”]

  • cell_ceiling_adjustment (int) – The adjustment to use for raising the cell shape ceiling. If <= 0, this will be ignored and cell data will be selected but not adjusted. Default: 0

  • bounding_box_percentile (float) – A float used to generate the actual bounding box for all cells by finding the provided percentile of all cell image sizes. Default: 95.0

  • bbox (tuple, list, dict) – Hard-coded ZYX dimensions to set the bounding box. Note: This overrides the bounding_box_percentile parameter. Example: (64, 168, 104)

  • projection_method (str) – The method to use for generating the flat projection. Default: max

    More details: https://allencellmodeling.github.io/aicsimageprocessing/aicsimageprocessing.html#aicsimageprocessing.imgToProjection.imgtoprojection

  • distributed_executor_address (Optional[str]) – An optional executor address to pass to some computation engine. Default: None

  • batch_size (Optional[int]) – An optional batch size to process n features at a time. Default: None (Process all at once)

  • overwrite (bool) – If this step has already partially or completely run, should it overwrite the previous files or not. Default: False (Do not overwrite or regenerate files)

Returns

manifest_save_path – Path to the produced manifest with the various cell image path fields added.

Return type

Path
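The interaction between bounding_box_percentile and bbox can be sketched as follows. This is an illustration under stated assumptions rather than actk's implementation: the percentile uses linear interpolation, the result is rounded up to whole pixels, bbox is handled only in its tuple or list form, and both function names are hypothetical.

```python
import math


def percentile(values, q):
    """Linear-interpolation percentile of values for q in [0, 100]."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("values must not be empty")
    rank = (len(ordered) - 1) * q / 100.0
    lower = math.floor(rank)
    upper = math.ceil(rank)
    fraction = rank - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * fraction


def bounding_box(cell_sizes_zyx, bounding_box_percentile=95.0, bbox=None):
    """Pick one ZYX bounding box for all cells (hypothetical helper).

    A hard-coded bbox overrides the percentile, as the parameter list
    describes. Otherwise each axis takes the requested percentile of the
    per-cell sizes, rounded up to a whole number of pixels (assumption).
    """
    if bbox is not None:
        return tuple(bbox)
    return tuple(
        math.ceil(
            percentile([size[axis] for size in cell_sizes_zyx],
                       bounding_box_percentile)
        )
        for axis in range(3)
    )


# Made-up per-cell image sizes in ZYX order.
sizes = [(40, 100, 80), (50, 120, 90), (60, 140, 100)]
print(bounding_box(sizes, bounding_box_percentile=50.0))
print(bounding_box(sizes, bbox=(64, 168, 104)))
```

Choosing a percentile below 100 keeps a few unusually large cells from inflating the crop size for every cell; a hard-coded bbox gives the same output shape across separate runs.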

class actk.steps.DiagnosticSheets(direct_upstream_tasks: List[Step] = [<class 'actk.steps.single_cell_images.single_cell_images.SingleCellImages'>], filepath_columns=['DiagnosticSheetPath'], **kwargs)[source]

Bases: datastep.step.Step

run(dataset: Union[str, pathlib.Path, pandas.core.frame.DataFrame, dask.dataframe.core.DataFrame], max_cells: int = 200, metadata: Union[list, str, None] = 'FOVId', feature: Optional[str] = None, fig_width: Optional[int] = None, fig_height: Optional[int] = None, distributed_executor_address: Optional[str] = None, batch_size: Optional[int] = None, overwrite: bool = False, **kwargs)[source]

Provided a dataset of single cell all projection images, generate a diagnostic sheet grouped by the desired metadata and feature.

Parameters
  • dataset (Union[str, Path, pd.DataFrame, dd.DataFrame]) – The primary cell dataset to use for generating a diagnostic sheet for a group of cells.

    Required dataset columns: [“CellId”, “CellImage2DAllProjectionsPath”]

  • max_cells (int) – The maximum number of cells to display on a single diagnostic sheet. Default: 200

  • metadata (Optional[Union[list, str]]) – The metadata to group cells by when generating a diagnostic sheet. For example, “FOVId” or [“FOVId”, “ProteinDisplayName”]. Default: “FOVId”

  • feature (Optional[str]) – The name of the single cell feature to display. For example, “imsize_orig”.

  • fig_width (Optional[int]) – Width of the diagnostic sheet figure.

  • fig_height (Optional[int]) – Height of the diagnostic sheet figure.

  • distributed_executor_address (Optional[str]) – An optional executor address to pass to some computation engine. Default: None

  • batch_size (Optional[int]) – An optional batch size to process n features at a time. Default: None (Process all at once)

  • overwrite (bool) – If this step has already partially or completely run, should it overwrite the previous files or not. Default: False (Do not overwrite or regenerate files)

Returns

manifest_save_path – Path to the produced manifest with the DiagnosticSheetPath column added.

Return type

Path
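The grouping described by metadata and max_cells can be sketched with plain dictionaries. The helper below is hypothetical, and it assumes that cells beyond max_cells in a group are simply left off the sheet; this reference does not state what actk does with the overflow.

```python
from collections import defaultdict


def group_cells(rows, metadata="FOVId", max_cells=200):
    """Group CellId values by one or more metadata columns.

    Hypothetical helper. metadata may be a single column name or a list
    of names, matching the documented parameter. Each group keeps at most
    max_cells cells (assumption: extra cells are dropped).
    """
    keys = [metadata] if isinstance(metadata, str) else list(metadata)
    groups = defaultdict(list)
    for row in rows:
        group_key = tuple(row[key] for key in keys)
        if len(groups[group_key]) < max_cells:
            groups[group_key].append(row["CellId"])
    return dict(groups)


# Made-up manifest rows; the ProteinDisplayName values are placeholders.
rows = [
    {"CellId": 1, "FOVId": 10, "ProteinDisplayName": "A"},
    {"CellId": 2, "FOVId": 10, "ProteinDisplayName": "B"},
    {"CellId": 3, "FOVId": 11, "ProteinDisplayName": "A"},
]
print(group_cells(rows))
print(group_cells(rows, metadata=["FOVId", "ProteinDisplayName"]))
```

Each key in the result corresponds to one diagnostic sheet, so grouping by more columns produces more sheets with fewer cells on each.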