actk.steps package
Subpackages
Module contents
-
class
actk.steps.
Raw
(filepath_columns=['SourceReadPath', 'NucleusSegmentationReadPath', 'MembraneSegmentationReadPath'], metadata_columns=['FOVId'], **kwargs) Bases:
datastep.step.Step
-
run
(dataset: Union[str, pathlib.Path, pandas.core.frame.DataFrame, dask.dataframe.core.DataFrame], **kwargs) Simple passthrough to store the dataset in local_staging/raw. This does not copy any of the image files to local_staging/raw, only the manifest. This is an optional step that only needs to run if you want to upload the raw data.
- Parameters
dataset (Union[str, Path, pd.DataFrame, dd.DataFrame]) – The dataset to use for the rest of the pipeline run.
Required dataset columns: [“CellId”, “CellIndex”, “FOVId”, “SourceReadPath”, “NucleusSegmentationReadPath”, “MembraneSegmentationReadPath”, “ChannelIndexDNA”, “ChannelIndexMembrane”, “ChannelIndexStructure”, “ChannelIndexBrightfield”]
- Returns
manifest_save_path – The path to the manifest in local_staging with the raw data.
- Return type
Path
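Before handing a manifest to this step, it can help to confirm that it carries the required columns listed above. The following is a minimal sketch using only the standard library. `missing_columns` is a hypothetical helper and not part of actk, and the sketch assumes the manifest is a CSV file with a header row.

```python
import csv
import io

# Required columns copied from the Raw.run documentation above.
RAW_REQUIRED_COLUMNS = [
    "CellId", "CellIndex", "FOVId", "SourceReadPath",
    "NucleusSegmentationReadPath", "MembraneSegmentationReadPath",
    "ChannelIndexDNA", "ChannelIndexMembrane",
    "ChannelIndexStructure", "ChannelIndexBrightfield",
]


def missing_columns(csv_file, required=RAW_REQUIRED_COLUMNS):
    """Return the required columns absent from the CSV header.

    Hypothetical helper for illustration; actk may validate differently.
    """
    reader = csv.reader(csv_file)
    header = next(reader, [])  # an empty file yields an empty header
    present = set(header)
    return [name for name in required if name not in present]


# In-memory manifest that lacks the last two channel index columns.
example = io.StringIO(
    "CellId,CellIndex,FOVId,SourceReadPath,NucleusSegmentationReadPath,"
    "MembraneSegmentationReadPath,ChannelIndexDNA,ChannelIndexMembrane\n"
    "1,1,10,a.tiff,n.tiff,m.tiff,0,1\n"
)
print(missing_columns(example))
```

The same check could be reused for the other steps by passing their required column lists as `required`.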
-
-
class
actk.steps.
SingleCellFeatures
(direct_upstream_tasks=[<class 'actk.steps.standardize_fov_array.standardize_fov_array.StandardizeFOVArray'>], filepath_columns=['CellFeaturesPath'], **kwargs) Bases:
datastep.step.Step
-
run
(dataset: Union[str, pathlib.Path, pandas.core.frame.DataFrame, dask.dataframe.core.DataFrame], cell_ceiling_adjustment: int = 0, distributed_executor_address: Optional[str] = None, batch_size: Optional[int] = None, overwrite: bool = False, **kwargs) Provided a dataset, generate a features JSON file for each cell.
- Parameters
dataset (Union[str, Path, pd.DataFrame, dd.DataFrame]) – The primary cell dataset to use for generating features JSON for each cell.
Required dataset columns: [“CellId”, “CellIndex”, “FOVId”, “StandardizedFOVPath”]
cell_ceiling_adjustment (int) – The adjustment to use for raising the cell shape ceiling. If <= 0, this will be ignored and cell data will be selected but not adjusted. Default: 0
distributed_executor_address (Optional[str]) – An optional executor address to pass to some computation engine. Default: None
batch_size (Optional[int]) – An optional batch size to process n features at a time. Default: None (Process all at once)
overwrite (bool) – If this step has already partially or completely run, should it overwrite the previous files or not. Default: False (Do not overwrite or regenerate files)
- Returns
manifest_save_path – Path to the produced manifest with the CellFeaturesPath column added.
- Return type
Path
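The batch_size parameter above describes processing n items at a time, with None meaning everything at once. A minimal sketch of that batching pattern follows. `iter_batches` is a hypothetical helper written for illustration, and how actk actually splits work internally is an assumption not confirmed by this page.

```python
from typing import Iterator, List, Optional, Sequence, TypeVar

T = TypeVar("T")


def iter_batches(items: Sequence[T], batch_size: Optional[int] = None) -> Iterator[List[T]]:
    """Yield consecutive batches of at most batch_size items.

    Hypothetical helper: batch_size=None mirrors the documented default
    of processing all items at once.
    """
    if batch_size is None:
        batch_size = max(len(items), 1)
    if batch_size < 1:
        raise ValueError("batch_size must be a positive integer or None")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


cell_ids = [101, 102, 103, 104, 105]  # made-up CellId values
for batch in iter_batches(cell_ids, batch_size=2):
    print(batch)
```

Smaller batches bound how much work is in flight at any moment, at the cost of more scheduling rounds.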
-
-
class
actk.steps.
StandardizeFOVArray
(filepath_columns=['StandardizedFOVPath'], **kwargs) Bases:
datastep.step.Step
-
run
(dataset: Union[str, pathlib.Path, pandas.core.frame.DataFrame, dask.dataframe.core.DataFrame], current_pixel_sizes: Optional[Tuple[float]] = (0.10833333333333332, 0.10833333333333332, 0.29), desired_pixel_sizes: Tuple[float] = (0.29, 0.29, 0.29), distributed_executor_address: Optional[str] = None, batch_size: Optional[int] = None, overwrite: bool = False, **kwargs) → pathlib.Path Convert a dataset of raw FOV images and their nucleus and membrane segmentations into a single, normalized image with a standard order and shape.
- Parameters
dataset (Union[str, Path, pd.DataFrame, dd.DataFrame]) – The dataset to use for generating standard order, normalized, image arrays.
Required dataset columns: [“FOVId”, “SourceReadPath”, “NucleusSegmentationReadPath”, “MembraneSegmentationReadPath”, “ChannelIndexDNA”, “ChannelIndexMembrane”, “ChannelIndexStructure”, “ChannelIndexBrightfield”]
current_pixel_sizes (Optional[Tuple[float]]) – The current physical pixel sizes of the raw image, as a tuple. Default: (0.10833333333333332, 0.10833333333333332, 0.29). If None, the sizes are read from the raw image with aicsimageio.AICSImage.get_physical_pixel_size.
desired_pixel_sizes (Tuple[float]) – The desired pixel sizes to resize each image to, in XYZ order. Default: (0.29, 0.29, 0.29)
distributed_executor_address (Optional[str]) – An optional executor address to pass to some computation engine. Default: None
batch_size (Optional[int]) – An optional batch size to process n features at a time. Default: None (Process all at once)
overwrite (bool) – If this step has already partially or completely run, should it overwrite the previous files or not. Default: False (Do not overwrite or regenerate files)
- Returns
manifest_save_path – Path to the produced manifest with the StandardizedFOVPath column added.
- Return type
Path
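To make the relationship between current_pixel_sizes and desired_pixel_sizes concrete, the sketch below computes per-axis scale factors as current size divided by desired size. This is the usual arithmetic for resampling to a target physical pixel size, but it is an assumption about what the step does internally. `scale_factors`, `scaled_shape`, and the example image shape are hypothetical.

```python
from typing import Sequence, Tuple


def scale_factors(current: Sequence[float], desired: Sequence[float]) -> Tuple[float, ...]:
    """Per-axis resize factor: current physical size / desired physical size."""
    if len(current) != len(desired):
        raise ValueError("current and desired must have the same length")
    return tuple(c / d for c, d in zip(current, desired))


def scaled_shape(shape: Sequence[int], factors: Sequence[float]) -> Tuple[int, ...]:
    """Estimate the resized shape, keeping at least one pixel per axis."""
    return tuple(max(1, round(n * f)) for n, f in zip(shape, factors))


# Documented defaults, in XYZ order.
current = (0.10833333333333332, 0.10833333333333332, 0.29)
desired = (0.29, 0.29, 0.29)

factors = scale_factors(current, desired)
print(factors)

# Hypothetical raw image shape in XYZ order, for illustration only.
print(scaled_shape((924, 624, 65), factors))
```

With the defaults, X and Y shrink to a little over a third of their original extent while Z is unchanged, which gives isotropic voxels.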
-
-
class
actk.steps.
SingleCellImages
(direct_upstream_tasks=[<class 'actk.steps.single_cell_features.single_cell_features.SingleCellFeatures'>], filepath_columns=['CellImage3DPath', 'CellImage2DAllProjectionsPath', 'CellImage2DYXProjectionPath'], **kwargs) Bases:
datastep.step.Step
-
run
(dataset: Union[str, pathlib.Path, pandas.core.frame.DataFrame, dask.dataframe.core.DataFrame], cell_ceiling_adjustment: int = 0, bounding_box_percentile: float = 95.0, projection_method: str = 'max', distributed_executor_address: Optional[str] = None, batch_size: Optional[int] = None, overwrite: bool = False, bbox: Union[tuple, list, dict] = None, **kwargs) Provided a dataset of cell features and standardized FOV images, generate 3D single cell crops and 2D projections.
- Parameters
dataset (Union[str, Path, pd.DataFrame, dd.DataFrame]) – The primary cell dataset to generate 3D single cell images for.
Required dataset columns: [“CellId”, “StandardizedFOVPath”, “CellFeaturesPath”]
cell_ceiling_adjustment (int) – The adjustment to use for raising the cell shape ceiling. If <= 0, this will be ignored and cell data will be selected but not adjusted. Default: 0
bounding_box_percentile (float) – A float used to generate the actual bounding box for all cells by finding the provided percentile of all cell image sizes. Default: 95.0
bbox (tuple, list, dict) – Hard coded ZYX dimensions to set the bounding box. Note: This overrides the bounding_box_percentile parameter. Example: (64, 168, 104)
projection_method (str) – The method to use for generating the flat projection. Default: max
distributed_executor_address (Optional[str]) – An optional executor address to pass to some computation engine. Default: None
batch_size (Optional[int]) – An optional batch size to process n features at a time. Default: None (Process all at once)
overwrite (bool) – If this step has already partially or completely run, should it overwrite the previous files or not. Default: False (Do not overwrite or regenerate files)
- Returns
manifest_save_path – Path to the produced manifest with the various cell image path fields added.
- Return type
Path
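The interplay between bounding_box_percentile and bbox can be sketched as follows: take a per-axis percentile of the cell image sizes unless an explicit ZYX box is supplied. The interpolation method, the rounding up to whole pixels, and the helper names `percentile` and `choose_bounding_box` are all assumptions made for illustration. The sketch handles only tuple or list values of bbox, not the dict form.

```python
import math
from typing import Optional, Sequence, Tuple


def percentile(values: Sequence[float], q: float) -> float:
    """Percentile with linear interpolation between ranks, q in [0, 100]."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    rank = (len(ordered) - 1) * q / 100.0
    lower, upper = math.floor(rank), math.ceil(rank)
    if lower == upper:
        return float(ordered[lower])
    weight = rank - lower
    return ordered[lower] * (1 - weight) + ordered[upper] * weight


def choose_bounding_box(
    cell_sizes: Sequence[Tuple[int, int, int]],
    bounding_box_percentile: float = 95.0,
    bbox: Optional[Sequence[int]] = None,
) -> Tuple[int, ...]:
    """Return a ZYX bounding box; an explicit bbox overrides the percentile."""
    if bbox is not None:
        return tuple(bbox)
    return tuple(
        math.ceil(percentile([size[axis] for size in cell_sizes], bounding_box_percentile))
        for axis in range(3)
    )


# Made-up ZYX cell image sizes.
sizes = [(10, 20, 30), (20, 40, 60), (30, 60, 90)]
print(choose_bounding_box(sizes, bounding_box_percentile=50.0))
print(choose_bounding_box(sizes, bbox=(64, 168, 104)))
```

A percentile below 100 keeps a few unusually large cells from inflating the box used for every crop.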
-
-
class
actk.steps.
DiagnosticSheets
(direct_upstream_tasks: List[Step] = [<class 'actk.steps.single_cell_images.single_cell_images.SingleCellImages'>], filepath_columns=['DiagnosticSheetPath'], **kwargs) Bases:
datastep.step.Step
-
run
(dataset: Union[str, pathlib.Path, pandas.core.frame.DataFrame, dask.dataframe.core.DataFrame], max_cells: int = 200, metadata: Union[list, str, None] = 'FOVId', feature: Optional[str] = None, fig_width: Optional[int] = None, fig_height: Optional[int] = None, distributed_executor_address: Optional[str] = None, batch_size: Optional[int] = None, overwrite: bool = False, **kwargs) Provided a dataset of single cell all-projection images, generate a diagnostic sheet grouped by the desired metadata and feature.
- Parameters
dataset (Union[str, Path, pd.DataFrame, dd.DataFrame]) – The primary cell dataset to use for generating a diagnostic sheet for a group of cells.
Required dataset columns: [“CellId”, “CellImage2DAllProjectionsPath”]
max_cells (int) – The maximum number of cells to display on a single diagnostic sheet. Default: 200
metadata (Optional[Union[list, str]]) – The metadata to group cells by when generating a diagnostic sheet. For example, “FOVId” or [“FOVId”, “ProteinDisplayName”]. Default: “FOVId”
feature (Optional[str]) – The name of the single cell feature to display. For example, “imsize_orig”.
fig_width (Optional[int]) – Width of the diagnostic sheet figure.
fig_height (Optional[int]) – Height of the diagnostic sheet figure.
distributed_executor_address (Optional[str]) – An optional executor address to pass to some computation engine. Default: None
batch_size (Optional[int]) – An optional batch size to process n features at a time. Default: None (Process all at once)
overwrite (bool) – If this step has already partially or completely run, should it overwrite the previous files or not. Default: False (Do not overwrite or regenerate files)
- Returns
manifest_save_path – Path to the produced manifest with the DiagnosticSheetPath column added.
- Return type
Path
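The grouping described by the metadata and max_cells parameters can be sketched with plain dictionaries. `group_cells` is a hypothetical helper, and the sketch assumes that a group larger than max_cells is simply truncated. This page does not say whether actk truncates, samples, or splits such groups. Only the str and list forms of metadata are covered.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple, Union


def group_cells(
    rows: Sequence[dict],
    metadata: Union[str, List[str]] = "FOVId",
    max_cells: int = 200,
) -> Dict[Tuple, List]:
    """Group CellId values by one or more metadata columns.

    Assumption for illustration: cells beyond max_cells in a group are dropped.
    """
    keys = [metadata] if isinstance(metadata, str) else list(metadata)
    groups = defaultdict(list)
    for row in rows:
        group_key = tuple(row[key] for key in keys)
        if len(groups[group_key]) < max_cells:
            groups[group_key].append(row["CellId"])
    return dict(groups)


# Made-up manifest rows.
rows = [
    {"CellId": 1, "FOVId": 10, "ProteinDisplayName": "A"},
    {"CellId": 2, "FOVId": 10, "ProteinDisplayName": "B"},
    {"CellId": 3, "FOVId": 11, "ProteinDisplayName": "A"},
]
print(group_cells(rows))
print(group_cells(rows, metadata=["FOVId", "ProteinDisplayName"]))
```

Each resulting group corresponds to one sheet in this sketch, keyed by the tuple of metadata values.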
-