cyto_dl.dataframe.readers module
- cyto_dl.dataframe.readers.filter_columns(columns_to_filter: Sequence[str], regex: str | None = None, startswith: str | None = None, endswith: str | None = None, contains: str | None = None, excludes: str | None = None) -> Sequence[str]
Filter a list of columns using a combination of queries or a regex pattern. If regex is supplied, it takes precedence and the remaining arguments are ignored. Otherwise, the logical AND of the supplied filters is applied, i.e. only the columns that satisfy all of the supplied conditions are returned.
- Parameters:
columns_to_filter (Sequence[str]) – List of columns to filter
regex (Optional[str] = None) – A string containing a regular expression to be matched
startswith (Optional[str] = None) – A substring the matching columns must start with
endswith (Optional[str] = None) – A substring the matching columns must end with
contains (Optional[str] = None) – A substring the matching columns must contain
excludes (Optional[str] = None) – A substring the matching columns must not contain
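The precedence and AND semantics described above can be illustrated with a small standalone sketch. This is not the library's implementation: filter_columns_sketch is a hypothetical name, and anchoring the regex at the start of the column name (re.match) is an assumption that the text above does not confirm.

```python
import re
from typing import Optional, Sequence


def filter_columns_sketch(
    columns: Sequence[str],
    regex: Optional[str] = None,
    startswith: Optional[str] = None,
    endswith: Optional[str] = None,
    contains: Optional[str] = None,
    excludes: Optional[str] = None,
) -> list:
    """Hypothetical re-statement of the documented filtering rules."""
    # A regex, when given, takes precedence over every other filter.
    # Assumption: matching is anchored at the start of the name (re.match).
    if regex is not None:
        return [col for col in columns if re.match(regex, col)]

    kept = []
    for col in columns:
        # Every supplied filter must hold (logical AND); None means "no constraint".
        if startswith is not None and not col.startswith(startswith):
            continue
        if endswith is not None and not col.endswith(endswith):
            continue
        if contains is not None and contains not in col:
            continue
        if excludes is not None and excludes in col:
            continue
        kept.append(col)
    return kept


# Made-up column names, for illustration only.
columns = ["dna_volume", "dna_height", "mem_volume", "cell_id"]

# Both conditions must hold: ends with "_volume" AND does not contain "mem".
volume_not_mem = filter_columns_sketch(columns, endswith="_volume", excludes="mem")

# The regex wins, so endswith is ignored here.
by_regex = filter_columns_sketch(columns, regex=r"dna_.*", endswith="_volume")
```

In this sketch, volume_not_mem keeps only "dna_volume", while by_regex keeps both "dna_" columns because the endswith argument is ignored once a regex is given.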
- cyto_dl.dataframe.readers.read_csv(path, include_columns=None)
Read a dataframe stored in a .csv file, optionally including only the columns given by include_columns.
- Parameters:
path (Union[Path, UPath, str]) – Path to the .csv file
include_columns (Optional[Sequence[str]] = None) – List of column names and/or regex expressions, used to only include the desired columns in the resulting dataframe.
- Returns:
dataframe
- Return type:
pd.DataFrame
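One way to picture how an include_columns list mixing plain names and regular expressions could be resolved against a file's header is sketched below. The helper resolve_include_columns is hypothetical, and the matching rule (exact name, or a regex that matches the whole column name) is an assumption; the reader's actual behavior may differ.

```python
import csv
import io
import re
from typing import Sequence


def resolve_include_columns(header: Sequence[str], include_columns: Sequence[str]) -> list:
    """Hypothetical helper: expand names and regexes into concrete column names."""
    selected = []
    for pattern in include_columns:
        for col in header:
            # Assumption: an entry selects a column on an exact name match
            # or when the entry, read as a regex, matches the whole name.
            if col == pattern or re.fullmatch(pattern, col):
                if col not in selected:  # avoid duplicates across entries
                    selected.append(col)
    return selected


# A tiny in-memory CSV with made-up columns, standing in for a file on disk.
csv_text = "cell_id,dna_volume,dna_height,mem_volume\n1,10.0,2.0,30.0\n"
header = next(csv.reader(io.StringIO(csv_text)))

selected = resolve_include_columns(header, ["cell_id", r"dna_.*"])
```

A resolved list like this could then be handed to a loader that accepts explicit column names, for example the usecols argument of pandas.read_csv.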
- cyto_dl.dataframe.readers.read_dataframe(dataframe: Path | UPath | str | DataFrame, required_columns: Sequence[str] | None = None, include_columns: Sequence[str] | None = None) -> DataFrame
Load a dataframe from a .csv or .parquet file, or check that a given pd.DataFrame contains the required columns.
- Parameters:
dataframe (Union[Path, UPath, str, pd.DataFrame]) – Either the path to the dataframe to be loaded, or a pd.DataFrame. Supported file types are .csv and .parquet
required_columns (Optional[Sequence[str]] = None) – List of columns that the dataframe must contain. If any of these are missing, a ValueError is raised.
include_columns (Optional[Sequence[str]] = None) – List of column names and/or regex expressions, used to only include the desired columns in the resulting dataframe. If required_columns is not None, those get appended to include_columns (without duplication).
- Returns:
dataframe
- Return type:
pd.DataFrame
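The interaction between required_columns and include_columns can be sketched in plain Python. Both helper names below are hypothetical, the error message is invented for illustration, and leaving include_columns as None (taken here to mean "load every column") is an assumption not stated above.

```python
from typing import Optional, Sequence


def merge_include_and_required(
    include_columns: Optional[Sequence[str]],
    required_columns: Optional[Sequence[str]],
) -> Optional[list]:
    """Hypothetical helper: append required columns to include_columns without duplicates."""
    # Assumption: None means "no column selection", so there is nothing to extend.
    if include_columns is None:
        return None
    merged = list(include_columns)
    for col in required_columns or []:
        if col not in merged:
            merged.append(col)
    return merged


def check_required(available: Sequence[str], required_columns: Optional[Sequence[str]]) -> None:
    """Hypothetical helper: raise ValueError when a required column is absent."""
    missing = [col for col in required_columns or [] if col not in available]
    if missing:
        raise ValueError(f"Missing required columns: {missing}")


# "cell_id" is already included, so only "split" gets appended.
merged = merge_include_and_required(["dna_volume", "cell_id"], ["cell_id", "split"])

# A dataframe with columns cell_id and dna_volume lacks the required "split".
try:
    check_required(["cell_id", "dna_volume"], ["cell_id", "split"])
    error_message = None
except ValueError as err:
    error_message = str(err)
```

Appending instead of replacing preserves the caller's column order while ensuring the validation step can always see the columns it needs.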
- cyto_dl.dataframe.readers.read_h5ad(path, include_columns=None, backed=None)
Read an AnnData object stored in a .h5ad file.
- Parameters:
path (Union[Path, str]) – Path to the .h5ad file
include_columns (Optional[Sequence[str]] = None) – List of column names and/or regex expressions, used to only include the desired columns in the resulting dataframe.
backed (Optional[str] = None) – Either “r” or “r+”. See anndata’s docs for details: https://anndata.readthedocs.io/en/latest/generated/anndata.read_h5ad.html#anndata.read_h5ad
- Return type:
AnnData
- cyto_dl.dataframe.readers.read_parquet(path, include_columns=None)
Read a dataframe stored in a .parquet file, optionally including only the columns given by include_columns.
- Parameters:
path (Union[Path, UPath, str]) – Path to the .parquet file
include_columns (Optional[Sequence[str]] = None) – List of column names and/or regex expressions, used to only include the desired columns in the resulting dataframe.
- Returns:
dataframe
- Return type:
pd.DataFrame