gsmm.csm package
Submodules
gsmm.csm.analyse_csm module
- gsmm.csm.analyse_csm.analyse_and_save_fluxes(model_paths: Dict[str, str]) -> None
Analyze fluxes from given model paths and save the results.
Parameters: - model_paths (Dict[str, str]): Dictionary containing model names as keys and file paths as values.
Returns: - None
- gsmm.csm.analyse_csm.collect_sink_fluxes(models: Dict[str, Model], metabolites: List[str]) -> DataFrame
Collect sink fluxes for all context-specific models.
- Args:
models (Dict[str, Model]): Dictionary containing context-specific models. Keys are model names and values are COBRApy Model objects.
metabolites (List[str]): List of metabolites of interest for sink flux collection.
- Returns:
pd.DataFrame: DataFrame containing collected sink flux data. Columns typically include ‘Metabolite’, ‘Flux’, and ‘Context_Model’, matching the dictionaries returned by get_sink_fluxes.
- Notes:
This function iterates over each model in the provided dictionary of models. For each model, it collects sink fluxes for specified metabolites using the get_sink_fluxes function. The collected sink flux data from all models is concatenated into a single DataFrame and returned.
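The concatenation step described in these notes can be sketched as follows. This is a minimal illustration under stated assumptions, not the package's implementation: `collect_records` and `fake_get_sink_fluxes` are hypothetical names, and the stand-in returns a constant flux so that the example runs without COBRApy. The column names follow those documented for get_sink_fluxes.

```python
from typing import Callable, Dict, List, Union

import pandas as pd

Record = Dict[str, Union[str, float]]


def collect_records(
    models: Dict[str, object],
    metabolites: List[str],
    get_fluxes: Callable[[object, str, List[str]], List[Record]],
) -> pd.DataFrame:
    """Gather per-model records and stack them into one DataFrame."""
    frames = []
    for name, model in models.items():
        records = get_fluxes(model, name, metabolites)
        frames.append(pd.DataFrame(records))
    if not frames:
        # No models supplied: return an empty frame with the expected columns.
        return pd.DataFrame(columns=["Metabolite", "Flux", "Context_Model"])
    return pd.concat(frames, ignore_index=True)


def fake_get_sink_fluxes(model, model_name, metabolites):
    # Hypothetical stand-in: returns a constant flux instead of optimizing.
    return [
        {"Metabolite": m, "Flux": 1.0, "Context_Model": model_name}
        for m in metabolites
    ]


df_sink = collect_records(
    {"liver": None, "kidney": None}, ["atp_c", "nadh_c"], fake_get_sink_fluxes
)
```

Passing the per-model collector as an argument is only a convenience for the example; it keeps the stacking logic testable without a solver.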
- gsmm.csm.analyse_csm.extract_fluxes(models: Dict[str, Model]) -> DataFrame
Extract fluxes for all reactions from a set of COBRApy models.
- Args:
models (Dict[str, Model]): Dictionary of model names mapped to COBRApy Model objects.
- Returns:
pd.DataFrame: DataFrame containing extracted flux data with columns ‘Reaction’, ‘Flux’, and ‘Model’.
- Notes:
This function iterates over each model in the input dictionary and optimizes it to obtain fluxes for all reactions in the model. The extracted flux data is structured into a DataFrame, where each row represents a reaction with its associated flux and model name.
- gsmm.csm.analyse_csm.filter_deletion_results(deletion_results: Dict[str, DataFrame], common_reactions: List[str]) -> Dict[str, DataFrame]
Filter deletion results to keep only common reactions across models.
- Args:
deletion_results (Dict[str, pd.DataFrame]): Dictionary containing deletion results for each model. Keys are model names and values are DataFrames with deletion results.
common_reactions (List[str]): List of reaction IDs representing common reactions to retain.
- Returns:
Dict[str, pd.DataFrame]: Filtered deletion results dictionary. Keys remain the same as deletion_results, and values are DataFrames filtered to include only common reactions.
- Notes:
This function filters deletion results for each model provided in deletion_results based on the list of common reactions. Only reactions present in common_reactions are retained in each model’s deletion results DataFrame.
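A minimal sketch of this filtering step is shown below. It assumes that reaction IDs are stored in the index of each DataFrame, as described for perform_single_reaction_deletion; if the package stores them in a column instead, the mask would be built on that column. The function name `keep_common` and the sample data are hypothetical.

```python
from typing import Dict, List

import pandas as pd


def keep_common(
    deletion_results: Dict[str, pd.DataFrame], common_reactions: List[str]
) -> Dict[str, pd.DataFrame]:
    """Keep only rows whose index label is in common_reactions."""
    return {
        name: df[df.index.isin(common_reactions)]
        for name, df in deletion_results.items()
    }


# Hypothetical deletion results for two models.
results = {
    "model_a": pd.DataFrame({"growth": [0.9, 0.0, 0.5]}, index=["R1", "R2", "R3"]),
    "model_b": pd.DataFrame({"growth": [0.8, 0.1]}, index=["R1", "R3"]),
}
filtered = keep_common(results, ["R1", "R3"])
```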
- gsmm.csm.analyse_csm.filter_non_zero_fluxes(df: DataFrame, threshold: float = 1) -> DataFrame
Filter non-zero fluxes from a DataFrame based on a specified threshold.
- Args:
df (pd.DataFrame): Input DataFrame containing flux data.
threshold (float, optional): Threshold value for flux. Default is 1.
- Returns:
pd.DataFrame: Filtered DataFrame containing reactions with fluxes above or equal to the threshold.
- Raises:
ValueError: If the ‘Flux’ column is not present in the input DataFrame.
- Notes:
This function filters the input DataFrame to retain only rows where the ‘Flux’ column meets or exceeds the specified threshold. Missing flux values are filled with 0.
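The documented behaviour (fill missing values with 0, keep rows at or above the threshold, raise ValueError when ‘Flux’ is absent) can be sketched as below. The name `keep_fluxes_at_or_above` is hypothetical, and the comparison is on the signed flux value; whether the package compares absolute values is not stated here, so treat that as an assumption.

```python
import pandas as pd


def keep_fluxes_at_or_above(df: pd.DataFrame, threshold: float = 1) -> pd.DataFrame:
    """Return rows whose 'Flux' is >= threshold, treating missing values as 0."""
    if "Flux" not in df.columns:
        raise ValueError("Input DataFrame must contain a 'Flux' column.")
    filled = df.copy()  # leave the caller's DataFrame untouched
    filled["Flux"] = filled["Flux"].fillna(0)
    return filled[filled["Flux"] >= threshold]


fluxes = pd.DataFrame(
    {"Reaction": ["R1", "R2", "R3", "R4"], "Flux": [2.5, None, 1.0, 0.2]}
)
kept = keep_fluxes_at_or_above(fluxes)  # keeps R1 and R3
```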
- gsmm.csm.analyse_csm.find_common_reactions(reaction_deletion_results: Dict[str, DataFrame]) -> List[str]
Find common reactions across all models based on their deletion results.
- Args:
reaction_deletion_results (Dict[str, pd.DataFrame]): Dictionary containing deletion results for each model. Keys are model names and values are DataFrames with deletion results.
- Returns:
List[str]: List of reaction IDs that are common across all models.
- Notes:
This function identifies common reactions present in the deletion results of all models provided. It computes the intersection of reaction sets from all model deletion results to find common reactions.
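The intersection described here can be sketched with plain Python sets. This example assumes reaction IDs are the index labels of each DataFrame and returns them sorted for reproducibility; both choices, and the name `common_index_labels`, are assumptions made for illustration.

```python
from typing import Dict, List

import pandas as pd


def common_index_labels(results: Dict[str, pd.DataFrame]) -> List[str]:
    """Intersect the index labels of every DataFrame in results."""
    if not results:
        return []
    label_sets = [set(df.index) for df in results.values()]
    return sorted(set.intersection(*label_sets))


# Hypothetical deletion results: R2 is missing from model_b.
deletion_results = {
    "model_a": pd.DataFrame({"growth": [0.9, 0.0, 0.5]}, index=["R1", "R2", "R3"]),
    "model_b": pd.DataFrame({"growth": [0.8, 0.1]}, index=["R1", "R3"]),
}
common = common_index_labels(deletion_results)
```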
- gsmm.csm.analyse_csm.get_sink_fluxes(model: Model, model_name: str, metabolites: List[str]) -> List[Dict[str, Union[str, float]]]
Extract sink reaction fluxes for specified metabolites from a given COBRApy model.
- Args:
model (Model): COBRApy Model object from which sink fluxes will be extracted.
model_name (str): Name or identifier of the model for logging purposes.
metabolites (List[str]): List of metabolite IDs or names for which sink fluxes will be extracted.
- Returns:
List[Dict[str, Union[str, float]]]: List of dictionaries, each containing information about a sink reaction flux. Each dictionary typically includes keys ‘Metabolite’, ‘Flux’, and ‘Context_Model’.
- Notes:
This function iterates over each metabolite in the provided list. For each metabolite, it identifies sink reactions (reactions whose IDs start with ‘SK’) and calculates their flux under optimal conditions using COBRApy’s optimization.
- gsmm.csm.analyse_csm.load_models(model_paths: Dict[str, str]) -> Dict[str, Model]
Load SBML models from specified paths into COBRApy Model objects.
- Args:
model_paths (Dict[str, str]): Dictionary where keys are model names and values are paths to SBML files.
- Returns:
Dict[str, Model]: Dictionary mapping model names to corresponding COBRApy Model objects.
- Notes:
This function iterates through the provided dictionary of model paths, attempts to load each SBML model using COBRApy’s read_sbml_model function, and stores the loaded models in a dictionary. If any path is invalid or model loading fails, an error message is printed and that model is skipped.
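The skip-on-failure loop described in these notes might look like the sketch below. To keep the example runnable without COBRApy, the reader is passed in as a callable; in the package this role is played by read_sbml_model. The names `load_all` and `fake_loader` are hypothetical.

```python
from typing import Callable, Dict, TypeVar

M = TypeVar("M")


def load_all(model_paths: Dict[str, str], loader: Callable[[str], M]) -> Dict[str, M]:
    """Load every path with `loader`, skipping entries that fail."""
    loaded: Dict[str, M] = {}
    for name, path in model_paths.items():
        try:
            loaded[name] = loader(path)
        except Exception as exc:  # any failure: report it and move on
            print(f"Could not load model '{name}' from {path}: {exc}")
    return loaded


def fake_loader(path: str) -> str:
    # Hypothetical stand-in for an SBML reader: rejects non-.xml paths.
    if not path.endswith(".xml"):
        raise IOError("not an SBML file")
    return f"model loaded from {path}"


models = load_all({"liver": "liver.xml", "broken": "notes.txt"}, fake_loader)
```

Because failed entries are dropped silently apart from the printed message, callers may want to compare the returned keys against the requested ones.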
- gsmm.csm.analyse_csm.optimize_and_get_fluxes(model: Model) -> Dict[str, float]
Optimize the given COBRApy model and retrieve flux distributions.
- Args:
model (Model): COBRApy Model object to be optimized.
- Returns:
Dict[str, float]: Dictionary mapping reaction IDs to their corresponding flux values.
- Notes:
This function attempts to optimize the input model using COBRApy’s default solver. If successful, it returns a dictionary containing reaction IDs as keys and their respective flux values. If optimization fails due to an OptimizationError or Infeasible solution, an empty dictionary is returned.
- gsmm.csm.analyse_csm.perform_single_reaction_deletion(model: Model) -> DataFrame
Perform single reaction deletion analysis on a given COBRApy model.
- Args:
model (Model): COBRApy Model object on which single reaction deletion will be performed.
- Returns:
pd.DataFrame: DataFrame containing the results of single reaction deletion. Index represents reaction IDs, and columns typically include ‘growth’ or similar metrics.
- Notes:
This function utilizes COBRApy’s single_reaction_deletion method to analyze the impact of deleting each individual reaction in the model on growth or another specified metric. It returns a DataFrame where each row corresponds to a reaction and columns represent the results of the deletion analysis.
- gsmm.csm.analyse_csm.save_data_for_visualization(df_fluxes: DataFrame, df_sink_fluxes: DataFrame, filename: str = 'flux_data.pkl')
Save flux and sink flux dataframes to pickle files for visualization.
- Args:
df_fluxes (pd.DataFrame): DataFrame containing flux data to be saved.
df_sink_fluxes (pd.DataFrame): DataFrame containing sink flux data to be saved.
filename (str, optional): Name of the pickle file to save. Default is ‘flux_data.pkl’.
- Notes:
This function saves two DataFrames, df_fluxes and df_sink_fluxes, to pickle files. df_fluxes is saved directly to the specified filename. df_sink_fluxes is saved to a file with ‘sink_flux’ substituted for ‘flux’ in the filename, so the default ‘flux_data.pkl’ yields ‘sink_flux_data.pkl’.
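The filename substitution can be illustrated with a short sketch. The helper `sink_flux_path` is hypothetical, and two details are assumptions rather than documented behaviour: only the first occurrence of ‘flux’ is replaced, and the replacement is applied to the file name only, so that a directory such as `fluxes/` is left alone.

```python
import os


def sink_flux_path(filename: str = "flux_data.pkl") -> str:
    """Derive the sink-flux file path from the flux file path."""
    directory, base = os.path.split(filename)
    # Replace only the first 'flux' in the base name, not in the directory.
    return os.path.join(directory, base.replace("flux", "sink_flux", 1))


default_sink = sink_flux_path()  # "sink_flux_data.pkl"
nested_sink = sink_flux_path(os.path.join("out", "flux_data.pkl"))
```

A plain `filename.replace("flux", "sink_flux")` on the whole path would also rewrite matching directory names, which is worth checking if outputs are written into a folder whose name contains ‘flux’.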
- gsmm.csm.analyse_csm.save_filtered_reaction_deletion_results(filtered_results: Dict[str, DataFrame], output_dir: str) -> None
Save filtered reaction deletion results to pickle files.
Parameters:
- filtered_results (Dict[str, pd.DataFrame]): Dictionary mapping model names to filtered deletion results DataFrames.
- output_dir (str): Directory path in which to save the results.
- gsmm.csm.analyse_csm.save_reaction_deletion_results(models_dict: Dict[str, Model], output_dir: str) -> Dict[str, DataFrame]
Perform single reaction deletion on each model in models_dict and save the deletion results.
- Args:
models_dict (Dict[str, Model]): Dictionary containing models to analyze. Keys are model names and values are COBRApy Model objects.
output_dir (str): Directory path where the deletion results will be saved.
- Returns:
Dict[str, pd.DataFrame]: Dictionary containing deletion results for each model. Keys are model names and values are DataFrames with deletion results.
- Notes:
This function performs single reaction deletion for each model provided in models_dict. It saves the deletion results as pickle files in the specified output_dir with filenames formatted as “{model_name}_reaction_deletion_results.pkl”.
gsmm.csm.build_csm module
- gsmm.csm.build_csm.assign_reaction_confidences(model: Model, expression_data: DataFrame, gene_id_column: str, scores_column: str) -> DataFrame
Assigns confidence levels to reactions based on associated genes’ expression data.
- Args:
model (cobra.Model): The metabolic model to assign reaction confidence levels to.
expression_data (pd.DataFrame): DataFrame containing expression data and gene confidence levels.
gene_id_column (str): Column name in expression_data containing gene IDs or names.
scores_column (str): Name of the column in expression_data used for gene expression values.
- Returns:
pd.DataFrame: DataFrame with assigned confidence levels for reactions.
- Notes:
This function assigns confidence levels (ranging from -1 to 3) to reactions in the provided metabolic model based on the associated genes’ expression levels in expression_data. Specific biomass reactions are assigned the highest confidence level (3) by default.
The confidence level assigned depends on where a value falls among the percentiles of scores_column in expression_data:
- -1: below the 25th percentile
- 0: 25th to 49.99th percentile
- 1: 50th to 74.99th percentile
- 2: 75th to 89.99th percentile
- 3: 90th percentile and above
- Example:
If scores_column represents gene expression levels, the function assigns the confidence levels based on percentiles of these expression values.
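The percentile bands listed above can be turned into levels with a small binning routine. The sketch below shows one way to do it, not the package's code: `confidence_from_scores` is a hypothetical name, the cut-offs use the default linear interpolation of pandas, and the mapping covers gene scores only (how gene levels are combined into reaction levels is not covered).

```python
import numpy as np
import pandas as pd


def confidence_from_scores(scores: pd.Series) -> pd.Series:
    """Map each score to a level in {-1, 0, 1, 2, 3} by percentile band."""
    # Cut-offs at the 25th, 50th, 75th and 90th percentiles.
    cutoffs = scores.quantile([0.25, 0.50, 0.75, 0.90]).to_numpy()
    # side="right" counts the cut-offs <= value, so a value exactly at a
    # cut-off falls into the higher band; subtracting 1 starts the scale at -1.
    levels = np.searchsorted(cutoffs, scores.to_numpy(), side="right") - 1
    return pd.Series(levels, index=scores.index, name="confidence")


scores = pd.Series(range(1, 101), dtype=float)  # hypothetical expression scores
conf = confidence_from_scores(scores)
```

With these 100 evenly spaced scores, the lowest 25 values receive -1 and the highest 10 receive 3.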
- gsmm.csm.build_csm.extract_genes(data: DataFrame, id_column: Optional[str]) -> list
Extracts gene IDs or names from a pandas DataFrame column.
- Args:
data (pd.DataFrame): DataFrame containing gene data.
id_column (Union[str, None]): Name of the column containing gene IDs or names.
- Returns:
list: List of gene IDs or names extracted from the specified column.
- Raises:
ValueError: If id_column is not provided or does not exist in the DataFrame.
- Notes:
This function extracts gene IDs or names from a specified column in the provided DataFrame. It drops missing values (NaNs) and returns a list of extracted gene IDs or names.
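A minimal sketch of the documented behaviour (validate the column, drop missing values, return a list) follows. The function name `genes_from_column` and the sample data are hypothetical.

```python
from typing import List, Optional

import pandas as pd


def genes_from_column(data: pd.DataFrame, id_column: Optional[str]) -> List[str]:
    """Return the non-missing entries of id_column as a list."""
    if id_column is None or id_column not in data.columns:
        raise ValueError(f"Column {id_column!r} not found in the DataFrame.")
    return data[id_column].dropna().tolist()


# Hypothetical expression table with one missing gene ID.
expression = pd.DataFrame({"gene": ["HK1", None, "PFKM"], "score": [5.1, 2.0, 7.3]})
genes = genes_from_column(expression, "gene")
```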
- gsmm.csm.build_csm.filter_model_by_genes(model: Model, genes: list) -> Model
Filters a COBRApy model by removing genes not present in a specified list.
- Args:
model (cobra.Model): The COBRApy model to be filtered.
genes (list): List of gene IDs or names to retain in the model.
- Returns:
cobra.Model: Filtered COBRApy model containing only the specified genes.
- Notes:
This function creates a copy of the input model and removes genes that are not present in the provided list from the copied model. It also prunes unused reactions and metabolites to ensure consistency after gene removal.
- gsmm.csm.build_csm.load_expression_data(data_path: str) -> DataFrame
Loads expression data from a CSV file into a pandas DataFrame.
- Args:
data_path (str): Path to the CSV file containing expression data.
- Returns:
pd.DataFrame: DataFrame containing the loaded expression data.
- Raises:
FileNotFoundError: If the specified data file path does not exist.
IOError: If there is an error reading the CSV file.
- Notes:
This function uses pandas’ read_csv function to read expression data from a CSV file and returns it as a DataFrame. The CSV file should contain columns representing Gene IDs and corresponding expression values.
- gsmm.csm.build_csm.main(model_path: str, data_path: str, gene_id_column: Optional[str], scores_column: str, base_model_path: str)
Executes a pipeline that loads, filters, and normalizes the data, assigns reaction confidences, and optimizes a metabolic model using CORDA.
- Args:
model_path (str): Path to the SBML model file (typically a large parent model such as Recon3D).
data_path (str): Path to the CSV file containing expression data.
gene_id_column (Union[str, None]): Column name in expression_data containing gene IDs or names.
scores_column (str): Name of the column in expression_data used for gene confidence levels.
base_model_path (str): Path to save the filtered SBML model.
- Returns:
cobra.Model: Optimized COBRApy model (Context Specific Model) object after CORDA optimization.
- Notes:
This function orchestrates the entire process of setting up the solver, loading the SBML model, loading expression data, extracting genes, filtering the model based on genes, normalizing expression data, assigning reaction confidence levels based on gene expression, optimizing the model using CORDA, and saving the optimized model to a new SBML file.
- gsmm.csm.build_csm.normalize_expression_data(expression_data: DataFrame, scores_column: str) -> DataFrame
Normalizes expression data and assigns confidence levels based on quantiles.
- Args:
expression_data (pd.DataFrame): DataFrame containing expression data.
scores_column (str): Name of the column to normalize and use for confidence levels.
- Returns:
pd.DataFrame: DataFrame with normalized scores and assigned confidence levels.
- Notes:
This function uses MinMaxScaler from scikit-learn to normalize values in the specified column of the input DataFrame. It calculates confidence levels based on quantiles of the normalized scores.
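Min-max normalization rescales a column to the range [0, 1] via (x - min) / (max - min). The sketch below applies that formula directly with pandas rather than calling MinMaxScaler, so it runs without scikit-learn; the function name `min_max_normalize` and the output column name `normalized_score` are hypothetical, and the handling of a constant column is an assumption.

```python
import pandas as pd


def min_max_normalize(expression_data: pd.DataFrame, scores_column: str) -> pd.DataFrame:
    """Add a 'normalized_score' column scaled to the range [0, 1]."""
    out = expression_data.copy()
    col = out[scores_column].astype(float)
    span = col.max() - col.min()
    # A constant column has zero span; map it to 0.0 rather than divide by zero.
    out["normalized_score"] = 0.0 if span == 0 else (col - col.min()) / span
    return out


# Hypothetical expression scores.
raw = pd.DataFrame({"gene": ["HK1", "PFKM", "LDHA"], "score": [2.0, 4.0, 10.0]})
normalized = min_max_normalize(raw, "score")
```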
- gsmm.csm.build_csm.optimize_model(model: Model, reaction_confidence_dict: dict) -> CORDA
Initializes and optimizes a COBRApy model using CORDA with reaction confidence levels.
- Args:
model (cobra.Model): The COBRApy model to be optimized.
reaction_confidence_dict (dict): Dictionary mapping reaction IDs to confidence levels.
- Returns:
CORDA: Optimized CORDA object containing the optimized model and solution.
- Notes:
This function initializes a CORDA optimization object using the provided COBRApy model and reaction confidence levels. CORDA (Cost Optimisation Reaction Dependency Assessment) optimizes metabolic models based on context-specific data and reactions’ confidence levels.
- gsmm.csm.build_csm.read_parent_model(model_path: str) -> Model
Reads an SBML model from the specified path using COBRApy.
- Args:
model_path (str): Path to the SBML model file to reconstruct from. This is usually the base model.
- Returns:
cobra.Model: The COBRApy model object loaded from the SBML file.
- Raises:
FileNotFoundError: If the specified model file path does not exist.
IOError: If there is an error reading the SBML model file.
- Notes:
This function uses COBRApy’s read_sbml_model function to read an SBML model file and return the corresponding COBRApy model object.
- gsmm.csm.build_csm.run_model_reconstruction(model_path: str, base_model_path: str, data_path: str, gene_id_column: str, scores_column: str) -> Optional[Model]
Runs the pipeline for reconstructing an optimized metabolic model from expression data.
- Args:
model_path (str): Path to the parent SBML model file from which the base model will be derived.
base_model_path (str): Path to save the filtered SBML base model, which is used to reconstruct context-specific models.
data_path (str): Path to the CSV file containing expression data.
gene_id_column (Optional[str], optional): Column name in the expression data containing gene IDs or names.
scores_column (str): Name of the column in the expression data used for gene confidence levels.
- Returns:
Optional[cobra.Model]: Optimized COBRApy model object after reconstruction, or None if an error occurs.
- Notes:
This function serves as an interface to run the main pipeline (main function) for reconstructing an optimized metabolic model from the provided expression data. It handles exceptions and prints error messages if reconstruction fails, returning None in case of errors.
- gsmm.csm.build_csm.set_glpk_solver() -> None
Sets the default solver in COBRApy to GLPK (GNU Linear Programming Kit).
- Raises:
RuntimeError: If setting the GLPK solver fails for any reason.
- Notes:
This function updates the global solver configuration for COBRApy. GLPK is chosen as it is an open-source linear programming solver suitable for solving many types of optimization problems encountered in constraint-based modeling of metabolic networks.
gsmm.csm.config module
gsmm.csm.visualisation module
- gsmm.csm.visualisation.filter_and_save_results(filtered_results: dict, file_path: str) -> None
Filter and save valid DataFrames from a dictionary to a pickle file.
- Args:
filtered_results (dict): Dictionary containing results to filter and save. Each value should be a pandas DataFrame with columns ‘ids’, ‘growth’, and ‘status’.
file_path (str): Path to the pickle file where filtered results will be saved.
- Returns:
None
- Notes:
This function iterates through the provided dictionary of results. It filters out DataFrames that do not contain the required columns (‘ids’, ‘growth’, ‘status’) and saves the valid DataFrames to a pickle file. If a DataFrame does not meet the criteria, a warning is logged and it is skipped.
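The validate-then-pickle flow described in these notes can be sketched as follows. The name `save_valid_frames` is hypothetical, and unlike the documented function (which returns None) this sketch returns the retained dictionary so the result is easy to inspect. It also assumes that all valid DataFrames are written together into a single pickle file.

```python
import logging
import os
import pickle
import tempfile
from typing import Dict

import pandas as pd

REQUIRED_COLUMNS = {"ids", "growth", "status"}


def save_valid_frames(results: Dict[str, object], file_path: str) -> Dict[str, pd.DataFrame]:
    """Pickle only the DataFrames that contain every required column."""
    valid = {}
    for name, df in results.items():
        if isinstance(df, pd.DataFrame) and REQUIRED_COLUMNS.issubset(df.columns):
            valid[name] = df
        else:
            logging.warning("Skipping %s: missing required columns", name)
    with open(file_path, "wb") as handle:
        pickle.dump(valid, handle)
    return valid


# Hypothetical inputs: one valid frame and one lacking 'ids' and 'status'.
good = pd.DataFrame({"ids": ["R1"], "growth": [0.9], "status": ["optimal"]})
bad = pd.DataFrame({"growth": [0.1]})
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "filtered.pkl")
    save_valid_frames({"good": good, "bad": bad}, path)
    with open(path, "rb") as handle:
        reloaded = pickle.load(handle)
```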
- gsmm.csm.visualisation.load_data(filepath: str) -> Optional[DataFrame]
Load data from a pickle file into a pandas DataFrame.
- Args:
filepath (str): Path to the pickle file containing the data.
- Returns:
Optional[pd.DataFrame]: Loaded pandas DataFrame if successful, otherwise None.
- Notes:
This function attempts to load a pandas DataFrame from the specified pickle file. If the file is not found, a FileNotFoundError is caught and logged, returning None. Any other loading errors are also caught, logged, and None is returned.
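The error handling described here (log and return None instead of raising) might be sketched like this. The name `load_pickled_frame` is hypothetical, and the example file name is chosen only because it is unlikely to exist.

```python
import logging
from typing import Optional

import pandas as pd


def load_pickled_frame(filepath: str) -> Optional[pd.DataFrame]:
    """Load a pickled DataFrame, returning None if loading fails."""
    try:
        return pd.read_pickle(filepath)
    except FileNotFoundError:
        logging.error("File not found: %s", filepath)
    except Exception as exc:  # any other loading problem
        logging.error("Could not load %s: %s", filepath, exc)
    return None


missing = load_pickled_frame("this_file_should_not_exist_12345.pkl")
```

Callers need to check for None before using the result, since a missing file and a corrupt file are indistinguishable from the return value alone.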
- gsmm.csm.visualisation.plot_flux_correlation_heatmap(df_fluxes: DataFrame, save_path: str, show_plot: bool = False) -> None
Generate a heatmap of correlation coefficients for fluxes between different models for all reactions.
Parameters:
- df_fluxes (pd.DataFrame): DataFrame with ‘Model’, ‘Reaction’, and ‘Flux’ columns.
- save_path (str): Path to save the heatmap.
- show_plot (bool): Whether to also display the plot (default: False).
- Returns:
None
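The data behind such a heatmap is a model-by-model correlation matrix. The sketch below shows one plausible way to compute it from the long-format input: pivot to a Reaction × Model table, then correlate the model columns. The use of Pearson correlation (the pandas default), the name `model_flux_correlation`, and the sample data are assumptions; the plotting call itself is omitted.

```python
import pandas as pd


def model_flux_correlation(df_fluxes: pd.DataFrame) -> pd.DataFrame:
    """Correlate flux profiles between models (one column per model)."""
    wide = df_fluxes.pivot_table(index="Reaction", columns="Model", values="Flux")
    return wide.corr()  # Pearson correlation by default


# Hypothetical fluxes: model "b" carries exactly twice the flux of model "a".
df_fluxes = pd.DataFrame(
    {
        "Model": ["a"] * 3 + ["b"] * 3,
        "Reaction": ["R1", "R2", "R3"] * 2,
        "Flux": [1.0, 2.0, 4.0, 2.0, 4.0, 8.0],
    }
)
corr = model_flux_correlation(df_fluxes)
```

The resulting square DataFrame can be passed to a heatmap routine of choice.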
- gsmm.csm.visualisation.plot_flux_distribution_clustermap(df_fluxes: DataFrame, save_path: str, show_plot: bool = False) -> None
Generate a clustermap for flux distribution across different models and reactions.
Parameters:
- df_fluxes (pd.DataFrame): DataFrame with ‘Model’, ‘Reaction’, and ‘Flux’ columns.
- save_path (str): Path to save the clustermap.
- show_plot (bool): Whether to also display the plot (default: False).
- Returns:
None
- gsmm.csm.visualisation.plot_fluxes(flux_filepath: Optional[str] = 'flux_data.pkl', sink_flux_filepath: Optional[str] = 'sink_flux_data.pkl', show_plot: bool = False) -> None
Plot flux distributions and sink fluxes using default or provided file paths.
Parameters:
- flux_filepath (str, optional): File path to flux data (default: ‘flux_data.pkl’).
- sink_flux_filepath (str, optional): File path to sink flux data (default: ‘sink_flux_data.pkl’).
- show_plot (bool): Whether to also display the plot (default: False).
Returns: - None
- gsmm.csm.visualisation.plot_sink_flux_correlation_heatmap(df_sink_fluxes: DataFrame, save_path: str, show_plot: bool = False) -> None
Generate a heatmap of correlation coefficients for sink fluxes between different models.
Parameters:
- df_sink_fluxes (pd.DataFrame): DataFrame with ‘Metabolite’, ‘Flux’, and ‘Context_Model’ columns.
- save_path (str): Path to save the heatmap.
- show_plot (bool): Whether to also display the plot (default: False).
- Returns:
None
- gsmm.csm.visualisation.plot_sink_fluxes_heatmap(df: DataFrame, output_file: str, show_plot: bool = False) -> None
Plot a heatmap for sink fluxes across different context models.
Parameters:
- df (pd.DataFrame): DataFrame containing sink flux data with columns ‘Metabolite’, ‘Flux’, and ‘Context_Model’.
- output_file (str): Path to save the sink fluxes heatmap.
- show_plot (bool): Whether to also display the plot (default: False).
- Returns:
None