tlo.analysis.utils module¶
General utility functions for TLO analysis
- parse_log_file(log_filepath)[source]¶
Parses logged output from a TLO run, splits it into smaller logfiles, and returns a class containing paths to these split logfiles.
- Parameters
log_filepath – file path to log file
- Returns
a class containing paths to split logfiles
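A minimal usage sketch, with a hypothetical log path; the structure of the returned object is described under the LogsDict class below:

    from tlo.analysis.utils import parse_log_file

    # Hypothetical path to the combined log written during a TLO run
    output = parse_log_file("outputs/simulation.log")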
- write_log_to_excel(filename, log_dataframes)[source]¶
Takes the output of parse_log_file() and creates an Excel file from the dataframes.
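A short sketch, with hypothetical input and output filenames:

    from tlo.analysis.utils import parse_log_file, write_log_to_excel

    output = parse_log_file("outputs/simulation.log")
    write_log_to_excel("simulation_log.xlsx", output)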
- make_calendar_period_lookup()[source]¶
Returns a dictionary mapping calendar year to five-year calendar period, i.e. { 1950: ‘1950-1954’, 1951: ‘1950-1954’, …}
- make_calendar_period_type()[source]¶
Makes an ordered categorical type for calendar periods. Returns a CategoricalDtype.
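A sketch using both calendar helpers together, assuming the lookup is returned as the dictionary described above:

    import pandas as pd

    from tlo.analysis.utils import (
        make_calendar_period_lookup,
        make_calendar_period_type,
    )

    years = pd.Series([1951, 1987])
    periods = years.map(make_calendar_period_lookup()).astype(
        make_calendar_period_type()
    )
    # periods -> ['1950-1954', '1985-1989'] as an ordered categorical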
- make_age_grp_lookup()[source]¶
Returns a dictionary mapping age (in years) to five-year age group, i.e. { 0: ‘0-4’, 1: ‘0-4’, …, 119: ‘100+’, 120: ‘100+’ }
- make_age_grp_types()[source]¶
Makes an ordered categorical type for age groups. Returns a CategoricalDtype.
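The analogous sketch for ages, under the same assumption about the lookup's return value:

    import pandas as pd

    from tlo.analysis.utils import make_age_grp_lookup, make_age_grp_types

    ages = pd.Series([3, 27, 119])
    age_groups = ages.map(make_age_grp_lookup()).astype(make_age_grp_types())
    # age_groups -> ['0-4', '25-29', '100+'] as an ordered categorical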
- get_scenario_outputs(scenario_filename: str, outputs_dir: pathlib.Path) → list [source]¶
Returns paths of folders associated with a batch_file, in chronological order.
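For example, with a hypothetical scenario filename and outputs directory; because the folders come back in chronological order, the last element is the most recent:

    from pathlib import Path

    from tlo.analysis.utils import get_scenario_outputs

    results_folder = get_scenario_outputs("my_scenario.py", Path("outputs"))[-1]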
- get_scenario_info(scenario_output_dir: pathlib.Path) → dict [source]¶
Utility function to get the number of draws and the number of runs in a batch set.
TODO: read the JSON file to get further information
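Continuing the sketch above (the key names shown are assumptions):

    from tlo.analysis.utils import get_scenario_info

    info = get_scenario_info(results_folder)
    # e.g. {'number_of_draws': 2, 'runs_per_draw': 5}  (key names assumed)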
- load_pickled_dataframes(results_folder: pathlib.Path, draw=0, run=0, name=None) → dict [source]¶
Utility function to create a dict containing all the logs from the specified run within a batch set
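A sketch pulling the logs for a single run; the logger name and key used for the lookup are assumptions about what was logged:

    from tlo.analysis.utils import load_pickled_dataframes

    logs = load_pickled_dataframes(results_folder, draw=0, run=0)
    deaths_df = logs["tlo.methods.demography"]["death"]  # assumed logger/key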
- extract_params(results_folder: pathlib.Path) → Optional[pandas.core.frame.DataFrame] [source]¶
Utility function to get overridden parameters from scenario runs
Returns a dataframe summarizing the parameters that change across the draws. The dataframe has an index of draw and a column for each parameter that is specified to be varied in the batch. NB. The extraction is done from run 0 in each draw, under the assumption that the overridden parameters are the same in each run.
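For example, continuing the sketch above:

    from tlo.analysis.utils import extract_params

    params = extract_params(results_folder)
    # index: draw number; one column per parameter varied across the draws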
- extract_results(results_folder: pathlib.Path, module: str, key: str, column: Optional[str] = None, index: Optional[str] = None, custom_generate_series=None, do_scaling: bool = False) → pandas.core.frame.DataFrame [source]¶
Utility function to unpack results
Produces a dataframe that summarizes one series from the log, with a column multi-index for the draw/run. If an ‘index’ component of the log_element is provided, the dataframe uses that index (but note that this will only work if the index is the same in each run). Optionally, instead of a series that exists in the dataframe already, a function can be provided that, when applied to the dataframe indicated, yields a new pd.Series. Optionally, with do_scaling, each element is multiplied by the scaling_factor recorded in the simulation (if available).
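A hedged sketch of the custom_generate_series pattern, continuing from the sketches above; the logger name, key, and ‘date’ column are assumptions about what was logged:

    from tlo.analysis.utils import extract_results

    # Count logged deaths per year in every draw/run, scaled to the
    # full population if a scaling factor was recorded
    deaths_by_year = extract_results(
        results_folder,
        module="tlo.methods.demography",
        key="death",
        custom_generate_series=lambda df: df.groupby(df["date"].dt.year).size(),
        do_scaling=True,
    )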
- summarize(results: pandas.core.frame.DataFrame, only_mean: bool = False, collapse_columns: bool = False) → pandas.core.frame.DataFrame [source]¶
Utility function to compute summary statistics
Finds mean value and 95% interval across the runs for each draw.
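Continuing the sketch above; the statistic labels are assumptions:

    from tlo.analysis.utils import summarize

    summary = summarize(deaths_by_year)
    # columns: multi-index of (draw, stat), with stat assumed to be
    # 'lower'/'mean'/'upper'
    means_only = summarize(deaths_by_year, only_mean=True)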
- get_grid(params: pandas.core.frame.DataFrame, res: pandas.core.series.Series)[source]¶
Utility function to create the arrays needed to plot a heatmap.
- Parameters
params (pd.DataFrame) – the dataframe of parameters with index=draw (made using extract_params()).
res (pd.Series) – results of interest with index=draw (can be made using extract_results())
- Returns
grid as dictionary
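A sketch of plotting the grid with matplotlib, continuing from the sketches above and assuming the scenario varied two parameters; the dict key names (the two parameter names plus one for the results) are assumptions:

    import matplotlib.pyplot as plt

    from tlo.analysis.utils import get_grid

    # res: results of interest indexed by draw, e.g. the final year's
    # mean across runs (column level name 'draw' assumed)
    res = deaths_by_year.iloc[-1].groupby(level="draw").mean()
    grid = get_grid(params, res)
    plt.pcolormesh(grid["param_a"], grid["param_b"], grid["res"], shading="auto")
    plt.colorbar()
    plt.show()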
- format_gbd(gbd_df: pandas.core.frame.DataFrame)[source]¶
Formats GBD data to give standardized categories for age_group and period
- create_pickles_locally(scenario_output_dir)[source]¶
For a run from the Batch system that has not resulted in the creation of the pickles, reconstruct the pickles locally.
- compare_number_of_deaths(logfile: pathlib.Path, resourcefilepath: pathlib.Path)[source]¶
Helper function to produce tables summarising deaths in the model run (given by a logfile) and the corresponding number of deaths in the GBD dataset. NB. Requires output from the module tlo.methods.demography. Will do scaling automatically if the scaling-factor has been computed in the simulation (but not otherwise).
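For example, with hypothetical paths:

    from pathlib import Path

    from tlo.analysis.utils import compare_number_of_deaths

    comparison = compare_number_of_deaths(
        logfile=Path("outputs/simulation.log"),
        resourcefilepath=Path("resources"),
    )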
- flatten_multi_index_series_into_dict_for_logging(ser: pandas.core.series.Series) → dict [source]¶
Helper function that converts a pd.Series with multi-index into a dict format that is suitable for logging. It does this by converting the multi-index into keys of type str in a format that can later be used to reconstruct the multi-index (using unflatten_flattened_multi_index_in_logging).
- unflatten_flattened_multi_index_in_logging(_x: pandas.core.frame.DataFrame) → pandas.core.frame.DataFrame [source]¶
Helper function that recreates the multi-index of logged results from a pd.DataFrame that is generated by parse_log_file. If a pd.DataFrame created by parse_log_file is the result of repeated logging of a pd.Series with a multi-index that was transformed before logging using flatten_multi_index_series_into_dict_for_logging, then the pd.DataFrame’s columns will be those flattened labels. This helper function recreates the original multi-index from which the flattened labels were created and applies it to the pd.DataFrame.
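A sketch of the round trip, with a small multi-indexed series:

    import pandas as pd

    from tlo.analysis.utils import flatten_multi_index_series_into_dict_for_logging

    ser = pd.Series(
        [10, 20],
        index=pd.MultiIndex.from_tuples(
            [("F", "0-4"), ("M", "0-4")], names=["sex", "age_grp"]
        ),
    )
    flat = flatten_multi_index_series_into_dict_for_logging(ser)
    # flat has str keys encoding (sex, age_grp) and is suitable for
    # logger.info(...); once parse_log_file has turned the repeated logs
    # into a dataframe df, unflatten_flattened_multi_index_in_logging(df)
    # restores the original multi-index on df's columns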
- class LogsDict(file_names_and_paths)[source]¶
Bases: collections.abc.Mapping
Parses module-specific log files and returns Pandas dataframes.
The dictionary returned has the format:
    {
        <logger 1 name>: {
            <log key 1>: <pandas dataframe>,
            <log key 2>: <pandas dataframe>,
            <log key 3>: <pandas dataframe>
        },
        <logger 2 name>: {
            <log key 4>: <pandas dataframe>,
            <log key 5>: <pandas dataframe>,
            <log key 6>: <pandas dataframe>
        },
        ...
    }
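Because LogsDict implements the Mapping interface, the usual dict idioms work. A sketch, with a hypothetical path, and assuming the inner values are plain dicts as the format above suggests:

    from tlo.analysis.utils import parse_log_file

    output = parse_log_file("outputs/simulation.log")
    for logger_name in output:
        for key, df in output[logger_name].items():
            print(logger_name, key, df.shape)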