tlo.analysis.utils module

General utility functions for TLO analysis


Parses the logged output from a TLO run, splits it into smaller log files and returns an object containing the paths to these split log files.

Parameters:

log_filepath – file path to the log file

Returns:

an object containing the paths to the split log files

write_log_to_excel(filename, log_dataframes)

Takes the output of parse_log_file() and creates an Excel file from its dataframes.


Returns a dictionary mapping each calendar year to its five-year period, e.g. {1950: '1950-1954', 1951: '1950-1954', …}


Makes an ordered categorical type for calendar periods.

Returns: CategoricalDtype
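As a rough illustration of the calendar-period lookup and the ordered categorical type above, here is a minimal sketch in plain pandas. The function names and the default year range (1950 to 2099) are hypothetical and not taken from tlo.analysis.utils; periods are assumed to start on years divisible by 5, matching the 1950-1954 example.

```python
import pandas as pd


def make_calendar_period_lookup_sketch(first_year: int = 1950, last_year: int = 2099) -> dict:
    """Hypothetical helper: map each calendar year to its five-year period label."""
    lookup = {}
    for year in range(first_year, last_year + 1):
        period_start = year - year % 5  # round down to a multiple of 5
        lookup[year] = f"{period_start}-{period_start + 4}"
    return lookup


def make_calendar_period_type_sketch(first_year: int = 1950, last_year: int = 2099) -> pd.CategoricalDtype:
    """Hypothetical helper: ordered categorical dtype over the period labels."""
    lookup = make_calendar_period_lookup_sketch(first_year, last_year)
    # dict.fromkeys de-duplicates while keeping first-seen (chronological) order
    categories = list(dict.fromkeys(lookup.values()))
    return pd.CategoricalDtype(categories=categories, ordered=True)
```

Casting a mapped column to the ordered type, e.g. `years.map(lookup).astype(dtype)`, makes periods sort chronologically and allows comparisons between period labels.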


Returns a dictionary mapping age (in years) to its five-year age group, e.g. {0: '0-4', 1: '0-4', …, 119: '100+', 120: '100+'}


Makes an ordered categorical type for age groups.

Returns: CategoricalDtype
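The age-group lookup and type can be sketched the same way. Again the names are hypothetical; the sketch assumes five-year bands up to 95-99 and a single '100+' band for ages 100 to 120, as in the example above.

```python
import pandas as pd


def make_age_grp_lookup_sketch(max_age: int = 120) -> dict:
    """Hypothetical helper: map each age in years to its age-group label."""
    lookup = {}
    for age in range(max_age + 1):
        if age >= 100:
            lookup[age] = "100+"  # open-ended top band
        else:
            lower = age - age % 5  # round down to a multiple of 5
            lookup[age] = f"{lower}-{lower + 4}"
    return lookup


def make_age_grp_type_sketch(max_age: int = 120) -> pd.CategoricalDtype:
    """Hypothetical helper: ordered categorical dtype over the age-group labels."""
    categories = list(dict.fromkeys(make_age_grp_lookup_sketch(max_age).values()))
    return pd.CategoricalDtype(categories=categories, ordered=True)
```

The ordered type matters here because plain string sorting would put '100+' before '5-9'.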

get_scenario_outputs(scenario_filename: str, outputs_dir: pathlib.Path) → list

Returns paths of folders associated with a batch_file, in chronological order.
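A minimal sketch of how such a lookup might work, assuming (hypothetically) that each batch submission writes a folder named `<scenario file stem>-<timestamp>` inside outputs_dir and that the timestamp sorts chronologically as a string. The real naming convention may differ.

```python
from pathlib import Path


def get_scenario_outputs_sketch(scenario_filename: str, outputs_dir: Path) -> list:
    # Assumption: folders are named "<scenario stem>-<timestamp>" and the
    # timestamp part sorts chronologically as a string (ISO 8601 style).
    stem = Path(scenario_filename).stem
    folders = [
        p for p in outputs_dir.iterdir()
        if p.is_dir() and p.name.startswith(f"{stem}-")
    ]
    # Sorting by name is therefore equivalent to sorting by submission time.
    return sorted(folders, key=lambda p: p.name)
```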

get_scenario_info(scenario_output_dir: pathlib.Path) → dict

Utility function to get the number of draws and the number of runs in a batch set.

TODO: read the JSON file to get further information
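A sketch of the counting, assuming a hypothetical layout `<scenario_output_dir>/<draw>/<run>/` with integer-named folders and the same number of runs in every draw. The dictionary keys returned here are illustrative only.

```python
from pathlib import Path


def get_scenario_info_sketch(scenario_output_dir: Path) -> dict:
    # Hypothetical layout: <scenario_output_dir>/<draw>/<run>/ with integer folder names.
    def numbered_subdirs(folder: Path) -> list:
        return [p for p in folder.iterdir() if p.is_dir() and p.name.isdigit()]

    draws = numbered_subdirs(scenario_output_dir)
    # Assumes every draw has the same number of runs; max() simply takes the
    # largest count found and returns 0 when there are no draws.
    runs_per_draw = max((len(numbered_subdirs(d)) for d in draws), default=0)
    return {"number_of_draws": len(draws), "runs_per_draw": runs_per_draw}
```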

load_pickled_dataframes(results_folder: pathlib.Path, draw=0, run=0, name=None) → dict

Utility function to create a dict containing all the logs from the specified run within a batch set.

extract_params(results_folder: pathlib.Path) → Optional[pandas.core.frame.DataFrame]

Utility function to get overridden parameters from scenario runs

Returns a dataframe summarizing the parameters that change across the draws. The dataframe has an index of draw and one column for each parameter that is specified to be varied in the batch. NB. This does the extraction from run 0 in each draw, under the assumption that the overridden parameters are the same in each run.
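The shape of the returned dataframe can be illustrated with a small sketch. The input format (a dict mapping each draw to its overridden `<module>:<parameter>` values, as read from run 0) and the parameter names are hypothetical.

```python
import pandas as pd


def extract_params_sketch(overrides_by_draw: dict) -> pd.DataFrame:
    # overrides_by_draw: hypothetical input {draw: {"<module>:<parameter>": value}},
    # e.g. as read from run 0 of each draw.
    params = pd.DataFrame.from_dict(overrides_by_draw, orient="index")
    params.index.name = "draw"  # one row per draw
    return params.sort_index()  # one column per varied parameter
```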

extract_results(results_folder: pathlib.Path, module: str, key: str, column: Optional[str] = None, index: Optional[str] = None, custom_generate_series=None, do_scaling: bool = False) → pandas.core.frame.DataFrame

Utility function to unpack results

Produces a dataframe that summarizes one series from the log, with a column multi-index for the draw/run. If an ‘index’ component of the log_element is provided, the dataframe uses that index (but note that this will only work if the index is the same in each run). Optionally, instead of a series that already exists in the dataframe, a function can be provided that, when applied to the dataframe indicated, yields a new pd.Series. Optionally, with do_scaling, each element is multiplied by the scaling_factor recorded in the simulation (if available).
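A simplified sketch of the unpacking, assuming the logs for one module/key have already been loaded into a dict keyed by `(draw, run)`. The names are hypothetical, and `scaling_factor` is passed in directly rather than read from the simulation log as `do_scaling` does.

```python
from typing import Callable, Dict, Optional, Tuple

import pandas as pd


def extract_results_sketch(
    logs: Dict[Tuple[int, int], pd.DataFrame],
    column: Optional[str] = None,
    index: Optional[str] = None,
    custom_generate_series: Optional[Callable[[pd.DataFrame], pd.Series]] = None,
    scaling_factor: Optional[float] = None,
) -> pd.DataFrame:
    # logs: hypothetical mapping (draw, run) -> the logged dataframe for one key
    series = {}
    for (draw, run), df in logs.items():
        if custom_generate_series is not None:
            s = custom_generate_series(df)  # derive a new series from the log
        elif index is not None:
            s = df.set_index(index)[column]  # assumes same index in every run
        else:
            s = df[column]
        if scaling_factor is not None:
            s = s * scaling_factor
        series[(draw, run)] = s
    # One column per (draw, run); rows are aligned on the shared index.
    results = pd.concat(series, axis=1)
    results.columns = pd.MultiIndex.from_tuples(results.columns, names=["draw", "run"])
    return results.sort_index(axis=1)
```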

summarize(results: pandas.core.frame.DataFrame, only_mean: bool = False, collapse_columns: bool = False) → pandas.core.frame.DataFrame

Utility function to compute summary statistics

Finds the mean value and the 95% interval across the runs for each draw.
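A minimal sketch of this summary, operating on a dataframe with a (draw, run) column multi-index like the one described for extract_results(). It assumes the 95% interval is the 2.5th to 97.5th percentile range across runs and labels the statistics 'lower', 'mean' and 'upper'; both choices are assumptions, not necessarily what summarize() does.

```python
import pandas as pd


def summarize_sketch(results: pd.DataFrame, only_mean: bool = False) -> pd.DataFrame:
    # Transpose so (draw, run) is the row index, then group runs by draw.
    grouped = results.T.groupby(level="draw")
    mean = grouped.mean().T
    if only_mean:
        return mean  # columns: one per draw
    # Assumption: 95% interval = 2.5th and 97.5th percentiles across runs.
    lower = grouped.quantile(0.025).T
    upper = grouped.quantile(0.975).T
    summary = pd.concat({"lower": lower, "mean": mean, "upper": upper}, axis=1)
    # Reorder column levels from (stat, draw) to (draw, stat).
    summary = summary.swaplevel(0, 1, axis=1).sort_index(axis=1)
    summary.columns = summary.columns.set_names(["draw", "stat"])
    return summary
```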

get_grid(params: pandas.core.frame.DataFrame, res: pandas.core.series.Series)

Utility function to create the arrays needed to plot a heatmap.

Parameters:
  • params (pd.DataFrame) – the dataframe of parameters with index=draw (made using extract_params()).

  • res (pd.Series) – results of interest with index=draw (can be made using extract_results())


Returns:

grid as a dictionary
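One way to build such arrays, sketched under the assumption that params has exactly two varied parameters and that the draws cover every combination of their values. The function name and the output keys are hypothetical.

```python
import numpy as np
import pandas as pd


def get_grid_sketch(params: pd.DataFrame, res: pd.Series) -> dict:
    # Assumption: exactly two varied parameters, on a complete grid of draws.
    x_name, y_name = params.columns
    df = params.join(res.rename("value"))  # align on the shared draw index
    # Rows follow the y parameter, columns follow the x parameter.
    table = df.pivot(index=y_name, columns=x_name, values="value")
    x, y = np.meshgrid(table.columns.to_numpy(), table.index.to_numpy())
    return {x_name: x, y_name: y, "z": table.to_numpy()}
```

The three arrays have matching shapes, which is the form expected by grid-based plotting calls such as matplotlib's `pcolormesh`.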

format_gbd(gbd_df: pandas.core.frame.DataFrame)

Formats GBD data to give standardized categories for age_group and period.


For a run from the Batch system that has not resulted in the creation of the pickles, reconstruct the pickles locally.

compare_number_of_deaths(logfile: pathlib.Path, resourcefilepath: pathlib.Path)

Helper function to produce tables summarizing deaths in the model run (given by a logfile) and the corresponding number of deaths in the GBD dataset. NB.

  • Requires output from the module tlo.methods.demography.

  • Will do scaling automatically if the scaling factor has been computed in the simulation (but not otherwise).

flatten_multi_index_series_into_dict_for_logging(ser: pandas.core.series.Series) → dict

Helper function that converts a pd.Series with a multi-index into a dict format that is suitable for logging. It does this by converting the multi-index into keys of type str, in a format that can later be used to reconstruct the multi-index (using unflatten_flattened_multi_index_in_logging).
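A minimal sketch of the flattening step. The function name and the key format `<level name>=<value>|...` are assumptions chosen for illustration; the format used by flatten_multi_index_series_into_dict_for_logging may differ.

```python
import pandas as pd


def flatten_sketch(ser: pd.Series) -> dict:
    # Hypothetical key format: "<level name>=<value>|<level name>=<value>"
    names = ser.index.names
    return {
        "|".join(f"{name}={value}" for name, value in zip(names, key)): val
        for key, val in ser.items()  # key is a tuple, one entry per level
    }
```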

unflatten_flattened_multi_index_in_logging(_x: pandas.core.frame.DataFrame) → pandas.core.frame.DataFrame

Helper function that recreates the multi-index of logged results from a pd.DataFrame generated by parse_log. If a pd.DataFrame created by parse_log is the result of repeated logging of a pd.Series with a multi-index that was transformed before logging using flatten_multi_index_series_into_dict_for_logging, then the pd.DataFrame’s columns will be those flattened labels. This helper function recreates the original multi-index from which the flattened labels were created and applies it to the pd.DataFrame.
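The reverse step can be sketched for labels in the same hypothetical `<level name>=<value>|...` format; the function name is also hypothetical. Note that the recovered index values come back as strings, since the sketch has no record of their original types.

```python
import pandas as pd


def unflatten_sketch(df: pd.DataFrame) -> pd.DataFrame:
    # Split each label "sex=F|age_grp=0-4" into [["sex", "F"], ["age_grp", "0-4"]].
    parsed = [
        [part.split("=", 1) for part in label.split("|")]
        for label in df.columns
    ]
    names = [name for name, _ in parsed[0]]  # level names from the first label
    tuples = [tuple(value for _, value in pairs) for pairs in parsed]
    out = df.copy()
    out.columns = pd.MultiIndex.from_tuples(tuples, names=names)
    return out
```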

class LogsDict(file_names_and_paths)


Parses module-specific log files and returns pandas dataframes.

The dictionary returned has the format:

    <logger 1 name>: {
                       <log key 1>: <pandas dataframe>,
                       <log key 2>: <pandas dataframe>,
                       <log key 3>: <pandas dataframe>
                     },

    <logger 2 name>: {
                       <log key 4>: <pandas dataframe>,
                       <log key 5>: <pandas dataframe>,
                       <log key 6>: <pandas dataframe>
                     }
items() → a set-like object providing a view on D's items
keys() → a set-like object providing a view on D's keys
values() → an object providing a view on D's values
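LogsDict exposes items(), keys() and values() like a read-only mapping. The sketch below shows that pattern using collections.abc.Mapping, which supplies those methods once __getitem__, __iter__ and __len__ are defined. The class name and the lazy loading from one pickle file per logger are hypothetical stand-ins for however LogsDict actually parses its files.

```python
import pickle
from collections.abc import Mapping
from pathlib import Path
from typing import Dict, Iterator


class LazyLogsDictSketch(Mapping):
    """Hypothetical sketch: logger name -> {log key: data}, loaded on first access."""

    def __init__(self, file_names_and_paths: Dict[str, Path]):
        self._paths = dict(file_names_and_paths)
        self._cache: dict = {}

    def __getitem__(self, name: str) -> dict:
        if name not in self._paths:
            raise KeyError(name)
        if name not in self._cache:  # read the file only the first time
            with open(self._paths[name], "rb") as f:
                self._cache[name] = pickle.load(f)
        return self._cache[name]

    def __contains__(self, name: object) -> bool:
        # Avoid the default Mapping check, which would load the file.
        return name in self._paths

    def __iter__(self) -> Iterator[str]:
        return iter(self._paths)

    def __len__(self) -> int:
        return len(self._paths)
```

__contains__ is overridden because the default Mapping implementation calls __getitem__, which would load the file just to answer a membership test.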