tlo.lm module

class Predictor(property_name: str = None, external: bool = False, conditions_are_mutually_exclusive: bool | None = None, conditions_are_exhaustive: bool | None = False)

Bases: object

A Predictor variable for the regression model.

Parameters:

property_name – The name of a property of the population dataframe (e.g. age, sex) or, if external=True, the name of an external property that will be passed as a keyword argument to the LinearModel.predict method.
external – Whether the named property is external (True), and so will be passed as a keyword argument to the LinearModel.predict method, or is a property of the population dataframe (False).
conditions_are_mutually_exclusive – Whether the set of conditions that are declared for this predictor are all mutually exclusive, that is, for any pair of conditions, one condition evaluating to True implies the other must evaluate to False. If this is declared to be the case a more efficient method of evaluation will be used in LinearModel.predict. Note however that the validity of this declaration will not be checked so if this is set to True for predictors with non-mutually exclusive conditions, the model output will be erroneous.
conditions_are_exhaustive – Whether the set of conditions declared for this predictor is exhaustive, that is, at least one condition will always be True irrespective of the value of the property. If this is declared to be the case, a more efficient method of evaluation may be used in LinearModel.predict, though if a catch-all otherwise condition is included this flag will provide no benefit. Note that the validity of this declaration will not be checked, so if this is set to True for predictors with non-exhaustive conditions, the model output will be erroneous.
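Because neither declaration is validated, it can help to check a set of conditions against representative values before setting the flags. The following is a minimal, library-independent sketch of the two definitions above. The helper names, the representation of conditions as plain Python callables, and the sample values are all hypothetical and are not part of tlo.lm.

```python
# Toy illustration only: conditions are modelled as (test, effect_size) pairs
# of plain Python callables, which is NOT how tlo.lm represents them.

def is_mutually_exclusive(conditions, samples):
    """True if no sample value satisfies more than one condition."""
    return all(
        sum(bool(test(value)) for test, _ in conditions) <= 1
        for value in samples
    )


def is_exhaustive(conditions, samples):
    """True if every sample value satisfies at least one condition."""
    return all(
        any(test(value) for test, _ in conditions)
        for value in samples
    )


# Hypothetical age bands that partition the sample range.
age_bands = [
    (lambda age: age < 5, 1.5),
    (lambda age: 5 <= age < 15, 1.2),
    (lambda age: age >= 15, 1.0),
]

# Hypothetical overlapping conditions with a gap above 20.
overlapping = [
    (lambda age: age < 10, 2.0),
    (lambda age: age < 20, 1.5),
]

samples = range(0, 100)

print(is_mutually_exclusive(age_bands, samples), is_exhaustive(age_bands, samples))
print(is_mutually_exclusive(overlapping, samples), is_exhaustive(overlapping, samples))
```

A check like this only covers the sampled values, so it can reveal an incorrect declaration but cannot prove a correct one.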

when(condition: str | float | bool, value: float) → Predictor

otherwise(value: float) → Predictor

apply(callback: Callable[[Any], float]) → Predictor

class LinearModelType(value)

Bases: Enum

The type of model specifies how the results from the predictors are combined:

‘additive’ -> adds the effect_sizes from the predictors
‘logistic’ -> multiplies the effect_sizes from the predictors and applies the transform x/(1+x) [Thus, the intercept can be taken to be an odds and the effect_sizes odds ratios, and the prediction is a probability.]
‘multiplicative’ -> multiplies the effect_sizes from the predictors
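The arithmetic behind the three built-in types can be sketched in plain Python. This is an illustrative stand-in rather than the library's code: ModelType and combine are hypothetical names, and the way the intercept enters each formula (added for additive models, multiplied in for the other two) is an assumption based on the description above.

```python
import math
from enum import Enum


class ModelType(Enum):
    """Hypothetical stand-in for LinearModelType (CUSTOM omitted)."""
    ADDITIVE = 1
    LOGISTIC = 2
    MULTIPLICATIVE = 3


def combine(model_type, intercept, effect_sizes):
    """Combine an intercept with per-predictor effect sizes for one individual."""
    if model_type is ModelType.ADDITIVE:
        # Assumption: the intercept is added to the summed effects.
        return intercept + sum(effect_sizes)

    # Assumption: the intercept scales the product of the effects.
    product = intercept * math.prod(effect_sizes)
    if model_type is ModelType.MULTIPLICATIVE:
        return product
    if model_type is ModelType.LOGISTIC:
        # Odds -> probability via x / (1 + x).
        return product / (1 + product)
    raise ValueError(f"Unsupported model type: {model_type}")


effects = [2.0, 1.5]
additive = combine(ModelType.ADDITIVE, 0.5, effects)
multiplicative = combine(ModelType.MULTIPLICATIVE, 0.5, effects)
logistic = combine(ModelType.LOGISTIC, 0.5, effects)
print(additive, multiplicative, logistic)
```

With an intercept of 0.5 read as baseline odds and odds ratios of 2.0 and 1.5, the logistic case gives odds of 1.5 and therefore a probability of 1.5 / 2.5.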

ADDITIVE = 1

LOGISTIC = 2

MULTIPLICATIVE = 3

CUSTOM = 4

class LinearModel(lm_type: LinearModelType, intercept: float | int, *predictors: Predictor)

Bases: object

A linear model has an intercept and zero or more Predictor variables.

Parameters:

lm_type – Model type to use.
intercept – Intercept term for the model.
*predictors – Any Predictor instances to use in computing output.

property lm_type: LinearModelType: The model type.

property intercept: float | int: The intercept value for the model.

property predictors: Tuple[Predictor]: The predictors used in calculating the model output.

static multiplicative(*predictors: Predictor)

Returns a multiplicative LinearModel with intercept=1.0.

Parameters: predictors – One or more Predictor objects defining the model.

static custom(predict_function, **kwargs)

Define a linear model using the supplied function.

The function acts as a drop-in replacement for the predict method and must implement the interface:

(self: LinearModel, df: Union[pd.DataFrame, pd.Series], rng: Optional[np.random.RandomState] = None, **kwargs) -> pd.Series

It is the responsibility of the caller of predict to ensure they pass either a dataframe or an individual record as expected by the custom function.

See test_custom() in test_lm.py for a couple of examples.

predict(df: DataFrame, rng: RandomState | None = None, squeeze_single_row_output=True, **kwargs) → Series | bool_

Evaluate linear model output for a given set of input data.

Parameters:

df – The DataFrame containing the input data to evaluate the model on.
rng – If set to a NumPy RandomState instance, the returned output will be a boolean Series of Bernoulli random variables sampled according to the probabilities specified by the model output. Otherwise the model output is returned directly.
squeeze_single_row_output – If the rng argument is not None and this argument is set to True, the output for a single-row df input will be a scalar boolean value rather than a boolean Series. If set to False, the output will always be a Series.
**kwargs – Values for any external variables included in model predictors.
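The interaction between rng and squeeze_single_row_output described above can be sketched without the library. This is a simplified illustration, not the predict implementation: sample_outcomes is a hypothetical helper, plain lists stand in for pandas Series, and the standard-library random.Random is substituted for NumPy's RandomState.

```python
import random


def sample_outcomes(probabilities, rng=None, squeeze_single_row_output=True):
    """Mimic the documented rng / squeeze behaviour on a list of probabilities."""
    if rng is None:
        # No generator supplied: return the model output unchanged.
        return list(probabilities)

    # One Bernoulli draw per row: True with probability p.
    outcomes = [rng.random() < p for p in probabilities]

    # A single-row input collapses to a scalar boolean when squeezing is enabled.
    if squeeze_single_row_output and len(outcomes) == 1:
        return outcomes[0]
    return outcomes


rng = random.Random(0)  # stand-in for np.random.RandomState

raw = sample_outcomes([0.2, 0.9])                       # probabilities returned as-is
certain = sample_outcomes([0.0, 1.0], rng=rng)          # draws fixed by p = 0 and p = 1
scalar = sample_outcomes([1.0], rng=rng)                # single row, squeezed
unsqueezed = sample_outcomes([1.0], rng=rng, squeeze_single_row_output=False)
print(raw, certain, scalar, unsqueezed)
```

Probabilities of exactly 0.0 and 1.0 are used in the example so that the sampled outcomes do not depend on the seed.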