epysurv.models.timepoint package

Submodules

epysurv.models.timepoint.bayes module

class epysurv.models.timepoint.bayes.Bayes(years_back: int = 0, window_half_width: int = 6, include_recent_year: bool = True, alpha: float = 0.05)[source]

Bases: epysurv.models.timepoint._base.STSBasedAlgorithm

Evaluation of timepoints with the Bayes subsystem.

years_back

How many years back in time to include when forming the base counts.

window_half_width

Number of weeks to include before and after the current week in each year.

include_recent_year

Boolean deciding whether the year of the current time point also contributes window_half_width reference values.

alpha

The upper threshold is calculated as the (1 − α)-quantile. By default, years_back, window_half_width and include_recent_year (b, w and actY in the R package surveillance) are set for the Bayes 1 system, with alpha=0.05.
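How years_back, window_half_width and include_recent_year select reference values can be illustrated with a minimal sketch. It is not the epysurv implementation: the function name reference_indices, the freq argument (observations per year) and the choice to silently drop indices before the start of the series are assumptions made for illustration.

```python
from typing import List


def reference_indices(
    t: int,
    years_back: int,
    window_half_width: int,
    include_recent_year: bool,
    freq: int = 52,  # assumed number of observations per year (weekly data)
) -> List[int]:
    """Return 0-based indices of the observations used as reference values for time t."""
    indices: List[int] = []
    # A window of 2 * w + 1 observations centred on the same week in each past year.
    for year in range(years_back, 0, -1):
        centre = t - year * freq
        indices.extend(range(centre - window_half_width, centre + window_half_width + 1))
    # Optionally add the w observations immediately preceding t in the current year.
    if include_recent_year:
        indices.extend(range(t - window_half_width, t))
    # Assumption: indices before the start of the series are dropped.
    return [i for i in indices if i >= 0]


# Bayes 1 defaults: no past years, only the six most recent weeks.
print(reference_indices(100, years_back=0, window_half_width=6, include_recent_year=True))
```

With the Bayes 1 defaults only the six observations directly before the current time point are used, which is why that configuration needs no historic data from previous years.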

References

1

Riebler, A. (2004), Empirischer Vergleich von statistischen Methoden zur Ausbruchserkennung bei Surveillance Daten, Bachelor’s thesis

2

Höhle, M., & Riebler, A. (2005). The R-Package “surveillance.” Sonderforschungsbereich (Vol. 386). Retrieved from https://epub.ub.uni-muenchen.de/1791/1/paper_422.pdf

epysurv.models.timepoint.boda module

class epysurv.models.timepoint.boda.Boda(trend: bool = False, season: bool = False, prior: str = 'iid', alpha: float = 0.05, mc_munu: int = 100, mc_y: int = 10, quantile_method: str = 'MM')[source]

Bases: epysurv.models.timepoint._base.STSBasedAlgorithm

The Boda model.

trend

Boolean indicating whether a linear trend term should be included in the model for the expectation on the log-scale.

season

Boolean to indicate whether a cyclic spline should be included.

prior

Either of “iid”, “rw1” or “rw2”.

alpha

The threshold for declaring an observed count as an aberration is the (1 − α) · 100% quantile of the predictive posterior.

mc_munu

Number of samples of the parameters (mean and size) to draw from their posterior distribution.

mc_y

Number of samples of y to generate for each pair of the mean and size parameter. A total of mc_munu × mc_y samples are generated.

sampling_method

Whether to sample from the parameters' joint distribution (“joint”) or from their respective marginal posterior distributions (“marginals”).

quantile_method

Either “MC” or “MM”. Indicates how the quantile is computed from the posterior distribution, whatever the inference method:
- “MC”: sample mc_munu values from the posterior distribution of the parameters, sample mc_y response values for each sampled parameter vector, and compute an empirical quantile of the resulting response values (as explained in Manitz and Höhle 2013).
- “MM”: sample mc_munu values from the posterior distribution of the parameters and compute the quantile of the resulting mixture distribution by bisection, which is faster.
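The idea behind the “MM” option, taking the quantile of a mixture distribution by bisection, can be sketched as follows. This is an illustration only. For brevity it assumes Poisson mixture components indexed by sampled means, whereas the model described above samples mean and size pairs. The names poisson_cdf, mixture_cdf and mixture_quantile are hypothetical.

```python
import math
from typing import Sequence


def poisson_cdf(k: int, mean: float) -> float:
    """P(Y <= k) for Y ~ Poisson(mean)."""
    term = math.exp(-mean)
    total = term
    for i in range(1, k + 1):
        term *= mean / i
        total += term
    return total


def mixture_cdf(k: int, means: Sequence[float]) -> float:
    """CDF of an equally weighted mixture of Poisson distributions."""
    return sum(poisson_cdf(k, m) for m in means) / len(means)


def mixture_quantile(means: Sequence[float], alpha: float = 0.05) -> int:
    """Smallest k with mixture CDF >= 1 - alpha, found by bisection."""
    target = 1.0 - alpha
    lo, hi = 0, 1
    # Grow the bracket until the upper end reaches the target probability.
    while mixture_cdf(hi, means) < target:
        lo, hi = hi, hi * 2
    # Bisect on the integers; the CDF is non-decreasing in k.
    while lo < hi:
        mid = (lo + hi) // 2
        if mixture_cdf(mid, means) >= target:
            hi = mid
        else:
            lo = mid + 1
    return lo


# Sampled posterior means (made-up values) and the resulting 95% quantile.
print(mixture_quantile([2.0, 2.5, 3.0], alpha=0.05))
```

Bisection needs only a handful of CDF evaluations per threshold, while the “MC” route has to draw mc_munu × mc_y response values first.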

epysurv.models.timepoint.cdc module

class epysurv.models.timepoint.cdc.CDC(years_back: int = 5, window_half_width: int = 1, alpha: float = 0.001)[source]

Bases: epysurv.models.timepoint._base.DisProgBasedAlgorithm

The CDC model.

years_back

How many years back in time to include when forming the base counts.

window_half_width

Number of weeks to include before and after the current week in each year.

alpha

An approximate (two-sided) (1 − α) prediction interval is calculated.

References

1

Stroup, D., G. Williamson, J. Herndon, and J. Karon (1989). Detection of aberrations in the occurrence of notifiable diseases surveillance data. Statistics in Medicine 8, 323-329.

2

Farrington, C. and N. Andrews (2003). Monitoring the Health of Populations, Chapter Outbreak Detection: Application to Infectious Disease Surveillance, pp. 203-231. Oxford University Press.

epysurv.models.timepoint.cusum module

class epysurv.models.timepoint.cusum.Cusum(reference_value: float = 1.04, decision_boundary: float = 2.26, expected_numbers_method: str = 'mean', transform: str = 'standard', negbin_alpha: float = 0.1)[source]

Bases: epysurv.models.timepoint._base.STSBasedAlgorithm

The Cusum model.

reference_value

The reference value k of the CUSUM.

decision_boundary

The decision boundary h of the CUSUM.

expected_numbers_method

How to determine the expected number of cases. Possible values are “glm” and “mean”.

mean

Use the mean of all data points passed to fit.

glm

Fit a GLM to the data points passed to fit.

transform

One of the following transformations (warning: the Anscombe and NegBin transformations are experimental):
- “standard”: standardized variables z1 (based on asymptotic normality). This is the default.
- “rossi”: standardized variables z3 as proposed by Rossi.
- “anscombe”: Anscombe residuals (experimental).
- “anscombe2nd”: Anscombe residuals as in Pierce and Schafer (1986), based on a 2nd-order approximation of E(X) (experimental).
- “pearsonNegBin”: Pearson residuals for NegBin (experimental).
- “anscombeNegBin”: Anscombe residuals for NegBin (experimental).
- “none”: no transformation.

negbin_alpha

Parameter of the negative binomial distribution, such that the variance is \(m + \alpha \cdot m^2\).
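The roles of reference_value and decision_boundary can be seen in a minimal one-sided CUSUM sketch using the “standard” transformation. It is a textbook-style illustration, not the code run by epysurv: the function name cusum_alarms, the explicit expected argument and the reset of the statistic after an alarm are assumptions.

```python
import math
from typing import List, Sequence


def cusum_alarms(
    counts: Sequence[float],
    expected: float,
    reference_value: float = 1.04,
    decision_boundary: float = 2.26,
) -> List[bool]:
    """Flag time points where a one-sided CUSUM crosses the decision boundary."""
    alarms: List[bool] = []
    cusum = 0.0
    for x in counts:
        # "standard" transformation: standardise using the Poisson variance.
        z = (x - expected) / math.sqrt(expected)
        # Accumulate only the excess over the reference value; never go below zero.
        cusum = max(0.0, cusum + z - reference_value)
        alarm = cusum >= decision_boundary
        alarms.append(alarm)
        if alarm:
            cusum = 0.0  # assumption: restart the chart after an alarm
    return alarms


print(cusum_alarms([5, 5, 5, 12, 14], expected=5.0))
```

A larger reference_value makes the chart ignore small deviations, while a larger decision_boundary delays alarms but reduces false positives.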

References

1

G. Rossi, L. Lampugnani and M. Marchi (1999), An approximate CUSUM procedure for surveillance of health events, Statistics in Medicine, 18, 2111–2122

2

D. A. Pierce and D. W. Schafer (1986), Residuals in Generalized Linear Models, Journal of the American Statistical Association, 81, 977–986

epysurv.models.timepoint.ears module

class epysurv.models.timepoint.ears.EarsC1(alpha: float = 0.001, baseline: int = 7, min_sigma: float = 0)[source]

Bases: epysurv.models.timepoint.ears._EarsBase

Computes a threshold for the number of counts based on values from the recent past.

This is then compared to the observed number of counts. If the observation is above a specific quantile of the prediction interval, then an alarm is raised. This method is especially useful for data without many historic values, since it only needs counts from the recent past.

alpha

An approximate (two-sided) (1 − α) prediction interval is calculated.

baseline

How many time points to use for calculating the baseline.

min_sigma

If min_sigma is greater than 0 and the baseline is zero, the alerting threshold is zAlpha * min_sigma.
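The interplay of alpha, baseline and min_sigma can be sketched with a few lines of standard-library Python. This is an assumption-laden illustration of a C1-style threshold, not the epysurv code: the function name ears_c1_threshold is hypothetical, and zAlpha is taken here to be the (1 − α) quantile of the standard normal distribution.

```python
from statistics import NormalDist, mean, stdev
from typing import Sequence


def ears_c1_threshold(
    history: Sequence[float],
    alpha: float = 0.001,
    baseline: int = 7,
    min_sigma: float = 0.0,
) -> float:
    """Alerting threshold computed from the last `baseline` counts before today."""
    if len(history) < baseline or baseline < 2:
        raise ValueError("need at least `baseline` (>= 2) past observations")
    recent = history[-baseline:]
    # min_sigma acts as a floor on the standard deviation of the baseline.
    sigma = max(stdev(recent), min_sigma)
    z_alpha = NormalDist().inv_cdf(1 - alpha)  # assumed definition of zAlpha
    return mean(recent) + z_alpha * sigma


past_week = [10, 12, 11, 9, 10, 11, 10]
threshold = ears_c1_threshold(past_week)
print(threshold, 30 > threshold)
```

With an all-zero baseline the mean and standard deviation are both zero, so the threshold reduces to zAlpha * min_sigma, matching the description of min_sigma above.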

References

1

Fricker, R.D., Hegler, B.L. and Dunfee, D.A. (2008). Comparing syndromic surveillance detection methods: EARS versus a CUSUM-based methodology. Statistics in Medicine, 27, 3407-3429.

2

Salmon, M., Schumacher, D. and Höhle, M. (2016): Monitoring count time series in R: Aberration detection in public health surveillance. Journal of Statistical Software, 70 (10), 1-35. doi: 10.18637/jss.v070.i10

class epysurv.models.timepoint.ears.EarsC2(alpha: float = 0.001, baseline: int = 7, min_sigma: float = 0)[source]

Bases: epysurv.models.timepoint.ears._EarsBase

Computes a threshold for the number of counts based on values from the recent past.

This is then compared to the observed number of counts. If the observation is above a specific quantile of the prediction interval, then an alarm is raised. This method is especially useful for data without many historic values, since it only needs counts from the recent past.

alpha

An approximate (two-sided) (1 − α) prediction interval is calculated.

baseline

How many time points to use for calculating the baseline.

min_sigma

If min_sigma is greater than 0 and the baseline is zero, the alerting threshold is zAlpha * min_sigma.

References

1

Fricker, R.D., Hegler, B.L. and Dunfee, D.A. (2008). Comparing syndromic surveillance detection methods: EARS versus a CUSUM-based methodology. Statistics in Medicine, 27, 3407-3429.

2

Salmon, M., Schumacher, D. and Höhle, M. (2016): Monitoring count time series in R: Aberration detection in public health surveillance. Journal of Statistical Software, 70 (10), 1-35. doi: 10.18637/jss.v070.i10

class epysurv.models.timepoint.ears.EarsC3(alpha: float = 0.001, baseline: int = 7, min_sigma: float = 0)[source]

Bases: epysurv.models.timepoint.ears._EarsBase

The EarsC3 model.

Computes a threshold for the number of counts based on values from the recent past. This is then compared to the observed number of counts. If the observation is above a specific quantile of the prediction interval, then an alarm is raised. This method is especially useful for data without many historic values, since it only needs counts from the recent past.

alpha

An approximate (two-sided) (1 − α) prediction interval is calculated.

baseline

How many time points to use for calculating the baseline.

References

1

Fricker, R.D., Hegler, B.L. and Dunfee, D.A. (2008). Comparing syndromic surveillance detection methods: EARS versus a CUSUM-based methodology. Statistics in Medicine, 27, 3407-3429.

2

Salmon, M., Schumacher, D. and Höhle, M. (2016): Monitoring count time series in R: Aberration detection in public health surveillance. Journal of Statistical Software, 70 (10), 1-35. doi: 10.18637/jss.v070.i10

epysurv.models.timepoint.farrington module

class epysurv.models.timepoint.farrington.Farrington(years_back: int = 3, window_half_width: int = 3, reweight: bool = True, alpha: float = 0.01, trend: bool = True, past_period_cutoff: int = 4, min_cases_in_past_periods: int = 5, power_transform: str = '2/3')[source]

Bases: epysurv.models.timepoint._base.DisProgBasedAlgorithm

The Farrington algorithm.

For each time point uses a GLM to predict the number of counts according to the procedure by Farrington et al. (1996). This is then compared to the observed number of counts. If the observation is above a specific quantile of the prediction interval, then an alarm is raised.

years_back

How many years back in time to include when forming the base counts.

window_half_width

Number of weeks to include before and after the current week in each year.

reweight

Boolean specifying whether to perform reweighting step.

alpha

An approximate (two-sided) (1 − α) prediction interval is calculated.

trend

Boolean indicating whether a trend should be included and kept if the conditions in the Farrington et al. (1996) paper are met. If False, no trend is fit.

past_period_cutoff

Periods considered for suppression of low case numbers.

min_cases_in_past_periods

The minimal number of cases in past periods such that an outbreak is considered.

power_transform

Power transformation to apply to the data if the threshold is to be computed with the method described in Farrington et al. (1996). Use either
- “2/3” for skewness correction (default),
- “1/2” for variance stabilizing transformation, or
- “none” for no transformation.
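The low-count safeguard controlled by past_period_cutoff and min_cases_in_past_periods can be pictured as a simple check on recent counts. The sketch below is an assumption about how the rule is applied (in particular, whether the current period is counted), and the function name enough_recent_cases is hypothetical.

```python
from typing import Sequence


def enough_recent_cases(
    counts: Sequence[int],
    past_period_cutoff: int = 4,
    min_cases_in_past_periods: int = 5,
) -> bool:
    """Return True if an alarm may be raised given the most recent counts."""
    if past_period_cutoff < 1:
        raise ValueError("past_period_cutoff must be at least 1")
    # Assumption: the most recent `past_period_cutoff` periods include the current one.
    recent = counts[-past_period_cutoff:]
    return sum(recent) >= min_cases_in_past_periods


print(enough_recent_cases([0, 1, 1, 2]))  # 4 cases in 4 periods: alarm suppressed
print(enough_recent_cases([0, 1, 2, 2]))  # 5 cases in 4 periods: alarm allowed
```

The rule protects against alarms in series that only ever report a handful of cases, where any single report would otherwise exceed the threshold.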

References

1

Farrington, C.P., Andrews, N.J, Beale A.D. and Catchpole, M.A. (1996): A statistical algorithm for the early detection of outbreaks of infectious disease. J. R. Statist. Soc. A, 159, 547-563.

class epysurv.models.timepoint.farrington.FarringtonFlexible(years_back: int = 3, window_half_width: int = 3, reweight: bool = True, weights_threshold: float = 2.58, alpha: float = 0.01, trend: bool = True, trend_threshold: float = 0.05, past_period_cutoff: int = 4, min_cases_in_past_periods: int = 5, power_transform: str = '2/3', past_weeks_not_included: int = 26, threshold_method: str = 'delta')[source]

Bases: epysurv.models.timepoint._base.STSBasedAlgorithm

The extended Farrington algorithm.

For each time point uses a Poisson GLM with overdispersion to predict an upper bound on the number of counts according to the procedure by Farrington et al. (1996) and by Noufaily et al. (2012). This bound is then compared to the observed number of counts. If the observation is above the bound, then an alarm is raised.

years_back

How many years back in time to include when forming the base counts.

window_half_width

Number of weeks to include before and after the current week in each year.

reweight

Boolean specifying whether to perform reweighting step.

weights_threshold

Defines the threshold for reweighting past outbreaks using the Anscombe residuals (1 in the original method, 2.58 advised in the improved method).

alpha

An approximate (one-sided) (1 − α) · 100% prediction interval is calculated, unlike the original method, which used a two-sided interval. The upper limit of this interval, i.e. the (1 − α) · 100% quantile, serves as the upper bound.

trend

Boolean indicating whether a trend should be included and kept if the conditions in the Farrington et al. (1996) paper are met. If False, no trend is fit.

trend_threshold

Threshold for deciding whether to keep trend in the model (0.05 in the original method, 1 advised in the improved method).

past_period_cutoff

Periods considered for suppression of low case numbers.

min_cases_in_past_periods

The minimal number of cases in past periods such that an outbreak is considered.

power_transform

Power transformation to apply to the data if the threshold is to be computed with the method described in Farrington et al. (1996). Use either
- “2/3” for skewness correction (default),
- “1/2” for variance stabilizing transformation, or
- “none” for no transformation.

past_weeks_not_included

Number of past weeks to ignore in the calculation.

threshold_method

Method to be used to derive the upper bound. Options are
- “delta” for the method described in Farrington et al. (1996),
- “Noufaily” for the method described in Noufaily et al. (2012),
- “muan” for the method extended from Noufaily et al. (2012).
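The reweighting step governed by reweight and weights_threshold downweights past observations with large Anscombe residuals so that earlier outbreaks do not inflate the baseline. A minimal sketch of the weighting rule from Farrington et al. (1996), with the threshold as a parameter, is shown here; the function name reweight_residuals is hypothetical and the residuals are assumed to be given.

```python
from typing import List, Sequence


def reweight_residuals(
    residuals: Sequence[float], weights_threshold: float = 2.58
) -> List[float]:
    """Weights that shrink the influence of observations with large residuals."""
    # Observations above the threshold get a raw weight of residual ** -2.
    raw = [r ** -2 if r > weights_threshold else 1.0 for r in residuals]
    # Normalise so that the weights sum to the number of observations.
    gamma = len(raw) / sum(raw)
    return [gamma * w for w in raw]


print(reweight_residuals([0.5, 3.0]))
```

Raising the threshold from 1 to 2.58 means that only clearly extreme past values are downweighted, so fewer baseline observations are affected.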

References

1

Farrington, C.P., Andrews, N.J, Beale A.D. and Catchpole, M.A. (1996): A statistical algorithm for the early detection of outbreaks of infectious disease. J. R. Statist. Soc. A, 159, 547-563.

2

Noufaily, A., Enki, D.G., Farrington, C.P., Garthwaite, P., Andrews, N.J., Charlett, A. (2012): An improved algorithm for outbreak detection in multiple surveillance systems. Statistics in Medicine, 32 (7), 1206-1222.

3

Salmon, M., Schumacher, D. and Höhle, M. (2016): Monitoring count time series in R: Aberration detection in public health surveillance. Journal of Statistical Software, 70 (10), 1-35. doi: 10.18637/jss.v070.i10

epysurv.models.timepoint.glr module

Count data regression charts for the monitoring of surveillance time series.

Method as proposed by Höhle and Paul (2008). The implementation is described in Salmon et al. (2016).

class epysurv.models.timepoint.glr.GLRNegativeBinomial(alpha: float = 0, glr_test_threshold: int = 5, m: int = -1, change: str = 'intercept', direction: Union[Tuple[str, str], Tuple[str]] = ('inc', 'dec'), upperbound_statistic: str = 'cases', x_max: float = 10000.0)[source]

Bases: epysurv.models.timepoint._base.STSBasedAlgorithm

Generalized likelihood ratio algorithm using negative binomial distribution.

alpha

The (known) dispersion parameter of the negative binomial distribution, i.e. the parametrization of the negative binomial is such that the variance is mean + alpha · mean². Note: this parametrization is the inverse of the shape parametrization used in R, for example in dnbinom and glr.nb. Hence, if alpha=0 the negative binomial distribution reduces to the Poisson distribution, and a call of algo.glrnb is equivalent to a call to algo.glrpois. If alpha=NULL, the parameter is calculated as part of the in-control estimation. However, it is estimated only once, from the first fit; subsequent fits only update the parameters of the linear predictor, with alpha fixed.

glr_test_threshold

Threshold in the GLR test, i.e. c_γ.

m

Number of time instances back in time in the window-limited approach, i.e. the last value considered is max(1, n − m). To always look back until the first observation use -1.

change

A string specifying the type of the alternative. The two choices are “intercept” and “epi”.

direction

Direction of testing in the GLR scheme:
- (“inc”,): only increases in x are considered in the GLR statistic.
- (“dec”,): only decreases are considered.
- (“inc”, “dec”): both increases and decreases are considered.

upperbound_statistic

A string specifying the type of upperbound statistic that is returned:
- “cases” for the number of cases that would have been necessary to produce an alarm,
- “value” for the GLR statistic.

x_max

Maximum value to try for x when searching for the upper bound on the number of cases before an alarm is sounded (default: 1e4). This applies only when upperbound_statistic == "cases".
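The dispersion parametrization described for alpha can be made concrete with a short sketch. The helper names negbin_variance and r_size_from_alpha are hypothetical; the only facts used are the variance formula above and the statement that alpha is the inverse of the R shape (size) parameter.

```python
import math


def negbin_variance(mean: float, alpha: float) -> float:
    """Variance of the negative binomial: mean + alpha * mean ** 2."""
    return mean + alpha * mean ** 2


def r_size_from_alpha(alpha: float) -> float:
    """Shape (size) parameter as used by R's dnbinom, i.e. 1 / alpha."""
    # alpha = 0 is the Poisson limit, which corresponds to an infinite size.
    return math.inf if alpha == 0 else 1.0 / alpha


print(negbin_variance(10.0, 0.0))   # Poisson case: variance equals the mean
print(negbin_variance(10.0, 0.1))   # overdispersed case
print(r_size_from_alpha(0.5))
```

Because the variance grows with alpha, a value taken from an R model fit has to be inverted before being passed as alpha here.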

References

1

Höhle, M. and Paul, M. (2008): Count data regression charts for the monitoring of surveillance time series. Computational Statistics and Data Analysis, 52 (9), 4357-4368.

2

Salmon, M., Schumacher, D. and Höhle, M. (2016): Monitoring count time series in R: Aberration detection in public health surveillance. Journal of Statistical Software, 70 (10), 1-35. doi: 10.18637/jss.v070.i10

class epysurv.models.timepoint.glr.GLRPoisson(glr_test_threshold: int = 5, m: int = -1, change: str = 'intercept', direction: Union[Tuple[str, str], Tuple[str]] = ('inc', 'dec'), upperbound_statistic: str = 'cases')[source]

Bases: epysurv.models.timepoint._base.STSBasedAlgorithm

Generalized likelihood ratio algorithm using Poisson distribution.

glr_test_threshold

Threshold in the GLR test, i.e. c_γ.

m

Number of time instances back in time in the window-limited approach, i.e. the last value considered is max(1, n − m). To always look back until the first observation use -1.

change

A string specifying the type of the alternative. The two choices are “intercept” and “epi”.

direction

Direction of testing in the GLR scheme:
- (“inc”,): only increases in x are considered in the GLR statistic.
- (“dec”,): only decreases are considered.
- (“inc”, “dec”): both increases and decreases are considered.

upperbound_statistic

A string specifying the type of upperbound statistic that is returned: “cases” for the number of cases that would have been necessary to produce an alarm, or “value” for the GLR statistic.
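To show how glr_test_threshold and m enter the computation, here is a minimal sketch of a window-limited GLR statistic for a Poisson model with a multiplicative shift in the mean (an “intercept” change), restricted to increases. It is a simplified illustration, not the routine epysurv calls: the function name glr_poisson_statistic is hypothetical, the in-control means are assumed known and positive, and the translation of max(1, n − m) to 0-based indexing is an assumption.

```python
import math
from typing import Sequence


def glr_poisson_statistic(
    counts: Sequence[float], expected: Sequence[float], m: int = -1
) -> float:
    """GLR statistic at the last time point for an upward shift in the Poisson mean."""
    n = len(counts)
    # m = -1 looks back to the first observation; otherwise limit the window.
    first = 0 if m == -1 else max(0, n - 1 - m)
    best = 0.0
    for k in range(first, n):  # k is the candidate change point
        sum_x = sum(counts[k:])
        sum_mu = sum(expected[k:])
        if sum_x > sum_mu:  # "inc" only: ignore candidate decreases
            kappa = math.log(sum_x / sum_mu)  # MLE of the shift on the log scale
            stat = sum_x * kappa - (math.exp(kappa) - 1.0) * sum_mu
            best = max(best, stat)
    return best


stat = glr_poisson_statistic([5, 5, 20, 20], [5.0, 5.0, 5.0, 5.0])
print(stat, stat >= 5)  # compare with glr_test_threshold = 5
```

A smaller m shortens the look-back window, which lowers the cost per time point but can miss changes that started further in the past.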

References

1

Höhle, M. and Paul, M. (2008): Count data regression charts for the monitoring of surveillance time series. Computational Statistics and Data Analysis, 52 (9), 4357-4368.

2

Salmon, M., Schumacher, D. and Höhle, M. (2016): Monitoring count time series in R: Aberration detection in public health surveillance. Journal of Statistical Software, 70 (10), 1-35. doi: 10.18637/jss.v070.i10

change = 'intercept'

A string specifying the type of the alternative. Currently the two choices are “intercept” and “epi”. See the SFB Discussion Paper 500 for details.

direction = ('inc', 'dec')

Direction of testing in the GLR scheme. With “inc” only increases in x are considered in the GLR statistic; with “dec” only decreases are considered.

glr_test_threshold = 5

Threshold in the GLR test, i.e. c_γ.

m = -1

Number of time instances back in time in the window-limited approach, i.e. the last value considered is max(1, n − m). To always look back until the first observation, use m=-1.

upperbound_statistic = 'cases'

A string specifying the type of upperbound statistic that is returned: “cases” for the number of cases that would have been necessary to produce an alarm, or “value” for the GLR statistic.

epysurv.models.timepoint.hmm module

class epysurv.models.timepoint.hmm.HMM(n_observations: int = -1, n_hidden_states: int = 2, trend: bool = True, n_harmonics: int = 1, equal_covariate_effects: bool = False)[source]

Bases: epysurv.models.timepoint._base.DisProgBasedAlgorithm

Hidden Markov model for outbreak detection.

n_observations

Number of observations back in time to use for fitting the HMM (including the current observation). Reasonable values are a multiple of the number of observations per year. The default, -1, uses all available observations; for long series this can take a very long time.

n_hidden_states

Number of hidden states in the HMM; the typical choice is 2. The initial rates are set such that the last state (state number n_hidden_states) has the highest rate. In other words, this state is considered the outbreak state.

trend

Boolean indicating whether a linear time trend is included in the model.

n_harmonics

Number of harmonic waves to include in the linear predictor.

equal_covariate_effects

If True, all covariate effect parameters are equal across the hidden states.
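How trend and n_harmonics shape the linear predictor can be sketched by building one row of covariates for a time point. The layout below (intercept, optional linear trend, then a cosine and sine term per harmonic) and the names linear_predictor_row and period are assumptions for illustration; the actual column order used internally may differ.

```python
import math
from typing import List


def linear_predictor_row(
    t: int, n_harmonics: int = 1, trend: bool = True, period: int = 52
) -> List[float]:
    """Covariates for time t: intercept, optional trend, and seasonal harmonics."""
    row = [1.0]  # intercept
    if trend:
        row.append(float(t))  # linear time trend
    for i in range(1, n_harmonics + 1):
        angle = 2.0 * math.pi * i * t / period
        row.extend([math.cos(angle), math.sin(angle)])  # i-th harmonic wave
    return row


print(linear_predictor_row(0))
print(len(linear_predictor_row(5, n_harmonics=2, trend=False)))
```

Each additional harmonic adds two coefficients per hidden state unless equal_covariate_effects forces them to be shared.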

References

1

Y. Le Strat and F. Carrat (1999), Monitoring Epidemiologic Surveillance Data using Hidden Markov Models, Statistics in Medicine, 18, 3463–3478

2

I.L. MacDonald and W. Zucchini, Hidden Markov and Other Models for Discrete-valued Time Series, (1997), Chapman & Hall, Monographs on Statistics and applied Probability 70

epysurv.models.timepoint.outbreak_p module

class epysurv.models.timepoint.outbreak_p.OutbreakP(threshold: int = 100, upperbound_statistic: str = 'cases', max_upperbound_cases: int = 100000)[source]

Bases: epysurv.models.timepoint._base.STSBasedAlgorithm

The OutbreakP model.

threshold

The threshold value. Once the outbreak statistic is above this threshold an alarm is sounded.

upperbound_statistic

A string specifying the type of upperbound-statistic that is returned. With “cases” the number of cases that would have been necessary to produce an alarm (NNBA) or with “value” the outbreakP-statistic is computed.

max_upperbound_cases

Upper bound when numerically searching for NNBA. Default is 1e5.
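The outbreakP statistic that is compared with threshold can be sketched as a likelihood ratio between a monotonically non-decreasing mean and a constant mean, following the description in Frisén et al. (2009). The code below is a simplified illustration under that reading; the function names isotonic_increasing and outbreak_p_statistic are hypothetical, and the isotonic fit uses a plain pool-adjacent-violators pass.

```python
from typing import List, Sequence


def isotonic_increasing(values: Sequence[float]) -> List[float]:
    """Least-squares non-decreasing fit via pool-adjacent-violators."""
    blocks: List[List[float]] = []  # each block holds [sum, count]
    for v in values:
        blocks.append([float(v), 1.0])
        # Merge blocks while the previous block mean exceeds the current one.
        while len(blocks) > 1 and (
            blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]
        ):
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    fitted: List[float] = []
    for total, count in blocks:
        fitted.extend([total / count] * int(count))
    return fitted


def outbreak_p_statistic(counts: Sequence[int]) -> float:
    """Product over time of (monotone mean / constant mean) ** count."""
    overall_mean = sum(counts) / len(counts)
    if overall_mean == 0:
        return 1.0  # no cases at all: no evidence of an increase
    stat = 1.0
    for x, mu in zip(counts, isotonic_increasing(counts)):
        stat *= (mu / overall_mean) ** x
    return stat


print(outbreak_p_statistic([4, 2, 1]))  # decreasing series: statistic stays at 1
print(outbreak_p_statistic([1, 2, 4]))  # increasing series: statistic above 1
```

Because the statistic needs no baseline estimate, it reacts only to a sustained increase relative to the series' own mean.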

References

1

Frisén, M., Andersson, E. and Schiöler, L. (2009), Robust outbreak surveillance of epidemics in Sweden, Statistics in Medicine, 28(3):476-493.

2

Frisén, M. and Andersson, E., (2009) Semiparametric Surveillance of Monotonic Changes, Sequential Analysis 28(4):434-454.

epysurv.models.timepoint.rki module

class epysurv.models.timepoint.rki.RKI(years_back: int = 0, window_half_width: int = 6, include_recent_year: bool = True)[source]

Bases: epysurv.models.timepoint._base.STSBasedAlgorithm

The old algorithm from the Robert Koch Institute.

years_back

How many years back in time to include when forming the base counts.

window_half_width

Number of weeks to include before and after the current week in each year.

include_recent_year

Boolean deciding whether the year of the current time point also contributes window_half_width reference values.

Module contents

class epysurv.models.timepoint.Bayes(years_back: int = 0, window_half_width: int = 6, include_recent_year: bool = True, alpha: float = 0.05)[source]

Bases: epysurv.models.timepoint._base.STSBasedAlgorithm

Evaluation of timepoints with the Bayes subsystem.

years_back

How many years back in time to include when forming the base counts.

window_half_width

Number of weeks to include before and after the current week in each year.

include_recent_year

Boolean deciding whether the year of the current time point also contributes window_half_width reference values.

alpha

The upper threshold is calculated as the (1 − α)-quantile. By default, years_back, window_half_width and include_recent_year (b, w and actY in the R package surveillance) are set for the Bayes 1 system, with alpha=0.05.

References

1

Riebler, A. (2004), Empirischer Vergleich von statistischen Methoden zur Ausbruchserkennung bei Surveillance Daten, Bachelor’s thesis

2

Höhle, M., & Riebler, A. (2005). The R-Package “surveillance.” Sonderforschungsbereich (Vol. 386). Retrieved from https://epub.ub.uni-muenchen.de/1791/1/paper_422.pdf

class epysurv.models.timepoint.Boda(trend: bool = False, season: bool = False, prior: str = 'iid', alpha: float = 0.05, mc_munu: int = 100, mc_y: int = 10, quantile_method: str = 'MM')[source]

Bases: epysurv.models.timepoint._base.STSBasedAlgorithm

The Boda model.

trend

Boolean indicating whether a linear trend term should be included in the model for the expectation on the log-scale.

season

Boolean to indicate whether a cyclic spline should be included.

prior

Either of “iid”, “rw1” or “rw2”.

alpha

The threshold for declaring an observed count as an aberration is the (1 − α) · 100% quantile of the predictive posterior.

mc_munu

Number of samples of the parameters (mean and size) to draw from their posterior distribution.

mc_y

Number of samples of y to generate for each pair of the mean and size parameter. A total of mc_munu × mc_y samples are generated.

sampling_method

Whether to sample from the parameters' joint distribution (“joint”) or from their respective marginal posterior distributions (“marginals”).

quantile_method

Either “MC” or “MM”. Indicates how the quantile is computed from the posterior distribution, whatever the inference method:
- “MC”: sample mc_munu values from the posterior distribution of the parameters, sample mc_y response values for each sampled parameter vector, and compute an empirical quantile of the resulting response values (as explained in Manitz and Höhle 2013).
- “MM”: sample mc_munu values from the posterior distribution of the parameters and compute the quantile of the resulting mixture distribution by bisection, which is faster.

class epysurv.models.timepoint.CDC(years_back: int = 5, window_half_width: int = 1, alpha: float = 0.001)[source]

Bases: epysurv.models.timepoint._base.DisProgBasedAlgorithm

The CDC model.

years_back

How many years back in time to include when forming the base counts.

window_half_width

Number of weeks to include before and after the current week in each year.

alpha

An approximate (two-sided) (1 − α) prediction interval is calculated.

References

1

Stroup, D., G. Williamson, J. Herndon, and J. Karon (1989). Detection of aberrations in the occurrence of notifiable diseases surveillance data. Statistics in Medicine 8, 323-329.

2

Farrington, C. and N. Andrews (2003). Monitoring the Health of Populations, Chapter Outbreak Detection: Application to Infectious Disease Surveillance, pp. 203-231. Oxford University Press.

class epysurv.models.timepoint.Cusum(reference_value: float = 1.04, decision_boundary: float = 2.26, expected_numbers_method: str = 'mean', transform: str = 'standard', negbin_alpha: float = 0.1)[source]

Bases: epysurv.models.timepoint._base.STSBasedAlgorithm

The Cusum model.

reference_value

The reference value k of the CUSUM.

decision_boundary

The decision boundary h of the CUSUM.

expected_numbers_method

How to determine the expected number of cases. Possible values are “glm” and “mean”.

mean

Use the mean of all data points passed to fit.

glm

Fit a GLM to the data points passed to fit.

transform

One of the following transformations (warning: the Anscombe and NegBin transformations are experimental):
- “standard”: standardized variables z1 (based on asymptotic normality). This is the default.
- “rossi”: standardized variables z3 as proposed by Rossi.
- “anscombe”: Anscombe residuals (experimental).
- “anscombe2nd”: Anscombe residuals as in Pierce and Schafer (1986), based on a 2nd-order approximation of E(X) (experimental).
- “pearsonNegBin”: Pearson residuals for NegBin (experimental).
- “anscombeNegBin”: Anscombe residuals for NegBin (experimental).
- “none”: no transformation.

negbin_alpha

Parameter of the negative binomial distribution, such that the variance is \(m + \alpha \cdot m^2\).

References

1

G. Rossi, L. Lampugnani and M. Marchi (1999), An approximate CUSUM procedure for surveillance of health events, Statistics in Medicine, 18, 2111–2122

2

D. A. Pierce and D. W. Schafer (1986), Residuals in Generalized Linear Models, Journal of the American Statistical Association, 81, 977–986

class epysurv.models.timepoint.EarsC1(alpha: float = 0.001, baseline: int = 7, min_sigma: float = 0)[source]

Bases: epysurv.models.timepoint.ears._EarsBase

Computes a threshold for the number of counts based on values from the recent past.

This is then compared to the observed number of counts. If the observation is above a specific quantile of the prediction interval, then an alarm is raised. This method is especially useful for data without many historic values, since it only needs counts from the recent past.

alpha

An approximate (two-sided) (1 − α) prediction interval is calculated.

baseline

How many time points to use for calculating the baseline.

min_sigma

If min_sigma is greater than 0 and the baseline is zero, the alerting threshold is zAlpha * min_sigma.

References

1

Fricker, R.D., Hegler, B.L. and Dunfee, D.A. (2008). Comparing syndromic surveillance detection methods: EARS versus a CUSUM-based methodology. Statistics in Medicine, 27, 3407-3429.

2

Salmon, M., Schumacher, D. and Höhle, M. (2016): Monitoring count time series in R: Aberration detection in public health surveillance. Journal of Statistical Software, 70 (10), 1-35. doi: 10.18637/jss.v070.i10

class epysurv.models.timepoint.EarsC2(alpha: float = 0.001, baseline: int = 7, min_sigma: float = 0)[source]

Bases: epysurv.models.timepoint.ears._EarsBase

Computes a threshold for the number of counts based on values from the recent past.

This is then compared to the observed number of counts. If the observation is above a specific quantile of the prediction interval, then an alarm is raised. This method is especially useful for data without many historic values, since it only needs counts from the recent past.

alpha

An approximate (two-sided) (1 − α) prediction interval is calculated.

baseline

How many time points to use for calculating the baseline.

min_sigma

If min_sigma is greater than 0 and the baseline is zero, the alerting threshold is zAlpha * min_sigma.

References

1

Fricker, R.D., Hegler, B.L. and Dunfee, D.A. (2008). Comparing syndromic surveillance detection methods: EARS versus a CUSUM-based methodology. Statistics in Medicine, 27, 3407-3429.

2

Salmon, M., Schumacher, D. and Höhle, M. (2016): Monitoring count time series in R: Aberration detection in public health surveillance. Journal of Statistical Software, 70 (10), 1-35. doi: 10.18637/jss.v070.i10

class epysurv.models.timepoint.EarsC3(alpha: float = 0.001, baseline: int = 7, min_sigma: float = 0)[source]

Bases: epysurv.models.timepoint.ears._EarsBase

The EarsC3 model.

Computes a threshold for the number of counts based on values from the recent past. This is then compared to the observed number of counts. If the observation is above a specific quantile of the prediction interval, then an alarm is raised. This method is especially useful for data without many historic values, since it only needs counts from the recent past.

alpha

An approximate (two-sided) (1 − α) prediction interval is calculated.

baseline

How many time points to use for calculating the baseline.

References

1

Fricker, R.D., Hegler, B.L. and Dunfee, D.A. (2008). Comparing syndromic surveillance detection methods: EARS versus a CUSUM-based methodology. Statistics in Medicine, 27, 3407-3429.

2

Salmon, M., Schumacher, D. and Höhle, M. (2016): Monitoring count time series in R: Aberration detection in public health surveillance. Journal of Statistical Software, 70 (10), 1-35. doi: 10.18637/jss.v070.i10

class epysurv.models.timepoint.FarringtonFlexible(years_back: int = 3, window_half_width: int = 3, reweight: bool = True, weights_threshold: float = 2.58, alpha: float = 0.01, trend: bool = True, trend_threshold: float = 0.05, past_period_cutoff: int = 4, min_cases_in_past_periods: int = 5, power_transform: str = '2/3', past_weeks_not_included: int = 26, threshold_method: str = 'delta')[source]

Bases: epysurv.models.timepoint._base.STSBasedAlgorithm

The extended Farrington algorithm.

For each time point uses a Poisson GLM with overdispersion to predict an upper bound on the number of counts according to the procedure by Farrington et al. (1996) and by Noufaily et al. (2012). This bound is then compared to the observed number of counts. If the observation is above the bound, then an alarm is raised.

years_back

How many years back in time to include when forming the base counts.

window_half_width

Number of weeks to include before and after the current week in each year.

reweight

Boolean specifying whether to perform reweighting step.

weights_threshold

Defines the threshold for reweighting past outbreaks using the Anscombe residuals (1 in the original method, 2.58 advised in the improved method).

alpha

An approximate (one-sided) (1 − α) · 100% prediction interval is calculated, unlike the original method, which used a two-sided interval. The upper limit of this interval, i.e. the (1 − α) · 100% quantile, serves as the upper bound.

trend

Boolean indicating whether a trend should be included and kept if the conditions in the Farrington et al. (1996) paper are met. If False, no trend is fit.

trend_threshold

Threshold for deciding whether to keep trend in the model (0.05 in the original method, 1 advised in the improved method).

past_period_cutoff

Periods considered for suppression of low case numbers.

min_cases_in_past_periods

The minimal number of cases in past periods such that an outbreak is considered.

power_transform

Power transformation to apply to the data if the threshold is to be computed with the method described in Farrington et al. (1996). Use either

- “2/3” for skewness correction (default),
- “1/2” for variance stabilizing transformation, or
- “none” for no transformation.

past_weeks_not_included

Number of past weeks to ignore in the calculation.

threshold_method

Method to be used to derive the upper bound. Options are

- “delta” for the method described in Farrington et al. (1996),
- “Noufaily” for the method described in Noufaily et al. (2012),
- “muan” for the method extended from Noufaily et al. (2012).
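To make the interplay of years_back and window_half_width concrete, the following minimal sketch lists the (year, week) pairs that would serve as baseline for one time point. It assumes 52-week years and ignores past_weeks_not_included and any reweighting; the helper reference_weeks is hypothetical and not part of epysurv.

```python
from typing import List, Tuple


def reference_weeks(
    year: int,
    week: int,
    years_back: int = 3,
    window_half_width: int = 3,
    weeks_per_year: int = 52,  # assumption: every year has 52 weeks
) -> List[Tuple[int, int]]:
    """Return the (year, week) pairs used as baseline for one time point."""
    pairs = []
    for lag in range(1, years_back + 1):
        for offset in range(-window_half_width, window_half_width + 1):
            # Use a zero-based running week index so that windows crossing
            # a year boundary wrap into the neighbouring year.
            index = (year - lag) * weeks_per_year + (week - 1) + offset
            pairs.append((index // weeks_per_year, index % weeks_per_year + 1))
    return pairs


weeks = reference_weeks(2020, 2, years_back=2, window_half_width=3)
print(len(weeks))   # years_back * (2 * window_half_width + 1) pairs
print(weeks[:3])    # the window for week 2 reaches back into the prior year
```

Under these assumptions the baseline always contains years_back · (2 · window_half_width + 1) values.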

References

1

Farrington, C.P., Andrews, N.J, Beale A.D. and Catchpole, M.A. (1996): A statistical algorithm for the early detection of outbreaks of infectious disease. J. R. Statist. Soc. A, 159, 547-563.

2

Noufaily, A., Enki, D.G., Farrington, C.P., Garthwaite, P., Andrews, N.J., Charlett, A. (2012): An improved algorithm for outbreak detection in multiple surveillance systems. Statistics in Medicine, 32 (7), 1206-1222.

3

Salmon, M., Schumacher, D. and Höhle, M. (2016): Monitoring count time series in R: Aberration detection in public health surveillance. Journal of Statistical Software, 70 (10), 1-35. doi: 10.18637/jss.v070.i10

class epysurv.models.timepoint.Farrington(years_back: int = 3, window_half_width: int = 3, reweight: bool = True, alpha: float = 0.01, trend: bool = True, past_period_cutoff: int = 4, min_cases_in_past_periods: int = 5, power_transform: str = '2/3')[source]

Bases: epysurv.models.timepoint._base.DisProgBasedAlgorithm

The Farrington algorithm.

For each time point uses a GLM to predict the number of counts according to the procedure by Farrington et al. (1996). This is then compared to the observed number of counts. If the observation is above a specific quantile of the prediction interval, then an alarm is raised.

years_back

How many years back in time to include when forming the base counts.

window_half_width

Number of weeks to include before and after the current week in each year.

reweight

Boolean specifying whether to perform reweighting step.

alpha

An approximate two-sided (1 − α) · 100% prediction interval is calculated.

trend

Boolean indicating whether a trend should be included and kept if the conditions in the Farrington et al. paper are met. If false, no trend is fit.

past_period_cutoff

Periods considered for suppression of low case numbers.

min_cases_in_past_periods

The minimal number of cases in past periods such that an outbreak is considered.

power_transform

Power transformation to apply to the data if the threshold is to be computed with the method described in Farrington et al. (1996). Use either

- “2/3” for skewness correction (default),
- “1/2” for variance stabilizing transformation, or
- “none” for no transformation.
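The effect of power_transform and alpha on the threshold can be illustrated with a delta-method sketch. This is not epysurv's implementation: the function upper_bound and its inputs mu (predicted mean), phi (overdispersion) and var_mu (variance of the predicted mean) are hypothetical names, and the total predictive variance is assumed to be phi · mu + var_mu.

```python
from statistics import NormalDist


def upper_bound(
    mu: float,
    phi: float,
    var_mu: float,
    alpha: float = 0.01,
    power_transform: str = "2/3",
) -> float:
    """Upper limit of a two-sided (1 - alpha) prediction interval."""
    exponents = {"2/3": 2 / 3, "1/2": 1 / 2, "none": 1.0}
    p = exponents[power_transform]
    # Assumed predictive variance on the original count scale.
    variance = phi * mu + var_mu
    # Delta method: standard error of the count raised to the power p.
    se = p * mu ** (p - 1) * variance ** 0.5
    # Two-sided interval, so the upper limit uses the 1 - alpha/2 quantile.
    z = NormalDist().inv_cdf(1 - alpha / 2)
    # Build the limit on the transformed scale, then transform back.
    return (mu ** p + z * se) ** (1 / p)


for method in ("2/3", "1/2", "none"):
    print(method, round(upper_bound(10.0, 1.0, 0.0, power_transform=method), 2))
```

An alarm would correspond to an observed count above the returned value. With “none” the bound reduces to the familiar mean plus z times the standard deviation.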

References

1

Farrington, C.P., Andrews, N.J, Beale A.D. and Catchpole, M.A. (1996): A statistical algorithm for the early detection of outbreaks of infectious disease. J. R. Statist. Soc. A, 159, 547-563.

class epysurv.models.timepoint.GLRNegativeBinomial(alpha: float = 0, glr_test_threshold: int = 5, m: int = -1, change: str = 'intercept', direction: Union[Tuple[str, str], Tuple[str]] = ('inc', 'dec'), upperbound_statistic: str = 'cases', x_max: float = 10000.0)[source]

Bases: epysurv.models.timepoint._base.STSBasedAlgorithm

Generalized likelihood ratio algorithm using negative binomial distribution.

alpha

The (known) dispersion parameter of the negative binomial distribution, i.e. the parametrization of the negative binomial is such that the variance is mean + alpha · mean². Note: This parametrization is the inverse of the shape parametrization used in R, for example in dnbinom and glr.nb. Hence, if alpha=0 then the negative binomial distribution reduces to the Poisson distribution and a call of algo.glrnb is equivalent to a call to algo.glrpois. If alpha=NULL the parameter is calculated as part of the in-control estimation. However, the parameter is estimated only once from the first fit. Subsequent fittings are only for the parameters of the linear predictor with alpha fixed.

glr_test_threshold

Threshold in the GLR test, i.e. c_γ.

m

Number of time instances back in time in the window-limited approach, i.e. the last value considered is max(1, n − m). To always look back until the first observation use -1.

change

A string specifying the type of the alternative. The two choices are “intercept” and “epi”.

direction

Specifies the direction of testing in the GLR scheme:

- (“inc”,) only increases in x are considered in the GLR-statistic,
- (“dec”,) only decreases are regarded,
- (“inc”, “dec”) both increases and decreases are regarded.

upperbound_statistic

A string specifying the type of upperbound-statistic that is returned:

- “cases” for the number of cases that would have been necessary to produce an alarm,
- “value” for the GLR-statistic.

x_max

Maximum value to try for x to see if this is the upperbound number of cases before sounding an alarm (Default: 1e4). This applies only when upperbound_statistic == "cases".
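The dispersion parametrization described for alpha can be checked numerically. The sketch below uses only the standard library; nb_pmf and moments are hypothetical helpers, and the conversion size = 1 / alpha reflects the statement above that alpha is the inverse of R's shape parameter.

```python
import math
from typing import Tuple


def nb_pmf(x: int, mean: float, alpha: float) -> float:
    """Negative binomial pmf with variance mean + alpha * mean**2."""
    if alpha == 0:
        # Limiting case: Poisson distribution with the same mean.
        return math.exp(x * math.log(mean) - mean - math.lgamma(x + 1))
    size = 1.0 / alpha  # R's "size" (shape) parameter
    log_p = (
        math.lgamma(x + size) - math.lgamma(size) - math.lgamma(x + 1)
        + size * math.log(size / (size + mean))
        + x * math.log(mean / (size + mean))
    )
    return math.exp(log_p)


def moments(mean: float, alpha: float, x_max: int = 2000) -> Tuple[float, float]:
    """Mean and variance obtained by summing the pmf over 0..x_max-1."""
    probs = [nb_pmf(x, mean, alpha) for x in range(x_max)]
    m = sum(x * p for x, p in enumerate(probs))
    v = sum((x - m) ** 2 * p for x, p in enumerate(probs))
    return m, v


print(moments(5.0, 0.5))  # variance close to 5 + 0.5 * 5**2
print(moments(5.0, 0.0))  # Poisson case: variance close to the mean
```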

References

1

Höhle, M. and Paul, M. (2008): Count data regression charts for the monitoring of surveillance time series. Computational Statistics and Data Analysis, 52 (9), 4357-4368.

2

Salmon, M., Schumacher, D. and Höhle, M. (2016): Monitoring count time series in R: Aberration detection in public health surveillance. Journal of Statistical Software, 70 (10), 1-35. doi: 10.18637/jss.v070.i10

class epysurv.models.timepoint.GLRPoisson(glr_test_threshold: int = 5, m: int = -1, change: str = 'intercept', direction: Union[Tuple[str, str], Tuple[str]] = ('inc', 'dec'), upperbound_statistic: str = 'cases')[source]

Bases: epysurv.models.timepoint._base.STSBasedAlgorithm

Generalized likelihood ratio algorithm using Poisson distribution.

glr_test_threshold

Threshold in the GLR test, i.e. c_γ.

m

Number of time instances back in time in the window-limited approach, i.e. the last value considered is max(1, n − m). To always look back until the first observation use -1.

change

A string specifying the type of the alternative. The two choices are “intercept” and “epi”.

direction

Specifies the direction of testing in the GLR scheme:

- (“inc”,) only increases in x are considered in the GLR-statistic,
- (“dec”,) only decreases are regarded,
- (“inc”, “dec”) both increases and decreases are regarded.

upperbound_statistic

A string specifying the type of upperbound-statistic that is returned:

- “cases” for the number of cases that would have been necessary to produce an alarm,
- “value” for the GLR-statistic.
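How glr_test_threshold and m interact can be sketched for the simplest setting: an “intercept” change, the “inc” direction only, and a known in-control mean mu0 for every time point. The function glr_poisson below is a hypothetical illustration under those assumptions and not the code epysurv runs.

```python
import math
from typing import Sequence


def glr_poisson(x: Sequence[int], mu0: Sequence[float], m: int = -1) -> float:
    """GLR statistic at the last time point for an upward intercept shift."""
    n = len(x)
    # 1-based index of the earliest candidate change point.
    first = 1 if m == -1 else max(1, n - m)
    best = 0.0
    sum_x = 0.0
    sum_mu = 0.0
    # Walk backwards from n so the partial sums can be accumulated.
    for k in range(n, first - 1, -1):
        sum_x += x[k - 1]
        sum_mu += mu0[k - 1]
        if sum_x > sum_mu:
            # Maximum likelihood estimate of the shift on the log scale.
            kappa = math.log(sum_x / sum_mu)
            # Log-likelihood ratio: sum_x * kappa - sum_mu * (exp(kappa) - 1),
            # where exp(kappa) = sum_x / sum_mu.
            best = max(best, sum_x * kappa - (sum_x - sum_mu))
    return best


counts = [5, 4, 6, 5, 12, 14]
expected = [5.0] * len(counts)
statistic = glr_poisson(counts, expected)
print(statistic, statistic >= 5)  # compare against glr_test_threshold
```

Restricting the window, for example with m=0, considers fewer candidate change points and can therefore only lower the statistic.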

References

1

Höhle, M. and Paul, M. (2008): Count data regression charts for the monitoring of surveillance time series. Computational Statistics and Data Analysis, 52 (9), 4357-4368.

2

Salmon, M., Schumacher, D. and Höhle, M. (2016): Monitoring count time series in R: Aberration detection in public health surveillance. Journal of Statistical Software, 70 (10), 1-35. doi: 10.18637/jss.v070.i10

change = 'intercept'

A string specifying the type of the alternative. Currently the two choices are “intercept” and “epi”. See the SFB Discussion Paper 500 for details.

direction = ('inc', 'dec')

Specifies the direction of testing in the GLR scheme. With “inc” only increases in x are considered in the GLR-statistic, with “dec” only decreases are regarded.

glr_test_threshold = 5

Threshold in the GLR test, i.e. c_γ.

m = -1

Number of time instances back in time in the window-limited approach, i.e. the last value considered is max(1, n − m). To always look back until the first observation use m=-1.

upperbound_statistic = 'cases'

A string specifying the type of upperbound-statistic that is returned. With “cases” the number of cases that would have been necessary to produce an alarm is returned; with “value” the GLR-statistic is returned.

class epysurv.models.timepoint.HMM(n_observations: int = -1, n_hidden_states: int = 2, trend: bool = True, n_harmonics: int = 1, equal_covariate_effects: bool = False)[source]

Bases: epysurv.models.timepoint._base.DisProgBasedAlgorithm

Hidden Markov model for outbreak detection.

n_observations

Number of observations back in time to use for fitting the HMM (including the current observation). Reasonable values are a multiple of the number of observations per year. The default is -1, which means to use all available values; for long series this might take a very long time!

n_hidden_states

Number of hidden states in the HMM; the typical choice is 2. The initial rates are set such that the last (n_hidden_states-th) state is the one having the highest rate. In other words, this state is considered the outbreak state.

trend

Boolean stating whether a linear time trend is included in the linear predictor.

n_harmonics

Number of harmonic waves to include in the linear predictor.

equal_covariate_effects

If True, all covariate effect parameters are equal across the states.
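The roles of trend and n_harmonics can be pictured as columns of a design matrix for the log-mean in each hidden state. The sketch below builds one such row; covariate_row and the period of 52 observations per year are assumptions made for illustration, not epysurv API.

```python
import math
from typing import List


def covariate_row(
    t: int,
    trend: bool = True,
    n_harmonics: int = 1,
    period: int = 52,  # assumption: weekly data, 52 observations per year
) -> List[float]:
    """Covariates for time index t: intercept, trend, harmonic terms."""
    row = [1.0]  # intercept
    if trend:
        row.append(float(t))  # linear time trend
    for k in range(1, n_harmonics + 1):
        # Each harmonic contributes one cosine and one sine column.
        angle = 2 * math.pi * k * t / period
        row.extend([math.cos(angle), math.sin(angle)])
    return row


print(covariate_row(0))
print(len(covariate_row(10, trend=False, n_harmonics=2)))
```

Each additional harmonic adds two coefficients per state, or two shared coefficients when equal_covariate_effects is set.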

References

1

Y. Le Strat and F. Carrat, Monitoring Epidemiologic Surveillance Data using Hidden Markov Models (1999), Statistics in Medicine, 18, 3463–3478

2

I.L. MacDonald and W. Zucchini, Hidden Markov and Other Models for Discrete-valued Time Series, (1997), Chapman & Hall, Monographs on Statistics and applied Probability 70

class epysurv.models.timepoint.OutbreakP(threshold: int = 100, upperbound_statistic: str = 'cases', max_upperbound_cases: int = 100000)[source]

Bases: epysurv.models.timepoint._base.STSBasedAlgorithm

The OutbreakP model.

threshold

The threshold value. Once the outbreak statistic is above this threshold an alarm is sounded.

upperbound_statistic

A string specifying the type of upperbound-statistic that is returned. With “cases” the number of cases that would have been necessary to produce an alarm (NNBA) or with “value” the outbreakP-statistic is computed.

max_upperbound_cases

Upperbound when numerically searching for NNBA. Default is 1e5.
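The numerical search for the NNBA bounded by max_upperbound_cases can be pictured as follows. This is only a sketch: it assumes the statistic does not decrease as the current count grows, uses bisection (which may differ from the search the underlying implementation performs), and the callable statistic is a toy stand-in rather than the real outbreakP-statistic.

```python
from typing import Callable, Optional


def cases_needed_for_alarm(
    statistic: Callable[[int], float],
    threshold: float = 100,
    max_upperbound_cases: int = 100_000,
) -> Optional[int]:
    """Smallest count whose statistic exceeds the threshold, if any."""
    if statistic(max_upperbound_cases) <= threshold:
        return None  # no alarm possible within the search range
    low, high = 0, max_upperbound_cases
    while low < high:
        mid = (low + high) // 2
        if statistic(mid) > threshold:
            high = mid  # mid already raises an alarm, look for fewer cases
        else:
            low = mid + 1  # mid is not enough, more cases are needed
    return low


def toy_statistic(cases: int) -> float:
    """Hypothetical monotone statistic used only for this illustration."""
    return (cases / 10) ** 2


print(cases_needed_for_alarm(toy_statistic))
```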

References

1

Frisén, M., Andersson, E. and Schiöler, L., (2009), Robust outbreak surveillance of epidemics in Sweden, Statistics in Medicine, 28(3):476-493.

2

Frisén, M. and Andersson, E., (2009) Semiparametric Surveillance of Monotonic Changes, Sequential Analysis 28(4):434-454.

class epysurv.models.timepoint.RKI(years_back: int = 0, window_half_width: int = 6, include_recent_year: bool = True)[source]

Bases: epysurv.models.timepoint._base.STSBasedAlgorithm

The old algorithm from the Robert Koch Institute.

years_back

How many years back in time to include when forming the base counts.

window_half_width

Number of weeks to include before and after the current week in each year.

include_recent_year

Boolean deciding whether the year of the current time point also contributes window_half_width reference values.