epysurv.models.timepoint package
Submodules
epysurv.models.timepoint.bayes module
-
class epysurv.models.timepoint.bayes.Bayes(years_back: int = 0, window_half_width: int = 6, include_recent_year: bool = True, alpha: float = 0.05)
Bases: epysurv.models.timepoint._base.STSBasedAlgorithm
Evaluation of time points with the Bayes subsystem.
-
years_back
How many years back in time to include when forming the base counts.
-
window_half_width
Number of weeks to include before and after the current week in each year.
-
include_recent_year
Whether the year of the current time point also contributes window_half_width reference values.
-
alpha
The (1 − α)-quantile is used to calculate the upper threshold. The defaults of years_back, window_half_width and include_recent_year (b, w and actY in the R package surveillance) correspond to the Bayes 1 system with alpha=0.05.
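The reference window described by years_back, window_half_width and include_recent_year can be illustrated with a small sketch. It assumes weekly data with a fixed 52 weeks per year, and it assumes that the current year contributes the window_half_width weeks immediately before the current time point. The function reference_indices is hypothetical and not part of epysurv.

```python
from typing import List


def reference_indices(
    t: int,
    years_back: int,
    window_half_width: int,
    include_recent_year: bool,
    period: int = 52,
) -> List[int]:
    """Hypothetical helper: indices of the reference counts for time point t.

    Assumes a fixed number of `period` observations per year (52 for weekly data).
    """
    indices: List[int] = []
    if include_recent_year:
        # Assumption: the current year contributes the `window_half_width`
        # time points immediately before t.
        indices.extend(range(t - window_half_width, t))
    for year in range(1, years_back + 1):
        centre = t - year * period  # same calendar week, `year` years earlier
        indices.extend(range(centre - window_half_width, centre + window_half_width + 1))
    if indices and min(indices) < 0:
        raise ValueError("time series is too short for this reference window")
    return indices


# Bayes defaults (years_back=0, window_half_width=6, include_recent_year=True):
print(reference_indices(100, 0, 6, True))  # [94, 95, 96, 97, 98, 99]
```

With years_back ≥ 1, each past year adds 2 · window_half_width + 1 values centred on the same calendar week.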
References
- 1
Riebler, A. (2004). Empirischer Vergleich von statistischen Methoden zur Ausbruchserkennung bei Surveillance Daten. Bachelor’s thesis.
- 2
Höhle, M. and Riebler, A. (2005). The R-Package “surveillance”. Sonderforschungsbereich 386. Retrieved from https://epub.ub.uni-muenchen.de/1791/1/paper_422.pdf
-
epysurv.models.timepoint.boda module
-
class epysurv.models.timepoint.boda.Boda(trend: bool = False, season: bool = False, prior: str = 'iid', alpha: float = 0.05, mc_munu: int = 100, mc_y: int = 10, quantile_method: str = 'MM')
Bases: epysurv.models.timepoint._base.STSBasedAlgorithm
The Bayesian Outbreak Detection Algorithm (BODA).
-
trend
Boolean indicating whether a linear trend term should be included in the model for the expectation on the log scale.
-
season
Boolean indicating whether a cyclic spline should be included.
-
prior
One of “iid”, “rw1” or “rw2”.
-
alpha
The threshold for declaring an observed count an aberration is the (1 − α) · 100% quantile of the posterior predictive distribution.
-
mc_munu
Number of samples drawn from the posterior distribution of the mean and size parameters.
-
mc_y
Number of samples of y to generate for each pair of mean and size parameters. A total of mc_munu × mc_y samples is generated.
-
sampling_method
Whether to sample from the parameters' joint posterior distribution (“joint”) or from their respective marginal posterior distributions (“marginals”).
-
quantile_method
One of “MC” or “MM”. Indicates how the quantile is computed from the posterior distribution, regardless of the inference method:
- “MC”: sample mc_munu values from the posterior distribution of the parameters and, for each sampled parameter vector, sample mc_y response values; the quantile is the empirical quantile of the resulting response values (as explained in Manitz and Höhle 2013).
- “MM”: sample mc_munu values from the posterior distribution of the parameters and compute the quantile of the resulting mixture distribution by bisection, which is faster.
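The “MM” option can be sketched as follows: given sampled (mean, size) pairs of a negative binomial distribution, the (1 − α)-quantile of their equally weighted mixture is found by bisection over the integer counts. This is a minimal illustration under that assumption, not epysurv's implementation; nb_cdf and mixture_quantile are hypothetical names, and the negative binomial uses the mean/size parametrization in which the variance is mean + mean²/size.

```python
import math
from typing import Sequence, Tuple


def nb_cdf(q: int, mean: float, size: float) -> float:
    """P(Y <= q) for a negative binomial with the given mean (> 0) and size."""
    p = size / (size + mean)  # success probability of the (size, p) parametrization
    total = 0.0
    for k in range(q + 1):
        log_pmf = (
            math.lgamma(k + size) - math.lgamma(size) - math.lgamma(k + 1)
            + size * math.log(p) + k * math.log1p(-p)
        )
        total += math.exp(log_pmf)
    return min(total, 1.0)


def mixture_quantile(
    params: Sequence[Tuple[float, float]], prob: float, upper: int = 10_000
) -> int:
    """Smallest integer q whose mixture CDF is >= prob, found by bisection."""

    def mixture_cdf(q: int) -> float:
        # Equally weighted mixture over the sampled (mean, size) pairs.
        return sum(nb_cdf(q, mean, size) for mean, size in params) / len(params)

    if mixture_cdf(upper) < prob:
        raise ValueError("quantile lies above the search bound")
    lo, hi = 0, upper
    while lo < hi:  # invariant: the answer lies in [lo, hi]
        mid = (lo + hi) // 2
        if mixture_cdf(mid) >= prob:
            hi = mid
        else:
            lo = mid + 1
    return lo


# Hypothetical posterior draws of (mean, size); alpha = 0.05 gives prob = 0.95.
draws = [(4.0, 2.0), (5.5, 3.0), (6.0, 1.5)]
threshold = mixture_quantile(draws, 0.95)
```

The “MC” option would instead draw mc_y responses per parameter pair and take an empirical quantile; working with the mixture CDF directly avoids that second sampling layer.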
-
epysurv.models.timepoint.cdc module
-
class epysurv.models.timepoint.cdc.CDC(years_back: int = 5, window_half_width: int = 1, alpha: float = 0.001)
Bases: epysurv.models.timepoint._base.DisProgBasedAlgorithm
The CDC model.
-
years_back
How many years back in time to include when forming the base counts.
-
window_half_width
Number of weeks to include before and after the current week in each year.
-
alpha
An approximate two-sided (1 − α) prediction interval is calculated.
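As a rough illustration of how alpha enters a historical-limits rule such as the one of Stroup et al. (1989), the sketch below computes an upper limit of the form mean + z · sd from a set of reference counts, with z the standard normal (1 − α/2)-quantile of the two-sided interval. The exact aggregation of counts used by epysurv's CDC class is not described here and is not reproduced; upper_historical_limit and the reference counts are hypothetical.

```python
from statistics import NormalDist, mean, stdev
from typing import Sequence


def upper_historical_limit(reference: Sequence[float], alpha: float = 0.001) -> float:
    """Upper end of an approximate two-sided (1 - alpha) interval: mean + z * sd."""
    z = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided normal quantile
    return mean(reference) + z * stdev(reference)  # stdev is the sample standard deviation


# years_back=5 and window_half_width=1 give 5 * (2 * 1 + 1) = 15 reference values.
reference = [12, 9, 11, 10, 14, 8, 12, 10, 9, 13, 11, 10, 12, 9, 11]
limit = upper_historical_limit(reference)
current = 25
print(current > limit)  # True
```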
References
- 1
Stroup, D., Williamson, G., Herndon, J. and Karon, J. (1989). Detection of aberrations in the occurrence of notifiable diseases surveillance data. Statistics in Medicine, 8, 323-329.
- 2
Farrington, C. and Andrews, N. (2003). Outbreak detection: application to infectious disease surveillance. In Monitoring the Health of Populations, pp. 203-231. Oxford University Press.
-
epysurv.models.timepoint.cusum module
-
class epysurv.models.timepoint.cusum.Cusum(reference_value: float = 1.04, decision_boundary: float = 2.26, expected_numbers_method: str = 'mean', transform: str = 'standard', negbin_alpha: float = 0.1)
Bases: epysurv.models.timepoint._base.STSBasedAlgorithm
The CUSUM model.
-
reference_value
Reference value k of the CUSUM statistic.
-
decision_boundary
Decision boundary h: an alarm is raised when the CUSUM statistic exceeds it.
-
expected_numbers_method
How to determine the expected number of cases. Possible values:
- “mean”: use the mean of all data points passed to fit.
- “glm”: fit a GLM to the data points passed to fit.
-
transform
One of the following transformations (warning: the Anscombe and NegBin transformations are experimental):
- “standard”: standardized variables z1 (based on asymptotic normality). This is the default.
- “rossi”: standardized variables z3 as proposed by Rossi et al. (1999).
- “anscombe”: Anscombe residuals (experimental).
- “anscombe2nd”: Anscombe residuals as in Pierce and Schafer (1986), based on a 2nd-order approximation of E(X) (experimental).
- “pearsonNegBin”: Pearson residuals for the negative binomial distribution (experimental).
- “anscombeNegBin”: Anscombe residuals for the negative binomial distribution (experimental).
- “none”: no transformation.
-
negbin_alpha
Parameter of the negative binomial distribution, such that the variance is \(m + \alpha \cdot m^2\).
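The roles of reference_value (k) and decision_boundary (h) can be illustrated with a one-sided Poisson CUSUM using the “standard” transformation, here assumed to be z = (x − m)/√m for an observed count x with expected count m (for example the training mean, as with expected_numbers_method="mean"). This is a sketch of the textbook recursion, not epysurv's code; poisson_cusum is a hypothetical name, and this sketch does not reset the statistic after an alarm.

```python
import math
from typing import List, Sequence, Tuple


def poisson_cusum(
    observed: Sequence[int],
    expected: Sequence[float],
    reference_value: float = 1.04,
    decision_boundary: float = 2.26,
) -> Tuple[List[float], List[bool]]:
    """One-sided CUSUM on standardized counts; returns statistics and alarm flags."""
    statistic = 0.0
    statistics: List[float] = []
    alarms: List[bool] = []
    for x, m in zip(observed, expected):
        z = (x - m) / math.sqrt(m)  # assumed "standard" transformation
        statistic = max(0.0, statistic + z - reference_value)
        statistics.append(statistic)
        alarms.append(statistic > decision_boundary)
    return statistics, alarms


history = [4, 6, 5, 5, 5]
expected_count = sum(history) / len(history)  # mean of the training data
stats, alarms = poisson_cusum([5, 6, 15], [expected_count] * 3)
print(alarms)  # [False, False, True]
```

Small deviations are absorbed by k, so the statistic stays at zero until a count clearly above expectation pushes it past h.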
References
- 1
Rossi, G., Lampugnani, L. and Marchi, M. (1999). An approximate CUSUM procedure for surveillance of health events. Statistics in Medicine, 18, 2111–2122.
- 2
Pierce, D.A. and Schafer, D.W. (1986). Residuals in generalized linear models. Journal of the American Statistical Association, 81, 977–986.
-
epysurv.models.timepoint.ears module
-
class epysurv.models.timepoint.ears.EarsC1(alpha: float = 0.001, baseline: int = 7, min_sigma: float = 0)
Bases: epysurv.models.timepoint.ears._EarsBase
Computes a threshold for the number of counts based on values from the recent past.
This is then compared to the observed number of counts. If the observation is above a specific quantile of the prediction interval, an alarm is raised. This method is especially useful for data without a long history, since it only needs counts from the recent past.
-
alpha
An approximate two-sided (1 − α) prediction interval is calculated.
-
baseline
How many time points to use for calculating the baseline.
-
min_sigma
If min_sigma is greater than 0, the quantity zAlpha * min_sigma is used as the alerting threshold when the baseline is zero.
References
- 1
Fricker, R.D., Hegler, B.L. and Dunfee, D.A. (2008). Comparing syndromic surveillance detection methods: EARS versus a CUSUM-based methodology. Statistics in Medicine, 27, 3407-3429.
- 2
Salmon, M., Schumacher, D. and Höhle, M. (2016): Monitoring count time series in R: Aberration detection in public health surveillance. Journal of Statistical Software, 70 (10), 1-35. doi: 10.18637/jss.v070.i10
-
-
class epysurv.models.timepoint.ears.EarsC2(alpha: float = 0.001, baseline: int = 7, min_sigma: float = 0)
Bases: epysurv.models.timepoint.ears._EarsBase
Computes a threshold for the number of counts based on values from the recent past.
This is then compared to the observed number of counts. If the observation is above a specific quantile of the prediction interval, an alarm is raised. This method is especially useful for data without a long history, since it only needs counts from the recent past.
-
alpha
An approximate two-sided (1 − α) prediction interval is calculated.
-
baseline
How many time points to use for calculating the baseline.
-
min_sigma
If min_sigma is greater than 0, the quantity zAlpha * min_sigma is used as the alerting threshold when the baseline is zero.
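The EARS methods compare the current count with a threshold derived from a short baseline. The sketch below assumes a threshold of the form mean + z · sd over the baseline time points, with z the standard normal (1 − α)-quantile, and adds a hypothetical guard_band parameter that skips the most recent time points. As commonly described (for example in Fricker et al., 2008), C1 uses the baseline directly before the current time point while C2 leaves a two-day gap; guard_band=0 and guard_band=2 mimic these. ears_alarm is not part of epysurv, and min_sigma is not handled.

```python
from statistics import NormalDist, mean, stdev
from typing import Sequence, Tuple


def ears_alarm(
    counts: Sequence[float],
    t: int,
    baseline: int = 7,
    alpha: float = 0.001,
    guard_band: int = 0,
) -> Tuple[bool, float]:
    """Return (alarm, threshold) for time point t using a mean + z * sd threshold."""
    end = t - guard_band  # exclusive end of the baseline window
    start = end - baseline
    if start < 0:
        raise ValueError("not enough history before time point t")
    window = counts[start:end]
    threshold = mean(window) + NormalDist().inv_cdf(1 - alpha) * stdev(window)
    return counts[t] > threshold, threshold


counts = [2, 3, 2, 3, 2, 3, 2, 40, 40, 40]
print(ears_alarm(counts, 9)[0])                # False: C1-like baseline already contains the outbreak
print(ears_alarm(counts, 9, guard_band=2)[0])  # True: C2-like baseline skips the last two points
```

The example shows why a gap between baseline and current time point can help: a rising outbreak does not immediately inflate its own baseline.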
References
- 1
Fricker, R.D., Hegler, B.L. and Dunfee, D.A. (2008). Comparing syndromic surveillance detection methods: EARS versus a CUSUM-based methodology. Statistics in Medicine, 27, 3407-3429.
- 2
Salmon, M., Schumacher, D. and Höhle, M. (2016): Monitoring count time series in R: Aberration detection in public health surveillance. Journal of Statistical Software, 70 (10), 1-35. doi: 10.18637/jss.v070.i10
-
-
class epysurv.models.timepoint.ears.EarsC3(alpha: float = 0.001, baseline: int = 7, min_sigma: float = 0)
Bases: epysurv.models.timepoint.ears._EarsBase
The EarsC3 model.
Computes a threshold for the number of counts based on values from the recent past. This is then compared to the observed number of counts. If the observation is above a specific quantile of the prediction interval, an alarm is raised. This method is especially useful for data without a long history, since it only needs counts from the recent past.
-
alpha
An approximate two-sided (1 − α) prediction interval is calculated.
-
baseline
How many time points to use for calculating the baseline.
References
- 1
Fricker, R.D., Hegler, B.L. and Dunfee, D.A. (2008). Comparing syndromic surveillance detection methods: EARS versus a CUSUM-based methodology. Statistics in Medicine, 27, 3407-3429.
- 2
Salmon, M., Schumacher, D. and Höhle, M. (2016): Monitoring count time series in R: Aberration detection in public health surveillance. Journal of Statistical Software, 70 (10), 1-35. doi: 10.18637/jss.v070.i10
-
epysurv.models.timepoint.farrington module
-
class epysurv.models.timepoint.farrington.Farrington(years_back: int = 3, window_half_width: int = 3, reweight: bool = True, alpha: float = 0.01, trend: bool = True, past_period_cutoff: int = 4, min_cases_in_past_periods: int = 5, power_transform: str = '2/3')
Bases: epysurv.models.timepoint._base.DisProgBasedAlgorithm
The Farrington algorithm.
For each time point, a GLM is used to predict the number of counts according to the procedure by Farrington et al. (1996). The prediction is then compared to the observed number of counts. If the observation is above a specific quantile of the prediction interval, an alarm is raised.
-
years_back
How many years back in time to include when forming the base counts.
-
window_half_width
Number of weeks to include before and after the current week in each year.
-
reweight
Boolean specifying whether to perform the reweighting step.
-
alpha
An approximate two-sided (1 − α) prediction interval is calculated.
-
trend
Boolean indicating whether a trend should be included and kept if the conditions in Farrington et al. (1996) are met. If False, no trend is fitted.
-
past_period_cutoff
Number of past periods considered when suppressing alarms for low case numbers.
-
min_cases_in_past_periods
Minimum number of cases within the last past_period_cutoff periods required for an outbreak to be considered.
-
power_transform
Power transformation to apply to the data if the threshold is computed with the method described in Farrington et al. (1996). Use one of:
- “2/3” for skewness correction (default)
- “1/2” for a variance-stabilizing transformation
- “none” for no transformation.
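The effect of power_transform on the threshold can be seen in a delta-method sketch. Assume a fitted GLM gives a predicted count \(\mu_0\) with estimation variance \(\operatorname{Var}(\hat\mu_0)\) and an overdispersion \(\phi\). With exponent \(e\), \(Y^e\) is then approximately normal with variance \(e^2 \mu_0^{2e-1} \tau\), where \(\tau = \phi + \operatorname{Var}(\hat\mu_0)/\mu_0\), and the upper limit is \((\mu_0^e + z_{1-\alpha/2}\,\mathrm{se})^{1/e}\). This illustrates the approach of Farrington et al. (1996) under those assumptions and is not epysurv's code; farrington_upper_limit and its inputs are hypothetical.

```python
import math
from statistics import NormalDist

# Exponent e for each power_transform option.
EXPONENTS = {"none": 1.0, "1/2": 0.5, "2/3": 2.0 / 3.0}


def farrington_upper_limit(
    mu0: float, phi: float, var_mu0: float, alpha: float = 0.01, power_transform: str = "2/3"
) -> float:
    """Upper end of an approximate two-sided (1 - alpha) prediction interval."""
    e = EXPONENTS[power_transform]
    tau = phi + var_mu0 / mu0
    # Delta method: Var(Y**e - mu0**e) is approximately e**2 * mu0**(2e - 1) * tau.
    se = math.sqrt(e ** 2 * mu0 ** (2 * e - 1) * tau)
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return (mu0 ** e + z * se) ** (1 / e)


for option in ("none", "1/2", "2/3"):
    print(option, round(farrington_upper_limit(4.0, 1.5, 0.3), 2))
```

For e = 2/3 the variance factor reduces to \(\tfrac{4}{9}\mu_0^{1/3}\tau\), and for e = 1/2 it no longer depends on \(\mu_0\), which is what makes that choice variance stabilizing.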
References
- 1
Farrington, C.P., Andrews, N.J., Beale, A.D. and Catchpole, M.A. (1996). A statistical algorithm for the early detection of outbreaks of infectious disease. J. R. Statist. Soc. A, 159, 547-563.
-
-
class epysurv.models.timepoint.farrington.FarringtonFlexible(years_back: int = 3, window_half_width: int = 3, reweight: bool = True, weights_threshold: float = 2.58, alpha: float = 0.01, trend: bool = True, trend_threshold: float = 0.05, past_period_cutoff: int = 4, min_cases_in_past_periods: int = 5, power_transform: str = '2/3', past_weeks_not_included: int = 26, threshold_method: str = 'delta')
Bases: epysurv.models.timepoint._base.STSBasedAlgorithm
The extended Farrington algorithm.
For each time point, a Poisson GLM with overdispersion is used to predict an upper bound on the number of counts according to the procedure by Farrington et al. (1996) and by Noufaily et al. (2012). This bound is then compared to the observed number of counts. If the observation is above the bound, an alarm is raised.
-
years_back
How many years back in time to include when forming the base counts.
-
window_half_width
Number of weeks to include before and after the current week in each year.
-
reweight
Boolean specifying whether to perform the reweighting step.
-
weights_threshold
Threshold on the Anscombe residuals used for reweighting past outbreaks (1 in the original method, 2.58 advised in the improved method).
-
alpha
An approximate one-sided (1 − α) · 100% prediction interval is calculated, unlike the original method, which used a two-sided interval. The upper limit of this interval, i.e. the (1 − α) · 100% quantile, serves as the upper bound.
-
trend
Boolean indicating whether a trend should be included and kept if the conditions in Farrington et al. (1996) are met. If False, no trend is fitted.
-
trend_threshold
Threshold for deciding whether to keep the trend in the model (0.05 in the original method, 1 advised in the improved method).
-
past_period_cutoff
Number of past periods considered when suppressing alarms for low case numbers.
-
min_cases_in_past_periods
Minimum number of cases within the last past_period_cutoff periods required for an outbreak to be considered.
-
power_transform
Power transformation to apply to the data if the threshold is computed with the method described in Farrington et al. (1996). Use one of:
- “2/3” for skewness correction (default)
- “1/2” for a variance-stabilizing transformation
- “none” for no transformation.
-
past_weeks_not_included
Number of past weeks to ignore in the calculation.
-
threshold_method
Method used to derive the upper bound. Options are:
- “delta” for the method described in Farrington et al. (1996)
- “Noufaily” for the method described in Noufaily et al. (2012)
- “muan” for the method extended from Noufaily et al. (2012).
References
- 1
Farrington, C.P., Andrews, N.J., Beale, A.D. and Catchpole, M.A. (1996). A statistical algorithm for the early detection of outbreaks of infectious disease. J. R. Statist. Soc. A, 159, 547-563.
- 2
Noufaily, A., Enki, D.G., Farrington, C.P., Garthwaite, P., Andrews, N.J., Charlett, A. (2012): An improved algorithm for outbreak detection in multiple surveillance systems. Statistics in Medicine, 32 (7), 1206-1222.
- 3
Salmon, M., Schumacher, D. and Höhle, M. (2016): Monitoring count time series in R: Aberration detection in public health surveillance. Journal of Statistical Software, 70 (10), 1-35. doi: 10.18637/jss.v070.i10
-
epysurv.models.timepoint.glr module
Count data regression charts for the monitoring of surveillance time series.
Method as proposed by Höhle and Paul (2008). The implementation is described in Salmon et al. (2016).
-
class epysurv.models.timepoint.glr.GLRNegativeBinomial(alpha: float = 0, glr_test_threshold: int = 5, m: int = -1, change: str = 'intercept', direction: Union[Tuple[str, str], Tuple[str]] = ('inc', 'dec'), upperbound_statistic: str = 'cases', x_max: float = 10000.0)
Bases: epysurv.models.timepoint._base.STSBasedAlgorithm
Generalized likelihood ratio algorithm using the negative binomial distribution.
-
alpha
The (known) dispersion parameter of the negative binomial distribution, i.e. the negative binomial is parametrized such that the variance is \(\text{mean} + \alpha \cdot \text{mean}^2\). Note: this parametrization is the inverse of the shape parametrization used in R, for example in dnbinom and glr.nb. Hence, if alpha=0, the negative binomial distribution reduces to the Poisson distribution, and a call to algo.glrnb is equivalent to a call to algo.glrpois. If alpha=NULL, the parameter is calculated as part of the in-control estimation. However, it is estimated only once, from the first fit; subsequent fits only update the parameters of the linear predictor, with alpha fixed.
-
glr_test_threshold
Threshold in the GLR test, i.e. \(c_\gamma\).
-
m
Number of time instances back in time in the window-limited approach, i.e. the last value considered is max(1, n − m). To always look back until the first observation, use -1.
-
change
A string specifying the type of the alternative. The two choices are “intercept” and “epi”.
-
direction
The direction of testing in the GLR scheme:
- (“inc”,): only increases in x are considered in the GLR statistic
- (“dec”,): only decreases are considered
- (“inc”, “dec”): both increases and decreases are considered.
-
upperbound_statistic
A string specifying the type of upper-bound statistic that is returned:
- “cases” for the number of cases that would have been necessary to produce an alarm
- “value” for the GLR statistic.
-
x_max
Maximum value of x to try when searching for the upper-bound number of cases before an alarm is sounded (default: 1e4). This only applies when upperbound_statistic == "cases".
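The dispersion parametrization of alpha can be checked numerically. A negative binomial with mean μ and variance μ + α · μ² has size (shape) parameter 1/α; in SciPy's nbinom(n, p) terms this is n = 1/α and p = n/(n + μ). The sketch below uses only the standard library, computes the first two moments of that distribution by summation, and treats alpha = 0 as the Poisson limit. nb_pmf and moments are hypothetical helpers.

```python
import math
from typing import Tuple


def nb_pmf(k: int, mean: float, alpha: float) -> float:
    """P(Y = k) for a negative binomial with Var(Y) = mean + alpha * mean**2."""
    if alpha == 0:  # Poisson limit
        return math.exp(k * math.log(mean) - mean - math.lgamma(k + 1))
    size = 1.0 / alpha  # size (shape) parameter, as in R's dnbinom
    p = size / (size + mean)
    return math.exp(
        math.lgamma(k + size) - math.lgamma(size) - math.lgamma(k + 1)
        + size * math.log(p) + k * math.log1p(-p)
    )


def moments(mean: float, alpha: float, k_max: int = 2000) -> Tuple[float, float]:
    """Mean and variance computed by summing the pmf over 0..k_max - 1."""
    probs = [nb_pmf(k, mean, alpha) for k in range(k_max)]
    m = sum(k * p for k, p in enumerate(probs))
    v = sum((k - m) ** 2 * p for k, p in enumerate(probs))
    return m, v


print(moments(4.0, 0.5))  # approximately (4.0, 12.0), i.e. variance = 4 + 0.5 * 4**2
print(moments(4.0, 0.0))  # approximately (4.0, 4.0), the Poisson case
```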
References
- 1
Höhle, M. and Paul, M. (2008): Count data regression charts for the monitoring of surveillance time series. Computational Statistics and Data Analysis, 52 (9), 4357-4368.
- 2
Salmon, M., Schumacher, D. and Höhle, M. (2016): Monitoring count time series in R: Aberration detection in public health surveillance. Journal of Statistical Software, 70 (10), 1-35. doi: 10.18637/jss.v070.i10
-
-
class epysurv.models.timepoint.glr.GLRPoisson(glr_test_threshold: int = 5, m: int = -1, change: str = 'intercept', direction: Union[Tuple[str, str], Tuple[str]] = ('inc', 'dec'), upperbound_statistic: str = 'cases')
Bases: epysurv.models.timepoint._base.STSBasedAlgorithm
Generalized likelihood ratio algorithm using the Poisson distribution.
-
glr_test_threshold
Threshold in the GLR test, i.e. \(c_\gamma\).
-
m
Number of time instances back in time in the window-limited approach, i.e. the last value considered is max(1, n − m). To always look back until the first observation, use -1.
-
change
A string specifying the type of the alternative. The two choices are “intercept” and “epi”.
-
direction
The direction of testing in the GLR scheme:
- (“inc”,): only increases in x are considered in the GLR statistic
- (“dec”,): only decreases are considered
- (“inc”, “dec”): both increases and decreases are considered.
-
upperbound_statistic
A string specifying the type of upper-bound statistic that is returned. With “cases”, the number of cases that would have been necessary to produce an alarm is computed; with “value”, the GLR statistic is computed.
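For change="intercept" and direction=("inc",), the GLR statistic of Höhle and Paul (2008) can be sketched for a Poisson model with known in-control means \(\mu_{0,t}\): under the alternative, the mean becomes \(\mu_{0,t} e^{\kappa}\) from some change point k onward, \(\kappa\) is replaced by its maximum-likelihood estimate (restricted to \(\kappa \ge 0\) for increases), and the statistic is the maximum over the candidate change points allowed by m. The function below is a minimal sketch under these assumptions, not epysurv's implementation; glr_poisson_increase is a hypothetical name.

```python
import math
from typing import Sequence


def glr_poisson_increase(x: Sequence[int], mu0: Sequence[float], m: int = -1) -> float:
    """GLR statistic at the last time point for a multiplicative increase of the Poisson mean."""
    n = len(x)
    # Candidate change points k (0-based): all of them if m == -1, else the last m + 1.
    first = 0 if m == -1 else max(0, n - 1 - m)
    best = 0.0
    for k in range(first, n):
        sum_x = sum(x[k:])
        sum_mu = sum(mu0[k:])
        # MLE of kappa, restricted to kappa >= 0 because only increases are tested.
        kappa = max(0.0, math.log(sum_x / sum_mu)) if sum_x > 0 else 0.0
        # Log-likelihood ratio: sum over t >= k of x_t * kappa - mu0_t * (exp(kappa) - 1).
        llr = sum_x * kappa - sum_mu * (math.exp(kappa) - 1.0)
        best = max(best, llr)
    return best


statistic = glr_poisson_increase([1, 1, 10], [1.0, 1.0, 1.0])
print(statistic > 5)  # True: exceeds the default glr_test_threshold of 5
```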
References
- 1
Höhle, M. and Paul, M. (2008): Count data regression charts for the monitoring of surveillance time series. Computational Statistics and Data Analysis, 52 (9), 4357-4368.
- 2
Salmon, M., Schumacher, D. and Höhle, M. (2016): Monitoring count time series in R: Aberration detection in public health surveillance. Journal of Statistical Software, 70 (10), 1-35. doi: 10.18637/jss.v070.i10
-
change
= 'intercept' A string specifying the type of the alternative. Currently the two choices are “intercept” and “epi”. See the SFB Discussion Paper 500 for details.
-
direction
= ('inc', 'dec') The direction of testing in the GLR scheme. With “inc”, only increases in x are considered in the GLR statistic; with “dec”, decreases are considered.
-
glr_test_threshold
= 5 Threshold in the GLR test, i.e. \(c_\gamma\).
-
m
= -1 Number of time instances back in time in the window-limited approach, i.e. the last value considered is max(1, n − m). To always look back until the first observation, use m=-1.
-
upperbound_statistic
= 'cases' A string specifying the type of upper-bound statistic that is returned. With “cases”, the number of cases that would have been necessary to produce an alarm is computed; with “value”, the GLR statistic is computed.
-
epysurv.models.timepoint.hmm module
-
class epysurv.models.timepoint.hmm.HMM(n_observations: int = -1, n_hidden_states: int = 2, trend: bool = True, n_harmonics: int = 1, equal_covariate_effects: bool = False)
Bases: epysurv.models.timepoint._base.DisProgBasedAlgorithm
Hidden Markov model for outbreak detection.
-
n_observations
Number of observations back in time to use for fitting the HMM (including the current observation). Reasonable values are a multiple of the number of observations per year. The default, -1, means that all available values are used; for long series, this can take a very long time.
-
n_hidden_states
Number of hidden states in the HMM; the typical choice is 2. The initial rates are set such that the n_hidden_states-th state has the highest rate. In other words, this state is considered the outbreak state.
-
trend
Boolean indicating whether a linear time trend is included in the linear predictor.
-
n_harmonics
Number of harmonic waves to include in the linear predictor.
-
equal_covariate_effects
If True, all covariate effect parameters are equal across the hidden states.
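The trend and n_harmonics options describe the covariates of the linear predictor. Assuming a predictor similar to the one used by Le Strat and Carrat (1999), i.e. an intercept, an optional linear time trend, and n_harmonics pairs of cosine and sine terms with a yearly period (52 for weekly data, an assumption here), one row of the design matrix could be built as below. hmm_design_row is a hypothetical helper, not epysurv's code.

```python
import math
from typing import List


def hmm_design_row(t: int, trend: bool = True, n_harmonics: int = 1, period: int = 52) -> List[float]:
    """Covariates at time t: intercept, optional linear trend, harmonic waves."""
    row = [1.0]  # intercept
    if trend:
        row.append(float(t))  # linear time trend
    for s in range(1, n_harmonics + 1):
        angle = 2 * math.pi * s * t / period  # s-th harmonic of the yearly cycle
        row.extend([math.cos(angle), math.sin(angle)])
    return row


print(hmm_design_row(0, trend=False))  # [1.0, 1.0, 0.0]
```

Each additional harmonic adds two columns, so the number of covariate effects per state grows as 1 + trend + 2 · n_harmonics; equal_covariate_effects controls whether these are shared across states.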
References
- 1
Le Strat, Y. and Carrat, F. (1999). Monitoring epidemiologic surveillance data using hidden Markov models. Statistics in Medicine, 18, 3463–3478.
- 2
MacDonald, I.L. and Zucchini, W. (1997). Hidden Markov and Other Models for Discrete-valued Time Series. Monographs on Statistics and Applied Probability 70. Chapman & Hall.
-
epysurv.models.timepoint.outbreak_p module
-
class epysurv.models.timepoint.outbreak_p.OutbreakP(threshold: int = 100, upperbound_statistic: str = 'cases', max_upperbound_cases: int = 100000)
Bases: epysurv.models.timepoint._base.STSBasedAlgorithm
The OutbreakP model.
-
threshold
The threshold value. Once the outbreak statistic exceeds this threshold, an alarm is sounded.
-
upperbound_statistic
A string specifying the type of upper-bound statistic that is returned. With “cases”, the number of cases that would have been necessary to produce an alarm (NNBA) is computed; with “value”, the OutbreakP statistic is computed.
-
max_upperbound_cases
Upper bound used when numerically searching for the NNBA. Default: 1e5.
References
- 1
Frisén, M., Andersson, E. and Schiöler, L. (2009). Robust outbreak surveillance of epidemics in Sweden. Statistics in Medicine, 28(3), 476-493.
- 2
Frisén, M. and Andersson, E. (2009). Semiparametric surveillance of monotonic changes. Sequential Analysis, 28(4), 434-454.
-
epysurv.models.timepoint.rki module
-
class epysurv.models.timepoint.rki.RKI(years_back: int = 0, window_half_width: int = 6, include_recent_year: bool = True)
Bases: epysurv.models.timepoint._base.STSBasedAlgorithm
The old algorithm of the Robert Koch Institute.
-
years_back
How many years back in time to include when forming the base counts.
-
window_half_width
Number of weeks to include before and after the current week in each year.
-
include_recent_year
Whether the year of the current time point also contributes window_half_width reference values.
-
Module contents
-
class epysurv.models.timepoint.Bayes(years_back: int = 0, window_half_width: int = 6, include_recent_year: bool = True, alpha: float = 0.05)
Bases: epysurv.models.timepoint._base.STSBasedAlgorithm
Evaluation of time points with the Bayes subsystem.
-
years_back
How many years back in time to include when forming the base counts.
-
window_half_width
Number of weeks to include before and after the current week in each year.
-
include_recent_year
Whether the year of the current time point also contributes window_half_width reference values.
-
alpha
The (1 − α)-quantile is used to calculate the upper threshold. The defaults of years_back, window_half_width and include_recent_year (b, w and actY in the R package surveillance) correspond to the Bayes 1 system with alpha=0.05.
References
- 1
Riebler, A. (2004). Empirischer Vergleich von statistischen Methoden zur Ausbruchserkennung bei Surveillance Daten. Bachelor’s thesis.
- 2
Höhle, M. and Riebler, A. (2005). The R-Package “surveillance”. Sonderforschungsbereich 386. Retrieved from https://epub.ub.uni-muenchen.de/1791/1/paper_422.pdf
-
-
class epysurv.models.timepoint.Boda(trend: bool = False, season: bool = False, prior: str = 'iid', alpha: float = 0.05, mc_munu: int = 100, mc_y: int = 10, quantile_method: str = 'MM')
Bases: epysurv.models.timepoint._base.STSBasedAlgorithm
The Bayesian Outbreak Detection Algorithm (BODA).
-
trend
Boolean indicating whether a linear trend term should be included in the model for the expectation on the log scale.
-
season
Boolean indicating whether a cyclic spline should be included.
-
prior
One of “iid”, “rw1” or “rw2”.
-
alpha
The threshold for declaring an observed count an aberration is the (1 − α) · 100% quantile of the posterior predictive distribution.
-
mc_munu
Number of samples drawn from the posterior distribution of the mean and size parameters.
-
mc_y
Number of samples of y to generate for each pair of mean and size parameters. A total of mc_munu × mc_y samples is generated.
-
sampling_method
Whether to sample from the parameters' joint posterior distribution (“joint”) or from their respective marginal posterior distributions (“marginals”).
-
quantile_method
One of “MC” or “MM”. Indicates how the quantile is computed from the posterior distribution, regardless of the inference method:
- “MC”: sample mc_munu values from the posterior distribution of the parameters and, for each sampled parameter vector, sample mc_y response values; the quantile is the empirical quantile of the resulting response values (as explained in Manitz and Höhle 2013).
- “MM”: sample mc_munu values from the posterior distribution of the parameters and compute the quantile of the resulting mixture distribution by bisection, which is faster.
-
-
class epysurv.models.timepoint.CDC(years_back: int = 5, window_half_width: int = 1, alpha: float = 0.001)
Bases: epysurv.models.timepoint._base.DisProgBasedAlgorithm
The CDC model.
-
years_back
How many years back in time to include when forming the base counts.
-
window_half_width
Number of weeks to include before and after the current week in each year.
-
alpha
An approximate two-sided (1 − α) prediction interval is calculated.
References
- 1
Stroup, D., Williamson, G., Herndon, J. and Karon, J. (1989). Detection of aberrations in the occurrence of notifiable diseases surveillance data. Statistics in Medicine, 8, 323-329.
- 2
Farrington, C. and Andrews, N. (2003). Outbreak detection: application to infectious disease surveillance. In Monitoring the Health of Populations, pp. 203-231. Oxford University Press.
-
-
class epysurv.models.timepoint.Cusum(reference_value: float = 1.04, decision_boundary: float = 2.26, expected_numbers_method: str = 'mean', transform: str = 'standard', negbin_alpha: float = 0.1)
Bases: epysurv.models.timepoint._base.STSBasedAlgorithm
The CUSUM model.
-
reference_value
Reference value k of the CUSUM statistic.
-
decision_boundary
Decision boundary h: an alarm is raised when the CUSUM statistic exceeds it.
-
expected_numbers_method
How to determine the expected number of cases. Possible values:
- “mean”: use the mean of all data points passed to fit.
- “glm”: fit a GLM to the data points passed to fit.
-
transform
One of the following transformations (warning: the Anscombe and NegBin transformations are experimental):
- “standard”: standardized variables z1 (based on asymptotic normality). This is the default.
- “rossi”: standardized variables z3 as proposed by Rossi et al. (1999).
- “anscombe”: Anscombe residuals (experimental).
- “anscombe2nd”: Anscombe residuals as in Pierce and Schafer (1986), based on a 2nd-order approximation of E(X) (experimental).
- “pearsonNegBin”: Pearson residuals for the negative binomial distribution (experimental).
- “anscombeNegBin”: Anscombe residuals for the negative binomial distribution (experimental).
- “none”: no transformation.
-
negbin_alpha
Parameter of the negative binomial distribution, such that the variance is \(m + \alpha \cdot m^2\).
References
- 1
Rossi, G., Lampugnani, L. and Marchi, M. (1999). An approximate CUSUM procedure for surveillance of health events. Statistics in Medicine, 18, 2111–2122.
- 2
Pierce, D.A. and Schafer, D.W. (1986). Residuals in generalized linear models. Journal of the American Statistical Association, 81, 977–986.
-
-
class epysurv.models.timepoint.EarsC1(alpha: float = 0.001, baseline: int = 7, min_sigma: float = 0)
Bases: epysurv.models.timepoint.ears._EarsBase
Computes a threshold for the number of counts based on values from the recent past.
This is then compared to the observed number of counts. If the observation is above a specific quantile of the prediction interval, an alarm is raised. This method is especially useful for data without a long history, since it only needs counts from the recent past.
-
alpha
An approximate two-sided (1 − α) prediction interval is calculated.
-
baseline
How many time points to use for calculating the baseline.
-
min_sigma
If min_sigma is greater than 0, the quantity zAlpha * min_sigma is used as the alerting threshold when the baseline is zero.
References
- 1
Fricker, R.D., Hegler, B.L. and Dunfee, D.A. (2008). Comparing syndromic surveillance detection methods: EARS versus a CUSUM-based methodology. Statistics in Medicine, 27, 3407-3429.
- 2
Salmon, M., Schumacher, D. and Höhle, M. (2016): Monitoring count time series in R: Aberration detection in public health surveillance. Journal of Statistical Software, 70 (10), 1-35. doi: 10.18637/jss.v070.i10
-
-
class epysurv.models.timepoint.EarsC2(alpha: float = 0.001, baseline: int = 7, min_sigma: float = 0)
Bases: epysurv.models.timepoint.ears._EarsBase
Computes a threshold for the number of counts based on values from the recent past.
This is then compared to the observed number of counts. If the observation is above a specific quantile of the prediction interval, an alarm is raised. This method is especially useful for data without a long history, since it only needs counts from the recent past.
-
alpha
An approximate two-sided (1 − α) prediction interval is calculated.
-
baseline
How many time points to use for calculating the baseline.
-
min_sigma
If min_sigma is greater than 0, the quantity zAlpha * min_sigma is used as the alerting threshold when the baseline is zero.
References
- 1
Fricker, R.D., Hegler, B.L. and Dunfee, D.A. (2008). Comparing syndromic surveillance detection methods: EARS versus a CUSUM-based methodology. Statistics in Medicine, 27, 3407-3429.
- 2
Salmon, M., Schumacher, D. and Höhle, M. (2016): Monitoring count time series in R: Aberration detection in public health surveillance. Journal of Statistical Software, 70 (10), 1-35. doi: 10.18637/jss.v070.i10
-
-
class epysurv.models.timepoint.EarsC3(alpha: float = 0.001, baseline: int = 7, min_sigma: float = 0)
Bases: epysurv.models.timepoint.ears._EarsBase
The EarsC3 model.
Computes a threshold for the number of counts based on values from the recent past. This is then compared to the observed number of counts. If the observation is above a specific quantile of the prediction interval, an alarm is raised. This method is especially useful for data without a long history, since it only needs counts from the recent past.
-
alpha
An approximate two-sided (1 − α) prediction interval is calculated.
-
baseline
How many time points to use for calculating the baseline.
References
- 1
Fricker, R.D., Hegler, B.L. and Dunfee, D.A. (2008). Comparing syndromic surveillance detection methods: EARS versus a CUSUM-based methodology. Statistics in Medicine, 27, 3407-3429.
- 2
Salmon, M., Schumacher, D. and Höhle, M. (2016): Monitoring count time series in R: Aberration detection in public health surveillance. Journal of Statistical Software, 70 (10), 1-35. doi: 10.18637/jss.v070.i10
-
-
class
epysurv.models.timepoint.
FarringtonFlexible
(years_back: int = 3, window_half_width: int = 3, reweight: bool = True, weights_threshold: float = 2.58, alpha: float = 0.01, trend: bool = True, trend_threshold: float = 0.05, past_period_cutoff: int = 4, min_cases_in_past_periods: int = 5, power_transform: str = '2/3', past_weeks_not_included: int = 26, threshold_method: str = 'delta')[source]¶ Bases:
epysurv.models.timepoint._base.STSBasedAlgorithm
The extended Farrington algorithm.
For each time point uses a Poisson GLM with overdispersion to predict an upper bound on the number of counts according to the procedure by Farrington et al. (1996) and by Noufaily et al. (2012). This bound is then compared to the observed number of counts. If the observation is above the bound, then an alarm is raised.
-
years_back
¶ How many years back in time to include when forming the base counts.
-
window_half_width
¶ Number of weeks to include before and after the current week in each year.
-
reweight
¶ Boolean specifying whether to perform reweighting step.
-
weights_threshold
¶ Defines the threshold for reweighting past outbreaks using the Anscombe residuals (1 in the original method, 2.58 advised in the improved method).
-
alpha
¶ An approximate (one-sided) (1 − α) · 100% prediction interval is calculated unlike the original method where it was a two-sided interval. The upper limit of this interval i.e. the (1 − α) · 100% quantile serves as an upperbound.
-
trend
¶ Boolean indicating whether a trend should be included and kept in case the conditions in the Farrington et al. paper are met. If False, no trend is fitted.
-
trend_threshold
¶ Threshold for deciding whether to keep the trend in the model (0.05 in the original method, 1 advised in the improved method).
-
past_period_cutoff
¶ Periods considered for suppression of low case numbers.
-
min_cases_in_past_periods
¶ The minimal number of cases in past periods such that an outbreak is considered.
-
power_transform
¶ Power transformation to apply to the data if the threshold is to be computed with the method described in Farrington et al. (1996). Use either:
- “2/3” for skewness correction (Default)
- “1/2” for variance stabilizing transformation
- “none” for no transformation.
-
past_weeks_not_included
¶ Number of past weeks to ignore in the calculation.
-
threshold_method
¶ Method to be used to derive the upper bound. Options are:
- “delta” for the method described in Farrington et al. (1996)
- “Noufaily” for the method described in Noufaily et al. (2012)
- “muan” for the method extended from Noufaily et al. (2012)
References
- 1
Farrington, C.P., Andrews, N.J., Beale, A.D. and Catchpole, M.A. (1996): A statistical algorithm for the early detection of outbreaks of infectious disease. J. R. Statist. Soc. A, 159, 547-563.
- 2
Noufaily, A., Enki, D.G., Farrington, C.P., Garthwaite, P., Andrews, N.J., Charlett, A. (2012): An improved algorithm for outbreak detection in multiple surveillance systems. Statistics in Medicine, 32 (7), 1206-1222.
- 3
Salmon, M., Schumacher, D. and Höhle, M. (2016): Monitoring count time series in R: Aberration detection in public health surveillance. Journal of Statistical Software, 70 (10), 1-35. doi: 10.18637/jss.v070.i10
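The reweighting step controlled by reweight and weights_threshold can be sketched as follows, assuming the fitted means mu and the overdispersion phi of the baseline GLM are already available (the fit itself is not shown). Following Farrington et al. (1996), baseline points whose standardized Anscombe residual s exceeds the threshold receive a weight proportional to s⁻², the others a weight of 1, and all weights are rescaled to sum to the number of baseline points. The helper names are hypothetical, and leverages default to zero as a simplification.

```python
# Hypothetical illustration of the reweighting step, not the epysurv API.
import math


def anscombe_residuals(y, mu, phi, leverage=None):
    """Standardized Anscombe residuals of a quasi-Poisson fit.

    leverage holds the hat values h_ii of the GLM; if omitted they are
    taken as 0, which is a simplification made for this sketch.
    """
    if leverage is None:
        leverage = [0.0] * len(y)
    residuals = []
    for yi, mi, hi in zip(y, mu, leverage):
        raw = 1.5 * (yi ** (2 / 3) - mi ** (2 / 3)) / mi ** (1 / 6)
        residuals.append(raw / math.sqrt(phi * (1 - hi)))
    return residuals


def farrington_weights(residuals, weights_threshold=2.58):
    """Down-weight baseline points whose residual exceeds the threshold.

    Points above the threshold get weight proportional to s**-2, the rest
    get weight 1; all weights are then rescaled to sum to the sample size.
    """
    raw = [s ** -2 if s > weights_threshold else 1.0 for s in residuals]
    gamma = len(raw) / sum(raw)
    return [gamma * w for w in raw]


y = [4, 5, 3, 20, 6]  # the fourth baseline count looks like a past outbreak
mu = [5.0] * 5
weights = farrington_weights(anscombe_residuals(y, mu, phi=1.2))
print([round(w, 2) for w in weights])
```

The refitted GLM would then use these weights, so that past outbreaks inflate the baseline less.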
-
-
class
epysurv.models.timepoint.
Farrington
(years_back: int = 3, window_half_width: int = 3, reweight: bool = True, alpha: float = 0.01, trend: bool = True, past_period_cutoff: int = 4, min_cases_in_past_periods: int = 5, power_transform: str = '2/3')[source]¶ Bases:
epysurv.models.timepoint._base.DisProgBasedAlgorithm
The Farrington algorithm.
For each time point, a GLM is used to predict the number of counts according to the procedure by Farrington et al. (1996). The prediction is then compared to the observed number of counts; if the observation is above a specific quantile of the prediction interval, an alarm is raised.
-
years_back
¶ How many years back in time to include when forming the base counts.
-
window_half_width
¶ Number of weeks to include before and after the current week in each year.
-
reweight
¶ Boolean specifying whether to perform reweighting step.
-
alpha
¶ An approximate (two-sided) (1 − α) prediction interval is calculated.
-
trend
¶ Boolean indicating whether a trend should be included and kept in case the conditions in the Farrington et al. paper are met. If False, no trend is fitted.
-
past_period_cutoff
¶ Periods considered for suppression of low case numbers.
-
min_cases_in_past_periods
¶ The minimal number of cases in past periods such that an outbreak is considered.
-
power_transform
¶ Power transformation to apply to the data if the threshold is to be computed with the method described in Farrington et al. (1996). Use either:
- “2/3” for skewness correction (Default)
- “1/2” for variance stabilizing transformation
- “none” for no transformation.
References
- 1
Farrington, C.P., Andrews, N.J., Beale, A.D. and Catchpole, M.A. (1996): A statistical algorithm for the early detection of outbreaks of infectious disease. J. R. Statist. Soc. A, 159, 547-563.
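The role of power_transform and alpha in the threshold can be illustrated with a delta-method sketch. It assumes that the predicted count mu0, the variance of that prediction and the overdispersion phi come from an already fitted GLM (not shown). It uses the 1 − α/2 normal quantile because alpha is described here as two-sided, and it assumes that the low-count suppression window (past_period_cutoff) includes the current period. The function names are hypothetical, and the wrapped R code may differ in details.

```python
# Hypothetical illustration of the Farrington threshold, not the epysurv API.
import math
from statistics import NormalDist


def farrington_upper_bound(mu0, var_mu0, phi, alpha=0.01, power_transform="2/3"):
    """Delta-method upper bound for a predicted count mu0.

    mu0: predicted mean, var_mu0: variance of that prediction,
    phi: overdispersion. The bound is built on the transformed scale and
    mapped back to the count scale.
    """
    z = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided interval -> 1 - alpha/2
    tau = phi * mu0 + var_mu0  # prediction-error variance on the count scale
    if power_transform == "2/3":
        return mu0 * (1 + (2 / 3) * z * math.sqrt(tau) / mu0) ** (3 / 2)
    if power_transform == "1/2":
        return mu0 * (1 + 0.5 * z * math.sqrt(tau) / mu0) ** 2
    if power_transform == "none":
        return mu0 + z * math.sqrt(tau)
    raise ValueError(f"unknown power_transform: {power_transform!r}")


def farrington_alarm(counts, t, upper_bound, past_period_cutoff=4,
                     min_cases_in_past_periods=5):
    """Alarm if counts[t] exceeds the bound, unless recent counts are too low."""
    recent = counts[max(0, t - past_period_cutoff + 1):t + 1]
    if sum(recent) < min_cases_in_past_periods:
        return False  # suppress alarms when there are too few recent cases
    return counts[t] > upper_bound


for method in ("none", "1/2", "2/3"):
    print(method, round(farrington_upper_bound(10.0, 1.0, 1.5, power_transform=method), 2))
```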
-
-
class
epysurv.models.timepoint.
GLRNegativeBinomial
(alpha: float = 0, glr_test_threshold: int = 5, m: int = -1, change: str = 'intercept', direction: Union[Tuple[str, str], Tuple[str]] = ('inc', 'dec'), upperbound_statistic: str = 'cases', x_max: float = 10000.0)[source]¶ Bases:
epysurv.models.timepoint._base.STSBasedAlgorithm
Generalized likelihood ratio algorithm using the negative binomial distribution.
-
alpha
¶ The (known) dispersion parameter of the negative binomial distribution, i.e. the negative binomial is parametrized such that the variance is mean + alpha · mean². Note: this parametrization is the inverse of the shape parametrization used in R (for example in dnbinom and glr.nb). Hence, if alpha=0, the negative binomial distribution reduces to the Poisson distribution, and a call of algo.glrnb is equivalent to a call of algo.glrpois. If alpha=NULL, the parameter is estimated as part of the in-control estimation. However, it is estimated only once, from the first fit; subsequent fits only update the parameters of the linear predictor, with alpha held fixed.
-
glr_test_threshold
¶ Threshold in the GLR test, i.e. c_γ.
-
m
¶ Number of time instances back in time in the window-limited approach, i.e. the last value considered is max(1, n − m). To always look back until the first observation, use -1.
-
change
¶ A string specifying the type of the alternative. The two choices are “intercept” and “epi”.
-
direction
¶ Specifies the direction of testing in the GLR scheme:
- (“inc”,) only increases in x are considered in the GLR statistic
- (“dec”,) only decreases are regarded
- (“inc”, “dec”) both increases and decreases are regarded.
-
upperbound_statistic
¶ A string specifying the type of upper-bound statistic that is returned:
- “cases” for the number of cases that would have been necessary to produce an alarm
- “value” for the GLR statistic
-
x_max
¶ Maximum value of x to try when searching for the upper-bound number of cases before an alarm is sounded (Default: 1e4). This only applies when upperbound_statistic == "cases".
References
- 1
Höhle, M. and Paul, M. (2008): Count data regression charts for the monitoring of surveillance time series. Computational Statistics and Data Analysis, 52 (9), 4357-4368.
- 2
Salmon, M., Schumacher, D. and Höhle, M. (2016): Monitoring count time series in R: Aberration detection in public health surveillance. Journal of Statistical Software, 70 (10), 1-35. doi: 10.18637/jss.v070.i10
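The alpha parametrization can be checked numerically with a short sketch. With size = 1/alpha (R's shape parameter), the negative binomial pmf below has variance mean + alpha · mean², and alpha = 0 falls back to the Poisson pmf. This only illustrates the parametrization; the helper names are hypothetical and not part of epysurv.

```python
# Hypothetical illustration of the alpha parametrization, not the epysurv API.
import math


def poisson_pmf(x, mean):
    """Poisson probability of observing x counts."""
    return math.exp(x * math.log(mean) - mean - math.lgamma(x + 1))


def nb_pmf(x, mean, alpha):
    """Negative binomial pmf with Var(X) = mean + alpha * mean**2.

    R's size (shape) parameter is 1 / alpha; alpha == 0 gives the Poisson.
    """
    if alpha == 0:
        return poisson_pmf(x, mean)
    size = 1.0 / alpha
    log_pmf = (
        math.lgamma(x + size) - math.lgamma(size) - math.lgamma(x + 1)
        - size * math.log1p(mean / size)  # size * log(size / (size + mean))
        + x * math.log(mean / (size + mean))
    )
    return math.exp(log_pmf)


def moments(pmf, upper=400):
    """Mean and variance obtained by summing the pmf over 0..upper."""
    probs = [pmf(x) for x in range(upper + 1)]
    m = sum(x * p for x, p in enumerate(probs))
    v = sum((x - m) ** 2 * p for x, p in enumerate(probs))
    return m, v


m, v = moments(lambda x: nb_pmf(x, mean=4.0, alpha=0.5))
print(round(m, 6), round(v, 6))  # 4.0 12.0
```

With mean 4 and alpha 0.5, the variance is 4 + 0.5 · 16 = 12, three times the Poisson variance for the same mean.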
-
-
class
epysurv.models.timepoint.
GLRPoisson
(glr_test_threshold: int = 5, m: int = -1, change: str = 'intercept', direction: Union[Tuple[str, str], Tuple[str]] = ('inc', 'dec'), upperbound_statistic: str = 'cases')[source]¶ Bases:
epysurv.models.timepoint._base.STSBasedAlgorithm
Generalized likelihood ratio algorithm using the Poisson distribution.
-
glr_test_threshold
¶ Threshold in the GLR test, i.e. c_γ.
-
m
¶ Number of time instances back in time in the window-limited approach, i.e. the last value considered is max(1, n − m). To always look back until the first observation, use -1.
-
change
¶ A string specifying the type of the alternative. The two choices are “intercept” and “epi”.
-
direction
¶ Specifies the direction of testing in the GLR scheme:
- (“inc”,) only increases in x are considered in the GLR statistic
- (“dec”,) only decreases are regarded
- (“inc”, “dec”) both increases and decreases are regarded.
-
upperbound_statistic
¶ A string specifying the type of upper-bound statistic that is returned. With “cases”, the number of cases that would have been necessary to produce an alarm is computed; with “value”, the GLR statistic is computed.
References
- 1
Höhle, M. and Paul, M. (2008): Count data regression charts for the monitoring of surveillance time series. Computational Statistics and Data Analysis, 52 (9), 4357-4368.
- 2
Salmon, M., Schumacher, D. and Höhle, M. (2016): Monitoring count time series in R: Aberration detection in public health surveillance. Journal of Statistical Software, 70 (10), 1-35. doi: 10.18637/jss.v070.i10
-
change
= 'intercept' A string specifying the type of the alternative. Currently, the two choices are “intercept” and “epi”. See the SFB Discussion Paper 500 for details.
-
direction
= ('inc', 'dec') Specifies the direction of testing in the GLR scheme. With “inc”, only increases in x are considered in the GLR statistic; with “dec”, only decreases are regarded.
-
glr_test_threshold
= 5 Threshold in the GLR test, i.e. c_γ.
-
m
= -1 Number of time instances back in time in the window-limited approach, i.e. the last value considered is max(1, n − m). To always look back until the first observation, use m=-1.
-
upperbound_statistic
= 'cases' A string specifying the type of upper-bound statistic that is returned. With “cases”, the number of cases that would have been necessary to produce an alarm is computed; with “value”, the GLR statistic is computed.
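For change="intercept", the statistic can be sketched with a Poisson model x_t ~ Po(μ_t · e^κ) in which the in-control means μ_t are assumed known. For each candidate change point k in the window, the log-likelihood ratio Σ[x_t κ − μ_t (e^κ − 1)], summed over t = k, …, n, is maximized in closed form at κ̂ = log(Σx_t / Σμ_t), restricted according to direction; the statistic is the maximum over k, and an alarm is raised when it reaches glr_test_threshold. This is a simplified illustration with hypothetical function names: it does not fit the in-control model, cover the "epi" alternative, or search for the upper-bound number of cases.

```python
# Hypothetical illustration of a window-limited Poisson GLR, not the epysurv API.
import math


def _max_llr(sx, smu, direction):
    """Maximized log-likelihood ratio for x_t ~ Po(mu_t * exp(kappa)) vs kappa = 0."""
    if sx == 0:
        # kappa_hat -> -inf; the supremum is smu if decreases are allowed
        return smu if "dec" in direction else 0.0
    kappa = math.log(sx / smu)  # unrestricted MLE of the intercept shift
    if "inc" not in direction:
        kappa = min(kappa, 0.0)
    if "dec" not in direction:
        kappa = max(kappa, 0.0)
    return sx * kappa - smu * math.expm1(kappa)


def glr_poisson_statistic(x, mu, m=-1, direction=("inc", "dec")):
    """Window-limited GLR statistic at the last time point n = len(x).

    Candidate change points k run from max(1, n - m) to n (1-based), or over
    the whole series when m == -1.
    """
    n = len(x)
    first = 0 if m == -1 else max(0, n - 1 - m)  # 0-based start of the window
    return max(_max_llr(sum(x[k:]), sum(mu[k:]), direction) for k in range(first, n))


mu = [5.0] * 10  # in-control means, assumed known here
x = [5, 5, 5, 5, 5, 5, 5, 12, 14, 13]
stat = glr_poisson_statistic(x, mu, direction=("inc",))
print(round(stat, 2), stat >= 5)  # 13.26 True
```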
-
-
class
epysurv.models.timepoint.
HMM
(n_observations: int = -1, n_hidden_states: int = 2, trend: bool = True, n_harmonics: int = 1, equal_covariate_effects: bool = False)[source]¶ Bases:
epysurv.models.timepoint._base.DisProgBasedAlgorithm
Hidden Markov model for outbreak detection.
-
n_observations
¶ Number of observations back in time to use for fitting the HMM (including the current observation). Reasonable values are multiples of the number of observations per year. The default, -1, means that all available observations are used; for long series this may take a very long time.
-
n_hidden_states
¶ Number of hidden states in the HMM; the typical choice is 2. The initial rates are set such that the n_hidden_states-th state has the highest rate. In other words, this state is considered the outbreak state.
-
trend
¶ Boolean indicating whether a linear time trend is included in the model.
-
n_harmonics
¶ Number of harmonic waves to include in the linear predictor.
-
equal_covariate_effects
¶ If True, all covariate effect parameters are equal across the states.
References
- 1
Le Strat, Y. and Carrat, F. (1999): Monitoring epidemiologic surveillance data using hidden Markov models. Statistics in Medicine, 18, 3463-3478.
- 2
MacDonald, I.L. and Zucchini, W. (1997): Hidden Markov and Other Models for Discrete-valued Time Series. Chapman & Hall, Monographs on Statistics and Applied Probability 70.
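Once the HMM parameters are known, the probability that the latest observation belongs to the outbreak state (the state with the highest rate) can be computed with the forward algorithm. The sketch below assumes fixed, already estimated rates, transition matrix and initial distribution for a Poisson HMM without trend or harmonic terms. It does not reproduce the model fitting that the wrapped R routine performs, and the function names are hypothetical.

```python
# Hypothetical illustration of HMM forward filtering, not the epysurv API.
import math


def poisson_pmf(x, lam):
    """Poisson emission probability of x counts given rate lam."""
    return math.exp(x * math.log(lam) - lam - math.lgamma(x + 1))


def outbreak_state_probability(counts, rates, transition, initial):
    """Filtered probability that the last observation is in the outbreak state.

    Runs the normalized forward algorithm for a Poisson HMM whose parameters
    are assumed known; the state with the highest rate is the outbreak state.
    """
    k = len(rates)
    forward = [initial[j] * poisson_pmf(counts[0], rates[j]) for j in range(k)]
    total = sum(forward)
    forward = [f / total for f in forward]
    for x in counts[1:]:
        # propagate through the transition matrix, then weight by the emission
        forward = [
            sum(forward[i] * transition[i][j] for i in range(k)) * poisson_pmf(x, rates[j])
            for j in range(k)
        ]
        total = sum(forward)
        forward = [f / total for f in forward]
    outbreak_state = max(range(k), key=lambda j: rates[j])
    return forward[outbreak_state]


rates = [3.0, 12.0]  # the second state has the highest rate
transition = [[0.95, 0.05], [0.20, 0.80]]
initial = [0.9, 0.1]
print(outbreak_state_probability([3, 2, 4, 3], rates, transition, initial))
print(outbreak_state_probability([3, 2, 4, 14, 15], rates, transition, initial))
```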
-
-
class
epysurv.models.timepoint.
OutbreakP
(threshold: int = 100, upperbound_statistic: str = 'cases', max_upperbound_cases: int = 100000)[source]¶ Bases:
epysurv.models.timepoint._base.STSBasedAlgorithm
The OutbreakP model.
-
threshold
¶ The threshold value. Once the outbreak statistic is above this threshold an alarm is sounded.
-
upperbound_statistic
¶ A string specifying the type of upper-bound statistic that is returned. With “cases”, the number of cases that would have been necessary to produce an alarm (NNBA) is computed; with “value”, the outbreakP statistic is computed.
-
max_upperbound_cases
¶ Upper bound used when numerically searching for the NNBA. Default is 1e5.
References
- 1
Frisén, M., Andersson, E. and Schiöler, L. (2009): Robust outbreak surveillance of epidemics in Sweden. Statistics in Medicine, 28 (3), 476-493.
- 2
Frisén, M. and Andersson, E. (2009): Semiparametric surveillance of monotonic changes. Sequential Analysis, 28 (4), 434-454.
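A sketch of the statistic, assuming it is the Poisson likelihood ratio between a monotonically increasing mean and a constant mean, OutbreakP(s) = Π_t (μ̂^C1(t) / μ̂^D(t))^x(t). Here the increasing fit μ̂^C1 is taken to be the isotonic (non-decreasing) regression of the counts, computed with the pool-adjacent-violators algorithm, and μ̂^D is the overall mean. Both fits preserve the total count, so the exponential terms of the Poisson likelihoods cancel. The function names are hypothetical; the result would be compared with threshold to decide on an alarm.

```python
# Hypothetical illustration of an outbreakP-style statistic, not the epysurv API.
import math


def isotonic_increasing(values):
    """Pool-adjacent-violators: least-squares non-decreasing fit (unit weights)."""
    blocks = []  # each block is [sum, count]
    for v in values:
        blocks.append([v, 1])
        # merge while the previous block mean exceeds the last block mean
        while len(blocks) > 1 and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]:
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    fitted = []
    for s, c in blocks:
        fitted.extend([s / c] * c)
    return fitted


def outbreakp_statistic(counts):
    """Likelihood ratio of increasing-mean vs constant-mean Poisson fits."""
    mu_d = sum(counts) / len(counts)  # constant mean under "no change"
    mu_c1 = isotonic_increasing(counts)  # monotone increasing mean (assumed fit)
    log_stat = 0.0
    for x, m in zip(counts, mu_c1):
        if x > 0:  # terms with x == 0 contribute a factor of 1
            log_stat += x * (math.log(m) - math.log(mu_d))
    return math.exp(log_stat)


print(isotonic_increasing([3, 1, 2]))  # [2.0, 2.0, 2.0]
print(outbreakp_statistic([1, 1, 2, 4, 8, 16]) > 100)  # True
```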
-
-
class
epysurv.models.timepoint.
RKI
(years_back: int = 0, window_half_width: int = 6, include_recent_year: bool = True)[source]¶ Bases:
epysurv.models.timepoint._base.STSBasedAlgorithm
The old algorithm from the Robert Koch Institute.
-
years_back
¶ How many years back in time to include when forming the base counts.
-
window_half_width
¶ Number of weeks to include before and after the current week in each year.
-
include_recent_year
¶ Boolean deciding whether the year of the current time point also contributes window_half_width reference values.
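The three parameters above determine which past counts serve as reference values. The sketch below lists their positions. It assumes weekly data with 52 weeks per year and that include_recent_year contributes the window_half_width weeks immediately before the current week. The function name is hypothetical, and the alarm rule itself is not shown.

```python
# Hypothetical illustration of reference-value selection, not the epysurv API.
def rki_reference_indices(t, years_back=0, window_half_width=6,
                          include_recent_year=True, period=52):
    """0-based indices of the reference values used for time point t.

    period is the number of time points per year (52 for weekly data is an
    assumption of this sketch).
    """
    indices = []
    for year in range(1, years_back + 1):
        centre = t - year * period  # same week, `year` years earlier
        indices.extend(range(centre - window_half_width, centre + window_half_width + 1))
    if include_recent_year:
        # the window_half_width weeks right before t in the current year
        indices.extend(range(t - window_half_width, t))
    if any(i < 0 for i in indices):
        raise ValueError("time series too short for this configuration")
    return indices


print(rki_reference_indices(100))  # [94, 95, 96, 97, 98, 99]
print(rki_reference_indices(120, years_back=1, window_half_width=2))
# [66, 67, 68, 69, 70, 118, 119]
```

With the default years_back=0, only the current year contributes reference values.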
-