Correlated Time Series

Functions for the analysis of correlated time series.

Note that if the inherent timescales of the system are long compared to the duration of the time series being analyzed, the results will be inaccurate and unreliable.

If a time series has initial transients, these should be detected (with ‘detect_equilibration’) and removed before further analysis.

Refs:

[1] Shirts MR and Chodera JD. Statistically optimal analysis of samples from multiple equilibrium states. J. Chem. Phys. 129:124105, 2008 http://dx.doi.org/10.1063/1.2978177

[2] J. D. Chodera, W. C. Swope, J. W. Pitera, C. Seok, and K. A. Dill. Use of the weighted histogram analysis method for the analysis of simulated and parallel tempering simulations. JCTC 3(1):26-41, 2007.

Kudos:

Much of this module is a re-implementation (in jax) of the timeseries module in pymbar https://github.com/choderalab/pymbar/blob/master/pymbar/timeseries.py

Equilibration and decorrelation

thermoflow.subsample_time_series(time_series: Array, transient: bool = False) Array

Extract uncorrelated samples from correlated timeseries data.

Parameters:
  • time_series – A jax array of shape [T]

  • transient – If True initial transients will be detected and removed using the detect_equilibration function.

Returns:

An array of uncorrelated subsamples.
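As a rough illustration of what subsampling involves, the sketch below picks indices spaced about g apart, where g is the statistical inefficiency of the series. It is a plain NumPy sketch, not the thermoflow implementation; `subsample_indices` is a hypothetical helper and g is assumed to have been estimated already.

```python
import numpy as np


def subsample_indices(n_samples: int, g: float) -> np.ndarray:
    """Indices spaced roughly g apart (hypothetical helper, not the thermoflow API)."""
    g = max(float(g), 1.0)          # a statistical inefficiency below 1 is not meaningful
    n_eff = int(n_samples / g)      # number of effectively uncorrelated samples
    # Round multiples of g to the nearest integer index.
    idx = np.round(np.arange(n_eff) * g).astype(int)
    return idx[idx < n_samples]


series = np.arange(100.0)           # stand-in for correlated data of shape [T]
idx = subsample_indices(len(series), g=7.5)
subsamples = series[idx]
```

With g = 7.5 and T = 100 this keeps 13 samples, one for roughly every 7 or 8 steps of the original series.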

thermoflow.detect_equilibration(A_t: Array, nodes: int = 16) Tuple[int, float, float]

Detect initial transient region of an equilibrating time series using a heuristic that maximizes the number of effectively uncorrelated samples.

We evaluate the statistical inefficiency on a sequence of exponentially spaced time points, and search for the time point that maximizes the effective number of uncorrelated samples after that time. We iterate with finer grids until a local maximum is located. Since the data is noisy we may not locate the global maximum.

Parameters:
  • A_t (Float[Array, "T"]) – time series

  • nodes (int) – Number of search nodes at each iteration.

Returns:

t, g, Neff

Start index of the equilibrated data, statistical inefficiency of the equilibrated data, and effective number of uncorrelated samples after time t.

Refs:

[1] J. D. Chodera, A Simple Method for Automated Equilibration Detection in Molecular Simulations, J. Chem. Theory Comput. 12:1799 (2016) http://dx.doi.org/10.1021/acs.jctc.5b00784

Kudos:

Adapted from pymbar/timeseries.py::detect_equilibration_binary_search
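The heuristic can be sketched in plain NumPy as follows. This is a simplified, single-pass version under stated assumptions: it scans one exponentially spaced grid of candidate start indices and does not iterate with finer grids. Both helper functions are hypothetical and written for illustration; the inefficiency estimator sums the normalized autocorrelation function up to its first non-positive value, which may differ from the ‘ips’ and ‘batchmean’ estimators used by thermoflow.

```python
import numpy as np


def statistical_inefficiency_sketch(x):
    """g = 1 + 2 tau, summing the normalized ACF until it first turns non-positive."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    dx = x - x.mean()
    var = dx @ dx / n
    if var == 0.0:
        return 1.0
    g = 1.0
    for t in range(1, n - 1):
        c = (dx[:-t] @ dx[t:]) / ((n - t) * var)
        if c <= 0.0:
            break
        g += 2.0 * c * (1.0 - t / n)
    return max(g, 1.0)


def detect_equilibration_sketch(a_t, nodes=16):
    """Return (t0, g, n_eff) maximizing n_eff = (T - t0) / g over a coarse grid."""
    a_t = np.asarray(a_t, dtype=float)
    n = len(a_t)
    # Exponentially spaced candidate start indices in [0, T/2].
    grid = np.geomspace(1, n // 2, nodes).astype(int)
    candidates = np.unique(np.concatenate(([0], grid)))
    best = (0, 1.0, 0.0)
    for t0 in candidates:
        g = statistical_inefficiency_sketch(a_t[t0:])
        n_eff = (n - t0) / g
        if n_eff > best[2]:
            best = (int(t0), g, n_eff)
    return best


# White noise plus a decaying initial transient.
rng = np.random.default_rng(0)
steps = 2000
a_t = rng.normal(size=steps) + 10.0 * np.exp(-np.arange(steps) / 50.0)
t0, g, n_eff = detect_equilibration_sketch(a_t)
```

Discarding the first t0 samples trades a shorter series for a much smaller statistical inefficiency, so the effective number of uncorrelated samples goes up.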

Correlation functions

We provide two methods for calculating autocorrelation times: ‘ips’ (default) and ‘batchmean’.

(TODO: Explain methods)

thermoflow.autocorrelation_time(time_series: Array, method: str | None = None) float

Compute the autocorrelation time of a correlated time series.

Parameters:
  • time_series – Array of shape [T]

  • method – Either ‘ips’ (default) or ‘batchmean’

Returns:

tau, autocorrelation time
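This page does not yet describe the two methods. For orientation only, here is the textbook batch-means estimator in plain NumPy; whether ‘batchmean’ in thermoflow works exactly this way is an assumption. The series is split into batches much longer than the correlation time, the variance of the batch means gives the statistical inefficiency g, and g is converted to tau with the relation g = 1 + 2 tau used on this page. The function name and the `n_batches` parameter are hypothetical.

```python
import numpy as np


def batch_means_inefficiency(x, n_batches=16):
    """Textbook batch-means estimate of the statistical inefficiency g."""
    x = np.asarray(x, dtype=float)
    b = len(x) // n_batches                       # batch length
    batches = x[: b * n_batches].reshape(n_batches, b)
    batch_means = batches.mean(axis=1)
    # For batches much longer than tau, Var(batch mean) ~ g * Var(x) / b.
    g = b * batch_means.var(ddof=1) / x.var(ddof=1)
    return max(g, 1.0)


# AR(1) test data with coefficient phi, for which g = (1 + phi) / (1 - phi) = 19.
rng = np.random.default_rng(0)
phi = 0.9
noise = rng.normal(size=200_000)
x = np.empty_like(noise)
x[0] = noise[0]
for i in range(1, len(x)):
    x[i] = phi * x[i - 1] + noise[i]

g_est = batch_means_inefficiency(x, n_batches=64)
tau_est = (g_est - 1.0) / 2.0                     # from g = 1 + 2 tau
```

The estimate is noisy: with K batches its relative error is roughly sqrt(2 / (K - 1)), so more data or more batches (as long as each batch stays much longer than tau) tightens it.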

thermoflow.autocorrelation_time_stderr(time_series: Array, method: str | None = None, subseries: int = 16) float

Compute the standard error of the estimated autocorrelation time of a correlated time series.

Parameters:
  • time_series – Array of shape [T]

  • method – Either ‘ips’ (default) or ‘batchmean’

Returns:

stderr

thermoflow.crosscorrelation_times(multiple_time_series: Array, method: str | None = None) Array

Compute the cross-correlation times of a collection of correlated time series.

Parameters:
  • multiple_time_series – Array of shape [N, T]

  • method – Either ‘ips’ (default) or ‘batchmean’

Returns:

tau, array of shape [N, N]

thermoflow.crosscorrelation_times_stderr(multiple_time_series: Array, method: str | None = None, subseries: int = 16) Array

thermoflow.crosscorrelation_functions(multiple_time_series: Array) Array

Compute the cross-correlation functions for a sequence of correlated time series, using the fast Fourier transform.

Parameters:

multiple_time_series – Array of shape [N, T]

Returns:

Array of shape [N, T]

thermoflow.autocorrelation_function(time_series: Array) Array

Compute the autocorrelation function for a correlated time series, using the fast Fourier transform.

Parameters:

time_series – Array of shape [T]

Returns:

Array of shape [T]
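The FFT route to an autocorrelation function can be sketched in plain NumPy as below. This is an illustrative version, not the thermoflow implementation; in particular, subtracting the mean, dividing by the number of overlapping pairs, and normalizing so that the zero-lag value is 1 are all assumptions about conventions.

```python
import numpy as np


def autocorrelation_function_fft(x):
    """Normalized autocorrelation function of a 1D series via the FFT."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    dx = x - x.mean()
    # Zero-pad to length 2n so the circular correlation does not wrap around.
    n_fft = 2 * n
    f = np.fft.rfft(dx, n=n_fft)
    acov = np.fft.irfft(f * np.conj(f), n=n_fft)[:n]
    acov /= np.arange(n, 0, -1)      # divide each lag by its number of overlapping pairs
    return acov / acov[0]            # normalize so that the lag-0 value is 1


rng = np.random.default_rng(0)
x = rng.normal(size=256)
acf = autocorrelation_function_fft(x)
```

The FFT version costs O(T log T) instead of the O(T^2) of a direct sum over all lags, which matters for long series.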

Statistical inefficiency

The statistical inefficiency of a correlated time series is defined as g = 1 + 2 tau, where tau is the correlation time (measured in unit steps). We enforce a minimum of g >= 1.

Refs:

[1] J. D. Chodera, W. C. Swope, J. W. Pitera, C. Seok, and K. A. Dill. Use of the weighted histogram analysis method for the analysis of simulated and parallel tempering simulations. JCTC 3(1):26-41, 2007.
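To make the definition concrete, the short sketch below converts a correlation time into a statistical inefficiency, an effective sample count T / g, and a standard error for the sample mean. The function names are hypothetical, and the standard-error formula is the usual large-sample result for a stationary series rather than something taken from thermoflow.

```python
import math


def statistical_inefficiency_from_tau(tau: float) -> float:
    """g = 1 + 2 tau, floored at 1."""
    return max(1.0 + 2.0 * tau, 1.0)


def effective_samples(steps: int, tau: float) -> float:
    """Effective number of uncorrelated samples in a series of the given length."""
    return steps / statistical_inefficiency_from_tau(tau)


def stderr_of_mean(std: float, steps: int, tau: float) -> float:
    """Standard error of the sample mean of a correlated series."""
    return std * math.sqrt(statistical_inefficiency_from_tau(tau) / steps)


g = statistical_inefficiency_from_tau(4.5)
n_eff = effective_samples(1000, 4.5)
err = stderr_of_mean(2.0, 1000, 4.5)
```

Here tau = 4.5 gives g = 10, so 1000 correlated steps carry about as much information as 100 independent samples.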

thermoflow.statistical_inefficiency(time_series: Array, method: str | None = None) float

Compute the statistical inefficiency of a correlated time series.

Parameters:
  • time_series – Array of shape [T]

  • method – Either ‘ips’ (default) or ‘batchmean’

Returns:

g, the estimated statistical inefficiency

thermoflow.statistical_inefficiency_stderr(time_series: Array, method: str | None = None, subseries: int = 16) float

Compute the standard error for the estimated statistical inefficiency of a correlated time series.

Parameters:
  • time_series – Array of shape [T]

  • method – Either ‘ips’ (default) or ‘batchmean’

Returns:

Standard error

thermoflow.cross_statistical_inefficiency(multiple_time_series: Array, method: str | None = None) Array

Compute the cross statistical inefficiency of a collection of correlated time series.

Parameters:
  • multiple_time_series – Array of shape [N, T]

  • method – Either ‘ips’ (default) or ‘batchmean’

Returns:

g, the estimated statistical inefficiency

thermoflow.cross_statistical_inefficiency_stderr(multiple_time_series: Array, method: str | None = None, subseries: int = 16) Array

Compute the standard error for the estimated cross statistical inefficiency of a collection of correlated time series.

Parameters:
  • multiple_time_series – Array of shape [N, T]

  • method – Either ‘ips’ (default) or ‘batchmean’

Returns:

Standard error, array of shape [N]

Kirkwood coefficients

thermoflow.kirkwood_coefficient(time_series: Array, method: str | None = None) float

Compute the Kirkwood coefficient for a correlated time series.

The Kirkwood coefficient is the integrated correlation function, or equivalently the variance times the correlation time.

Parameters:
  • time_series – An array of shape [T]

  • method – Method for estimating correlation times, ‘ips’ (default) or ‘batchmean’

Returns:

Kirkwood coefficient

thermoflow.kirkwood_coefficient_stderr(time_series: Array, method: str | None = None) float

Estimate the standard error of the Kirkwood coefficient for a correlated time series.

Parameters:
  • time_series – An array of shape [T]

  • method – Method for estimating correlation times, ‘ips’ (default) or ‘batchmean’

Returns:

stderr

thermoflow.kirkwood_tensor(multiple_time_series: Array, method: str | None = None, min_eigenvalue: float = 0.0) Array

Compute the Kirkwood tensor for a sequence of correlated time series.

The elements of the Kirkwood tensor are the Kirkwood coefficients (the integrated correlation functions, or the variance times the correlation times). Within the thermodynamic geometry of linear response, the Kirkwood tensor is the friction, and acts as the metric tensor.

This tensor should be symmetric and positive semi-definite, but may not be due to statistical errors. We return the nearest symmetric positive semi-definite matrix in the Frobenius norm with eigenvalues at least min_eigenvalue https://nhigham.com/2021/01/26/what-is-the-nearest-positive-semidefinite-matrix/

Parameters:
  • multiple_time_series – An array of shape [N, T]

  • method – Method for estimating correlation times, ‘ips’ (default) or ‘batchmean’

  • min_eigenvalue – Minimum eigenvalue of the Kirkwood tensor, default zero.

Returns:

An array of shape [N, N]

Refs:

TODO
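The projection onto the nearest symmetric positive semi-definite matrix can be sketched in plain NumPy as follows: symmetrize, take an eigendecomposition, and clip the eigenvalues from below. `nearest_psd` is a hypothetical helper written for illustration; how thermoflow implements this step internally is not shown here.

```python
import numpy as np


def nearest_psd(matrix, min_eigenvalue=0.0):
    """Nearest symmetric matrix (Frobenius norm) with eigenvalues >= min_eigenvalue."""
    a = np.asarray(matrix, dtype=float)
    sym = 0.5 * (a + a.T)                           # nearest symmetric matrix
    eigvals, eigvecs = np.linalg.eigh(sym)
    clipped = np.maximum(eigvals, min_eigenvalue)   # raise eigenvalues below the floor
    return (eigvecs * clipped) @ eigvecs.T          # V diag(clipped) V^T


# A symmetric but indefinite matrix with eigenvalues -1 and 3.
noisy = np.array([[1.0, 2.0], [2.0, 1.0]])
projected = nearest_psd(noisy)
floored = nearest_psd(noisy, min_eigenvalue=0.5)
```

A strictly positive `min_eigenvalue` keeps the result invertible, which is useful when the tensor is later used as a metric.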

thermoflow.kirkwood_tensor_stderr(multiple_time_series: Array, method: str | None = None, min_eigenvalue: float = 0.0, subseries: int = 16) Array

Estimated standard errors for the coefficients in the Kirkwood tensor.

Parameters:
  • multiple_time_series – An array of shape [N, T]

  • method – Method for estimating correlation times, ‘ips’ (default) or ‘batchmean’

  • min_eigenvalue – Minimum eigenvalue of the Kirkwood tensor, default zero.

  • subseries – TODO

Returns:

An array of shape [N, N]

Generation

thermoflow.correlated_time_series(key: Array, tau: float, steps: int, initial: float | None = None) Array

Generate time series data with given correlation time, drawn from an autoregressive model of order 1.

Note that if you generate multiple series with the same random noise (same key), then those series are correlated, with a cross-correlation time equal to the mean of the correlation times of each series.

Parameters:
  • key – A jax PRNG key

  • tau – Correlation time of the generated time series

  • steps – Length of the generated time series

  • initial – Initial value for the auto-regression model. Provide the last value of a previously generated time series to extend the series.

Returns:

Correlated time series, size [steps]

Ref:

https://en.wikipedia.org/wiki/Autoregressive_model#Example:_An_AR(1)_process
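An AR(1) generator of this kind can be sketched in plain NumPy as below. This is not the thermoflow implementation: it takes a NumPy random generator rather than a jax PRNG key, and the mapping from tau to the autoregression coefficient, phi = tau / (1 + tau), is an assumption chosen so that the autocorrelation function phi^t sums to tau over t >= 1, consistent with g = 1 + 2 tau. The name `ar1_series` is hypothetical.

```python
import numpy as np


def ar1_series(rng, tau, steps, initial=None):
    """Unit-variance AR(1) series x[i] = phi * x[i-1] + sigma * noise[i]."""
    phi = tau / (1.0 + tau)            # assumption: sum over t >= 1 of phi**t equals tau
    sigma = np.sqrt(1.0 - phi**2)      # keeps the stationary variance at 1
    noise = rng.normal(size=steps)
    prev = rng.normal() if initial is None else float(initial)
    x = np.empty(steps)
    for i in range(steps):
        prev = phi * prev + sigma * noise[i]
        x[i] = prev
    return x


rng = np.random.default_rng(0)
first = ar1_series(rng, tau=9.0, steps=50_000)
# Extend the series by passing the last value as the initial condition.
second = ar1_series(rng, tau=9.0, steps=1_000, initial=first[-1])
```

With tau = 9 the coefficient is phi = 0.9, so neighbouring samples have a correlation of about 0.9.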