Mutual information is used to automatically select the optimal time delay τ for phase space reconstruction. Its core value lies in quantifying the nonlinear dependence between a time series and its delayed copy, avoiding redundancy when the delay is too small and distortion when the delay is too large. Keywords: MATLAB, mutual information, phase space reconstruction.
This method provides an automatic delay-selection workflow for nonlinear time series
| Parameter | Details |
|---|---|
| Language | MATLAB |
| Target Problem | Automatic selection of the optimal delay τ |
| Core Method | Mutual Information + first local minimum criterion |
| Fallback Criterion | 1/e decay rule |
| Typical Scenarios | Chaotic systems, phase space reconstruction, nonlinear dynamics |
| Core Dependencies | histcounts2, findpeaks (Signal Processing Toolbox), detrend |
Mutual information is one of the standard methods for choosing the delay in phase space reconstruction. Instead of only measuring linear correlation, it directly quantifies the information overlap between the original series x(t) and the delayed series x(t+τ), which makes it better suited to chaotic, weakly nonstationary, and other nonlinear systems.
The goal is not to make the two components completely independent. Instead, it is to find a balance that preserves dynamic coupling while reducing redundancy. In practice, engineers typically take the first local minimum of the mutual information curve as the optimal delay τ_opt.
The mutual information criterion can be expressed as a clear automated workflow
For any candidate delay τ, first construct X=x(1:end-τ) and Y=x(1+τ:end), then estimate the joint probability p(x,y) and marginal probabilities p(x) and p(y), and finally compute the mutual information I(τ). When I(τ) reaches its first local valley, redundancy has been effectively compressed.
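In discrete form, with p_ij(τ) denoting the joint probability that x(t) falls in bin i and x(t+τ) falls in bin j, the quantity evaluated at each candidate delay is

$$ I(\tau) = \sum_{i,j} p_{ij}(\tau)\,\log_2\frac{p_{ij}(\tau)}{p_i\,p_j} $$

which is exactly the histogram estimate implemented in the code below.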
function [tau_opt, tau_list, MI_list] = auto_mutual_information_delay(data, max_tau, nbins)
% Automatically select the optimal delay tau
if nargin < 2 || isempty(max_tau)
max_tau = floor(length(data)/10); % Default search range is 1/10 of the series length
end
if nargin < 3 || isempty(nbins)
nbins = 32; % Default number of bins, suitable for most medium-sized samples
end
data = data(:); % Convert to a column vector to avoid dimension errors
tau_list = 1:max_tau;
MI_list = zeros(size(tau_list));
for k = 1:length(tau_list)
tau = tau_list(k);
X = data(1:end-tau);
Y = data(1+tau:end);
MI_list(k) = mutual_info_hist(X, Y, nbins); % Compute mutual information for each delay
end
[~, locs] = findpeaks(-MI_list); % Find peaks on negative mutual information, i.e. valleys on the original curve
if ~isempty(locs)
tau_opt = tau_list(locs(1)); % Prioritize the first local minimum
else
idx = find(MI_list <= MI_list(1)/exp(1), 1); % First delay where MI has decayed below I(1)/e
if isempty(idx), idx = length(tau_list); end % No 1/e decay either: use the largest searched delay
tau_opt = tau_list(idx); % Fall back to the 1/e criterion when no local minimum exists
end
end
function MI = mutual_info_hist(X, Y, nbins)
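% Histogram-based estimate of the mutual information I(X;Y) in bits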
Pxy = histcounts2(X, Y, nbins);
Pxy = Pxy / sum(Pxy(:)); % Normalize to obtain the joint probability
Px = sum(Pxy, 2);
Py = sum(Pxy, 1);
MI = 0;
for i = 1:size(Pxy,1)
for j = 1:size(Pxy,2)
if Pxy(i,j) > 0
MI = MI + Pxy(i,j) * log2(Pxy(i,j) / (Px(i) * Py(j))); % Accumulate mutual information
end
end
end
end
This code completes mutual information estimation, first-local-minimum search, and the 1/e fallback mechanism.
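As a quick sanity check before moving to chaotic data, the function can be called with its defaults on any column vector; the sine-plus-noise signal below is only an illustrative placeholder.
% Minimal usage sketch (the test signal is a placeholder, not part of the original workflow)
t = (0:0.01:20)';
sig = sin(2*pi*0.5*t) + 0.1*randn(size(t));
[tau_opt, tau_list, MI_list] = auto_mutual_information_delay(sig); % Defaults: max_tau = N/10, nbins = 32
plot(tau_list, MI_list, 'b-o'); grid on;
xlabel('\tau'); ylabel('I(\tau)');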
This implementation improves robustness with a dual-criterion strategy
The first local minimum is the preferred theoretical choice because it often corresponds to the first significant drop in information redundancy between adjacent embedding components. However, real-world data often contains noise, short sample lengths, or local fluctuations, which can obscure the valley structure. In those cases, the 1/e rule provides a stable engineering alternative.
If the series contains trends, sampling drift, or outliers, the mutual information curve may flatten out. Before estimating the probability distributions, remove the mean, detrend the series, standardize it, and denoise it when necessary.
Chaotic time series provide a typical validation scenario for this method
The Lorenz system is commonly used to validate phase space reconstruction parameters. It has clear nonlinear dynamical characteristics, and the τ_opt obtained by mutual information is often better aligned with reconstruction needs than the zero-crossing point of the autocorrelation function.
% Generate a Lorenz system and estimate the optimal delay
sigma = 10; rho = 28; beta = 8/3;
dt = 0.01; T = 100; steps = floor(T/dt);
x = zeros(steps,1); y = zeros(steps,1); z = zeros(steps,1);
x(1)=1; y(1)=1; z(1)=1;
for i = 1:steps-1
dx = sigma*(y(i)-x(i));
dy = x(i)*(rho-z(i))-y(i);
dz = x(i)*y(i)-beta*z(i);
x(i+1) = x(i) + dx*dt; % Update x using Euler integration
y(i+1) = y(i) + dy*dt;
z(i+1) = z(i) + dz*dt;
end
data = x(1:10:end) + 0.01*randn(floor(steps/10),1); % Downsample and add weak noise
[tau_opt, tau_list, MI_list] = auto_mutual_information_delay(data, 50, 32);
fprintf('tau_opt = %d\n', tau_opt);
plot(tau_list, MI_list, 'b-o'); grid on;
This example shows the complete workflow from Lorenz series generation to automatic estimation of τ_opt.
Parameter selection directly determines how interpretable the mutual information curve will be
max_tau is typically set to 1/10 of the data length, or around 1/4 of the dominant period. If the range is too small, you may miss the local minimum. If it is too large, the number of usable samples decreases, which amplifies probability estimation error.
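As a sketch of how these two rules of thumb could be combined automatically (the spectral-peak estimate of the dominant period and the variable names are illustrative; data is assumed to be the preprocessed series):
% Sketch: derive max_tau from the series length and the dominant period (illustrative)
N = length(data);
spec = abs(fft(data - mean(data))); % Magnitude spectrum of the zero-mean series
[~, kpk] = max(spec(2:floor(N/2))); % Strongest nonzero-frequency bin
dominant_period = round(N / kpk); % Dominant period in samples
max_tau = min(floor(N/10), max(10, ceil(dominant_period/4))); % Cap at N/10, aim for ~1/4 period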
nbins is one of the most sensitive parameters. Too few bins will oversmooth the result, while too many bins will make the joint distribution sparse. The Freedman-Diaconis rule is a reasonable default and works well in automated scenarios.
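A possible base-MATLAB sketch of that rule (the quartiles are approximated by linear interpolation, and the lower bound of 8 bins is an arbitrary safeguard):
% Sketch: Freedman-Diaconis bin count using only base MATLAB
x = sort(data(:));
n = numel(x);
q = interp1(((1:n) - 0.5) / n, x, [0.25 0.75], 'linear', 'extrap'); % Approximate quartiles
bw = 2 * (q(2) - q(1)) / n^(1/3); % Freedman-Diaconis bin width
nbins = max(8, ceil((x(end) - x(1)) / bw)); % Keep a sensible lower bound on the bin count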
The histogram method is the most practical baseline approach
Kernel density estimation is theoretically smoother, but in MATLAB the cost of implementing two-dimensional joint density integration is much higher and more sensitive to bandwidth selection. If your goal is to stably select τ rather than to estimate absolute mutual information with high precision, the histogram method is usually sufficient.
% Preprocessing recommendations
data = data(:);
data = detrend(data); % Remove the linear trend
data = data - mean(data); % Remove mean offset
data = data / std(data); % Standardize to unit variance
This preprocessing step can significantly improve the stability and comparability of the mutual information curve.
The optimal delay is only the first step in full phase space reconstruction
After obtaining τ_opt, you still need to determine the embedding dimension m. A common practice is to combine this method with FNN (False Nearest Neighbors) or the Cao method to find the minimum dimension needed to unfold the attractor structure, and then proceed to nonlinear analyses such as correlation dimension and Lyapunov exponents.
By chaining mutual information with FNN, you can build a reusable analysis pipeline: preprocessing → select τ → select m → reconstruct phase space → detect chaos. This is a classic engineering workflow in nonlinear time series analysis.
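As an illustration of the reconstruction step itself, the following sketch builds the delay-embedding matrix from tau_opt; the embedding dimension m = 3 is only an assumed example value that FNN or the Cao method would normally supply.
% Sketch: delay embedding with tau_opt; m = 3 is an assumed example value
m = 3; % Embedding dimension, normally chosen with FNN or the Cao method
N_emb = length(data) - (m - 1) * tau_opt; % Number of reconstructed state vectors
Xemb = zeros(N_emb, m);
for d = 1:m
Xemb(:, d) = data((1:N_emb) + (d - 1) * tau_opt); % d-th delayed coordinate
end
plot3(Xemb(:,1), Xemb(:,2), Xemb(:,3)); grid on; % Rough view of the reconstructed attractor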
Mutual information is more suitable than autocorrelation for modeling nonlinear dependence
The autocorrelation function only captures linear relationships, so it is not sensitive to higher-order coupling in chaotic systems. Mutual information, by contrast, measures both linear and nonlinear dependence in a unified way, making it easier to identify physically meaningful delays in complex dynamical systems.
If you work with periodic signals, mechanical vibration, EEG, meteorological, or financial time series, and your next step is phase space reconstruction, mutual information should usually be your default starting point rather than autocorrelation.
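For a quick side-by-side check on the same series, a sketch along these lines compares the first zero crossing of the autocorrelation with the mutual-information delay; it assumes data, tau_list, and tau_opt from the Lorenz example above, and xcorr comes from the Signal Processing Toolbox.
% Sketch: compare the autocorrelation zero-crossing delay with the mutual-information delay
max_lag = numel(tau_list); % Reuse the search range from the earlier call
acf = xcorr(data - mean(data), max_lag, 'coeff'); % Normalized autocorrelation, lags -max_lag..max_lag
acf = acf(max_lag + 1:end); % Keep lags 0..max_lag
idx_acf = find(acf <= 0, 1); % First zero crossing of the autocorrelation
if isempty(idx_acf), idx_acf = max_lag + 1; end % Guard: no zero crossing in the searched range
tau_acf = idx_acf - 1; % Convert from array index (lag 0 at index 1) to lag
fprintf('tau (ACF zero crossing) = %d, tau (mutual information) = %d\n', tau_acf, tau_opt);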
FAQ
FAQ 1: Why is the optimal delay not the global minimum of mutual information, but the first local minimum?
Because the global minimum often corresponds to excessive decoupling between components, which destroys the dynamical relationship. The first local minimum better balances information redundancy and the introduction of new information, making it more suitable for phase space reconstruction delay selection.
FAQ 2: What should I do if the mutual information curve has no obvious valley?
First check preprocessing, sample length, and nbins. If there is still no clear minimum, you can use the empirical rule I(τ) <= I(1)/e and cross-validate it with autocorrelation or domain-specific prior knowledge.
FAQ 3: Can the mutual information method directly determine the embedding dimension m?
No. The mutual information method is mainly used to determine the delay τ. The embedding dimension m is generally determined with FNN or the Cao method. You need both to complete full phase space reconstruction.
Summary: This article presents a MATLAB-based mutual-information workflow for delay selection, explains the physical meaning of the optimal delay τ, compares the first-local-minimum criterion with the 1/e fallback rule, and provides reusable automation code, a Lorenz chaos example, parameter-tuning guidance, and integration with the FNN embedding-dimension workflow.