The length of the time series must be large enough to obtain
a sufficiently large number of
independent distances for the
expressions for the estimators and their asymptotic variances
to be useful.
Correlations between distances arise if overlapping
collections of points in phase space are being used;
these can be avoided by taking [Ellner, 1988]
The -variable was used to construct a time series of length (and = 1); the first iterates were discarded to avoid transient effects (caused by the initial conditions of the differential equations). Correlation integrals were computed using the MLDK2 program (see section 5.7) from distances between randomly chosen pairs of points. The scaling regions were chosen by visual inspection of the correlation integrals, and justified by goodness-of-fit tests. Because MLDK2 computes the distances using the supremum norm, the scaling region hardly moves to a higher distance range as the embedding dimension increases. Therefore, we used the same and for every embedding dimension. For the entropy estimator, we used 1, 2, and 3. Confidence intervals were obtained from the asymptotic variances, with the estimated values of , and , that is, from eqs. (5.38), (5.56) and (5.57). This procedure was repeated 100 times with independent time series using the MLDK2MC program (see section 5.7) to compute mean dimensions and entropies, their standard errors, and coverage frequencies.
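The single-trial procedure described above can be sketched in a few lines. This is not the MLDK2 program, only a minimal illustration under stated assumptions: the Hénon map with the standard parameters a = 1.4 and b = 0.3 serves as the test series, the upper cut-off `r_upper = 0.1` is an arbitrary illustrative value, and the estimator is the standard Takens maximum likelihood form, D = -1 / mean(ln(r_i / r_upper)), with asymptotic standard error D / sqrt(n). All function names are hypothetical.

```python
import math
import random


def henon_series(n, discard=1000, a=1.4, b=0.3):
    """x-variable of the Henon map; the first `discard` iterates are dropped."""
    x, y = 0.0, 0.0
    out = []
    for i in range(n + discard):
        x, y = 1.0 - a * x * x + y, b * x
        if i >= discard:
            out.append(x)
    return out


def sup_distance(series, i, j, m, lag=1):
    """Supremum-norm distance between delay vectors starting at i and j."""
    return max(abs(series[i + k * lag] - series[j + k * lag]) for k in range(m))


def random_pair_distances(series, m, n_pairs, rng, lag=1):
    """Distances between randomly chosen pairs of distinct delay vectors."""
    n_vec = len(series) - (m - 1) * lag
    dists = []
    while len(dists) < n_pairs:
        i, j = rng.randrange(n_vec), rng.randrange(n_vec)
        if i != j:
            dists.append(sup_distance(series, i, j, m, lag))
    return dists


def takens_estimate(dists, r_upper):
    """Takens estimate, its asymptotic standard error, and the sample size used."""
    logs = [math.log(r / r_upper) for r in dists if 0.0 < r < r_upper]
    if not logs:
        raise ValueError("no distances below r_upper")
    n = len(logs)
    d = -n / sum(logs)
    return d, d / math.sqrt(n), n


if __name__ == "__main__":
    rng = random.Random(0)
    series = henon_series(5000)
    for m in (1, 2, 3):
        d, se, n = takens_estimate(random_pair_distances(series, m, 20000, rng), 0.1)
        # Approximate two-sided 95% interval from the asymptotic variance.
        print(m, n, round(d, 3), round(d - 1.96 * se, 3), round(d + 1.96 * se, 3))
```

Because the pairs are drawn independently, the same point may enter several distances; whether that matters depends on the number of distances relative to the length of the series, as discussed in the text.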
In figure 5.3 (figure 5.3b) and table 5.4 the results from the MLDK2 program using a time series of the Hénon map are shown. Note that we violate the condition eq. (5.74): the length of the time series is too short for such a number of distances. Below, we will show that the consequences are minimal. We chose the scaling region to be , but probably could have been chosen smaller. Most ``test sizes'' (see section 5.5) are larger than 0.05, so it is reasonable to assume that the distribution of distances is indeed like eq. (5.1). As expected, we see that sD2l sD2T sD2E sD2tr, where l=linear, T=Takens, E=Ellner, and tr=doubly truncated. (For all practical purposes D2E is the same as the doubly censored case, see section 5.2.4.) When successive dimension estimates do not differ significantly, the entropy estimates are useful (the D2K are the ``double'' dimension estimates). The results from repeating this procedure 100 times are shown in figure 5.4 and table 5.5. They can be compared with figure 5.2 and table 5.3. The ``single'' dimension estimates now show small but statistically significant deviations from the literature value. Since is large enough ( 50), these systematic errors are not due to the use of the maximum likelihood estimator per se, but to another source, e.g. lacunarity [Theiler, 1988]. In general, the two-sided coverage frequencies are too low and the lower and upper coverage frequencies are asymmetric; these should all be . Since the asymptotic variances are accurate, these results are most probably due to systematic errors in the estimated correlation dimension. This is further illustrated by the fact that if we use the mean dimension estimate from the Monte Carlo trials as the ``true'' value in eq. (5.69), the coverage frequencies fluctuate closely around 0.95. The entropy estimates converge to the literature value, but rather slowly.
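The effect of a systematic error on coverage frequencies can be reproduced with synthetic data. The sketch below rests on several assumptions that are not taken from the text: distances are drawn from an exact power law P(r <= x) = x^D on (0, 1], the Takens estimator with standard error D / sqrt(n) is used, the two-sided interval uses z = 1.96, and the lower and upper coverage frequencies are read as one-sided 95% bounds with z = 1.645. The function names and the size of the reference shift are hypothetical.

```python
import math
import random


def takens_from_power_law(d_true, n, rng):
    """Takens estimate and standard error from n exact power-law distances."""
    # If U is uniform on (0, 1], r = U**(1/d_true) has P(r <= x) = x**d_true,
    # so ln(r) = ln(U) / d_true.
    logs = [math.log(1.0 - rng.random()) / d_true for _ in range(n)]
    d_hat = -n / sum(logs)
    return d_hat, d_hat / math.sqrt(n)


def coverage(estimates, reference, z_two=1.96, z_one=1.645):
    """Two-sided, lower and upper coverage frequencies of `reference`."""
    n = len(estimates)
    two = sum(1 for d, se in estimates if abs(d - reference) <= z_two * se)
    lower = sum(1 for d, se in estimates if reference >= d - z_one * se)
    upper = sum(1 for d, se in estimates if reference <= d + z_one * se)
    return two / n, lower / n, upper / n


if __name__ == "__main__":
    rng = random.Random(1)
    trials = [takens_from_power_law(1.2, 500, rng) for _ in range(2000)]
    # Reference equal to the value the estimator is centred on.
    print("unbiased:", coverage(trials, 1.2))
    # Reference shifted by 0.1, mimicking a systematic error of that size.
    print("shifted: ", coverage(trials, 1.3))
```

With the unshifted reference all three frequencies should lie near 0.95; with the shifted reference the two-sided frequency drops and the one-sided frequencies become asymmetric, which is the pattern the coverage frequencies in the tables are being checked for.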
We emphasize that in contrast to Ellner's numerical results [Ellner, 1988], our computed confidence intervals are not ``conservative''. The reason is that he computes the asymptotic variances using a substituted value of which is given by eq. (5.74), but uses many more distances to stabilize his dimension estimates.
We repeated this Monte Carlo experiment for the case in which we do not violate eq. (5.74) (with a slightly different implementation of MLDK2MC which chooses vector indices without replacement). The results, given in figure 5.5 and table 5.6, show that there is hardly any difference. In appendix E we studied the effects of violating the condition more carefully. There we concluded that the dimension and entropy estimates become more precise as the number of distances increases, though the variances diverge from the Cramér-Rao lower bound. However, they remain sufficiently close if the number of distances is less than about 10 times as large as the length of the time series.
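Choosing vector indices without replacement can be sketched as follows. This is an assumption about what such an implementation might look like, not the MLDK2MC code: each delay vector is used in at most one pair, so at most half as many distances as vectors can be formed. Whether this bound coincides exactly with eq. (5.74) is not claimed here, and the function name is hypothetical.

```python
import random


def disjoint_pairs(n_vectors, n_pairs, rng):
    """Index pairs in which every vector index occurs at most once."""
    if 2 * n_pairs > n_vectors:
        raise ValueError("not enough vectors for this many disjoint pairs")
    # random.sample draws without replacement, so all indices are distinct.
    idx = rng.sample(range(n_vectors), 2 * n_pairs)
    return list(zip(idx[0::2], idx[1::2]))


if __name__ == "__main__":
    pairs = disjoint_pairs(10, 4, random.Random(0))
    print(pairs)
```

Distances computed from such pairs share no points, which removes the overlap that gives rise to correlated distances.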
In figure 5.6 (figure 5.6b) and table 5.7 we present the results of MLDK2 for the logistic map. The scaling region plotted is ; one might object that it should have been at a higher range. From the Monte Carlo simulations in figures 5.7 and 5.8 and tables 5.8 and 5.9 we see that there are systematic deviations from the literature values, especially in the case where the scaling region is . Because the variances of the estimated dimensions and entropies decrease when the estimation is repeated, other sources of error (probably lacunarity, or lack of self-similarity) become more important than errors due to the finite sample size.
In figures 5.9 (figure 5.9b) and 5.10 and tables 5.10 and 5.11 the results for the sine wave are shown. The dimension and entropy converge to one and zero, respectively. From the coverage frequencies we conclude that the correlations due to the strong deterministic nature of the signal are of no consequence. In the Monte Carlo simulations, we used a different initial phase for every realization, but choosing a slightly different frequency every time does not change this conclusion.