
The goodness-of-fit test and the correlation integral

From the previous subsections it follows that the class boundaries and the number of classes can be chosen optimally only if we know the dimension and the number of distances within the scaling region. For efficiency reasons, however, the computed distances have to be grouped at an earlier stage, before these quantities are known. In the current implementation (computer program) of the estimators (see section 5.7) the entire possible distance range $[0,1]$ is divided into ``bins'' as follows:
\begin{displaymath}
\ln(r_{ibin}) = \frac{\ln(r_{min})(N_{bin}-ibin)} {(N_{bin}-1)}
\end{displaymath} (5.63)

where $r_{min}$ is the right boundary of the first bin and $N_{bin}$ is the number of bins. Because the bin boundaries $r_{ibin}$ do not satisfy eq. (5.61) and $N_{bin}$ has to be specified at the start of the computer program, this procedure does not lead to optimal classes. It would be possible, however, to do the computations in two passes: first compute the correlation integrals, identify the scaling region and estimate the dimension; then choose the bins according to eqs. (5.61) and (5.62).
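The binning of eq. (5.63) can be sketched in a few lines of Python. This is an illustration only, not the program of section 5.7: the function names `bin_boundaries` and `bin_index` are hypothetical, and the sketch assumes that each bin is closed on the right, so that a distance equal to $r_{ibin}$ falls in bin $ibin$.

```python
import bisect
import math


def bin_boundaries(r_min, n_bin):
    """Right bin boundaries r_1 .. r_{n_bin} according to eq. (5.63).

    r_1 equals r_min and r_{n_bin} equals 1; the boundaries are
    equidistant on a logarithmic scale.
    """
    if not 0.0 < r_min < 1.0:
        raise ValueError("r_min must lie strictly between 0 and 1")
    if n_bin < 2:
        raise ValueError("at least two bins are needed")
    log_r_min = math.log(r_min)
    return [
        math.exp(log_r_min * (n_bin - ibin) / (n_bin - 1))
        for ibin in range(1, n_bin + 1)
    ]


def bin_index(r, boundaries):
    """1-based index ibin of the bin that contains the distance r.

    Assumption of this sketch: bins are right-closed, so the first bin
    is [0, r_1] and bin ibin > 1 is ]r_{ibin-1}, r_ibin].
    """
    if not 0.0 <= r <= boundaries[-1]:
        raise ValueError("distance outside the binned range")
    return bisect.bisect_left(boundaries, r) + 1
```

With $r_{min} = 10^{-3}$ and $N_{bin} = 4$ the boundaries are approximately $10^{-3}$, $10^{-2}$, $10^{-1}$ and $1$, so a distance of $0.05$ falls in the third bin.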

The $\chi^2$ test can be applied to each of the dimension estimators, i.e. the linear least squares and the maximum likelihood estimators. The number of degrees of freedom of $X^2$ (see eq. (5.59)) depends on which regions outside the scaling region are included as additional classes and on the number of estimated parameters. A summary is given in table 5.1.
\begin{tabular}{llll}
Estimator & Estimated parameters & Included regions & Degr. of freedom \\
linear least squares & dimension, y-intercept & $[0,r_l]$ region (in $C(r_l)$) & $k+1-2-1$ \\
Takens & dimension & -- & $k-1-1$ \\
Ellner & dimension & $[0,r_l]$ region & $k+1-1-1$ \\
double truncation & dimension & -- & $k-1-1$ \\
double censoring & dimension and $\phi$ & $[0,r_l]$ and $]r_u,1]$ regions & $k+2-2-1$ \\
\end{tabular}


Table 5.1: Summary of $\chi^2$ test options.


Here $k$ is the number of bins in the scaling region. For the Takens estimator, the class boundaries can be chosen appropriately (see section 5.7.6) in the range $[0,r_u]$, even if we choose $r_l > 0$, to be able to perform all tests on the same set of data simultaneously.
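The bookkeeping of table 5.1 can be written down as a small Python sketch: the number of classes is $k$ plus the number of included regions outside the scaling region, and the degrees of freedom follow by subtracting the number of estimated parameters and one. The names `TEST_OPTIONS`, `degrees_of_freedom` and `pearson_x2` are hypothetical, and the statistic is assumed here to be the usual Pearson form $\sum_i (O_i - E_i)^2 / E_i$; eq. (5.59) remains the authoritative definition of $X^2$.

```python
# Per estimator: (number of included regions outside the scaling region,
#                 number of estimated parameters), taken from table 5.1.
TEST_OPTIONS = {
    "linear least squares": (1, 2),
    "Takens": (0, 1),
    "Ellner": (1, 1),
    "double truncation": (0, 1),
    "double censoring": (2, 2),
}


def degrees_of_freedom(estimator, k):
    """Degrees of freedom of X^2 for k bins in the scaling region."""
    extra_classes, n_params = TEST_OPTIONS[estimator]
    dof = k + extra_classes - n_params - 1
    if dof < 1:
        raise ValueError("too few classes for a chi-square test")
    return dof


def pearson_x2(observed, expected):
    """Pearson statistic: sum of (O_i - E_i)^2 / E_i over all classes.

    Assumption of this sketch; see eq. (5.59) for the actual definition.
    """
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have equal length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))
```

For example, with $k = 10$ bins in the scaling region the Takens estimator leaves $10 - 1 - 1 = 8$ degrees of freedom, and the double censoring estimator $10 + 2 - 2 - 1 = 9$.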

