The original Italian version of this paper was published at ClimateMonitor.it as Part I, Part II and Part III. The present version includes new applications and a "proof" of the soundness of my choices. fz.
Riassunto:
Sostituisco il dataset iniziale con le sue differenze prime (o con le
derivate numeriche in caso di passo variabile) e
verifico sperimentalmente se l'esponente di Hurst H, cioè il livello di persistenza,
cambia nel senso che si avvicina maggiormente al valore 0.5 mentre il
dataset trasformato mantiene l'informazione spettrale del dataset originale.
Applico la modifica alle medie annuali NOAA di anomalia di temperatura,
all'ultimo dataset mensile, sempre NOAA, e al livello del lago Vittoria i
cui dati non sono a passo costante.
Abstract:
I replace the original dataset with its first differences (or with its
numerical derivatives when the time step is variable) and look at the Hurst
exponent H. If this procedure lowers its value, I check whether the new
dataset still contains the spectral information of the original one by
comparing their spectra.
The procedure has been applied to the yearly global temperature anomaly
and to the latest available monthly data from the NOAA GHCN cag site. The
Lake Victoria levels (at variable time step) have also been used to test the
procedure.
We have noted many times that persistence affects several datasets normally used in climatology (and not only there). Persistence concerns measurements that tend to reproduce previous results; they show autocorrelation and, probably, dependence. In such cases the autocorrelation function at lag 1, acf(1), takes values greater than 0.5 and signals that "normal" statistics, being based on independent data, can no longer be used.
I always remind myself that independent data are uncorrelated, while correlation alone does not establish a "physical" dependence: if data are correlated, their physical dependence must be proven by another method or in another way.
As an example, the standard deviation of the mean of a sample
The Hurst exponent can be estimated through a simplified procedure,
described e.g. in Koutsoyiannis (2002, 2003), which is still not so simple for me.
So, as an estimate of H, I use equation (5) of Koutsoyiannis (2003) or,
equivalently, equation (17) of Koutsoyiannis (2002)
In this way I can get an estimate of H from acf(1) (which of course requires computing the acf). I must note that, for persistent data, the acf is expected to be a positive function between 0 and 1; if a numerical procedure produces a negative value for acf(1), equation (4) gives an undefined result (NaN, not a number). We can treat negative results as fluctuations around zero and assign them an average value of zero, so that equation (4) gives H=0.5 (i.e. uncorrelated data).
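The estimate described above can be sketched in a few lines of Python. This is only an illustration under a stated assumption: the text does not reproduce equation (4), so the sketch assumes the lag-1 relation of fractional Gaussian noise, acf(1) = 2^(2H-1) - 1, inverted for H. The function names `acf` and `hurst_from_acf1` are hypothetical, and the rule for negative acf(1) is applied explicitly as a clamp to zero.

```python
import math


def acf(series, lag):
    """Sample autocorrelation of `series` at the given lag (biased estimator)."""
    n = len(series)
    mean = sum(series) / n
    var = sum((v - mean) ** 2 for v in series)
    cov = sum((series[i] - mean) * (series[i + lag] - mean) for i in range(n - lag))
    return cov / var


def hurst_from_acf1(rho1):
    """Estimate H from acf(1), ASSUMING rho1 = 2**(2H - 1) - 1 (fractional Gaussian noise)."""
    # Negative acf(1) is treated as a fluctuation around zero, as described in the text.
    rho1 = max(rho1, 0.0)
    return 0.5 + math.log(1.0 + rho1) / (2.0 * math.log(2.0))


# A steadily rising series is strongly autocorrelated, so H comes out close to 1.
ramp = [float(i) for i in range(100)]
print(acf(ramp, 1), hurst_from_acf1(acf(ramp, 1)))
```

Under this assumed relation, acf(1) = 0 maps to H = 0.5 and acf(1) = 1 maps to H = 1, which matches the two limiting cases discussed in the text.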
In what follows, I define a procedure that, hopefully, reduces or cancels the persistence in a climate variable, and I apply spectral analysis to the "corrected" dataset (i.e. the one with reduced or null persistence). This last operation presumes that the "corrected" data conserve the information content of the original data (or, at least, the spectral information). I am not able to demonstrate this hypothesis in the general case, so, after the first example, I apply the same procedure to several datasets in order to verify, case by case, the conservation of the (spectral) information content after the transformation.
Annual NOAA-NCDC Temperature Anomaly
The first example where I apply the above-mentioned procedure is the annual
average anomaly of the NOAA global (land+ocean) temperature. I have the
datasets released from 2011 through 2017 and show here the 2017 one.
Computing the acf and the Hurst exponent via equation (4) gives
H_{obs}=0.975, a large value implying strong persistence (autocorrelation)
among the data.
An article by Roman Mureka at WUWT shows that the differences
between successive values of a dataset can reduce dependence (his words are:
"... it might not be unreasonable to assume that the annual changes are
independent of each other and of the initial temperature") and I add that
the difference dataset looks "more random" or "less structured" (whatever
such terms may mean) than the original data.
The method used by Mureka (by the way, applied to the same annual anomaly but
in another context) appeared interesting to me and the procedure itself easy
to implement, so I decided to apply it to the 2017 annual NOAA anomaly
(hereafter noaa-17). The result is shown in figure 1.
Fig.1. upper panel: 2017 annual average
anomaly, NOAA. central panel: Differences d(i)=t(i+1)-t(i).
bottom panel: Detrended values, computed from the line in the upper
panel compared to a fixed sine-wave.
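As an illustration of the differences d(i)=t(i+1)-t(i) shown in the central panel, here is a minimal Python sketch. It does not use the NOAA data: the input is a synthetic random walk (an assumption chosen only because it is strongly persistent by construction), and the helper names `acf1` and `first_differences` are hypothetical.

```python
import random


def acf1(series):
    """Sample autocorrelation at lag 1."""
    n = len(series)
    mean = sum(series) / n
    var = sum((v - mean) ** 2 for v in series)
    cov = sum((series[i] - mean) * (series[i + 1] - mean) for i in range(n - 1))
    return cov / var


def first_differences(series):
    """d(i) = t(i+1) - t(i) for an equally spaced series."""
    return [series[i + 1] - series[i] for i in range(len(series) - 1)]


# Synthetic stand-in for a persistent series: a random walk (NOT real climate data).
random.seed(42)
walk = [0.0]
for _ in range(999):
    walk.append(walk[-1] + random.gauss(0.0, 1.0))

diffs = first_differences(walk)
print("acf(1) original:   ", round(acf1(walk), 3))
print("acf(1) differences:", round(acf1(diffs), 3))
```

For this synthetic case the lag-1 autocorrelation of the walk is close to 1 while that of the differences is close to 0; whether a real dataset behaves the same way has to be checked case by case.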
The comparison in figure 2 between the autocorrelation functions
(original vs. differences) shows an impressive improvement (reduction) of
the persistence.
Fig.2. Comparison of observed and difference
acfs. H_{obs}=0.975; H_{diff}=0.5.
Now I can apply spectral analysis (say the MEM [Childers 1978; Press et al.
2009] or LOMB [Lomb, 1976; Scargle, 1982] methods), which is my
final goal, to the differences in order to obtain a more reliable spectral structure
for the annual anomaly, if and only if the differences conserve the
(at least spectral) information content of the original data. As stated
above, I'm not able to prove the hypothesis, so I'll
compare the spectra of both datasets, shown in the next two plots (figure 3
and figure 4).
Fig.3. Original NOAA global anomaly and its
MEM spectrum.
Fig.4. As in figure 3, for the differences of
anomaly.
A direct comparison shows that the spectra are very similar in the
positions (periods) of the spectral peaks; the only differences are the ratios
among peak heights and the better definition of the long-period maxima in the
difference spectrum. This cannot be a proof, but it is a strong suggestion
that the difference dataset conserves the information content. Also, the
above plots indicate that the persistence does not affect the
spectrum, at least in this case of the annual global anomaly.
In fact, this hypothesized preservation of the information must be
experimentally confirmed for each dataset before any conclusion can
be drawn.
A brief summary of the section is:
Lake Victoria level
The Lake Victoria series has a Hurst exponent H=0.962, so it is a good
choice for the present procedure; the main difference from the NOAA data is
the variable step of the data. This implies that the differences must be
computed per unit of the abscissa, i.e. one needs the ratio Δy/Δx, the
numerical derivative of the dataset (the same quantity computed above, but
there with Δx=1).
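The ratio Δy/Δx for a variable-step series can be sketched as follows. This is a minimal illustration, not the code used for the figures: the forward-difference form, the choice of attaching each value to the midpoint of its interval, and the function name `numerical_derivative` are all assumptions, and the sample numbers are invented.

```python
def numerical_derivative(x, y):
    """Forward ratios (y[i+1] - y[i]) / (x[i+1] - x[i]) for an unevenly sampled series.

    Returns the interval midpoints and the corresponding ratios.
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    mids, ratios = [], []
    for i in range(len(x) - 1):
        dx = x[i + 1] - x[i]
        if dx <= 0:
            raise ValueError("x must be strictly increasing")
        mids.append(x[i] + dx / 2.0)
        ratios.append((y[i + 1] - y[i]) / dx)
    return mids, ratios


# Invented example with an uneven time step (years) and arbitrary levels.
years = [1900.0, 1900.5, 1902.0, 1903.0]
level = [10.0, 10.5, 12.0, 11.0]
mids, deriv = numerical_derivative(years, level)
print(mids)   # [1900.25, 1901.25, 1902.5]
print(deriv)  # [1.0, 1.0, -1.0]
```

With a constant step Δx=1 the ratios reduce to the plain differences d(i)=y(i+1)-y(i).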
Since this is the first time I use the procedure in a
derivative/differences context, I include both transformations of
the Lake Victoria series, so that they can be compared with each other, as
"obs", "deriv" (or "der") and "diff" outputs.
Meanwhile I note that H_{deriv}=0.781 and H_{diff}=0.638, two values large
enough to suggest that the method is of little effectiveness here in reducing
or nullifying persistence. Nevertheless, the spectral analysis could
benefit from the lower Hurst exponents.
Figure 5 shows the comparison among the acfs of the series.
Fig.5. acfs of Lake Victoria. black:
Original data. blue: Absolute differences. red:
Numerical derivatives. Both transformations show a clearly visible improvement
over the original autocorrelation. Note that at lag 1
the acf of the derivatives is more than double that of the differences.
Here H_{obs}=0.962; H_{deriv}=0.781; H_{diff}=0.638.
Fig.6. Original series of Lake Victoria level
and its LOMB spectrum. In what follows (figures 7 and 8) the main spectral
feature at about 78 years turns out to be a macroscopic spurious shape due to
the persistence, while the shortest periods also remain in the "transformed"
spectra.
Fig.7. Numerical derivatives
(Δy/Δx) of lake Victoria and its LOMB spectrum.
Fig.8. Absolute differences (i.e. not referred
to a time base) between Lake Victoria levels. The peak at 34.4 years
does not appear in figure 7 and some minor differences are visible around 3-4
years.
For Lake Victoria too, (strong) differences appear in the power ratios of peaks at nearby periods.
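For readers who want to reproduce spectra of unevenly sampled series such as those in figures 6 to 8, here is a minimal sketch of a Lomb-type periodogram in pure Python. It follows the classical normalized formula (Lomb 1976; Scargle 1982) and is not the program used for the figures; the function name `lomb_periodogram`, the synthetic test signal and the grid of trial periods are assumptions made for illustration.

```python
import math


def lomb_periodogram(t, y, periods):
    """Classical normalized Lomb-Scargle power at each trial period."""
    n = len(t)
    mean = sum(y) / n
    var = sum((v - mean) ** 2 for v in y) / (n - 1)
    power = []
    for p in periods:
        w = 2.0 * math.pi / p
        # Time offset tau makes the estimate independent of the time origin.
        tau = math.atan2(sum(math.sin(2.0 * w * tj) for tj in t),
                         sum(math.cos(2.0 * w * tj) for tj in t)) / (2.0 * w)
        c = [math.cos(w * (tj - tau)) for tj in t]
        s = [math.sin(w * (tj - tau)) for tj in t]
        yc = sum((yj - mean) * cj for yj, cj in zip(y, c))
        ys = sum((yj - mean) * sj for yj, sj in zip(y, s))
        cc = sum(cj * cj for cj in c)
        ss = sum(sj * sj for sj in s)
        power.append((yc * yc / cc + ys * ys / ss) / (2.0 * var))
    return power


# Synthetic test: an 11-unit period sampled at uneven (but increasing) times.
times = [j + 0.3 * math.sin(j) for j in range(200)]
signal = [math.sin(2.0 * math.pi * tj / 11.0) for tj in times]
trial_periods = [2.0 + 0.5 * k for k in range(77)]  # 2.0, 2.5, ..., 40.0
power = lomb_periodogram(times, signal, trial_periods)
best_period = trial_periods[power.index(max(power))]
print(best_period)
```

The same function can be applied to the original series and to its derivatives, so that the positions of the peaks can be compared directly.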
Monthly NOAA-NCDC Temperature Anomaly
We can suppose a resemblance between the annual and monthly NOAA data, but it
is better to verify such common behaviour directly. So I use here
the latest available monthly dataset at the NOAA cag (Climate at a Glance) site:
the series up to December 2017 (here named 1712t.dat), from which I
computed the differences and, from both, the acfs of figure 9.
Fig.9. acf of observed (black) and differences
(blue) of December 2017 NOAA monthly anomaly. The improvement in the
persistence is evident. Here H_{obs}=0.983 and H_{diff}=0.5.
The persistence has been totally removed by the transformation, and the
spectra in figures 10 and 11 clearly show that the spectral information has
been maintained through the transformation procedure. In short, we read
here the same story as above for the annual data: we observe the same
spectral structure and a sharpening of the ~60-year peak.
Fig.10. Global monthly anomaly through December
2017. A comparison with the black line of figure 9 shows how strong the
persistence is here, much more than in the annual data. In the central
frame we can note how weakly the ~60-year peak is identified.
Fig.11. Differences of 1712t.dat monthly
anomaly and its MEM spectrum. The peak at ~60 year is well visible here.
From the spectral analysis of the monthly data we derive the same
spectral structure as for the annual data, and the confirmation that, for the
generic NOAA dataset, the persistence has little (if any) effect
on the spectrum; correcting the autocorrelation only yields a better
definition of the ~60-year spectral maximum. Again, the differences work
well in nullifying the persistence.
In a way, the above three tests define a fixed point within the
present work, thus allowing some
Intermediate thoughts.
While the above statements for the NOAA data also hold in the general case,
I must point out that the improvement in the persistence is not the same for
every dataset and every climate variable. The Lake Victoria levels show that
differences and derivatives did not fully cancel the persistence, but in any
case they give rise to a noticeable restoration of the spectral structure, with
very good resemblance between the spectra and no significant differences
in the periods.
At the same time, the applied transformation makes it possible to remove
spectral peaks (like the ~40 and ~78 year ones) whose significance has been
discussed without understanding their origin.
It should also be noted that, although the longer-period part of the "observed" Lake Victoria spectrum differs from the other two, the shortest periods are common to all spectra, perhaps suggesting that the persistence acts differently along the spectrum.
I think the present procedure, i.e. the use of differences/derivatives, generates uncorrelated data which contain the information (at least the spectral one) of the original data, allowing a more reliable spectral analysis. At the same time, I think we need to verify the improvement in every single situation.
Nile: annual minimum level, 622-1469 CE
The Nile river series is linked to the Lake Victoria level because the lake is the source of the White Nile, while the Blue Nile originates in the Ethiopian Highlands. They converge near Khartoum, Sudan, and the two arms become "the Nile".
The series of the Nile annual minimum level (site visited 20 November, 2017)
has a Hurst exponent H_{obs}=0.833, high enough to justify the use of
differences. Figure 12 shows the transformation can nullify the persistence.
Fig.12. Observed and difference acfs of the
Nile minimum level, 622-1469 CE. H_{obs}=0.833; H_{diff}=0.5.
In the next figures 13 and 14 "observed" and "difference" spectra will be
compared.
Fig.13. Observed annual minimum level of the
Nile river and its MEM spectrum.
Fig.14. Differences between successive values
of the Nile series and their spectrum.
A comparison between the spectra shows that:
TPW: Total Precipitable Water
The climate variable TPW is strongly linked to temperature and reaches
its largest values along the Pacific equatorial belt, mainly in the area of
the Indonesian seas known as the "warm pool", where the Pacific water, pushed
by the Trade Winds, accumulates during El Niño events. The
above-mentioned strong link is shown in figure 15.
Fig.15. The HadCrut4 global temperature anomaly
compared with a scaled TPW. Evidently, the plots refer to the same
phenomenon.
TPW data is available for two latitude belts:
±20° and ±60°. Here, only the wider belt has been used.
The shapes in figure 15 and the NOAA anomaly of figure 1 suggest
persistence and the use of the differences, so figure 16 is a
natural step for TPW.
Fig.16. TPW observed and difference acfs. The improvement
needs no comment. Here H_{obs}=0.968 and
H_{diff}=0.5.
Spectral analysis of the observed and difference data confirms again the
little or null influence of the persistence on the spectral periods of this
kind of data. As usual, the ratios among the powers (heights) of the spectral
peaks differ between the observed and difference spectra. Here some doubts
may arise, due to the short time span of the dataset (30 years), but I think
they are best resolved by a comparison with the above analysis of the NOAA data.
Fig.17. TPW observed data for the
±60° belt and its MEM spectrum
Fig.18. TPW differences and their MEM
spectrum.
OHC: Ocean Heat Content (0-700m)
I use here only the data for the global ocean (0-700m). The
constant data step is 1 year and the range 1955-2015 is covered.
The Hurst exponent for the observed data is H_{obs}=0.970 and becomes
H_{diff}=0.468 after the transformation, as in figure 19.
Fig.19. acfs of OHC and its differences.
H_{obs}=0.970 and H_{diff}=0.468.
This is another situation where the persistence is high in the observed
data and null after differencing. Both spectra, figures 20 and 21, appear
similar in structure, with some minor differences: the 4.1-year peak is
not present in the observed spectrum, and the shallow 30.5-year peak of the
observed spectrum disappears in the differences.
Fig.20. Observed OHC (0-700m) and its MEM
spectrum
Fig.21. Differences of OHC and their MEM
spectrum
Dendrology: tree rings, russ243, 1540-2004 CE
Here the so-called "observed data" is the average over the 45 available chronologies measured on Sakhalin Island (Russia).
Its acf, along with that of the differences, is in figure 22.
Fig.22. acf of the dendrological series
russ243mm and that of its differences. The observed acf(1) indicates a
weak persistence (H_{obs}=0.809) that, nevertheless, is totally
cancelled by the differences (H_{diff}=0.5).
Here the climate variable is the ring width (in microns) and NOT
the temperature, because of the large and well-known problems of the calibration
process: ring growth depends not only on temperature but on many
meteo-climatic and geological factors.
Comparing the observed (figure 23) and the difference (figure 24)
spectra tells us that the persistence can be corrected also over a
450-year time range and that, again, the information content doesn't change
after the transformation.
Fig.23. Average dendrology of russ243 and its
MEM spectrum.
Fig.24. Differences d(i)=w(i+1)-w(i) of the
average dendrology russ243 and its MEM spectrum.
The present spectrum shows the most severe difference among all the transformations so far, even though some spectral maxima appear in both spectra and some differences in period do not seem to be significant.
Kinderlinskaya Cave, Russia, Southern Urals
This is a δ
Fig.25. acf of δ
From the spectra in figures 26 and 27 the following information can be derived:
Fig.27. Difference δ
Stockholm tide gauge 1774-2000 CE
The Stockholm tide gauge is the longest series in the world; the data, available
at the PSMSL site (Permanent Service for Mean Sea Level),
include monthly values and the annual means I use here. They show some
breaks at the beginning of the dataset, so derivatives have been used as
the transformation function. The respective acfs are in figure 28.
Fig.28. Observed and derivative acfs. ACF(1)
of "deriv" is zero. Persistence has been cancelled by the derivatives, although
with large oscillations. H_{obs}=0.950 and H_{diff}=0.523.
Spectral comparison (figures 29 and 30) shows again the above-mentioned
power ratio among peaks, even though the spectra are very similar. An
exception is the maximum at 94.2 years, present only in the derivatives,
without any signal in the observed data.
Fig.29. Sea level at Stockholm, annual means,
and its LOMB spectrum.
Fig.30. Numerical derivatives of the Stockholm
tide gauge and their LOMB spectrum. The longest periods (94, 140-160 years) show
some differences while the shortest ones, mainly the "El Niño"-like
ones, are maintained in both spectra.
Concluding remarks
At present, mainly due to the lack of evidence to the contrary, the use of
differences/derivatives with the aim of eliminating the persistence and
bringing the Hurst exponent to H=0.5 (i.e. uncorrelated data) appears to be a
really effective method.
I'm not able to prove the general statement that differences conserve the information content of the original data, so I tried at least to verify that this holds in a variety of concrete situations covering various time steps, time spans, climate variables and levels of persistence.
The improvement in the persistence leads me to think that the present procedure makes it possible to overcome the problem of autocorrelation, at least as far as spectral analysis is concerned.
After the due tests, I can suggest that the spectrum of the differences/derivatives gives the best available results.
After this paper had been written, I found the site https://terpconnect.umd.edu/~toh/spectrum/Differentiation.html,
part of A Pragmatic Introduction to Signal Processing by Prof. Tom O'Haver, Professor Emeritus,
Department of Chemistry and Biochemistry, The University of Maryland at College
Park.
From this site I quote the sentence: "The derivative of a periodic signal containing several sine components of different frequency will still contain those same frequencies, but with altered amplitudes and phases." So I take this statement as a proof of the conservation of spectral information, widely discussed above, in the transition from the observed to the difference/derivative series, even though I did not check it in a signal theory textbook. It also justifies the ratio between observed and difference spectral peaks.
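The quoted statement can be checked numerically for first differences. For a single sinusoid sampled at unit step, the identity sin(a) - sin(b) = 2 cos((a+b)/2) sin((a-b)/2) gives x(t+1) - x(t) = 2 sin(π/P) A cos(2π(t+1/2)/P + φ): the period P is unchanged and the amplitude is multiplied by 2 sin(π/P). The sketch below verifies this on a synthetic two-sine signal; the signal, its periods (60 and 8 steps) and the helper name `amplitude_at_period` are assumptions chosen only so that the single-term Fourier sums are exact.

```python
import math


def amplitude_at_period(series, period):
    """Amplitude of the sinusoid with the given period (in steps), from one DFT term.

    Exact only when the series length is a whole number of periods.
    """
    n = len(series)
    w = 2.0 * math.pi / period
    re = sum(v * math.cos(w * i) for i, v in enumerate(series))
    im = sum(v * math.sin(w * i) for i, v in enumerate(series))
    return 2.0 * math.hypot(re, im) / n


# Synthetic signal: periods 60 and 8 steps, both dividing the length N = 240.
N = 240
full = [1.0 * math.sin(2.0 * math.pi * t / 60.0)
        + 0.5 * math.sin(2.0 * math.pi * t / 8.0 + 0.3) for t in range(N + 1)]
x = full[:N]
d = [full[t + 1] - full[t] for t in range(N)]  # first differences

for period in (60.0, 8.0):
    ratio = amplitude_at_period(d, period) / amplitude_at_period(x, period)
    print(period, ratio, 2.0 * math.sin(math.pi / period))

# A period absent from the signal (24 steps) stays absent after differencing.
print(amplitude_at_period(x, 24.0), amplitude_at_period(d, 24.0))
```

Since the factor 2 sin(π/P) grows as the period shortens, differencing reduces long-period amplitudes relative to short-period ones, which is consistent with the changed ratios among peak heights noted throughout the paper.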
All plots and data relating to this article can be found at the support site here.
References