Comments on my own break-point finding
I am of course aware of the Bai & Perron (2003) paper and know that their
method is implemented in the R framework, but:
- I find it difficult to write the code with all the statistical tests and
the controls associated with the method.
- I don't want to learn (and download and install) R, even though a tutorial
on Bai & Perron's method is available online.
So I tried to build my own system which, with some level of objectivity, can
give me the positions of the break points (BP) in a time series. Here I used the
yearly-averaged NOAA global anomalies through August 2013.
My procedure is:
- Starting from the first data point in the input file (actually 0813.dat),
I fit a straight line to the first 10, 11, 12, ..., 56, ... data points,
the last two or three of them being well within the next segment (as defined
by eye).
- For each fit, the rms of the residuals and the correlation coefficient (CC)
are plotted (frames c and d in the figure) vs. the number of points in the fit.
- When the rms-CC pair appears to reach its minimum and maximum values, the
break point is chosen at the last abscissa (#Data in the plots) of the
fitted piece.
- At that point, I read the original data file (0813.dat) and match #Data
to the actual year and to the row number of the break point (say R1 for the
first break point).
- The procedure starts again from the first step, fitting 10, 11, ..., 56
points within the segment which starts at line R1+1 of the input file. At
the end a new break point is defined (say R2, the second break point, with
its corresponding year and row number in the input file), and so on through
the last data point.
I used existing code (Bongo) to compute all these phases, so after each
segment I must manually extract the number of data points, rms, CC and slope
of each piece from the fit output file (poly-bai.app) and put them all in a
separate file (bai.dat) used for the plots.
I've also plotted (frame a) the slopes vs. the number of data points of each fit.
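The fitting loop described above could be sketched in Python as follows. This
is only a minimal illustration, not the Bongo code: the function names
(linear_fit, expanding_fits), the definition of rms as the root mean square of
the residuals divided by the number of points, and the synthetic series at the
end are all assumptions made for the example.

```python
import math

def linear_fit(x, y):
    """Ordinary least-squares line through (x, y).

    Returns (slope, rms, cc). Assumption: rms is sqrt(mean(residual^2)),
    and cc is the Pearson correlation coefficient.
    """
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    sq_res = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))
    rms = math.sqrt(sq_res / n)
    # CC is undefined for a perfectly flat series.
    cc = sxy / math.sqrt(sxx * syy) if syy > 0 else float("nan")
    return slope, rms, cc

def expanding_fits(x, y, start=0, n_min=10, n_max=56):
    """Fit the first n points after index `start`, for n = n_min .. n_max.

    `start` is the 0-based index of the first point of the segment
    (0 for the first segment, R1 for the segment beginning at row R1+1).
    Returns a list of (n, slope, rms, cc) tuples, one per fit.
    """
    results = []
    last = min(n_max, len(x) - start)  # do not run past the end of the data
    for n in range(n_min, last + 1):
        slope, rms, cc = linear_fit(x[start:start + n], y[start:start + n])
        results.append((n, slope, rms, cc))
    return results

# Synthetic two-segment series (NOT the NOAA data): falling, then rising.
years = list(range(1880, 1940))
anoms = [-0.005 * k for k in range(30)] + [-0.145 + 0.02 * k for k in range(1, 31)]
table = expanding_fits(years, anoms, n_min=10, n_max=40)
```

Each row of table holds the quantities that go into bai.dat for one fit. In
this synthetic case the rms stays near zero as long as the fit remains inside
the first 30 points and grows once the fit extends into the second segment,
which is the behaviour the procedure looks for. To restart after a break point
at row R1, call expanding_fits again with start=R1.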
It must be noted that, while by eye I defined 4 break points (the
last one being at the year 2001), the present procedure cannot find more
than 3 break points. Does that mean the procedure is more objective
than the eye? Maybe, but I don't really know.
After plotting RMS, CC and SLOPE on the same scale (as in this plot), I can find new break points (secondary BPs)
at about 1890, 1893, 1953, 1963 and 2007.
But I cannot find the (expected) main BP at about 2001. In my opinion,
the method works well when the fits, starting from the BP at the beginning of
a given segment, extend well into the
next segment, so that the next segment can modify the fitting parameters. At present, the (hypothetical)
final 5th segment is too short to influence the fits, so the BP is
ill-defined (e.g. 2007) or not defined at all.
Note that the "#Data" used as abscissa in frames a), c) and d)
represents the number of data points AFTER the last BP (or after the beginning
of the file, for the first segment). The colors define the order of the segments.
So the value 10 represents 10 points after the beginning of the file for
the 1st (black) segment; 10 points after the 1913 BP for the 2nd (blue)
segment; 10 points after the 1944 BP for the 3rd (green) segment and 10
points after the 1974 BP for the 4th (red) segment.
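The correspondence between "#Data" and the calendar year can be written as a
small helper. This is a sketch under stated assumptions: the input file is
taken to have one row per year, with no header and no gaps, starting in 1880;
the names FIRST_YEAR, row_of_year and year_of_count are hypothetical.

```python
FIRST_YEAR = 1880  # assumption: the first row of the input file is the year 1880

def row_of_year(year, first_year=FIRST_YEAR):
    """1-based row number of a given year in the input file."""
    return year - first_year + 1

def year_of_count(n_data, bp_year=None, first_year=FIRST_YEAR):
    """Year of the n-th data point after a break point.

    bp_year=None means the count starts at the beginning of the file
    (first segment); otherwise it starts at the row after the BP.
    """
    last_row = 0 if bp_year is None else row_of_year(bp_year, first_year)
    return first_year + last_row + n_data - 1

# The value #Data = 10 in each of the four segments:
tenth_points = [year_of_count(10, bp) for bp in (None, 1913, 1944, 1974)]
```

Under these assumptions, #Data = 10 corresponds to 1889, 1923, 1954 and 1984
for the black, blue, green and red segments respectively.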
An alternative representation of the fits is this one, where "Years" instead of "#Data" is used as
abscissa, but it seems to me more chaotic than the former.
Page written December 12, 2013
Last update: December 15, 2013