A comment on my own method for finding break points

I am of course aware of the Bai & Perron (2003) paper and I know that their method is implemented in R, but:
  1. I find it difficult to write the code with all the statistical tests and controls associated with the method.
  2. I don't want to learn (and download and install) R, even though a tutorial on the Bai & Perron method is available online.
So I tried to build my own system which, with some degree of objectivity, can give me the position of the break points (BP) in a time series. Here I used the yearly averaged NOAA global anomalies through August 2013.

My procedure is:

  1. Starting from the first data point in the input file (actually 0813.dat), I linearly fit the first 10, 11, 12, ..., 56, ... data points, the last two or three of them lying well within the next segment (defined by eye).
  2. For each fit, the rms of the residuals and the correlation coefficient (CC) are plotted (frames c and d in the figure) against the number of points in that fit.
  3. Where the rms and the CC appear to reach their minimum and maximum values respectively, I take as the break point the last abscissa (#Data in the plots) of the fitted piece.
  4. At that point, I read the original data file (0813.dat) and match #Data to the actual year (and also to the row number, say R1, of this first break point).
  5. The procedure starts again (GOTO 1.), now fitting 10, 11, ..., 56 points of the segment that starts at line R1+1 of the input file. At the end a new break point is defined (say R2, the second break point, with its corresponding year and row number in the input file), and so on through the last data point.
I used existing code (Bongo) to run all these steps, so after each segment I must manually extract the number of data points, rms, CC and slope of each piece from the fit output file (poly-bai.app) and put them all in a separate file (bai.dat) used for the plots.
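
For readers who would like to avoid the manual extraction, the growing-window fits of steps 1 and 2 can be sketched in a few lines of Python. This is only an illustration and not the Bongo code: the function name expanding_fits, its arguments, the use of NumPy, and the definition of rms as the square root of the mean squared residual are all assumptions of mine. The choice of the break point from the resulting table is still left to inspection of the plots, as in step 3.

```python
import numpy as np

def expanding_fits(years, values, start, n_min=10, n_max=56):
    """Fit a straight line to the first n points of a segment, for n = n_min..n_max.

    `start` is the 0-based row where the segment begins (hypothetical convention).
    Returns a list of (n, rms, cc, slope) tuples, one per fit.
    """
    years = np.asarray(years, dtype=float)
    values = np.asarray(values, dtype=float)
    # Do not run past the end of the series.
    n_max = min(n_max, len(values) - start)
    rows = []
    for n in range(n_min, n_max + 1):
        # Centre the abscissa on the first year of the segment; the slope is unchanged.
        x = years[start:start + n] - years[start]
        y = values[start:start + n]
        slope, intercept = np.polyfit(x, y, 1)
        resid = y - (slope * x + intercept)
        rms = float(np.sqrt(np.mean(resid ** 2)))  # assumed definition of rms
        cc = float(np.corrcoef(x, y)[0, 1])        # Pearson correlation coefficient
        rows.append((n, rms, cc, float(slope)))
    return rows
```

Each returned row holds the same four quantities that I copy by hand into bai.dat (number of data points, rms, CC, slope), so the list could be written to a file and plotted directly.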

I have also plotted (frame a) the slope of each piece against its number of data points.

  • It must be noted that, while by eye I defined 4 break points (the last one being at the year 2001), the present procedure cannot find more than 3 break points. Does that mean the procedure is more objective than the eye? Maybe, but I don't really know.

    After plotting RMS, CC and SLOPE on the same scale (as in this plot), I can find new break points (secondary BPs) at about 1890, 1893, 1953, 1963 and 2007.

  • But I cannot find the (expected) main BP at about 2001. In my opinion, the method works well when the fits, starting from the BP at the beginning of a given segment, extend far enough into the next segment to change the fitting parameters. At present, the (hypothetical) final 5th segment is too short to influence the fits, so the BP is ill-defined (e.g. 2007) or not defined at all.
    Note that the "#Data" used as abscissa in frames a), c) and d) is the number of data points AFTER the last BP (or after the beginning of the file, for the first segment). The colors indicate the order of the segments.
    So the value 10 represents 10 points after the beginning of the file for the 1st (black) segment, 10 points after the 1913 BP for the 2nd (blue) segment, 10 points after the 1944 BP for the 3rd (green) segment and 10 points after the 1974 BP for the 4th (red) segment.
    An alternative representation of the fits is this, where "Years" instead of "#Data" is used as abscissa, but it seems to me more chaotic than the former one.
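
    The correspondence between "#Data" and the year (step 4 of the procedure) is simple bookkeeping, and a small sketch may make the convention explicit. The function name locate, the 0-based row numbering and the assumption that the yearly file starts in 1880 are mine, chosen only for illustration.

```python
def locate(years, start_row, n_data):
    """Return (row, year) of the last point of a fit.

    `start_row` is the 0-based row where the segment begins, i.e. the row
    just after the previous break point (or 0 for the first segment);
    `n_data` is the "#Data" value read from the plots.
    """
    row = start_row + n_data - 1
    return row, years[row]

# Hypothetical yearly series starting in 1880 (assumed, not read from 0813.dat).
years = list(range(1880, 2014))
first = locate(years, 0, 10)    # 10 points after the beginning of the file
second = locate(years, 34, 10)  # 10 points after a BP located at row 33 (year 1913)
```

    With these assumptions, #Data = 10 in the first segment falls in 1889, and #Data = 10 in the segment that follows the 1913 BP falls in 1923.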
    Page written December 12, 2013               Last update: December 15, 2013