Time Series Analysis
Southern Oscillation Index
and Northern Virginia Temperature
Aubrey Weese
School of Computational Sciences,
George Mason University, Fairfax VA
May 14, 2003
Problem 1.
Collect SOI (Southern Oscillation Index) data through the network. The minimum
temporal coverage is January, 1866--December, 2000.
The Southern Oscillation Index (SOI) is defined as the normalized pressure difference between Tahiti and Darwin. I collected SOI data from the Climatic Research Unit at the University of East Anglia, Norwich. The data set can be downloaded at http://www.cru.uea.ac.uk/cru/data/soi.htm. They calculated the SOI based on the method given by Ropelewski and Jones (1987). The data set gives monthly values for a time period from 1866 through 2002.
Ropelewski, C.F. and Jones, P.D., 1987: An extension of the Tahiti-Darwin Southern Oscillation Index. Monthly Weather Review 115, 2161-2165
1. Display the SOI as time series in graph.
2. Compute climatological values for each month and plot the result. Are they very close to zero?
jan = 0.0184
feb = -0.0562
mar = -0.1476
apr = -0.0091
may = -0.0205
jun = -0.1096
jul = 0.0538
aug = -0.1815
sep = -0.0141
oct = -0.1423
nov = -0.0951
dec = 0.0358
These values are not close to zero considering the scale we are working with. Therefore, they need to be removed from the data.
3. If the climatological values are not zero, take off the seasonal effects from the original data and display the new time series data again.
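The climatology and its removal can be sketched in a few lines of Python. This is a minimal sketch, not the MATLAB code used for this assignment. The function names and the toy series are made up for illustration, and the sketch assumes the series starts in January and has no missing months.

```python
import math

def monthly_climatology(values):
    """Mean of each calendar month; values[0] is assumed to be a January."""
    sums = [0.0] * 12
    counts = [0] * 12
    for i, v in enumerate(values):
        sums[i % 12] += v
        counts[i % 12] += 1
    return [s / c for s, c in zip(sums, counts)]

def remove_climatology(values):
    """Subtract each calendar month's mean, leaving the anomaly series."""
    clim = monthly_climatology(values)
    return [v - clim[i % 12] for i, v in enumerate(values)]

# Toy data (made up): three years of a seasonal cycle plus a small yearly offset.
toy = [10 * math.sin(2 * math.pi * m / 12) + 0.1 * year
       for year in range(3) for m in range(12)]

anomalies = remove_climatology(toy)
residual_climatology = monthly_climatology(anomalies)
print(max(abs(c) for c in residual_climatology))  # zero to rounding error
```

Recomputing the climatology of the anomalies should give twelve values that are zero to rounding error, which is a quick check that the seasonal cycle was removed.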
4. Do you think the original data show any trend?
5. Do you think this time series is stationary?
No, the data does not show a trend, and yes, the time series is stationary. This is evidenced by the fact that the corrected data looks very similar to the uncorrected data, and the whole series simply oscillates back and forth around zero.
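The conclusion above rests on visual inspection. One simple numerical cross-check, offered here as an assumption and not as something done for this assignment, is to fit a least-squares line to the series and to compare the means of its two halves. The function names and toy series below are hypothetical.

```python
def linear_trend_slope(values):
    """Least-squares slope of the values against their index (units per step)."""
    n = len(values)
    t_mean = (n - 1) / 2
    v_mean = sum(values) / n
    num = sum((t - t_mean) * (v - v_mean) for t, v in enumerate(values))
    den = sum((t - t_mean) ** 2 for t in range(n))
    return num / den

def half_means(values):
    """Means of the first and second halves; similar values suggest a stable mean."""
    half = len(values) // 2
    return sum(values[:half]) / half, sum(values[half:]) / (len(values) - half)

# Toy series (made up): an oscillation around zero, and the same with a drift added.
flat = [(-1) ** t for t in range(100)]
drifting = [v + 0.05 * t for t, v in enumerate(flat)]

slope_flat = linear_trend_slope(flat)
slope_drifting = linear_trend_slope(drifting)
print(slope_flat, slope_drifting)
print(half_means(flat), half_means(drifting))
```

A slope near zero and nearly equal half means are consistent with a series that has no trend, although they do not prove stationarity on their own.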
6. Compute autocorrelation coefficient and create the correlogram. Discuss your result.
An autocorrelation coefficient measures how similar a time series is to a lagged copy of itself. If the series is highly autocorrelated, past values can be used to forecast future ones. A coefficient close to zero indicates low correlation, and a coefficient close to +1 or -1 indicates high correlation. A correlogram plots these coefficients against the time separation between measurements (the lag). For this data set, the coefficients decrease from one at zero lag to near zero at large lags, exhibiting a damped oscillation. This indicates that SOI values tend to be highly correlated with those measured a short time later, and less correlated with those measured a long time later. This is typical of earth science data sets, which are frequently autocorrelated at short lags because of inertia or carryover processes in the physical system.
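The coefficients in a correlogram can be computed directly from their definition: the lag-k coefficient is the sum of products of deviations from the mean taken k steps apart, divided by the sum of squared deviations. The Python sketch below uses that standard estimator. The function name and the toy series are assumptions made for illustration and are not taken from the MATLAB code for this assignment.

```python
import math

def autocorrelation(x, max_lag):
    """Sample autocorrelation coefficients r_0 .. r_max_lag of the series x."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    denom = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t + k] for t in range(n - k)) / denom
            for k in range(max_lag + 1)]

# Toy series (made up): a pure 12-step cycle, ten cycles long.
toy = [math.sin(2 * math.pi * t / 12) for t in range(120)]
acf = autocorrelation(toy, 24)

for lag in (0, 6, 12):
    print(lag, round(acf[lag], 3))
```

The coefficient is exactly one at lag zero, negative half a cycle later, and positive again after a full cycle. Plotting `acf` against the lag gives the correlogram.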
Problem 2.
For the same SOI anomaly data:
1. Use Fourier Transformation to compute the spectrum of the data. Plot the result and discuss your result. For example, does your data show any periodic signal? What are the dominant period(s) of your data?
It is a little hard to tell from this graph, but you can see that several large amplitudes appear in the low-frequency area. (The data on the right-hand half of the graph is unimportant because it is just a mirror image of the left-hand half.) Here I have produced a table of the 10 highest amplitudes so we can see exactly at what frequencies they occur:
| Amplitude | Frequency (cycles/month) |
| 167.8889 | 0.0147 |
| 168.5203 | 0.0061 |
| 168.7857 | 0.0362 |
| 168.9990 | 0.0184 |
| 173.3467 | 0.0116 |
| 200.4529 | 0.0245 |
| 203.5556 | 0.0092 |
| 207.9840 | 0.0221 |
| 213.2176 | 0.0129 |
| 240.1005 | 0.0288 |
The three highest amplitudes correspond to frequencies that recur, with slight variations, among the lower-ranked entries in the table. Therefore, taking the three dominant frequencies to be 0.0288, 0.0129 and 0.0221 cycles per month, I calculate that the SOI has periodicities of about 2.9, 6.5 and 3.8 years.
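The spectrum and the conversion from frequency to period can be sketched as follows. This is a minimal Python sketch, not the MATLAB code used here. It uses a plain discrete Fourier transform to stay dependency-free, where an FFT routine would give the same numbers faster. The function names and toy signal are made up, and taking the amplitude as the unnormalised magnitude of each coefficient is an assumption about how the table above was scaled.

```python
import cmath
import math

def amplitude_spectrum(x):
    """(frequency, amplitude) pairs for bins 1 .. n//2, in cycles per sample.

    For real input the bins above n//2 mirror these, so they are skipped.
    """
    n = len(x)
    spectrum = []
    for k in range(1, n // 2 + 1):
        coeff = sum(v * cmath.exp(-2j * math.pi * k * t / n)
                    for t, v in enumerate(x))
        spectrum.append((k / n, abs(coeff)))
    return spectrum

def period_in_years(freq_cycles_per_month):
    """Convert a frequency in cycles per month to a period in years."""
    return 1.0 / freq_cycles_per_month / 12.0

# Toy monthly signal (made up): a strong 40-month and a weaker 24-month cycle.
n = 240
toy = [math.sin(2 * math.pi * t / 40) + 0.5 * math.sin(2 * math.pi * t / 24)
       for t in range(n)]
top = sorted(amplitude_spectrum(toy), key=lambda fa: fa[1], reverse=True)[:2]
for freq, amp in top:
    print(round(freq, 4), round(amp, 1), round(period_in_years(freq), 2))

# The three dominant SOI frequencies read from the table in the text.
soi_periods = [round(period_in_years(f), 1) for f in (0.0288, 0.0129, 0.0221)]
print(soi_periods)
```

The last line reproduces the 2.9, 6.5 and 3.8 year periods quoted in the text from the tabulated frequencies.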
2. Use wavelet analysis to get the time-frequency(period)-energy information. Display the result and discuss it. What new information can you find in wavelet analysis which is not in the Fourier Analysis?
Like the Fourier transformation, the wavelet transformation converts time-domain data into the frequency domain. Wavelets have advantages over Fourier methods in analyzing physical situations where the signal contains discontinuities and sharp spikes. The wavelet transform allows for a frequency spectrum that changes over time (in other words, a non-stationary signal). Because our signal is stationary, the wavelet transformation is not really needed. Light and dark banding in the image above indicates a strong periodic signal. However, it is more difficult to read off the exact frequency of this strong signal from the wavelet transform than from the Fourier transform.
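What a wavelet transform adds over a Fourier spectrum is localisation in time, which a toy example makes concrete. The Python sketch below is written under stated assumptions. It uses a Morlet wavelet with omega0 = 6, which may not be the wavelet behind the image above. The function name, the toy signal, and the truncation of the kernel at four scales are all illustrative choices. Scales are converted to periods with the usual Morlet factor 4*pi / (omega0 + sqrt(2 + omega0^2)), about 1.033.

```python
import cmath
import math

OMEGA0 = 6.0  # assumed Morlet parameter
FOURIER_FACTOR = 4 * math.pi / (OMEGA0 + math.sqrt(2 + OMEGA0 ** 2))  # ~1.033

def morlet_power(x, scales):
    """power[i][t] = |W(scales[i], t)|^2, computed by direct summation."""
    n = len(x)
    power = []
    for s in scales:
        norm = math.pi ** -0.25 / math.sqrt(s)
        half = int(4 * s)  # the Gaussian envelope is negligible beyond ~4 scales
        # Complex conjugate of the scaled Morlet wavelet, sampled at integer steps.
        kernel = [norm * cmath.exp(-1j * OMEGA0 * j / s) * math.exp(-0.5 * (j / s) ** 2)
                  for j in range(-half, half + 1)]
        row = []
        for t in range(n):
            w = 0j
            for idx, kv in enumerate(kernel):
                pos = t + idx - half
                if 0 <= pos < n:  # samples outside the series are treated as zero
                    w += x[pos] * kv
            row.append(abs(w) ** 2)
        power.append(row)
    return power

# Toy signal (made up): a 24-month cycle that switches to a 48-month cycle halfway.
n = 480
toy = [math.sin(2 * math.pi * t / 24) if t < 240 else math.sin(2 * math.pi * t / 48)
       for t in range(n)]

periods = [24, 48]
scales = [p / FOURIER_FACTOR for p in periods]
power = morlet_power(toy, scales)

for i, p in enumerate(periods):
    print(p, power[i][120] > power[i][360])
```

A Fourier spectrum of this toy signal would show both periods but not when each one occurs. The wavelet power at the 24-month scale is large only in the first half, and at the 48-month scale only in the second half.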
Problem 3.
Find another time series data you are interested in. The new data may be
a time series of a station observation, a spatial average over an area such
as average precipitation over a specific region, or a PCA time series from
a Principal Component Analysis (EOF), etc.
1. Make your data and the SOI data be of the same temporal resolution.
I decided to find out if the SOI affects us close to home. So I collected monthly mean temperatures for the northern Virginia area (Virginia climate division 4) from the National Climatic Data Center, http://www.ncdc.noaa.gov/. The data ranges from 1895 to 2001 and the temperatures are in degrees Fahrenheit.
2. Repeat all procedures in problem 1 and problem 2 on the new data set.
jan = 32.3626
feb = 33.8991
mar = 42.5692
apr = 52.4869
may = 62.2907
jun = 70.3140
jul = 74.4953
aug = 72.8523
sep = 66.2598
oct = 55.1888
nov = 44.5280
dec = 34.8869
These values are definitely not close to zero, and need to be removed from the data.
This new graph is basically a time series of the departure from normal temperature (the anomaly), either positive or negative. The data looks stationary and does not appear to show a trend.
This correlogram exhibits the same damped oscillation behavior as the SOI correlogram. However, the amplitude is smaller, indicating a small overall autocorrelation of the temperature anomalies. In other words, if the temperature is abnormally high or low one month, the chances are not very high that it will be abnormally high or low the next month.
This graph does not have such obvious peaks as the Fourier transformation of the SOI data. Here is the table of the ten highest amplitudes:
| Amplitude | Frequency (cycles/month) |
| 241.3510 | 0.0857 |
| 241.4043 | 0.0413 |
| 241.8263 | 0.0872 |
| 243.1569 | 0.1565 |
| 246.5269 | 0.3178 |
| 250.5245 | 0.1106 |
| 255.6386 | 0.2827 |
| 258.5363 | 0.1550 |
| 286.2667 | 0.0693 |
| 321.2366 | 0.0008 |
This table seems to show some repetition of the frequencies corresponding to the four highest amplitudes. These frequencies (0.0008, 0.0693, 0.1550 and 0.2827 cycles per month) correspond to periods of 104.2 years, 1.2 years, 6.5 months and 3.5 months. None of these is close to the periods found in the SOI data. The fact that the highest amplitude corresponds to a period almost the same as the length of the entire time series most likely means that none of these periods is statistically significant.
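The remark about the longest period can be checked with a little arithmetic. The sketch below assumes the Virginia series covers every month from 1895 through 2001 (1284 samples) and that the frequency axis of the transform is k/N for bin k. Neither is stated explicitly in the text, and the helper name is made up. Under those assumptions the tabulated 0.0008 is the lowest resolvable frequency, 1/N, rounded to four decimals.

```python
# Assumption: one value per month, 1895 through 2001, with no gaps.
n_months = (2001 - 1895 + 1) * 12  # 1284 samples

def fourier_frequency(k, n):
    """Frequency of DFT bin k for a series of n samples (cycles per sample)."""
    return k / n

fundamental = fourier_frequency(1, n_months)
print(round(fundamental, 6))           # lowest non-zero frequency, about 0.000779

# Period of that bin is the full record length, not 104.2 years.
period_years_exact = n_months / 12
period_years_from_rounded = 1 / 0.0008 / 12
print(period_years_exact, round(period_years_from_rounded, 1))

# Map the four tabulated frequencies back to the nearest bin index.
bins = {f: round(f * n_months) for f in (0.0008, 0.0693, 0.1550, 0.2827)}
print(bins)
```

Under these assumptions the 104.2-year figure comes from rounding the frequency to 0.0008. The bin itself spans the whole 107-year record, so it cannot represent a cycle that was observed more than once.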
The wavelet transformation does not show strong light or dark banding, reflecting the relatively homogeneous nature of the Fourier transformation.
3. Study the relationship between SOI and your time series.
Minimum requirement: compute Pearson correlation coefficient (CC), Spearman's
rank CC, and Kendall's (tau) rank CC. You should also test the confidence
level of your Pearson CC.
I used S-PLUS to compute the correlation coefficients between the two time series, and it returned the following results:
cor.test(soi, va, method="pearson")
Pearson's product-moment correlation
data: va and soi
t = -0.8666, df = 1282, p-value = 0.3863
alternative hypothesis: true coef is not equal to 0
sample estimates:
cor
-0.02419578
--
cor.test(va, soi, method="spearman")
Spearman's rank correlation
data: va and soi
normal-z = -0.6171, p-value = 0.5372
alternative hypothesis: true rho is not equal to 0
sample estimates:
rho
-0.01722729
--
cor.test(va, soi, method="kendall")
Kendall's rank correlation tau
data: va and soi
normal-z = NA, p-value = NA
alternative hypothesis: true tau is not equal to 0
sample estimates:
tau
-0.01133927
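The absolute values of all three coefficients (cor, rho and tau) are less than 0.20, and the Pearson p-value of 0.3863 is far above 0.05, so the correlation is not significantly different from zero. Consulting the table from Williams below,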
| Coefficient | Correlation | Interpretation |
| less than .20 | slight correlation | almost no relationship |
| .20 to .40 | low correlation | small relationship |
| .40 to .70 | moderate correlation | substantial relationship |
| .70 to .90 | high correlation | marked relationship |
| .90 and above | very high correlation | solid relationship |
(Williams, 1992, p. 137).
I see that this means there is almost no relationship between the Southern Oscillation Index and the monthly average temperature in Northern Virginia.
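The three coefficients and the Pearson t statistic can also be worked out by hand. The Python sketch below is illustrative only: the function names and the five-point toy data are invented. Kendall's tau is computed as tau-a with no tie correction, which may differ from what S-PLUS reports when ties are present. The p-value uses a normal approximation to Student's t, which should be close at df = 1282.

```python
import math

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def ranks(v):
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(v)), key=lambda i: v[i])
    r = [0.0] * len(v)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and v[order[j + 1]] == v[order[i]]:
            j += 1
        for k in range(i, j + 1):
            r[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return r

def spearman(x, y):
    """Spearman's rho is Pearson's r applied to the ranks."""
    return pearson(ranks(x), ranks(y))

def kendall_tau(x, y):
    """Tau-a: (concordant - discordant pairs) / total pairs, no tie correction."""
    n = len(x)
    s = 0
    for i in range(n):
        for j in range(i + 1, n):
            prod = (x[i] - x[j]) * (y[i] - y[j])
            s += (prod > 0) - (prod < 0)
    return s / (n * (n - 1) / 2)

def pearson_t(r, n):
    """t statistic and degrees of freedom for testing r against zero."""
    df = n - 2
    return r * math.sqrt(df / (1 - r * r)), df

# Toy data (made up), small enough to check by hand.
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]
print(pearson(x, y), spearman(x, y), kendall_tau(x, y))

# Reproduce the reported Pearson test from r and the sample size (df + 2 = 1284).
t, df = pearson_t(-0.02419578, 1284)
p_approx = math.erfc(abs(t) / math.sqrt(2))  # two-sided, normal approximation
print(round(t, 4), df, round(p_approx, 3))
```

Plugging the reported r into the t formula recovers the t = -0.8666 and df = 1282 shown in the S-PLUS output, and the approximate p-value agrees with the reported 0.3863 to about three decimals.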
Williams, F. (1992). Reasoning with statistics: How to read quantitative research (4th ed.). Fort Worth: Harcourt Brace Jovanovich College Publishers.
See my MATLAB code for this assignment. This is the code I used for the SOI time series. The VA time series code is almost identical, modified only slightly because the series differ in length.