Correlation – The Need For ‘Stationary’ Data

How correlated are Intel and Google’s stock prices? The graph below shows the daily closing prices of both from mid-2009 to mid-2011.

GOOG INTL Stock Prices

At first glance, it certainly looks like the two price series move in tandem and should have a high correlation. Indeed, the correlation coefficient of the two stock price series is 0.88, indicating a high 88% correlation between the Google and Intel stock prices.

However, this is totally misleading – in reality the correlation between the two is a mere 36%.

Correlation, in common with most time-series analysis techniques, requires ‘stationary’ data as an input. To be stationary, the data must have a constant variance over time and be mean reverting. Stock price data (and many other economic data series) exhibit trending patterns, which violate the criteria of stationarity. Transformation to stationary data is quite simple, however, as converting the daily closing prices into daily returns will normally be sufficient. The return series of a stock is usually considered stationary for time-series analysis purposes, since it is mean reverting (the daily returns oscillate above and below a constant mean) and has a constant variance (the magnitude of the returns above and below the mean will be relatively constant over time, despite numerous spikes).
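To make the transformation concrete, here is a minimal sketch in Python. It does not use the actual Google and Intel data; instead it assumes two synthetic price series that share an upward trend but have completely independent daily shocks. The helper names (pearson, simple_returns) and all the parameters are illustrative choices, not anything from the original analysis.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def simple_returns(prices):
    """Daily percentage changes: (P_t - P_{t-1}) / P_{t-1}."""
    return [(p1 - p0) / p0 for p0, p1 in zip(prices, prices[1:])]


# Synthetic data (an assumption for illustration): both series drift upwards
# at the same average rate, but their daily shocks are drawn independently.
rng = random.Random(0)
n_days = 500
price_a, price_b = [100.0], [50.0]
for _ in range(n_days):
    price_a.append(price_a[-1] * (1 + 0.002 + rng.gauss(0, 0.01)))
    price_b.append(price_b[-1] * (1 + 0.002 + rng.gauss(0, 0.01)))

# Correlating the trending price levels versus the stationary returns.
price_corr = pearson(price_a, price_b)
return_corr = pearson(simple_returns(price_a), simple_returns(price_b))

print(f"correlation of prices:  {price_corr:.2f}")
print(f"correlation of returns: {return_corr:.2f}")
```

Because the two series only share a trend, the price correlation comes out high while the return correlation stays close to zero, which is the same kind of gap described above for the real stocks.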

The daily series of returns (i.e. percentage price changes) for both Google and Intel stocks can be seen below. Note that there is no trend in either series; each moves above and below a constant mean, which for daily stock returns is almost always very close to 0%.

GOOG INTL Stationary Series

The requirement for stationary data in calculating correlation can also be explained intuitively. Imagine you were looking to hedge a long position in Google stock with a short position in Intel. You would want the return on the Google stock to match the return on the Intel stock. Hence correlating the prices would be irrelevant; in such a scenario you would want to know the correlation between the two sets of returns, as this is what you would essentially be attempting to match with the hedge.

Correcting For Drift And Seasonality

In correlating stock price data, transforming the raw price data to returns is usually considered sufficient. However, to be more rigorous, any additional trends could be stripped out of the data. Most models of stock price behaviour include the risk-free interest rate plus a required rate of return as a constant drift over time, the argument being that stock investors require this return for holding the stock and over the long term the stock should deliver it. Thus, this return could be backed out of the series before calculating correlation. In practice, since we are dealing with daily returns, the long-term drift has a minimal impact on the calculation of correlation.
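A rough sketch of backing out the drift follows. It assumes the drift is a single constant per-day figure derived from a hypothetical 8% annual required return and 252 trading days per year; both numbers, the synthetic return series and the variable names are assumptions made purely for illustration.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


TRADING_DAYS = 252    # assumed number of trading days per year
annual_drift = 0.08   # hypothetical risk-free rate plus required return

# Convert the annual drift into an equivalent compounded daily drift.
daily_drift = (1 + annual_drift) ** (1 / TRADING_DAYS) - 1

# Synthetic daily returns (illustrative only): a shared shock makes the two
# series genuinely correlated, and each carries the same constant drift.
rng = random.Random(1)
common = [rng.gauss(0, 0.01) for _ in range(500)]
returns_a = [daily_drift + c + rng.gauss(0, 0.01) for c in common]
returns_b = [daily_drift + c + rng.gauss(0, 0.01) for c in common]

# Back the drift out of each return series.
excess_a = [r - daily_drift for r in returns_a]
excess_b = [r - daily_drift for r in returns_b]

raw_corr = pearson(returns_a, returns_b)
excess_corr = pearson(excess_a, excess_b)

print(f"daily drift:               {daily_drift:.6f}")
print(f"correlation with drift:    {raw_corr:.6f}")
print(f"correlation without drift: {excess_corr:.6f}")
```

With a strictly constant drift the two correlations agree up to floating-point rounding, because the Pearson coefficient subtracts each series' mean anyway. A drift that changes over time would move the result, but at a daily frequency the adjustment is tiny compared with the size of typical daily returns.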

Some economic data series, such as durable goods orders, exhibit strong seasonality. When raw durable goods orders data is transformed into percentage changes, it is indeed mean reverting with a constant mean. However, the series will still not be stationary due to the strong seasonal effects: orders will be much higher during the Christmas shopping season, so the percentage changes will always spike at that time, resulting in a non-constant variance.

Seasonality can be dealt with by cleaning the data series using another series which exhibits the same seasonality. In the case of durable goods orders, the raw CPI index (note: not the percentage change in CPI) would be such a series, since the CPI index will typically spike during shopping seasons. Thus the durable goods orders could be divided by the CPI to arrive at a ‘deflated’ durable goods series, which could then be made stationary by transforming it into percentage changes between periods.
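The deflate-then-difference procedure can be sketched as follows. The data here is entirely made up: a hypothetical monthly orders series and a hypothetical index that are assumed to share the same December spike, as the argument above requires. The size of the spike, the noise level and the names (orders, index, pct_changes) are illustrative assumptions, not real durable goods or CPI figures.

```python
import random


def pct_changes(series):
    """Period-over-period percentage changes."""
    return [(b - a) / a for a, b in zip(series, series[1:])]


rng = random.Random(2)
months = 48

# Hypothetical seasonal factor: 1.3 every December, 1.0 in other months.
seasonal = [1.3 if m % 12 == 11 else 1.0 for m in range(months)]

# Made-up orders series with the seasonal spike plus a little noise, and a
# made-up index assumed to carry the same seasonal pattern.
orders = [200.0 * s * (1 + rng.gauss(0, 0.01)) for s in seasonal]
index = [100.0 * s for s in seasonal]

# Step 1: divide the orders by the index to get the 'deflated' series.
deflated = [o / i for o, i in zip(orders, index)]

# Step 2: transform into percentage changes between periods.
raw_changes = pct_changes(orders)
deflated_changes = pct_changes(deflated)

max_raw = max(abs(c) for c in raw_changes)
max_deflated = max(abs(c) for c in deflated_changes)

print(f"largest change, raw orders:      {max_raw:.3f}")
print(f"largest change, deflated orders: {max_deflated:.3f}")
```

In this toy setup the raw percentage changes jump every December while the deflated ones do not, so the variance of the deflated changes is much more even across the year. How well this works on real data depends on how closely the seasonal pattern of the deflating series actually matches that of the series being cleaned.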