Sunday, July 27, 2014

Pair trading using PCA


In a previous post, I analyzed how PCA can be used to identify market characteristics. In this post we will take a bottom-up approach to identify pair trading opportunities. Any pair trading model has two components. The first is the identification of good pairs. The second is the identification of divergence in these good pairs to initiate a trade. We will see how PCA can be used to perform both steps:

Methodology:

The following steps are applied to all the possible intra-sector pairs:
  • Demeaned daily returns of the stocks in each pair are calculated. A matrix of size 400*2 is constructed, where 400 is the number of observations (days) and 2 is the number of stocks (a pair). PCA is performed on this matrix to get the principal components.
  • The variance explained by the first principal component is the first shortlisting parameter. The higher this variance, the more closely related the stocks are. 80%+ variance explained by the first component is generally considered good.
  • The next step is to calculate the distribution of returns around the first component, i.e. the part of each day's returns that the first component does not explain. This is called the daily error. The auto-correlation of this error is the second parameter. Values less than -0.1 are favorable. Negative auto-correlation signifies that the error is mean reverting.
  • To check for divergence, we look at the sum of the last N days of daily error. N=4 is generally good. If the sum of the last N days of daily error is above a threshold, then it is a good entry point.
  • Book profit, stop loss and maximum holding period criteria are applied to exit a pair once it has been entered.
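
The pair-screening steps above can be sketched in plain Python. This is only an illustration under stated assumptions: the daily error is taken to be the score on the second principal component (the part of the two demeaned returns orthogonal to the first component), its sign is arbitrary, and the function names `pair_pca_stats` and `lag1_autocorr` as well as the synthetic data are hypothetical, not taken from the original back test.

```python
import math
import random


def pair_pca_stats(returns_a, returns_b):
    """PCA on two demeaned return series via their 2x2 covariance matrix.

    Returns (share of variance explained by the first component,
             list of daily errors).
    Assumption: the daily error is the score on the second component.
    """
    n = len(returns_a)
    mean_a = sum(returns_a) / n
    mean_b = sum(returns_b) / n
    a = [r - mean_a for r in returns_a]
    b = [r - mean_b for r in returns_b]

    # Sample covariance matrix [[saa, sab], [sab, sbb]].
    saa = sum(x * x for x in a) / (n - 1)
    sbb = sum(y * y for y in b) / (n - 1)
    sab = sum(x * y for x, y in zip(a, b)) / (n - 1)

    # Closed-form eigenvalues of a symmetric 2x2 matrix; they sum to the trace.
    half_trace = (saa + sbb) / 2
    root = math.sqrt(((saa - sbb) / 2) ** 2 + sab ** 2)
    explained = (half_trace + root) / (2 * half_trace)

    # Unit vector along the first component and the one orthogonal to it.
    theta = 0.5 * math.atan2(2 * sab, saa - sbb)
    pc1 = (math.cos(theta), math.sin(theta))
    pc2 = (-pc1[1], pc1[0])
    errors = [x * pc2[0] + y * pc2[1] for x, y in zip(a, b)]
    return explained, errors


def lag1_autocorr(series):
    """Lag-1 sample autocorrelation of a series."""
    n = len(series)
    mean = sum(series) / n
    num = sum((series[t] - mean) * (series[t - 1] - mean) for t in range(1, n))
    den = sum((x - mean) ** 2 for x in series)
    return num / den


# Synthetic pair: a shared driver plus small stock-specific noise (made-up data).
rng = random.Random(42)
common = [rng.gauss(0, 0.01) for _ in range(400)]
stock_a = [c + rng.gauss(0, 0.003) for c in common]
stock_b = [c + rng.gauss(0, 0.003) for c in common]

explained, errors = pair_pca_stats(stock_a, stock_b)
print(f"variance explained by first component: {explained:.1%}")
print(f"lag-1 autocorrelation of daily error: {lag1_autocorr(errors):.3f}")
```

A pair would pass the two screens described above if the first number is above 80% and the second is below -0.1. Purely random noise, as used here, will not normally show the negative autocorrelation that a real mean-reverting pair does.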


Example:

I have taken the ICICIBANK-AXISBANK pair to illustrate the method. The data runs from October 2012 to May 2014, a total of 400 days. The following is a plot of the cumulative returns of these stocks since October 2012. We can see that these stocks tend to move together.

The following is the plot of normalized difference in the cumulative returns for these two stocks: 

This spread looks mean reverting. Using the ADF test we can see that the spread is stationary at the 99% confidence level. Now we apply PCA to this pair. Demeaned daily returns of these stocks are calculated and the principal components are estimated. The following is a plot of the principal components for the given pair:

We see that the variance explained by the first principal component is around 86%, which is high. The auto-correlation of the daily error is around -0.1 (significant at the 95% level). This means that the error is oscillating in nature. We can conclude that these stocks form a good pair.
To identify trade entry points we look at the distribution of returns around the first principal component:


Whenever the cumulative error over the last four days (shown in red) goes above a threshold (shown in green), the corresponding mean-reverting position is established in the pair.
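
The entry rule can be sketched as a rolling sum compared against a threshold. This is a hedged illustration: the function name `entry_signals`, the threshold value and the symmetric treatment of negative divergences are assumptions, since the post only describes the error going above a threshold.

```python
def entry_signals(daily_error, n_days=4, threshold=0.025):
    """Return one signal per day based on the rolling n-day sum of daily error.

    -1: the sum is above +threshold, so bet on the error falling back.
    +1: the sum is below -threshold, so bet on the error rising back
        (assumed mirror image of the rule above).
     0: no trade, including the first n_days - 1 days without a full window.
    """
    signals = []
    for t in range(len(daily_error)):
        if t + 1 < n_days:
            signals.append(0)
            continue
        window_sum = sum(daily_error[t - n_days + 1 : t + 1])
        if window_sum > threshold:
            signals.append(-1)
        elif window_sum < -threshold:
            signals.append(1)
        else:
            signals.append(0)
    return signals


# Made-up error series: a run of positive errors, a quiet patch, a run of negative errors.
example_error = [0.01] * 4 + [0.0] * 4 + [-0.01] * 4
print(entry_signals(example_error))
```

In practice the threshold would be calibrated per pair, for example as a multiple of the standard deviation of the rolling sum, and the exit rules (book profit, stop loss, maximum holding period) would be layered on top.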

As per the back test, the above algorithm seems promising. Still, there are some things we need to keep in mind which can undermine the accuracy of our trading model:
  • As PCA ignores the mean value of returns, we might end up trading on a non-stationary spread. This can be handled by ignoring pairs in which the constituent stocks have significantly different average returns over the look-back period.
  • This approach looks at short-term divergences only. It ignores the traditional long-term divergences on which many co-integration based pair trading models are built. This can be partially tackled by using multiple look-back periods (longer and shorter) for error identification.
  • The correlation of the spread with the market needs to be taken into account before entering any position.

Monday, June 9, 2014

Momentum and Volatility factors

In a previous post on Fama-French factors we discussed the Market, HML (returns of high B/M stocks over low B/M stocks) and SMB (returns of small market cap stocks over large market cap stocks) factors. In this post we will discuss two factors which are very popular among quant traders. These factors have been studied extensively by academics as well.

Momentum factor (MOM):
This factor represents return continuation over a medium-term horizon. In the simplest terms, stocks which have outperformed over the last year tend to continue outperforming in the near future. This factor was first discussed by Jegadeesh and Titman in 1993. Since then a lot of research has been done on this factor. Numerous studies have claimed that over a long horizon, this factor tends to generate considerable alpha.

Volatility factor (VOL):
This factor describes the future performance of stocks based on their historical volatility. Stocks with a high level of volatility underperform stocks with a low level of volatility. This factor goes against the traditional wisdom that high returns are compensation for high risk (volatility). In my view this is a very useful factor for stock selection strategies.

Methodology for factor calculation:
  1. Every day, the top 200 stocks (in terms of market capitalization) are chosen. This is done to avoid survivorship bias.
  2. These chosen stocks are sorted based on their returns over the past 252 trading days. The top and bottom quartiles of stocks are selected from this list.
  3. The equal-weighted outperformance of the top stocks (highest returns) over the bottom stocks (lowest returns) for the next day is the return of the MOM factor for the next day.
  4. The above three steps are repeated for all the days to get a time series of MOM daily returns.
  5. The cumulative sum of these daily returns is the MOM index.
Similarly, to calculate the VOL factor, sorting is done based on volatility over the past 252 days. Daily returns are then the outperformance of low volatility stocks (bottom 25%) over high volatility stocks (top 25%).
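
Steps 2, 3 and 5 can be sketched as follows. This is a minimal illustration on made-up numbers: the function names `long_short_factor_return` and `factor_index`, the dict-based inputs and the eight-stock universe are assumptions, and the daily top-200 selection from step 1 is taken as already done.

```python
from itertools import accumulate


def long_short_factor_return(sort_values, next_day_returns, fraction=0.25):
    """Equal-weighted next-day return of the top group minus the bottom group.

    sort_values: dict ticker -> sorting variable (e.g. past 252-day return).
    next_day_returns: dict ticker -> next-day return.
    fraction: share of the universe in each group (0.25 = quartiles).
    """
    ranked = sorted(sort_values, key=sort_values.get)  # ascending order
    k = max(1, int(len(ranked) * fraction))
    bottom, top = ranked[:k], ranked[-k:]

    def average(names):
        return sum(next_day_returns[name] for name in names) / len(names)

    return average(top) - average(bottom)


def factor_index(daily_factor_returns):
    """Cumulative sum of daily factor returns (step 5)."""
    return list(accumulate(daily_factor_returns))


# Made-up universe of eight stocks.
past_return = {"A": 0.40, "B": 0.25, "C": 0.10, "D": 0.05,
               "E": 0.00, "F": -0.05, "G": -0.15, "H": -0.30}
past_vol = {"A": 0.35, "B": 0.20, "C": 0.18, "D": 0.22,
            "E": 0.15, "F": 0.28, "G": 0.40, "H": 0.45}
next_day = {"A": 0.012, "B": 0.004, "C": 0.001, "D": -0.002,
            "E": 0.003, "F": -0.001, "G": -0.006, "H": -0.010}

mom_return = long_short_factor_return(past_return, next_day)
# For VOL the long leg is the LOW volatility group, so sort on negated volatility.
vol_return = long_short_factor_return({t: -v for t, v in past_vol.items()}, next_day)
print(mom_return, vol_return)
```

Repeating the call for every day and passing the resulting list to `factor_index` gives the index series.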

Factor performance in the Indian stock market:
Following is the behavior of these factors since 2006. The first 253 days show zero returns due to the look-back needed to compute these factors.



MOM has not generated any significant alpha over the last 6 years. There is a big drawdown in 2009, when the market was recovering from the 2008 crash. This post-crisis failure of momentum strategies is a well-known phenomenon known as momentum crashes.
VOL has been consistently generating alpha over the last 6 years. Similar to the MOM factor, the VOL factor tends to suffer whenever there is a spike in the market.


Randomness tests:
Following are the results of some randomness tests on these factors:

| Test | Parameter | Random Walk | MOM factor | VOL factor |
| --- | --- | --- | --- | --- |
| ACF | Lag 1 | 0 | 0.25 | 0.15 |
| Runs test | Number of runs | 913 | 794 | 803 |
| Variance ratio test | Variance ratio (period=2) | 1 | 1.25 | 1.15 |
| Variance ratio test | Variance ratio (period=5) | 1 | 1.55 | 1.28 |

    These factors show significant positive auto-correlation. The number of runs is also well below that of a random walk, indicating trending behavior. The variance ratio is well above 1, indicating mean-averting properties. Looking at the results of the above tests, we can reasonably conclude that the MOM and VOL factors are trending in nature. This is an important conclusion, as it can be used to predict the future market regime.
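
The three statistics in the table can be computed along the following lines. This is a rough sketch, not the exact procedure behind the numbers above: the runs count treats zero returns as down moves, the variance ratio uses overlapping sums with no small-sample correction, and no significance levels are computed. The function names are hypothetical.

```python
def lag1_autocorr(returns):
    """Lag-1 sample autocorrelation."""
    n = len(returns)
    mean = sum(returns) / n
    num = sum((returns[t] - mean) * (returns[t - 1] - mean) for t in range(1, n))
    den = sum((r - mean) ** 2 for r in returns)
    return num / den


def count_runs(returns):
    """Number of runs of consecutive up (positive) or down (non-positive) days."""
    ups = [r > 0 for r in returns]
    return 1 + sum(1 for prev, cur in zip(ups, ups[1:]) if prev != cur)


def variance_ratio(returns, period):
    """Variance of overlapping `period`-day sums over `period` times the 1-day variance.

    Roughly 1 for a random walk, above 1 for trending series,
    below 1 for mean-reverting series.
    """
    n = len(returns)
    mean = sum(returns) / n
    var_1 = sum((r - mean) ** 2 for r in returns) / n
    sums = [sum(returns[t : t + period]) for t in range(n - period + 1)]
    var_q = sum((s - period * mean) ** 2 for s in sums) / len(sums)
    return var_q / (period * var_1)


# Two extreme made-up series: one that flips sign every day, one that trends.
flipping = [1.0, -1.0] * 50
trending = [1.0] * 50 + [-1.0] * 50
for name, series in (("flipping", flipping), ("trending", trending)):
    print(name, lag1_autocorr(series), count_runs(series), variance_ratio(series, 2))
```

A trending series shows positive autocorrelation, few runs and a variance ratio above 1, which is the pattern the MOM and VOL factors display relative to the random walk column.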

    Why bother with these factors?

  • The most important use of these factors is in alpha generation. With proper modifications these factors can be used to generate considerable returns in a capital neutral fashion.
  • These factors can be used for market regime identification. They can also be used to affirm bullish or bearish trends in the market (due to their negative cross-correlation with the market). In a true bull market, these factors are going to show a drawdown.
  • Returns of various trading strategies can be regressed against the returns of these factors to understand if the strategies are betting on a specific type of risk to generate alpha.
  • These factors show significant positive autocorrelation, so money allocation to the strategies which use these factors can be dynamically altered based on their recent performance. For example, a long VOL factor strategy should be allocated less money when the VOL factor is falling, as the trend is likely to continue.

Wednesday, May 7, 2014

    Fama French factors in Indian stock markets

    The Fama-French three-factor model has been widely used throughout the world to identify the risk and return characteristics of various investment strategies. It uses a three-factor approach (in contrast to the one-factor approach of CAPM) to decompose the returns of a stock into components driven by market-wide factors. These three factors are:
    • Market returns(MKT)
    • Returns of High B/M stocks over low B/M stocks(HML)
    • Returns of small market capitalization stocks over large market capitalization stocks(SMB)
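    Given these three factors, the returns of a stock i on day t can be mathematically written in the standard three-factor regression form:

    $$r_{i,t} = \alpha_i + \beta_i^{MKT}\,MKT_t + \beta_i^{HML}\,HML_t + \beta_i^{SMB}\,SMB_t + \epsilon_{i,t}$$

    Here the betas are the stock's loadings on the three factors, α is the return not explained by them and ε is the residual.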

    Why bother with Fama French?
    • It can be used to measure and control portfolio risk in a more holistic manner as it takes into account two more factors apart from market beta.
    • It can be used to identify the investment styles of various fund managers by regressing the returns of their portfolios on factor returns
    • As investors think differently during different times, behavior of HML and SMB can be used for regime identification and classification
    • The time series properties of these factors can be used to create long short trading ideas to generate alpha
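
The regression of portfolio returns on factor returns mentioned above can be sketched with ordinary least squares. This is a dependency-free illustration that solves the normal equations directly; the function name `ols` and all the return series are made up, and in practice a statistics library that also reports standard errors would normally be used instead.

```python
import random


def ols(y, factor_columns):
    """Least-squares fit of y on an intercept plus the given factor columns.

    Returns [alpha, beta_1, ..., beta_k], solved from the normal equations.
    """
    rows = [[1.0] + [col[t] for col in factor_columns] for t in range(len(y))]
    p = len(rows[0])

    # Augmented normal equations: [X'X | X'y].
    aug = [
        [sum(r[i] * r[j] for r in rows) for j in range(p)]
        + [sum(r[i] * y_t for r, y_t in zip(rows, y))]
        for i in range(p)
    ]

    # Gauss-Jordan elimination with partial pivoting.
    for c in range(p):
        pivot = max(range(c, p), key=lambda r: abs(aug[r][c]))
        aug[c], aug[pivot] = aug[pivot], aug[c]
        for r in range(p):
            if r != c:
                f = aug[r][c] / aug[c][c]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[c])]
    return [aug[i][p] / aug[i][i] for i in range(p)]


# Made-up daily factor returns and a portfolio with known exposures plus noise.
rng = random.Random(7)
mkt = [rng.gauss(0, 0.012) for _ in range(500)]
hml = [rng.gauss(0, 0.006) for _ in range(500)]
smb = [rng.gauss(0, 0.006) for _ in range(500)]
portfolio = [0.0002 + 0.9 * m + 0.4 * h - 0.2 * s + rng.gauss(0, 0.004)
             for m, h, s in zip(mkt, hml, smb)]

alpha, b_mkt, b_hml, b_smb = ols(portfolio, [mkt, hml, smb])
print(f"alpha={alpha:.5f} MKT={b_mkt:.2f} HML={b_hml:.2f} SMB={b_smb:.2f}")
```

The estimated betas describe the style of the portfolio: a large positive HML loading points to a value tilt, and a large positive SMB loading points to a small cap tilt.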

    Fama French in Indian context:
    The analysis is done using data from Jan 2006 to Apr 2014. Every day, stocks are sorted based on their market capitalization and the top 200 stocks are picked for further analysis. This step is done to minimize survivorship bias. Within this list, stocks are once again sorted on their B/M. The difference between the returns of the top and bottom quartiles of this sorted list is the return of the HML index for the day. Similarly, sorting based on market capitalization is performed on this list of 200 stocks to generate the returns of the SMB index for the day. These steps are repeated for all the days to generate the HML and SMB indices. CNX Nifty is used as a proxy for MKT. The following is the performance of these three factors since 2006.


    As we can see, these three factors are not independent of each other. Extreme movements in one factor typically correspond with the extreme movements in other factors. An example of this would be the rally of 2009 when high B/M stocks outperformed low B/M stocks. Also during the same time small market cap stocks heavily outperformed large market cap stocks. This means that even a market factor neutral portfolio can be very volatile during sharp moves in markets(as other factor exposure might not be zero). 

    Correlation matrix (values significant at the 95% level are marked in orange):

    |     | MKT | HML  | SMB   |
    | --- | --- | ---  | ---   |
    | MKT |     | 0.46 | -0.40 |
    | HML |     |      | -0.01 |
    | SMB |     |      |       |


    Auto-correlation function (values significant at the 95% level are marked in orange):

    |     | Lag 1 | Lag 2 | Lag 3 |
    | --- | ---   | ---   | ---   |
    | MKT | 0.05  | -0.01 | -0.03 |
    | HML | 0.15  | 0.06  | 0.03  |
    | SMB | 0.07  | -0.03 | -0.06 |

    It is clear that the factors (in particular HML) show significant positive serial correlation and hence are very likely to exhibit momentum characteristics. This means style-ignorant short-term reversion strategies can suffer during sharp trends in these factors.

    K-means clustering based stock classification

    K-means clustering is one of the simplest techniques used for classification. It partitions n observations into k clusters S1, ..., Sk in which each observation belongs to the cluster with the nearest center. Mathematically, K-means clustering tries to find the set of centers μ that minimizes the following expression:

    $$\sum_{i=1}^{k} \sum_{x \in S_i} d(x, \mu_i)$$

    Here d(x,y) is the distance function. Typical distance functions used are squared Euclidean, sum of absolute differences and correlation. μi is the center (mean/median as per the definition of the distance function) of the observations in Si.
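
A minimal sketch of the standard iterative procedure (Lloyd's algorithm) for the squared Euclidean case is given below. It is an illustration only: the function names, the random initialization from the data points and the toy data are assumptions, and a library implementation with multiple restarts would normally be preferred.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, n_iter=100, seed=0):
    """Alternate between assigning points to the nearest center and
    moving each center to the mean of its assigned points.

    Returns (labels, centers). Stops early once the labels stop changing.
    """
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]  # initial centers
    labels = None
    for _ in range(n_iter):
        new_labels = [min(range(k), key=lambda i: sq_dist(p, centers[i]))
                      for p in points]
        if new_labels == labels:
            break
        labels = new_labels
        for i in range(k):
            members = [p for p, lab in zip(points, labels) if lab == i]
            if members:  # an empty cluster keeps its previous center
                centers[i] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers


# Toy data: each row stands in for one stock's vector of normalized daily returns.
toy_points = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0],
              [10.0, 10.0], [11.0, 10.0], [10.0, 11.0]]
labels, centers = kmeans(toy_points, k=2)
print(labels)
print(centers)
```

For stock classification, each observation would be one stock and its coordinates would be that stock's normalized daily returns over the sample period.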

    In line with my previous post on Factor analysis based stock classification, we will attempt to classify stocks into groups to uncover hidden trends if any exists.

    Classification of LIX15 stocks:
    LIX15 is an Indian equity market index that consists of 15 highly liquid stocks traded on NSE. The observations matrix consists of normalized daily returns of these 15 stocks sampled from February to November 2013. K-means clustering is applied on the data using squared euclidean distance function. Following is the result of a two cluster classification:

    | Cluster 1  | Cluster 2  |
    | ---        | ---        |
    | AXISBANK   | CAIRN      |
    | BANKBARODA | MCDOWELL-N |
    | HINDALCO   | TATAMOTORS |
    | IDFC       |            |
    | JINDALSTEL |            |
    | JPASSOCIAT |            |
    | JSWSTEEL   |            |
    | MARUTI     |            |
    | RCOM       |            |
    | SBIN       |            |
    | TATASTEEL  |            |
    | YESBANK    |            |


    The result is clusters of disproportionate size with non-obvious interpretations. Interestingly enough, the stocks in cluster 2 are the stocks which did not show any significant loading on factors during the factor analysis. Hence, prima facie, k-means has classified the LIX15 constituents into two groups: one that moved with the broad market and another which exhibited heavy idiosyncratic movements during the analysis period. Following is the outcome of a three cluster classification:

    | Cluster 1  | Cluster 2  | Cluster 3  |
    | ---        | ---        | ---        |
    | CAIRN      | AXISBANK   | MCDOWELL-N |
    | HINDALCO   | BANKBARODA | TATAMOTORS |
    | JINDALSTEL | IDFC       |            |
    | JPASSOCIAT | MARUTI     |            |
    | JSWSTEEL   | SBIN       |            |
    | RCOM       | YESBANK    |            |
    | TATASTEEL  |            |            |



    The clusters roughly correspond to sectoral themes.


    | Cluster   | Fundamental theme                    |
    | ---       | ---                                  |
    | Cluster 1 | Metal stocks                         |
    | Cluster 2 | Financial services stocks            |
    | Cluster 3 | Erratic/heavily idiosyncratic stocks |

    Classification of BANKNIFTY stocks:
    As with the LIX15 analysis, a two cluster classification is performed on the BANKNIFTY constituents. Following are the resulting clusters:

    | Cluster 1  | Cluster 2  |
    | ---        | ---        |
    | AXISBANK   | BANKBARODA |
    | HDFCBANK   | BANKINDIA  |
    | ICICIBANK  | CANBK      |
    | INDUSINDBK | PNB        |
    | KOTAKBANK  | SBIN       |
    | YESBANK    | UNIONBANK  |

    The fundamental interpretation of the resulting clusters is quite clear. 


    | Cluster   | Fundamental theme    |
    | ---       | ---                  |
    | Cluster 1 | Private sector banks |
    | Cluster 2 | Public sector banks  |

    Conclusion:
    Using clustering techniques, we have been able to group stocks. These groupings tend to convey a particular fundamental meaning. Among the LIX15 constituents the major classification is along sectoral lines. Among the BANKNIFTY constituents the classification lies along public vs private ownership lines. These conclusions are in line with the ones obtained from the factor analysis based classification of stocks.

    HFT and algorithmic trading in India

    Algorithmic trading was introduced in India in 2009 and within a span of 5 years it has gained a considerable share of the market. Some facts and developments in the algorithmic trading scenario in India:
    • Order percentage: Orders received from co-location in the cash market segment are around 70% of the total cash orders. In the derivatives segment the number is even higher, around 95%. For the currency derivatives segment the number is of the order of 25%.
    • Volume percentage: In India, algorithmic trading accounts for about 20-30% of the total trading volume, which is much lower than in the US (60-70%) and Europe (40-50%).
    • Latency: Round trip latency is the time taken from the initiation of an order to the receipt of its confirmation. Round trip latency for co-location based connections is around 2 ms. This number should be contrasted with the latency of a leased line, which is around 30 ms, and that of a VSAT connection, which is around 700 ms.
    • Strategies: In the initial days algorithmic trading was used to exploit arbitrage opportunities. Subsequently algo traders ventured into speculative trading (market making, statistical arbitrage etc.). Lately, institutional investors have started using algorithmic trading platforms for efficient trade execution (buying/selling large quantities of stocks with minimal impact costs).
    • Regulations: SEBI has taken many proactive measures to regulate algorithmic trading. These measures include risk control checks at the exchange's and the trader's end, half-yearly audits of trading systems, pre-approval of strategies by exchanges, penalties on a high daily order/trade ratio etc.