March 31, 2011

Information Criteria in STATA

As yesterday's exercise illustrated, you might find yourself in a situation where you wonder how many lags to use when you come up with an autoregression (AR) model. This is an important issue in economic modeling because, as much as we would like to put more variables in a model to capture the behavior of the dependent variable realistically, introducing more variables also introduces more estimation error. This modeling philosophy of parsimony was popularized by Box and Jenkins (1976, Time Series Analysis: Forecasting and Control, Holden-Day), who advocated using as few parameters as possible in modeling (the popularity of the parsimony philosophy at the time also coincided with Robert Lucas' famous critique).

There are many lag-order selection statistics, or information criteria, out there. The four most famous are:

1. Final prediction error (FPE) created by Hirotsugu Akaike (1969, "Fitting autoregressive models for prediction," Annals of the Institute of Statistical Mathematics, 21:243-47):

$\mathrm{FPE} = \hat{\sigma}^2 \, \frac{N+p}{N-p}$

2. Akaike information criterion (AIC) also created by Akaike (1974, "A new look at the statistical model identification," IEEE Transactions on Automatic Control, 19(6):716-23):

$\mathrm{AIC} = -2\ln L + 2(p+q)$

3. Bayesian information criterion (BIC) created by Gideon E. Schwarz (1978, "Estimating the dimension of a model," Annals of Statistics, 6(2):461-64):

$\mathrm{BIC} = -2\ln L + (p+q)\ln N$

4. Hannan-Quinn information criterion (HQC) created by Edward J. Hannan and Barry G. Quinn (1979, "The determination of the order of an autoregression," Journal of the Royal Statistical Society, Series B, 41(2):190-95):

$\mathrm{HQC} = -2\ln L + 2(p+q)\ln(\ln N)$

For all formulas above, $p$ is the number of AR lags, $q$ is the number of moving average (MA) lags (yes, these statistics are applicable to ARMA models), $\ln L$ is the maximized value of the log-likelihood function, $N$ is the number of observations, and $\hat{\sigma}^2$ is the maximum likelihood estimate of the innovation variance.

The FPE is used primarily for AR models, whereas the last three are for general ARMA models. As you can see, those three are similar in the sense that they basically contain two terms: the first term captures the advantage of having more variables, in that the model's fit goes up; the second term captures the disadvantage--this is where the philosophy of parsimony kicks in. In all four, the lowest value indicates the most appropriate number of lags.
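
To see how differently the last three penalize complexity, take the sample size we will use below, $N = 600$: each additional lag then costs

$\mathrm{AIC}: 2, \qquad \mathrm{BIC}: \ln 600 \approx 6.40, \qquad \mathrm{HQC}: 2\ln(\ln 600) \approx 3.71,$

so BIC imposes the heaviest penalty per parameter and will tend to pick the most parsimonious model.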

As I showed yesterday, you can easily obtain these statistics in STATA with the command VARSOC. But if you want to calculate these statistics directly in STATA, read on. To illustrate, suppose we simulate an ARMA(2,2) process exactly as we did before:


set seed 1
sim_arma y, arcoef(.66666667 .11111111) macoef(.25 .25) et(e) nobs(600) sigma(1) time(time)

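
As a quick aside, if all you need is the AR lag order, VARSOC alone will tabulate FPE, AIC, HQIC, and SBIC for every lag length up to a chosen maximum. A minimal sketch on the simulated series, where the maximum of 8 lags is an arbitrary choice:


varsoc y, maxlag(8)
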

Since we simulated the data this way, we know that the correct model should have two AR lags and two MA lags. So let's check three variants of this model: the correct specification; a specification with only one MA lag; and a specification with three MA lags. Throughout, we use the correct number of AR lags (that is, two).

We first estimate using the ARIMA command, which uses the maximum likelihood method:


arima y, ar(1/2) ma(1/2) nocons

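
Incidentally, STATA's postestimation command ESTAT IC will also report AIC and BIC right after ARIMA. Note, though, that it counts every estimated parameter in its penalty term, including the innovation variance, so its values differ from the hand calculations below by a constant (the ranking of models is unaffected):


estat ic
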

You just replace ma(1/2) with ma(1/1) for one lag or ma(1/3) for three lags. Then, after each estimation, we calculate the information criteria. To calculate AIC:


di ((-2)*e(ll))+(2*(e(ar_max)+e(ma_max)))


To calculate BIC:


di ((-2)*e(ll))+((e(ar_max)+e(ma_max))*(ln(e(N))))


Finally, to calculate HQC:


di ((-2)*e(ll))+(2*(e(ar_max)+e(ma_max))*(ln(ln(e(N)))))


These commands simply display results from the estimation stored by STATA: e(ll) is the maximized log-likelihood value, e(ar_max) is the number of AR lags, e(ma_max) is the number of MA lags, and e(N) is the number of observations. The resulting statistics are compiled in the table below:

[Table: AIC, BIC, and HQC for the specifications with one, two, and three MA lags]

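
If you would rather not rerun each specification by hand, here is a sketch of a foreach loop that estimates all three models and prints the three criteria in one pass (same formulas as above, just automated):


* loop over one, two, and three MA lags, keeping two AR lags throughout
foreach q of numlist 1/3 {
    quietly arima y, ar(1/2) ma(1/`q') nocons
    local k = e(ar_max) + e(ma_max)    // total number of ARMA lags
    display "MA lags = `q':" ///
        "  AIC = " %9.3f (((-2)*e(ll)) + 2*`k') ///
        "  BIC = " %9.3f (((-2)*e(ll)) + `k'*ln(e(N))) ///
        "  HQC = " %9.3f (((-2)*e(ll)) + 2*`k'*ln(ln(e(N))))
}
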
Based on the results above, the model with two MA lags has the lowest value on all three criteria, which means that we should use two MA lags. This is not unexpected, as the data we simulated do follow a two-AR-lag/two-MA-lag process.

In closing, it's also easy to create an information criterion of your own, provided that your proposed formula captures two things: the advantages of having more variables, and a penalty for having more variables. I even created one of my own. I call it the Newey-Akaike information criterion, or NAIC. It's a cool name since the acronym also looks like an anagram of my last name. The other reason for the name is that the criterion I propose is what I think is a mix of the AIC and a formula for the lag-selection parameter that I adapted from Newey and West (1994)--so the "N" is for Newey and West and the "A" is for Akaike:

$\mathrm{NAIC} = -2\ln L + 2(p+q)\ln\left(3\left(\frac{N}{100}\right)^{2/25}\right)$

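For a sense of scale, at $N = 600$ the per-parameter penalty works out to $2\ln\left(3 \cdot 6^{2/25}\right) \approx 2.48$: slightly heavier than AIC's 2 but much lighter than BIC's 6.40, so NAIC should behave like a mildly more conservative AIC at this sample size.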
NAIC captures the advantages of having more variables (the first term, which is no different from the others) and the disadvantage of having them (the second term). We can go through the same process again, but using the following command to calculate NAIC:


di ((-2)*e(ll))+(2*(e(ar_max)+e(ma_max))*ln(3*((e(N)/100)^(2/25))))


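Again, the same loop sketch as before works here, running the three specifications and printing NAIC for each:


* NAIC for one, two, and three MA lags, two AR lags throughout
foreach q of numlist 1/3 {
    quietly arima y, ar(1/2) ma(1/`q') nocons
    local k = e(ar_max) + e(ma_max)
    display "MA lags = `q':  NAIC = " ///
        %9.3f (((-2)*e(ll)) + 2*`k'*ln(3*((e(N)/100)^(2/25))))
}
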
The result of using this proposed criterion is as follows:

[Table: NAIC for the specifications with one, two, and three MA lags]

As it turns out, the results above show that NAIC also works as a criterion: it indicates that the appropriate number of MA lags is two, having the lowest value there--just as it should.