
PANEL obtains estimates of linear regression models for panel data (several observations or time periods for each individual). Total, between-groups, within-groups, and variance components estimates may be obtained. In addition, one- and two-way random effects models may be estimated by maximum likelihood. The data may be unbalanced (a different number of observations per individual). PANEL can also compute means by group and perform F tests between groups.

PANEL (ALL, BETWEEN, BYID, FEPRINT, HCOMEGA=BLOCK or DIAGONAL, HCTYPE=0 or 1, ID=<id series>, MEAN, PRINT, REG, REI, REIT, ROBUST, SILENT, T=<number of time periods>, TERSE, TIME=<time series>, TOTAL, VARCOMP, VBET=<between variance>, VSMALL, VWITH=<within variance>, WITHIN, Nonlinear options) <dependent variable> <list of independent variables> ;


The basic PANEL statement is like the OLSQ statement: first list the dependent variable and then the independent variables. C is optional; an intercept term is central to these models and will be added if it is not present. You may have as many independent variables as you like subject to the overall limits on the number of arguments per statement and the amount of working space, as well as the number of data observations you have available. The observations over which the models are computed are determined by the current sample. PANEL treats missing values, lags, and leads correctly. That is, lags and leads are applied only within an individual.
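To illustrate what "lags are applied only within an individual" means, here is a minimal NumPy sketch (not TSP code, and not PANEL's internal implementation). The function name `lag_within` is hypothetical, and the sketch assumes the rows are already grouped by individual with consecutive time periods.

```python
import numpy as np

def lag_within(ids, x, k=1):
    """Lag x by k >= 1 periods without crossing individual boundaries.

    Assumes rows are grouped by individual and ordered by time with no gaps.
    Positions whose k-th predecessor belongs to another individual get NaN.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    ids = np.asarray(ids)
    x = np.asarray(x, dtype=float)
    out = np.full(x.shape, np.nan)
    # An observation keeps its lag only if the row k places earlier has the same ID.
    same = ids[k:] == ids[:-k]
    out[k:] = np.where(same, x[:-k], np.nan)
    return out

lagged = lag_within([1, 1, 1, 2, 2], [10, 11, 12, 20, 21])
# The first observation of each individual has no valid lag.
```

The first row of every individual is treated as missing rather than borrowing the last value of the previous individual.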

Your data must be set up with all the time periods for each individual together. If the data are not in this order, the SORT command can be used to reorder them; you could also sort the data by year and then individual if you wish to do variance components in the time dimension. Additionally, you must specify where the observations for one individual end and the data for the next individual begin. The default method is to provide a series named @ID which takes on different values for each individual. If your data are balanced (the same number of time periods for every individual), the T= option can be used instead. Usually it is best to use the FREQ (PANEL) command at the top of your run to specify the ID variable, internal frequency, starting date, etc. These settings will then be used for all PANEL, AR1, GENR, etc. commands within the run.

The models you wish to estimate are specified in the options list. The default is to estimate the total, between, within, and variance components models. For the VARCOMP (random effects) model, there are additional options that specify how to compute the variance components. Small- or large-sample formulas may be used, or the user can supply the values directly. If negative variances are computed using the small-sample method, the method switches over to the large-sample formulas, which always result in positive values. PANEL also computes a Hausman test for correlated effects by comparing the WITHIN (fixed effects) and VARCOMP (random effects) estimators.

The REI and REIT options are used to obtain maximum likelihood estimates of the one-way and two-way random effects models, respectively.


The output begins with a title and a summary of the panel structure: number of individuals (NI), number of time periods (T), and total number of observations (NOB). If the data are unbalanced, TMIN and TMAX will be printed. For each estimator, a table of regression coefficients and their standard errors is printed, along with the name of the dependent variable, the sum of squared residuals, the standard error of the regression, the mean and standard deviation of the dependent variable, R-squared, and adjusted R-squared.

Other output varies by estimator. If the data are unbalanced, the Ahrens-Pincus measure of the degree of unbalancedness is also printed. This measure equals one for balanced data; values less than one indicate how far the data are from balanced. See the method section for the definition of this statistic and the Ahrens and Pincus reference for details on its interpretation.

MEAN prints a table of means for each individual. @MEAN (#obs*#vars) is stored, and excludes any constant term.

BYID prints an F test vs. TOTAL (labelled F-stat for A,B=Ai,Bi), and an F test vs. WITHIN (labelled F-stat for Ai,B=Ai,Bi), in the output of the respective estimators. Only @COEFI (the individual coefficient estimates), @LOGLI, and @SSRI (the individual sums of squared residuals) are stored. Use the PRINT option to print @COEFI.

WITHIN prints an F test vs. TOTAL (labelled F-stat for A,B=Ai,B), and stores the fixed effects in @FIXED.

VARCOMP prints the actual variance components, the method used to compute them, and the implied differencing factor (THETA). A Hausman specification test comparing VARCOMP (null hypothesis) and WITHIN is computed.
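As an illustration of the Hausman comparison, the following NumPy sketch computes the textbook form of the statistic from two sets of estimates. This is an assumption about the general technique, not a statement of PANEL's exact computation, which this section does not spell out. The function name `hausman_stat` and its arguments are hypothetical; the inputs are the coefficients common to both estimators (the constant is excluded) and their covariance matrices.

```python
import numpy as np

def hausman_stat(b_within, v_within, b_varcomp, v_varcomp):
    """Textbook Hausman statistic d' (V_w - V_v)^(-1) d with d = b_w - b_v.

    Returns the statistic and its degrees of freedom. A pseudo-inverse is
    used so that a rank-deficient covariance difference does not fail;
    the degrees of freedom are taken as the rank of that difference.
    """
    d = np.asarray(b_within, float) - np.asarray(b_varcomp, float)
    dv = np.asarray(v_within, float) - np.asarray(v_varcomp, float)
    stat = float(d @ np.linalg.pinv(dv) @ d)
    df = int(np.linalg.matrix_rank(dv))
    return stat, df

stat, df = hausman_stat([1.0, 2.0], np.diag([0.5, 0.3]),
                        [0.5, 2.0], np.diag([0.25, 0.2]))
```

Under the null hypothesis (VARCOMP is consistent and efficient), the statistic is compared with a chi-squared distribution with `df` degrees of freedom.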

PANEL stores the standard regression results in data storage for later use under the usual @names, but with B, T, V, W, REI, or REIT appended to distinguish between the different estimators. For example, @COEFW contains the within coefficients, @RESW the within residuals, and @SSRV the sum of squared residuals from VARCOMP. @RESB is a matrix.

In the table below, #vars is equal to the number of right hand side variables plus one (for the constant) for the T, B, W, and V estimators. For the REI estimator, #vars also includes the estimates of RHO_I (the within group correlation) and SIGMA2 (the total standard error). For the REIT estimator, #vars also includes the estimates of RHO_I, RHO_T (the within time correlation), and SIGMA2 (the total standard error).








Name of the dependent variable
Sum of squared residuals (@SSRI=BYID, etc.)
Variance of residuals (@S2B=BETWEEN, etc.)
Standard error of the regression
Mean of the dependent variable
Standard deviation of the dependent variable
Number of observations
Ahrens-Pincus unbalancedness in i
Ahrens-Pincus unbalancedness in t
Adjusted R-squared
Number of coefficients
Number of identified coefficients (number with non-zero standard errors)
LM heteroskedasticity test
P-value of LM heteroskedasticity test
Durbin-Watson autocorrelation test
Upper bound on P-value of DW
Lower bound on P-value of DW
Value of the log likelihood
Schwarz-Bayes information criterion
Akaike information criterion
Hausman test value
Hausman test p-value
Hausman test degrees of freedom
List of names of right hand side variables
Coefficient estimates
Standard errors
Estimated fixed effects
Standard errors on fixed effects
t-statistics on fixed effects
p-values associated with @TAI
Fixed effect estimates as a series
Variance-covariance of estimated coefficients
Residuals = actual - fitted values of the dependent variable


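The model estimated is

y(i,t) = A(i) + X(i,t)*B + e(i,t),   i = 1,...,NI;  t = 1,...,T(i)

where A(i) is the individual effect, B the vector of slope coefficients, and e(i,t) the disturbance.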

PANEL computes means for each variable by individual. These are used directly in the BETWEEN regression. WITHIN subtracts the individual means from each variable and runs a regression on this transformed data (any variables which are constant over time for every individual are not identified).
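The between and within transformations described above can be sketched in NumPy as follows. This is an illustrative sketch, not PANEL's code (PANEL uses orthonormalized data matrices rather than a generic least squares call). The function names `group_means` and `between_within` are hypothetical, and `X` is assumed to be a two-dimensional array without a constant column.

```python
import numpy as np

def group_means(ids, Z):
    """Per-individual means of the columns of Z, plus the means expanded to every row."""
    ids = np.asarray(ids)
    Z = np.asarray(Z, dtype=float)
    uniq, inv, counts = np.unique(ids, return_inverse=True, return_counts=True)
    sums = np.zeros((len(uniq), Z.shape[1]))
    np.add.at(sums, inv, Z)                 # accumulate rows by individual
    means = sums / counts[:, None]
    return means, means[inv]

def between_within(ids, y, X):
    """Between: OLS on individual means (with constant). Within: OLS on deviations."""
    y = np.asarray(y, dtype=float).reshape(-1, 1)
    X = np.asarray(X, dtype=float)
    ybar, ybar_rows = group_means(ids, y)
    xbar, xbar_rows = group_means(ids, X)
    xb = np.column_stack([np.ones(len(xbar)), xbar])
    b_between = np.linalg.lstsq(xb, ybar, rcond=None)[0].ravel()
    # Deviations from individual means remove anything constant over time,
    # which is why such variables are not identified by WITHIN.
    b_within = np.linalg.lstsq(X - xbar_rows, y - ybar_rows, rcond=None)[0].ravel()
    return b_between, b_within

ids = [1, 1, 1, 2, 2, 2]
x = np.array([[1.0], [2.0], [3.0], [2.0], [4.0], [6.0]])
y = np.array([2.0, 4.0, 6.0, 14.0, 18.0, 22.0])   # y = a_i + 2*x with a_1 = 0, a_2 = 10
b_between, b_within = between_within(ids, y, x)
```

In this constructed example the within slope recovers the common slope of 2, while the between regression mixes the slope with the difference in individual effects.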

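VARCOMP does a transformation similar to WITHIN: (1-SQRT(theta)) times the individual mean is subtracted from each variable (including the constant term), where theta is given by

theta = VWITH / (VWITH + T(i)*VBET)

with VWITH and VBET the within and between variance components and T(i) the number of observations for individual i.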

T does not have to be the same for each individual. The small and large sample formulas used for the variance components are:


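(Table of variance component formulas with the columns "small sample" and "large sample"; the formula entries are not recoverable here, and one entry is marked "(not used)".)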




If the small sample formula produces a non-positive variance, PANEL switches over to the large sample formulas automatically. The large sample formulas are asymptotically correct as T becomes large relative to NI (not usually the case); otherwise they will be biased. Note that theta=1 corresponds to a zero between variance, in which case VARCOMP will produce the same estimates as TOTAL. Likewise, theta=0 corresponds to a zero within variance, in which case VARCOMP will be the same as WITHIN.
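The VARCOMP transformation and its two limiting cases can be checked with a small NumPy sketch. It assumes theta = VWITH / (VWITH + T(i)*VBET), which is consistent with the limits stated above (theta=1 when the between variance is zero, theta=0 when the within variance is zero). The function name `quasi_demean` is hypothetical, and the sketch is not PANEL's implementation.

```python
import numpy as np

def quasi_demean(ids, Z, vwith, vbet):
    """Subtract (1 - sqrt(theta_i)) times the individual mean from each column of Z.

    Assumes theta_i = vwith / (vwith + T_i * vbet), with T_i the number of
    observations for individual i, so unbalanced data are handled.
    vwith and vbet must not both be zero.
    """
    ids = np.asarray(ids)
    Z = np.asarray(Z, dtype=float)
    uniq, inv, counts = np.unique(ids, return_inverse=True, return_counts=True)
    sums = np.zeros((len(uniq), Z.shape[1]))
    np.add.at(sums, inv, Z)
    means = sums / counts[:, None]
    theta = vwith / (vwith + counts * vbet)      # one theta per individual
    weight = 1.0 - np.sqrt(theta)
    return Z - (weight[:, None] * means)[inv], theta

ids = [1, 1, 2, 2, 2]
Z = np.array([[1.0, 1.0], [1.0, 3.0], [1.0, 2.0], [1.0, 4.0], [1.0, 6.0]])  # constant, x
same_as_total, theta_one = quasi_demean(ids, Z, vwith=1.0, vbet=0.0)
same_as_within, theta_zero = quasi_demean(ids, Z, vwith=0.0, vbet=1.0)
```

With a zero between variance the data are left untouched (TOTAL); with a zero within variance the full individual mean is removed (WITHIN), which also wipes out the constant column.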

For each F test (described under Output), a P-value and an alternative critical value are printed. The critical value has a size which becomes smaller as the number of observations grows -- this is an alternative to the conventional testing procedure, which is certain to reject all point null hypotheses when sample sizes become large. It is based on a Bayesian flat prior, and computed from the formula in the Leamer reference:

critical F = ((T-k)/p) * (T**(p/T) - 1)

where T = total number of observations, k = number of estimated parameters in the unrestricted model, and p = the number of restrictions.
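A short Python sketch of the alternative critical value, assuming the form usually quoted from Leamer (1978), critical F = ((T-k)/p) * (T**(p/T) - 1). The function name `leamer_critical_f` is hypothetical, and this is an illustration rather than PANEL's own code.

```python
import math

def leamer_critical_f(T, k, p):
    """Sample-size-dependent critical F value.

    T: total number of observations
    k: number of estimated parameters in the unrestricted model
    p: number of restrictions
    Assumes the form ((T - k) / p) * (T**(p / T) - 1); expm1 is used
    so the result stays accurate when p / T is very small.
    """
    return (T - k) / p * math.expm1((p / T) * math.log(T))

small = leamer_critical_f(100, 5, 1)
large = leamer_critical_f(1000, 5, 1)
```

The critical value rises with the sample size (roughly like log(T) for a single restriction), which is what makes the implied test size shrink as observations accumulate.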

All regressions are computed with the standard orthonormalized data matrices to ensure accurate coefficients and variance estimates under possible multicollinearity (methods using moment matrices are less accurate).

The Durbin-Watson test and bounds on its P-values are computed following the Bhargava et al. reference, extended to the unbalanced data case. The P-values are computed using the Farebrother-Imhof method, since there can be multiple equal eigenvalues.

The REI estimates are obtained with a grid search over RHO_I in order to avoid the problem of multiple local optima. Estimates are then refined to choose the global optimum, and multiple optima are reported. RHO_I is bounded between -1/(Max(T)-1) and 1, where Max(T) is the maximum number of observations per individual. See Maddala and Nerlove (1971). The REIT estimates are obtained using the method of Davis (2002).

The Ahrens-Pincus measure of unbalancedness in dimension i is defined as follows:

AP = NI / (TBAR * SUM(1/T(i))),   where TBAR = SUM(T(i))/NI and the sums run over i = 1,...,NI

This can be interpreted as the ratio of the harmonic and arithmetic means of the T(i) over the sample of individuals. Note that AP is always less than or equal to 1 and that it equals one only when T(i)=T for all i.
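The interpretation as a ratio of harmonic to arithmetic means can be written out directly. The following Python sketch uses a hypothetical function name, `ahrens_pincus`, and takes the list of observation counts T(i), one per individual.

```python
def ahrens_pincus(t_counts):
    """Ratio of the harmonic mean to the arithmetic mean of the T(i).

    t_counts: number of observations for each individual (all positive).
    Equals 1 for balanced data and is below 1 otherwise.
    """
    n = len(t_counts)
    arithmetic = sum(t_counts) / n
    harmonic = n / sum(1.0 / t for t in t_counts)
    return harmonic / arithmetic

balanced = ahrens_pincus([5, 5, 5])
unbalanced = ahrens_pincus([2, 8])
```

For two individuals with 2 and 8 observations, the harmonic mean is 3.2 and the arithmetic mean is 5, so the measure is 0.64.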


ALL/NOALL turns all regressions on or off (equivalent to the combination of TOTAL, BETWEEN, WITHIN, VARCOMP, REI, REIT).

BETWEEN/NOBETWEEN selects the "between" estimator -- a regression on the means for each individual.

BYID/NOBYID does a separate regression for each individual, and computes F tests for equality with the TOTAL and WITHIN estimators.

FEPRINT/NOFEPRINT specifies that the fixed effect estimates are to be printed as well as stored.

HCOMEGA = BLOCK or DIAGONAL specifies the form of the Ω = E[uu'] matrix to use when computing ROBUST standard errors. Ordinarily, the default is BLOCK for PANEL, which allows for cross-time correlation of the disturbances within individuals. This feature can be used for any kind of grouped data, simply by ensuring that the relevant PANEL setup has been defined.

HCTYPE = 0 or 1 specifies whether to apply a degrees of freedom correction to the robust standard errors (0 is no and 1 is yes).

ID= the name of a series which takes on a different value for each individual. The default is @ID; alternatives are the T= and TIME= options.

MEAN/NOMEAN causes the means for each individual to be printed in a table. This can be used in conjunction with the NOREG option to print means only (to suppress all the default regression models). These individual means are stored in the NI x (1+NX) matrix @MEAN, where the first column is the dependent variable.

PRINT/NOPRINT prints @COEFI in conjunction with BYID, and prints @FIXED for WITHIN.

REG/NOREG is used with the MEAN option above. To suppress some regression models, but print others, use the individual options -- NOBETW to suppress the BETWEEN output, etc.

REI/NOREI specifies that ML estimates of the one-way random effects model are to be obtained. @START may be used to supply starting values.

REIT/NOREIT specifies that ML estimates of the two-way random effects model are to be obtained. For unbalanced data, this requires the TIME= option in FREQ(PANEL). @START may be used to supply starting values.

ROBUST/NOROBUST calculates heteroskedasticity-robust standard errors (HCTYPE=1; see OLSQ) for the WITHIN coefficients. If this option is used, the Hausman test comparing WITHIN and VARCOMP is not computed.

SILENT/NOSILENT can be used to turn off all the regression output.

T= the number of time periods for each individual (for balanced data only). For unbalanced data, use the ID= option.

TERSE/NOTERSE can be used to turn off most of the regression output, except the coefficients and standard errors.

TIME= the name of a time period series which increases in value within each individual and decreases between individuals (example: TIME=YEAR). Alternatives are the ID= and T= options. TIME= alone is not considered sufficient for identifying individuals, since the last time period for one individual may be less than the first time period of the next individual.

TOTAL/NOTOTAL selects the "total" or "pooled" estimator -- a plain OLS regression on the whole sample.

VARCOMP/NOVARCOMP selects the "variance components" or "random effects" estimator. The method of selecting the variance components is controlled with the VBET, VSMALL, and VWITH options described below. Unbalanced data are not a problem. For variance components in the time dimension, use the REIT option, or sort your data by time period and use time as the ID.

VBET= specifies the value of the "between" variance for VARCOMP.

VSMALL/NOVSMALL selects the small sample variance components formulas for VARCOMP (as opposed to the large sample formulas). Small sample formulas are unbiased but can result in negative variances, while large sample formulas are biased but always yield positive variances. To supply your own variance values, use VBET= and VWITH=.

VWITH= specifies the value of the "within" variance for VARCOMP.

WITHIN/NOWITHIN selects the "within" or "fixed effects" estimator (different intercepts for each individual).

Nonlinear options may be used for the REI and REIT estimators. See NONLINEAR.
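To make the difference between HCOMEGA=BLOCK and HCOMEGA=DIAGONAL concrete, the following NumPy sketch builds a generic sandwich covariance estimate with either a diagonal or a block-diagonal (by individual) form of the disturbance covariance. This is the general textbook technique without any degrees of freedom correction (in the spirit of HCTYPE=0), not PANEL's internal code. The function name `sandwich_cov` is hypothetical.

```python
import numpy as np

def sandwich_cov(X, u, ids=None):
    """(X'X)^(-1) * meat * (X'X)^(-1) for OLS residuals u.

    ids=None  : diagonal Omega, meat = sum_i u_i^2 x_i x_i'
    ids given : block Omega, meat = sum_g (X_g' u_g)(X_g' u_g)',
                which allows correlation across time within an individual.
    """
    X = np.asarray(X, dtype=float)
    u = np.asarray(u, dtype=float)
    bread = np.linalg.inv(X.T @ X)
    if ids is None:
        meat = (X * (u ** 2)[:, None]).T @ X
    else:
        ids = np.asarray(ids)
        meat = np.zeros((X.shape[1], X.shape[1]))
        for g in np.unique(ids):
            score = X[ids == g].T @ u[ids == g]   # summed score for individual g
            meat += np.outer(score, score)
    return bread @ meat @ bread

X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
u = np.array([1.0, -1.0, 2.0, 0.5])
v_diag = sandwich_cov(X, u)
v_block = sandwich_cov(X, u, ids=[1, 1, 2, 2])
```

When every observation forms its own group, the block form reduces to the diagonal form, which is one way to see that any grouping variable can play the role of the individual.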


Global FREQ (PANEL) command, with ID variable to identify individuals:



Estimate all models (7 years per individual, balanced data), and print individual means:


Print VARCOMP output only, using @ID or FREQ(PANEL) to distinguish individuals:


Estimate all models except BYID, use large sample formulas for VARCOMP:


Print individual means only:



Ahn, S.C., and P. Schmidt, “Efficient Estimation of Panel Data Models with Exogenous and Lagged Dependent Regressors,” Journal of Econometrics 68 (1995) 5-27.

Ahrens, H., and R. Pincus, “On two measures of unbalancedness in a one-way model and their relation to efficiency,” Biometric Journal 23 (1981), pp. 227-235.

Baltagi, Badi, Econometric Analysis of Panel Data, Wiley & Sons, New York, 1995 (first edition).

Bhargava, A., L. Franzini, and W. Narendranathan, “Serial Correlation and the Fixed Effects Model,” Review of Economic Studies XLIX (1982), pp. 533-549.

Chamberlain, Gary, “Multivariate Regression Models for Panel Data,” Journal of Econometrics 18 (1982), pp. 5-46.

Chamberlain, Gary, “Panel Data,” in Griliches and Intriligator (eds.), Handbook of Econometrics, Volume II, North Holland Publishing Co., Amsterdam, 1985.

Davis, Peter, "Estimating Multi-Way Error Components Models with Unbalanced Data Structures," Journal of Econometrics 106 (July 2002), pp. 67-95.

Farebrother, R. W., "Algorithm AS 256", Applied Statistics 39, 1990. Pascal code posted on StatLib.

Hsiao, Cheng, Analysis of Panel Data, Cambridge University Press, Cambridge, England, 1986.

Leamer, Edward E., Specification Searches: Ad Hoc Inference with Nonexperimental Data, Wiley, New York, 1978, p. 114.

Maddala, G. S., Econometrics, McGraw-Hill, New York, 1977, pp. 326-329.

Maddala, G. S., and M. Nerlove. Econometrica (1971).

Nerlove, Marc, Likelihood Inference in Econometrics, Academic Press, New York, 2000.