Generalized Method of Moments - Implementation

One difficulty with implementing the outlined method is that we cannot take W = Ω⁻¹ because, by the definition of the matrix Ω, we need to know the value of θ0 in order to compute this matrix, and θ0 is precisely the quantity we don’t know and are trying to estimate in the first place.

Several approaches exist to deal with this issue, the first one being the most popular:

• Two-step feasible GMM:
• Step 1: Take W = I (the identity matrix), and compute a preliminary GMM estimate $\hat\theta_{(1)}$. This estimator is consistent for θ0, although not efficient.
• Step 2: Take $\hat{W}_T(\hat\theta_{(1)}) = \bigg(\frac{1}{T}\sum_{t=1}^T g(Y_t,\hat\theta_{(1)})\,g(Y_t,\hat\theta_{(1)})'\bigg)^{\!-1}$,
where we have plugged in our first-step preliminary estimate $\hat\theta_{(1)}$. This matrix converges in probability to Ω⁻¹, and therefore if we compute the estimator with this weighting matrix, it will be asymptotically efficient.
• Iterated GMM. Essentially the same procedure as two-step GMM, except that the weighting matrix is recalculated several times: the estimate obtained in step 2 is used to calculate the weighting matrix for step 3, and so on. This estimator, denoted $\hat\theta_{(i)}$, is equivalent to solving the following system of equations: $\bigg(\frac{1}{T}\sum_{t=1}^T \frac{\partial g}{\partial\theta'}(Y_t,\hat\theta_{(i)})\bigg)' \bigg(\frac{1}{T}\sum_{t=1}^T g(Y_t,\hat\theta_{(i)})g(Y_t,\hat\theta_{(i)})'\bigg)^{\!-1} \bigg(\frac{1}{T}\sum_{t=1}^T g(Y_t,\hat\theta_{(i)})\bigg) = 0$
Asymptotically no improvement can be achieved through such iterations, although certain Monte-Carlo experiments suggest that the finite-sample properties of this estimator are slightly better.
• Continuously Updating GMM (CUGMM, or CUE). Estimates θ simultaneously with the weighting matrix W: $\hat\theta = \operatorname{arg}\min_{\theta\in\Theta} \bigg(\frac{1}{T}\sum_{t=1}^T g(Y_t,\theta)\bigg)' \bigg(\frac{1}{T}\sum_{t=1}^T g(Y_t,\theta)g(Y_t,\theta)'\bigg)^{\!-1} \bigg(\frac{1}{T}\sum_{t=1}^T g(Y_t,\theta)\bigg)$
In Monte-Carlo experiments this method has demonstrated better performance than the traditional two-step GMM: the estimator has smaller median bias (although fatter tails), and the J-test for overidentifying restrictions was in many cases more reliable.
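The two-step procedure above can be sketched in code. The following is a minimal illustration, not a definitive implementation: the linear instrumental-variables model, the moment function g(Y_t, θ) = z_t(y_t − x_t′θ), and the simulated data are all assumptions chosen for this sketch; for linear moments the GMM minimizer has a closed form, which the sketch uses instead of a numerical optimizer.

```python
# Two-step feasible GMM for a linear IV model (illustrative sketch;
# model, data, and variable names are assumptions, not from the article).
import numpy as np

rng = np.random.default_rng(0)
T = 2000
theta0 = np.array([1.0, -0.5])                  # true parameter (for the simulation)

# Simulate: y_t = x_t' theta0 + e_t, with 3 instruments z_t for 2 parameters
z = rng.normal(size=(T, 3))                     # instruments
x = z @ np.array([[1.0, 0.3], [0.5, 1.0], [0.2, 0.4]]) + rng.normal(size=(T, 2))
y = x @ theta0 + rng.normal(size=T)

# Sample cross-moments; the GMM objective for linear moments is
# (Zy - Zx theta)' W (Zy - Zx theta), minimized in closed form.
Zx = z.T @ x / T
Zy = z.T @ y / T

def gmm_estimate(W):
    """Minimizer of the quadratic GMM objective for a given weighting matrix W."""
    return np.linalg.solve(Zx.T @ W @ Zx, Zx.T @ W @ Zy)

# Step 1: W = I gives a consistent but inefficient estimate theta_1.
theta1 = gmm_estimate(np.eye(3))

# Step 2: estimate Omega using theta_1, set W = Omega^{-1}, re-estimate.
u1 = y - x @ theta1
G = z * u1[:, None]                             # rows are g(Y_t, theta_1)
Omega_hat = G.T @ G / T
theta2 = gmm_estimate(np.linalg.inv(Omega_hat))
# theta2 should be close to theta0 for large T
```

Because the moments are linear in θ here, each step is a weighted least-squares solve; with nonlinear moment functions the minimization in each step would instead require a numerical optimizer.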

Another important issue in the implementation of the minimization procedure is that the optimizer must search through a (possibly high-dimensional) parameter space Θ and find the value of θ which minimizes the objective function. No generic recommendation for such a procedure exists; it is the subject of its own field, numerical optimization.
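As an illustration of handing the objective to a generic numerical optimizer, the continuously-updating objective defined earlier can be minimized directly, since the weighting matrix is re-estimated at every candidate θ. This is a hedged sketch: the linear-IV moment function and the simulated data are assumptions for the example, and `scipy.optimize.minimize` with the Nelder–Mead method stands in for whatever optimizer a practitioner would choose.

```python
# Continuously-updating GMM (CUE) via a generic numerical optimizer.
# Illustrative sketch; the moment g(Y_t, theta) = z_t (y_t - x_t' theta)
# and the simulated data are assumptions, not taken from the article.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
T = 1000
theta_true = np.array([2.0, -1.0])
z = rng.normal(size=(T, 3))                     # instruments
x = z @ np.array([[1.0, 0.2], [0.4, 1.0], [0.3, 0.5]]) + rng.normal(size=(T, 2))
y = x @ theta_true + rng.normal(size=T)

def cue_objective(theta):
    """g_bar(theta)' Omega(theta)^{-1} g_bar(theta), with Omega recomputed at theta."""
    u = y - x @ theta
    G = z * u[:, None]                          # rows are g(Y_t, theta)
    g_bar = G.mean(axis=0)
    Omega = G.T @ G / T                         # weighting matrix depends on theta
    return g_bar @ np.linalg.solve(Omega, g_bar)

res = minimize(cue_objective, x0=np.zeros(2), method="Nelder-Mead")
theta_cue = res.x
# theta_cue should be close to theta_true for large T
```

Any derivative-free or gradient-based method could replace Nelder–Mead; the choice is exactly the numerical-optimization question the paragraph above refers to.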