Generalized Method of Moments - Description

Suppose the available data consists of T iid observations $\{Y_t\}_{t=1,\dots,T}$, where each observation Yt is an n-dimensional multivariate random variable. The data comes from a certain statistical model, defined up to an unknown parameter θ ∈ Θ. The goal of the estimation problem is to find the “true” value of this parameter, θ0, or at least a reasonably close estimate.

In order to apply GMM there should exist a vector-valued function g(Y,θ) such that $m(\theta_0) \equiv \operatorname{E}\big[\,g(Y_t,\theta_0)\,\big]=0,$

where E denotes expectation, and Yt is a generic observation (all observations are assumed to be iid). Moreover, the function m(θ) must not be equal to zero for θ ≠ θ0, otherwise the parameter θ will not be identified.

The basic idea behind GMM is to replace the theoretical expected value E with its empirical analog — the sample average: $\hat{m}(\theta) = \hat{\operatorname{E}}\big[\,g(Y_t,\theta)\,\big] \equiv \frac{1}{T}\sum_{t=1}^T g(Y_t,\theta)$

and then to minimize the norm of this expression with respect to θ.
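For illustration, here is a minimal sketch of the sample moment function $\hat{m}(\theta)$ in Python. The model, the moment conditions, and all names are hypothetical: Y is drawn from N(μ, σ²), θ = (μ, σ²), and g stacks three (over-identifying) moment conditions based on the first three central moments.

```python
import numpy as np

def g(y, theta):
    # Hypothetical moment function: Y ~ N(mu, sigma2), theta = (mu, sigma2).
    # Three moment conditions, each with expectation zero at the true theta0:
    #   E[Y - mu] = 0,  E[(Y - mu)^2 - sigma2] = 0,  E[(Y - mu)^3] = 0
    mu, sigma2 = theta
    return np.column_stack([y - mu,
                            (y - mu) ** 2 - sigma2,
                            (y - mu) ** 3])

def m_hat(y, theta):
    # Empirical analog of m(theta): the sample average of g over all observations
    return g(y, theta).mean(axis=0)

rng = np.random.default_rng(0)
y = rng.normal(2.0, 1.5, size=10_000)   # simulated data, true theta0 = (2.0, 2.25)
print(m_hat(y, (2.0, 2.25)))            # close to the zero vector at theta0
```

Evaluated at the true parameter, each component of $\hat{m}(\theta_0)$ is a sample average of a mean-zero variable, so it shrinks toward zero as T grows.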

By the law of large numbers, $\hat{m}(\theta) \approx m(\theta)$ for large values of T, and thus we expect that $\hat{m}(\theta_0) \approx m(\theta_0) = 0$. The generalized method of moments looks for a number $\hat\theta$ which would make $\hat{m}(\hat\theta)$ as close to zero as possible. Mathematically, this is equivalent to minimizing a certain norm of $\hat{m}(\theta)$ (the norm of m, denoted as ||m||, measures the distance between m and zero). The properties of the resulting estimator will depend on the particular choice of the norm function, and therefore the theory of GMM considers an entire family of norms, defined as $\| \hat{m}(\theta) \|^2_{W} = \hat{m}(\theta)'\,W\,\hat{m}(\theta),$

where W is a positive-definite weighting matrix, and m′ denotes transposition. In practice, the weighting matrix W is computed based on the available data set, and the resulting estimate will be denoted as $\hat{W}$. Thus, the GMM estimator can be written as $\hat\theta = \operatorname{arg}\min_{\theta\in\Theta} \bigg(\frac{1}{T}\sum_{t=1}^T g(Y_t,\theta)\bigg)' \hat{W} \bigg(\frac{1}{T}\sum_{t=1}^T g(Y_t,\theta)\bigg)$
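As a concrete sketch of this minimization (under assumed, hypothetical moment conditions: Y ~ N(μ, σ²), θ = (μ, σ²), three over-identifying moments), one can minimize the quadratic form $\hat{m}(\theta)'\hat{W}\hat{m}(\theta)$ numerically, here with the identity matrix as the weighting matrix:

```python
import numpy as np
from scipy.optimize import minimize

def g(y, theta):
    # Hypothetical moment function: Y ~ N(mu, sigma2), theta = (mu, sigma2)
    mu, sigma2 = theta
    return np.column_stack([y - mu,
                            (y - mu) ** 2 - sigma2,
                            (y - mu) ** 3])

def gmm_objective(theta, y, W):
    # The GMM criterion: m_hat(theta)' W m_hat(theta)
    m = g(y, theta).mean(axis=0)
    return m @ W @ m

rng = np.random.default_rng(1)
y = rng.normal(2.0, 1.5, size=5_000)   # true theta0 = (2.0, 2.25)
W = np.eye(3)                          # simple choice: identity weighting matrix
res = minimize(gmm_objective, x0=np.array([0.0, 1.0]), args=(y, W),
               method="Nelder-Mead")
print(res.x)                           # estimate of (mu, sigma2)
```

Any positive-definite W yields a consistent estimator; the choice of W affects only the estimator's asymptotic variance.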

Under suitable conditions this estimator is consistent, asymptotically normal, and, with the right choice of the weighting matrix, also asymptotically efficient.
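A common way to approach the efficient choice of weighting matrix is two-step GMM: first estimate θ with an arbitrary W (e.g. the identity), then set $\hat{W}$ to the inverse of the sample covariance of the moments at the first-step estimate and re-estimate. A sketch, reusing the same hypothetical normal-model moment conditions as above:

```python
import numpy as np
from scipy.optimize import minimize

def g(y, theta):
    # Hypothetical moment function: Y ~ N(mu, sigma2), theta = (mu, sigma2)
    mu, sigma2 = theta
    return np.column_stack([y - mu,
                            (y - mu) ** 2 - sigma2,
                            (y - mu) ** 3])

def gmm_step(y, W, x0):
    # Minimize the GMM criterion m_hat(theta)' W m_hat(theta) for a given W
    obj = lambda th: (lambda m: m @ W @ m)(g(y, th).mean(axis=0))
    return minimize(obj, x0, method="Nelder-Mead").x

rng = np.random.default_rng(2)
y = rng.normal(2.0, 1.5, size=5_000)   # true theta0 = (2.0, 2.25)

# Step 1: consistent but inefficient estimate with identity weighting
theta1 = gmm_step(y, np.eye(3), np.array([0.0, 1.0]))

# Step 2: estimate the efficient weighting matrix as the inverse of the
# sample covariance of the moments, evaluated at the first-step estimate
G = g(y, theta1)
W_hat = np.linalg.inv(G.T @ G / len(y))
theta2 = gmm_step(y, W_hat, theta1)
print(theta2)
```

Both steps yield consistent estimates; the second step attains the smaller asymptotic variance associated with the efficient weighting matrix.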