In Bayesian statistics, a hyperprior is a prior distribution on a hyperparameter, that is, on a parameter of a prior distribution.

As with the term hyperparameter, the prefix hyper- distinguishes a hyperprior from a prior distribution on a parameter of the model for the underlying system. Hyperpriors arise particularly in the use of conjugate priors.

For example, if one is using a beta distribution to model the distribution of the parameter p of a Bernoulli distribution, then:

  • The Bernoulli distribution (with parameter p) is the model of the underlying system;
  • p is a parameter of the underlying system (Bernoulli distribution);
  • The beta distribution (with parameters α and β) is the prior distribution of p;
  • α and β are parameters of the prior distribution (beta distribution), hence hyperparameters;
  • A prior distribution of α and β is thus a hyperprior.
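
The three levels listed above can be sketched as a generative procedure. The following is a minimal illustration, not a canonical construction: the choice of a Gamma(shape 2, scale 1) hyperprior for both α and β is an assumption made only for this example, and the function name sample_hierarchy is hypothetical.

```python
import random


def sample_hierarchy(n_obs, rng):
    """Draw one sample from the hierarchy: hyperprior -> prior -> model."""
    # Hyperprior (assumed for illustration): alpha, beta ~ Gamma(shape=2, scale=1)
    alpha = rng.gammavariate(2.0, 1.0)
    beta = rng.gammavariate(2.0, 1.0)
    # Prior: p ~ Beta(alpha, beta), with alpha and beta as hyperparameters
    p = rng.betavariate(alpha, beta)
    # Model of the underlying system: n_obs independent Bernoulli(p) draws
    xs = [1 if rng.random() < p else 0 for _ in range(n_obs)]
    return alpha, beta, p, xs


rng = random.Random(0)  # fixed seed so the sketch is reproducible
alpha, beta, p, xs = sample_hierarchy(10, rng)
```

Each call first draws the hyperparameters from the hyperprior, then draws p from the resulting beta prior, and only then generates the observations.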

In principle, one can iterate the above: if the hyperprior itself has hyperparameters, these may be called hyperhyperparameters, and so forth.

One can analogously call the posterior distribution on the hyperparameter the hyperposterior and, if the hyperprior and hyperposterior belong to the same family, call them conjugate hyperdistributions, or call the hyperprior a conjugate hyperprior. However, this rapidly becomes very abstract and removed from the original problem.
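
To make the hyperposterior concrete for the beta and Bernoulli example, one can integrate out p: for a particular sequence of n Bernoulli trials with k successes, the marginal likelihood given α and β is B(α + k, β + n − k) / B(α, β), where B is the beta function. The sketch below approximates the hyperposterior on a small grid of (α, β) values. The grid, the hyperprior proportional to exp(−(α + β)), and the function names are all assumptions chosen for illustration.

```python
import math


def log_beta_fn(a, b):
    """Logarithm of the beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def hyperposterior_grid(k, n, grid, log_hyperprior):
    """Normalised hyperposterior weights over all (alpha, beta) pairs in grid."""
    log_w = {}
    for a in grid:
        for b in grid:
            # Marginal likelihood of the data with p integrated out
            log_marginal = log_beta_fn(a + k, b + n - k) - log_beta_fn(a, b)
            log_w[(a, b)] = log_hyperprior(a, b) + log_marginal
    # Subtract the maximum before exponentiating for numerical stability
    m = max(log_w.values())
    total = sum(math.exp(v - m) for v in log_w.values())
    return {ab: math.exp(v - m) / total for ab, v in log_w.items()}


grid = [0.5, 1.0, 2.0, 4.0, 8.0]  # assumed candidate values for alpha and beta
post = hyperposterior_grid(
    k=7, n=10, grid=grid,
    log_hyperprior=lambda a, b: -(a + b),  # assumed hyperprior, up to a constant
)
```

With 7 successes in 10 trials and a hyperprior that is symmetric in α and β, the resulting weights place more mass on pairs with α > β than on pairs with α < β.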
