Model
Considering observations in the form of co-occurrences $(w, d)$ of words and documents, PLSA models the probability of each co-occurrence as a mixture of conditionally independent multinomial distributions:

$$P(w,d) = \sum_c P(c) P(d|c) P(w|c) = P(d) \sum_c P(c|d) P(w|c)$$
The first formulation is the symmetric formulation, where the word $w$ and the document $d$ are both generated from the latent class $c$ in similar ways (using the conditional probabilities $P(d|c)$ and $P(w|c)$), whereas the second formulation is the asymmetric formulation, where, for each document $d$, a latent class is chosen conditionally on the document according to $P(c|d)$, and a word is then generated from that class according to $P(w|c)$. Although words and documents are used in this example, the co-occurrence of any pair of discrete variables may be modelled in exactly the same way.
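The equivalence of the two formulations can be checked numerically. The following is a minimal sketch, assuming made-up toy probabilities for two latent classes, three documents, and four words; the variable and function names are illustrative only. It converts the symmetric parameters into the asymmetric ones with Bayes' rule and compares the resulting joint probabilities.

```python
# Toy symmetric-form parameters (all numbers are invented for illustration).
p_c = [0.6, 0.4]                      # P(c): 2 latent classes
p_d_given_c = [[0.5, 0.3, 0.2],       # P(d|c): one row per class, 3 documents
               [0.1, 0.2, 0.7]]
p_w_given_c = [[0.7, 0.2, 0.1, 0.0],  # P(w|c): one row per class, 4 words
               [0.0, 0.1, 0.3, 0.6]]

n_c, n_d, n_w = len(p_c), len(p_d_given_c[0]), len(p_w_given_c[0])


def joint_symmetric(w, d):
    """P(w,d) = sum_c P(c) P(d|c) P(w|c)."""
    return sum(p_c[c] * p_d_given_c[c][d] * p_w_given_c[c][w]
               for c in range(n_c))


# Derive the asymmetric parameters: P(d) by marginalizing over c,
# then P(c|d) = P(c) P(d|c) / P(d) by Bayes' rule.
p_d = [sum(p_c[c] * p_d_given_c[c][d] for c in range(n_c))
       for d in range(n_d)]
p_c_given_d = [[p_c[c] * p_d_given_c[c][d] / p_d[d] for c in range(n_c)]
               for d in range(n_d)]


def joint_asymmetric(w, d):
    """P(w,d) = P(d) sum_c P(c|d) P(w|c)."""
    return p_d[d] * sum(p_c_given_d[d][c] * p_w_given_c[c][w]
                        for c in range(n_c))


# Largest disagreement between the two forms over all (w, d) pairs,
# and the total probability mass of the joint distribution.
max_gap = max(abs(joint_symmetric(w, d) - joint_asymmetric(w, d))
              for w in range(n_w) for d in range(n_d))
total = sum(joint_symmetric(w, d) for w in range(n_w) for d in range(n_d))
```

Up to floating-point rounding, `max_gap` should be zero and `total` should be one, since both expressions describe the same joint distribution.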
So, the number of parameters is equal to $cd + wc$, where $c$, $d$, and $w$ here denote the numbers of latent classes, documents, and words respectively. The number of parameters therefore grows linearly with the number of documents. In addition, although PLSA is a generative model of the documents in the collection it is estimated on, it is not a generative model of new documents.
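As a small arithmetic illustration of this growth, the sketch below counts the entries of the $P(c|d)$ and $P(w|c)$ tables. The function name and the collection sizes are hypothetical, and normalization constraints are ignored.

```python
def plsa_parameter_count(n_classes, n_docs, n_words):
    """Entries in the P(c|d) table plus entries in the P(w|c) table."""
    return n_classes * n_docs + n_words * n_classes


# Hypothetical collection: 10 latent classes and a 5000-word vocabulary.
small = plsa_parameter_count(10, 1000, 5000)   # 60000
large = plsa_parameter_count(10, 2000, 5000)   # 70000

# Each additional document adds one P(c|d) entry per latent class.
per_document = (large - small) // 1000         # 10
```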
The parameters are learned using the expectation-maximization (EM) algorithm.
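A minimal sketch of such an EM procedure for the asymmetric formulation follows. It assumes a small dense count matrix in which every document contains at least one word, uses random initialization, and omits refinements such as tempering or convergence checks. The function name `plsa_em` and the toy counts are illustrative and not taken from any particular library.

```python
import math
import random


def normalize(row):
    """Scale a non-negative row to sum to one (uniform if it is all zeros)."""
    s = sum(row)
    return [v / s for v in row] if s > 0 else [1.0 / len(row)] * len(row)


def plsa_em(counts, n_classes, n_iters=50, seed=0):
    """Fit P(c|d) and P(w|c) by EM; counts[d][w] is the count of word w in document d."""
    rng = random.Random(seed)
    n_docs, n_words = len(counts), len(counts[0])
    # Strictly positive random starting distributions.
    p_c_given_d = [normalize([rng.random() + 1e-3 for _ in range(n_classes)])
                   for _ in range(n_docs)]
    p_w_given_c = [normalize([rng.random() + 1e-3 for _ in range(n_words)])
                   for _ in range(n_classes)]
    log_likelihoods = []

    for _ in range(n_iters):
        new_cd = [[0.0] * n_classes for _ in range(n_docs)]
        new_wc = [[0.0] * n_words for _ in range(n_classes)]
        ll = 0.0
        for d in range(n_docs):
            for w in range(n_words):
                n = counts[d][w]
                if n == 0:
                    continue
                # E-step: posterior P(c|d,w) is proportional to P(c|d) P(w|c).
                joint = [p_c_given_d[d][c] * p_w_given_c[c][w]
                         for c in range(n_classes)]
                norm = sum(joint)
                # Log-likelihood of words given documents, under current parameters.
                ll += n * math.log(norm)
                for c in range(n_classes):
                    resp = n * joint[c] / norm
                    new_cd[d][c] += resp
                    new_wc[c][w] += resp
        # M-step: renormalize the expected counts.
        p_c_given_d = [normalize(row) for row in new_cd]
        p_w_given_c = [normalize(row) for row in new_wc]
        log_likelihoods.append(ll)

    return p_c_given_d, p_w_given_c, log_likelihoods


# Toy corpus: 4 documents over a 4-word vocabulary (invented counts).
counts = [[4, 3, 0, 0],
          [5, 2, 0, 1],
          [0, 0, 3, 4],
          [0, 1, 4, 5]]
p_c_given_d, p_w_given_c, log_likelihoods = plsa_em(counts, n_classes=2, n_iters=30)
```

Each entry of `log_likelihoods` is computed before that iteration's update, so the sequence should be non-decreasing up to rounding, which is the usual sanity check for an EM implementation.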