Colorless Green Ideas Sleep Furiously - Statistical Challenges

Fernando Pereira of the University of Pennsylvania fitted a simple statistical Markov model to a body of newspaper text and showed that, under this model, "Furiously sleep ideas green colorless" is about 200,000 times less probable than "Colorless green ideas sleep furiously".

This statistical model defines a similarity metric: sentences that resemble those in the corpus in certain respects are assigned higher values than sentences that resemble them less. Pereira's model assigns the ungrammatical version of the sentence a lower probability than the syntactically well-formed one, demonstrating that statistical models can learn grammaticality distinctions with minimal linguistic assumptions. However, it is not clear that the model assigns every ungrammatical sentence a lower probability than every grammatical sentence. That is, "Colorless green ideas sleep furiously" may still be statistically more "remote" from English than some ungrammatical sentences. In response, it may be argued that no current theory of grammar is capable of distinguishing all grammatical English sentences from ungrammatical ones.
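
To make concrete how a Markov model can assign different probabilities to the two word orders, here is a minimal sketch in Python. It is not Pereira's model and does not reproduce his figures: it assumes a plain bigram model with add-one smoothing, and the four-sentence corpus, along with the names `train_bigram` and `log_prob`, are invented purely for illustration.

```python
import math
from collections import Counter

# Hypothetical toy corpus (an assumption for illustration, not newspaper text).
CORPUS = [
    "colorless green glass broke",
    "green ideas spread",
    "new ideas sleep",
    "children sleep furiously",
]

def tokenize(sentence):
    """Lowercase, split on whitespace, and add sentence-boundary markers."""
    return ["<s>"] + sentence.lower().split() + ["</s>"]

def train_bigram(sentences):
    """Count bigrams and their left-hand contexts over the corpus."""
    contexts, bigrams, vocab = Counter(), Counter(), set()
    for sentence in sentences:
        tokens = tokenize(sentence)
        vocab.update(tokens)
        contexts.update(tokens[:-1])             # every token that has a successor
        bigrams.update(zip(tokens, tokens[1:]))  # adjacent word pairs
    return contexts, bigrams, vocab

def log_prob(sentence, model):
    """Log-probability of a sentence under the add-one smoothed bigram model."""
    contexts, bigrams, vocab = model
    v = len(vocab) + 1  # one extra slot for words never seen in training
    tokens = tokenize(sentence)
    return sum(
        math.log((bigrams[(prev, cur)] + 1) / (contexts[prev] + v))
        for prev, cur in zip(tokens, tokens[1:])
    )

model = train_bigram(CORPUS)
forward = log_prob("Colorless green ideas sleep furiously", model)
backward = log_prob("Furiously sleep ideas green colorless", model)
print(f"forward: {forward:.3f}  backward: {backward:.3f}")
```

On this toy corpus every adjacent word pair of the forward sentence has been observed, while none of the pairs in the reversed sentence has, so the forward order receives the higher score. The sketch also illustrates the caveat above: a score of this kind measures resemblance to the training corpus, so nothing guarantees that every grammatical sentence outranks every ungrammatical one.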
