Hermann Moisl - Current Work

Implementation of natural language understanding systems using dynamic attractor sequences

He has been developing a strictly sequential natural language understanding architecture that dispenses with two foundational principles of generative linguistics, mainstream cognitive science, and much of artificial intelligence: that natural language strings have complex syntactic structure processed by structure-sensitive algorithms, and that this structure is crucial in determining string semantics. This sequential architecture was originally stated in terms of standard automata theory as a system of cooperating finite state automata. More recently he has become interested in neuroscientific work which identifies chaotic attractor trajectory in state space as the fundamental principle of brain function at a level above that of the individual neuron, and which indicates that sensory processing, and perhaps higher cognition more generally, are implemented by cooperating attractor sequence processes. Some relevant publications are:

  • Moisl, H. (1992) 'Connectionist finite state natural language processing', Connection Science 4, 67-91.
  • Moisl, H. (1997) 'Recurrent neural networks and natural language processing', New Methods in Language Processing, ed. D. Jones & H. Somers, UCL Press, London, 69-82.
  • Moisl, H. (2000) Handbook of Natural Language Processing, Marcel Dekker (with R. Dale of Macquarie University and H. Somers of UMIST).
  • Moisl, H. (2001) 'Artificial neural networks and natural language processing', Encyclopedia of Library and Information Science, ed. M. Drake, Marcel Dekker (in press).
  • Moisl, H. (2001) 'Linguistic computation with state space trajectories', in Emergent Neural Computational Architectures based on Neuroscience, ed. Stefan Wermter, Jim Austin, David Willshaw, Springer.

Natural language corpus creation

Together with Karen Corrigan of Newcastle University and Joan Beal of Sheffield University, he has recently completed the Newcastle Electronic Corpus of Tyneside English (NECTE), a corpus of dialect speech from Tyneside in North-East England. It is based on two pre-existing corpora, one collected in the late 1960s by the Tyneside Linguistic Survey (TLS) project, and the other in 1994 by the Phonological Variation and Change in Contemporary Spoken English (PVC) project. NECTE amalgamates the TLS and PVC materials into a single Text Encoding Initiative (TEI)-conformant XML-encoded corpus and makes them available in a variety of aligned formats: digitized audio, standard orthographic transcription, phonetic transcription, and part-of-speech-tagged text. The NECTE website describes the corpus in detail and makes it available to academic researchers, educationalists, the media in non-commercial applications, and organisations such as language societies and individuals with a serious interest in historical dialect materials.

Exploratory multivariate analysis of text corpora

Since completion of the NECTE project he has been developing a methodology for sociolinguistic and dialectological study of the corpus. Its aim is to identify regularities in phonetic variation among informants in the corpus, and any correlations between such variation and associated social factors. The methodology is based on the one formulated by the originators of much of the NECTE corpus, the Tyneside Linguistic Survey (TLS). It was radical at the time and remains so today: in contrast to the then-universal and still-dominant theory-driven approach, where social and linguistic factors are selected by the analyst on the basis of some combination of an independently specified theoretical framework, existing case studies, and personal experience of the domain of enquiry, the TLS proposed a fundamentally empirical approach in which salient factors are extracted from the data itself and then serve as the basis for model construction. To implement its approach the TLS used a particular exploratory multivariate analytical technique, hierarchical cluster analysis, but for a variety of theoretical and practical reasons its work never progressed beyond preliminary studies. His development of the TLS methodology

  • i. uses a range of linear and nonlinear exploratory analytical methods in addition to hierarchical cluster analysis, such as multidimensional scaling and self-organizing maps, and
  • ii. pays particular attention to issues in data creation which are crucial to the validity of analytical results: document length normalization, dimensionality reduction, and data nonlinearity.
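
As a rough illustration of why document length normalization matters for hierarchical cluster analysis, the following minimal Python sketch clusters four speakers twice: once on raw variant counts and once on counts converted to relative frequencies. The counts, the choice of relative-frequency normalization, the average-linkage criterion, and all function names are assumptions made for this example; they are not taken from the NECTE data or from the publications listed on this page.

```python
from itertools import combinations
from math import sqrt

# Hypothetical data: counts of two phonetic variants for four speakers.
# Speakers 0 and 1 share one usage profile (60% / 40%), speakers 2 and 3
# another (40% / 60%). Speakers 0 and 2 gave short interviews, 1 and 3 long ones.
counts = [[6, 4], [60, 40], [4, 6], [40, 60]]


def normalize_by_length(rows):
    """Convert each row of raw counts to relative frequencies (row sums to 1)."""
    result = []
    for row in rows:
        total = sum(row)
        result.append([c / total if total else 0.0 for c in row])
    return result


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def average_linkage(vectors, n_clusters):
    """Agglomerative clustering: start with one cluster per vector and
    repeatedly merge the two clusters with the smallest mean pairwise
    distance until n_clusters remain."""
    clusters = [[i] for i in range(len(vectors))]
    while len(clusters) > n_clusters:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            pair_distances = [
                euclidean(vectors[i], vectors[j])
                for i in clusters[a]
                for j in clusters[b]
            ]
            mean_distance = sum(pair_distances) / len(pair_distances)
            if best is None or mean_distance < best[0]:
                best = (mean_distance, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is unaffected
    return [sorted(c) for c in clusters]


raw_clusters = average_linkage(counts, 2)
normalized_clusters = average_linkage(normalize_by_length(counts), 2)

print(raw_clusters)         # [[0, 2], [1, 3]]: grouped by interview length
print(normalized_clusters)  # [[0, 1], [2, 3]]: grouped by usage profile
```

On the raw counts the clusters reflect how much each speaker said, whereas after normalization they reflect the variant usage profiles, which is the kind of distortion that normalization is intended to remove. A real study would typically use an established clustering library and a normalization scheme justified for the corpus at hand.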

Relevant publications are:

  • Moisl, H. and Beal, J.C. (2001) 'Corpus Analysis and Results: Visualization Using Self-Organizing Maps', Corpus Linguistics 2001, Lancaster University, 386-391. Electronic Publication.
  • Moisl, H. and Jones, V. (2005) 'Cluster analysis of the Newcastle Electronic Corpus of Tyneside English: a comparison of methods' (Centre for Telematics and Information Technology; TR 2005/65) {A-53328}, University of Twente.
  • Moisl, H. and Jones, V. (2005) 'Cluster analysis of the Newcastle Electronic Corpus of Tyneside English: a comparison of methods', Literary and Linguistic Computing 20, 125-46.
  • Moisl, H., Maguire, W. and Allen, W. (2006) 'Phonetic variation in Tyneside: exploratory multivariate analysis of the Newcastle Electronic Corpus of Tyneside English', in F. Hinskens (ed.) Language Variation. European Perspectives. Amsterdam: Meertens Institute.
  • Allen, W.H., Beal, J.C., Corrigan, K.P., Maguire, W. and Moisl, H.L. (2007) 'A Linguistic "Time-Capsule": The Newcastle Electronic Corpus of Tyneside English', in Beal, J.C., Corrigan, K.P. and Moisl, H.L. (eds.) Creating and Digitising Language Corpora, Vol. 2: Diachronic Databases. Houndmills: Palgrave Macmillan, 16-48.
  • Moisl, H. (2007) 'Data nonlinearity in exploratory multivariate analysis of language corpora', Computing and Historical Phonology. Proceedings of the Ninth Meeting of the ACL Special Interest Group in Computational Morphology and Phonology, June 28, 2007, ed. J. Nerbonne, M. Ellison, G. Kondrak, Association for Computational Linguistics, 93-100.
  • Moisl, H. (2008) 'Exploratory Multivariate Analysis', in A. Lüdeling and M. Kytö (eds.) Corpus Linguistics. An International Handbook (Series: Handbücher zur Sprache und Kommunikationswissenschaft/Handbooks of Linguistics and Communication Science). Berlin: Mouton de Gruyter.
  • Moisl, H. and Maguire, W. (2008) 'Identifying the Main Determinants of Phonetic Variation in the Newcastle Electronic Corpus of Tyneside English', Journal of Quantitative Linguistics 15, 46-69.
  • Moisl, H. (2008) 'Using electronic corpora to study language variation: the problem of data sparsity', currently being reviewed for Studies in Language Variation. Selected papers from ICLaVE 4, ed. P. Pavlou, University of Cyprus.
  • Moisl, H. (2008) 'Using electronic corpora in historical dialectology research: the problem of document length variation', currently being reviewed for Proceedings of the Second International Conference on English Historical Dialectology, University of Bergamo, ed. M. Dossena.
  • Moisl, H. (2008) 'Normalization for Variation in Document Length in Exploratory Multivariate Analysis of Text Corpora', Proceedings of INFOS2008: 6th International Conference on Informatics and Systems, Cairo University, 27-29 March 2008.
