Morphological and Orthographic Tools for English


Tools for inflectional morphological analysis and generation, and for determining the orthography of the indefinite article, are now available.

The tools are:

morpha
a fast and robust morphological analyser for English based on finite-state techniques that returns the lemma and inflection type of a word, given the word form and its part of speech. (The latter is optional, but accuracy is degraded if it is not present.)
morphg
generates a word form given a specification of the lemma, part of speech, and the type of inflection required. Morphg is derived automatically from morpha, ensuring consistency and reversibility of the tools. An option controls British English or American English behaviour with respect to consonant doubling.
ana
postprocesses text to insert the correct form of the indefinite article (i.e. a or an). Ana encodes a set of rules keying off the pronunciation of the next word (so an is produced if the following word starts with a vowel sound, and a otherwise). The tool handles plain text, part-of-speech-tagged text, and SGML, among other possible formats.
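
To make the a/an rule concrete, here is a minimal Python sketch of the idea behind ana. It is not ana's implementation or rule set. The function names (choose_article, fix_articles) and the small exception lists are hypothetical, chosen only for illustration, and the sketch approximates pronunciation from spelling, so it will get many words wrong.

```python
import re

# Hypothetical, deliberately tiny exception lists (assumptions for this sketch;
# they are not taken from ana).
SILENT_H_PREFIXES = ("hour", "honest", "honour", "honor", "heir")
CONSONANT_SOUND_PREFIXES = ("unit", "union", "unique", "univers", "uniform",
                            "use", "usual", "euro")
CONSONANT_SOUND_WORDS = {"one", "once"}


def choose_article(next_word: str) -> str:
    """Return 'an' if next_word appears to start with a vowel sound, else 'a'."""
    w = next_word.lower()
    if w.startswith(SILENT_H_PREFIXES):
        return "an"   # silent h: the word begins with a vowel sound
    if w in CONSONANT_SOUND_WORDS or w.startswith(CONSONANT_SOUND_PREFIXES):
        return "a"    # vowel letter, but a consonant sound ("yoo", "w")
    return "an" if w and w[0] in "aeiou" else "a"


# An indefinite article, the whitespace after it, and the following word.
ARTICLE = re.compile(r"\b(a|an)\b(\s+)(\w+)", re.IGNORECASE)


def fix_articles(text: str) -> str:
    """Rewrite every a/an in plain text to agree with the word that follows."""
    def repl(match: re.Match) -> str:
        article = choose_article(match.group(3))
        if match.group(1)[0].isupper():   # preserve sentence-initial capital
            article = article.capitalize()
        return article + match.group(2) + match.group(3)
    return ARTICLE.sub(repl, text)


print(fix_articles("a apple and an banana in a hour at an university"))
```

The sketch handles plain text only and treats every standalone "a" as an article. Tagged text or SGML would need the tags or markup to be skipped when locating the next word.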

The tools are implemented using widely available Unix utilities. Ana is free for research purposes; for any proposed commercial use please contact John Carroll. Morpha and morphg are included in the open source RASP system.


A description of morpha and morphg is published in

A description of ana is published in

Please cite one of these papers when describing any research using the tools.

The tools were initially developed in the UK EPSRC-funded PSET project and were further developed on the RASP project.

