Marco Antonio Esteves da Rocha
This paper describes the results of a corpus-based cross-linguistic survey in which over six thousand cases of anaphora were manually analysed in English and Portuguese dialogues, with each language accounting for roughly half of the sample. The analytical tool was an annotation scheme, described in the paper, that classified each case of anaphora according to four properties: type of anaphor, type of antecedent, topical role of the antecedent, and processing strategy. A combination of statistical analysis, observed regular collocations, and specific context features was used to build a theory, called the antecedent-likelihood theory, which organises the information concerning the different types of anaphor in algorithm-like entries. The paper describes the guidelines under which the theory was built, together with the results of subsequent tests of the English and Portuguese versions of the theory on dialogues that had been annotated beforehand and set apart for the purpose. Possible ways of partially or fully automating the annotation process and the resolution procedures specified in the antecedent-likelihood theory are also discussed.
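As a rough illustration of the four-property annotation, the sketch below stores one annotated case as a record and tallies, for each type of anaphor, how often each processing strategy occurs. It is a minimal sketch under stated assumptions and not the paper's implementation: the names `AnaphoraCase` and `strategy_frequencies` are hypothetical, the labels are placeholders rather than the paper's categories, and relative frequency stands in for whatever statistical analysis the paper actually applies.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class AnaphoraCase:
    """One annotated case; field names mirror the four properties in the abstract."""
    anaphor_type: str
    antecedent_type: str
    topical_role: str
    processing_strategy: str


def strategy_frequencies(cases):
    """Relative frequency of each processing strategy, grouped by anaphor type.

    Assumption: plain relative frequency is used here only for illustration.
    """
    counts = defaultdict(Counter)
    for case in cases:
        counts[case.anaphor_type][case.processing_strategy] += 1
    return {
        anaphor: {
            strategy: n / sum(by_strategy.values())
            for strategy, n in by_strategy.items()
        }
        for anaphor, by_strategy in counts.items()
    }


# Placeholder labels only; these are not the categories used in the paper.
sample = [
    AnaphoraCase("anaphor_A", "antecedent_X", "role_P", "strategy_1"),
    AnaphoraCase("anaphor_A", "antecedent_X", "role_Q", "strategy_1"),
    AnaphoraCase("anaphor_A", "antecedent_Y", "role_P", "strategy_2"),
    AnaphoraCase("anaphor_B", "antecedent_Y", "role_Q", "strategy_2"),
]
freqs = strategy_frequencies(sample)
```

A table of this kind, keyed by anaphor type, is one plausible starting point for entries that rank candidate resolution strategies; the collocations and context features mentioned in the abstract would need additional fields.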