The University of Sussex

A description of an annotation scheme to analyse anaphora in dialogues

Marco Antonio Esteves da Rocha

This paper describes an annotation scheme designed for the analysis of anaphoric relations in dialogues. The scheme was developed by annotating a relatively large number of anaphora cases in English and Portuguese dialogue corpora. The English data came from the reformatted version of the London-Lund Corpus held at the School of Cognitive and Computing Sciences, University of Sussex. The Portuguese data came from the Rio de Janeiro Clinical Dialogues Corpus, a corpus of dialogues collected for the purposes of this research. The annotation scheme is intended as an analytical tool that aims to show how anaphors, as they appear verbatim in spoken language, relate to the processing required to identify their antecedents. Each case of anaphora is classified according to four properties: type of anaphor, type of antecedent, topical role of the antecedent, and processing strategy. The paper describes the set of categories used to classify anaphora cases according to these properties and discusses the rationale underlying the choice of properties. The scheme is expected to be useful both for encoding discourse relations in text and for supporting anaphora resolution in natural language processing.
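Since every anaphora case is classified along the same four properties, an annotated case can be thought of as a fixed-shape record. The following is a minimal sketch under that assumption. The class name `AnaphoraCase`, the function `tally`, and every label in the usage lines are hypothetical placeholders. They are not the scheme's actual category sets, which this abstract does not list.

```python
from collections import Counter
from dataclasses import dataclass

# Hypothetical record for one annotated anaphora case. The four property
# fields follow the scheme's description; the labels stored in them are
# placeholders, not the scheme's real categories.
@dataclass(frozen=True)
class AnaphoraCase:
    anaphor: str              # the anaphor as it appears verbatim in the transcript
    anaphor_type: str         # property 1: type of anaphor
    antecedent_type: str      # property 2: type of antecedent
    topical_role: str         # property 3: topical role of the antecedent
    processing_strategy: str  # property 4: processing strategy

PROPERTIES = ("anaphor_type", "antecedent_type", "topical_role", "processing_strategy")

def tally(cases, prop):
    """Count how many annotated cases fall into each category of one property."""
    if prop not in PROPERTIES:
        raise ValueError(f"unknown property: {prop}")
    return Counter(getattr(case, prop) for case in cases)

# Usage with placeholder labels (hypothetical, for illustration only).
cases = [
    AnaphoraCase("it", "type-A", "antecedent-X", "role-1", "strategy-P"),
    AnaphoraCase("that", "type-B", "antecedent-X", "role-2", "strategy-P"),
]
print(tally(cases, "antecedent_type"))
```

Keeping the four properties as separate fields, rather than a single combined label, makes it easy to tabulate the distribution of cases along any one dimension independently of the others.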

