Corpus-Based Approaches to Discourse and Dialogue

Nancy Ide

The availability of large-scale corpora in recent years has influenced work in all areas of NLP. Corpus-based work on discourse and dialogue has developed somewhat more slowly than work in other areas, owing to the lack of appropriately annotated corpora and the labor required to create them. However, significant work employing corpus-based methods on discourse and dialogue led to a special issue of Computational Linguistics devoted to the topic in 1997. Since then, efforts such as the Discourse Resource Initiative have enabled corpus-based work on discourse and dialogue to progress more rapidly, leading to a growing number of empirical investigations of discourse phenomena and to the validation of theory on real language samples. This thematic session is intended to provide a forum for this most recent work, much of which is currently scattered across the proceedings of various computational linguistics conferences. The overall goal is to enable a coherent assessment of progress to date in corpus-based work on discourse and dialogue, especially in light of its relevance to practical applications such as discourse parsing, summarization, and generation.

Papers are invited that deal with corpus-based work on any aspect of discourse and dialogue analysis, including co-reference, segmentation, structure, parsing, and generation. We are especially interested in papers that assess previous similar work and relate their methodology to broad theoretical trends in the field.

