FuSe - a Multi-Layered Parallel Treebank
|Division/Institute:||FB 09: Philologie|
|Date of publication on miami:||27.07.2004|
|Edition statement:||[Electronic ed.]|
|Source:||Proc. Second Workshop on Treebanks and Linguistic Theories (14-15 November 2003), 213-216|
|Subjects:||Corpus linguistics; computational linguistics; syntactic annotation; semantic annotation|
|DDC Subject:||400: Language|
While there exist a number of bi- and even multilingual corpora, syntactically analyzed parallel corpora are rare. At Münster University, we have initiated a treebank project with the aim of closing this gap. Our goal is to build a multi-layered treebank of aligned parallel texts in English and German. While we confine ourselves to annotating only one language pair, the design will be such that additional languages can be added, provided there exist appropriate translations. Our working title for the treebank is FuSe, which stands for functional semantic annotation and connotes that two or more languages are fused with each other. Although our main motivation is to contribute to linguistic research rather than to develop a corpus which is tailor-made for a particular NLP application, we believe that the corpus will prove useful for research in several fields of application, the most obvious one being machine translation. The linguistic annotation of the FuSe corpus will contain the following layers: POS tags, constituent structure, functional relations, predicate-argument structure, and alignment information. The alignment layer is the only one which is defined for a language pair rather than for a single language. Apart from this layer, the subcorpora are complete monolingual resources in their own right. In the following we will concentrate on the predicate-argument structure and on the representation of alignment information.
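The layering described in the abstract can be illustrated with a minimal Python sketch. This is not the FuSe annotation format, which the abstract does not specify. All class names, field names, tag labels, and role labels below are hypothetical and chosen only for illustration. The sketch shows one possible reading of the design: each monolingual sentence carries its own POS, constituent, functional, and predicate-argument layers, while alignment is stored separately per language pair, so a further language could be added without touching existing annotation.

```python
from dataclasses import dataclass, field


@dataclass
class Token:
    form: str
    pos: str  # POS tag layer (tag names used below are illustrative only)


@dataclass
class Sentence:
    """Monolingual annotation; usable without any alignment information."""
    lang: str
    tokens: list[Token]
    # Constituent structure: (label, start, end) token spans, end exclusive.
    constituents: list[tuple[str, int, int]] = field(default_factory=list)
    # Functional relations: (function label, index into `constituents`).
    functions: list[tuple[str, int]] = field(default_factory=list)
    # Predicate-argument structure:
    # predicate token index -> {role label: index into `constituents`}.
    predicates: dict[int, dict[str, int]] = field(default_factory=dict)


@dataclass
class Alignment:
    """The only layer defined for a language pair, not a single language."""
    source: Sentence
    target: Sentence
    # Hypothetical granularity: pairs of (source token index, target token index).
    links: list[tuple[int, int]] = field(default_factory=list)


def aligned_forms(alignment: Alignment) -> list[tuple[str, str]]:
    """Return the word forms connected by each alignment link."""
    return [
        (alignment.source.tokens[i].form, alignment.target.tokens[j].form)
        for i, j in alignment.links
    ]


# Toy sentence pair; every label here is an assumption made for the example.
english = Sentence(
    lang="en",
    tokens=[Token("She", "PRON"), Token("sleeps", "VERB")],
    constituents=[("NP", 0, 1), ("S", 0, 2)],
    functions=[("subject", 0)],
    predicates={1: {"arg1": 0}},
)
german = Sentence(
    lang="de",
    tokens=[Token("Sie", "PRON"), Token("schläft", "VERB")],
    constituents=[("NP", 0, 1), ("S", 0, 2)],
    functions=[("subject", 0)],
    predicates={1: {"arg1": 0}},
)
pair = Alignment(source=english, target=german, links=[(0, 0), (1, 1)])
print(aligned_forms(pair))
```

Because `Alignment` only refers to two `Sentence` objects and holds no monolingual annotation itself, either subcorpus in this sketch remains a self-contained resource, which mirrors the property the abstract claims for the FuSe subcorpora.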