FuSe - a Multi-Layered Parallel Treebank

Authors: Cyrus, Lea
Feddes, Hendrik
Schumacher, Frank
Division/Institute: FB 09: Philologie
IKM-Service
Document types: Article
Media types: Text
Publication date: 2003
Date of publication on miami: 27.07.2004
Modification date: 06.04.2022
Edition statement: [Electronic ed.]
Source: Proc. Second Workshop on Treebanks and Linguistic Theories (14-15 November 2003), 213-216
Subjects: Corpus linguistics; computational linguistics; syntactic annotation; semantic annotation
DDC Subject: 400: Language
License: InC 1.0
Language: English
Format: PDF document
URN: urn:nbn:de:hbz:6-85659523905
Permalink: https://nbn-resolving.de/urn:nbn:de:hbz:6-85659523905
Digital documents: 0311_tlt.pdf

While a number of bilingual and even multilingual corpora exist, syntactically analyzed parallel corpora are rare. At Münster University, we have initiated a treebank project with the aim of closing this gap. Our goal is to build a multi-layered treebank of aligned parallel texts in English and German. While we confine ourselves to annotating only one language pair, the design will allow additional languages to be added, provided that appropriate translations exist. Our working title for the treebank is FuSe, which stands for functional semantic annotation and connotes that two or more languages are fused with each other. Although our main motivation is to contribute to linguistic research rather than to develop a corpus tailor-made for a particular NLP application, we believe that the corpus will prove useful for research in several fields of application, the most obvious being machine translation. The linguistic annotation of the FuSe corpus will contain the following layers: POS tags, constituent structure, functional relations, predicate-argument structure, and alignment information. The alignment layer is the only one that is defined for a language pair rather than for a single language. Apart from this layer, the subcorpora are complete monolingual resources in their own right. In the following, we concentrate on the predicate-argument structure and on the representation of alignment information.
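The layering described in the abstract can be pictured with a small data model. The following Python sketch is purely illustrative and is not the FuSe annotation format: all class names, the tag and role labels ("PRON", "SUBJ", "ARG0" and so on), and the choice of token indices as the unit of alignment are assumptions made for the example. It only shows how four monolingual layers can live inside each sentence while the alignment layer sits outside, over a language pair.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple, Union


@dataclass
class Token:
    form: str
    pos: str  # layer 1: POS tag (tag names are placeholders, not a real tag set)


@dataclass
class Node:
    label: str     # layer 2: constituent category
    function: str  # layer 3: functional relation to the parent node
    # children are either nested nodes or token indices (int)
    children: List[Union["Node", int]] = field(default_factory=list)


@dataclass
class Predicate:
    head: int                  # layer 4: token index of the predicate
    arguments: Dict[str, int]  # hypothetical role label -> token index of the argument


@dataclass
class Sentence:
    # A sentence carries only monolingual layers, so each subcorpus
    # remains usable on its own.
    lang: str
    tokens: List[Token]
    tree: Node
    predicates: List[Predicate]


@dataclass
class Alignment:
    # layer 5: the only layer defined for a language pair.
    # Aligning token indices is an assumption of this sketch.
    source: Sentence
    target: Sentence
    links: List[Tuple[int, int]]


def make_sentence(lang: str, subject: str, verb: str) -> Sentence:
    """Build a two-word toy sentence with all four monolingual layers."""
    tokens = [Token(subject, "PRON"), Token(verb, "VERB")]
    tree = Node("S", "ROOT", [Node("NP", "SUBJ", [0]), Node("VP", "HEAD", [1])])
    return Sentence(lang, tokens, tree, [Predicate(head=1, arguments={"ARG0": 0})])


def aligned_forms(alignment: Alignment) -> List[Tuple[str, str]]:
    """Resolve alignment links to pairs of word forms."""
    return [
        (alignment.source.tokens[i].form, alignment.target.tokens[j].form)
        for i, j in alignment.links
    ]


english = make_sentence("en", "She", "sleeps")
german = make_sentence("de", "Sie", "schläft")
alignment = Alignment(english, german, links=[(0, 0), (1, 1)])
```

Because the `Alignment` object only refers to two `Sentence` objects, a further language could be added in this sketch by creating more sentences and more alignment objects, without touching the existing monolingual annotation.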