Hello all,
At IBM Research, we have been working on a layer of unified semantic annotations for a range of languages. We use a data-driven approach in which we reuse existing English Proposition Bank frame and role labels for new target languages, followed by a process of manual curation (ACL 2015, ACL 2016, EMNLP 2016).
For instance, consider the German sentence "Seine Arbeit wird von ehrenamtlichen Helfern und Regionalgruppen des Vereins unterstützt" (His work is supported by volunteers and regional groups of the association). In CoNLL format, it looks like this, with English PropBank labels in the last two columns:
| Id | Form | POS | HeadId | Deprel | Frame | Role |
|---|---|---|---|---|---|---|
| 1 | Seine | DET | 2 | det:poss | _ | _ |
| 2 | Arbeit | NOUN | 11 | nsubjpass | _ | A1 |
| 3 | wird | AUX | 11 | auxpass | _ | _ |
| 4 | von | ADP | 6 | case | _ | _ |
| 5 | ehrenamtlichen | ADJ | 6 | amod | _ | _ |
| 6 | Helfern | NOUN | 11 | nmod | _ | A0 |
| 7 | und | CONJ | 6 | cc | _ | _ |
| 8 | Regionalgruppen | NOUN | 6 | conj | _ | _ |
| 9 | des | DET | 10 | det | _ | _ |
| 10 | Vereins | NOUN | 8 | nmod | _ | _ |
| 11 | unterstützt | VERB | 0 | root | support.01 | _ |
| 12 | . | PUNCT | 11 | punct | _ | _ |
The German verb "unterstützt" is labeled as evoking the support.01 frame with two roles: "Seine Arbeit" (his work) is labeled A1 (the thing supported), and "ehrenamtlichen Helfern und Regionalgruppen des Vereins" (volunteers and regional groups of the association) is labeled A0 (the supporter).
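In the table, each role label sits only on the head word of its argument (Arbeit, Helfern), while the prose describes whole phrases. Below is a minimal Python sketch of how those phrases could be recovered from the dependency tree. It assumes that an argument span is the subtree of its labeled head, minus a preposition attached directly to that head via `case` (which reproduces the spans given above), and it assumes one predicate per sentence because the table has a single role column. The names `Token`, `parse`, `argument_span` and `read_frame` are hypothetical and not part of any released tool.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Token:
    id: int
    form: str
    pos: str
    head: int
    deprel: str
    frame: str
    role: str


# The example sentence from the table, one token per line.
# Whitespace-separated here because no form contains a space (real CoNLL files use tabs).
SENTENCE = """
1 Seine DET 2 det:poss _ _
2 Arbeit NOUN 11 nsubjpass _ A1
3 wird AUX 11 auxpass _ _
4 von ADP 6 case _ _
5 ehrenamtlichen ADJ 6 amod _ _
6 Helfern NOUN 11 nmod _ A0
7 und CONJ 6 cc _ _
8 Regionalgruppen NOUN 6 conj _ _
9 des DET 10 det _ _
10 Vereins NOUN 8 nmod _ _
11 unterstützt VERB 0 root support.01 _
12 . PUNCT 11 punct _ _
"""


def parse(text: str) -> list[Token]:
    """Turn the 7-column rows into Token objects."""
    tokens = []
    for line in text.strip().splitlines():
        i, form, pos, head, deprel, frame, role = line.split()
        tokens.append(Token(int(i), form, pos, int(head), deprel, frame, role))
    return tokens


def argument_span(tokens: list[Token], head_id: int, exclude=("case",)) -> str:
    """Collect the dependency subtree of head_id, in sentence order.

    Assumption: dependents attached directly to the argument head with a
    relation in `exclude` (by default the preposition, via `case`) are left out.
    """
    children = defaultdict(list)
    for t in tokens:
        children[t.head].append(t)
    by_id = {t.id: t for t in tokens}
    ids = set()
    stack = [head_id]
    while stack:
        cur = stack.pop()
        ids.add(cur)
        for child in children[cur]:
            if cur == head_id and child.deprel in exclude:
                continue  # skip e.g. "von" and anything below it
            stack.append(child.id)
    return " ".join(by_id[i].form for i in sorted(ids))


def read_frame(tokens: list[Token]):
    """Return (predicate form, frame label, {role: span}) for a one-predicate sentence."""
    predicate = next(t for t in tokens if t.frame != "_")
    roles = {t.role: argument_span(tokens, t.id) for t in tokens if t.role != "_"}
    return predicate.form, predicate.frame, roles


if __name__ == "__main__":
    form, frame, roles = read_frame(parse(SENTENCE))
    print(form, frame)
    for role, span in sorted(roles.items()):
        print(role, span)
```

Run as a script, this prints the predicate with its frame, then `A0 ehrenamtlichen Helfern und Regionalgruppen des Vereins` and `A1 Seine Arbeit`. Passing `exclude=()` keeps the preposition and yields the full subtree "von ehrenamtlichen Helfern ...", which is the choice to make if spans should include case markers.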
With such data, we can create SRL systems that predict English PropBank labels for many different languages. See a recent demo screencast of this SRL system for English, French and German here.
Contribute to UD?
We are now looking into releasing parts of this data to the research community. In particular, we are thinking of contributing this layer of annotation to the Universal Dependencies (UD) data sets (the sentence above is from the German UD data set).
For this, we would like to know 1) whether there is interest on your side in including such labels in the data sets and 2) if so, how such a contribution could be organized. Please let us know your thoughts on this!
Cheers,
Alan
__
Alan Akbik
IBM Research Almaden
http://alanakbik.github.io/