Evaluation Campaign at VarDial
For the first time, we are organizing a comprehensive evaluation campaign on similar languages, varieties and dialects with multiple tasks.
We are widening the scope of the previous DSL shared tasks, which focused on the identification of similar languages and language varieties: DSL 2014, DSL 2015, and DSL 2016, the last of which also included Arabic dialects.
Results
We are pleased to announce the results of the four VarDial Evaluation Campaign shared tasks. Please click on the following links to view the rankings and results for each task.
Arabic Dialect Identification (ADI)
Cross-lingual Dependency Parsing (CLP)
Discriminating between Similar Languages (DSL)
German Dialect Identification (GDI)
The evaluation campaign report can be found here and the bib entry is the following:
@InProceedings{zampieri-EtAl:2017:VarDial,
author = {Zampieri, Marcos and Malmasi, Shervin and Ljube\v{s}i\'{c}, Nikola and Nakov, Preslav and Ali, Ahmed and Tiedemann, J\"{o}rg and Scherrer, Yves and Aepli, No\"{e}mi},
title = {Findings of the VarDial Evaluation Campaign 2017},
booktitle = {Proceedings of the Fourth Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial)},
month = {April},
year = {2017},
address = {Valencia, Spain},
publisher = {Association for Computational Linguistics},
pages = {1--15}
}
We are offering four shared tasks this year:
- (DSL) Discriminating between Similar Languages
Task Organizers: Shervin Malmasi, Marcos Zampieri, Nikola Ljubešić
Contact: shervin.malmasi(at)mq.edu.au
Task Description: Fourth iteration of the DSL task, featuring a multilingual dataset containing excerpts of journalistic texts. The languages included this year, grouped by similarity, are: Bosnian, Croatian, and Serbian; Malay and Indonesian; Persian and Dari; Canadian and Hexagonal French; Brazilian and European Portuguese; and Argentine, Peninsular, and Peruvian Spanish. For more information, please check the previous DSL shared task reports (2014, 2015, and 2016).
Tracks: Closed and Open Training
- (ADI) Arabic Dialect Identification
Task Organizers: Ahmed Ali, Preslav Nakov
Contact: amali(at)hbku.edu.qa
Task Description: Second iteration of the task included in DSL 2016. This year we will be releasing acoustic data along with speech transcripts for the following Arabic dialects: Egyptian, Gulf, Levantine, and North African, as well as Modern Standard Arabic (MSA).
Tracks: Closed and Open Training
- (GDI) German Dialect Identification
Task Organizers: Yves Scherrer, Noëmi Aepli
Contact: yves.scherrer(at)gmail.com
Task Description: In addition to the Arabic dialect task, we propose an analogous task on the identification of four Swiss German dialect areas: Basel, Bern, Lucerne, and Zurich. We will provide manually annotated speech transcripts for all dialect areas.
Tracks: Closed Training only
- (CLP) Cross-lingual Dependency Parsing
Task Organizers: Jörg Tiedemann
Contact: jorg.tiedemann(at)helsinki.fi
Task Description: The task is to develop models for parsing selected target languages without annotated training data in the target language, using instead annotated data in one or two closely related source languages. We will include the following language pairs:
Target language = Croatian, Source language = Slovenian
Target language = Slovak, Source language = Czech
Target language = Norwegian, Source languages = Danish and Swedish
Tracks: Closed and Open Training
Submissions
To participate, teams are requested to fill in the registration form. A maximum of 3 submissions is allowed in each task and track (open or closed).
After the shared task, we will invite teams to submit system description papers. Papers should be at most 10 pages long (8 pages of content + 2 pages of references). Submissions should be formatted according to the EACL template.
Important Dates
- Training Set Release: December 27, 2016
- Test Set Release: January 25, 2017
- Submissions: January 27, 2017
- Results Announced: January 29, 2017
- System Paper Submission: February 10, 2017
Please fill in the registration form to participate.