VarDial 2016 @ COLING - Osaka, Japan

Important Dates

Submission deadline
September 29th, 2016

Acceptance notification
October 21st, 2016

Camera-ready deadline
October 31st, 2016

Past Editions

VarDial 2014
LT4VarDial 2015

Next Edition

VarDial 2017

Contact

VarDial
vardialworkshop@gmail.com

DSL Shared Task
dsl.sharedtask@gmail.com


DSL Shared Task 2016 (Finished)

Following the success of the first two editions (held in 2014 and 2015), in VarDial 2016 we organized the third edition of the DSL shared task featuring two sub-tasks.

In the DSL shared task, participants are asked to train systems that discriminate between similar languages, language varieties, and dialects.

Results

The tables containing the results are available here.

The shared task report can be found here and the bib entry is the following:

@InProceedings{malmasi-EtAl:2016:VarDial3,
     author = {Malmasi, Shervin and Zampieri, Marcos and Ljube\v{s}i\'{c}, Nikola and Nakov, Preslav and Ali, Ahmed and Tiedemann, J\"{o}rg},
     title = {Discriminating between Similar Languages and Arabic Dialect Identification: A Report on the Third DSL Shared Task},
     booktitle = {Proceedings of the Third Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial3)},
     month = {December},
     year = {2016},
     address = {Osaka, Japan},
     pages = {1--14}
}

The Tasks

This year we divided the DSL Shared Task into two sub-tasks.

Sub-task 1: Discriminating between similar languages and national language varieties.
Sub-task 2: Arabic dialect identification.

Sub-task 1: Similar Languages and Language Varieties

For sub-task 1, we released a new version of the DSL corpus collection (DSLCC). The corpus contains 20,000 instances per country (18,000 for training and 2,000 for development). Each instance is an excerpt extracted from journalistic texts and labeled with the country of origin of the text.

The languages and varieties included in this year's edition, grouped by similarity, are:

Bosnian, Croatian, and Serbian
Malay and Indonesian
Portuguese: Brazil and Portugal
Spanish: Argentina, Mexico, and Spain
French: France and Canada
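As a rough check on the overall size of the collection, the per-country counts can be multiplied out over the groups listed above. The sketch below is only illustrative: the dictionary layout and the names `GROUPS` and `corpus_size` are assumptions made for this example, not part of the DSLCC release.

```python
# Per-country counts as stated in the task description.
TRAIN_PER_CLASS = 18_000
DEV_PER_CLASS = 2_000

# Languages and varieties grouped by similarity, as listed above.
# The dictionary layout is an assumption made for illustration.
GROUPS = {
    "Bosnian, Croatian, and Serbian": ["Bosnian", "Croatian", "Serbian"],
    "Malay and Indonesian": ["Malay", "Indonesian"],
    "Portuguese": ["Brazil", "Portugal"],
    "Spanish": ["Argentina", "Mexico", "Spain"],
    "French": ["France", "Canada"],
}


def corpus_size(groups):
    """Return the class count and the implied training/development totals."""
    n_classes = sum(len(members) for members in groups.values())
    return {
        "classes": n_classes,
        "train": n_classes * TRAIN_PER_CLASS,
        "dev": n_classes * DEV_PER_CLASS,
        "total": n_classes * (TRAIN_PER_CLASS + DEV_PER_CLASS),
    }


print(corpus_size(GROUPS))
```

With the twelve languages and varieties listed, this gives 216,000 training and 24,000 development instances, 240,000 in total.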

For sub-task 1, two test sets (A and B) were released. Each of them contains 1,000 unidentified instances of each language to be classified according to the country of origin.

Test set A (in-domain): newspaper texts.
Test set B (out-of-domain): social media data.

Sub-task 2: Arabic Dialects

This year, for the first time, the DSL shared task included a sub-task on Arabic dialects.

As dialects are mostly used in conversational speech, in sub-task 2 we provided a dataset containing ASR transcripts.

We released training and testing data for the following Arabic dialects: Egyptian, Gulf, Levantine, and North-African, as well as for Modern Standard Arabic (MSA).

Test set C: ASR texts from Arabic dialects.

Submissions

We considered two types of submission:

Closed submission: Using ONLY the training corpus provided by the DSL organizers.
Open submission: Using ANY corpus for training (including previous versions of the DSLCC).

A total of six submissions (three closed and three open) were allowed for each test set (A, B, C).
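The submission limits can be expressed as a small check: at most three closed and three open runs for each of the sets A, B, and C. The following is a minimal sketch under that reading of the rules. The function `check_runs` and the tuple format for runs are hypothetical and were not part of any official tooling.

```python
from collections import Counter

TEST_SETS = {"A", "B", "C"}
SUBMISSION_TYPES = {"closed", "open"}
MAX_RUNS_PER_TYPE = 3  # three closed + three open = six per set


def check_runs(runs):
    """Check one team's runs, given as (test_set, submission_type) tuples.

    Returns a list of human-readable problems; an empty list means the
    runs respect the limits described above.
    """
    problems = []
    counts = Counter()
    for test_set, kind in runs:
        if test_set not in TEST_SETS:
            problems.append(f"unknown test set: {test_set}")
            continue
        if kind not in SUBMISSION_TYPES:
            problems.append(f"unknown submission type: {kind}")
            continue
        counts[(test_set, kind)] += 1
    for (test_set, kind), n in sorted(counts.items()):
        if n > MAX_RUNS_PER_TYPE:
            problems.append(
                f"test set {test_set}: {n} {kind} runs (max {MAX_RUNS_PER_TYPE})"
            )
    return problems


# Example: three closed and three open runs on test set A are within limits.
print(check_runs([("A", "closed")] * 3 + [("A", "open")] * 3))
```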

After the shared task, participants were invited to submit a paper to the VarDial workshop describing their findings (8 pages plus 2 pages for references). Submissions should be formatted according to the COLING template.

Dates

Training set release: ~~August 2nd, 2016~~ August 5th, 2016
Test set release: ~~August 29th, 2016~~ September 5th, 2016
Results submission due: ~~August 31st, 2016~~ September 7th, 2016
Results announced: ~~September 2nd, 2016~~ September 9th, 2016
Paper submission deadline: September 29th, 2016
Acceptance Notification: ~~October 14th, 2016~~ October 21st, 2016
Camera-ready versions: ~~October 30th, 2016~~ October 31st, 2016

DSL Shared Task Organizers

Marcos Zampieri (Saarland University and DFKI, Germany)
Preslav Nakov (Qatar Computing Research Institute, Qatar)
Shervin Malmasi (Harvard Medical School, United States)
Liling Tan (Saarland University, Germany)
Nikola Ljubešić (Jozef Stefan Institute, Slovenia and University of Zagreb, Croatia)
Jörg Tiedemann (University of Helsinki, Finland)
Ahmed Ali (Qatar Computing Research Institute, Qatar)
