
Important Dates

Submission deadline
September 29th, 2016

Acceptance notification
October 21st, 2016

Camera-ready deadline
October 31st, 2016

Past Editions

VarDial 2014
LT4VarDial 2015

Next Edition

VarDial 2017

Contact

VarDial
vardialworkshop@gmail.com

DSL Shared Task
dsl.sharedtask@gmail.com

DSL Shared Task 2016 (Finished)

Following the success of the first two editions (held in 2014 and 2015), in VarDial 2016 we organized the third edition of the DSL shared task featuring two sub-tasks.

In the DSL shared task, participants are asked to train systems that discriminate between similar languages, language varieties, and dialects.

Results

The tables containing the results are available here.

The shared task report can be found here and the bib entry is the following:

@InProceedings{malmasi-EtAl:2016:VarDial3,
     author = {Malmasi, Shervin and Zampieri, Marcos and Ljube\v{s}i\'{c}, Nikola and Nakov, Preslav and Ali, Ahmed and Tiedemann, J\"{o}rg},
     title = {Discriminating between Similar Languages and Arabic Dialect Identification: A Report on the Third DSL Shared Task},
     booktitle = {Proceedings of the Third Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial3)},
     month = {December},
     year = {2016},
     address = {Osaka, Japan},
     pages = {1--14}
}

The Tasks

This year we divided the DSL Shared Task into two sub-tasks.

Sub-task 1: Similar Languages and Language Varieties

For sub-task 1, we released a new version of the DSL corpus collection (DSLCC). The corpus contains 20,000 instances per country (18,000 for training and 2,000 for development). Each instance is an excerpt from a journalistic text, labeled with the text's country of origin.

The languages and varieties included in this year's edition grouped by similarity are:

For sub-task 1, two test sets (A and B) were released. Each of them contains 1,000 unidentified instances of each language, to be classified according to the country of origin.
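To make the classification setting concrete, here is a minimal sketch of one common baseline for this kind of task: a multinomial Naive Bayes classifier over character n-grams. It is not the organizers' method or any participant's system. The class name, the labels "pt-BR" and "pt-PT", and the four toy sentences are all invented for illustration and are not drawn from the DSLCC.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(text, n_min=1, n_max=3):
    """Return all character n-grams of length n_min..n_max from text."""
    padded = f" {text.lower()} "  # pad so word boundaries become features
    return [
        padded[i:i + n]
        for n in range(n_min, n_max + 1)
        for i in range(len(padded) - n + 1)
    ]


class NgramNaiveBayes:
    """Hypothetical baseline: multinomial Naive Bayes with add-one smoothing."""

    def fit(self, texts, labels):
        self.ngram_counts = defaultdict(Counter)  # label -> n-gram counts
        self.label_counts = Counter(labels)       # label -> number of instances
        self.vocab = set()
        for text, label in zip(texts, labels):
            grams = char_ngrams(text)
            self.ngram_counts[label].update(grams)
            self.vocab.update(grams)
        return self

    def predict(self, text):
        grams = char_ngrams(text)
        total_docs = sum(self.label_counts.values())
        vocab_size = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, doc_count in self.label_counts.items():
            counts = self.ngram_counts[label]
            total = sum(counts.values())
            # Log prior plus smoothed log likelihood of every n-gram.
            score = math.log(doc_count / total_docs)
            for gram in grams:
                score += math.log((counts[gram] + 1) / (total + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy training data, invented for this sketch (not DSLCC instances).
train_texts = [
    "o ônibus chegou atrasado",
    "vou pegar o trem amanhã",
    "o autocarro chegou atrasado",
    "vou apanhar o comboio amanhã",
]
train_labels = ["pt-BR", "pt-BR", "pt-PT", "pt-PT"]

model = NgramNaiveBayes().fit(train_texts, train_labels)
print(model.predict("o ônibus está cheio"))
print(model.predict("o autocarro está cheio"))
```

Character n-grams are used here because closely related varieties often differ in spelling and vocabulary rather than in script, so sub-word features can pick up the differences. A real system would train on the full training split and tune on the development split.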

Sub-task 2: Arabic dialects

This year, for the first time, the DSL shared task included a sub-task on Arabic dialects.

As dialects are mostly used in conversational speech, in sub-task 2 we provided a dataset containing ASR transcripts.

We released training and test data for four Arabic dialects (Egyptian, Gulf, Levantine, and North-African) and for Modern Standard Arabic (MSA).

Submissions

We considered two types of submission: closed and open.

A total of six submissions (3 for closed and 3 for open) was allowed for each training set (A, B, C).

After the shared task, participants were invited to submit a paper to the VarDial workshop describing their findings (8 pages + 2 for references). Submissions should be formatted according to the COLING template.

Dates

DSL Shared Task Organizers