The workshop aims to provide a forum for scholars working on a range of topics related to the study of linguistic variation. We anticipate discussion of computational methods and language resources for closely related languages and language variants. Corpus-driven study of linguistic variation at different levels, such as lexicon and grammar, is another topic of interest.
The workshop deliberately does not draw a clear line between languages, language varieties and dialects, since in many cases the distinction is political rather than linguistic. From a computational perspective, the problems faced by systems processing, for example, Croatian and Serbian are very similar to those that occur when dealing with Dutch and Flemish, with Brazilian and European Portuguese, or with the various dialects of Arabic.
Examples of language varieties include pluricentric languages such as English, Spanish, French and Portuguese. Examples of pairs of related languages include Swedish–Norwegian, Bulgarian–Macedonian, Serbian–Bosnian, Russian–Ukrainian, Irish–Scottish Gaelic, Malay–Indonesian, Turkish–Azerbaijani, Mandarin–Cantonese, Hindi–Urdu, and many others.
Papers presented at the 2014 editions of LT4CloseLang and VarDial focused on a number of relevant topics, including machine translation between closely related languages, adaptation of POS taggers and parsers for similar languages and language varieties, compilation of corpora for language varieties, spelling normalization, and the discrimination or identification of similar languages, which is the topic of the DSL shared task.
Together with the LT4VarDial workshop, we will organize the second edition of the Discriminating Between Similar Languages (DSL) Shared Task.