PhD scholarship in Natural Language Processing for Sign Languages

The field of Natural Language Processing (NLP) has advanced considerably in recent years thanks to a paradigm shift driven by the availability of massive amounts of text, deep learning models and powerful computational resources. NLP technology is now available for many domains and languages, and translation between spoken languages in particular has evolved considerably. However, where Sign Languages (SLs) are concerned, it is fair to say that NLP is still in its infancy. This situation has many causes, among them the specific characteristics of SLs and the limited availability of resources for most SLs. SL processing has long been the concern of computer vision research: tasks such as sign language detection, sign language identification and sign language segmentation have all been addressed within a computer vision paradigm. However, given that SLs are natural languages, we firmly believe that a multidisciplinary approach, which includes linguistics and computational linguistics research in addition to computer vision, should be considered. Natural Language Processing for SLs aims to analyse sequences of signs in order to, for example, assign lexical categories to signs, disambiguate them, or establish dependency relations between them. The result is a linguistically rich representation that can support, for instance, the identification of specific types of information in signed utterances (e.g. who did what to whom, when, and how). In general, these processes rely on non-visual representations of SLs: symbolic rather than video-based representations are used to further analyse the output of computer vision processes. Such representations could be produced automatically if high-quality datasets were available for training NLP models.
Building on our previous work on Sign Language Translation, the proposed project brings together an interdisciplinary team to address the following objectives:
- (O1) Implement data collection, linguistic annotation, and data augmentation mechanisms to increase the availability of SL resources;
- (O2) Investigate current architectures for Sign Language Recognition (SLR) and adapt them to the available datasets;
- (O3) Develop Natural Language Processing tools (e.g. PoS tagging, parsing, sense disambiguation) for the languages of the project;
- (O4) Adopt hybrid approaches to Sign Language Translation that combine Machine Learning and linguistic information;
- (O5) Implement technological demonstrators, for example Information Extraction for SLs.
The PhD candidate hired will work on some of these objectives.

Requirements:
- Bachelor's degree in Computer Science or Linguistics
- Master's degree in Computational Linguistics, Natural Language Processing, Machine Learning, or Linguistics, with a solid mathematical or computational background
- Knowledge of current Deep Learning techniques in Natural Language Processing, Machine Learning, and Statistics is desirable

Admission to the PhD program of the Department of Information and Communication Technologies at UPF is a prerequisite for the contract.
This project is carried out in collaboration with the LSC Lab (Laboratori de llengua de signes catalana, the Catalan Sign Language Lab) of the Department of Translation and Language Sciences, UPF. The candidate will be co-supervised by Prof. Horacio Saggion and Prof. Josep Quer.

More details available at https://www.upf.edu/web/mdm-dtic/positions

This position is co-funded by the PhD fellowship program of the Department of Information and Communication Technologies at Universitat Pompeu Fabra (DTIC-UPF), and the María de Maeztu Strategic Research Programme at DTIC-UPF on Artificial and Natural Intelligence for ICT and beyond (CEX2021-001195-M).