Best of All Worlds for Low-resource Natural Language Processing

September 16, 2020, from 02:00 PM to 03:30 PM
Location: https://videoconf-colibri.zoom.us/j/99137153363 (password: 820632)

In recent years, the state of the art in several Natural Language Processing (NLP) tasks has been dominated by deep learning systems, whose performance depends on the availability of massive amounts of labeled data, especially for English. Annotation is a costly process that often requires linguistic or domain expertise, so many languages lack large annotated corpora for certain domains and/or tasks. It would therefore be desirable to enable NLP applications that rely on small annotated corpora (e.g., a few hundred sentences or fewer), or even on none at all, not only for training but also for evaluating competing approaches. In this thesis, we address the problem of deploying the best approach for a low-resource NLP scenario by taking advantage of already existing human feedback, without having to evaluate competing approaches on a sufficiently large annotated test set. We frame this as a problem of prediction with expert advice, with the goal of dynamically converging to the performance of the best approach across different scenarios.

We are currently applying this proposal to two distinct scenarios. In the first, Active Learning is used to select (query) the most useful sentences for a human to annotate in sequence labeling tasks; here, the goal is to converge towards the best individual query strategy. The second is a conversational agent that retrieves an appropriate answer to a user request from a collection of answers; here, the goal is to converge towards the best criterion for selecting an answer. In preliminary experiments in both scenarios, we observed convergence towards the best-performing individual approaches. As next steps, we intend to extend these experiments, both in terms of the scenarios evaluated and the evaluation methodology.
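To make "prediction with expert advice" concrete, the sketch below shows the classic exponentially weighted average forecaster (Hedge), one standard algorithm from that framework. It is not necessarily the algorithm used in this thesis. It assumes full-information feedback: every round, each candidate approach (an "expert", e.g. an Active Learning query strategy) receives an observable loss in [0, 1]. The function name `hedge`, the constant per-expert losses, and the learning rate choice are illustrative assumptions. Experts that keep incurring high loss lose weight exponentially fast, so the forecaster's cumulative loss approaches that of the best single expert.

```python
import math
from typing import List, Sequence, Tuple


def hedge(loss_rounds: Sequence[Sequence[float]], eta: float) -> Tuple[List[float], List[float]]:
    """Exponentially weighted average forecaster (Hedge), full-information setting.

    Hypothetical helper for illustration: loss_rounds[t][i] is the loss in [0, 1]
    of expert i at round t. Returns the final normalized expert weights and the
    forecaster's expected loss at each round.
    """
    n_experts = len(loss_rounds[0])
    weights = [1.0] * n_experts  # start with uniform trust in every expert
    learner_losses = []
    for losses in loss_rounds:
        total = sum(weights)
        probs = [w / total for w in weights]
        # Expected loss when following expert i with probability probs[i].
        learner_losses.append(sum(p * l for p, l in zip(probs, losses)))
        # Multiplicative update: weight shrinks by exp(-eta * loss).
        weights = [w * math.exp(-eta * l) for w, l in zip(weights, losses)]
    total = sum(weights)
    return [w / total for w in weights], learner_losses


if __name__ == "__main__":
    # Three hypothetical strategies with fixed losses; the first is the best one.
    T, N = 200, 3
    rounds = [[0.2, 0.5, 0.8] for _ in range(T)]
    eta = math.sqrt(8 * math.log(N) / T)  # standard tuning for a known horizon T
    weights, learner_losses = hedge(rounds, eta)
    regret = sum(learner_losses) - 0.2 * T
    print("final weights:", [round(w, 4) for w in weights])
    print("regret vs. best expert:", round(regret, 2))
```

With the learning rate tuned as above, Hedge's regret against the best expert is at most sqrt((T/2) ln N), so the average per-round gap to the best approach shrinks as T grows. Settings where only the chosen approach's outcome is observed would instead call for a bandit variant; this sketch does not cover that case.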

CAT exam
Candidate: Vânia Mendonça
Supervisor: Prof. Luísa Coheur