Between words and characters: A Brief History of Open-Vocabulary Modeling and Tokenization in NLP

March 27, 2022

In this survey, we connect several lines of work from the pre-neural and neural eras, showing how hybrid approaches of words and characters, as well as subword-based approaches based on learned segmentation, have been proposed and evaluated. We conclude that there is not, and likely never will be, a silver-bullet solution for all applications, and that thinking seriously about tokenization remains important for many applications.
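
To make "learned segmentation" concrete, here is a minimal sketch of one well-known subword method, byte-pair encoding (BPE). The abstract above does not name BPE; it is chosen here only as an illustration. The function names (`learn_bpe`, `merge_pair`, `segment`) and the toy corpus are hypothetical, and the sketch omits details such as end-of-word markers that practical implementations usually add.

```python
from collections import Counter


def merge_pair(symbols, pair):
    """Replace every adjacent occurrence of `pair` in `symbols` with the merged symbol."""
    out, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)


def learn_bpe(words, num_merges):
    """Learn up to `num_merges` merge rules from a list of words (toy BPE training)."""
    # Start from characters: each word is a tuple of single-character symbols.
    vocab = Counter(tuple(w) for w in words)
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by word frequency.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)  # most frequent pair
        merges.append(best)
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            new_vocab[merge_pair(symbols, best)] += freq
        vocab = new_vocab
    return merges


def segment(word, merges):
    """Segment a (possibly unseen) word by replaying the learned merges in order."""
    symbols = tuple(word)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return list(symbols)


# Hypothetical toy corpus, for illustration only.
merges = learn_bpe(["low", "low", "lower", "lowest"], num_merges=2)
pieces = segment("lowest", merges)
```

With this toy corpus the two learned merges build the unit `low`, so a word such as `lowest` is split into one learned subword followed by single characters. That sits between a pure word vocabulary and a pure character vocabulary.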

TextAttack: A Framework for Adversarial Attacks, Data Augmentation, and Adversarial Training in NLP

July 30, 2020

This paper introduces TextAttack, a Python framework for adversarial attacks, data augmentation, and adversarial training in NLP. TextAttack builds attacks from four components: a goal function, a set of constraints, a transformation, and a search method. TextAttack’s modular design enables researchers to easily construct attacks from combinations of novel and existing components. TextAttack provides implementations of 16 adversarial attacks from the literature and supports a variety of models and datasets, including BERT and other transformers, and all GLUE tasks.
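
The four-component decomposition described above can be sketched in plain Python. This is not TextAttack's API: every name below (`toy_model`, `synonym_swap`, `max_words_changed`, `untargeted_goal`, `first_success_search`) is a hypothetical stand-in, and the "model" is a trivial word counter used only so the example runs without dependencies.

```python
# Hypothetical toy victim model: predicts 1 (positive) when positive words
# outnumber negative ones, else 0. Not part of TextAttack.
POSITIVE = {"good", "great"}
NEGATIVE = {"bad", "awful"}
SYNONYMS = {"good": ["fine", "decent"], "movie": ["film"]}


def toy_model(words):
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    return 1 if score > 0 else 0


def synonym_swap(words):
    """Transformation: yield every text that differs from `words` by one synonym swap."""
    for i, w in enumerate(words):
        for s in SYNONYMS.get(w, []):
            yield words[:i] + [s] + words[i + 1:]


def max_words_changed(original, candidate, limit=1):
    """Constraint: accept a candidate only if at most `limit` positions changed."""
    return sum(a != b for a, b in zip(original, candidate)) <= limit


def untargeted_goal(model, original_label, candidate):
    """Goal function: succeed when the model's prediction changes."""
    return model(candidate) != original_label


def first_success_search(original, model, transformation, constraints, goal):
    """Search method: return the first single-step candidate that satisfies
    all constraints and the goal, or None if there is none."""
    label = model(original)
    for candidate in transformation(original):
        if all(c(original, candidate) for c in constraints) and goal(model, label, candidate):
            return candidate
    return None


original = "a good movie".split()
adversarial = first_success_search(
    original, toy_model, synonym_swap, [max_words_changed], untargeted_goal
)
```

Because each component is an ordinary function passed in as an argument, swapping the search method or adding a constraint does not require touching the others. This is the kind of modularity the abstract describes.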

Language (Technology) is Power: A Critical Survey of "Bias" in NLP

June 7, 2020

We survey 146 papers analyzing "bias" in NLP systems, finding that their motivations are often vague, inconsistent, and lacking in normative reasoning, despite the fact that analyzing "bias" is an inherently normative process.

WinoGrande: An Adversarial Winograd Schema Challenge at Scale

February 7, 2020

An article by the Allen Institute for Artificial Intelligence highlights an open problem: machines do not really understand what humans write or read.

Read, Attend and Comment: A Deep Architecture for Automatic Comment Generation

October 15, 2019

Automatic news comment generation is a new testbed for natural language generation techniques. In this paper, we propose a "read-attend-comment" procedure for news comment generation and formalize it with a reading network and a generation network. The reading network comprehends a news article and extracts some important points from it; the generation network then creates a comment by attending to the extracted discrete points and the news title.

Natural Language Processing (Paper)