Natural Language Processing (Paper)


Between words and characters: A Brief History of Open-Vocabulary Modeling and Tokenization in NLP

In this survey, we connect several lines of work from the pre-neural and neural eras by showing how hybrid approaches combining words and characters, as well as subword approaches based on learned segmentation, have been proposed and evaluated. We conclude that there is not, and likely never will be, a single silver-bullet solution for all applications, and that thinking seriously about tokenization remains important for many applications.
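As one concrete instance of learned subword segmentation, the following minimal sketch learns byte-pair encoding (BPE) merges from a word-frequency table and then segments an unseen word with them. BPE is our illustrative choice, not necessarily the survey's framing. The helper names `merge_pair`, `learn_bpe`, and `segment` and the toy corpus are hypothetical. For brevity, the sketch also omits the end-of-word marker that practical BPE implementations usually add.

```python
from __future__ import annotations
from collections import Counter

def merge_pair(symbols: list[str], pair: tuple[str, str]) -> list[str]:
    """Replace every adjacent occurrence of `pair` in `symbols` with one merged symbol."""
    out, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return out

def learn_bpe(word_counts: dict[str, int], num_merges: int) -> list[tuple[str, str]]:
    """Learn merge rules by repeatedly merging the most frequent adjacent symbol pair."""
    # Start from characters: each word is a list of single-character symbols.
    vocab = [(list(word), count) for word, count in word_counts.items()]
    merges: list[tuple[str, str]] = []
    for _ in range(num_merges):
        pairs: Counter[tuple[str, str]] = Counter()
        for symbols, count in vocab:
            for pair in zip(symbols, symbols[1:]):
                pairs[pair] += count  # weight pair counts by word frequency
        if not pairs:
            break  # every word is already a single symbol
        best = max(pairs, key=pairs.__getitem__)  # ties go to the pair seen first
        merges.append(best)
        vocab = [(merge_pair(symbols, best), count) for symbols, count in vocab]
    return merges

def segment(word: str, merges: list[tuple[str, str]]) -> list[str]:
    """Segment a (possibly unseen) word by replaying the learned merges in order."""
    symbols = list(word)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return symbols

merges = learn_bpe({"low": 5, "lower": 2, "newest": 6, "widest": 3}, num_merges=3)
print(merges)                     # [('e', 's'), ('es', 't'), ('l', 'o')]
print(segment("lowest", merges))  # ['lo', 'w', 'est']
```

The word "lowest" never occurs in the toy corpus. It still receives a segmentation into known subword units, which is the open-vocabulary property that sits between word-level and character-level modeling.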

TextAttack: A Framework for Adversarial Attacks, Data Augmentation, and Adversarial Training in NLP

This paper introduces TextAttack, a Python framework for adversarial attacks, data augmentation, and adversarial training in NLP. TextAttack builds attacks from four components: a goal function, a set of constraints, a transformation, and a search method. TextAttack’s modular design enables researchers to easily construct attacks from combinations of novel and existing components. TextAttack provides implementations of 16 adversarial attacks from the literature and supports a variety of models and datasets, including BERT and other transformers, and all GLUE tasks.
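To make the four-component decomposition concrete, here is a minimal, self-contained sketch in plain Python. It does not use TextAttack's actual classes or API. The toy model, the synonym table, and every function name are hypothetical, chosen only to show how a goal function, a constraint, a transformation, and a search method compose into one attack.

```python
from __future__ import annotations
from typing import Callable, Optional

# Hypothetical toy victim model: predicts 1 (positive) if positive words outnumber negative ones.
POSITIVE = {"good", "great", "fun"}
NEGATIVE = {"bad", "dull"}

def toy_model(words: list[str]) -> int:
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    return 1 if score > 0 else 0

# Hypothetical synonym table standing in for a real lexical resource.
SYNONYMS = {"good": ["decent", "fine"], "great": ["large"], "fun": ["amusing"]}

# 1. Goal function: succeed once the model's prediction differs from the original label.
def untargeted_goal(model: Callable[[list[str]], int], original_label: int) -> Callable[[list[str]], bool]:
    return lambda words: model(words) != original_label

# 2. Constraint: limit how many words may differ from the original input.
def max_words_changed(limit: int) -> Callable[[list[str], list[str]], bool]:
    return lambda original, candidate: sum(a != b for a, b in zip(original, candidate)) <= limit

# 3. Transformation: propose candidates by swapping the word at position i for a synonym.
def synonym_swap(words: list[str], i: int) -> list[list[str]]:
    return [words[:i] + [s] + words[i + 1:] for s in SYNONYMS.get(words[i], [])]

# 4. Search method: left-to-right greedy, keeping the first candidate that passes all constraints.
def greedy_search(words, goal, constraints, transformation) -> Optional[list[str]]:
    current = list(words)
    for i in range(len(words)):
        for candidate in transformation(current, i):
            if all(check(words, candidate) for check in constraints):
                current = candidate
                break
        if goal(current):
            return current
    return None  # attack failed under the given constraints

sentence = "the movie was good and fun".split()
goal = untargeted_goal(toy_model, toy_model(sentence))
adversarial = greedy_search(sentence, goal, [max_words_changed(2)], synonym_swap)
print(adversarial)  # ['the', 'movie', 'was', 'decent', 'and', 'amusing']
```

Each component is an ordinary function. You can therefore swap in a different constraint or search strategy without touching the others, which mirrors the modularity the abstract describes.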

Read, Attend and Comment: A Deep Architecture for Automatic News Comment Generation

Automatic news comment generation is a new testbed for natural language generation techniques. In this paper, we propose a “read-attend-comment” procedure for news comment generation and formalize the procedure with a reading network and a generation network. The reading network comprehends a news article and extracts some important points from it; the generation network then creates a comment by attending to the extracted discrete points and the news title.
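The generation step described above attends over the extracted points together with the news title. The following minimal sketch assumes nothing about the paper's actual encoders or decoder. It shows plain dot-product attention over such a memory in pure Python. The vectors, the `attend` helper, and the choice of dot-product scoring are all hypothetical illustrations rather than the authors' model.

```python
from __future__ import annotations
import math

def softmax(scores: list[float]) -> list[float]:
    """Numerically stable softmax."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query: list[float], memory: list[list[float]]) -> tuple[list[float], list[float]]:
    """Dot-product attention: score each memory vector against the query, then take a weighted average."""
    scores = [sum(q * k for q, k in zip(query, key)) for key in memory]
    weights = softmax(scores)
    context = [sum(w * vec[d] for w, vec in zip(weights, memory)) for d in range(len(query))]
    return weights, context

# Hypothetical 2-d encodings: two points extracted by the reading network plus the news title.
extracted_points = [[1.0, 0.0], [0.0, 1.0]]
title = [[0.5, 0.5]]
memory = extracted_points + title

# Hypothetical generation-network state at one decoding step, used as the query.
decoder_state = [2.0, 0.0]
weights, context = attend(decoder_state, memory)
print([round(w, 3) for w in weights])  # [0.665, 0.09, 0.245]
```

This sketch places the title vector in the same memory as the extracted points, so a single softmax decides how much each source contributes at a given step. The paper may combine these sources differently.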