Numbered

R0:884a5f6297216a3c5e883ae639b13721-Between words and characters: A Brief History of Open-Vocabulary Modeling and Tokenization in NLP

Between words and characters: A Brief History of Open-Vocabulary Modeling and Tokenization in NLP

In this survey, we connect several lines of work from the pre-neural and neural era, by showing how hybrid approaches of words and characters as well as subword-based approaches based on learned segmentation have been proposed and evaluated. We conclude that there is and likely will never be a silver bullet singular solution for all applications and that thinking seriously about tokenization remains important for many applications

R0:7194c033807f71949262faf563aef58e-Scientific Visualization: Python + Matplotlib

Scientific Visualization: Python + Matplotlib

The Python scientific visualisation landscape is huge. It is composed of a myriad of tools, ranging from the most versatile and widely used down to the more specialised and confidential. Some of these tools are community based while others are developed by companies. Some are made specifically for the web, others are for the desktop only, some deal with 3D and large data, while others target flawless 2D rendering.

R0:62a6aa3b4882ad9b194a4ae5c97b4d58-Ethics-based auditing of automated decision-making systems: intervention points and policy implications

Ethics-based auditing of automated decision-making systems: intervention points and policy implications

Organisations increasingly use automated decision-making systems (ADMS) to inform decisions that affect humans and their environment. While the use of ADMS can improve the accuracy and efficiency of decision-making processes, it is also coupled with ethical challenges. Unfortunately, the governance mechanisms currently used to oversee human decision-making often fail when applied to ADMS.

R0:8c6cf8db215cbc24a8aec1e0786e1356-AI and the Future of Skills, Volume 1

AI and the Future of Skills, Volume 1

The OECD launched the Artificial Intelligence and the Future of Skills project to develop a programme that could assess the capabilities of AI and robotics and their impact on education and work. This report represents the first step in developing the methodological approach of the project.

R0:a8fc240769ba4448b373719f7fbe640d-Do Vision Transformers See Like Convolutional Neural Networks?

Do Vision Transformers See Like Convolutional Neural Networks?

Convolutional neural networks (CNNs) have so far been the de-facto model for visual data. Recent work has shown that (Vision) Transformer models (ViT) can achieve comparable or even superior performance on image classification tasks. This raises a central question: how are Vision Transformers solving these tasks? Are they acting like convolutional networks, or learning entirely different visual representations? Analyzing the internal representation structure of ViTs and CNNs on image classification benchmarks, we find striking differences between the two architectures, such as ViT having more uniform representations across all layers. We explore how these differences arise, finding crucial roles played by self-attention, which enables early aggregation of global information, and ViT residual connections, which strongly propagate features from lower to higher layers.

R0:f70f5b9bc071317c0c1c9b1d7f122949-Highly accurate protein structure prediction with AlphaFold

Highly accurate protein structure prediction with AlphaFold

Underpinning the latest version of AlphaFold is a novel machine learning approach that incorporates physical and biological knowledge about protein structure, leveraging multi-sequence alignments, into the design of the deep learning algorithm.

R0:e3d9ea294a21c145042e5f31369de739-CARLA: A Python Library to Benchmark Algorithmic Recourse and Counterfactual Explanation Algorithms

CARLA: A Python Library to Benchmark Algorithmic Recourse and Counterfactual Explanation Algorithms

CARLA (Counterfactual And Recourse LibrAry), a python library for benchmarking counterfactual explanation methods across both different data sets and different machine learning models. In summary, our work provides the following contributions: (i) an extensive benchmark of 11 popular counterfactual explanation methods, (ii) a benchmarking framework for research on future counterfactual explanation methods, and (iii) a standardized set of integrated evaluation measures and data sets for transparent and extensive comparisons of these methods. We have open-sourced CARLA and our experimental results on Github, making them available as competitive baselines. We welcome contributions from other research groups and practitioners.

R0:5e6fade87218b43e4b8d96158080cc85-A Farewell to the Bias-Variance Tradeoff? An Overview of the Theory of Overparameterized Machine Learning

A Farewell to the Bias-Variance Tradeoff? An Overview of the Theory of Overparameterized Machine Learning

This paper provides a succinct overview of this emerging theory of overparameterized ML (henceforth abbreviated as TOPML) that explains these recent findings through a statistical signal processing perspective. We emphasize the unique aspects that define the TOPML research area as a subfield of modern ML theory and outline interesting open questions that remain.