Sequence Feature Extraction for Malware Family Analysis via Graph Neural Network
Malicious software (malware) causes considerable harm to our devices and daily lives, so we want to understand malware behavior and the threats it poses. Most malware record files are variable-length, text-based files with time stamps, such as event log data and dynamic analysis profiles. Using the time stamps, we can sort such data into sequences for subsequent analysis. However, text-based sequences of variable length are difficult to handle. In addition, unlike natural language text, most sequential data in information security have specific properties and structure, such as loops, repeated calls, and noise. To analyze API call sequences together with their structure, we represent the sequences as graphs, which expose further information and structure, such as that captured by a Markov model. We therefore design and implement an Attention Aware Graph Neural Network (AWGCN) to analyze API call sequences. Through AWGCN, we obtain sequence embeddings that can be used to analyze malware behavior. Moreover, the classification experiments show that AWGCN outperforms other classifiers on call-like datasets, and that the embeddings can further improve the performance of classic models.
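To make the graph representation concrete, here is a minimal Python sketch of turning one API call sequence into a first-order Markov transition graph. It is not the AWGCN pipeline from the paper; the function name `transition_graph` and the sample trace are hypothetical, chosen only for illustration.

```python
from collections import Counter, defaultdict


def transition_graph(calls):
    """Return {src: {dst: probability}} built from consecutive call pairs."""
    counts = defaultdict(Counter)
    # Each adjacent pair (src, dst) in the sequence is one observed edge.
    for src, dst in zip(calls, calls[1:]):
        counts[src][dst] += 1
    graph = {}
    for src, dsts in counts.items():
        total = sum(dsts.values())
        # Normalize outgoing edge counts into transition probabilities.
        graph[src] = {dst: n / total for dst, n in dsts.items()}
    return graph


# Hypothetical API call trace containing a repeated "read" loop.
trace = ["open", "read", "read", "read", "write", "close"]
graph = transition_graph(trace)
```

The repeated `read` calls collapse into a self-loop on the `read` node, so a loop of any length yields the same fixed-size graph structure, with only the edge weights changing.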
Auto Quantum Circuits
AutoQML: self-assembling circuits on a hyper-parameterized quantum ML platform built with cirq, tensorflow and tfq. Trillions of possible qubit registries, gate combinations and moment sequences are ready to be adapted into your ML flow. Here I demonstrate climate change, James Webb Space Telescope and microbiology vision applications… [Thus far, a circuit with 16 qubits and a gate sequence of [YY] – [XX] – [CNOT] has performed the best, per my blend of metrics…].
Federated Learning: Issues in Medical Application
This presentation briefly reviews the issues that currently keep federated learning from being fully useful in the real world. They relate to data and system heterogeneity, client management, traceability, and security. We also introduce the modularized federated learning framework that we are currently developing to experiment with various techniques and protocols and find solutions to these issues. The framework will be opened to the public once development is complete.
Between words and characters: A Brief History of Open-Vocabulary Modeling and Tokenization in NLP
In this survey, we connect several lines of work from the pre-neural and neural eras by showing how hybrid approaches combining words and characters, as well as subword-based approaches based on learned segmentation, have been proposed and evaluated. We conclude that there is not, and likely never will be, a single silver-bullet solution for all applications, and that thinking seriously about tokenization remains important for many applications.
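The difference between word, character, and subword granularity can be sketched in a few lines of Python. This is a toy illustration, not any method from the survey: the subword segmenter uses greedy longest-match over a hand-picked, hypothetical vocabulary, whereas the approaches the survey covers learn their segmentation from data.

```python
def word_tokens(text):
    """Word-level: split on whitespace."""
    return text.split()


def char_tokens(text):
    """Character-level: every character is a token."""
    return list(text)


def subword_tokens(word, vocab):
    """Greedy longest-match segmentation; falls back to single characters."""
    pieces, i = [], 0
    while i < len(word):
        # Try the longest candidate first, shrinking until a match is found.
        for j in range(len(word), i, -1):
            if word[i:j] in vocab or j == i + 1:
                pieces.append(word[i:j])
                i = j
                break
    return pieces


# Hypothetical subword vocabulary, chosen by hand for this example.
vocab = {"token", "ization", "un", "seen"}
known = subword_tokens("tokenization", vocab)
unknown = subword_tokens("unseenly", vocab)
```

Here `known` is `["token", "ization"]`, while `unknown` is `["un", "seen", "l", "y"]`: the segmenter reuses known pieces and degrades to characters for the rest, so no input is ever out of vocabulary.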
Scientific Visualization: Python + Matplotlib
The Python scientific visualisation landscape is huge. It is composed of a myriad of tools, ranging from the most versatile and widely used down to the more specialised and confidential. Some of these tools are community based while others are developed by companies. Some are made specifically for the web, others are for the desktop only, some deal with 3D and large data, while others target flawless 2D rendering.
Ethics-based auditing of automated decision-making systems: intervention points and policy implications
Organisations increasingly use automated decision-making systems (ADMS) to inform decisions that affect humans and their environment. While the use of ADMS can improve the accuracy and efficiency of decision-making processes, it is also coupled with ethical challenges. Unfortunately, the governance mechanisms currently used to oversee human decision-making often fail when applied to ADMS.
Do Vision Transformers See Like Convolutional Neural Networks?
Convolutional neural networks (CNNs) have so far been the de-facto model for visual data. Recent work has shown that (Vision) Transformer models (ViT) can achieve comparable or even superior performance on image classification tasks. This raises a central question: how are Vision Transformers solving these tasks? Are they acting like convolutional networks, or learning entirely different visual representations? Analyzing the internal representation structure of ViTs and CNNs on image classification benchmarks, we find striking differences between the two architectures, such as ViT having more uniform representations across all layers. We explore how these differences arise, finding crucial roles played by self-attention, which enables early aggregation of global information, and ViT residual connections, which strongly propagate features from lower to higher layers.
The Word is Mightier than the Label: Learning without Pointillistic Labels using Data Programming
We analyze the mathematical fundamentals behind data programming (DP) and demonstrate its power by applying it to two real-world text classification tasks. Furthermore, we compare DP with the pointillistic active and semi-supervised learning techniques traditionally applied in data-sparse settings.
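As a rough illustration of the idea, the Python sketch below labels text with a few heuristic labeling functions instead of per-example hand labels. It is a deliberate simplification: votes are combined by plain majority, whereas data programming proper fits a model of the labeling functions' accuracies. The labeling functions, label names, and example texts are all hypothetical.

```python
from collections import Counter

ABSTAIN, HAM, SPAM = -1, 0, 1


# Hypothetical labeling functions: each encodes one noisy heuristic
# and abstains when the heuristic does not apply.
def lf_contains_free(text):
    return SPAM if "free" in text.lower() else ABSTAIN


def lf_has_greeting(text):
    return HAM if text.lower().startswith("hi ") else ABSTAIN


def lf_many_exclaims(text):
    return SPAM if text.count("!") >= 2 else ABSTAIN


LABELING_FUNCTIONS = [lf_contains_free, lf_has_greeting, lf_many_exclaims]


def weak_label(text):
    """Combine non-abstaining votes by majority (a stand-in for a learned label model)."""
    votes = [v for v in (lf(text) for lf in LABELING_FUNCTIONS) if v != ABSTAIN]
    if not votes:
        return ABSTAIN
    return Counter(votes).most_common(1)[0][0]


spam_label = weak_label("Get FREE money now!!")
ham_label = weak_label("hi Bob, lunch tomorrow?")
no_label = weak_label("Meeting at noon")
```

Examples on which every function abstains stay unlabeled, which is why coverage of the labeling functions matters as much as their accuracy.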
Highly accurate protein structure prediction with AlphaFold
Underpinning the latest version of AlphaFold is a novel machine learning approach that incorporates physical and biological knowledge about protein structure, leveraging multi-sequence alignments, into the design of the deep learning algorithm.
Model-based Decision Making with Imagination for Autonomous Parking
Autonomous parking technology is a key concept within autonomous driving research. This paper will propose an imaginative autonomous parking algorithm to solve issues concerned with parking.
CARLA: A Python Library to Benchmark Algorithmic Recourse and Counterfactual Explanation Algorithms
CARLA (Counterfactual And Recourse LibrAry) is a Python library for benchmarking counterfactual explanation methods across both different data sets and different machine learning models. In summary, our work provides the following contributions: (i) an extensive benchmark of 11 popular counterfactual explanation methods, (ii) a benchmarking framework for research on future counterfactual explanation methods, and (iii) a standardized set of integrated evaluation measures and data sets for transparent and extensive comparisons of these methods. We have open-sourced CARLA and our experimental results on GitHub, making them available as competitive baselines. We welcome contributions from other research groups and practitioners.
A Farewell to the Bias-Variance Tradeoff? An Overview of the Theory of Overparameterized Machine Learning
This paper provides a succinct overview of this emerging theory of overparameterized ML (henceforth abbreviated as TOPML) that explains these recent findings through a statistical signal processing perspective. We emphasize the unique aspects that define the TOPML research area as a subfield of modern ML theory and outline interesting open questions that remain.
How to avoid machine learning pitfalls: a guide for academic researchers
This document gives a concise outline of some of the common mistakes that occur when using machine learning techniques, and what can be done to avoid them. It is intended primarily as a guide for research students, and focuses on issues that are of particular concern within academic research, such as the need to do rigorous comparisons and reach valid conclusions. It covers five stages of the machine learning process: what to do before model building, how to reliably build models, how to robustly evaluate models, how to compare models fairly, and how to report results.
YOLOX: Exceeding YOLO Series in 2021
We switch the YOLO detector to an anchor-free manner and conduct other advanced detection techniques, i.e., a decoupled head and the leading label assignment strategy SimOTA to achieve state-of-the-art results across a large scale range of models: For YOLO-Nano with only 0.91M parameters and 1.08G FLOPs, we get 25.3% AP on COCO, surpassing NanoDet by 1.8% AP; for YOLOv3, one of the most widely used detectors in industry, we boost it to 47.3% AP on COCO, outperforming the current best practice by 3.0% AP; for YOLOX-L with roughly the same amount of parameters as YOLOv4-CSP, YOLOv5-L, we achieve 50.0% AP on COCO at a speed of 68.9 FPS on Tesla V100, exceeding YOLOv5-L by 1.8% AP.
Framework based on parameterized images on ResNet to identify intrusions in smartwatches or other related devices
The continuous appearance and improvement of mobile devices such as smartwatches, smartphones and similar devices has led to a growing and unfair interest in placing their users under the scrutiny and control of applications.
YOLOP: You Only Look Once for Panoptic Driving Perception
A panoptic driving perception system is an essential part of autonomous driving. A high-precision and real-time perception system can assist the vehicle in making the reasonable decision while driving. We present a panoptic driving perception network (YOLOP) to perform traffic object detection, drivable area segmentation and lane detection simultaneously. It is composed of one encoder for feature extraction and three decoders to handle the specific tasks. Our model performs extremely well on the challenging BDD100K dataset, achieving state-of-the-art on all three tasks in terms of accuracy and speed. Besides, we verify the effectiveness of our multi-task learning model for joint training via ablative studies.
labml.ai Deep Learning Paper Implementations
This is a collection of simple PyTorch implementations of neural networks and related algorithms. These implementations are documented with explanations, and the website renders these as side-by-side formatted notes. We believe these would help you understand these algorithms better.
Data as the main focus of “State of the art of data science in Spanish language and its application in the field of Artificial Intelligence”
According to the results, there is evidence of cultural bias against the Spanish language in data science. The outcome of the consultation, which was carried out on 12 April 2021, confirms that only 10 out of 23,771 datasets “speak” Spanish.
‘Framework’ basado en imágenes parametrizadas sobre ResNet para identificar intrusiones en ‘smartwatches’ u otros dispositivos afines
The continuous appearance and improvement of mobile devices in the form of smartwatches, smartphones and other similar devices has fostered a growing and unfair interest in placing their users under the scrutiny and control of applications, in a manner obfuscated by the manufacturers.
Los datos como eje principal en el “Estado del arte de la ciencia de datos en el idioma español y su aplicación en el campo de la Inteligencia Artificial”
The results of this study are evidence of the cultural bias that exists between English and Spanish in data science. Of the 23,771 datasets found as of the consultation date, 12 April 2021, only 10 were in Spanish.
EvalML: a library for automated machine learning and model understanding
EvalML is a Python library for automated machine learning (AutoML) and model understanding. It builds, optimizes, and evaluates machine learning pipelines using domain-specific objective functions.
State of the art of data science in Spanish language and its application in the field of AI
The state-of-the-art study yields results that indicate a lack of involvement of the Spanish language in AI and all its subareas, which in turn adversely affects the education of future professionals.
Partial Differential Equations is All You Need for Generating Neural Architectures — A Theory for Physical Artificial Intelligence Systems
In this work, we generalize the reaction-diffusion equation in statistical physics, the Schrödinger equation in quantum mechanics, and the Helmholtz equation in paraxial optics into the neural partial differential equations (NPDE), which can be considered the fundamental equations in the field of artificial intelligence research.
Federated Quantum Machine Learning
We present federated training of hybrid quantum-classical machine learning models, although our framework could be generalized to pure quantum machine learning models. Specifically, we consider a quantum neural network (QNN) coupled with a classical pre-trained convolutional model.
El estado del arte de la ciencia de datos en el idioma español y su aplicación en el campo de la Inteligencia Artificial
The study yields results that indicate a lack of involvement of Spanish with AI and all its subareas, negatively affecting the training of future professionals.