Sequence Feature Extraction for Malware Family Analysis via Graph Neural Network

Malicious software (malware) causes great harm to our devices and lives, so we want to understand its behavior and the threats it poses. Most malware record files, such as event log data and dynamic analysis profiles, are variable-length, text-based files with time stamps. Using the time stamps, we can sort such data into sequences for further analysis. However, text-based sequences of variable length are difficult to handle. In addition, unlike natural language text, most sequential data in information security have specific properties and structures, such as loops, repeated calls, and noise. To analyze API call sequences together with their structure, we represent the sequences as graphs (for example, a Markov model), which lets us examine their information and structure further. We therefore design and implement an Attention Aware Graph Neural Network (AWGCN) to analyze API call sequences. AWGCN yields sequence embeddings that we use to analyze malware behavior. Moreover, classification experiments show that AWGCN outperforms other classifiers on call-like datasets, and that the embeddings can further improve the performance of classic models.
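
As a rough illustration of the sequence-to-graph idea mentioned above, the following minimal sketch turns an ordered API call trace into a first-order Markov transition graph. It is not the AWGCN method itself; the function name `sequence_to_transition_graph` and the example trace are hypothetical, chosen only to show how loops and repeated calls become edges with probabilities.

```python
from collections import Counter, defaultdict


def sequence_to_transition_graph(calls):
    """Build a first-order Markov transition graph from an ordered call sequence.

    Returns a dict mapping each source call to a dict of
    {destination call: transition probability}.
    """
    # Count how often each call is immediately followed by each other call.
    counts = defaultdict(Counter)
    for src, dst in zip(calls, calls[1:]):
        counts[src][dst] += 1

    # Normalize the outgoing counts of every node into probabilities.
    graph = {}
    for src, dsts in counts.items():
        total = sum(dsts.values())
        graph[src] = {dst: n / total for dst, n in dsts.items()}
    return graph


# Hypothetical API call trace, already sorted by time stamp.
trace = [
    "OpenFile", "ReadFile", "ReadFile", "ReadFile", "CloseFile",
    "OpenFile", "WriteFile", "CloseFile",
]
graph = sequence_to_transition_graph(trace)

for src, dsts in graph.items():
    print(src, dsts)
```

In this sketch the repeated `ReadFile` calls collapse into a single self-loop edge, so traces of different lengths map to graphs over the same small set of nodes.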

Auto Quantum Circuits

«AutoQML: self-assembling circuits and a hyper-parameterized quantum ML platform, built with Cirq, TensorFlow, and TFQ. Trillions of possible qubit registries, gate combinations, and moment sequences, ready to be adapted into your ML flow. Here I demonstrate climate change, James Webb Space Telescope, and microbiology vision applications… [Thus far, a 16-qubit circuit with the gate sequence [YY] – [XX] – [CNOT] has performed best, per my blend of metrics…]»

Federated Learning: Issues in Medical Application

This presentation briefly reviews the issues that currently keep federated learning from being fully practical in the real world. They relate to data and system heterogeneity, client management, traceability, and security. We also introduce the modularized federated learning framework we are currently developing, which lets us experiment with various techniques and protocols to find solutions to these issues. The framework will be opened to the public once development is complete.

CNN Explainer: Learning Convolutional Neural Networks with Interactive Visualization

CNN Explainer tightly integrates a model overview that summarizes a CNN’s structure, and on-demand, dynamic visual explanation views that help users understand the underlying components of CNNs. Through smooth transitions across levels of abstraction, our tool enables users to inspect the interplay between low-level mathematical operations and high-level model structures.

Between words and characters: A Brief History of Open-Vocabulary Modeling and Tokenization in NLP

In this survey, we connect several lines of work from the pre-neural and neural eras by showing how hybrid approaches combining words and characters, as well as subword-based approaches built on learned segmentation, have been proposed and evaluated. We conclude that there is not, and likely never will be, a single silver-bullet solution for all applications, and that thinking seriously about tokenization remains important for many applications.

AI and the Future of Skills, Volume 1

The OECD launched the Artificial Intelligence and the Future of Skills project to develop a programme that could assess the capabilities of AI and robotics and their impact on education and work. This report represents the first step in developing the methodological approach of the project.

From Zero to Research Scientist full resources guide

This guide is designed for anybody with basic programming knowledge or a computer science background who is interested in becoming a Research Scientist focusing on Deep Learning and NLP.