Microsoft NLP Best Practices

This repository contains examples and best practices for building NLP systems, provided as Jupyter notebooks and utility functions. The repository focuses on state-of-the-art methods and common scenarios that are popular among researchers and practitioners working on problems involving text and language.


Trainings for Cybersecurity Specialists

“ENISA CSIRT training material was introduced in 2008. In 2012, 2013 and 2014 it was complemented with new exercise scenarios containing essential material for success in the CSIRT community and in the field of information security. In these pages you will find the ENISA CSIRT training material, containing Handbooks for teachers, Toolsets for students and Virtual Images to support hands on training sessions.” The materials were still being updated as of 2020 and are suitable for cybersecurity specialists and decision-makers.


Best Practices in Dataviz: An R Perspective

By the end of this, you will have had a whirlwind tour of the very tip of the data visualization best-practices iceberg. We will go over a broad range of topics generally applicable to data science use cases but not dive too deep into any single one. One thing to keep in mind throughout: none of this is set in stone, and in the real world you often have to bend or break some of these rules to do what you want.


Machine Learning From Scratch

An extensive list of fundamental machine learning models and algorithms, implemented from scratch in vanilla Python.
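
To give a sense of what "from scratch" can look like, here is a minimal sketch of one-feature linear regression by ordinary least squares, using only built-in Python. It is not taken from the repository; the function names and the sample data are hypothetical and chosen purely for illustration.

```python
def fit_simple_linear_regression(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares.

    Hypothetical helper for illustration; uses only built-in Python.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")

    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n

    # Sum of co-deviations of x and y, and sum of squared deviations of x.
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    if var_x == 0:
        raise ValueError("all x values are identical; slope is undefined")

    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(x, slope, intercept):
    """Evaluate the fitted line at x."""
    return slope * x + intercept


# Made-up points that lie exactly on the line y = 2x + 1.
slope, intercept = fit_simple_linear_regression([0, 1, 2, 3], [1, 3, 5, 7])
print(slope, intercept, predict(4, slope, intercept))
```

Because the sample points lie exactly on y = 2x + 1, the fit recovers that slope and intercept; with noisy data it would return the least-squares line instead.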


Google Engineering Practices Documentation

Google has many generalized engineering practices that cover all languages and all projects. These documents represent their collective experience of various best practices that they have developed over time. It is possible that open source projects or other organizations would benefit from this knowledge.


Altair: Declarative Visualization in Python

Altair is a declarative statistical visualization library for Python. With Altair, you can spend more time understanding your data and its meaning. Altair’s API is simple, friendly and consistent and built on top of the powerful Vega-Lite JSON specification. This elegant simplicity produces beautiful and effective visualizations with a minimal amount of code. Altair is developed by Jake VanderPlas and Brian Granger in close collaboration with the UW Interactive Data Lab.
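
To make "declarative" and "built on Vega-Lite JSON" concrete, the sketch below hand-writes the kind of specification that sits underneath an Altair chart. It deliberately does not import Altair, so it runs with the standard library alone. The helper name `scatter_spec`, the field names, and the data are assumptions invented for this example.

```python
import json

# Hypothetical sample data, invented for illustration only.
rows = [
    {"hours": 1, "score": 52},
    {"hours": 2, "score": 61},
    {"hours": 3, "score": 70},
]


def scatter_spec(values, x_field, y_field):
    """Build a minimal Vega-Lite scatter-plot specification as a plain dict.

    The spec states WHAT to draw (a point mark, with two fields mapped to
    the x and y channels) rather than HOW to draw it.
    """
    return {
        "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
        "data": {"values": values},
        "mark": "point",
        "encoding": {
            "x": {"field": x_field, "type": "quantitative"},
            "y": {"field": y_field, "type": "quantitative"},
        },
    }


spec = scatter_spec(rows, "hours", "score")
spec_json = json.dumps(spec, indent=2)
print(spec_json)
```

With Altair installed and the same data in a pandas DataFrame `df`, a chart such as `alt.Chart(df).mark_point().encode(x="hours", y="score")` expresses roughly the same mapping, and Altair generates the JSON for you.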


AutoML: An Introductory Tutorial on H2O Driverless AI

“H2O has been the driver for building models at scale. We are talking about billions of claims. You can’t do this with standard off the shelf open source techniques.” (H2O.ai)



Scikit-Learn Tutorial (Materials for my scikit-learn tutorial)

By Jake VanderPlas, director of Open Software at the University of Washington’s eScience Institute, who researches and teaches in a variety of areas, including Astronomy, Astrostatistics, Machine Learning, and Scalable Computation.