Data Science at the Command Line

27 March 2020

Today, data scientists can choose from an overwhelming collection of exciting technologies and programming languages. Python, R, Hadoop, Julia, Pig, Hive, and Spark are but a few examples. You may already have experience in one or more of these. If so, then why should you still care about the command line for doing data science? What does the command line have to offer that these other technologies and programming languages do not?

R Packages

25 March 2020

Packages are the fundamental units of reproducible R code. They include reusable R functions, the documentation that describes how to use them, and sample data. In this book you’ll learn how to turn your code into packages that others can easily download and use. Writing a package can seem overwhelming at first, so start with the basics and improve it over time. It doesn’t matter if your first version isn’t perfect as long as the next version is better. The online version of the book is where the authors are developing its 2nd edition.

R Programming Succinctly

23 March 2020

The R programming language on its own is a powerful tool that can perform thousands of statistical tasks, but by writing programs in R, you gain tremendous power and flexibility to extend its base functionality. Senior Succinctly series author and editor James McCaffrey shows you how in R Programming Succinctly.

Efficient R programming

11 March 2020

There are many excellent R resources for visualization, data science, and package development. Hundreds of scattered vignettes, web pages, and forums explain how to use R in particular domains. But little has been written on how to simply make R work effectively, until now.

SQL Notes for Professionals book

10 March 2020

This SQL Notes for Professionals book is compiled from Stack Overflow Documentation (166 pages, published in May 2018).

TPOT is a Python Automated Machine Learning tool

4 March 2020

Consider TPOT your Data Science Assistant. TPOT is a Python Automated Machine Learning (AutoML) tool that optimizes machine learning pipelines using genetic programming.

Python Data Science Handbook (Essential Tools for Working with Data)

20 February 2020

For many researchers, Python is a first-class tool mainly because of its libraries for storing, manipulating, and gaining insight from data. Several resources exist for individual pieces of this data science stack, but only with the Python Data Science Handbook do you get them all: IPython, NumPy, Pandas, Matplotlib, Scikit-Learn, and other related tools.

R for Data Science

17 February 2020

This book will teach you how to do data science with R. You’ll learn how to get your data into R, get it into the most useful structure, transform it, visualise it, and model it. In this book, you will find a practicum of skills for data science.

R Notes for Professionals book

17 February 2020

This R Notes for Professionals book is compiled from Stack Overflow Documentation (475 pages, published in May 2018).

Select Star SQL

16 February 2020

This is an interactive book that aims to be the best place on the internet for learning SQL. It is free of charge, free of ads, and doesn’t require registration or downloads. It helps you learn by running queries against a real-world dataset to complete projects of consequence. It is not a mere reference page; it conveys a mental model for writing SQL.

Data Mining (Article)