Cleaning Data for Effective Data Science

Doing the Other 80% of the Work with Python, R, and Command-Line Tools

by David Mertz
Paperback
Publication Date: 31/03/2021

  $87.72

A comprehensive guide for data scientists to master effective data cleaning tools and techniques

Key Features

  • Master data cleaning techniques in a language-agnostic manner
  • Learn from intriguing hands-on examples from numerous domains, such as biology, weather data, demographics, physics, time series, and image processing
  • Work with detailed, commented, well-tested code samples in Python and R

Book Description

It is something of a truism in data science, data analysis, or machine learning that most of the effort needed to achieve your actual purpose lies in cleaning your data. Written in David's signature friendly and humorous style, this book discusses in detail the essential steps performed in every production data science or data analysis pipeline and prepares you for data visualization and modeling.

The book dives into the practical application of tools and techniques needed for data ingestion, anomaly detection, value imputation, and feature engineering. It also offers long-form exercises at the end of each chapter to practice the skills acquired.

You will begin by looking at how to ingest data from formats such as JSON, CSV, SQL RDBMSes, HDF5, NoSQL databases, image files, and binary serialized data structures. Further, the book provides numerous example data sets and data files, which are available for download and independent exploration.

Moving on from formats, you will impute missing values, detect unreliable data and statistical anomalies, and generate the synthetic features needed to meet your data analysis and visualization goals.

By the end of this book, you will have acquired a firm understanding of the data cleaning process necessary to perform real-world data science and machine learning tasks.

What you will learn

  • Think carefully about your data and ask the right questions
  • Identify problem data pertaining to individual data points
  • Detect problem data in the systematic "shape" of the data
  • Remediate data integrity and hygiene problems
  • Prepare data for analytic and machine learning tasks
  • Impute values into missing or unreliable data
  • Generate synthetic features that are more amenable to data science, data analysis, or visualization goals

Who this book is for

This book is designed to benefit software developers, data scientists, aspiring data scientists, and students who are interested in data analysis or scientific computing.

Basic familiarity with statistics, general concepts in machine learning, knowledge of a programming language (Python or R), and some exposure to data science are helpful. A glossary, references, and friendly asides should help bring all readers up to speed.

The text will also be helpful to intermediate and advanced data scientists who want to improve their rigor in data hygiene and wish for a refresher on data preparation issues.

ISBN: 9781801071291
Category: Data capture & analysis
Format: Paperback
Publication Date: 31/03/2021
Language: English
Publisher: Packt Publishing Limited
Country of origin: United Kingdom
Dimensions (mm): 92.46 x 74.93

This title is in stock with our Australian supplier and should arrive at our Sydney warehouse within 2-3 weeks of you placing an order.

Once received into our warehouse we will despatch it to you with a Shipping Notification which includes online tracking.

Estimated delivery times for your region, counted from when your order is despatched from our warehouse:

ACT Metro: 2 working days
NSW Metro: 2 working days
NSW Rural: 2-3 working days
NSW Remote: 2-5 working days
NT Metro: 3-6 working days
NT Remote: 4-10 working days
QLD Metro: 2-4 working days
QLD Rural: 2-5 working days
QLD Remote: 2-7 working days
SA Metro: 2-5 working days
SA Rural: 3-6 working days
SA Remote: 3-7 working days
TAS Metro: 3-6 working days
TAS Rural: 3-6 working days
VIC Metro: 2-3 working days
VIC Rural: 2-4 working days
VIC Remote: 2-5 working days
WA Metro: 3-6 working days
WA Rural: 4-8 working days
WA Remote: 4-12 working days
