Massively Parallel Databases and MapReduce Systems addresses the design principles and core features of systems for analyzing very large datasets using massively parallel computation and storage techniques on large clusters of nodes. It first discusses how the requirements of data analytics have evolved since the early work on parallel database systems. It then describes some of the major technological innovations that have each spawned a distinct category of systems for data analytics.
Each system category is described along several dimensions, including data model and query interface, storage layer, execution engine, query optimization, scheduling, resource management, and fault tolerance. The book concludes with a summary of present trends in large-scale data analytics.
This is an ideal reference for anyone with a research or professional interest in large-scale data analytics.