Mastering Parallel Programming with R

What this book covers

Chapter 1, Simple Parallelism with R, starts our journey by quickly showing you how to exploit the multicore processing capability of your own laptop using core R's parallelized versions of lapply(). We also briefly reach out and touch the immense computing capacity of the cloud through Amazon Web Services.

Chapter 2, Introduction to Message Passing, covers the standard Message Passing Interface (MPI), a key technology for implementing advanced parallel algorithms. In this chapter, you will learn how to use two different R MPI packages, Rmpi and pbdMPI, together with the OpenMPI implementation of the underlying communications subsystem.

Chapter 3, Advanced Message Passing, will complete our tour of MPI by developing a detailed Rmpi worked example, illustrating the use of nonblocking communications and localized patterns of interprocess message exchange, both of which are required to implement spatial Grid parallelism.

Chapter 4, Developing SPRINT, an MPI-based R Package for Supercomputers, introduces you to the experience of running parallel code on a real supercomputer. This chapter also provides a detailed exposition of developing SPRINT, an R package written in C for parallel computation that can run on laptops as well as supercomputers. We'll also show you how you can extend this package with your own natively coded, high-performance parallel algorithms and make them accessible to R.

Chapter 5, The Supercomputer in Your Laptop, will show how to unlock the massive parallel and vector processing capability of the Graphics Processing Unit (GPU) inside your very own laptop directly from R using the ROpenCL package, an R wrapper for the Open Computing Language (OpenCL).

Chapter 6, The Art of Parallel Programming, concludes this book by presenting the basic science behind parallel programming and its performance, teaching the art of best practice by highlighting a number of potential pitfalls you'll want to avoid, and taking a glimpse into the future of parallel computing systems.

Online Chapter, Apache Spa-R-k, is an introduction to Apache Spark, which now succeeds Hadoop as the most popular distributed-memory big data parallel computing environment. You will learn how to set up and install a Spark cluster and how to utilize Spark's own DataFrame abstraction directly from R. This chapter can be downloaded from Packt's website at https://www.packtpub.com/sites/default/files/downloads/B03974_BonusChapter.pdf

You don't need to read this book in order from beginning to end, although doing so is the easiest route, because concepts are introduced progressively and the technical depth of the programming increases from chapter to chapter. For the most part, each chapter has been written to be understandable when read on its own.