Last updated: 2021-08-06 19:53:18
Cover page
Mastering Hadoop
Credits
About the Author
Acknowledgments
About the Reviewers
www.PacktPub.com
Support files, eBooks, discount offers, and more
Preface
What this book covers
What you need for this book
Who this book is for
Conventions
Reader feedback
Customer support
Chapter 1. Hadoop 2.X
The inception of Hadoop
The evolution of Hadoop
Hadoop 2.X
Hadoop distributions
Summary
Chapter 2. Advanced MapReduce
MapReduce input
The RecordReader class
Hadoop's "small files" problem
Filtering inputs
The Map task
The Reduce task
MapReduce output
MapReduce job counters
Handling data joins
Chapter 3. Advanced Pig
Pig versus SQL
Different modes of execution
Complex data types in Pig
Compiling Pig scripts
Development and debugging aids
The advanced Pig operators
User-defined functions
Pig performance optimizations
Best practices
Chapter 4. Advanced Hive
The Hive architecture
Data types
File formats
The data model
Hive query optimizers
Advanced DML
UDF, UDAF, and UDTF
Chapter 5. Serialization and Hadoop I/O
Data serialization in Hadoop
Avro serialization
Compression
Chapter 6. YARN – Bringing Other Paradigms to Hadoop
The YARN architecture
Developing YARN applications
Monitoring YARN
Job scheduling in YARN
YARN commands
Chapter 7. Storm on YARN – Low Latency Processing in Hadoop
Batch processing versus streaming
Apache Storm
Storm on YARN
Chapter 8. Hadoop on the Cloud
Cloud computing characteristics
Hadoop on the cloud
Amazon Elastic MapReduce (EMR)
Chapter 9. HDFS Replacements
HDFS – advantages and drawbacks
Amazon AWS S3
Implementing a filesystem in Hadoop
Implementing an S3 native filesystem in Hadoop
Chapter 10. HDFS Federation
Limitations of the older HDFS architecture
Architecture of HDFS Federation
HDFS high availability
HDFS block placement
Chapter 11. Hadoop Security
The security pillars
Authentication in Hadoop
Authorization in Hadoop
Data confidentiality in Hadoop
Audit logging in Hadoop
Chapter 12. Analytics Using Hadoop
Data analytics workflow
Machine learning
Apache Mahout
Document analysis using Hadoop and Mahout
RHadoop