| Section | Page |
|---|---|
| Preface | p. ix |
| Introduction | p. 1 |
| HDFS | p. 7 |
| Goals and Motivation | p. 7 |
| Design | p. 8 |
| Daemons | p. 9 |
| Reading and Writing Data | p. 11 |
| The Read Path | p. 12 |
| The Write Path | p. 13 |
| Managing Filesystem Metadata | p. 14 |
| Namenode High Availability | p. 16 |
| Namenode Federation | p. 18 |
| Access and Integration | p. 20 |
| Command-Line Tools | p. 20 |
| FUSE | p. 23 |
| REST Support | p. 23 |
| MapReduce | p. 25 |
| The Stages of MapReduce | p. 26 |
| Introducing Hadoop MapReduce | p. 33 |
| Daemons | p. 34 |
| When It All Goes Wrong | p. 36 |
| YARN | p. 37 |
| Planning a Hadoop Cluster | p. 41 |
| Picking a Distribution and Version of Hadoop | p. 41 |
| Apache Hadoop | p. 41 |
| Cloudera's Distribution Including Apache Hadoop | p. 42 |
| Versions and Features | p. 42 |
| What Should I Use? | p. 44 |
| Hardware Selection | p. 45 |
| Master Hardware Selection | p. 46 |
| Worker Hardware Selection | p. 48 |
| Cluster Sizing | p. 50 |
| Blades, SANs, and Virtualization | p. 52 |
| Operating System Selection and Preparation | p. 54 |
| Deployment Layout | p. 54 |
| Software | p. 56 |
| Hostnames, DNS, and Identification | p. 57 |
| Users, Groups, and Privileges | p. 60 |
| Kernel Tuning | p. 62 |
| vm.swappiness | p. 62 |
| vm.overcommit_memory | p. 62 |
| Disk Configuration | p. 63 |
| Choosing a Filesystem | p. 64 |
| Mount Options | p. 66 |
| Network Design | p. 66 |
| Network Usage in Hadoop: A Review | p. 67 |
| 1 Gb versus 10 Gb Networks | p. 69 |
| Typical Network Topologies | p. 69 |
| Installation and Configuration | p. 75 |
| Installing Hadoop | p. 75 |
| Apache Hadoop | p. 76 |
| CDH | p. 80 |
| Configuration: An Overview | p. 84 |
| The Hadoop XML Configuration Files | p. 87 |
| Environment Variables and Shell Scripts | p. 88 |
| Logging Configuration | p. 90 |
| HDFS | p. 93 |
| Identification and Location | p. 93 |
| Optimization and Tuning | p. 95 |
| Formatting the Namenode | p. 99 |
| Creating a /tmp Directory | p. 100 |
| Namenode High Availability | p. 100 |
| Fencing Options | p. 102 |
| Basic Configuration | p. 104 |
| Automatic Failover Configuration | p. 105 |
| Format and Bootstrap the Namenodes | p. 108 |
| Namenode Federation | p. 113 |
| MapReduce | p. 120 |
| Identification and Location | p. 120 |
| Optimization and Tuning | p. 122 |
| Rack Topology | p. 130 |
| Security | p. 133 |
| Identity, Authentication, and Authorization | p. 135 |
| Identity | p. 137 |
| Kerberos and Hadoop | p. 137 |
| Kerberos: A Refresher | p. 138 |
| Kerberos Support in Hadoop | p. 140 |
| Authorization | p. 153 |
| HDFS | p. 153 |
| MapReduce | p. 155 |
| Other Tools and Systems | p. 159 |
| Tying It Together | p. 164 |
| Resource Management | p. 167 |
| What Is Resource Management? | p. 167 |
| HDFS Quotas | p. 168 |
| MapReduce Schedulers | p. 170 |
| The FIFO Scheduler | p. 171 |
| The Fair Scheduler | p. 173 |
| The Capacity Scheduler | p. 185 |
| The Future | p. 193 |
| Cluster Maintenance | p. 195 |
| Managing Hadoop Processes | p. 195 |
| Starting and Stopping Processes with Init Scripts | p. 195 |
| Starting and Stopping Processes Manually | p. 196 |
| HDFS Maintenance Tasks | p. 196 |
| Adding a Datanode | p. 196 |
| Decommissioning a Datanode | p. 197 |
| Checking Filesystem Integrity with fsck | p. 198 |
| Balancing HDFS Block Data | p. 202 |
| Dealing with a Failed Disk | p. 204 |
| MapReduce Maintenance Tasks | p. 205 |
| Adding a Tasktracker | p. 205 |
| Decommissioning a Tasktracker | p. 206 |
| Killing a MapReduce Job | p. 206 |
| Killing a MapReduce Task | p. 207 |
| Dealing with a Blacklisted Tasktracker | p. 207 |
| Troubleshooting | p. 209 |
| Differential Diagnosis Applied to Systems | p. 209 |
| Common Failures and Problems | p. 211 |
| Humans (You) | p. 211 |
| Misconfiguration | p. 212 |
| Hardware Failure | p. 213 |
| Resource Exhaustion | p. 213 |
| Host Identification and Naming | p. 214 |
| Network Partitions | p. 214 |
| "Is the Computer Plugged In?" | p. 215 |
| E-SPORE | p. 215 |
| Treatment and Care | p. 217 |
| War Stories | p. 220 |
| A Mystery Bottleneck | p. 221 |
| There's No Place Like 127.0.0.1 | p. 224 |
| Monitoring | p. 229 |
| An Overview | p. 229 |
| Hadoop Metrics | p. 230 |
| Apache Hadoop 0.20.0 and CDH3 (metrics1) | p. 231 |
| Apache Hadoop 0.20.203 and Later, and CDH4 (metrics2) | p. 237 |
| What about SNMP? | p. 239 |
| Health Monitoring | p. 239 |
| Host-Level Checks | p. 240 |
| All Hadoop Processes | p. 242 |
| HDFS Checks | p. 244 |
| MapReduce Checks | p. 246 |
| Backup and Recovery | p. 249 |
| Data Backup | p. 249 |
| Distributed Copy (distcp) | p. 250 |
| Parallel Data Ingestion | p. 252 |
| Namenode Metadata | p. 254 |
| Appendix: Deprecated Configuration Properties | p. 257 |
| Index | p. 267 |
| Table of Contents provided by Ingram. All Rights Reserved. |