In this blog series we’ll discuss the performance of Hadoop NodeGroup for both hardware and virtualized clusters. We, at Altiscale, performed the work we’ll describe as a precursor to our launch of a new initiative that will employ NodeGroup to increase the performance, scalability, and customizability of container-aware Hadoop. This blog, Part 1.1 of the series, introduces and discusses the configuration of rack-aware replica placement with the NodeGroup implementation, and evaluates the performance of the DFSIO benchmark when NodeGroup is enabled.
Part 1.2 of this blog series will discuss the steps we took to investigate, analyze, and resolve the performance problem we encountered once we began using NodeGroup in production. Please note: these steps can also be used as a general guideline for resolving other types of performance bottlenecks in Hadoop.
Using Rack-Aware Replica Placement with Containerized Systems
At Altiscale, we’re constantly working to improve Hadoop performance; to that end, we’ve employed rack-aware replica placement, a feature that improves data reliability, availability, and network bandwidth utilization. With rack-aware replica placement, each DataNode determines to which rack it belongs at startup and notifies the NameNode of its rack ID upon registration. As a result, NameNode and ResourceManager know to which rack each slave node belongs.
As we’ve previously discussed, Altiscale advocates using containerized systems created and managed via Docker to adequately isolate applications coexisting on platforms such as MapReduce and YARN. Therefore, we needed to ensure that rack-aware replica placement would work efficiently with containers on the same nodes.
Rack-aware replica placement assumes a three-layer network topology: Datacenter/Rack/Node. For virtualized Hadoop, VMware proposes an additional “hypervisor” layer, yielding a four-layer network topology: Datacenter/Rack/NodeGroup/Host.
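Concretely, the difference shows up in each DataNode’s network location string (the datacenter, rack, and nodegroup names below are purely illustrative):

```
Three-layer:  /dc1/rack1/host1              (host sits directly under its rack)
Four-layer:   /dc1/rack1/nodegroup1/host1   (a NodeGroup level sits between rack and host)
```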
In order to configure rack awareness on a Hadoop cluster, two modifications are necessary:
1. Create a rack topology script: This can be as simple as a shell script that determines the rack of each node. Hadoop uses this information to place block replicas across racks for redundancy.
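As a minimal sketch of such a script: Hadoop invokes it with one or more DataNode IPs or hostnames as arguments and expects one rack path per line on stdout. Deriving the rack from the third octet of the IP, as done here, is purely illustrative; real deployments typically look each node up in a site-maintained mapping file.

```shell
#!/bin/bash
# rack-topology.sh -- hypothetical sketch of a Hadoop rack topology script.
# For each argument (a DataNode IP or hostname), print its rack path.
resolve_rack() {
  for node in "$@"; do
    # Illustrative convention only: rack = third octet of the IP address.
    octet=$(echo "$node" | cut -d. -f3)
    case "$octet" in
      ''|*[!0-9]*) echo "/default-rack" ;;  # hostname or unparseable input
      *)           echo "/rack-$octet" ;;
    esac
  done
}
resolve_rack "$@"
```

For example, `rack-topology.sh 10.1.3.17` would print `/rack-3`, while an unrecognized hostname falls back to `/default-rack` (Hadoop’s conventional catch-all rack name).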
2. Add the topology script property to core-site.xml:
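A sketch of the property, assuming the script from step 1 is installed at `/etc/hadoop/conf/rack-topology.sh` (the path is an assumption; use your install location). The property name below is the Hadoop 2.x name; older 1.x releases used `topology.script.file.name`:

```xml
<!-- core-site.xml: point Hadoop at the rack topology script -->
<property>
  <name>net.topology.script.file.name</name>
  <value>/etc/hadoop/conf/rack-topology.sh</value>
</property>
```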
However, for VMware’s NodeGroup configuration, both HDFS’s block placement policy and its network topology implementation must be replaced. Therefore, in addition to the steps above (with the topology script modified to be NodeGroup aware, i.e., to return a NodeGroup path in addition to a rack), two additional changes are needed:
1. Add the following to hdfs-site.xml:
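A sketch of the missing snippet; the property name and class follow the NodeGroup-aware HDFS extension (HADOOP-8468), so verify them against your Hadoop version:

```xml
<!-- hdfs-site.xml: replace the default block placement policy with the
     NodeGroup-aware implementation -->
<property>
  <name>dfs.block.replicator.classname</name>
  <value>org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyWithNodeGroup</value>
</property>
```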
2. Add the following to core-site.xml:
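A sketch of the missing snippet; again, the property names follow the NodeGroup extension (HADOOP-8468) and should be verified against your Hadoop version:

```xml
<!-- core-site.xml: switch the network topology implementation to the
     four-layer, NodeGroup-aware variant -->
<property>
  <name>net.topology.impl</name>
  <value>org.apache.hadoop.net.NetworkTopologyWithNodeGroup</value>
</property>
<property>
  <name>net.topology.nodegroup.aware</name>
  <value>true</value>
</property>
```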
Evaluating Hadoop Cluster Performance with NodeGroup
Once we set these parameters, we ran experiments designed to determine how these changes affected the performance of our Hadoop clusters. We hoped to find no effect on our hardware clusters and improved performance on our containerized clusters. However, this was not the case—even for the hardware clusters! As the following plot shows, a 10-iteration run of the DFSIO benchmark took much longer with NodeGroup enabled than with it disabled:
The blue lines in the plot indicate that the processes were idle on all of the DataNodes. A comparison of the plots shows that the idle time at the beginning of each iteration of DFSIO was much higher when we enabled NodeGroup. This caused us to look at the output of the MapReduce jobs. Figure 3 shows the difference:
As can be clearly seen in Figure 3, once NodeGroup was enabled, 3 minutes and 41 seconds elapsed at the start of the map phase (between “map 0% reduce 0%” and “map 1% reduce 0%”). This was the primary cause of the execution time difference between the two configurations.
With this discovery, we’ve defined the performance problem we encountered once we began using NodeGroup in production: a performance degradation on both our hardware and virtualized clusters, occurring primarily at the start of each job. In the next part (Part 1.2) of this blog series we’ll discuss how we further analyzed and resolved this problem.