
How to Identify and Resolve Hadoop NodeGroup Performance Problems Part 1.1: Performance on Hardware Clusters with No Virtualization

By | March 1st, 2016 | Analytics, Big Data, Hadoop

In this blog series we’ll discuss the performance of Hadoop NodeGroup for both hardware and virtualized clusters. At Altiscale, we performed the work we’ll describe as a precursor to our launch of a new initiative that will employ NodeGroup to increase the performance, scalability, and customizability of container-aware Hadoop. This blog, Part 1.1 of the series, introduces and discusses the configuration of rack-aware replica placement with the NodeGroup implementation, and evaluates the performance of the DFSIO benchmark when NodeGroup is enabled.

Part 1.2 of this blog series will discuss the steps we took to investigate, analyze, and resolve the performance problem we encountered once we began using NodeGroup in production. Please note: these steps can also be used as a general guideline for resolving other types of performance bottlenecks in Hadoop.

Using Rack-Aware Replica Placement with Containerized Systems

At Altiscale, we’re constantly working to improve Hadoop performance; to that end, we’ve employed rack-aware replica placement, a feature that improves data reliability, availability, and network bandwidth utilization. With rack-aware replica placement, each DataNode determines to which rack it belongs at startup and notifies the NameNode of its rack ID upon registration. As a result, NameNode and ResourceManager know to which rack each slave node belongs.

As we’ve previously discussed, Altiscale advocates using containerized systems created and managed via Docker to adequately isolate applications coexisting on platforms such as MapReduce and YARN. Therefore, we needed to ensure that rack-aware replica placement would work efficiently with containers on the same nodes.

Rack-aware replica placement uses a three-layer network topology: Datacenter/Rack/Node. For virtualized Hadoop, VMware proposes an additional, “hypervisor” layer, resulting in a four-layer network topology: Datacenter/Rack/NodeGroup/Host.

In order to configure rack awareness on a Hadoop cluster, two modifications are necessary:

1. Create a rack topology script: This can be as simple as a shell script that maps each node to its rack. Hadoop uses this information to place block replicas on separate racks for redundancy.

2. Add the topology script property to core-site.xml:

<property>
  <name>net.topology.script.file.name</name>
  <value>/etc/hadoop/conf/rack-topology.sh</value>
</property>
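Such a script simply maps each hostname or IP address that Hadoop passes as an argument to a rack path. Here is a minimal sketch, assuming a hypothetical naming scheme in which hostnames encode the rack (the `dn-r1-`/`dn-r2-` prefixes are illustrative, not the actual scheme used on any particular cluster):

```shell
#!/bin/bash
# Hypothetical rack-topology.sh: maps each host Hadoop passes as an
# argument to a rack path. The hostname prefixes below are an
# illustrative assumption about the cluster's naming convention.
resolve_rack() {
  case "$1" in
    dn-r1-*) echo "/rack1" ;;
    dn-r2-*) echo "/rack2" ;;
    *)       echo "/default-rack" ;;   # fallback for unknown hosts
  esac
}

# Hadoop may pass several hosts at once; print one rack path per argument.
for host in "$@"; do
  resolve_rack "$host"
done
```

The only contract the script must honor is printing one rack path per input argument, in order; how it derives the rack (hostname parsing, a lookup table, a DNS query) is up to the operator.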

However, VMware’s NodeGroup configuration requires modifying both HDFS’s block placement policy and its network topology implementation. Therefore, in addition to the steps above (with the topology script modified to be NodeGroup aware), two additional properties are needed:

1. Add the following to hdfs-site.xml:

<property>
  <name>dfs.block.replicator.classname</name>
  <value>org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyWithNodeGroup</value>
</property>

2. Add the following to core-site.xml:

a.

<property>
  <name>net.topology.impl</name>
  <value>org.apache.hadoop.net.NetworkTopologyWithNodeGroup</value>
</property>

b.

<property>
  <name>net.topology.script.file.name</name>
  <value>/usr/bin/hdp_resolve_nodegrps</value>
</property>
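With NetworkTopologyWithNodeGroup enabled, the topology script must return a two-level path (/rack/nodegroup) for each host rather than a bare rack. The sketch below is a hypothetical stand-in for hdp_resolve_nodegrps, whose actual contents are not shown here; the host-to-nodegroup mapping is purely illustrative:

```shell
#!/bin/bash
# Hypothetical NodeGroup-aware topology script. With
# NetworkTopologyWithNodeGroup, each host must resolve to a
# /rack/nodegroup path, not just /rack. The mapping below is an
# illustrative assumption, not the contents of hdp_resolve_nodegrps.
resolve_nodegroup() {
  case "$1" in
    dn-r1-host1|dn-r1-host2) echo "/rack1/nodegroup1" ;;
    dn-r1-host3|dn-r1-host4) echo "/rack1/nodegroup2" ;;
    *)                       echo "/default-rack/default-nodegroup" ;;
  esac
}

# One path per argument, in order, as Hadoop's topology contract requires.
for host in "$@"; do
  resolve_nodegroup "$host"
done
```

After restarting the cluster with these settings, `hdfs dfsadmin -printTopology` can be used to confirm that each DataNode registered under the expected rack/nodegroup path.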

Evaluating Hadoop Cluster Performance with NodeGroup

Once we set these parameters, we ran experiments designed to determine how these changes affected the performance of our Hadoop clusters. We expected to see no effect on our hardware clusters and improved performance on our containerized clusters. However, this was not the case, even for the hardware clusters! As the following plots show, a 10-iteration run of the DFSIO benchmark took much longer with NodeGroup enabled than with it disabled:


Fig 1. DFSIO performance with NodeGroup enabled on a non-containerized (hardware) cluster

 


Fig 2. DFSIO performance with NodeGroup disabled on a non-containerized (hardware) cluster

 

The blue lines in the plots indicate periods when the processes were idle on all of the DataNodes. A comparison of the plots shows that the idle time at the beginning of each DFSIO iteration was much higher when NodeGroup was enabled. This prompted us to look at the output of the MapReduce jobs. Figure 3 shows the difference:


Fig 3. Output of DFSIO MapReduce job with NodeGroup enabled (left) vs. with NodeGroup disabled (right)

As Figure 3 clearly shows, with NodeGroup enabled, an extra 3 minutes and 41 seconds elapsed at the start of the map phase (between “map 0% reduce 0%” and “map 1% reduce 0%”). This delay was the primary cause of the execution-time difference between the two configurations.

With this discovery, we’ve defined the performance problem we encountered once we began using NodeGroup in production: a performance degradation on both our hardware and virtualized clusters, occurring primarily at the start of each job. In the next part (Part 1.2) of this blog series we’ll discuss how we further analyzed and resolved this problem.