Question-50: Select the correct statements about CPU, disk, and network bottlenecks in a CDP Private Cloud Base cluster.

  1. In general, CPU resources are not a bottleneck for MapReduce and HBase.
  2. In most cases, the performance bottleneck is drive and network.
  3. In most cases, the performance bottleneck is CPU.
  4. Inefficient Hive queries can cause a performance bottleneck.

Answer: 1, 2, 4

Exp: Other than cost, there is no downside to buying more and better CPUs. However, the ROI on additional CPU power must be evaluated carefully. In general, CPU resources (or a lack of them) do not bottleneck MapReduce and HBase. The bottleneck is almost always drive and/or network performance. There are exceptions, such as inefficient Hive queries. Other compute frameworks like Impala, Spark, and Cloudera Search may be CPU-bound depending on the workload.

Within a given MapReduce job, a single task typically uses one thread at a time. Spark is different: a single task might use multiple threads in parallel. As outlined earlier, the number of slots allocated per node may be a function of the number of drives in the node. As long as there is no huge disparity between the number of cores (threads) and the number of drives, there is no need for additional cores. In addition, a MapReduce task in a typical job is I/O bound, so the thread it uses spends a large amount of time idle while waiting for an I/O response.
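
The reasoning above can be illustrated with a rough back-of-the-envelope calculation. The sketch below is only an illustration, not a Cloudera sizing formula: the function names, the one-slot-per-drive rule, and the 70% I/O wait figure are all assumptions chosen for the example.

```python
def busy_cores(task_slots: int, io_wait_fraction: float) -> float:
    """Average number of cores kept busy by single-threaded tasks.

    Assumption: each task runs one thread, and that thread sits idle
    for io_wait_fraction of the time while waiting on drive or network I/O.
    """
    if not 0.0 <= io_wait_fraction <= 1.0:
        raise ValueError("io_wait_fraction must be between 0 and 1")
    return task_slots * (1.0 - io_wait_fraction)


def is_cpu_bottleneck(task_slots: int, io_wait_fraction: float, cores: int) -> bool:
    """True when average CPU demand exceeds the cores available on the node."""
    return busy_cores(task_slots, io_wait_fraction) > cores


# Hypothetical node: 12 drives, one task slot per drive, 16 cores,
# and tasks that wait on I/O 70% of the time (made-up figure).
slots = 12
demand = busy_cores(slots, 0.70)
print(f"Average cores busy: {demand:.1f} of 16")
print("CPU bottleneck:", is_cpu_bottleneck(slots, 0.70, 16))
```

With these assumed numbers, the tasks keep only a few cores busy on average, which is why extra cores add little for I/O-bound MapReduce work. A multi-threaded, compute-heavy workload (for example, some Spark jobs) would have a much lower I/O wait fraction and could reverse the result.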

