Question-51: In a CDP Private Cloud Base cluster, which of the following statements is wrong with respect to CPU cores and multithreading?

  1. CPU clock speed does not matter, since performance is a function of drive or network performance.
  2. CPU clock speed does matter, and you should try to purchase the fastest CPUs available.
  3. Computationally intensive Spark jobs benefit more from faster CPUs than I/O bound MapReduce applications.
  4. Within a given MapReduce job, a single task typically uses one thread at a time.

Answer: 1

Exp:

Cluster Bottleneck – Other than cost, there is no downside to buying more and better CPUs; however, the ROI on additional CPU power must be evaluated carefully. In general, CPU resources (or the lack thereof) do not bottleneck MapReduce and HBase; the bottleneck is almost always drive and/or network performance. There are certainly exceptions to this, such as inefficient Hive queries. Other compute frameworks like Impala, Spark, and Cloudera Search may be CPU-bound depending on the workload.

Additional Cores/Threads – Within a given MapReduce job, a single task typically uses one thread at a time. With Spark this is different: a single task might use multiple threads in parallel. As outlined earlier, the number of slots allocated per node may be a function of the number of drives in the node. As long as there is no huge disparity between the number of cores (threads) and the number of drives, there is no need for additional cores. In addition, a MapReduce task is I/O bound for typical jobs, so a given thread used by the task will have a large amount of idle time while waiting for an I/O response.
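The contrast above can be made concrete in Spark configuration. The sketch below is a minimal PySpark example, not a tuning recommendation: the property names spark.executor.cores and spark.task.cpus are standard Spark settings, while the specific values (4 cores per executor, 2 cores per task) and the app name are purely illustrative.

```python
# Minimal sketch: unlike a MapReduce task, which typically uses one thread at a
# time, a Spark task can reserve several cores so that multi-threaded code
# inside the task can run in parallel. Values below are illustrative only.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("multithreaded-task-sketch")
    # Each executor is granted 4 cores by the resource manager...
    .config("spark.executor.cores", "4")
    # ...and each task reserves 2 of them, so at most 4 / 2 = 2 tasks run
    # concurrently per executor, each free to use 2 threads internally.
    .config("spark.task.cpus", "2")
    .getOrCreate()
)

# A trivial job just to exercise the scheduler with these settings.
rdd = spark.sparkContext.parallelize(range(1_000_000), numSlices=8)
print(rdd.map(lambda x: x * x).sum())

spark.stop()
```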

Clock Speed – Because Cloudera clusters often begin with a small number of use cases and associated workloads and grow over time, it makes sense to purchase the fastest CPUs available. Actual CPU usage is use case and workload dependent; for instance, computationally intensive Spark jobs benefit more from faster CPUs than I/O-bound MapReduce applications.
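As an illustration of a computationally intensive, CPU-bound Spark job, the sketch below estimates pi with a Monte Carlo simulation. It is a generic example rather than part of the exam material; the sample and partition counts are arbitrary. Because the work is almost pure arithmetic with no heavy disk or shuffle I/O, its runtime tracks CPU clock speed much more closely than a typical I/O-bound MapReduce job would.

```python
# Minimal sketch of a CPU-bound Spark job: Monte Carlo estimation of pi.
import random
from operator import add

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("cpu-bound-sketch").getOrCreate()

SAMPLES = 10_000_000   # illustrative sample count
PARTITIONS = 16        # illustrative parallelism


def inside_unit_circle(_):
    # Draw a random point in the unit square and test whether it falls
    # inside the quarter circle of radius 1.
    x, y = random.random(), random.random()
    return 1 if x * x + y * y <= 1.0 else 0


count = (
    spark.sparkContext
    .parallelize(range(SAMPLES), PARTITIONS)
    .map(inside_unit_circle)
    .reduce(add)
)
print(f"pi is roughly {4.0 * count / SAMPLES}")

spark.stop()
```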
