Question-43: As you are aware, while MapReduce jobs run, data is sent to the reducers. Which of the following needs to be adjusted according to the volume of data sent to the reducers?

  1. Disk space under the /lib directory
  2. Disk space under the /usr directory
  3. Raw disk space for temporary storage
  4. Raw disk space for YARN mandatory storage

Answer: 3

Exp: Each worker node typically has several physical disks dedicated to raw storage for Hadoop, and the number of disks is used to calculate the total available storage for the cluster. Cloudera's storage calculations assume that 10% of disk space is allocated for YARN temporary storage, and Cloudera recommends allocating 10-25% of the raw disk space for temporary storage as a general guideline. This percentage can be changed in Cloudera Manager and should be adjusted after analyzing production workloads. For example, if MapReduce jobs send less data to reducers, the percentage can be adjusted down considerably. Compressing raw data can also effectively increase HDFS storage capacity. Finally, while Cloudera Manager provides tools such as Static Resource Pools, which use Linux cgroups, to let multiple components share hardware, in high-volume production clusters it can be beneficial to dedicate hosts to roles such as Solr, HBase, and Kafka.
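The sizing arithmetic in the explanation can be sketched as follows. This is a minimal illustration, not Cloudera tooling: the function name `storage_split` and the cluster figures (10 worker nodes, 12 disks of 4 TB each) are hypothetical, and the sketch only splits raw space into a temporary-storage reserve and the remainder. It does not model compression or any other overhead.

```python
# Minimal sketch of splitting raw disk space into a temporary-storage reserve.
# storage_split and all node/disk figures below are hypothetical illustrations.

def storage_split(nodes: int, disks_per_node: int, disk_tb: float,
                  temp_fraction: float = 0.10) -> dict:
    """Split raw cluster disk space into temporary storage and the remainder.

    temp_fraction: share of raw disk reserved for temporary storage.
    Cloudera's general guideline is 0.10-0.25, lowered when jobs send
    little data to reducers.
    """
    if not 0.0 <= temp_fraction <= 1.0:
        raise ValueError("temp_fraction must be between 0 and 1")
    raw_tb = nodes * disks_per_node * disk_tb   # total raw capacity
    temp_tb = raw_tb * temp_fraction            # reserved for temporary data
    return {"raw_tb": raw_tb, "temp_tb": temp_tb, "remaining_tb": raw_tb - temp_tb}


if __name__ == "__main__":
    # Hypothetical cluster: 10 worker nodes, each with 12 disks of 4 TB.
    for fraction in (0.10, 0.25):
        split = storage_split(10, 12, 4.0, fraction)
        print(f"{fraction:.0%} temp: {split}")
```

For this hypothetical cluster, moving from the 10% end of the guideline to the 25% end grows the temporary reserve from 48 TB to 120 TB out of 480 TB raw, which is why workloads that send little data to reducers can justify a lower fraction.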

