Question-58: Select the correct statements regarding Cloudera Private Cloud Base and its various components.

  1. ZooKeeper is sensitive to disk latency.
  2. The NameNode memory should be increased over time as HDFS has more files and blocks stored.
  3. The block size can also be specified by an HDFS client on a per-file basis.
  4. Erasure Coding (EC) is an alternative to the 3x replication scheme.
  5. Hadoop optimizes performance and redundancy when rack awareness is configured for clusters that span across multiple racks.
  6. When setting up a multi-rack environment, place each master node on a different rack.

Answer: All six statements are correct.

Explanation:

ZooKeeper: ZooKeeper is sensitive to disk latency. While it uses only a modest amount of resources, having ZooKeeper swap out or wait on a disk operation can result in that ZooKeeper node being considered ‘dead’ by its quorum peers. For this reason, Cloudera recommends against deploying ZooKeeper on worker nodes, where loads are unpredictable and prone to spikes. It is acceptable to deploy ZooKeeper on master nodes, where load is more uniform and predictable (or on any node where it has unobstructed access to disk).

HDFS

  • Java Heap Sizes: NameNode heap usage grows with the number of files and blocks stored in HDFS, so the NameNode memory should be increased over time as the filesystem grows.
  • NameNode Metadata Locations: When a quorum-based high availability HDFS configuration is used, JournalNodes handle the storage of metadata writes. The NameNode daemons require a local location to store metadata. Cloudera recommends that only a single directory be used if the underlying disks are configured as RAID, or two directories on different disks if the disks are mounted as JBOD.
  • Block Size: HDFS stores files in blocks that are distributed over the cluster. A block is typically stored contiguously on disk to provide high read throughput. The choice of block size influences how long these high-throughput reads run, and over how many nodes a file is distributed. When reading the many blocks of a single file, too small a block size spends more overall time in slow disk seeks, while too large a block size reduces parallelism. Data processing that is I/O-heavy benefits from larger block sizes, and data processing that is CPU-heavy benefits from smaller block sizes. The default provided by Cloudera Manager is 128 MB. The block size can also be specified by an HDFS client on a per-file basis.
  • Replication: Bottlenecks can occur on a small number of nodes when only small subsets of files on HDFS are being heavily accessed. Increasing the replication factor of the files so that their blocks are replicated over more nodes can alleviate this. This is done at the expense of storage capacity on the cluster. This can be set on individual files, or recursively on directories with the -R parameter, by using the Hadoop shell command hadoop fs -setrep. By default, the replication factor is 3.
  • Erasure Coding: Erasure Coding (EC) is an alternative to the 3x replication scheme. It is important that edge nodes and client gateways have codec support so that they can do the calculations. Erasure Coding levies additional demands on the number of nodes or racks required to achieve fault tolerance. Erasure Coding will observe rack topology, but the resulting block placement policy (BPP) differs from replication. With EC, the BPP tries to place all blocks as evenly on all racks as possible. Cloudera recommends that racks have a consistent number of nodes. Racks with fewer DataNodes are busier and fill faster than racks with more DataNodes.
  • Rack Awareness: Hadoop optimizes performance and redundancy when rack awareness is configured for clusters that span across multiple racks, and Cloudera recommends doing so. Rack assignments for nodes can be configured within the Cloudera Manager. When setting up a multi-rack environment, place each master node on a different rack. In the event of a rack failure, the cluster continues to operate using the remaining master(s).
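The storage arithmetic behind the block-size, replication, and erasure-coding points above can be sketched as a small calculation. This is a minimal illustration, not Hadoop code: the helper names are made up, the 128 MB default block size and 3x default replication come from the text, and RS-6-3 (six data units, three parity units) is used as the example erasure-coding policy.

```python
# Sketch: how many blocks a file occupies, and its raw disk footprint
# under 3x replication vs. Reed-Solomon RS-6-3 erasure coding.
# Helper names are illustrative only; they are not part of any Hadoop API.

def blocks_needed(file_bytes, block_bytes=128 * 1024 * 1024):
    """Number of HDFS blocks a file occupies (last block may be partial)."""
    return -(-file_bytes // block_bytes)  # ceiling division

def raw_bytes_replicated(file_bytes, replication=3):
    """Raw disk consumed under n-way replication."""
    return file_bytes * replication

def raw_bytes_ec(file_bytes, data_units=6, parity_units=3):
    """Approximate raw disk consumed under RS(data, parity) erasure coding."""
    return file_bytes * (data_units + parity_units) // data_units

one_gib = 1024 ** 3
print(blocks_needed(one_gib))                    # 8 blocks at the 128 MB default
print(raw_bytes_replicated(one_gib) // one_gib)  # 3 GiB raw per 1 GiB logical
print(raw_bytes_ec(one_gib) / one_gib)           # 1.5x overhead with RS-6-3
```

The comparison shows why EC is attractive as an alternative to 3x replication: the same fault tolerance class at half the raw storage, at the cost of codec computation on clients and stricter node/rack-count requirements.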
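Cloudera Manager handles rack assignments for you, but the underlying Hadoop mechanism is a topology script (configured via net.topology.script.file.name) that maps each host or IP to a rack path. A minimal sketch of such a script follows; the host-to-rack table is entirely made up for illustration, and /default-rack is Hadoop's conventional fallback for unknown hosts.

```python
# Sketch of a Hadoop rack-topology script. Hadoop invokes the script with
# one or more hostnames/IPs as arguments and reads one rack path per host
# from stdout. The address-to-rack mapping below is hypothetical.
import sys

RACK_BY_HOST = {
    "10.0.1.11": "/rack1",
    "10.0.1.12": "/rack1",
    "10.0.2.11": "/rack2",
    "10.0.2.12": "/rack2",
}

DEFAULT_RACK = "/default-rack"  # conventional fallback for unmapped hosts

def rack_of(host):
    """Resolve a host/IP to its rack path."""
    return RACK_BY_HOST.get(host, DEFAULT_RACK)

if __name__ == "__main__":
    # Emit one rack path per requested host, space-separated.
    print(" ".join(rack_of(h) for h in sys.argv[1:]))
```

With masters spread across racks as the text recommends, a mapping like this is what lets HDFS place replicas on distinct racks and keeps the cluster operating through a rack failure.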
