Question-41: Select the correct statements about the Cloudera CDP Private Cloud Base network setup.

  1. Cloudera recommends that Hadoop be placed in a separate physical network with its own core switches.
  2. Cloudera Hadoop supports the concept of rack locality and takes advantage of the network topology to minimize network congestion.
  3. Having redundant core switches in a full mesh configuration allows the cluster to continue operating in the event of a core switch failure.
  4. Cloudera recommends allowing access to the Cloudera Enterprise cluster through edge nodes only.
  5. If you completely disconnect the cluster from the Internet, you block access to software updates, which makes maintenance difficult.

Answer: 1, 2, 3, 4, 5

Exp: The recommended deployment topology for Cloudera Private Cloud Base is as follows. Each host is connected to two top-of-rack (ToR) switches, which are in turn connected to a collection of spine switches, which are then connected to the enterprise network. This deployment model allows each host to achieve maximum throughput and minimal latency while encouraging scalability. The specifics of the network topology are described in the following sections.
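To make the leaf-spine shape concrete, the following Python sketch models it as a graph. The rack, host, and spine counts and all node names are hypothetical, and it assumes every ToR switch uplinks to every spine switch, which is the usual leaf-spine arrangement. It illustrates two properties behind the throughput and latency claims: each host has several distinct ToR-to-spine paths, and any two hosts in different racks are the same number of hops apart.

```python
from collections import deque
from itertools import product


def build_leaf_spine(racks, hosts_per_rack, spines):
    """Build an undirected adjacency map for a hypothetical leaf-spine fabric.

    Each host links to two ToR switches in its rack, every ToR links to every
    spine, and every spine links to the enterprise network.
    """
    graph = {}

    def link(a, b):
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)

    spine_names = [f"spine{s}" for s in range(spines)]
    for r in range(racks):
        tors = [f"rack{r}-tor{t}" for t in range(2)]
        for h in range(hosts_per_rack):
            for tor in tors:
                link(f"rack{r}-host{h}", tor)
        for tor, spine in product(tors, spine_names):
            link(tor, spine)
    for spine in spine_names:
        link(spine, "enterprise")
    return graph


def uplink_paths(graph, host):
    """Count distinct host -> ToR -> spine paths available to one host."""
    return sum(
        1
        for tor in graph[host]
        for nxt in graph[tor]
        if nxt.startswith("spine")
    )


def hops(graph, src, dst):
    """Shortest-path hop count between two nodes (breadth-first search)."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            return dist[node]
        for nxt in graph[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return None


if __name__ == "__main__":
    fabric = build_leaf_spine(racks=3, hosts_per_rack=4, spines=4)
    print("uplink paths per host:", uplink_paths(fabric, "rack0-host0"))  # 2 ToRs x 4 spines = 8
    print("cross-rack hops:", hops(fabric, "rack0-host0", "rack2-host3"))  # host-ToR-spine-ToR-host = 4
```

Adding spines raises the number of uplink paths per host without lengthening any cross-rack path, which is why this layout scales out cleanly.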

Network specifications for Cloudera Private Cloud Base are as follows:

  • Dedicated Network Hardware

Hadoop can consume all available network bandwidth. For this reason, Cloudera recommends that Hadoop be placed in a separate physical network with its own core switches.

  • Switch Per Rack

Hadoop supports the concept of rack locality and takes advantage of the network topology to minimize network congestion. Ideally, nodes in one rack should connect to a single physical switch. Two top-of-rack (ToR) switches can be used for high availability. Each rack switch (ToR switch) uplinks to a core switch with a significantly bigger backplane. Cloudera recommends 10 GbE (or faster) connections between the servers and ToR switches. ToR uplink bandwidth to the core switch (two switches in an HA configuration) will often be oversubscribed.
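Rack locality only works if Hadoop knows which rack each host is in. In Apache Hadoop this mapping can come from a topology script named by the `net.topology.script.file.name` property in `core-site.xml`: Hadoop runs the script with one or more host names or IP addresses as arguments and reads back one rack path per argument from standard output. In CDP, Cloudera Manager typically maintains this mapping from the rack assignments set on each host, so a hand-written script is rarely needed there. The sketch below only illustrates the script contract; the IP addresses, rack names, and `RACK_MAP` table are hypothetical.

```python
#!/usr/bin/env python3
"""Sketch of a Hadoop rack-topology script.

Hadoop passes host names or IP addresses as arguments and expects one
whitespace-separated rack path per argument on stdout.
"""
import sys

# Hypothetical host-to-rack mapping; a real deployment would load this from its own inventory.
RACK_MAP = {
    "10.1.1.11": "/dc1/rack1",
    "10.1.1.12": "/dc1/rack1",
    "10.1.2.11": "/dc1/rack2",
    "10.1.2.12": "/dc1/rack2",
}
DEFAULT_RACK = "/default-rack"  # returned for any host not in the map


def resolve_racks(hosts):
    """Map each host argument to its rack path, falling back to the default rack."""
    return [RACK_MAP.get(h, DEFAULT_RACK) for h in hosts]


if __name__ == "__main__":
    print(" ".join(resolve_racks(sys.argv[1:])))
```

For example, invoking the script as `topology.py 10.1.1.11 10.1.2.12` would print `/dc1/rack1 /dc1/rack2`, letting the scheduler prefer replicas and tasks in the same rack.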

  • Uplink Oversubscription

How much oversubscription is appropriate depends on the workload. Cloudera’s recommendation is that the ratio between the total access port bandwidth and the uplink bandwidth be as close to 1:1 as possible. This is especially important for heavy ETL workloads and for MapReduce jobs that send a lot of data to reducers. Oversubscription ratios up to 4:1 are generally fine for balanced workloads, but network monitoring is needed to ensure that uplink bandwidth is not the bottleneck for Hadoop. The following table provides some examples as a point of reference:
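The oversubscription ratio is the total bandwidth of the host-facing (access) ports in a rack divided by the rack's total uplink bandwidth to the core. The following Python sketch computes it for a few hypothetical rack designs; the host counts and link speeds are illustrative and are not the values from Cloudera's reference table. The `assess` thresholds are a simplification of the guidance above (at or below 1:1 for heavy ETL and shuffle-heavy jobs, up to 4:1 for balanced workloads).

```python
from fractions import Fraction


def oversubscription(hosts, host_gbps, uplinks, uplink_gbps):
    """Access-port bandwidth divided by uplink bandwidth for one rack (integer Gbps)."""
    return Fraction(hosts * host_gbps, uplinks * uplink_gbps)


def as_ratio(r):
    """Render a ratio such as Fraction(5, 2) as '2.5:1'."""
    return f"{float(r):g}:1"


def assess(r):
    # Simplified reading of the guidance: ~1:1 for heavy ETL, up to 4:1 for balanced workloads.
    if r <= 1:
        return "suitable for heavy ETL and shuffle-heavy jobs"
    if r <= 4:
        return "acceptable for balanced workloads; monitor the uplinks"
    return "uplinks are likely to bottleneck Hadoop traffic"


if __name__ == "__main__":
    # Hypothetical designs: (hosts per rack, host link Gbps, number of uplinks, uplink Gbps)
    designs = [(20, 10, 2, 100), (40, 25, 4, 100), (48, 25, 2, 100)]
    for d in designs:
        r = oversubscription(*d)
        print(d, as_ratio(r), "->", assess(r))
```

Using `Fraction` keeps the ratio exact, so a design that is exactly 1:1 is not misclassified by floating-point rounding.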

  • Redundant Network Switches

Having redundant core switches in a full mesh configuration allows the cluster to continue operating in the event of a core switch failure. Redundant ToR switches prevent the loss of an entire rack of processing and storage capacity in the event of a ToR switch failure. General cluster availability can still be maintained in the event of the loss of a rack, as long as master nodes are distributed across multiple racks.
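The effect of switch redundancy can be checked by removing one switch at a time from a model of the network and testing whether every host can still reach every other host. The Python sketch below assumes a hypothetical layout in which every ToR switch uplinks to every core switch and the core switches are interconnected, which is one reading of "full mesh"; the rack, host, and switch counts and all names are illustrative.

```python
from collections import deque


def build(racks, hosts_per_rack, tors_per_rack, cores):
    """Hypothetical network: hosts -> ToR switches -> fully meshed core switches."""
    graph = {}

    def link(a, b):
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)

    core_names = [f"core{c}" for c in range(cores)]
    for i, a in enumerate(core_names):
        for b in core_names[i + 1:]:
            link(a, b)  # core-to-core links
    for r in range(racks):
        tors = [f"tor-r{r}-{t}" for t in range(tors_per_rack)]
        for tor in tors:
            for core in core_names:
                link(tor, core)  # every ToR uplinks to every core switch
        for h in range(hosts_per_rack):
            for tor in tors:
                link(f"host-r{r}-{h}", tor)
    return graph


def all_hosts_connected(graph, failed):
    """True if all hosts remain mutually reachable with the `failed` switches removed."""
    hosts = [n for n in graph if n.startswith("host")]
    seen = {hosts[0]}
    queue = deque([hosts[0]])
    while queue:
        node = queue.popleft()
        for nxt in graph[node]:
            if nxt not in failed and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return all(h in seen for h in hosts)


def survives_any_single_switch_failure(graph):
    """Check every possible single-switch failure."""
    switches = [n for n in graph if not n.startswith("host")]
    return all(all_hosts_connected(graph, {s}) for s in switches)


if __name__ == "__main__":
    print("2 ToRs/rack, 2 cores:", survives_any_single_switch_failure(build(3, 4, 2, 2)))  # True
    print("1 ToR/rack, 2 cores: ", survives_any_single_switch_failure(build(3, 4, 1, 2)))  # False
    print("2 ToRs/rack, 1 core: ", survives_any_single_switch_failure(build(3, 4, 2, 1)))  # False
```

Only the design with redundancy at both layers survives every single failure. The model checks reachability only, not capacity or service placement.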

  • Accessibility

The accessibility of your Cloudera Enterprise cluster is defined by the network configuration and depends on the security requirements and the workload. Typically, there are edge/client nodes that have direct access to the cluster. Users go through these edge nodes, using client applications, to interact with the cluster and the data residing there. These edge nodes could be running a web application for real-time serving workloads, BI tools, or simply the Hadoop command-line client used to submit jobs or interact with HDFS.

Cloudera recommends allowing access to the Cloudera Enterprise cluster through edge nodes only. You can configure this in the security groups for the hosts that you provision. The rest of this document describes the various options in detail.
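The "edge nodes only" rule amounts to an allow-list on traffic into the cluster. Security groups are a cloud construct; on premises the same idea is usually expressed as firewall rules or switch ACLs. The Python sketch below models the rule under a hypothetical address plan (`EDGE_NODES` and `CLUSTER_NODES` are made-up subnets): cluster hosts accept traffic from each other and from edge nodes, and everything else is denied.

```python
import ipaddress

# Hypothetical address plan; replace with your own subnets.
EDGE_NODES = ipaddress.ip_network("10.20.0.0/28")
CLUSTER_NODES = ipaddress.ip_network("10.20.1.0/24")


def may_reach_cluster(src_ip: str) -> bool:
    """Allow inbound traffic to cluster hosts only from edge nodes or other cluster hosts."""
    src = ipaddress.ip_address(src_ip)
    return src in EDGE_NODES or src in CLUSTER_NODES


if __name__ == "__main__":
    for ip in ["10.20.0.5", "10.20.1.42", "192.168.5.10"]:
        print(ip, "allowed" if may_reach_cluster(ip) else "denied")
```

Under this rule a user on a workstation (192.168.5.10 here) would log in to an edge node, or use a client application running there, rather than connecting to cluster hosts directly.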

  • Internet Connectivity

Even clusters that do not require heavy data transfer between HDFS and the Internet (or other services outside the immediate network) might still need access to services such as software repositories for updates, or to other low-volume outside data sources. Customers who intend to use the multi-cloud/hybrid-cloud functionality in CDP must ensure that adequate network bandwidth is available between their data centers and the public cloud vendors’ networks. Details on this topic are out of the scope of this document. Engage with your cloud vendor’s technical sales team and the Cloudera Sales Engineering team to determine the requirements in such scenarios.

If you completely disconnect the cluster from the Internet, you block access to software updates, which makes maintenance difficult.
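Whether a link between the data center and a public cloud is "adequate" depends on how much data must cross it and how quickly. A back-of-the-envelope estimate, sketched below in Python, divides the data volume by the usable link rate. The `efficiency` factor (the fraction of nominal line rate actually achieved after protocol overhead and contention) is an assumption with a placeholder default, not a Cloudera figure; measured throughput on your own links is a better input.

```python
def transfer_hours(data_tb: float, link_gbps: float, efficiency: float = 0.7) -> float:
    """Estimated hours to move `data_tb` decimal terabytes over a `link_gbps` link.

    `efficiency` is an assumed fraction of the nominal line rate that is actually
    achieved; 0.7 is a placeholder, not a measured or recommended value.
    """
    if link_gbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("link_gbps must be positive and efficiency must be in (0, 1]")
    bits = data_tb * 1e12 * 8  # decimal terabytes -> bits
    usable_bits_per_second = link_gbps * 1e9 * efficiency
    return bits / usable_bits_per_second / 3600


if __name__ == "__main__":
    for gbps in (1, 10, 100):
        print(f"50 TB over {gbps} Gbps: {transfer_hours(50, gbps):.1f} h")
```

If the estimate exceeds the window in which the data must be available on the other side, the link (or the replication plan) needs revisiting.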

