• Cloudera Data Platform (CDP) will support:
  • Apache NiFi for its Flow Management capabilities.
  • Apache Kafka for its Streams Messaging capabilities.
  • Cloudera Runtime: the best components chosen from both the Cloudera and Hortonworks worlds.
  • All Analytics Experiences and Edge to AI services, such as Data Hub, Data Engineering (DE), Data Warehouse (DW), Operational Database (OD), and Machine Learning (ML), are implemented using these Cloudera Runtime components.

 

SDX (Shared Data Experience)

  • This is a security and governance layer.
  • It ensures that you can define consistent metadata and schemas, which are important for data applications.
  • These definitions stay consistent across scenarios such as DW, DE, ML, and OD.
  • SDX also helps you migrate from an on-premises environment to a public cloud environment.
  • It also gives you a consistent, central way of managing security, such as access policies, through Apache Ranger.
  • It helps you remain compliant with your governance requirements through Apache Atlas.
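To make the idea of "one central place for access policies" more concrete, here is a minimal Python sketch. It assumes a simplified, hypothetical policy model: the `Policy`, `PolicyStore`, `is_allowed`, and `run_query` names are illustrative only and are not Apache Ranger's data model or API. The point it shows is that every experience (DW, DE, ML) asks the same store, so one policy change applies everywhere.

```python
from dataclasses import dataclass, field


# Hypothetical, simplified policy model; NOT Apache Ranger's real data model or API.
@dataclass(frozen=True)
class Policy:
    database: str          # resource the policy protects, e.g. "sales"
    users: frozenset       # users the policy grants access to
    accesses: frozenset    # permitted access types, e.g. {"select"}


@dataclass
class PolicyStore:
    """One central store consulted by every experience (DW, DE, ML, OD)."""
    policies: list = field(default_factory=list)

    def add(self, policy: Policy) -> None:
        self.policies.append(policy)

    def is_allowed(self, user: str, database: str, access: str) -> bool:
        # Deny by default; allow only if some policy matches all three fields.
        return any(
            p.database == database and user in p.users and access in p.accesses
            for p in self.policies
        )


def run_query(experience: str, store: PolicyStore, user: str, database: str) -> str:
    # Each experience delegates the authorization decision to the shared store
    # instead of keeping its own copy of the rules.
    if store.is_allowed(user, database, "select"):
        return f"{experience}: {user} may read {database}"
    return f"{experience}: {user} denied on {database}"


if __name__ == "__main__":
    store = PolicyStore()
    store.add(Policy("sales", frozenset({"alice"}), frozenset({"select"})))
    for exp in ("DW", "DE", "ML"):
        print(run_query(exp, store, "alice", "sales"))
        print(run_query(exp, store, "bob", "sales"))
```

Because the decision lives in one store, granting or revoking access for "bob" is a single change that every experience sees immediately, which is the consistency SDX aims for.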

Control Plane

Various services run in the Control Plane:

  • Management Console:
    • A hosted service in Cloudera Data Platform that gives you the ability to create Data Hub clusters.
  • Workload Manager:
    • A hosted service that gives you the ability to manage workloads.
  • Replication Manager:
    • Starts replication from on-premises to the cloud.
  • Data Catalog:
    • Gives you the ability to browse the data catalog.

 

Cloudera Data Hub & Bringing NiFi and Kafka to it

What is a Data Hub in the Public Cloud?

Management Console

  • The CDP Management Console is a hosted web service that is managed and run by Cloudera. You can reach it by going to cdp.cloudera.com.
  • Once you have logged in to the Management Console, you can create an environment with the cloud provider of your choice. That environment is what we use to create workloads and Virtual Private Clusters.
  • The key point is that the Management Console, which is run, managed, and hosted by Cloudera, is separated from the workloads, where you process and store data.
  • The workloads are all deployed in your own AWS, Azure, or GCP account.
  • As a result, no data flows through the Cloudera Management Console.
  • You don't spin up any clusters in a Cloudera-hosted environment; all of that is fully under your control, in your own cloud account.
  • All of these clusters are connected to the Shared Data Experience (SDX).
  • Many of these clusters use the object store as the primary store; later, we will also use volumes attached to the virtual machines.
  • All the Data Hub clusters you create with the Management Console are based on virtual machines. The key point is that these Data Hub clusters are easy to create and are fully secure.
  • You don't have to worry about setting up a Kerberos environment anymore.
  • You don't have to worry about setting up TLS encryption for your cluster.
  • All of this is set up for you by the Data Hub cluster service.

NiFi & Kafka Use Cases for CDP Data Hub

  • Kafka
    • This is the de facto framework for distributed messaging systems.
    • It is used in different environments, such as on-premises and cloud environments.
  • NiFi
    • NiFi is used for moving large amounts of data from one place to another while routing and filtering the data along the way.
  • By adding NiFi and Kafka to CDP Data Hub, we can enable a few more use cases.
  • Kafka Use Cases
    • Set up Kafka in Data Hub as a disaster recovery (DR) cluster for an on-premises Kafka cluster.
    • Migrate existing on-premises Kafka clusters to the cloud, if you are not looking for a hybrid setup.
    • Leverage Cloudera Streams Replication Manager on the on-premises cluster.
  • NiFi Use Cases
    • Set up data flows that span on-premises and public cloud environments.
    • Merge on-premises sources with cloud sources.
    • Migrate on-premises data sets and make them available to cloud applications.
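The NiFi description above (moving data while routing and filtering it) and the "merge on-prem sources with cloud sources" use case can be illustrated with a small Python sketch. This is only a conceptual model under stated assumptions: `Record`, `merge_sources`, `drop_empty`, and `route` are hypothetical names, not NiFi APIs (real NiFi flows are built from processors in the NiFi UI). The routing step is loosely inspired by NiFi's RouteOnAttribute processor but simplified so that the first matching rule wins.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List


# Hypothetical stand-in for a NiFi FlowFile: attributes plus content.
@dataclass
class Record:
    attributes: Dict[str, str]
    content: bytes


def merge_sources(*sources: Iterable[Record]) -> List[Record]:
    """Combine records from several sources (e.g. on-prem and cloud) into one stream."""
    merged: List[Record] = []
    for source in sources:
        merged.extend(source)
    return merged


def drop_empty(records: Iterable[Record]) -> List[Record]:
    """Filter step: discard records that carry no content."""
    return [r for r in records if r.content]


def route(
    records: Iterable[Record],
    rules: Dict[str, Callable[[Record], bool]],
) -> Dict[str, List[Record]]:
    """Send each record to the first matching destination, or to 'unmatched'."""
    routed: Dict[str, List[Record]] = {name: [] for name in rules}
    routed["unmatched"] = []
    for record in records:
        for name, predicate in rules.items():
            if predicate(record):
                routed[name].append(record)
                break
        else:
            # No rule matched; keep the record rather than silently dropping it.
            routed["unmatched"].append(record)
    return routed


if __name__ == "__main__":
    on_prem = [Record({"origin": "on-prem", "type": "log"}, b"disk full")]
    cloud = [
        Record({"origin": "cloud", "type": "metric"}, b"cpu=93"),
        Record({"origin": "cloud", "type": "log"}, b""),  # empty, filtered out
    ]
    flow = drop_empty(merge_sources(on_prem, cloud))
    result = route(flow, {
        "logs": lambda r: r.attributes.get("type") == "log",
        "metrics": lambda r: r.attributes.get("type") == "metric",
    })
    for name, items in result.items():
        print(name, [r.content.decode() for r in items])
```

Keeping an explicit "unmatched" destination mirrors a common dataflow design choice: records that no rule claims are still accounted for instead of disappearing.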
