Introduction to CDP: Cloudera Data Platform

Important: CDP-DC was renamed to CDP Private Cloud Base.

  • There are two versions of Cloudera Data Platform (CDP):
    • CDP Public Cloud
    • CDP Private Cloud

CDP Private Cloud

  • CDP Private Cloud is for both
    • Analytics
    • Data Management
  • CDP Private Cloud is deployed in private data centres.
  • It integrates the following two products:
    • CDP Private Cloud Base
    • CDP Private Cloud Data Services
  • This combined solution can be used for data analytics and artificial intelligence.

CDP Private Cloud Base

  • CDP Private Cloud Base is the on-premises version of Cloudera Data Platform.
  • CDP Private Cloud Base was previously known as CDP Data Center.
  • You can run many different kinds of custom workloads on it.
  • CDP Private Cloud Base supports a variety of hybrid solutions.
  • Compute tasks are separated from data storage, and data can be accessed from remote clusters.
  • CDP Private Cloud Base comprises a variety of components, such as
    • Bare-metal infrastructure
    • Cluster management
    • Apache HDFS
    • Apache Ozone object storage
    • SDX (Security, Governance and Metadata)
    • Hive3
    • HBase
  • Custom Cluster: On CDP Private Cloud Base you can select any combination of these available services to create clusters that address your business requirements and workloads.
  • Preconfigured Services: Various preconfigured packages of services are also available for common workloads.
    • Cloudera Data Engineering (CDE)
      • With this cluster you can process data and develop and serve predictive data models. It includes services such as HDFS, Ranger, Atlas, Hive and Hue.
      • It is an all-inclusive data engineering toolset for orchestrating and automating complex data pipelines securely at any scale.
    • Data Mart
      • Using this you can browse, query and explore your data interactively. It includes services such as HDFS, Ranger, Atlas, Hive and Hue.
    • Operational Database
      • This includes services such as HDFS, Ranger, Atlas and HBase.
    • Cloudera Machine Learning (CML)
      • This service optimizes ML workflows for deploying, serving and monitoring models.
    • Cloudera DataFlow (CDF)
      • Provides real-time streaming data analysis at high volume and high scale. 
    • Cloudera Data Warehouse (CDW)
      • Delivers self-service analytics on massive amounts of data to thousands of users without compromising cost, speed and security. 

CDP Private Cloud Plus Edition

  • The Private Cloud Plus edition includes the Base edition, as well as easy-to-use containerized machine learning and data warehousing analytics, and a hybrid management control plane for a better user experience and lower data center costs.

Advantages of CDP Architecture

  • The advantage of CDP architecture is that it is modular, and the same constructs can be transposed from a private data center to the public cloud and everything in between for a seamless hybrid experience. 
  • CDP Private Cloud is designed to take advantage of today's hybrid environments and allow organizations to effectively utilize their existing on-premises infrastructure while effortlessly bursting into public cloud when required. 

Components of Cloudera Data Platform Private Cloud

The components fall into the following categories:

  1. Tools: CDP Private Cloud Base also includes the following tools to manage and secure your deployment.
    • Cloudera Manager:
      • Cloudera Manager is a web application.
      • Using Cloudera Manager, you can
        • Monitor
        • Manage
        • Configure your clusters and services
      • Cloudera Manager also provides an API to perform these activities programmatically.
      • You can manage one or more clusters using Cloudera Manager.
      • It also handles
        • Installations
        • Upgrades of cluster components
        • Maintenance workflows
        • Encryption
        • Access controls
        • Data replication
      • Virtual Private Cluster: You can also use Cloudera Manager to create a Virtual Private Cluster, which separates compute resources from data storage and lets several compute clusters share the same data storage.
    • Apache Atlas:
      • Atlas lets you trace the complete lineage of your data.
      • It is used for data governance.
      • Apache Atlas works as a common metadata store designed to exchange metadata both inside and outside the Hadoop stack.
      • Apache Ranger and Apache Atlas are closely integrated, which enables you to define, administer and manage security and compliance policies consistently across all components of the Hadoop stack.
    • Apache Ranger:
      • Ranger provides a user interface for managing access control and administering security policies.
      • Ranger provides authorization and auditing for CDP Private Cloud Base clusters.
      • Ranger has centralized reporting capabilities.
      • You or a security administrator can define security policies at the database, table, column and file levels, and can administer permissions for specific LDAP-based groups or individual users.
      • Time-based policy rules are supported.
      • Geolocation-based policy rules are supported.
  2. Cloudera Runtime:
    • Cloudera Runtime includes 50+ open-source projects.
    • These components serve different workloads and combine technologies from the former Hortonworks (HDP) and Cloudera (CDH) distributions. They include:
      • Hive 3, Impala, Hue, DAS (Data Analytics Studio)
      • Spark 3, Zeppelin
      • HBase, Phoenix
      • Kafka
      • Knox, Ranger, RMS, Atlas
      • Apache Airflow
      • Ozone
      • Encryption: Ranger KMS, KTS, etc.
  3. Additional components, installed separately using parcels:
    1. NiFi (Cloudera Flow Management, part of Cloudera DataFlow)
    2. CSA (Cloudera Stream Analytics, using Flink and SQL Stream Builder)
    3. CDSW (Cloudera Data Science Workbench): for data science and machine learning workloads.
    4. Data Visualization (DataViz): to create reports and dashboards.
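The Cloudera Manager API mentioned above is a REST API. The following Python sketch lists the clusters a Cloudera Manager instance manages. It assumes the `/api/<version>/clusters` endpoint, HTTP basic authentication, port 7183 for HTTPS and API version `v41`; the host name and credentials are placeholders, so confirm the API version your Cloudera Manager supports (for example via `/api/version`) before relying on it.

```python
import base64
import json
import urllib.request

# Assumptions for illustration only: host, port and API version are placeholders.
CM_HOST = "cm.example.com"  # hypothetical Cloudera Manager host
CM_PORT = 7183              # assumed HTTPS port; plain-HTTP setups often use 7180
API_VERSION = "v41"         # assumed; check /api/version on your server


def clusters_url(host: str, port: int, version: str, tls: bool = True) -> str:
    """Build the URL of the endpoint that lists managed clusters."""
    scheme = "https" if tls else "http"
    return f"{scheme}://{host}:{port}/api/{version}/clusters"


def cluster_names(payload: dict) -> list[str]:
    """Extract cluster names from a /clusters response of the form {"items": [...]}."""
    return [item["name"] for item in payload.get("items", [])]


def list_clusters(user: str, password: str) -> list[str]:
    """Call the Cloudera Manager API with HTTP basic auth and return cluster names."""
    req = urllib.request.Request(clusters_url(CM_HOST, CM_PORT, API_VERSION))
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    req.add_header("Authorization", f"Basic {token}")
    with urllib.request.urlopen(req) as resp:
        return cluster_names(json.load(resp))


if __name__ == "__main__":
    # Offline example using a response shaped like the /clusters payload.
    sample = {"items": [{"name": "Cluster 1"}, {"name": "compute-vpc"}]}
    print(cluster_names(sample))  # ['Cluster 1', 'compute-vpc']
```

Keeping URL building and response parsing separate from the network call makes both easy to check without a running Cloudera Manager server.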
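Atlas records lineage as a graph in which datasets such as Hive tables are connected through process entities. As a minimal sketch of what "finding the entire data lineage" means, the Python code below walks that graph to find every upstream source of a table. It assumes a response shaped like the Atlas v2 lineage endpoint (`relations` edges with `fromEntityId`/`toEntityId`, plus a `guidEntityMap`); the GUIDs and table names are made up for illustration.

```python
from collections import deque

# Assumption: `lineage` mirrors the JSON returned by Atlas's
# GET /api/atlas/v2/lineage/{guid}. Sample GUIDs and names are hypothetical.


def upstream(lineage: dict, guid: str) -> list[str]:
    """Return every entity that feeds into `guid`, nearest first (breadth-first)."""
    parents: dict[str, list[str]] = {}
    for rel in lineage.get("relations", []):
        parents.setdefault(rel["toEntityId"], []).append(rel["fromEntityId"])

    seen, order = {guid}, []
    queue = deque([guid])
    while queue:
        current = queue.popleft()
        for parent in parents.get(current, []):
            if parent not in seen:  # guards against cycles and shared ancestors
                seen.add(parent)
                order.append(parent)
                queue.append(parent)
    return order


def display_names(lineage: dict, guids: list[str]) -> list[str]:
    """Map GUIDs to readable names, falling back to the GUID itself."""
    entities = lineage.get("guidEntityMap", {})
    return [entities.get(g, {}).get("displayText", g) for g in guids]


if __name__ == "__main__":
    sample = {
        "baseEntityGuid": "t3",
        "relations": [
            {"fromEntityId": "t1", "toEntityId": "p1"},
            {"fromEntityId": "p1", "toEntityId": "t2"},
            {"fromEntityId": "t2", "toEntityId": "p2"},
            {"fromEntityId": "p2", "toEntityId": "t3"},
        ],
        "guidEntityMap": {
            "t1": {"typeName": "hive_table", "displayText": "raw_orders"},
            "p1": {"typeName": "hive_process", "displayText": "clean_orders_job"},
            "t2": {"typeName": "hive_table", "displayText": "orders"},
            "p2": {"typeName": "hive_process", "displayText": "daily_agg_job"},
            "t3": {"typeName": "hive_table", "displayText": "daily_sales"},
        },
    }
    print(display_names(sample, upstream(sample, "t3")))
    # ['daily_agg_job', 'orders', 'clean_orders_job', 'raw_orders']
```

Breadth-first order lists the nearest producers first, which is usually what you want when tracing where a bad value came from.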
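To illustrate the column-level and time-based Ranger policies described above, the following Python sketch builds a policy as a JSON-ready dictionary. The field names (`resources`, `policyItems`, `accesses`, `validitySchedules`) are assumed from Ranger's public v2 REST policy model, and the service name `cm_hive`, the `sales.orders` table and the `analysts` group are hypothetical. It only builds the payload; submitting it to Ranger Admin is left out.

```python
import json

# Assumption: field names follow Apache Ranger's public v2 policy model
# (POST /service/public/v2/api/policy). All concrete names are hypothetical.


def hive_column_policy(service, database, table, columns, groups,
                       start=None, end=None, tz="UTC"):
    """Build a policy granting SELECT on specific Hive columns to groups.

    If both `start` and `end` are given (assumed format "yyyy/MM/dd HH:mm:ss"),
    a validity schedule is attached so the policy applies only in that window.
    """
    policy = {
        "service": service,
        "name": f"{database}.{table} column access",
        "isEnabled": True,
        "resources": {
            "database": {"values": [database]},
            "table": {"values": [table]},
            "column": {"values": list(columns)},
        },
        "policyItems": [
            {
                "groups": list(groups),
                "users": [],
                "accesses": [{"type": "select", "isAllowed": True}],
            }
        ],
    }
    if start and end:
        # Time-based rule: the policy is only in effect inside this window.
        policy["validitySchedules"] = [
            {"startTime": start, "endTime": end, "timeZone": tz}
        ]
    return policy


if __name__ == "__main__":
    p = hive_column_policy(
        service="cm_hive",  # hypothetical Ranger Hive service name
        database="sales", table="orders",
        columns=["order_id", "amount"],
        groups=["analysts"],  # hypothetical LDAP group
        start="2024/01/01 00:00:00", end="2024/12/31 23:59:59",
    )
    print(json.dumps(p, indent=2))
```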

CDP Private Cloud Data Services

  • CDP Private Cloud Data Services is a CDP product that brings many of the benefits of the public cloud to the data center.
  • CDP Private Cloud separates compute from storage.
  • The Data Services capability provides containerized analytic applications that scale dynamically and can be upgraded independently.
  • Using the Management Console, CDP Private Cloud Data Services lets users rapidly provision and deploy data services such as
    • Cloudera Data Warehouse
    • Cloudera Machine Learning
    • Cloudera Data Engineering
  • Important: A CDP Private Cloud Data Services deployment requires a Private Cloud Base cluster plus a container-based cluster to run the data services.
  • Container-based cluster: this can be either of the following
    • Red Hat OpenShift cluster
    • Embedded Container Service (ECS)

CDP Cluster

  • A CDP cluster is a distributed computing service that has access to a shared data lake and runs on either
    • Virtual machines: Cloudera Data Hub
    • Containers: Cloudera analytic experiences
  • The image below helps explain the Cloudera analytic experiences and Private Cloud Base.
  • CDP Private Cloud offers Data Warehouse (DW), Machine Learning (ML), Data Engineering (DE), Data Flow (DF) and Operational Database (OD) as analytic experiences.
  • Data Hub is part of Private Cloud Base.
  • Both the analytic experiences and Data Hub help in building custom business applications.

SDX: Shared Data Experience

  • SDX is a data access control layer that sits on top of the backend object store.
  • It provides consistent data security and governance for all applications running within the environment.
  • SDX is used to safeguard data privacy, ensure regulatory compliance, and prevent cybersecurity threats.

 
