CDF is a real-time data streaming platform. It receives data and provides actionable insights and analytics in real time. It has three primary components:
- Cloudera Edge Management
  - This component focuses on edge data collection and edge processing, and provides edge intelligence.
  - It is needed where your edge sources are IoT devices, web streams, clickstreams, etc.
  - You may have thousands of endpoints and need to collect data from all of them. You may also want to process data at the edge itself, so that you don't transfer all of it over the wire. Edge intelligence processes data at the edge and collects and transfers only the required data.
  - MiNiFi works as the data collection engine on the edge.
- Cloudera Flow Management
  - The data gathered at the edge now needs to be collected centrally, and this is where NiFi (Flow Management) comes into the picture.
  - It helps handle very large volumes of data, such as terabytes within an hour.
  - NiFi works as the data ingestion engine.
- Stream Processing and Analytics
  - This is where real-time, actionable data intelligence is provided.
  - You need to stream all this data into a data lake, Hadoop, or non-Hadoop storage.
  - A streaming engine like Kafka comes in handy for that.
  - You can pick from several stream analytics engines to analyze this real-time data, such as:
    - Apache Flink
    - Spark Streaming
    - Kafka Streams
  - As data flows through the components above, you can provide real-time analytics.
- All of the above describes Cloudera Data Flow.
- Cloudera is also moving this capability to Cloudera Data Platform (CDP).
- You should primarily know how NiFi and Kafka can be enabled on Cloudera Data Hub.
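
The three stages described above (filter at the edge, ingest centrally, analyze the stream) can be sketched in plain Python. This is only an illustration of the data flow, not how MiNiFi, NiFi, or Kafka are actually configured or called: the `Reading` format, the function names, the `readings.<device>` topic naming, and the threshold are all assumptions made up for this example.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical reading format: (device_id, timestamp_seconds, temperature)
Reading = Tuple[str, int, float]


def edge_filter(readings: Iterable[Reading], threshold: float) -> Iterator[Reading]:
    """Edge stage (the role MiNiFi plays): forward only readings at or above a threshold."""
    for reading in readings:
        if reading[2] >= threshold:
            yield reading


def ingest(readings: Iterable[Reading]) -> Dict[str, List[Reading]]:
    """Ingestion stage (the role NiFi plays): route readings into per-device topics."""
    topics: Dict[str, List[Reading]] = defaultdict(list)
    for reading in readings:
        topics[f"readings.{reading[0]}"].append(reading)
    return dict(topics)


def windowed_average(readings: Iterable[Reading], window_seconds: int) -> Dict[int, float]:
    """Analytics stage: average temperature per tumbling window, keyed by window start."""
    sums: Dict[int, float] = defaultdict(float)
    counts: Dict[int, int] = defaultdict(int)
    for _, timestamp, temperature in readings:
        window_start = timestamp - timestamp % window_seconds
        sums[window_start] += temperature
        counts[window_start] += 1
    return {start: sums[start] / counts[start] for start in sorted(sums)}


# Made-up sample data standing in for what edge devices would emit.
raw_readings: List[Reading] = [
    ("dev1", 0, 20.0),
    ("dev1", 5, 80.0),
    ("dev2", 12, 90.0),
    ("dev1", 14, 60.0),
    ("dev2", 21, 30.0),
]

forwarded = list(edge_filter(raw_readings, threshold=50.0))
topics = ingest(forwarded)
averages = windowed_average(forwarded, window_seconds=10)

print(f"forwarded {len(forwarded)} of {len(raw_readings)} readings")
print(sorted(topics))
print(averages)
```

Only three of the five sample readings leave the "edge", which is the point of edge intelligence: the low-value readings never travel over the wire. In a real deployment each function would be replaced by the corresponding component (a MiNiFi agent, a NiFi flow, and a Flink, Spark Streaming, or Kafka Streams job reading from Kafka).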