Question-8: Which of the following types of data does Telemetry Publisher collect and send to Workload XM?

  1. YARN MapReduce
  2. Spark
  3. Oozie
  4. Hive Queries
  5. Impala Queries

Answer: 1,2,3,4,5

Exp:

Telemetry Publisher Service collects and sends the following metrics data to Workload XM.

  1. YARN MapReduce Jobs: All MapReduce jobs are executed by YARN, and their history is saved by the YARN Job History Server. Telemetry Publisher polls the Job History Server for completed jobs and pulls each job's configuration and jhist file.
    • Jhist:
      • This is the job history file.
      • It contains the job and task counters.
      • It is created in HDFS.
    • MapReduce Task logs:
      • MapReduce task logs are also created in HDFS.
      • By default, collection of MapReduce task logs is disabled.
      • To send MapReduce task logs to Workload XM, you must first enable this collection.
  2. Spark Applications: As with MapReduce, Telemetry Publisher polls for completed applications.
    • Completed applications are recorded in the Spark History Server.
    • Spark event logs can also be stored in HDFS.
    • Telemetry Publisher collects the event logs from HDFS and sends them to Workload XM.
    • By default, data collection for Spark applications is not enabled.
  3. Oozie Workflows:
    • Similarly, Telemetry Publisher polls the Oozie servers for recently completed Oozie workflows and sends the metrics data to Workload XM.
  4. Hive Queries:
    • HiveServer2 creates a query detail file after a query completes.
    • The Cloudera Manager Agent periodically searches for these query detail files.
    • The Agent sends these files to Telemetry Publisher.
    • Hive query audits must be enabled for these query detail files to be available.
  5. Impala Queries:
    • Impala creates query profiles for recently completed queries.
    • The Cloudera Manager Agent sends these query profiles to Telemetry Publisher.
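
As a revision aid, the five collection paths above can be summarised in a small lookup table. This is only an illustrative sketch: the class `TelemetrySource`, the list `SOURCES`, and the helper `collected_by` are hypothetical names invented for this note and are not part of any Cloudera API. The entries simply restate the explanation above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TelemetrySource:
    """Hypothetical record summarising one collection path (not a Cloudera type)."""
    workload: str    # workload type, as named in the question
    origin: str      # where the data is produced or stored
    collector: str   # component that gathers the data first
    note: str = ""   # prerequisite or default, as stated in the explanation


# Entries restate the explanation above; nothing here is read from a real cluster.
SOURCES = [
    TelemetrySource("YARN MapReduce", "YARN Job History Server, HDFS",
                    "Telemetry Publisher", "task log collection is off by default"),
    TelemetrySource("Spark", "Spark History Server, HDFS",
                    "Telemetry Publisher", "collection is not enabled by default"),
    TelemetrySource("Oozie", "Oozie servers", "Telemetry Publisher"),
    TelemetrySource("Hive Queries", "HiveServer2 query detail files",
                    "Cloudera Manager Agent", "Hive query audits must be enabled"),
    TelemetrySource("Impala Queries", "Impala query profiles",
                    "Cloudera Manager Agent"),
]


def collected_by(collector: str) -> list[str]:
    """Return the workload types whose data is first gathered by `collector`."""
    return [s.workload for s in SOURCES if s.collector == collector]


if __name__ == "__main__":
    for name in ("Telemetry Publisher", "Cloudera Manager Agent"):
        print(name, "->", collected_by(name))
```

The table makes the split easy to remember: Telemetry Publisher itself polls for MapReduce, Spark, and Oozie data, while Hive and Impala data first passes through the Cloudera Manager Agent.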