Question-10: Which of the following data or resources can be configured to redact sensitive data?

  1. Log & Query Redaction
  2. YARN MapReduce Job Properties
  3. Spark Event Logs
  4. Spark Executor logs

Answer: 1,2,3,4

Exp: Removing Sensitive Information from Metrics

  • Applications that you run with Hive, Impala, MapReduce, Spark, or Oozie may include sensitive information in their diagnostic data.
  • Redaction should therefore be configured, and it is recommended even if you are not sending metrics or diagnostic data to Telemetry Publisher.
  • Job configurations and logs can contain sensitive information that needs to be redacted.
  • The following data and resources can be configured to redact sensitive data before it is sent to Telemetry Publisher:
    • Log & query redaction: Create regular expressions that match the sensitive data to filter out. These are applied to the queries and logs that Telemetry Publisher collects.
    • YARN MapReduce job properties: Telemetry Publisher pulls job configuration data from HDFS, so sensitive information must be redacted before the job configuration is stored in HDFS.
    • Spark event logs & Spark executor logs: These can also be filtered with a regular expression, for Spark 2 jobs only. The same filtering applies to both event and executor logs.
      • This is enabled by default. However, you can override it with safety valves in Cloudera Manager or in the Spark application itself.
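As a rough illustration of the log and query redaction described above, the Python sketch below applies an ordered list of regular-expression rules to a log line or query string. The `RedactionRule` class, the two example rules, and the `redact` function are hypothetical. They are not Cloudera Manager's implementation or rule format, only a model of the idea that every match of each pattern is replaced before the text leaves the cluster.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class RedactionRule:
    """Hypothetical rule: a compiled pattern and the text that replaces each match."""
    description: str
    pattern: re.Pattern
    replacement: str


# Example rules for illustration only (assumptions, not shipped defaults).
RULES: List[RedactionRule] = [
    RedactionRule(
        "Credit card numbers",
        re.compile(r"\b\d{4}[- ]?\d{4}[- ]?\d{4}[- ]?\d{4}\b"),
        "XXXX-XXXX-XXXX-XXXX",
    ),
    RedactionRule(
        "Password assignments",
        # Keep the "password=" prefix (group 1) and mask whatever follows it.
        re.compile(r"(password\s*=\s*)\S+", re.IGNORECASE),
        r"\1******",
    ),
]


def redact(text: str, rules: List[RedactionRule] = RULES) -> str:
    """Apply every rule, in order, to a single log line or query string."""
    for rule in rules:
        text = rule.pattern.sub(rule.replacement, text)
    return text
```

For example, `redact("connect user=bob password=hunter2")` returns `"connect user=bob password=******"`. Because the rules run in order, a broad pattern listed early can mask text that a later, more specific rule expected to see, so ordering is worth reviewing when rules overlap.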
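For the YARN MapReduce job properties, the key point is ordering: values are masked before the job configuration is written to HDFS, because that is where Telemetry Publisher reads it. The sketch below assumes the sensitive property names arrive as one comma-separated string; Hadoop releases that support this expose a property named `mapreduce.job.redacted-properties`, but confirm the exact name and behavior in your distribution's documentation. `redact_job_conf`, `to_configuration_xml`, and the mask text are hypothetical helpers, and the XML output only mimics the `<configuration>`/`<property>` layout of a Hadoop configuration file.

```python
import xml.etree.ElementTree as ET
from typing import Dict

# Hypothetical mask written in place of sensitive values.
REDACTED = "********"


def redact_job_conf(conf: Dict[str, str], redacted_names: str) -> Dict[str, str]:
    """Return a copy of conf with the listed properties' values masked.

    redacted_names is a comma-separated list of property names (assumed format).
    The input dictionary is not modified.
    """
    names = {n.strip() for n in redacted_names.split(",") if n.strip()}
    return {key: (REDACTED if key in names else value) for key, value in conf.items()}


def to_configuration_xml(conf: Dict[str, str]) -> str:
    """Serialize properties using a Hadoop-style <configuration>/<property> layout."""
    root = ET.Element("configuration")
    for name, value in sorted(conf.items()):
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")
```

Calling `to_configuration_xml(redact_job_conf(conf, "fs.s3a.secret.key, db.password"))` serializes only the masked copy, so the original secret values never appear in the stored file. Here `db.password` is a made-up property name used for illustration.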
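The Spark 2 redaction mentioned above is driven by a regular expression matched against configuration entries that end up in event and executor logs; upstream Spark reads this pattern from the `spark.redaction.regex` property. The Python sketch below models the idea under simplifying assumptions: `DEFAULT_PATTERN` and `REDACTION_TEXT` are placeholders chosen for this example, and `redact_properties` is a hypothetical helper that masks a value when the pattern matches either its key or its value, which may not mirror every Spark version exactly.

```python
import re
from typing import List, Tuple

# Placeholder mask for this sketch (assumption; not necessarily Spark's exact text).
REDACTION_TEXT = "*********(redacted)"

# Assumed case-insensitive default pattern, for illustration only.
DEFAULT_PATTERN = r"(?i)secret|password"


def redact_properties(
    props: List[Tuple[str, str]], pattern: str = DEFAULT_PATTERN
) -> List[Tuple[str, str]]:
    """Mask the value of every (key, value) pair whose key or value matches pattern."""
    regex = re.compile(pattern)
    redacted = []
    for key, value in props:
        # search() finds a match anywhere in the string, not only at the start.
        if regex.search(key) or regex.search(value):
            redacted.append((key, REDACTION_TEXT))
        else:
            redacted.append((key, value))
    return redacted
```

In a real Spark 2 job the pattern would be overridden per application, for example with `spark-submit --conf spark.redaction.regex='(?i)secret|password|token' ...`, or cluster-wide through a Cloudera Manager safety valve. Keys are kept intact so the log still shows which setting was present, only its value is hidden.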
