Question-10: Which of the following data or resources can be configured for redacting sensitive data?
- Log & Query Redaction
- YARN MapReduce Job Properties
- Spark Event Logs
- Spark Executor logs
Answer: 1,2,3,4
Exp: Removing Sensitive information from Metrics
- Applications that you run using Hive, Impala, MapReduce, Spark, or Oozie may include sensitive information in their diagnostic data.
- So, it is necessary to configure redaction.
- Redaction is recommended even if you are not sending metrics or diagnostic data to Telemetry Publisher.
- Job configurations and logs can contain sensitive information, which needs to be redacted.
- The following data and resources can be configured to redact sensitive data before it is sent to Telemetry Publisher.
- Log & query redaction: You have to create regular expressions for filtering out the data. This needs to be done for the queries and logs that are collected by Telemetry Publisher.
- YARN MapReduce Job properties: As you know, Telemetry Publisher pulls job configuration data from HDFS. Hence, you have to redact sensitive information before the job configuration information is stored in HDFS.
- Spark Event logs & Spark executor logs: Again, these can be filtered using a regular expression, for Spark2 jobs only. The same regular expression can filter both event and executor logs.
- By default, this is enabled. However, you can override it by using safety valves in Cloudera Manager or in the Spark application itself.
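
The regular-expression redaction described above can be illustrated with a small Python sketch. It is not the Cloudera Manager or Spark implementation. The rule list, the key pattern, the `[REDACTED]` marker and the function names `redact_line` and `redact_properties` are all assumptions made for illustration. The first function masks matching text inside a log line or query. The second masks the value of any job property whose name looks sensitive, which is roughly what has to happen before job configuration is stored.

```python
import re

# Hypothetical rules (pattern, replacement); these are NOT product defaults.
LOG_RULES = [
    # Mask 16-digit card-like numbers, with optional dashes or spaces.
    (re.compile(r"\b\d{4}[- ]?\d{4}[- ]?\d{4}[- ]?\d{4}\b"), "XXXX-XXXX-XXXX-XXXX"),
    # Mask whatever follows "password=" up to the next whitespace.
    (re.compile(r"(?i)(password\s*=\s*)\S+"), r"\1[REDACTED]"),
]

# Hypothetical key pattern: properties whose NAME matches get their value masked.
SENSITIVE_KEY = re.compile(r"(?i)secret|password")


def redact_line(line: str) -> str:
    """Apply every rule in order to one log line or query string."""
    for pattern, replacement in LOG_RULES:
        line = pattern.sub(replacement, line)
    return line


def redact_properties(props: dict) -> dict:
    """Return a copy of the job properties with sensitive values masked."""
    return {
        key: "[REDACTED]" if SENSITIVE_KEY.search(key) else value
        for key, value in props.items()
    }
```

For example, `redact_line("login password=hunter2 ok")` yields `login password=[REDACTED] ok`, and a property named `fs.s3a.secret.key` has its value replaced while `mapreduce.job.name` is left untouched. In a real cluster the patterns would be set through configuration rather than written in application code.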