During an Express Upgrade (EU) of a cluster that was deployed from a Blueprint with HDFS HA enabled (HDP 2.4.2, Ambari 2.2.2), the NameNode (NN) failed to restart, as shown below.

13debf893a605e8a88df18a7d8d214f571e05289; compiled by 'jenkins' on 2016-04-25T05:46Z
STARTUP_MSG: java = 1.8.0_60
************************************************************/
16/09/14 18:40:33 INFO namenode.NameNode: registered UNIX signal handlers for [TERM, HUP, INT]
16/09/14 18:40:33 INFO namenode.NameNode: createNameNode [-bootstrapStandby, -nonInteractive]
16/09/14 18:40:34 WARN common.Util: Path /hadoopfs/fs1/hdfs/namenode should be specified as a URI in configuration files. Please update hdfs configuration.
16/09/14 18:40:34 WARN common.Util: Path /hadoopfs/fs1/hdfs/namenode should be specified as a URI in configuration files. Please update hdfs configuration.
16/09/14 18:40:35 INFO ipc.Client: Retrying connect to server: <IP ADDRESS>:8020. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)

RCA:

The problem is that a Blueprint deployment by Cloudbreak leaves the following two configs in hadoop-env:

dfs_ha_initial_namenode_active
dfs_ha_initial_namenode_standby

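When it is time to perform an Express or Rolling Upgrade (EU/RU), or to start the NN in general, Ambari sees these configs and concludes that the NameNode HA setup is still incomplete. That is why the log above shows createNameNode being invoked with -bootstrapStandby. A Blueprint that enables HA out of the box therefore needs an extra step to delete these configs once the deployment is done.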

BUG: https://issues.apache.org/jira/browse/AMBARI-18394

WORKAROUND:

The workaround is to remove the configs "dfs_ha_initial_namenode_active" and "dfs_ha_initial_namenode_standby" from hadoop-env using configs.sh in the middle of the EU, and then retry the step that restarts the NN.
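The removal can be sketched with the configs.sh script that ships with Ambari server, using its delete action against the hadoop-env config type. This is a minimal sketch, not a tested procedure: the Ambari host, cluster name, and credentials below are placeholders, the script path is the usual install location and may differ on your server, and DRY_RUN is a helper variable introduced here so the commands can be reviewed before anything is changed.

```shell
#!/usr/bin/env bash
# Sketch: delete the leftover HA bootstrap keys from hadoop-env.
# Every value below is a placeholder (assumption); adjust for your cluster.
AMBARI_HOST="${AMBARI_HOST:-ambari.example.com}"
CLUSTER_NAME="${CLUSTER_NAME:-mycluster}"
AMBARI_USER="${AMBARI_USER:-admin}"
AMBARI_PASSWORD="${AMBARI_PASSWORD:-admin}"
# Usual location of configs.sh on the Ambari server host (assumed).
CONFIGS_SH="${CONFIGS_SH:-/var/lib/ambari-server/resources/scripts/configs.sh}"
# DRY_RUN=1 only prints the commands; set DRY_RUN=0 to run them.
DRY_RUN="${DRY_RUN:-1}"

# Delete a single key from the hadoop-env config type.
delete_key() {
  local key="$1"
  local args=(delete "$AMBARI_HOST" "$CLUSTER_NAME" hadoop-env "$key")
  if [ "$DRY_RUN" = "1" ]; then
    # Password is masked in the printed command.
    echo "$CONFIGS_SH -u $AMBARI_USER -p **** ${args[*]}"
  else
    "$CONFIGS_SH" -u "$AMBARI_USER" -p "$AMBARI_PASSWORD" "${args[@]}"
  fi
}

# Remove both keys named in the RCA.
remove_ha_bootstrap_keys() {
  local key
  for key in dfs_ha_initial_namenode_active dfs_ha_initial_namenode_standby; do
    delete_key "$key"
  done
}

remove_ha_bootstrap_keys
```

Run it once with the default DRY_RUN=1 to check the generated commands, then with DRY_RUN=0 on the Ambari server host, and retry the failed NN restart step in the upgrade wizard.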


 

