No Valid Credentials Provided Error

After enabling Kerberos in Ambari, some components start (for example, the NameNode), but others, such as MapReduce and Hive, fail.

Here is an example of the error output when we try to start these services.

Fail: Execution of 'hadoop fs -mkdir `rpm -q hadoop | grep -q "hadoop-1" || echo "-p"` /app-logs /mapred /mapred/system /mr-history/tmp /mr-history/done && hadoop fs -chmod -R 777 /app-logs && hadoop fs -chmod 777 /mr-history/tmp && hadoop fs -chmod 1777 /mr-history/done && hadoop fs -chown mapred /mapred && hadoop fs -chown hdfs /mapred/system && hadoop fs -chown yarn:hadoop /app-logs && hadoop fs -chown mapred:hadoop /mr-history/tmp /mr-history/done' returned 1.
mesg: ttyname: Invalid argument
15/04/28 16:12:33 WARN ipc.Client: Exception encountered while connecting to the server : javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]

Resolution

  1. Check whether you can kinit with the hdfs service principal, e.g. /usr/share/centrifydc/kerberos/bin/kinit -kt /etc/security/keytabs/hdfs.headless.keytab hdfs@EXAMPLE.COM
  2. After the kinit, run klist and ensure that the expiry and renewal dates are not the same as the ticketed date.
  3. If the renewal date is in the past or the same as the ticketed date, execute kinit -R.
  4. Try a hadoop fs -ls command. If it succeeds, try to restart the services in Ambari.
  5. If your services do not restart, continue below.
  6. Find where your hadoop-env.sh file is located. It is usually in /etc/hadoop/conf.empty; otherwise search for it:
     find / -name hadoop-env.sh
  7. Edit hadoop-env.sh (vi hadoop-env.sh). Add the debug parameter sun.security.krb5.debug=true to the HADOOP_OPTS variable, that is,
     export HADOOP_OPTS="-Djava.net.preferIPv4Stack=true -Dsun.security.krb5.debug=true ${HADOOP_OPTS}"
  8. Run kinit as hdfs, then a hadoop fs -ls command, and look at the debug statements produced. If they mention Keytype = 18, the error is due to wrong JCE policy files.
  9. This can happen when you have AES256 encryption enabled and you recently upgraded Java. Upgrading Java overwrites the JCE policy files, which include support for AES256 encryption. To fix this, re-install your JCE policy jars into /usr/java/default/jre/lib/security/ (or the equivalent directory under the JAVA_HOME set in your hadoop-env.sh file) on each node.
  10. Get the right JCE files. For JDK 8 use http://www.oracle.com/technetwork/java/javase/downloads/jce8-download-2133166.html. For JDK 7 use http://www.oracle.com/technetwork/java/javase/downloads/jce-7-download-432124.html
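
The edit to hadoop-env.sh in step 7 can be sketched as a small shell helper. This is only an illustration: enable_krb5_debug is a hypothetical function name, not part of Hadoop or Ambari, and it assumes a plain hadoop-env.sh that you are allowed to modify by hand. It appends the export line from step 7 once and keeps a backup copy next to the original.

```shell
#!/bin/sh
# Sketch: append the Kerberos debug flag from step 7 to a hadoop-env.sh file.
# enable_krb5_debug is a hypothetical helper name used for illustration only.
enable_krb5_debug() {
    env_file="$1"

    # Do nothing if the flag is already present, so re-running is harmless.
    if grep -q 'sun.security.krb5.debug=true' "$env_file"; then
        return 0
    fi

    # Keep a backup next to the original before touching it.
    cp -p "$env_file" "$env_file.bak" || return 1

    # Single quotes keep ${HADOOP_OPTS} literal, so it is expanded later,
    # when Hadoop sources the file, not when this helper runs.
    printf '%s\n' 'export HADOOP_OPTS="-Djava.net.preferIPv4Stack=true -Dsun.security.krb5.debug=true ${HADOOP_OPTS}"' >> "$env_file"
}
```

A possible call would be enable_krb5_debug /etc/hadoop/conf.empty/hadoop-env.sh, followed by the kinit and hadoop fs -ls from step 8. Restore the .bak copy once you have read the debug output, since the flag is very verbose.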

Note:

JDK 1.7 update 80 or later and JDK 1.8 update 60 or earlier are known to have problems processing Kerberos TGTs.

----------------------------------------------------------------

 

Unknown Password or Unable to Obtain Password for User Error Upon Restarting Hadoop Services

Assuming you can kinit successfully and can see the cached ticket in the klist output, this error occurs for one of several reasons:

  1. The IP address in /etc/hosts and the IP address that the hostname resolves to are different.
  2. The Kerberos principal setting in hdfs-site.xml is wrong. Verify the dfs.namenode.kerberos.* and dfs.datanode.kerberos.* properties (for example dfs.namenode.kerberos.principal).
  3. Wrong file ownership and/or permissions on the /etc/security/keytabs directory. The problem is that the keytabs were created and are owned by the local hdfs, hbase, and ambari-qa users. However, these UIDs are different from the UIDs of the corresponding Active Directory users. The files need to be owned by the Active Directory UIDs.
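
The first cause can be checked with a short shell sketch. The helper names hosts_file_ip and check_hosts_entry are hypothetical, and the expected IP is something you supply yourself (for example from DNS or from the node's interface configuration). The sketch only reads a hosts-format file and compares the entry it finds against that value.

```shell
#!/bin/sh
# Sketch: compare the /etc/hosts entry for a host name with an expected IP.
# hosts_file_ip and check_hosts_entry are hypothetical helper names.

# Print the first IP address that a hosts-format file maps to the given name.
hosts_file_ip() {
    name="$1"
    file="${2:-/etc/hosts}"
    awk -v n="$name" '
        { sub(/#.*/, "") }    # strip comments; fields are re-split afterwards
        { for (i = 2; i <= NF; i++) if ($i == n) { print $1; exit } }
    ' "$file"
}

# Succeed only if the hosts file maps the name to the expected IP.
check_hosts_entry() {
    name="$1"
    expected_ip="$2"
    file="${3:-/etc/hosts}"
    actual_ip=$(hosts_file_ip "$name" "$file")
    if [ "$actual_ip" = "$expected_ip" ]; then
        echo "OK: $name -> $actual_ip"
    else
        echo "MISMATCH: $name is '${actual_ip:-<missing>}' in $file, expected $expected_ip"
        return 1
    fi
}
```

For example, check_hosts_entry node1.example.com 10.0.0.5 would report a mismatch if /etc/hosts on that node still carries an old address. Both the host name and the address here are placeholders.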

Resolution

  1. Archive and clear out all logs. For Teradata these are in /var/opt/teradata/log/hadoop/hdfs and /var/opt/teradata/log/hbase. Normally logs are located in /var/log/hadoop-hdfs. The reason is that the existing logs were created with the local UIDs, which would cause a problem once ownership changes.
  2. Perform an ls -l on the /etc/security/keytabs directory. Make note of which keytabs are owned by hdfs, hbase, and ambari-qa.
  3. Then perform ls -n on /etc/security/keytabs. Make note of the UIDs for hdfs, hbase, and ambari-qa.
  4. Take a look at the /etc/passwd file as well and note the UIDs for hdfs, hbase, and ambari-qa.
  5. Next perform a touch on a test file. Name it testuid.
  6. Perform a chown hdfs testuid, then ls -n testuid, and note the UID. Repeat the chown for hbase and ambari-qa. The UIDs will be different from the ones found in /etc/security/keytabs. These are the AD UIDs.
  7. Go back to /etc/security/keytabs.
  8. Perform a chown <AD-UID> <keytab>, that is, use the new AD UID found for each of hdfs, hbase, and ambari-qa.
  9. Then perform ls -n on /etc/security/keytabs. Make sure the new AD UIDs for hdfs, hbase, and ambari-qa are reflected in the keytabs.
  10. Ensure that your kinits work for hbase, hdfs, and ambari-qa, e.g. /usr/share/centrifydc/kerberos/bin/kinit -kt /etc/security/keytabs/hdfs.headless.keytab hdfs@EXAMPLE.COM
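
The comparison in steps 3 to 9 can be sketched as a shell helper that checks whether a keytab's numeric owner matches the UID the operating system currently resolves for a user name. owner_matches_user is a hypothetical name, and the sketch assumes GNU stat (the -c option), as found on typical Linux nodes.

```shell
#!/bin/sh
# Sketch: does a file's numeric owner match the UID that a user name resolves to?
# owner_matches_user is a hypothetical helper; stat -c assumes GNU coreutils.
owner_matches_user() {
    file="$1"
    user="$2"

    # Numeric owner of the file, the same number that ls -n shows.
    file_uid=$(stat -c '%u' "$file") || return 2

    # UID the system resolves for the user name, the same lookup chown performs.
    user_uid=$(id -u "$user") || return 2

    if [ "$file_uid" = "$user_uid" ]; then
        echo "OK: $file uid=$file_uid matches $user"
    else
        echo "MISMATCH: $file uid=$file_uid but $user resolves to uid=$user_uid"
        return 1
    fi
}
```

For instance, owner_matches_user /etc/security/keytabs/hdfs.headless.keytab hdfs would print a MISMATCH line while the keytab is still owned by the old local UID.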

Further Resolution

  1. If the services and components still do not restart, you need to change the ownership of all files owned by hdfs, hbase, and ambari-qa on the ENTIRE cluster.
  2. Modify and run this script (changing the appropriate UIDs for hdfs, hbase, and ambari-qa). Be careful: it changes many files throughout the cluster.
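
As a cautious stand-in for that kind of bulk change, here is a minimal sketch that only lists the affected files unless told otherwise. It is an assumption-laden illustration, not the script referred to in step 2: reown_by_uid is a hypothetical name, and the old and new UIDs are values you take from the ls -n and chown checks described earlier.

```shell
#!/bin/sh
# Sketch: find files that still carry an old numeric UID and, optionally, re-own them.
# reown_by_uid is a hypothetical helper; it defaults to a dry run that only lists files.
reown_by_uid() {
    dir="$1"
    old_uid="$2"
    new_uid="$3"
    mode="${4:-dry-run}"

    if [ "$mode" = "apply" ]; then
        # -xdev stays on one filesystem; chown -h changes symlinks themselves
        # instead of following them to their targets.
        find "$dir" -xdev -user "$old_uid" -exec chown -h "$new_uid" {} +
    else
        # A numeric argument to -user is treated as a UID when no user has that name.
        find "$dir" -xdev -user "$old_uid" -print
    fi
}
```

Running reown_by_uid /var/log 1001 20001 first, reviewing the list, and only then repeating the call with apply as the fourth argument keeps the change reviewable. The directory and UIDs shown are placeholders.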

----------------------------------------------------------------

Second Instances of WebHCat and Oozie Fail After Kerberos Is Enabled

Failures occur when two WebHCat servers or two Oozie servers are deployed with Kerberos.

The issue occurs when, in Ambari, you use _HOST as the host name in the WebHCat and Oozie principal configs, since _HOST DOES NOT get substituted appropriately when each service starts. An example of this would be using HTTP/_HOST@EXAMPLE.COM or oozie/_HOST@EXAMPLE.COM as principals. Normally this is appropriate: if the WebHcat server runs on node 1, this should translate to HTTP/node1.example.com@EXAMPLE.COM, or on node 2 to HTTP/node2.example.com@EXAMPLE.COM. Unfortunately there is a bug, and the substitution does not occur.

You need to go directly to the second instance of each server and manually edit the webhcat-site.xml or oozie-site.xml file with the second node's principals for SPNEGO and Oozie respectively, that is, HTTP/node2.example.com@EXAMPLE.COM and oozie/node2.example.com@EXAMPLE.COM.

Unfortunately, if you restart or make any changes in Ambari after that, Ambari pushes the wrong configuration to the second instance of each server. Since you cannot use _HOST, you are forced to use the node 1 principals, which do not work for node 2, so the push overwrites the fixes made on the second host. Be mindful of this upon restarts by Ambari, and always save your own versions of webhcat-site.xml and oozie-site.xml.

Resolution

WebHcat

  1. WebHcat can only have one value for templeton.kerberos.principal in custom webhcat-site.xml.
  2. Normally you would have _HOST as the host name in the principal, but WebHcat does not resolve _HOST. In Ambari, set templeton.kerberos.principal to HTTP/node1.example.com@EXAMPLE.COM, and restart WebHcat.
  3. Log onto node 2, where the second WebHcat server is running, and perform the following:
    1. su hcat
    2. Edit webhcat-site.xml, located in /etc/hive-webhcat/conf
    3. Change all principal names from node 1 to node 2
    4. export HADOOP_HOME=/usr
    5. /usr/lib/hive-hcatalog/sbin/webhcat_server.sh stop
    6. /usr/lib/hive-hcatalog/sbin/webhcat_server.sh start

Oozie

  1. Oozie can only have one value for each of the principals in custom oozie-site.xml, that is, for the properties oozie.authentication.kerberos.principal and oozie.service.HadoopAccessorService.kerberos.principal. In Ambari, set oozie.authentication.kerberos.principal to HTTP/node1.example.com@EXAMPLE.COM and oozie.service.HadoopAccessorService.kerberos.principal to oozie/node1.example.com@EXAMPLE.COM, and restart Oozie.
  2. Log onto node 2, where the second Oozie server is running, and perform the following:
    1. su oozie
    2. Edit oozie-site.xml, located in /etc/oozie/conf
    3. Change all principal names from node 1 to node 2
    4. export HADOOP_HOME=/usr
    5. /usr/lib/oozie/bin/oozied.sh stop
    6. /usr/lib/oozie/bin/oozied.sh start
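
The "change all principal names from node 1 to node 2" step, for either webhcat-site.xml or oozie-site.xml, can be sketched with sed. swap_principal_host is a hypothetical helper name. It assumes principals have the form service/host@REALM, and it saves the Ambari-pushed file with an .ambari suffix (an arbitrary choice) before rewriting it.

```shell
#!/bin/sh
# Sketch: rewrite the host part of Kerberos principals in a *-site.xml file.
# swap_principal_host is a hypothetical helper name used for illustration only.
swap_principal_host() {
    file="$1"
    old_host="$2"
    new_host="$3"

    # Keep the version Ambari pushed, so the change is easy to compare or undo.
    cp -p "$file" "$file.ambari" || return 1

    # Escape dots so the old host name is matched literally, not as a regex.
    old_re=$(printf '%s' "$old_host" | sed 's/\./\\./g')

    # Only touch the host when it sits between "/" and "@", i.e. inside a principal.
    sed "s#/${old_re}@#/${new_host}@#g" "$file.ambari" > "$file"
}
```

On node 2 this might be called as swap_principal_host /etc/oozie/conf/oozie-site.xml node1.example.com node2.example.com before restarting the service. Keeping a separate copy of the edited file as well makes it easy to put back after Ambari overwrites it.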

----------------------------------------------------------------

After Enabling Hue for Kerberos and LDAP, the File Browser Errors out

When you log into Hue with an AD account (after configuring for LDAP) you receive the following error:

2015-05-06 09:50:25,698  INFO [][hue:] GETFILESTATUS Proxy user [hue] DoAs user [admin]
2015-05-06 09:50:25,712  WARN [][hue:] GETFILESTATUS FAILED [GET:/v1/user/admin] response [Internal Server Error] SIMPLE authentication is not enabled.  Available:[TOKEN, KERBEROS]

Resolution

  1. With NameNode HA, HTTPFS needs to be configured. If there is no NameNode HA, WebHDFS needs to be configured.
  2. You need to configure hadoop-httpfs to use Kerberos. Make changes in httpfs-site.xml on the Hue box to change from simple authentication to Kerberos.
  3. Edit the /etc/hadoop-httpfs/conf.empty/httpfs-site.xml file on the Hue node:

    </property>

    <property>
      <name>httpfs.hadoop.authentication.type</name>
      <value>kerberos</value>
    </property>
    <property>
      <name>httpfs.hadoop.authentication.kerberos.principal</name>
      <value>httpfs/huenode.EXAMPLE.com@EXAMPLE.COM</value>
    </property>
    <property>
      <name>httpfs.hadoop.authentication.kerberos.keytab</name>
      <value>/etc/security/keytabs/httpfs.service.keytab</value>
    </property>
    <property>
      <name>httpfs.authentication.kerberos.name.rules</name>
      <value>RULE:[2:$1@$0](rm@.*EXAMPLE.COM)s/.*/yarn/
    RULE:[2:$1@$0](nm@.*EXAMPLE.COM)s/.*/yarn/
    RULE:[2:$1@$0](nn@.*EXAMPLE.COM)s/.*/hdfs/
    RULE:[2:$1@$0](dn@.*EXAMPLE.COM)s/.*/hdfs/
    RULE:[2:$1@$0](hbase@.*EXAMPLE.COM)s/.*/hbase/
    RULE:[2:$1@$0](hbase@.*EXAMPLE.COM)s/.*/hbase/
    RULE:[2:$1@$0](oozie@.*EXAMPLE.COM)s/.*/oozie/
    RULE:[2:$1@$0](jhs@.*EXAMPLE.COM)s/.*/mapred/
    DEFAULT</value>
    </property>
    </configuration>

  4. Then restart hadoop-httpfs. It appears, however, that a keytab for httpfs is needed as well.

----------------------------------------------------------------

Where Can I Find the Commands That Ambari Runs for Kerberos

Question: What commands does Ambari run to add the keytabs for the AD option? They are nowhere to be found in the logs.

Answer: ktadd, in /var/lib/ambari-server/resources/common-services/KERBEROS/package/scripts/kerberos_common.py -> function create_keytab_file

----------------------------------------------------------------

Help. I have long running jobs and my Tokens are expiring leading to Job Failures

Possible Resolution Steps

  1. First stop: NTP. Use pdsh to reset and restart the ntp service on all nodes.
  2. Check the JDK. JDK 1.7 update 80 or later and JDK 1.8 update 60 or earlier are known to have problems processing Kerberos TGTs.
  3. Change the max renewable life and ticket lifetime.
    > kinit -kt /etc/security/keytabs/hdfs.headless.keytab hdfs@<REALM.COM>

    > klist
    Check the expiration of the krbtgt/<REALM.COM>@<REALM.COM> principal. Is it seven days or one day?

    Look at max_renewable_life in /var/kerberos/krb5kdc/kdc.conf. Is it 7d? 14d? Is it different from the krbtgt/<REALM.COM>@<REALM.COM> expiration length?

    Change max_renewable_life in /var/kerberos/krb5kdc/kdc.conf to 14d.

    Change the maxrenewlife of the krbtgt/<REALM.COM>@<REALM.COM> principal so that it matches max_renewable_life.
    If it is MIT Kerberos, use kadmin (https://web.mit.edu/kerberos/krb5-1.12/doc/admin/admin_commands/kadmin_local.html and https://blog.godatadriven.com/kerberos_kdc_install.html). If it is AD, ask the administrator for the commands.
    > kadmin -p admin
    kadmin: modprinc -maxrenewlife "7 days" krbtgt/<REALM.COM>@<REALM.COM>
    (substitute the value you set for max_renewable_life)
    Also check ticket_lifetime in /etc/krb5.conf. Is there a renew_lifetime? A max_life?

    You can change it to be more than 24h, then restart the krb5kdc service.
  4. Double check the cron job. You can find examples to compare via Google (e.g. http://wiki.grid.auth.gr/wiki/bin/view/Groups/ALL/HowToAutomaticallyRenewKerberosTicketsAndAFSTokens...
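
To compare the lifetimes in step 3 without doing the arithmetic by hand, a small shell sketch can read max_renewable_life from kdc.conf and convert it to seconds. Both helper names are hypothetical, and the conversion only handles the simple forms used here (a number followed by d, h, m or s, or a bare number of seconds), not every duration format Kerberos accepts.

```shell
#!/bin/sh
# Sketch: read max_renewable_life from a kdc.conf-style file and convert it to seconds.
# krb_duration_seconds and kdc_max_renewable_life are hypothetical helper names.

# Convert a simple duration such as 14d, 24h, 30m or 600s to seconds.
krb_duration_seconds() {
    value="$1"
    number=${value%[dhms]}    # strip a single trailing unit letter, if any
    case "$value" in
        *d) echo $(( number * 86400 )) ;;
        *h) echo $(( number * 3600 )) ;;
        *m) echo $(( number * 60 )) ;;
        *s) echo "$number" ;;
        *)  echo "$value" ;;  # bare number: already seconds
    esac
}

# Print the first token of the first max_renewable_life setting in the file.
kdc_max_renewable_life() {
    sed -n 's/^[[:space:]]*max_renewable_life[[:space:]]*=[[:space:]]*\([^[:space:]]*\).*/\1/p' "$1" | head -n 1
}
```

For example, krb_duration_seconds "$(kdc_max_renewable_life /var/kerberos/krb5kdc/kdc.conf)" gives a number that can be compared against krb_duration_seconds 14d, or against the renewable lifetime that klist reports for the krbtgt principal.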
 

 

