Friday, March 30, 2018

Where MapReduce logs live on different Hadoop distributions


HDP:

sudo -u hdfs hadoop fs -ls /app-logs/hdfs/logs

(logs kept for 7 days)

CDH5:

sudo -u hdfs hadoop fs -ls /tmp/logs/hdfs/logs/

(logs kept for 7 days)

EMR (AWS):

sudo -u hdfs hadoop fs -ls /var/logs/hdfs/logs/

(retention period unknown)
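All three paths follow the same layout, `<remote-app-log-dir>/<user>/logs`, where the trailing `logs` is YARN's default `yarn.nodemanager.remote-app-log-dir-suffix`. Here is a small sketch that maps a distribution name to that directory. It assumes the roots shown above and assumes that users other than hdfs follow the same layout. The helper names `agg_log_root` and `agg_log_dir` are hypothetical.

```shell
# Hypothetical helper: HDFS root for aggregated YARN logs per distribution.
# The roots are the ones observed above; check
# yarn.nodemanager.remote-app-log-dir on your own cluster.
agg_log_root() {
  case "$1" in
    hdp)  echo "/app-logs" ;;
    cdh5) echo "/tmp/logs" ;;
    emr)  echo "/var/logs" ;;
    *)    echo "unknown distribution: $1" >&2; return 1 ;;
  esac
}

# Build <root>/<user>/logs (user defaults to hdfs); "logs" is the
# default yarn.nodemanager.remote-app-log-dir-suffix.
agg_log_dir() {
  local root
  root=$(agg_log_root "$1") || return 1
  echo "${root}/${2:-hdfs}/logs"
}

# Usage on a cluster node (requires the hadoop client):
#   sudo -u hdfs hadoop fs -ls "$(agg_log_dir hdp hdfs)"
```

If you already know an application's ID, `yarn logs -applicationId <application ID>` reads the same aggregated logs without needing the path.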

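The retention period is set by "yarn.log-aggregation.retain-seconds". On the HDP and CDH5 clusters above it is 7 days (604800 seconds). In stock Apache Hadoop, yarn-default.xml sets it to -1, which keeps aggregated logs indefinitely.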
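The property is expressed in seconds, so 7 days is 7 × 24 × 3600 = 604800. Below is a tiny conversion helper. The name `days_to_seconds` is only illustrative.

```shell
# Convert a retention period in days to the seconds value expected by
# yarn.log-aggregation.retain-seconds, e.g. in yarn-site.xml:
#   <property>
#     <name>yarn.log-aggregation.retain-seconds</name>
#     <value>604800</value>
#   </property>
days_to_seconds() {
  echo $(( $1 * 24 * 60 * 60 ))
}

days_to_seconds 7    # prints 604800
days_to_seconds 30   # prints 2592000
```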

If log aggregation is not enabled (yarn.log-aggregation-enable=false), the logs stay on each NodeManager's local file system. For example, MapR puts them under:

/opt/mapr/hadoop/hadoop-2.7.0/logs/userlogs/
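Without aggregation, each NodeManager keeps an application's logs under `<userlogs>/<application ID>/<container ID>/`. For MapReduce tasks these are typically `stdout`, `stderr` and `syslog`. Because the logs are local, you have to collect them on every node that ran a container. The sketch below lists one application's log files on the current node. It assumes the MapR path above; set `USERLOGS` to use another location. `list_app_logs` is a hypothetical name, and so is the application ID in the usage line.

```shell
# Default to the MapR location mentioned above; override USERLOGS as needed.
USERLOGS="${USERLOGS:-/opt/mapr/hadoop/hadoop-2.7.0/logs/userlogs}"

# List every log file this node holds for one application
# (layout assumed: <userlogs>/<application ID>/<container ID>/<file>).
list_app_logs() {
  local app_id="$1"
  local dir="${USERLOGS}/${app_id}"
  if [ ! -d "$dir" ]; then
    echo "no local logs for ${app_id} under ${USERLOGS}" >&2
    return 1
  fi
  find "$dir" -type f | sort
}

# Usage (run on each node; the ID is a made-up example):
#   list_app_logs application_1522300000000_0001
```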

Thursday, March 29, 2018

SPARK_HOME and hdp.version settings for Spark 2 on HDP

export SPARK_HOME=/usr/hdp/2.6.3.0-235/spark2
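A doubled `==` is an easy typo. The shell treats the first `=` as the assignment and the second as part of the value, so SPARK_HOME ends up as `=/usr/hdp/...`, which breaks every path built from it. Below is a quick sanity check you could run after exporting SPARK_HOME. `check_spark_home` is a hypothetical helper, and it assumes only the standard `bin/spark-submit` layout of a Spark distribution.

```shell
# Hypothetical check: does SPARK_HOME (or the given path) look like Spark?
check_spark_home() {
  local home="${1:-$SPARK_HOME}"
  case "$home" in
    =*) echo "SPARK_HOME starts with '=' (was it set with '=='?): $home" >&2
        return 1 ;;
  esac
  if [ ! -x "${home}/bin/spark-submit" ]; then
    echo "no executable bin/spark-submit under: $home" >&2
    return 1
  fi
  echo "SPARK_HOME OK: $home"
}

# Usage:
#   export SPARK_HOME=/usr/hdp/2.6.3.0-235/spark2
#   check_spark_home
```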

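Java options that pass the HDP version to the JVM, the Spark driver and the YARN application master:

JavaOptions=-Dhdp.version=2.6.3.0-235 -Dspark.driver.extraJavaOptions=-Dhdp.version=2.6.3.0-235 -Dspark.yarn.am.extraJavaOptions=-Dhdp.version=2.6.3.0-235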
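On HDP, these flags are usually needed because the stack's classpath settings contain `${hdp.version}`. If that property is not defined, YARN containers tend to fail with a "bad substitution" error. The sketch below rebuilds the option string above for any version and shows the equivalent `spark-submit --conf` form. `hdp_java_opts` and `hdp_version_from_path` are hypothetical helpers. The second one assumes the `/usr/hdp/<version>/spark2` layout of the SPARK_HOME path above, and `app.jar` is a placeholder.

```shell
# Hypothetical helper: build the option string above for a given HDP version.
hdp_java_opts() {
  local v="$1"
  echo "-Dhdp.version=${v} -Dspark.driver.extraJavaOptions=-Dhdp.version=${v} -Dspark.yarn.am.extraJavaOptions=-Dhdp.version=${v}"
}

# Hypothetical helper: take the version from a /usr/hdp/<version>/spark2 path.
hdp_version_from_path() {
  basename "$(dirname "$1")"
}

# The same settings passed directly to spark-submit (app.jar is a placeholder):
#   spark-submit --master yarn \
#     --conf spark.driver.extraJavaOptions=-Dhdp.version=2.6.3.0-235 \
#     --conf spark.yarn.am.extraJavaOptions=-Dhdp.version=2.6.3.0-235 \
#     app.jar
```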


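More info (helper code used by Spark's launcher to build the JVM command line, including parsing of option strings):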
https://github.com/apache/spark/blob/master/launcher/src/main/java/org/apache/spark/launcher/CommandBuilderUtils.java

Thursday, March 22, 2018

Additional steps for Ranger/Kerberos-enabled Hadoop

https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.1/bk_security/content/ch06s01s01s01.html


Add values for the following properties in the "Custom kms-site" section. These properties allow the specified system users (hive, oozie, the user our application runs as, and others) to proxy on behalf of other users when communicating with Ranger KMS. Individual services (such as Hive) can then authenticate with their own keytabs while still accessing Ranger KMS as the end user, so the access policies associated with the end user apply.
  • hadoop.kms.proxyuser.{hadoop-user}.users
  • hadoop.kms.proxyuser.{hadoop-user}.groups
  • hadoop.kms.proxyuser.{hadoop-user}.hosts
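Below is a sketch that prints the three entries for one user in `key=value` form, ready to paste into the custom section. The user name and the `*` defaults are placeholders. Wildcards allow any user, group or host and should usually be narrowed. `kms_proxyuser_props`, the user `etl` and the host in the example are all hypothetical.

```shell
# Hypothetical helper: print the proxyuser properties for one system user.
# Arguments: user [users] [groups] [hosts]; the values default to "*".
kms_proxyuser_props() {
  local user="$1"
  local users="${2:-*}" groups="${3:-*}" hosts="${4:-*}"
  printf 'hadoop.kms.proxyuser.%s.users=%s\n'  "$user" "$users"
  printf 'hadoop.kms.proxyuser.%s.groups=%s\n' "$user" "$groups"
  printf 'hadoop.kms.proxyuser.%s.hosts=%s\n'  "$user" "$hosts"
}

# Example: let the (hypothetical) service user "etl" proxy for any user
# and group, but only from one (hypothetical) host.
kms_proxyuser_props etl '*' '*' edge01.example.com
```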